Description: This research investigates using the web browser as a platform for distributed computing. This is made possible by various, ubiquitous client-side scripting and applications such as Javascript and Flash. The research will investigate the feasibility and implemtation of such a system for a large prime number distributed algorithm.
Motivation: Grid computing (a subset of distributed computing) has become more prevalent and feasible because of the permeation of the internet for large distributed computing projects. A browser-based distributed model seems like a feasible choice, considering the massive amount of "untapped" computing power available. With "Web 2.0" applications quickly becoming the norm, and client-side scripting ability ubiquitous for any modern browser, a distributed platform is readily available with in-browser technologies such as AJAX (Asynchronous Javascript), Json (Javascript Object notation), XML, and even Flash and Silverlight. However, distributed algorithms themselves establish a difficulty. Therefore, most distributed projects only deal with easily parallelizable algorithms, an elegant example being Google's "MapReduce".
References:
Pierrick Gaudry, Alexander Kruppa, Paul Zimmermann. A gmp-based implementation of schönhage-strassen's large integer multiplication algorithm. In Proceedings of the International Conference on Symbolic and Algebraic Computation.
Analyzes a well-known algorithm for multiplying large integers. Very useful algorithm for the Mersenne Prime number test, the Lucas-Lehmer test. Efficiency is needed in multiplying very large integers, using the FFT (Fourier Fast Transform) method in addition to the Strassen matrix multiplication method. It also contains optimization tips for calculating a large integer modulo 2^n+1. These are all optimizations to the algorithms present in GIMPS (Great Internet Mersenne Prime Search) distributed computing project.
University of California, San Diego. MapReduce. CSE 123 Undergraduate course in distributed systems.
These lectures contain information about applications and theory of distributed computing and parallel computing problems. MapReduce is a framework for distributed problem solving and data processing. Parallelization involved dividing the problem up (mapping them to worker nodes), then reducing the problem in paralle, and lastly putting the result back togehter again. Applications, test environment, and experiments are available for distributed grep, distributed sort, distributed dijkstra graph sort, and inverted index. All of these were done with the MapReduce API framework, written in C++, and available from Google.
Great Internet Mersenne Prime Search PrimeNet source code. Mersenne.org, PrimeNet
Open source software with optimized implementations for distributed prime number generation and checking. While the mathematical implemtations involving FFTs (Fast Fourier Transforms) are complex, the head node (server) typically sends out lists of mersenne prime candidates (p values) to worker nodes, who compile and test these individually. The chances of finding a large prime number are 1 in 60,000. The latest calculation to test Mersenne number 41 took 29 days on an Intel Pentium 4 3.0 GHz processor.
CPE 458-Parallel Programming. California Polytechnic State University
Introduction to parallel frameworks, specifically Google's Map-Reduce paradigm. "We are providing a simple, one-week module that can be used to introduce MapReduce in a wide variety of courses. The first lecture provides an introduction to parallel programming and sets the stage for the second lecture which specifically describes MapReduce. To reinforce the lectures, we include a laboratory exercise (hadoopcodelab.pdf) complete with supporting java code (AverageValueReducer.java, HadoopDriver.java, IntArrayWritable.java, UserRatingMapper.java). This lab has the student perform various queries on the Netflix dataset."
Gabriel Paillard. A Fully Distrubuted Prime Numbers Generation Using the Wheel Sieve. Laboratoire d'Informatique de l'Universite Paris-Nord.
The sieve method is the quickest, and one of the oldest ways to search for prime numbers. The sieve of Eratosthenes was developed in 200 BC. The technique involves using an array representing every base-10 number (usually a bit-array to save space). Values can be narrowed-down by crossing out every multiple of 2 (e.g. every second row), 3, 5, 7, and so on... until a bit array representing only prime numbers is left. The wheel sieve differs only in the fact that instead of being represented by a bit array, it generates the next prime based on the factors of the previously-sieved elements. There is an upper-limit in memory with the sieve technique, because it has to store a large bit array in memory. This can be overcome with paging, however, for large (non Mersenne) primes this is still a barrier. The distributed wheel sieve algorithm involves electing new nodes to generate discrete sets of prime numbers. It is built upon the lam-mpi distributed framework.
Fred Douglis, M. Frans Kaashoek, John K. Ousterhout, Andrew S. A Comparison of Two Distributed Systems: Amoeba and Sprite. ACM Transactions on Computer Systems, Vol 4, Fall 1991
A cornerstone paper on distributed systems. Two methods of distributed systems are compared. Sprite is a whole disributed system that "hides" the distribution. E.g. distributed processes and file system. The operating system is distributed, but not the individual processes. Inter-process communication is distributed upon the file system. Amoeba, on the other hand, is a distributed framework for programming parallel and concurrent applications. They are built on the Unix platform, and are processors and computers connected by a high speed network. They differ on philosophical grounds. Sprite takes a more traditional Unix approach, by allowing processes to be distributed, while Amoeba allows for concurrent solutions that use massive amounts of distributed processing power. Each system is faster in regards to specific applications. These two divergent philosophies are good in their respective measures. For a "load balancing" situation, Sprite is ideal. For a massively concurrent processing solution, involving shared memory and a "processor poop", Amoeba wins.
Richard Fujimoto, Michael Hunter, Jason Sirichoke, Mahesh Palekar, Hoe Kim, Wonhu Sun. Ad Hoc Distributed Simluations. 21st International Workshop on Principles of Advanced and Distributed Simulation (PADS'07), pp. 15-24, 2007.
David Karger, Eric Lehman, Tom Leighton, Rina Panigrahy, Matthew Levine, Daniel Lewin. Consistent hashing and random trees: distributed caching protocols for relieving hot spots on the World Wide Web. Annual ACM symposium on Theory of Computing, pp. 654-663, 1997.
Perfectly Scientific, The Algorithm Company. Free Software
Algorithm pseudocode for many complex, fully optimized algorithms. Some are implemented directly in the GIMPS project.