Parallel computing
From Wikipedia, the free encyclopedia
Parallel computing is the simultaneous execution of the same task (split up and specially adapted) on multiple processors in order to obtain results faster. The idea is based on the fact that the process of solving a problem usually can be divided into smaller tasks, which may be carried out simultaneously with some coordination.
Parallel computing systems
A parallel computing system is a computer with more than one processor for parallel processing. In the past, each processor of a multiprocessing system always came in its own package, but recently introduced multicore processors contain multiple processor cores in a single package.
There are many different kinds of parallel computers. They are distinguished by the kind of interconnection between processors (known as "processing elements" or PEs) and memory.
Flynn's taxonomy, one of the most accepted taxonomies of parallel architectures, classifies parallel (and serial) computers according to
- whether all processors execute the same instructions at the same time (single instruction/multiple data -- SIMD) or
- each processor executes different instructions (multiple instruction/multiple data -- MIMD).
Another major way to classify parallel computers is by their memory architecture. Shared memory parallel computers have multiple processors that access all available memory as a single global address space. They can be further divided into two main classes based on memory access times: Uniform Memory Access (UMA), in which access times to all parts of memory are equal, and Non-Uniform Memory Access (NUMA), in which they are not. Distributed memory parallel computers also have multiple processors, but each processor can access only its own local memory; no global memory address space exists across them.
Parallel computing systems can also be categorized by the number of processors in them. Systems with thousands of processors are known as massively parallel. A further distinction is drawn between "large scale" and "small scale" parallel processors, depending on the size of the individual processors; e.g., a PC-based parallel system would generally be considered a small scale system.
Parallel processor machines are also divided into symmetric and asymmetric multiprocessors, depending on whether all the processors are the same or not (for instance if only one is capable of running the operating system code and others are less privileged).
A variety of architectures have been developed for parallel processing. For example, a ring architecture has processors linked in a ring. Other architectures include hypercubes, fat trees, systolic arrays, and so on.
Theory and practice
Parallel computers can be modelled as Parallel Random Access Machines (PRAMs). The PRAM model ignores the cost of interconnection between the constituent computing units, but is nevertheless very useful in providing upper bounds on the parallel solvability of many problems. In reality the interconnection plays a significant role.
The processors may communicate and cooperate in solving a problem or they may run independently, often under the control of another processor which distributes work to and collects results from them (a "processor farm").
Processors in a parallel computer may communicate with each other in a number of ways, including shared (either multiported or multiplexed) memory, a crossbar, a shared bus, or an interconnect network with one of many topologies, including star, ring, tree, hypercube, fat hypercube (a hypercube with more than one processor at a node), n-dimensional mesh, etc. Parallel computers based on an interconnect network need some kind of routing to pass messages between nodes that are not directly connected. The communication medium between the processors is likely to be hierarchical in large multiprocessor machines. Similarly, memory may be private to a processor, shared between a number of processors, or globally shared. A systolic array is an example of a multiprocessor with fixed-function nodes, local-only memory and no message routing.
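The need for routing can be illustrated with a hypercube, in which two nodes are directly linked when their binary addresses differ in exactly one bit. The following is a minimal sketch of dimension-order routing (a common textbook scheme, not one described in this article); the function name `hypercube_route` is hypothetical.

```python
def hypercube_route(src: int, dst: int, dims: int) -> list[int]:
    """Return the nodes visited when routing from src to dst.

    Nodes are numbered 0 .. 2**dims - 1. At each step the lowest-order
    bit in which the current node differs from dst is flipped, which
    corresponds to crossing one direct link of the hypercube.
    """
    if not (0 <= src < 2 ** dims and 0 <= dst < 2 ** dims):
        raise ValueError("node address out of range")
    path = [src]
    current = src
    for bit in range(dims):
        mask = 1 << bit
        if (current ^ dst) & mask:   # addresses differ in this dimension
            current ^= mask          # hop to the neighbour along it
            path.append(current)
    return path


# Route from node 000 to node 101 in a 3-dimensional hypercube.
print(hypercube_route(0b000, 0b101, 3))  # [0, 1, 5]
```

The number of hops equals the number of bits in which the two addresses differ, so no route in a d-dimensional hypercube needs more than d hops.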
Approaches to parallel computers include:
- Multiprocessing
- Computer cluster
- Parallel supercomputers
- Distributed computing
- NUMA vs. SMP vs. massively parallel computer systems
- Grid computing
Performance vs. cost
While a system of x parallel processors is less efficient than one x-times-faster processor, the parallel system is often cheaper to build. Parallel computation is used for tasks which require very large amounts of computation, take a lot of time, and can be divided into x independent subtasks. In recent years, most high performance computing systems, also known as supercomputers, have parallel architectures.
Terminology in parallel computing
Some frequently used terms in parallel computing are:
- Efficiency
- the execution time using a single processor divided by the product of the execution time using a multiprocessor and the number of processors.
- Parallel Overhead
- the extra work required by the parallel version of a program compared with its sequential version, mostly the extra CPU time and memory space needed for synchronization, data communication, and creation and teardown of the parallel environment.
- Synchronization
- the coordination of simultaneous tasks to ensure correctness and avoid unexpected race conditions.
- Speedup
- also called parallel speedup; defined as the wall-clock time of the best serial execution divided by the wall-clock time of the parallel execution. Amdahl's law gives an upper bound on the achievable speedup.
- Scalability
- a parallel system's ability to gain a proportionate increase in parallel speedup with the addition of more processors.
- Task
- a logically high-level, discrete, independent section of computational work. A task is typically executed by a processor as a program.
Algorithms
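The definitions of speedup and efficiency above can be written out directly. This is a minimal sketch; the function names and the timings in the example are illustrative assumptions, not measurements.

```python
def speedup(serial_time: float, parallel_time: float) -> float:
    """Wall-clock time of the best serial run divided by that of the parallel run."""
    return serial_time / parallel_time


def efficiency(serial_time: float, parallel_time: float, processors: int) -> float:
    """Serial time divided by (parallel time multiplied by number of processors)."""
    return serial_time / (parallel_time * processors)


# Hypothetical timings: 100 s serially, 25 s on 8 processors.
print(speedup(100.0, 25.0))        # 4.0
print(efficiency(100.0, 25.0, 8))  # 0.5
```

Efficiency is therefore the speedup divided by the number of processors; a value of 1.0 corresponds to linear speedup.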
Parallel algorithms can be constructed by redesigning serial algorithms to make effective use of parallel hardware. However, not all algorithms can be parallelized. This is summed up in a famous saying:
- One woman can have a baby in nine months, but nine women can't have a baby in one month.
In practice, linear speedup (i.e., speedup proportional to the number of processors) is very difficult to achieve. This is because many algorithms are essentially sequential in nature (Amdahl's law states this more formally).
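Amdahl's law can be made concrete with a short calculation. In this sketch, `parallel_fraction` is the portion of the serial running time that can be parallelized; the function name and the 90% figure are illustrative assumptions.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Maximum speedup predicted by Amdahl's law.

    The serial part (1 - parallel_fraction) is unaffected by adding
    processors; only the parallel part is divided among them.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


# A program that is 90% parallelizable, on a growing number of processors.
for n in (1, 10, 100, 1000):
    print(n, round(amdahl_speedup(0.9, n), 2))
```

As the number of processors grows, the predicted speedup approaches 1 / (1 - parallel_fraction), which is 10 for the 90% case, no matter how many processors are added.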
Certain workloads can benefit from pipeline parallelism when extra processors are added. This uses a factory assembly line approach to divide the work. If the work can be divided into n stages where a discrete deliverable is passed from stage to stage, then up to n processors can be used. However, the slowest stage will hold up the other stages so it is rare to be able to fully use n processors.
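The assembly-line structure described above can be sketched with one thread per stage and a queue between neighbouring stages. The names `stage` and `run_pipeline` are hypothetical, and the sketch shows the structure only: in CPython, threads running pure Python code do not execute simultaneously, so it should not be read as a performance demonstration.

```python
import queue
import threading

SENTINEL = object()  # marks the end of the input stream


def stage(func, inbox, outbox):
    """Apply func to each item from inbox and pass the result downstream."""
    while True:
        item = inbox.get()
        if item is SENTINEL:
            outbox.put(SENTINEL)  # tell the next stage to stop as well
            break
        outbox.put(func(item))


def run_pipeline(funcs, items):
    """Run items through the stage functions, one thread per stage."""
    queues = [queue.Queue() for _ in range(len(funcs) + 1)]
    threads = [
        threading.Thread(target=stage, args=(f, queues[i], queues[i + 1]))
        for i, f in enumerate(funcs)
    ]
    for t in threads:
        t.start()
    for item in items:
        queues[0].put(item)
    queues[0].put(SENTINEL)

    results = []
    while True:
        out = queues[-1].get()
        if out is SENTINEL:
            break
        results.append(out)
    for t in threads:
        t.join()
    return results


# Two stages: add one, then double.
print(run_pipeline([lambda x: x + 1, lambda x: x * 2], [1, 2, 3]))  # [4, 6, 8]
```

Because every item must pass through every stage, overall throughput is limited by the slowest stage, which is the effect noted above.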
Parallel problems
Well known parallel software problem sets include embarrassingly parallel and Grand Challenge problems.
Parallel programming
Parallel programming is the design, implementation, and tuning of parallel computer programs which take advantage of parallel computing systems. It also refers to the application of parallel programming methods to existing serial programs (parallelization).
Parallel programming focuses on partitioning the overall problem into separate tasks, allocating tasks to processors and synchronizing the tasks to get meaningful results. Parallel programming can only be applied to problems that are inherently parallelizable, mostly without data dependence. A problem can be partitioned based on domain decomposition or functional decomposition, or a combination.
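Domain decomposition can be sketched as follows: the data are split into contiguous chunks, each chunk is handled by a separate worker, and the partial results are combined. The helper names `split` and `parallel_sum_of_squares` are hypothetical. A thread pool is used here only to keep the example short; for CPU-bound work in CPython a process pool would normally be needed to obtain real parallel execution.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, parts):
    """Split data into `parts` contiguous chunks of nearly equal size."""
    size, extra = divmod(len(data), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def sum_of_squares(chunk):
    """The work done independently on each sub-domain."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data, workers=4):
    """Partition the data, process each chunk in a worker, combine the results."""
    chunks = split(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(sum_of_squares, chunks))
    return sum(partials)


print(parallel_sum_of_squares(list(range(10)), workers=3))  # 285
```

The chunks share no data, so the only synchronization required is the final combination of the partial sums.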
There are two major approaches to parallel programming:
- implicit parallelism, where the system (the compiler or some other program) partitions the problem and allocates tasks to processors automatically (compilers that do this are called automatic parallelizing compilers), or
- explicit parallelism, where the programmer must annotate the program to show how it is to be partitioned.
Many factors and techniques impact the performance of parallel programming:
- Load balancing attempts to keep all processors busy by moving tasks from heavily loaded processors to less loaded ones.
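The effect of load balancing can be illustrated with a small simulation that compares a fixed round-robin assignment with one that always gives the next task to the least loaded processor. This is a simplification: it chooses a processor when a task is assigned rather than migrating tasks afterwards, and the function names and task costs are illustrative assumptions.

```python
import heapq


def round_robin_makespan(task_costs, processors):
    """Finish time when task i is statically assigned to processor i mod p."""
    loads = [0] * processors
    for i, cost in enumerate(task_costs):
        loads[i % processors] += cost
    return max(loads)


def least_loaded_makespan(task_costs, processors):
    """Finish time when each task goes to the currently least loaded processor."""
    loads = [0] * processors
    heapq.heapify(loads)
    for cost in task_costs:
        lightest = heapq.heappop(loads)       # processor with the least work
        heapq.heappush(loads, lightest + cost)
    return max(loads)


costs = [8, 1, 8, 1, 8, 1]  # alternating expensive and cheap tasks
print(round_robin_makespan(costs, 2))   # 24
print(least_loaded_makespan(costs, 2))  # 17
```

With round-robin assignment one processor receives all the expensive tasks while the other is mostly idle; the balanced assignment finishes sooner with the same total work.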
Some people consider parallel programming to be synonymous with concurrent programming. Others draw a distinction between parallel programming, which uses well-defined and structured patterns of communications between processes and focuses on parallel execution of processes to enhance throughput, and concurrent programming, which typically involves defining new patterns of communication between processes that may have been made concurrent for reasons other than performance. In either case, communication between processes is performed either via shared memory or with message passing, either of which may be implemented in terms of the other.
Programs which work correctly in a single CPU system may not do so in a parallel environment. This is because multiple copies of the same program may interfere with each other, for instance by accessing the same memory location at the same time. Therefore, careful programming (synchronization) is required in a parallel system.
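The kind of synchronization meant here can be sketched with a shared counter protected by a lock. The function name `count_with_lock` is hypothetical. Without the lock, the read-modify-write in `counter += 1` can interleave between threads and updates may be lost.

```python
import threading


def count_with_lock(threads: int = 4, increments: int = 10_000) -> int:
    """Increment a shared counter from several threads, guarded by a lock."""
    counter = 0
    lock = threading.Lock()

    def worker():
        nonlocal counter
        for _ in range(increments):
            with lock:        # only one thread at a time may update counter
                counter += 1

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return counter


print(count_with_lock(4, 10_000))  # 40000
```

The lock serializes access to the shared location, which guarantees correctness at the cost of some parallel overhead.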
Parallel programming models
A parallel programming model is a computing architecture and language designed to express parallelism in software systems and applications. The software that supports these models includes compilers, libraries and other tools that enable an application to use parallel hardware.
Parallel models are implemented in several ways: as libraries invoked from traditional sequential languages, as language extensions, or as completely new execution models. They are also roughly categorized by the two kinds of systems they target, shared memory systems and distributed memory systems, though the line between the two has become largely blurred.
Topics in parallel computing
Generic:
Computer science topics:
- Lazy evaluation vs strict evaluation
- Complexity class NC
- Communicating sequential processes
- Dataflow architecture
- Parallel graph reduction
Practical problems:
- Parallel computer interconnects
- Parallel computer I/O
- Reliability problems in large systems
Programming languages/models:
- OpenMP
- Message Passing Interface/MPICH
- Charm++
- Ease
- Occam
- Linda
- *Lisp
- Cilk
- BMDFM: Binary Modular Dataflow Machine - Parallel Runtime Environment (BMDFM)
Specific:
- Atari Transputer Workstation
- BBN Butterfly computers
- Beowulf cluster
- Blue Gene
- Connection Machine
- Deep Blue
- Fifth generation computer systems project
- ILLIAC III
- ILLIAC IV
- Parallel Element Processing Ensemble
- Meiko Computing Surface
- NCUBE
- Teramac
- Transputer
Parallel computing to increase fault tolerance:
Companies (largely historical):
- Thinking Machines
- Convex Computer Corporation
- Meiko
- Control Data Corporation
- Myrias Research Corporation
See also
- MPI
- Stream processing
- Computer cluster
- Concurrent computing
- Distributed computing
- DNA computing
- Grid computing
- Important publications in parallel computing
- Parallel rendering
- Degree of parallelism
- Symmetric multiprocessing
- Concurrency
- Threads
- Xputer
- PARS
References
This article was originally based on material from the Free On-line Dictionary of Computing, which is licensed under the GFDL.
- http://www.llnl.gov/computing/tutorials/parallel_comp/ Introduction to Parallel Computing
- http://www-unix.mcs.anl.gov/dbpp/ Designing and Building Parallel Programs, by Ian Foster
Further reading
- Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar, Introduction to Parallel Computing (2003), ISBN 0-201-64865-2
- Timothy G. Mattson, Beverly A. Sanders, and Berna L. Massingill, Patterns for Parallel Programming (2005), ISBN 0-321-22811-1
- Barry Wilkinson and Michael Allen, Parallel Programming (2005), ISBN 0-13-140563-2
External links
- A Berkeley View on the Parallel Computing Landscape Argues for the desperate need to innovate around "manycore".
- RAMP: Research Accelerator for Multiple Processors A multi-university open source project to create inexpensive, flexible, large scale multiprocessors for the research community.
- Introduction to Parallel Computing
- "Multiprocessor Optimizations: Fine-Tuning Concurrent Access to Large Data Collections" by Ian Emmons
- Rogue Wave on Software Pipelines
- Internet Parallel Computing Archive
- National HPCC Software Exchange
- Parallel processing topic area at IEEE Distributed Computing Online
- "The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software" by Herb Sutter
- Parallel programming citations from CiteSeer
- COPACOBANA (Cost-Optimized Parallel COde Breaker), an FPGA-based parallel computer
- WebSphere Advisor on Software Pipelines