
Parallel computing


Asanovic et al., "The Landscape of Parallel Computing Research: A View from Berkeley" (PDF). University of California, Berkeley. Technical Report No. UCB/EECS-2006-183. "Old: Increasing clock frequency is the primary method of improving processor performance. New: Increasing parallelism is the primary method of improving processor performance… Even representatives from Intel, a company generally associated with the 'higher clock-speed is better' position, warned that traditional approaches to maximizing performance through maximizing clock speed have been pushed to their limits."

Marvin Minsky's The Society of Mind claims that "mind is formed from many little agents, each mindless by itself". The theory attempts to explain how what we call intelligence could be a product of the interaction of non-intelligent parts. Minsky says that the biggest source of ideas about the theory came from his work in trying to create a machine that uses a robotic arm, a video camera, and a computer to build with children's blocks.

Amdahl's law shows that a small part of the program which cannot be parallelized will limit the overall speedup available from parallelization. A program solving a large mathematical or engineering problem will typically consist of several parallelizable parts and several non-parallelizable (serial) parts. If the non-parallelizable part of a program accounts for 10% of the runtime (p = 0.9), we can get no more than a 10 times speedup, regardless of how many processors are added. This puts an upper limit on the usefulness of adding more parallel execution units. "When a task cannot be partitioned because of sequential constraints, the application of more effort has no effect on the schedule. The bearing of a child takes nine months, no matter how many women are assigned."

This information can be used to restore the program if the computer should fail. Application checkpointing means that the program has to restart from only its last checkpoint rather than the beginning. While checkpointing provides benefits in a variety of situations, it is especially useful in highly parallel systems with a large number of processors used in high performance computing.

Concurrent programming languages, libraries, APIs, and parallel programming models (such as algorithmic skeletons) have been created for programming parallel computers. These can generally be divided into classes based on the assumptions they make about the underlying memory architecture—shared memory, distributed memory, or shared distributed memory. Shared memory programming languages communicate by manipulating shared memory variables. Distributed memory uses message passing.

Menabrea, L. F. (1842). Sketch of the Analytical Engine Invented by Charles Babbage. Bibliothèque Universelle de Genève. Retrieved on November 7, 2007. Quote: "when a long series of identical computations is to be performed, such as those required for the formation of numerical tables, the machine can be brought into play so as to give several results at the same time, which will greatly abridge the whole amount of the processes."

Caches are small and fast memories located close to the processor which store temporary copies of memory values (nearby in both the physical and logical sense). Parallel computer systems have difficulties with caches that may store the same value in more than one location, with the possibility of incorrect program execution. These computers require a cache coherency system.

However, ILLIAC IV was called "the most infamous of supercomputers", because the project was only one-fourth completed, but took 11 years and cost almost four times the original estimate.
When it was finally ready to run its first real application in 1976, it was outperformed by existing commercial supercomputers such as the Cray-1.

Kahng, Andrew B. (June 21, 2004). "Scoping the Problem of DFM in the Semiconductor Industry." University of California, San Diego. "Future design for manufacturing (DFM) technology must reduce design cost and directly address manufacturing […]—the cost of a mask set and probe card—which is well over $1 million at the 90 nm technology node and creates a significant damper on semiconductor-based innovation."
The processing elements can be diverse and include resources such as a single computer with multiple processors, several networked computers, specialized hardware, or any combination of the above. Historically parallel computing was used for scientific computing and the simulation of scientific problems, particularly in the natural and engineering sciences.
Bus snooping is one of the most common methods for keeping track of which values are being accessed (and thus should be purged). Designing large, high-performance cache coherence systems is a very difficult problem in computer architecture. As a result, shared memory computer architectures do not scale as well as distributed memory systems do.
Parallel computing, on the other hand, uses multiple processing elements simultaneously to solve a problem. This is accomplished by breaking the problem into independent parts so that each processing element can execute its part of the algorithm simultaneously with the others.
Because grid computing systems (described below) can easily handle embarrassingly parallel problems, modern clusters are typically designed to handle more difficult problems—problems that require nodes to share intermediate results with each other more often. This requires a high bandwidth and, more importantly, a low-latency interconnection network.
An atomic lock locks multiple variables all at once. If it cannot lock all of them, it does not lock any of them. If two threads each need to lock the same two variables using non-atomic locks, it is possible that one thread will lock one of them and the second thread will lock the second variable. In such a case, neither thread can complete, and deadlock results.
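For illustration, a minimal C++17 sketch of such all-or-nothing locking using std::scoped_lock, which acquires several mutexes with a built-in deadlock-avoidance algorithm (the mutex and variable names are illustrative):

#include <mutex>
#include <thread>

std::mutex m1, m2;
int v1 = 0, v2 = 0;

// std::scoped_lock acquires both mutexes with a deadlock-avoidance algorithm,
// so two threads naming the locks in opposite order cannot block each other forever.
void transfer_a() {
    std::scoped_lock lock(m1, m2);   // locks both or neither
    ++v1; --v2;
}

void transfer_b() {
    std::scoped_lock lock(m2, m1);   // opposite textual order, still safe
    ++v2; --v1;
}

int main() {
    std::thread t1(transfer_a), t2(transfer_b);
    t1.join(); t2.join();
}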
Parallel computers can be roughly classified according to the level at which the hardware supports parallelism. This classification is broadly analogous to the distance between basic computing nodes. These are not mutually exclusive; for example, clusters of symmetric multiprocessors are relatively common.
Applications are often classified according to how often their subtasks need to synchronize or communicate with each other. An application exhibits fine-grained parallelism if its subtasks must communicate many times per second; it exhibits coarse-grained parallelism if they do not communicate many times per second, and it exhibits embarrassing parallelism if they rarely or never have to communicate.
Violation of the first condition introduces a flow dependency, corresponding to the first segment producing a result used by the second segment. The second condition represents an anti-dependency, when the second segment produces a variable needed by the first segment. The third and final condition represents an output dependency: when two segments write to the same location, the result comes from the logically last executed segment.
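The three kinds of dependency can be illustrated with a short, self-contained C++ fragment (the variable names are arbitrary):

// Illustrative only: the three kinds of dependency between pairs of statements.
void dependencies(int a, int b) {
    int c, d, e;

    // Flow (true) dependency: statement 2 reads the value produced by statement 1,
    // so the two cannot be reordered or executed in parallel.
    c = a * b;      // 1
    d = 3 * c;      // 2

    // Anti-dependency: statement 4 overwrites a variable that statement 3 still reads.
    e = a + b;      // 3 reads b
    b = 7;          // 4 writes b

    // Output dependency: both statements write the same location; the final value
    // must come from the logically last write.
    c = 1;          // 5
    c = 2;          // 6

    (void)c; (void)d; (void)e;
}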
An operating system can ensure that different tasks and user programs are run in parallel on the available cores. However, for a serial software program to take full advantage of the multi-core architecture the programmer needs to restructure and parallelize the code.
High initial cost, and the tendency to be overtaken by Moore's-law-driven general-purpose computing, has rendered ASICs unfeasible for most parallel computing applications. However, some have been built. One example is the PFLOPS RIKEN MDGRAPE-3 machine, which uses custom ASICs for molecular dynamics simulation.
Bus contention prevents bus architectures from scaling. As a result, SMPs generally do not comprise more than 32 processors. Because of the small size of the processors and the significant reduction in the requirements for bus bandwidth achieved by large caches, such symmetric multiprocessors are extremely cost-effective, provided that a sufficient amount of memory bandwidth exists.
Task parallelism is the characteristic of a parallel program that "entirely different calculations can be performed on either the same or different sets of data". This contrasts with data parallelism, where the same calculation is performed on the same or different sets of data.
Amdahl's law only applies to cases where the problem size is fixed. In practice, as more computing resources become available, they tend to get used on larger problems (larger datasets), and the time spent in the parallelizable part often grows much faster than the inherently serial work. In this case, Gustafson's law gives a less pessimistic and more realistic assessment of parallel performance.
However, the holy grail of such research—automated parallelization of serial programs—has yet to materialize. While automated parallelization of certain classes of algorithms has been demonstrated, such success has largely been limited to scientific and numeric applications with predictable flow control and statically analyzable memory access patterns.
Patterson and Hennessy, pp. 749–50: "Although successful in pushing several technologies useful in later projects, the ILLIAC IV failed as a computer. Costs escalated from the $ 8 million estimated in 1966 to $ 31 million by 1972, despite the construction of only a quarter of the
This process requires a mask set, which can be extremely expensive. A mask set can cost over a million US dollars. (The smaller the transistors required for the chip, the more expensive the mask will be.) Meanwhile, performance increases in general-purpose computing over time (as described by Moore's law) tend to wipe out these gains in only one or two chip generations.
A massively parallel processor (MPP) is a single computer with many networked processors. MPPs have many of the same characteristics as clusters, but MPPs have specialized interconnect networks (whereas clusters use commodity hardware for networking). MPPs also tend to be larger than clusters, typically having "far more" than 100 processors.
Not all parallelization results in speed-up. Generally, as a task is split up into more and more threads, those threads spend an ever-increasing portion of their time communicating with each other or waiting on each other for access to resources.
It can be predicted that the number of cores per processor will double every 18–24 months. This could mean that after 2020 a typical processor will have dozens or hundreds of cores; however, in reality the standard is somewhere in the region of 4 to 16 cores, with some designs having a mix of performance and efficiency cores (such as ARM's big.LITTLE design) due to thermal and design constraints.
Optimally, the speedup from parallelization would be linear—doubling the number of processing elements should halve the runtime, and doubling it a second time should again halve the runtime. However, very few parallel algorithms achieve optimal speedup. Most of them have a near-linear speedup for small numbers of processing elements, which flattens out into a constant value for large numbers of processing elements.
Lockstep systems allow error detection and error correction if the results differ. These methods can be used to help prevent single-event upsets caused by transient errors. Although additional measures may be required in embedded or specialized systems, this method can provide a cost-effective approach to achieve n-modular redundancy in commercial off-the-shelf systems.
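A sketch of the same idea in software, assuming a hypothetical compute() whose replicated results are voted on; real lockstep systems perform the comparison in hardware:

#include <iostream>

// Software triple modular redundancy (TMR): run the same work three times and
// let a majority vote mask a single transient error in one replica.
int compute(int x) { return x * x + 1; }   // stand-in for the replicated work

int vote(int a, int b, int c) {
    if (a == b || a == c) return a;        // a agrees with at least one replica
    if (b == c) return b;                  // a was the faulty result
    return a;                              // no majority: an error must be reported
}

int main() {
    int r1 = compute(6), r2 = compute(6), r3 = compute(6);
    std::cout << "voted result: " << vote(r1, r2, r3) << '\n';
}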
Many historic and current supercomputers use customized high-performance network hardware specifically designed for cluster computing, such as the Cray Gemini network. As of 2014, most current supercomputers use some off-the-shelf standard network hardware, often Myrinet, InfiniBand, or Gigabit Ethernet.
The runtime of a program is equal to the number of instructions multiplied by the average time per instruction. Maintaining everything else constant, increasing the clock frequency decreases the average time it takes to execute an instruction. An increase in frequency thus decreases runtime for all compute-bound programs.
Historically, 4-bit microprocessors were replaced with 8-bit, then 16-bit, then 32-bit microprocessors. This trend generally came to an end with the introduction of 32-bit processors, which has been a standard in general-purpose computing for two decades. Not until the early 2000s, with the advent of x86-64 architectures, did 64-bit processors become commonplace.
A cluster is a group of loosely coupled computers that work together closely, so that in some respects they can be regarded as a single computer. Clusters are composed of multiple standalone machines connected by a network. While machines in a cluster do not have to be symmetric, load balancing is more difficult if they are not.
The speedup of a program from parallelization is limited by how much of the program can be parallelized. For example, if 90% of the program can be parallelized, the theoretical maximum speedup using parallel computing would be 10 times no matter how many processors are used.

The terms "concurrent computing", "parallel computing", and "distributed computing" have a lot of overlap, and no clear distinction exists between them. The same system may be characterized both as "parallel" and "distributed"; the processors in a typical distributed system run concurrently in parallel.
Simultaneous multithreading (of which Intel's Hyper-Threading is the best known) was an early form of pseudo-multi-coreism. A processor capable of concurrent multithreading includes multiple execution units in the same processing unit—that is, it has a superscalar architecture—and can issue multiple instructions per clock cycle from multiple threads.
Task parallelism involves the decomposition of a task into sub-tasks and then allocating each sub-task to a processor for execution. The processors would then execute these sub-tasks concurrently and often cooperatively. Task parallelism does not usually scale with the size of a problem.
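A minimal C++ illustration of task parallelism, assuming two unrelated computations (a sum and a maximum) over the same data, each given to its own thread:

#include <algorithm>
#include <numeric>
#include <thread>
#include <vector>

// Task parallelism: two entirely different calculations run concurrently.
int main() {
    std::vector<int> data(1'000'000, 1);
    long long sum = 0;
    int maximum = 0;

    std::thread t1([&] { sum = std::accumulate(data.begin(), data.end(), 0LL); });
    std::thread t2([&] { maximum = *std::max_element(data.begin(), data.end()); });
    t1.join();
    t2.join();
    return (sum > maximum) ? 0 : 1;   // use the results so they are not optimized away
}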
Both Amdahl's law and Gustafson's law assume that the running time of the serial part of the program is independent of the number of processors. Amdahl's law assumes that the entire problem is of fixed size so that the total amount of work to be done in parallel is also independent of the number of processors, whereas Gustafson's law assumes that the total amount of work to be done in parallel varies linearly with the number of processors.
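Both laws can be evaluated directly. A small C++ sketch using the two formulas S = 1/(1 − p + p/s) and S = 1 − p + sp, where p is the parallel fraction and s the speedup of the parallel part (for example, the processor count):

#include <cstdio>

// Speedup in latency predicted by Amdahl's law (fixed problem size) and
// Gustafson's law (problem grows with the machine).
double amdahl(double p, double s)    { return 1.0 / ((1.0 - p) + p / s); }
double gustafson(double p, double s) { return (1.0 - p) + s * p; }

int main() {
    // With p = 0.9, Amdahl's bound approaches 10x no matter how large s gets.
    for (double s : {2.0, 16.0, 1024.0})
        std::printf("s=%6.0f  Amdahl=%6.2f  Gustafson=%8.2f\n",
                    s, amdahl(0.9, s), gustafson(0.9, s));
}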
These instructions can be re-ordered and combined into groups which are then executed in parallel without changing the result of the program. This is known as instruction-level parallelism. Advances in instruction-level parallelism dominated computer architecture from the mid-1980s until the mid-1990s.
No program can run more quickly than the longest chain of dependent calculations (known as the critical path), since calculations that depend upon prior calculations in the chain must be executed in order. However, most algorithms do not consist of just a long chain of dependent calculations; there are usually opportunities to execute independent calculations in parallel.

Word size is the amount of information the processor can manipulate per cycle. Increasing the word size reduces the number of instructions the processor must execute to perform an operation on variables whose sizes are greater than the length of the word.
The single-instruction-single-data (SISD) classification is equivalent to an entirely sequential program. The single-instruction-multiple-data (SIMD) classification is analogous to doing the same operation repeatedly over a large data set. This is commonly done in signal processing applications.
The other thread is locked out—unable to proceed until V is unlocked again. This guarantees correct execution of the program. Locks may be necessary to ensure correct program execution when threads must serialize access to resources, but their use can greatly slow a program and may affect its reliability.

According to David A. Patterson and John L. Hennessy, "Some machines are hybrids of these categories, of course, but this classic model has survived because it is simple, easy to understand, and gives a good first approximation. It is also—perhaps because of its understandability—the most widely used scheme."
Part B takes roughly 25% of the time of the whole computation. By working very hard, one may be able to make this part 5 times faster, but this only reduces the time for the whole computation by a little. In contrast, one may need to perform less work to make part A twice as fast.
AMD's decision to open its HyperTransport technology to third-party vendors has become the enabling technology for high-performance reconfigurable computing. According to Michael R. D'Amour, Chief Operating Officer of DRC Computer Corporation, "when we first walked into AMD, they called us 'the socket stealers.' Now they call us their partners."
The major central processing unit (CPU or processor) manufacturers started to produce power efficient processors with multiple cores. The core is the computing unit of the processor and in multi-core processors each core is independent and can access the same memory concurrently.

"Parallel Computing Research at Illinois: The UPCRC Agenda" (PDF). Parallel@Illinois, University of Illinois at Urbana-Champaign. "The main techniques for these performance benefits—increased clock frequency and smarter but increasingly complex architectures—are now hitting the so-called power wall."
Because an ASIC is (by definition) specific to a given application, it can be fully optimized for that application. As a result, for a given application, an ASIC tends to outperform a general-purpose computer. However, ASICs are created by UV photolithography.
Once the overhead from resource contention or communication dominates the time spent on other computation, further parallelization (that is, splitting the workload over even more threads) increases rather than decreases the amount of time required to finish. This problem, known as parallel slowdown, can be improved in some cases by software analysis and redesign.
A lock is a programming language construct that allows one thread to take control of a variable and prevent other threads from reading or writing it, until that variable is unlocked. The thread holding the lock is free to execute its critical section.
A vector processor is a CPU or computer system that can execute the same instruction on large sets of data. Vector processors have high-level operations that work on linear arrays of numbers or vectors. An example vector operation is A = B × C, where A, B, and C are each 64-element vectors of 64-bit floating-point numbers.
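An element-wise vector multiply of this kind can be written as an ordinary loop that a compiler may map onto SIMD hardware; the OpenMP simd directive below is only a hint, and the function and variable names are illustrative:

#include <cstddef>
#include <vector>

// Element-wise multiply of two vectors, the kind of operation a vector
// processor performs with a single high-level instruction.
void multiply(const std::vector<double>& b, const std::vector<double>& c,
              std::vector<double>& a) {
    #pragma omp simd
    for (std::size_t i = 0; i < a.size(); ++i)
        a[i] = b[i] * c[i];
}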
OpenHMPP is an open standard for hybrid multi-core parallel programming. The OpenHMPP directive-based programming model offers a syntax to efficiently offload computations on hardware accelerators and to optimize data movement to/from the hardware memory using remote procedure calls.
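The following is not OpenHMPP syntax, but a sketch in the same directive-based spirit using OpenMP target offload: the pragmas mark a loop for execution on an accelerator and describe which arrays must be copied to and from its memory (function and parameter names are illustrative):

// Directive-based offload: the loop body runs on the accelerator, and the
// map() clauses describe the required data movement between host and device.
void scale(const float* in, float* out, int n, float factor) {
    #pragma omp target teams distribute parallel for \
            map(to: in[0:n]) map(from: out[0:n])
    for (int i = 0; i < n; ++i)
        out[i] = factor * in[i];
}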
A distributed computer (also known as a distributed memory multiprocessor) is a distributed memory computer system in which the processing elements are connected by a network. Distributed computers are highly scalable.
On the supercomputers, distributed shared memory space can be implemented using a programming model such as PGAS. This model allows processes on one compute node to transparently access the remote memory of another compute node. All compute nodes are also connected to an external shared memory system via high-speed interconnect, such as InfiniBand.
In an MPP, "each CPU contains its own memory and copy of the operating system and application. Each subsystem communicates with the others via a high-speed interconnect."
Distributed shared memory and memory virtualization combine the two approaches, where the processing element has its own local memory and access to the memory on non-local processors. Accesses to local memory are typically faster than accesses to non-local memory.
The critical section is the section of a program that requires exclusive access to some variable, and the thread must unlock the data when it is finished. Therefore, to guarantee correct program execution, the above program can be rewritten to use locks.
In this example, instruction 3 cannot be executed before (or even in parallel with) instruction 2, because instruction 3 uses a result from instruction 2. It violates condition 1, and thus introduces a flow dependency.
Flynn classified programs and computers by whether they were operating using a single set or multiple sets of instructions, and whether or not those instructions were using a single set or multiple sets of data.
is the "holy grail" of parallel computing, especially with the aforementioned limit of processor frequency. Despite decades of work by compiler researchers, automatic parallelization has had only limited success.
In the early days, GPGPU programs used the normal graphics APIs for executing programs. However, several new programming languages and platforms have been built to do general purpose computation on GPUs, with both Nvidia and AMD releasing programming environments with CUDA and Stream SDK respectively.
5109: 4651: 1744:
A superscalar processor includes multiple execution units and can issue multiple instructions per clock cycle from one instruction stream (thread); in contrast, a multi-core processor can issue multiple instructions per clock cycle from multiple instruction streams.
In a distributed memory system, each processing element has its own local address space. Distributed memory refers to the fact that the memory is logically distributed, but often implies that it is physically distributed as well.
Thus, an 8-bit processor requires two instructions to complete a single operation, where a 16-bit processor would be able to complete the operation with a single instruction.
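The same sequence can be spelled out in C++ using only 8-bit arithmetic (the function name is illustrative):

#include <cstdint>
#include <cstdio>

// What an 8-bit processor must do to add two 16-bit integers: add the low
// bytes, then add the high bytes together with the carry from the first add.
uint16_t add16_with_8bit_ops(uint16_t x, uint16_t y) {
    uint8_t lo = static_cast<uint8_t>(x) + static_cast<uint8_t>(y);
    uint8_t carry = lo < static_cast<uint8_t>(x) ? 1 : 0;              // carry out of the low byte
    uint8_t hi = static_cast<uint8_t>(x >> 8) + static_cast<uint8_t>(y >> 8) + carry;
    return static_cast<uint16_t>(hi) << 8 | lo;
}

int main() {
    std::printf("%u\n", add16_with_8bit_ops(0x01FF, 0x0001));          // prints 512
}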
IBM's Cell microprocessor, designed for use in the Sony PlayStation 3, is a prominent multi-core processor. Each core in a multi-core processor can potentially be superscalar as well—that is, on every clock cycle, each core can issue multiple instructions from one thread.
Predictable flow control means, for example, nested loop structures with statically determined iteration counts; statically analyzable memory access patterns mean, for example, walks over large multidimensional arrays of floating-point data.
Some form of routing is needed to enable the passing of messages between nodes that are not directly connected. The medium used for communication between the processors is likely to be hierarchical in large multiprocessor machines.
2824:
Similar models (which also view the biological brain as a massively parallel computer, i.e., the brain is made up of a constellation of independent or semi-independent agents) were also described by:
For example, where an 8-bit processor must add two 16-bit integers, the processor must first add the 8 lower-order bits from each integer using the standard addition instruction, then add the 8 higher-order bits using an add-with-carry instruction and the carry bit from the lower-order addition.

In 1967, Amdahl and Slotnick published a debate about the feasibility of parallel processing at the American Federation of Information Processing Societies Conference. It was during this debate that Amdahl's law was coined to define the limit of speed-up due to parallelism.
All simulated circuits were described in very high speed integrated circuit (VHSIC) hardware description language (VHDL). Hardware modeling was performed on Xilinx FPGA Artix 7 xc7a200tfbg484-2.
4220:"Systematic Generation of Executing Programs for Processor Elements in Parallel ASIC or FPGA-Based Systems and Their Transformation into VHDL-Descriptions of Processor Element Control Units". 335:
A speed-up of application software runtime will no longer be achieved through frequency scaling; instead, programmers will need to parallelize their software code to take advantage of the increasing computing power of multicore architectures.
Cray computers became famous for their vector-processing computers in the 1970s and 1980s. However, vector processors—both as CPUs and as full computer systems—have generally disappeared. Modern processor instruction sets do include some vector processing instructions, such as with Freescale Semiconductor's AltiVec and Intel's Streaming SIMD Extensions (SSE).
Most grid computing applications use middleware (software that sits between the operating system and the application to manage network resources and standardize the software interface). The most common grid computing middleware is the Berkeley Open Infrastructure for Network Computing (BOINC).

A parallel program uses multiple CPU cores, each core performing a task independently. On the other hand, concurrency enables a program to deal with multiple tasks even on a single CPU core; the core switches between tasks without necessarily completing each one.
Parallel computing is a type of computation in which many calculations or processes are carried out simultaneously. Large problems can often be divided into smaller ones, which can then be solved at the same time. There are several different forms of parallel computing: bit-level, instruction-level, data, and task parallelism.
6051: 146:
Clusters, MPPs, and grids use multiple computers to work on the same task. Specialized parallel computer architectures are sometimes used alongside traditional processors, for accelerating specific tasks.
F is the processor frequency (cycles per second). Increases in frequency increase the amount of power used in a processor. Increasing processor power consumption led ultimately to Intel's May 8, 2004 cancellation of its Tejas and Jayhawk processors.
A canonical five-stage pipelined processor with two execution units. In the best case scenario, it takes one clock cycle to complete two instructions and thus the processor can issue superscalar performance (IPC = 2 > 1).
4189:
D'Amour, Michael R., Chief Operating Officer, DRC Computer Corporation. "Standard Reconfigurable Computing". Invited speaker at the University of Delaware, February 28, 2007.
For example, two threads may each need to update a variable that is shared between them. Without synchronization, the instructions between the two threads may be interleaved in any order. For example, consider the following program:
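The program in question increments a shared variable V from two threads. A C++ rendering of the same situation, with a mutex added to show the corrected version, might look like this (the names are illustrative):

#include <mutex>
#include <thread>

// Two threads incrementing a shared variable. Without the mutex, the two
// read-modify-write sequences can interleave and updates can be lost; holding
// the lock around the critical section serializes them.
int shared_v = 0;
std::mutex v_mutex;

void increment_many() {
    for (int i = 0; i < 100000; ++i) {
        std::lock_guard<std::mutex> lock(v_mutex);   // remove this line to expose the race
        ++shared_v;
    }
}

int main() {
    std::thread a(increment_many), b(increment_many);
    a.join(); b.join();
    return shared_v == 200000 ? 0 : 1;               // always 200000 with the lock in place
}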
5108: 2756:
C.mmp, a multi-processor project at Carnegie Mellon University in the 1970s, was among the first multiprocessors with more than a few processors. The first bus-connected multiprocessor with snooping caches was the Synapse N+1 in 1984.
"The computer industry has accepted that future performance increases must largely come from increasing the number of processors (or cores) on a die, rather than making a single core go faster."
The key to its design was a fairly high parallelism, with up to 256 processors, which allowed the machine to work on large datasets in what would later be known as vector processing.
Bernstein's conditions do not allow memory to be shared between different processes. For that, some means of enforcing an ordering between accesses is necessary, such as semaphores, barriers or some other synchronization method.
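For example, a counting semaphore (here C++20's std::binary_semaphore) can enforce such an ordering between two threads; the sketch below is illustrative only:

#include <semaphore>
#include <thread>

// Enforcing an order between accesses from two threads with a semaphore:
// the consumer cannot read the value before the producer has released it.
int shared_value = 0;
std::binary_semaphore ready(0);   // initially unavailable

int main() {
    int observed = -1;
    std::thread producer([] { shared_value = 42; ready.release(); });
    std::thread consumer([&] { ready.acquire(); observed = shared_value; });
    producer.join(); consumer.join();
    return observed == 42 ? 0 : 1;
}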
4721: 4239:
Shimokawa, Y.; Fuwa, Y.; Aramaki, N. (18–21 November 1991). "A parallel ASIC VLSI neurocomputer for a large number of neurons and billion connections per second speed".
2796: 1402:
A computer program is, in essence, a stream of instructions executed by a processor. Without instruction-level parallelism, a processor can only issue less than one instruction per clock cycle (IPC < 1).
As power consumption (and consequently heat generation) by computers has become a concern in recent years, parallel computing has become the dominant paradigm in computer architecture, mainly in the form of multi-core processors.
If instruction 1B is executed between 1A and 3A, or if instruction 1A is executed between 1B and 3B, the program will produce incorrect data. This is known as a race condition.
405: 5957: 5246: 1976:
Because of the low bandwidth and extremely high latency available on the Internet, distributed computing typically deals only with embarrassingly parallel problems.
1286:), few applications that fit this class materialized. Multiple-instruction-multiple-data (MIMD) programs are by far the most common type of parallel programs. 1132:, altogether avoids the use of locks and barriers. However, this approach is generally difficult to implement and requires correctly designed data structures. 6021: 2482:
is a technique whereby the computer system takes a "snapshot" of the application—a record of all current resource allocations and variable states, akin to a
2325: 2320: 1282:
Multiple-instruction-single-data (MISD) is a rarely used classification. While computer architectures to deal with this were devised (such as systolic arrays), few applications that fit this class materialized.
4565:. 1988. p. 8 quote: "The earliest reference to parallelism in computer design is thought to be in General L. F. Menabrea's publication in… 1842, entitled 4033: 5970: 1988: 4520:
planned machine . It was perhaps the most infamous of supercomputers. The project started in 1965 and ran its first real application in 1976."
3823: 3794: 1666:
Processor–processor and processor–memory communication can be implemented in hardware in several ways, including via shared (either multiported or multiplexed) memory, a crossbar switch, a shared bus or an interconnect network of a myriad of topologies including star, ring, tree, hypercube, fat hypercube (a hypercube with more than one processor at a node), or n-dimensional mesh.
The canonical example of a pipelined processor is a RISC processor, with five stages: instruction fetch (IF), instruction decode (ID), execute (EX), memory access (MEM), and register write back (WB). The Pentium 4 processor had a 35-stage pipeline.
4626: 4358: 3497: 3371: 2498:
As parallel computers become larger and faster, we are now able to solve problems that had previously taken too long to run. Fields as varied as bioinformatics (for protein folding and sequence analysis) and economics have taken advantage of parallel computing.
Superscalar processors differ from multi-core processors in that the several execution units are not entire processors (i.e. processing units). Instructions can be grouped together only if there is no data dependency between them.
A canonical five-stage pipelined processor. In the best case scenario, it takes one clock cycle to complete one instruction and thus the processor can issue scalar performance (IPC = 1).
6120: 3692: 2986: 6055: 5336: 1442:. Each stage in the pipeline corresponds to a different action the processor performs on that instruction in that stage; a processor with an 123:) without necessarily completing each one. A program can have both, neither or a combination of parallelism and concurrency characteristics. 4326: 3208: 2717:
In April 1958, Stanley Gill (Ferranti) discussed parallel programming and the need for branching and waiting. Also in 1958, IBM researchers John Cocke and Daniel Slotnick discussed the use of parallelism in numerical calculations for the first time.
5188: 4096:
Valueva, Maria; Valuev, Georgii; Semyonova, Nataliya; Lyakhov, Pavel; Chervyakov, Nikolay; Kaplun, Dmitry; Bogaevskiy, Danil (2019-06-20).
912:
represents an output dependency: when two segments write to the same location, the result comes from the logically last executed segment.
2449: 2036:(FPGA) as a co-processor to a general-purpose computer. An FPGA is, in essence, a computer chip that can rewire itself for a given task. 5950: 3571: 1640: 1572: 2887: 310:. Thus parallelization of serial programs has become a mainstream programming task. In 2012 quad-core processors became standard for 4441: 4280:
Acken, Kevin P.; Irwin, Mary Jane; Owens, Robert M. (July 1998). "A Parallel ASIC Architecture for Efficient Fractal Image Coding".
4202: 1336:(VLSI) computer-chip fabrication technology in the 1970s until about 1986, speed-up in computer architecture was driven by doubling 126:
Parallel computers can be roughly classified according to the level at which the hardware supports parallelism, with multi-core and multi-processor computers having multiple processing elements within a single machine.
6184: 149:
In some cases parallelism is transparent to the programmer, such as in bit-level or instruction-level parallelism, but explicitly parallel algorithms, particularly those that use concurrency, are more difficult to write than sequential ones.
6534: 6396: 2714:
and a "Program Distributor" to dispatch and collect data to and from independent processing units connected to a central memory.
2510:) and economics have taken advantage of parallel computing. Common types of problems in parallel computing applications include: 2202: 2196: 3671: 6540: 5317: 2882: 2768: 2116:(GPGPU) is a fairly recent trend in computer engineering research. GPUs are co-processors that have been heavily optimized for 4731: 5943: 4980: 4952: 4807: 4776: 4384: 4256: 4161: 3778: 3455: 3339: 3191: 3166: 3091: 2963: 1797:
A symmetric multiprocessor (SMP) is a computer system with multiple identical processors that share memory and connect via a bus.
1153:
Embarrassingly parallel applications are considered the easiest to parallelize.
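A typical embarrassingly parallel loop needs nothing more than a single OpenMP directive, as in this illustrative C++ sketch:

#include <cmath>
#include <vector>

// Every iteration is independent, so one directive spreads the work
// across all available cores.
void transform_all(std::vector<double>& samples) {
    #pragma omp parallel for
    for (long i = 0; i < static_cast<long>(samples.size()); ++i)
        samples[i] = std::sqrt(samples[i]) * 0.5;
}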
173:
Communication and synchronization between the different subtasks are typically some of the greatest obstacles to getting optimal parallel program performance.
5357: 1779:
Temporal multithreading, on the other hand, includes a single execution unit in the same processing unit and can issue one instruction at a time from multiple threads.
6725: 6473: 5584: 3621: 3073: 2018: 702:
Let P_i and P_j be two program segments. Bernstein's conditions describe when the two are independent and can be executed in parallel. For P_i, let I_i be all of the input variables and O_i the output variables, and likewise for P_j. P_i and P_j are independent if they satisfy the following conditions.
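Restated in set notation, the conditions are:

\[ I_j \cap O_i = \varnothing, \qquad I_i \cap O_j = \varnothing, \qquad O_i \cap O_j = \varnothing. \]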
212:
on one computer. Only one instruction may execute at a time—after that instruction is finished, the next one is executed.
5607: 1220: 2759:
SIMD parallel computers can be traced back to the 1970s. The motivation behind early SIMD computers was to amortize the gate delay of the processor's control unit over multiple instructions.
6193: 5496: 4002: 1129: 1117: 964: 940: 239: 170: 5352: 1170:
Michael J. Flynn created one of the earliest classification systems for parallel (and sequential) computers and programs, now known as Flynn's taxonomy.
5602: 5579: 4898: 4840: 3304: 3267: 3242: 3141: 3053: 2364:, where one part of a program promises to deliver a required datum to another part of a program at some future time. 2333: 1215: 1199: 295:
The cancellation of these processors is generally cited as the end of frequency scaling as the dominant computer architecture paradigm.
5058:
Rodriguez, C.; Villagra, M.; Baran, B. (29 August 2008). "Asynchronous team algorithms for Boolean Satisfiability".
2167:
specification, which is a framework for writing programs that execute across platforms consisting of CPUs and GPUs.
1972:
Grid computing is the most distributed form of parallel computing. It makes use of computers communicating over the Internet to work on a given problem.
6565: 6245: 6189: 5181: 2360:(MPI) is the most widely used message-passing system API. One concept used in programming parallel programs is the 1658:
system, which keeps track of cached values and strategically purges them, thus ensuring correct program execution.
1643:
system, in which the memory is not physically distributed. A system that does not have this property is known as a
1597: 1519:) are two of the most common techniques for implementing out-of-order execution and instruction-level parallelism. 862: 812: 762: 592: 3471: 1489:. They usually combine this feature with pipelining and thus can issue more than one instruction per clock cycle ( 6425: 6298: 6229: 6164: 6087: 5574: 5389: 2718: 2017:
Within parallel computing, there are specialized parallel devices that remain niche areas of interest. While not
1394:. It takes five clock cycles to complete one instruction and thus the processor can issue subscalar performance ( 1194: 972: 17: 6798: 6603: 6366: 5996: 5681: 5595: 5544: 5124: 4411: 3916: 3733: 2877: 2638: 2632: 2524: 2433: 1915: 1290: 1246: 139: 5161: 3956: 3932: 3928: 3912: 6793: 6381: 6371: 6149: 5905: 5739: 5590: 5277: 2040: 2033: 1450:
different instructions at different stages of completion and thus can issue one instruction per clock cycle (
1381: 1262: 1241: 69: 28: 4430:(PDF). University of California, Berkeley. Technical Report No. UCB/EECS-2006-183. See table on pages 17–19. 6765: 6745: 6675: 6618: 6580: 6570: 6530: 6455: 6391: 6361: 6288: 6277: 6174: 6154: 6129: 6092: 5016: 1333: 2767:
In 1964, Slotnick had proposed building a massively parallel computer for the Lawrence Livermore National Laboratory.
6788: 6720: 6483: 6450: 6345: 6321: 6283: 6263: 6159: 6068: 6046: 6031: 5924: 5870: 5330: 5174: 4702:. Virginia Tech/Norfolk State University, Interactive Learning with a Digital Library in Computer Science 3597: 2703: 1874: 1763: 1236: 1105: 928:
In this example, there are no dependencies between the instructions, so they can all be run in parallel.
541:
is the percentage of the execution time of the whole task concerning the parallelizable part of the task
188:, which states that it is limited by the fraction of time for which the parallelization can be utilised. 3820: 3646: 2144: 1550:
and basic block vectorization. It is distinct from loop vectorization algorithms in that it can exploit
393:
processing elements, which flattens out into a constant value for large numbers of processing elements.
208:
is constructed and implemented as a serial stream of instructions. These instructions are executed on a
6667: 6653: 6560: 6520: 6445: 6351: 6331: 6198: 6077: 6011: 5849: 5644: 5529: 5491: 5341: 5231: 4496:
Dobel, B., Hartig, H., & Engel, M. (2012) "Operating system support for redundant multithreading".
4355: 3831: 3505: 2907: 2753: 2729:
introduced the D825 in 1962, a four-processor computer that accessed up to 16 memory modules through a
2487: 2475: 2337: 2120:
processing. Computer graphics processing is a field dominated by data parallel operations—particularly
1852: 1121: 936: 225: 81: 4030: 3409: 6760: 6525: 6435: 6415: 6401: 5865: 5844: 5789: 5676: 5666: 5639: 5501: 5143: 2547: 2537: 2479: 2469: 2405: 2400: 2357: 2303: 1792: 1644: 1620: 1584: 968: 127: 4427: 3700: 3556: 3388: 3032: 2983: 6740: 6700: 6643: 6575: 6313: 6144: 5819: 5445: 5384: 5297: 2912: 2646: 2113: 2029: 1860: 963:. However, "threads" is generally accepted as a generic term for subtasks. Threads will often need 956: 1995:
software makes use of "spare cycles", performing computations at times when a computer is idling.
6750: 6730: 6671: 6658: 6638: 6465: 6202: 6106: 6064: 5935: 5880: 5875: 5734: 5325: 4529: 4323: 2922: 2748:
system, a symmetric multiprocessor system capable of running up to eight processors in parallel.
2685: 2557: 2425: 2291: 1977: 1776: 1733: 1551: 1543: 1150: 1125: 1023: 932: 925:
1: function NoDep(a, b)
2:   c := a * b
3:   d := 3 * b
4:   e := a + b
5: end function
677:. No program can run more quickly than the longest chain of dependent calculations (known as the 299: 209: 275:
being switched per clock cycle (proportional to the number of transistors whose inputs change),
6710: 6685: 6679: 6623: 6585: 6273: 6268: 6220: 6115: 6016: 5988: 5979: 5619: 5551: 5455: 5347: 5302: 5000: 4890: 3383: 2892: 2056: 1885:
supercomputers are clusters. The remaining are Massively Parallel Processors, explained below.
1474: 1439: 1427: 1415: 1403: 1391: 1094: 976: 5082:
Sechin, A.; Parallel Computing in Photogrammetry. GIM International. #1, 2016, pp. 21–23.
4926: 3289:
Proceedings of the April 18-20, 1967, spring joint computer conference on - AFIPS '67 (Spring)
6612: 6608: 6550: 6502: 6072: 5711: 5671: 5624: 5614: 5409: 5272: 5211: 2897: 2872: 2726: 2373: 1890: 1815: 1750: 1636: 1632: 1588: 1311: 955:. Some parallel computer architectures use smaller, lightweight versions of threads known as 526: 89: 65: 5060:
Bio-Inspired Models of Network, Information and Computing Systems, 2007. Bionetics 2007. 2nd
4832: 3725: 946: 6755: 6735: 6695: 6497: 6356: 6225: 6212: 5966: 5651: 5539: 5534: 5524: 5511: 5307: 5139: 4878: 4856: 4289: 3285:"Validity of the single processor approach to achieving large scale computing capabilities" 2667: 2642: 2619: 2421: 2417: 2361: 2341: 2124: 1822: 1727: 1500: 678: 501:{\displaystyle S_{\text{latency}}(s)={\frac {1}{1-p+{\frac {p}{s}}}}={\frac {s}{s+p(1-s)}}} 303: 235: 217: 154: 116: 93: 4799: 4793: 4768: 4762: 8: 6690: 6628: 6440: 6420: 6406: 6138: 6006: 6001: 5814: 5769: 5569: 5435: 5151: 4472: 4199: 3111: 2711: 2603: 2588: 2568: 2329: 1992: 1856: 1844: 1691: 1609: 1596:, distributed shared memory space can be implemented using the programming model such as 1180: 1171: 1162: 960: 166: 61: 4293: 3863: 3527: 6460: 6430: 6376: 6235: 6134: 6026: 5839: 5688: 5661: 5486: 5450: 5440: 5399: 5241: 5221: 5071: 4973:
Divided consciousness: multiple controls in human thought and action (expanded edition)
4919: 4883: 4608: 4305: 4262: 4167: 4146:
2020 4th International Conference on Intelligent Computing and Control Systems (ICICCS)
4078: 3788: 3544: 3401: 3310: 2837: 2814: 2572: 2563: 2222: 1870: 1627:
Computer architectures in which each element of main memory can be accessed with equal
1580: 1512: 952: 674: 657:, whereas Gustafson's law assumes that the total amount of work to be done in parallel 583: 575: 315: 201: 150: 131: 120: 6663: 6555: 6410: 6386: 6326: 6293: 6255: 6240: 6179: 5885: 5561: 5519: 5414: 4976: 4948: 4894: 4836: 4825: 4803: 4772: 4600: 4407: 4380: 4252: 4171: 4157: 4153: 4119: 4070: 3774: 3451: 3335: 3314: 3300: 3263: 3238: 3187: 3162: 3137: 3087: 3081: 3049: 2994: 2959: 2932: 2831: 2760: 2706:
announced the first computer architecture specifically designed for parallelism, the
2582: 2553: 2507: 2117: 1516: 1279: 1137: 311: 307: 292: 231: 197: 85: 5075: 4612: 4266: 4082: 3821:
The Microprocessor Ten Years From Now: What Are The Challenges, How Do We Meet Them?
3405: 3334:(Anniversary ed., repr. with corr., 5.  ed.). Reading, Mass. : Addison-Wesley. 3006: 6545: 6477: 6341: 6082: 5895: 5694: 5629: 5476: 5292: 5287: 5282: 5251: 5063: 4860: 4592: 4501: 4464: 4456: 4309: 4297: 4244: 4149: 4109: 4062: 3742: 3721: 3428: 3423:
Bernstein, A. J. (1 October 1966). "Analysis of Programs for Parallel Processing".
3393: 3292: 3069: 2855: 2780: 2654: 2607: 2593: 2234: 2210: 1903: 1834: 1679: 1528: 1369: 1345: 1337: 1294: 1167: 1101: 1090: 1031: 1027: 915:
Consider the following functions, which demonstrate several kinds of dependencies:
396:
The potential speedup of an algorithm on a parallel computing platform is given by
331: 135: 108: 77: 73: 4098:"Construction of Residue Number System Using Hardware Efficient Diagonal Function" 3944: 3329: 2059:, with which most programmers are familiar. The best known C to HDL languages are 535:
is the speedup in latency of the execution of the parallelizable part of the task;
376:
twice as fast. This will make the computation much faster than by optimizing part
6595: 6469: 6335: 6036: 5759: 5699: 5634: 5481: 5471: 5404: 5394: 5236: 5226: 5067: 4914: 4560: 4423: 4362: 4330: 4241:[Proceedings] 1991 IEEE International Joint Conference on Neural Networks 4206: 4037: 4009: 3963: 3827: 3025: 2990: 2917: 2809: 2734: 2730: 2722: 2695: 2691: 2650: 2599: 2578: 2503: 2345: 2287: 2218: 2090: 2009:
The ubiquity of Internet brought the possibility of large-scale cloud computing.
2004: 1798: 1767: 1699: 1695: 1675: 1671: 1655: 1558:, such as manipulating coordinates, color channels or in loops unrolled by hand. 1504: 1360: 1341: 670: 397: 348: 185: 1647:(NUMA) architecture. Distributed memory systems have non-uniform memory access. 586:
gives a less pessimistic and more realistic assessment of parallel performance:
37: 6647: 6303: 6169: 5890: 5706: 5363: 5256: 4968: 4940: 4580: 4141: 4114: 4097: 2843: 2804: 2725:
discussed the use of parallelism in numerical calculations for the first time.
2530: 2515: 2499: 2428:
for parallelization. A few fully implicit parallel programming languages exist—
2384: 2380: 2277: 2214: 2121: 2079: 2024: 1967: 1802: 1741: 1628: 1593: 1547: 1486: 1283: 1019: 324: 319: 162: 143: 4533: 4301: 4248: 1806:
cost-effective, provided that a sufficient amount of memory bandwidth exists.
1705:
Parallel computers based on interconnected networks need to have some kind of
6782: 6633: 5779: 5656: 4726: 4699: 4604: 4219: 4123: 3432: 2800: 2707: 2673: 2349: 1951: 1933: 1878: 1757: 1576: 1508: 1143: 1089:
One thread will successfully lock variable V, while the other thread will be
355: 99: 84:, but has gained broader interest due to the physical constraints preventing 42: 4505: 4066: 3746: 3296: 2367:
Efforts to standardize parallel programming include an open standard called
2205:(ASIC) approaches have been devised for dealing with parallel applications. 918:
1: function Dep(a, b)
2:   c := a * b
3:   d := 3 * c
4: end function
153:, particularly those that use concurrency, are more difficult to write than 5379: 4460: 4142:"Hardware Design of Approximate Matrix Multiplier based on FPGA in Verilog" 4074: 3284: 2953: 2772: 2764: 2543: 2160: 2106: 1687: 1683: 1667: 1659: 1605: 1316: 158: 5152:
Lawrence Livermore National Laboratory: Introduction to Parallel Computing
5127:
was created from a revision of this article dated 21 August 2013
4596: 4498:
Proceedings of the Tenth ACM International Conference on Embedded Software
3979: 3161:(3rd ed.). San Francisco, Calif.: International Thomson. p. 43. 3068: 1855:
is more difficult if they are not. The most common type of cluster is the
6515: 5900: 4996: 4209:(PDF). ARL-SR-154, U.S. Army Research Lab. Retrieved on November 7, 2007. 3864:"Exploiting Superword Level Parallelism with Multimedia Instruction Sets" 3448:
Parallel processing and parallel algorithms : theory and computation
3077: 2849: 2021:, they tend to be applicable to only a few classes of parallel problems. 1737: 1698:, fat hypercube (a hypercube with more than one processor at a node), or 1555: 1495: 1109: 947:
Race conditions, mutual exclusion, synchronization, and parallel slowdown
272: 221: 177: 4562:
Parallel Computers 2: Architecture, Programming and Algorithms, Volume 2
3397: 3354: 298:
To deal with the problem of power consumption and overheating the major
34:
Programming paradigm in which many processes are executed simultaneously
27:"Parallelization" redirects here. For parallelization of manifolds, see 3016:
are expensive. New is power is expensive, but transistors are "free".
3013: 2927: 2172: 2152: 2084: 2060: 1984: 1899: 1601: 224:. This led to the design of parallel hardware and software, as well as 2159:. Nvidia has also released specific products for computation in their 1469: 157:
ones, because concurrency introduces several new classes of potential
5774: 5749: 5166: 4581:"Something old: the Gamma 60 the computer that was ahead of its time" 4377:
Modern processor design : fundamentals of superscalar processors
4279: 3816: 2776: 2741: 2677: 2613: 2483: 2437: 2156: 2064: 1947: 1929: 1651: 1463: 1422: 1352: 244: 205: 57: 46: 5156: 4945:
Divided consciousness: multiple controls in human thought and action
4468: 4442:"A Randomized Parallel Sorting Algorithm with an Experimental Study" 4428:"The Landscape of Parallel Computing Research: A View from Berkeley" 3033:"The Landscape of Parallel Computing Research: A View from Berkeley" 1920: 5824: 5804: 5729: 3357:
Structured Parallel Programming: Patterns for Efficient Computation
2902: 2409: 2368: 2148: 2068: 2052: 1973: 1867: 1325: 1140:, can be improved in some cases by software analysis and redesign. 5965: 4053:
Kirkpatrick, Scott (2003). "COMPUTER SCIENCE: Rough Times Ahead".
2790: 2280:
numbers. They are closely related to Flynn's SIMD classification.
2239: 2055:
languages that attempt to emulate the syntax and semantics of the
115:
and concurrency are two different things: a parallel program uses
5829: 5809: 5784: 5419: 3083:
Computer organization and design: the hardware/software interface
2745: 2645:
systems performing the same operation in parallel. This provides
2457: 2441: 2295: 2072: 2048: 1895: 1736:(called "cores") on the same chip. This processor differs from a 1706: 1348: 522: 389: 280: 181: 1458:
processors. The canonical example of a pipelined processor is a
5799: 5794: 5034: 2784: 2737:
was coined to define the limit of speed-up due to parallelism.
2388: 2353: 2244: 2184: 2180: 2164: 2132: 1955: 1882: 1864: 1365: 1321: 184:
of a single program as a result of parallelization is given by
3773:( ed.). San Francisco: Morgan Kaufmann Publ. p. 15. 2101: 2091:
General-purpose computing on graphics processing units (GPGPU)
5021:
In Search of the Miraculous. Fragments of an Unknown Teaching
4095: 3947:
Webopedia computer dictionary. Retrieved on November 7, 2007.
2749: 2429: 2299: 2290:
do include some vector processing instructions, such as with
2176: 2096: 1732:
A multi-core processor is a processor that includes multiple
1386: 343: 288: 4827:
Beyond the Conscious Mind. Unlocking the Secrets of the Self
4365:
refers to MPI as "the dominant HPC communications interface"
4238: 2356:
are two of the most widely used shared memory APIs, whereas
2025:
Reconfigurable computing with field-programmable gate arrays
570: 5834: 5764: 5754: 4921:
Evolution of Consciousness: The Origins of the Way We Think
4567:
Sketch of the Analytical Engine Invented by Charles Babbage
3156: 2984:"Parallel Computing Research at Illinois: The UPCRC Agenda" 2797:
MIT Computer Science and Artificial Intelligence Laboratory
2453: 2445: 2283: 2190: 2163:. The technology consortium Khronos Group has released the 2140: 2044: 1754: 1459: 1267: 4200:
GPUs: An Emerging Platform for General-Purpose Computation
3861: 3257: 3232: 3131: 1839: 1144:
Fine-grained, coarse-grained, and embarrassing parallelism
5744: 5721: 4627:"Architecture Sketch of Bull Gamma 60 -- Mark Smotherman" 4535:
Sketch of the Analytic Engine Invented by Charles Babbage
4379:(1st ed.). Dubuque, Iowa: McGraw-Hill. p. 561. 2951: 2775:, which was the earliest SIMD parallel-computing effort, 2168: 2136: 1943: 1925: 1746: 1639:(UMA) systems. Typically, that can be achieved only by a 4679:
Vol. 1 #1, pp2-10, British Computer Society, April 1958.
4440:
David R., Helman; David A., Bader; JaJa, Joseph (1998).
4324:
Scoping the Problem of DFM in the Semiconductor Industry
3186:. Upper Saddle River, N.J.: Prentice-Hall. p. 235. 3086:(2. ed., 3rd print. ed.). San Francisco: Kaufmann. 2637:
Parallel computing can also be applied to the design of
2416:
Mainstream parallel programming languages remain either
384:
s speedup is greater by ratio, (5 times versus 2 times).
5157:
Designing and Building Parallel Programs, by Ian Foster
3260:
Parallel Programming: for Multicore and Cluster Systems
3235:
Parallel Programming: for Multicore and Cluster Systems
3184:
Digital integrated circuits : a design perspective
3134:
Parallel Programming: for Multicore and Cluster Systems
2649:
in case one component fails, and also allows automatic
1859:, which is a cluster implemented on multiple identical 1615: 1124:. Barriers are typically implemented using a lock or a 5057: 4885:
The Social Brain. Discovering the Networks of the Mind
4700:"The History of the Development of Parallel Computing" 4356:
Sidney Fernbach Award given to MPI inventor Bill Gropp
2147:
respectively. Other GPU programming languages include
4198:
Boggan, Sha'Kia and Daniel M. Pressel (August 2007).
3726:"Some Computer Organizations and Their Effectiveness" 3355:
Michael McCool; James Reinders; Arch Robison (2013).
3331:
The mythical man month essays on software engineering
2321:
List of concurrent and parallel programming languages
865: 815: 765: 595: 408: 4658:. Department of Computer Science, Clemson University 4439: 4374: 3262:. Springer Science & Business Media. p. 3. 3237:. Springer Science & Business Media. p. 2. 3136:. Springer Science & Business Media. p. 1. 1575:(shared between all processing elements in a single 1515:(which is similar to scoreboarding but makes use of 3622:"What's the opposite of "embarrassingly parallel"?" 1116:Many parallel programs require that their subtasks 1113:neither thread can complete, and deadlock results. 338: 4918: 4882: 4824: 4649: 3209:"Intel Halts Development Of 2 New Microprocessors" 2684:The origins of true (MIMD) parallelism go back to 1989:Berkeley Open Infrastructure for Network Computing 1612:physically distributed across multiple I/O nodes. 900: 850: 800: 641: 500: 3572:"Synchronization internals – the semaphore" 2379:The rise of consumer GPUs has led to support for 2314: 2012: 1873:. Beowulf technology was originally developed by 1604:, this external shared memory system is known as 6780: 2075:based on C++ can also be used for this purpose. 1712: 1375: 951:Subtasks in a parallel program are often called 4855: 3159:Computer architecture / a quantitative approach 3157:Hennessy, John L.; Patterson, David A. (2002). 2791:Biological brain as massively parallel computer 1909: 1499:processors. Superscalar processors differ from 1324:, a parallel supercomputing device that joined 1247:Associative processing (predicated/masked SIMD) 327:design) due to thermal and design constraints. 3980:"List Statistics | TOP500 Supercomputer Sites" 3046:Parallel and Concurrent Programming in Haskell 2474:As a computer system grows in complexity, the 1537: 359:Assume that a task has two independent parts, 5951: 5182: 4652:"An Evaluation of the Design of the Gamma 60" 4449:Journal of Parallel and Distributed Computing 3672:"Why a simple test can get parallel slowdown" 2087:stealers.' Now they call us their partners." 1571:Main memory in a parallel computer is either 901:{\displaystyle O_{i}\cap O_{j}=\varnothing .} 851:{\displaystyle I_{i}\cap O_{j}=\varnothing ,} 801:{\displaystyle I_{j}\cap O_{i}=\varnothing ,} 659:varies linearly with the number of processors 642:{\displaystyle S_{\text{latency}}(s)=1-p+sp.} 49:are designed to heavily exploit parallelism. 4404:Encyclopedia of Parallel Computing, Volume 4 3793:: CS1 maint: multiple names: authors list ( 2813:theory, which views the biological brain as 2463: 2394: 1786: 1104:locks introduces the possibility of program 959:, while others use bigger versions known as 234:was the dominant reason for improvements in 4798:. New York: Simon & Schuster. pp.  4767:. New York: Simon & Schuster. pp.  4719: 4693: 4691: 4689: 4687: 4685: 4515: 4513: 4139: 4052: 2958:. Redwood City, Calif.: Benjamin/Cummings. 2952:Gottlieb, Allan; Almasi, George S. (1989). 2424:, in which a programmer gives the compiler 2383:, either in graphics APIs (referred to as 1566: 5958: 5944: 5189: 5175: 5015: 4375:Shen, John Paul; Mikko H. Lipasti (2004). 4148:. Madurai, India: IEEE. pp. 496–498. 

