These computers had many limited-functionality processors that would work in parallel. For example, each of the 65,536 single-bit processors in a Thinking Machines CM-2 would execute the same instruction at the same time, allowing it, for instance, to logically combine 65,536 pairs of bits at a time, using a hypercube-connected network or processor-dedicated RAM to find its operands. Supercomputing moved away from the SIMD approach when inexpensive scalar
's vector extension uses an alternative approach: instead of exposing the sub-register-level details to the programmer, the instruction set abstracts them out as a few "vector registers" that use the same interfaces across all CPUs with this instruction set. The hardware handles all alignment issues and "strip-mining" of loops. Machines with different vector sizes would be able to run the same code. LLVM calls this vector type "vscale".
to be distinguished from older ones, the newer architectures are then considered "short-vector" architectures, as earlier SIMD and vector supercomputers had vector lengths from 64 to 64,000. A modern supercomputer is almost always a cluster of MIMD computers, each of which implements (short-vector) SIMD instructions.
with embedded SIMD have largely taken over this task from the CPU. Some systems also include permute functions that re-pack elements inside vectors, making them particularly useful for data processing and compression. They are also used in cryptography. The trend of general-purpose computing on GPUs
to target maximal cache optimality, though this technique requires more intermediate state. Note: batch-pipeline systems (for example, GPUs or software rasterization pipelines) are most advantageous for cache control when implemented with SIMD intrinsics, but they are not exclusive to SIMD features.
Different architectures provide different register sizes (e.g. 64, 128, 256 and 512 bits) and instruction sets, meaning that programmers must provide multiple implementations of vectorized code to operate optimally on any given CPU. In addition, the possible set of SIMD instructions grows with each
All of these developments have been oriented toward support for real-time graphics, and are therefore oriented toward processing in two, three, or four dimensions, usually with vector lengths of between two and sixteen words, depending on data type and architecture. When new SIMD architectures need
of an image consists of three values for the brightness of the red (R), green (G) and blue (B) portions of the color. To change the brightness, the R, G and B values are read from memory, a value is added to (or subtracted from) them, and the resulting values are written back out to memory. Audio
Consumer software is typically expected to work on a range of CPUs covering multiple generations, which could limit the programmer's ability to use new SIMD instructions to improve the computational performance of a program. The solution is to include multiple versions of the same code that uses
With a SIMD processor there are two improvements to this process. For one, the data is understood to be in blocks, and a number of values can be loaded all at once. Instead of a series of instructions saying "retrieve this pixel, now retrieve the next pixel", a SIMD processor will have a single
Emscripten, Mozilla's C/C++-to-JavaScript compiler, with extensions can enable compilation of C++ programs that make use of SIMD intrinsics or GCC-style vector code to the SIMD API of JavaScript, resulting in equivalent speedups compared to scalar code. It also supports (and now prefers) the
The current era of SIMD processors grew out of the desktop-computer market rather than the supercomputer market. As desktop processors became powerful enough to support real-time gaming and audio/video processing during the 1990s, demand grew for this particular type of computing power, and
, is a range of techniques and tricks used for performing SIMD in general-purpose registers on hardware that does not provide direct support for SIMD instructions, exploiting parallelism in certain algorithms even on such hardware.
confused matters somewhat, but today the system seems to have settled down (after AMD adopted SSE) and newer compilers should result in more SIMD-enabled software. Intel and AMD now both provide optimized math libraries that use SIMD instructions, and open source alternatives like
Instances of these types are immutable and in optimized code are mapped directly to SIMD registers. Operations expressed in Dart typically are compiled into a single instruction without any overhead. This is similar to C and C++ intrinsics. Benchmarks for
Instead of providing an SIMD datatype, compilers can also be hinted to auto-vectorize some loops, potentially taking some assertions about the lack of data dependency. This is not as flexible as manipulating SIMD variables directly, but is easier to use.
instruction that effectively says "retrieve n pixels" (where n is a number that varies from design to design). For a variety of reasons, this can take much less time than retrieving each pixel individually, as with a traditional CPU design.
software was at first slow, due to a number of problems. One was that many of the early SIMD instruction sets tended to slow overall performance of the system due to the re-use of existing floating point registers. Other systems, like
and controlled by a general purpose CPU) and is geared towards the huge datasets required by 3D and video processing applications. It differs from traditional ISAs by being SIMD from the ground up with no separate scalar registers.
: there are simultaneous (parallel) computations, but each unit performs the exact same instruction at any given moment (just with different data). SIMD is particularly applicable to common tasks such as adjusting the contrast in a
executing its own instruction stream, or as a coprocessor driven by ordinary CPU instructions. 3D graphics applications tend to lend themselves well to SIMD processing as they rely heavily on operations with 4-dimensional vectors.
) guaranteeing the generation of vector code. Intel, AltiVec, and ARM NEON provide extensions widely adopted by the compilers targeting their CPUs. (More complex operations are the task of vector math libraries.)
The SIMD tripling of four 8-bit numbers. The CPU loads the four numbers at once, multiplies them all in one SIMD multiplication, and saves them all at once back to RAM. In theory, the speed can be multiplied by four.
Instruction sets are architecture-specific: some processors lack SIMD instructions entirely, so programmers must provide non-vectorized implementations (or different vectorized implementations) for them.
The ordinary tripling of four 8-bit numbers. The CPU loads one 8-bit number into R1, multiplies it with R2, and then saves the answer from R3 back to RAM. This process is repeated for each number.
As using FMV requires code modification on GCC and Clang, vendors more commonly use library multi-versioning: this is easier to achieve as only compiler switches need to be changed.
Another advantage is that the instruction operates on all loaded data in a single operation. In other words, if the SIMD system works by loading up eight data points at once, the
that works similarly to the GCC extension. LLVM's libcxx seems to implement it. For GCC and libstdc++, a wrapper library that builds on top of the GCC extension is available.
An application that may take advantage of SIMD is one where the same value is being added to (or subtracted from) a large number of data points, a common operation in many
440:; the eight values are processed in parallel even on a non-superscalar processor, and a superscalar processor may be able to perform multiple SIMD operations in parallel.
Small-scale (64 or 128 bits) SIMD became popular on general-purpose CPUs in the early 1990s and continued through 1997 and later with Motion Video Instructions (MVI) for
takes the extensions a step further by abstracting them into a universal interface that can be used on any platform by providing a way of defining SIMD datatypes. The
, therefore assembly language programming is rarely needed. Additionally, many of the systems that would benefit from SIMD were supplied by Apple itself, for example
's CSX600 (2004) has 96 cores each with two double-precision floating point units, while the CSX700 (2008) has 192. Stream Processors is headed by computer architect
; programmers familiar with one particular architecture may not expect this. Worse: the alignment may change from one revision or "compatible" processor to another.
also supports FMV. The setup is similar to GCC and Clang in that the code defines what instruction sets to compile for, but cloning is manually done via inlining.
, offered support for data types that were not interesting to a wide audience and had expensive context-switching instructions to switch between using the
Currently, implementing an algorithm with SIMD instructions usually requires human labor; most compilers do not generate SIMD instructions from a typical
operation being applied to the data will happen to all eight values at the same time. This parallelism is separate from the parallelism provided by a
An order of magnitude increase in code size is not uncommon, when compared to equivalent scalar or equivalent vector code, and an order of magnitude
As of August 2020, the WebAssembly interface remains unfinished, but its portable 128-bit SIMD feature has already seen some use in many engines.
since clang 7 allow for a simplified approach, with the compiler taking care of function duplication and selection. GCC and clang require explicit
instruction set shared a register file with the floating-point stack, which caused inefficiencies when mixing floating-point and MMX code. However,
in the program or a library is duplicated and compiled for many instruction set extensions, and the program decides which one to use at run-time.
FMV, manually coded in assembly language, is quite commonly used in a number of performance-critical libraries such as glibc and libjpeg-turbo.
Further complexity arises when trying to avoid dependences within series of data such as code strings, since independence is required for vectorization.
system. Since then, there have been several extensions to the SIMD instruction sets for both architectures. Advanced Vector Extensions (AVX),
: A portable implementation of platform-specific intrinsics for other platforms (e.g. SSE intrinsics for ARM NEON), using C/C++ headers
now chooses at runtime processor-specific implementations of its own math operations, including the use of SIMD-capable instructions.
is duplicated for many instruction set extensions, and the operating system or the program decides which one to load at run-time.
Conte, G.; Tommesani, S.; Zanichelli, F. (2000). "The long and winding road to high-performance image processing with MMX/SSE".
programming language, bringing the benefits of SIMD to web programs for the first time. The interface consists of two types:
package, available on NuGet, implements SIMD datatypes. Java also has a new proposed API for SIMD instructions available in
1223:. However, by 2017, SIMD.js has been taken out of the ECMAScript standard queue in favor of pursuing a similar interface in
Gathering data into SIMD registers and scattering it to the correct destination locations is tricky (sometimes requiring
, which could operate on a "vector" of data with a single instruction. Vector processing was especially popularized by
, both of which are task time-sharing (time-slicing). SIMT is true simultaneous parallel hardware-level execution.
Larger scale commercial SIMD processors are available from ClearSpeed Technology, Ltd. and Stream Processors, Inc.
In 2013 John McCutchan announced that he had created a high-performance interface to SIMD instruction sets for the
1.1 desktops in 1994 to accelerate MPEG decoding. Sun Microsystems introduced SIMD integer instructions in its "
Specific instructions like rotations or three-operand addition are not available in some SIMD instruction sets.
Article about Optimizing the Rendering Pipeline of Animated Models Using the Intel Streaming SIMD Extensions
ZiiLabs produced a SIMD-type processor for use in mobile devices, such as media players and mobile phones.
It is common for publishers of the SIMD instruction sets to make their own C/C++ language extensions with
in the 1970s and 1980s. Vector processing architectures are now considered separate from SIMD computers:
It has generally proven difficult to find sustainable commercial applications for SIMD-only processors.
labels in the code to "clone" functions, while ICC does so automatically (under the command-line option
17 in an incubator module. It also has a safe fallback mechanism on unsupported CPUs to simple loops.
may not easily benefit from SIMD; however, it is theoretically possible to vectorize comparisons and
new register size. Unfortunately, for legacy support reasons, the older versions cannot be retired.
Clang compiler also implements the feature, with an analogous interface defined in the IR. Rust's
. SIMD can be internal (part of the hardware design) and it can be directly accessible through an
either older or newer SIMD technologies, and pick one that best fits the user's CPU at run-time (
but is still far better than non-predicated SIMD. Detailed comparative examples are given in the
and Intel announced at IDF 2013 that they are implementing McCutchan's specification for both
Not all algorithms can be vectorized easily. For example, a flow-control-heavy task like code
Programming with particular SIMD instruction sets can involve numerous low-level challenges.
offered a rich system and can be programmed using increasingly sophisticated compilers from
Lee, R.B. (1995). "Realtime MPEG video via software decompression on a PA-RISC processor".
would likewise, for volume control, multiply both Left and Right channels simultaneously.
hint. This OpenMP interface has replaced a wide set of nonstandard extensions, including
had somewhat more success, even though they entered the SIMD market later than the rest.
microprocessor vendors turned to SIMD to meet the demand. Hewlett-Packard introduced
. The Xetal has 320 16-bit processor elements especially designed for vision tasks.
719:. SIMD instructions can be found, to one degree or another, on most CPUs, including
Introduction to Parallel Computing from LLNL Lawrence Livermore National Laboratory
as well as AltiVec. Apple was the dominant purchaser of PowerPC chips from IBM and
. Their Storm-1 processor (2007) contains 80 SIMD cores controlled by a MIPS CPU.
was unusual in that one of its vector-float units could function as an autonomous
Proc. Fifth IEEE Int'l Workshop on Computer Architectures for Machine Perception
(ISA), but it should not be confused with an ISA. SIMD describes computers with
Jensen, Peter; Jibaja, Ivan; Hu, Ningxin; Gohman, Dan; McCutchan, John (2015).
applications like conversion between various video standards and frame rates (
visualization show near 400% speedup compared to scalar code written in Dart.
Large register files, which increase power consumption and required chip area.
architecture in 1996. This sparked the introduction of the much more powerful
, showing RSA implemented using a non-SIMD SSE2 integer multiply instruction.
applications. One example would be changing the brightness of an image. Each
Compcon '95. Technologies for the Information Superhighway: digest of papers
980:. However, in 2006, Apple computers moved to Intel x86 processors. Apple's
. The GAPP's recent incarnations have become a powerful tool in real-time
. Compilers also often lacked support, requiring programmers to resort to
SIMD instructions are widely used to process 3D graphics, although modern
effectiveness (work done per instruction) is achievable with Vector ISAs.
that perform the same operation on multiple data points simultaneously.
1879:"Transparent use of library packages optimized for Intel® architecture"
in compilers is an active area of computer science research. (Compare
has incorporated a SIMD processor somewhere in its architecture. The
"Yeppp!": cross-platform, open-source SIMD library from Georgia Tech
1567:"AMD Zen 4 AVX-512 Performance Analysis On The Ryzen 9 7950X Review"
McCutchan's work on Dart, now called SIMD.js, has been adopted by
designs include SIMD instructions to improve the performance of
30:"SIMD" redirects here. For the cryptographic hash function, see
1749:"The JIT finally proposed. JIT and SIMD are getting married"
systems. Intel responded in 1999 by introducing the all-new
The first era of modern SIMD computers was characterized by
, showing how SSE2 is used to implement SHA hash algorithms
1405:
1400:
1377:
1373:
1330:
1259:
1095:
1042:
1015:
997:
993:
947:
814:
798:
770:
763:
759:
645:
511:
393:
385:
351:
289:
170:
542:
as "Associative Processing", more commonly known today as
does not, due to Flynn's work (1966, 1972) pre-dating the
1947:"tc39/ecmascript_simd: SIMD numeric type for EcmaScript"
Patterson, David; Waterman, Andrew (18 September 2017).
The first widely deployed desktop SIMD was with Intel's
microprocessor. MIPS followed suit with their similar
1435:"Some Computer Organizations and Their Effectiveness"
A later processor that used vector processing is the
and taken to the commercial sector by their spin-off
approaches based on commodity processors such as the
Short Vector Extensions in Commercial Microprocessor
Float32x4, 4 single precision floating point values.
SIMD instructions process 512 bits of data at once.
1735:"RyuJIT: The next-generation JIT compiler for .NET"
1285:A more ubiquitous application for SIMD is found in
used in the PlayStation 3, which was developed by
330:became more powerful, and interest in SIMD waned.
59:may be compromised due to out-of-date information
, showing a stream cipher implemented using SSE2
1242:One that has had some measure of success is the
) may lead to wider use of SIMD in the future.
563:Examples of SIMD supercomputers (not including
238:use. SIMD has three different subcategories in
150:Associative processing (predicated/masked SIMD)
269:The first use of SIMD instructions was in the
Subject: up to 1.4x RSA throughput using SSE2
346:" instruction set extensions in 1995, in its
261:(GPUs) are often wide SIMD implementations.
1974:"SIMD in JavaScript via C++ and Emscripten"
Library multi-versioning (LMV): the entire
). There are two main camps of solutions:
837:, developed several SIMD processors named
829:'s instruction set is heavily SIMD based.
392:are developed by Intel. AMD supports AVX,
1991:"Porting SIMD code targeting WebAssembly"
1333:. It uses a number of SIMD processors (a
546:SIMD. This approach is not as compact as
. For the Scottish statistical tool, see
1675:Using the GNU Compiler Collection (GCC)
662:, models 1 and 2 (CM-1 and CM-2), from
615:Geometric-Arithmetic Parallel Processor
2009:"ZiiLABS ZMS-05 ARM 9 Media Processor"
1592:"SIMD Instructions Considered Harmful"
917:had a slow start. The introduction of
821:. The IBM, Sony, Toshiba co-developed
36:Scottish index of multiple deprivation
1843:"Function multi-versioning in GCC 6"
1406:Single Program, Multiple Data (SPMD)
1337:architecture, each with independent
246:. SIMT should not be confused with
1905:"Bringing SIMD to the web via Dart"
WebAssembly 128-bit SIMD proposal.
Function multi-versioning (FMV): a
1053:) uses this interface, and so does
Cracking Open The Pentium 3 (1999)
C++ has an experimental interface
744:Multimedia Acceleration eXtensions
145:Pipelined processing (packed SIMD)
Int32x4, 4 32-bit integer values.
946:have started to appear (see also
538:takes another approach, known in
189:Single instruction, multiple data
1008:designs from Freescale and IBM.
1995:Emscripten 1.40.1 documentation
Salsa20 speed; Salsa20 software
594:ICL Distributed Array Processor
280:of the early 1970s such as the
273:, which was completed in 1966.
1826:"OMP5.1: Loop Transformations"
1801:. 18 July 2012. Archived from
1443:IEEE Transactions on Computers
605:Burroughs Scientific Processor
SIMD may have restrictions on
1646:"SIMD library math functions"
1401:SIMD within a register (SWAR)
664:Thinking Machines Corporation
309:massively parallel processing
1391:Instruction set architecture
1049:crate (and the experimental
885:Adoption of SIMD systems in
642:Massively Parallel Processor
To remedy problems 1 and 5,
209:multiple processing elements
205:instruction set architecture
1953:. Ecma TC39. 22 August 2019
1696:"Clang Language Extensions"
1030:or special datatypes (with
992:) were modified to support
650:Goddard Space Flight Center
400:in their current products.
226:or adjusting the volume of
Type of parallel processing
1535:10.1109/CMPCON.1995.512384
1382:Advanced Vector Extensions
SIMD architectures (2000)
1723:. VcDevel. 6 August 2020.
1366:Streaming SIMD Extensions
1282:, and image enhancement.
1246:, which was developed by
1160:Rust programming language
536:Scalable Vector Extension
493:) and can be inefficient.
259:graphics processing units
1496:10.1109/CAMP.2000.875989
1202:3D vertex transformation
785:'s ARC Video subsystem,
1456:10.1109/TC.1972.5009071
1235:Commercial applications
1144:GNU Compiler Collection
std::experimental::simd
1002:Freescale Semiconductor
470:Automatic vectorization
468:program, for instance.
276:SIMD was the basis for
140:Array processing (SIMT)
1795:"Tutorial pragma simd"
1700:Clang 11 documentation
1289:: nearly every modern
1076:System.Numerics.Vector
1012:SIMD within a register
870:
861:
296:includes them whereas
216:data level parallelism
214:Such machines exploit
185:
1861:"2045-target-feature"
1763:"JEP 338: Vector API"
1276:image noise reduction
1198:matrix multiplication
1110:SIMD multi-versioning
867:
859:
544:"Predicated" (masked)
438:superscalar processor
286:Texas Instruments ASC
278:vector supercomputers
240:Flynn's 1972 Taxonomy
183:
111:Multiple data streams
1929:"SIMD in JavaScript"
1883:Clear Linux* Project
1737:. 30 September 2013.
1529:. pp. 186–192.
1470:"MIMD1 - XP/S, CM-5"
1325:in cooperation with
1032:operator overloading
1022:Programmer interface
32:SIMD (hash function)
1671:"Vector Extensions"
1132:programming library
1028:intrinsic functions
568:
197:parallel processing
2071:2013-06-10 at the
1805:on 4 December 2020
1721:"VcDevel/std-simd"
1433:(September 1972).
1291:video game console
1140:Intel C++ Compiler
871:
862:
660:Connection Machine
562:
491:permute operations
361:extensions to the
338:instructions into
242:, one of which is
186:
132:SIMD subcategories
90:Single data stream
1865:The Rust RFC Book
1777:"SIMD Directives"
1431:Flynn, Michael J.
1280:video compression
1146:since GCC 6, and
1106:, and many more.
1104:#pragma GCC ivdep
986:development tools
908:assembly language
887:personal computer
708:
707:
565:vector processors
552:Vector processing
548:Vector processing
474:vector processing
317:Thinking Machines
294:Duncan's Taxonomy
178:
177:
78:
77:
16:(Redirected from
4837:
4820:Flynn's taxonomy
4786:
4785:
4760:Software lockout
4559:Computer cluster
4494:Vector processor
4449:Array processing
4434:Flynn's taxonomy
4341:Memory coherence
4116:Computer network
4055:
4048:
4041:
4032:
4031:
3990:Processor design
3882:Power management
3764:Instruction unit
3625:Branch predictor
3574:
3573:
3272:System on a chip
3214:
3213:
3054:Transistor count
2978:Flynn's taxonomy
2835:
2834:
2693:
2692:
2496:Addressing modes
2407:
2406:
2353:Memory hierarchy
2217:Hypercomputation
2135:Abstract machine
2112:
2105:
2098:
2089:
2088:
2028:
2027:
2025:
2024:
2015:. Archived from
2005:
1999:
1998:
1987:
1981:
1980:
1978:
1969:
1963:
1962:
1960:
1958:
1943:
1937:
1936:
1925:
1919:
1918:
1916:
1910:. Archived from
1909:
1903:John McCutchan.
1900:
1894:
1893:
1891:
1889:
1875:
1869:
1868:
1857:
1851:
1850:
1839:
1833:
1832:
1830:
1824:Kruse, Michael.
1821:
1815:
1814:
1812:
1810:
1791:
1785:
1784:
1773:
1767:
1766:
1759:
1753:
1752:
1745:
1739:
1738:
1731:
1725:
1724:
1717:
1711:
1710:
1708:
1706:
1692:
1686:
1685:
1683:
1681:
1667:
1661:
1660:
1658:
1656:
1642:
1636:
1630:
1624:
1618:
1612:
1606:
1600:
1599:
1587:
1581:
1580:
1578:
1577:
1571:www.phoronix.com
1563:
1557:
1556:
1522:
1516:
1515:
1483:
1477:
1476:
1474:
1466:
1460:
1459:
1439:
1427:
1396:Flynn's taxonomy
1270:formats, etc.),
1256:video processing
1157:
1153:
1117:dynamic dispatch
1105:
1101:
1093:
1092:#pragma omp simd
1077:
1063:
1052:
1048:
569:
561:
540:Flynn's Taxonomy
435:
298:Flynn's Taxonomy
252:hardware threads
248:software threads
201:Flynn's taxonomy
84:Flynn's taxonomy
80:
79:
73:
70:
64:
57:factual accuracy
49:
48:
41:
21:
4845:
4844:
4840:
4839:
4838:
4836:
4835:
4834:
4800:
4799:
4798:
4793:
4774:
4718:
4624:Coarray Fortran
4580:
4564:Beowulf cluster
4420:
4370:
4361:Synchronization
4346:Cache coherence
4336:Multiprocessing
4324:
4288:
4269:Cost efficiency
4264:Gustafson's law
4232:
4176:
4125:
4101:Multiprocessing
4091:Cloud computing
4064:
4059:
4029:
4024:
4010:Tick–tock model
3968:
3924:
3913:
3853:
3837:Address decoder
3791:
3745:
3741:Program counter
3716:Status register
3697:
3652:
3612:Load–store unit
3579:
3572:
3499:
3468:
3369:
3326:Image processor
3301:
3294:
3264:
3258:
3234:Microcontroller
3224:Embedded system
3212:
3112:
3045:
3034:
2972:
2922:
2820:
2797:
2781:Re-order buffer
2752:
2733:Data dependency
2719:
2678:
2508:
2502:
2401:
2400:Instruction set
2394:
2380:Multiprocessing
2348:Cache hierarchy
2341:Register/memory
2265:
2165:Queue automaton
2121:
2116:
2073:Wayback Machine
2037:
2032:
2031:
2022:
2020:
2007:
2006:
2002:
1989:
1988:
1984:
1976:
1970:
1966:
1956:
1954:
1945:
1944:
1940:
1927:
1926:
1922:
1914:
1907:
1901:
1897:
1887:
1885:
1877:
1876:
1872:
1859:
1858:
1854:
1841:
1840:
1836:
1828:
1822:
1818:
1808:
1806:
1793:
1792:
1788:
1775:
1774:
1770:
1761:
1760:
1756:
1751:. 7 April 2014.
1747:
1746:
1742:
1733:
1732:
1728:
1719:
1718:
1714:
1704:
1702:
1694:
1693:
1689:
1679:
1677:
1669:
1668:
1664:
1654:
1652:
1644:
1643:
1639:
1631:
1627:
1619:
1615:
1607:
1603:
1588:
1584:
1575:
1573:
1565:
1564:
1560:
1545:
1523:
1519:
1484:
1480:
1472:
1468:
1467:
1463:
1437:
1428:
1424:
1419:
1362:
1266:, NTSC to/from
1248:Lockheed Martin
1237:
1175:
1173:SIMD on the web
1112:
1074:in RyuJIT. The
1024:
854:
713:
623:Lockheed Martin
619:Martin Marietta
560:
446:
433:
410:
267:
195:) is a type of
74:
68:
65:
62:
54:This article's
50:
46:
39:
28:
23:
22:
15:
12:
11:
5:
4843:
4833:
4832:
4830:SIMD computing
4827:
4822:
4817:
4812:
4795:
4794:
4792:
4791:
4779:
4776:
4775:
4773:
4772:
4767:
4762:
4757:
4755:Race condition
4752:
4747:
4742:
4737:
4732:
4726:
4724:
4720:
4719:
4717:
4716:
4711:
4706:
4701:
4696:
4691:
4686:
4681:
4676:
4671:
4666:
4661:
4656:
4651:
4646:
4641:
4636:
4631:
4626:
4621:
4616:
4611:
4606:
4601:
4596:
4590:
4588:
4582:
4581:
4579:
4578:
4573:
4568:
4567:
4566:
4556:
4550:
4549:
4548:
4543:
4538:
4533:
4528:
4523:
4513:
4512:
4511:
4506:
4499:Multiprocessor
4496:
4491:
4486:
4481:
4476:
4475:
4474:
4469:
4464:
4463:
4462:
4457:
4452:
4441:
4430:
4428:
4422:
4421:
4419:
4418:
4413:
4412:
4411:
4406:
4401:
4391:
4386:
4380:
4378:
4372:
4371:
4369:
4368:
4363:
4358:
4353:
4348:
4343:
4338:
4332:
4330:
4326:
4325:
4323:
4322:
4317:
4312:
4307:
4302:
4296:
4294:
4290:
4289:
4287:
4286:
4281:
4276:
4271:
4266:
4261:
4256:
4251:
4246:
4240:
4238:
4234:
4233:
4231:
4230:
4228:Hardware scout
4225:
4219:
4214:
4209:
4203:
4198:
4192:
4186:
4184:
4182:Multithreading
4178:
4177:
4175:
4174:
4169:
4164:
4159:
4154:
4149:
4144:
4139:
4133:
4131:
4127:
4126:
4124:
4123:
4121:Systolic array
4118:
4113:
4108:
4103:
4098:
4093:
4088:
4083:
4078:
4072:
4070:
4066:
4065:
4058:
4057:
4050:
4043:
4035:
4026:
4025:
4023:
4022:
4017:
4015:Pin grid array
4012:
4007:
4002:
3997:
3992:
3987:
3982:
3976:
3974:
3970:
3969:
3967:
3966:
3960:
3955:
3950:
3945:
3940:
3935:
3929:
3927:
3919:
3918:
3915:
3914:
3912:
3911:
3906:
3901:
3896:
3891:
3886:
3885:
3884:
3879:
3874:
3863:
3861:
3855:
3854:
3852:
3851:
3849:Barrel shifter
3846:
3845:
3844:
3839:
3832:Binary decoder
3829:
3828:
3827:
3817:
3812:
3807:
3801:
3799:
3793:
3792:
3790:
3789:
3784:
3776:
3771:
3766:
3761:
3755:
3753:
3747:
3746:
3744:
3743:
3738:
3733:
3728:
3723:
3721:Stack register
3718:
3713:
3707:
3705:
3699:
3698:
3696:
3695:
3694:
3693:
3688:
3678:
3673:
3668:
3662:
3660:
3654:
3653:
3651:
3650:
3645:
3644:
3643:
3632:
3627:
3622:
3621:
3620:
3614:
3603:
3597:
3591:
3584:
3582:
3571:
3570:
3565:
3560:
3555:
3550:
3549:
3548:
3543:
3538:
3533:
3528:
3523:
3513:
3507:
3505:
3501:
3500:
3498:
3497:
3492:
3487:
3482:
3476:
3474:
3470:
3469:
3467:
3466:
3465:
3464:
3454:
3449:
3444:
3439:
3434:
3429:
3424:
3419:
3414:
3409:
3404:
3399:
3394:
3389:
3383:
3381:
3375:
3374:
3371:
3370:
3368:
3367:
3362:
3357:
3352:
3346:
3340:
3334:
3328:
3323:
3317:
3315:AI accelerator
3312:
3306:
3304:
3296:
3295:
3293:
3292:
3286:
3281:
3278:Multiprocessor
3275:
3268:
3266:
3260:
3259:
3257:
3256:
3251:
3246:
3241:
3236:
3231:
3229:Microprocessor
3226:
3220:
3218:
3217:By application
3211:
3210:
3204:
3198:
3192:
3187:
3182:
3177:
3172:
3167:
3162:
3160:Tile processor
3157:
3152:
3147:
3142:
3141:
3140:
3129:
3122:
3120:
3114:
3113:
3111:
3110:
3105:
3100:
3094:
3088:
3082:
3076:
3070:
3069:
3068:
3056:
3050:
3048:
3040:
3039:
3036:
3035:
3033:
3032:
3031:
3030:
3020:
3015:
3014:
3013:
3008:
3003:
2998:
2988:
2982:
2980:
2974:
2973:
2971:
2970:
2965:
2960:
2955:
2954:
2953:
2948:
2946:Hyperthreading
2938:
2932:
2930:
2928:Multithreading
2924:
2923:
2921:
2920:
2915:
2910:
2909:
2908:
2898:
2897:
2896:
2891:
2881:
2880:
2879:
2874:
2864:
2859:
2858:
2857:
2852:
2841:
2839:
2832:
2826:
2825:
2822:
2821:
2819:
2818:
2813:
2807:
2805:
2799:
2798:
2796:
2795:
2790:
2785:
2784:
2783:
2778:
2768:
2762:
2760:
2754:
2753:
2751:
2750:
2745:
2740:
2735:
2729:
2727:
2721:
2720:
2718:
2717:
2712:
2707:
2705:Pipeline stall
2701:
2699:
2690:
2684:
2683:
2680:
2679:
2677:
2676:
2671:
2666:
2661:
2658:
2657:
2656:
2654:z/Architecture
2651:
2646:
2641:
2633:
2628:
2623:
2618:
2613:
2608:
2603:
2598:
2593:
2588:
2583:
2578:
2573:
2572:
2571:
2566:
2561:
2553:
2548:
2543:
2538:
2533:
2528:
2523:
2518:
2512:
2510:
2504:
2503:
2501:
2500:
2499:
2498:
2488:
2483:
2478:
2473:
2468:
2463:
2458:
2457:
2456:
2446:
2445:
2444:
2434:
2429:
2424:
2419:
2413:
2411:
2404:
2396:
2395:
2393:
2392:
2387:
2382:
2377:
2372:
2367:
2366:
2365:
2360:
2358:Virtual memory
2350:
2345:
2344:
2343:
2338:
2333:
2328:
2318:
2313:
2308:
2303:
2298:
2297:
2296:
2286:
2281:
2275:
2273:
2267:
2266:
2264:
2263:
2262:
2261:
2256:
2251:
2246:
2236:
2231:
2226:
2225:
2224:
2219:
2214:
2209:
2204:
2199:
2194:
2189:
2182:Turing machine
2179:
2178:
2177:
2172:
2167:
2162:
2157:
2152:
2142:
2137:
2131:
2129:
2123:
2122:
2115:
2114:
2107:
2100:
2092:
2086:
2085:
2075:
2063:
2058:
2053:
2048:
2043:
2036:
2035:External links
2033:
2030:
2029:
2000:
1982:
1964:
1938:
1920:
1917:on 2013-12-03.
1895:
1870:
1852:
1834:
1816:
1786:
1781:www.openmp.org
1768:
1754:
1740:
1726:
1712:
1687:
1662:
1650:Stack Overflow
1637:
1625:
1613:
1609:RE: SSE2 speed
1601:
1582:
1558:
1543:
1517:
1478:
1461:
1450:(9): 948–960.
1421:
1420:
1418:
1415:
1414:
1413:
1408:
1403:
1398:
1393:
1388:
1361:
1358:
1319:Cell Processor
1236:
1233:
1206:Mandelbrot set
1190:
1189:
1186:
1174:
1171:
1136:
1135:
1128:
1111:
1108:
1070:added SIMD to
1039:GNU C Compiler
1023:
1020:
954:Apple Computer
875:graphics cards
853:
850:
823:Cell Processor
752:MMX and iwMMXt
712:
709:
706:
705:
696:
692:
691:
682:
678:
677:
676:MP-1 and MP-2
671:
667:
666:
657:
653:
652:
639:
635:
634:
621:(continued at
612:
608:
607:
602:
598:
597:
591:
587:
586:
581:
577:
576:
573:
559:
556:
518:
517:
516:
515:
514:corrects this.
504:
500:
497:
494:
487:
484:data alignment
477:
462:
459:
445:
442:
409:
406:
369:system in the
313:supercomputers
266:
263:
230:. Most modern
176:
175:
174:
173:
168:
160:
159:
155:
154:
153:
152:
147:
142:
134:
133:
129:
128:
127:
126:
121:
113:
112:
108:
107:
106:
105:
100:
92:
91:
87:
86:
76:
75:
53:
51:
44:
26:
9:
6:
4:
3:
2:
4842:
4831:
4828:
4826:
4823:
4821:
4818:
4816:
4813:
4811:
4808:
4807:
4805:
4790:
4781:
4780:
4777:
4771:
4768:
4766:
4763:
4761:
4758:
4756:
4753:
4751:
4748:
4746:
4743:
4741:
4738:
4736:
4733:
4731:
4728:
4727:
4725:
4721:
4715:
4712:
4710:
4707:
4705:
4702:
4700:
4697:
4695:
4692:
4690:
4687:
4685:
4682:
4680:
4677:
4675:
4672:
4670:
4667:
4665:
4662:
4660:
4657:
4655:
4652:
4650:
4647:
4645:
4644:Global Arrays
4642:
4640:
4637:
4635:
4632:
4630:
4627:
4625:
4622:
4620:
4617:
4615:
4612:
4610:
4607:
4605:
4602:
4600:
4597:
4595:
4592:
4591:
4589:
4587:
4583:
4577:
4574:
4572:
4571:Grid computer
4569:
4565:
4562:
4561:
4560:
4557:
4554:
4551:
4547:
4544:
4542:
4539:
4537:
4534:
4532:
4529:
4527:
4524:
4522:
4519:
4518:
4517:
4514:
4510:
4507:
4505:
4502:
4501:
4500:
4497:
4495:
4492:
4490:
4487:
4485:
4482:
4480:
4477:
4473:
4470:
4468:
4465:
4461:
4458:
4456:
4453:
4450:
4447:
4446:
4445:
4442:
4440:
4437:
4436:
4435:
4432:
4431:
4429:
4427:
4423:
4417:
4414:
4410:
4407:
4405:
4402:
4400:
4397:
4396:
4395:
4392:
4390:
4387:
4385:
4382:
4381:
4379:
4377:
4373:
4367:
4364:
4362:
4359:
4357:
4354:
4352:
4349:
4347:
4344:
4342:
4339:
4337:
4334:
4333:
4331:
4327:
4321:
4318:
4316:
4313:
4311:
4308:
4306:
4303:
4301:
4298:
4297:
4295:
4291:
4285:
4282:
4280:
4277:
4275:
4272:
4270:
4267:
4265:
4262:
4260:
4257:
4255:
4252:
4250:
4247:
4245:
4242:
4241:
4239:
4235:
4229:
4226:
4223:
4220:
4218:
4215:
4213:
4210:
4207:
4204:
4202:
4199:
4196:
4193:
4191:
4188:
4187:
4185:
4183:
4179:
4173:
4170:
4168:
4165:
4163:
4160:
4158:
4155:
4153:
4150:
4148:
4145:
4143:
4140:
4138:
4135:
4134:
4132:
4128:
4122:
4119:
4117:
4114:
4112:
4109:
4107:
4104:
4102:
4099:
4097:
4094:
4092:
4089:
4087:
4084:
4082:
4079:
4077:
4074:
4073:
4071:
4067:
4063:
4056:
4051:
4049:
4044:
4042:
4037:
4036:
4033:
4021:
4018:
4016:
4013:
4011:
4008:
4006:
4003:
4001:
3998:
3996:
3993:
3991:
3988:
3986:
3983:
3981:
3978:
3977:
3975:
3971:
3964:
3961:
3959:
3956:
3954:
3951:
3949:
3946:
3944:
3941:
3939:
3936:
3934:
3931:
3930:
3928:
3926:
3920:
3910:
3907:
3905:
3902:
3900:
3897:
3895:
3892:
3890:
3887:
3883:
3880:
3878:
3875:
3873:
3870:
3869:
3868:
3865:
3864:
3862:
3860:
3856:
3850:
3847:
3843:
3840:
3838:
3835:
3834:
3833:
3830:
3826:
3823:
3822:
3821:
3818:
3816:
3813:
3811:
3810:Demultiplexer
3808:
3806:
3803:
3802:
3800:
3798:
3794:
3788:
3785:
3783:
3780:
3777:
3775:
3772:
3770:
3767:
3765:
3762:
3760:
3757:
3756:
3754:
3752:
3748:
3742:
3739:
3737:
3734:
3732:
3731:Memory buffer
3729:
3727:
3726:Register file
3724:
3722:
3719:
3717:
3714:
3712:
3709:
3708:
3706:
3704:
3700:
3692:
3689:
3687:
3684:
3683:
3682:
3679:
3677:
3674:
3672:
3669:
3667:
3666:Combinational
3664:
3663:
3661:
3659:
3655:
3649:
3646:
3642:
3639:
3638:
3636:
3633:
3631:
3628:
3626:
3623:
3618:
3615:
3613:
3610:
3609:
3607:
3604:
3601:
3598:
3595:
3592:
3589:
3586:
3585:
3583:
3581:
3575:
3569:
3566:
3564:
3561:
3559:
3556:
3554:
3551:
3547:
3544:
3542:
3539:
3537:
3534:
3532:
3529:
3527:
3524:
3522:
3519:
3518:
3517:
3514:
3512:
3509:
3508:
3506:
3502:
3496:
3493:
3491:
3488:
3486:
3483:
3481:
3478:
3477:
3475:
3471:
3463:
3460:
3459:
3458:
3455:
3453:
3450:
3448:
3445:
3443:
3440:
3438:
3435:
3433:
3430:
3428:
3425:
3423:
3420:
3418:
3415:
3413:
3410:
3408:
3405:
3403:
3400:
3398:
3395:
3393:
3390:
3388:
3385:
3384:
3382:
3380:
3376:
3366:
3363:
3361:
3358:
3356:
3353:
3350:
3347:
3344:
3341:
3338:
3335:
3332:
3329:
3327:
3324:
3321:
3318:
3316:
3313:
3311:
3308:
3307:
3305:
3303:
3297:
3290:
3287:
3285:
3282:
3279:
3276:
3273:
3270:
3269:
3267:
3261:
3255:
3252:
3250:
3247:
3245:
3242:
3240:
3237:
3235:
3232:
3230:
3227:
3225:
3222:
3221:
3219:
3215:
3208:
3205:
3202:
3199:
3196:
3193:
3191:
3188:
3186:
3183:
3181:
3178:
3176:
3173:
3171:
3168:
3166:
3163:
3161:
3158:
3156:
3153:
3151:
3148:
3146:
3143:
3139:
3136:
3135:
3133:
3130:
3127:
3124:
3123:
3121:
3119:
3115:
3109:
3106:
3104:
3101:
3098:
3095:
3092:
3089:
3086:
3083:
3080:
3077:
3074:
3071:
3066:
3063:
3062:
3060:
3057:
3055:
3052:
3051:
3049:
3047:
3041:
3029:
3026:
3025:
3024:
3021:
3019:
3016:
3012:
3009:
3007:
3004:
3002:
2999:
2997:
2994:
2993:
2992:
2989:
2987:
2984:
2983:
2981:
2979:
2975:
2969:
2966:
2964:
2961:
2959:
2956:
2952:
2949:
2947:
2944:
2943:
2942:
2939:
2937:
2934:
2933:
2931:
2929:
2925:
2919:
2916:
2914:
2911:
2907:
2904:
2903:
2902:
2899:
2895:
2892:
2890:
2887:
2886:
2885:
2882:
2878:
2875:
2873:
2870:
2869:
2868:
2865:
2863:
2860:
2856:
2853:
2851:
2848:
2847:
2846:
2843:
2842:
2840:
2836:
2833:
2831:
2827:
2817:
2814:
2812:
2809:
2808:
2806:
2804:
2800:
2794:
2791:
2789:
2786:
2782:
2779:
2777:
2774:
2773:
2772:
2769:
2767:
2766:Scoreboarding
2764:
2763:
2761:
2759:
2755:
2749:
2748:False sharing
2746:
2744:
2741:
2739:
2736:
2734:
2731:
2730:
2728:
2726:
2722:
2716:
2713:
2711:
2708:
2706:
2703:
2702:
2700:
2698:
2694:
2691:
2689:
2685:
2675:
2672:
2670:
2667:
2665:
2662:
2659:
2655:
2652:
2650:
2647:
2645:
2642:
2640:
2637:
2636:
2634:
2632:
2629:
2627:
2624:
2622:
2619:
2617:
2614:
2612:
2609:
2607:
2604:
2602:
2599:
2597:
2594:
2592:
2589:
2587:
2584:
2582:
2579:
2577:
2574:
2570:
2567:
2565:
2562:
2560:
2557:
2556:
2554:
2552:
2549:
2547:
2544:
2542:
2541:Stanford MIPS
2539:
2537:
2534:
2532:
2529:
2527:
2524:
2522:
2519:
2517:
2514:
2513:
2511:
2505:
2497:
2494:
2493:
2492:
2489:
2487:
2484:
2482:
2479:
2477:
2474:
2472:
2469:
2467:
2464:
2462:
2459:
2455:
2452:
2451:
2450:
2447:
2443:
2440:
2439:
2438:
2435:
2433:
2430:
2428:
2425:
2423:
2420:
2418:
2415:
2414:
2412:
2408:
2405:
2403:
2402:architectures
2397:
2391:
2388:
2386:
2383:
2381:
2378:
2376:
2373:
2371:
2370:Heterogeneous
2368:
2364:
2361:
2359:
2356:
2355:
2354:
2351:
2349:
2346:
2342:
2339:
2337:
2334:
2332:
2329:
2327:
2324:
2323:
2322:
2321:Memory access
2319:
2317:
2314:
2312:
2309:
2307:
2304:
2302:
2299:
2295:
2292:
2291:
2290:
2287:
2285:
2282:
2280:
2277:
2276:
2274:
2272:
2268:
2260:
2257:
2255:
2254:Random-access
2252:
2250:
2247:
2245:
2242:
2241:
2240:
2237:
2235:
2234:Stack machine
2232:
2230:
2227:
2223:
2220:
2218:
2215:
2213:
2210:
2208:
2205:
2203:
2200:
2198:
2195:
2193:
2190:
2188:
2185:
2184:
2183:
2180:
2176:
2173:
2171:
2168:
2166:
2163:
2161:
2158:
2156:
2153:
2151:
2150:with datapath
2148:
2147:
2146:
2143:
2141:
2138:
2136:
2133:
2132:
2130:
2128:
2124:
2120:
2113:
2108:
2106:
2101:
2099:
2094:
2093:
2090:
2083:
2079:
2076:
2074:
2070:
2067:
2064:
2062:
2059:
2057:
2054:
2052:
2049:
2047:
2044:
2042:
2039:
2038:
2019:on 2011-07-18
2018:
2014:
2010:
2004:
1996:
1992:
1986:
1975:
1968:
1952:
1948:
1942:
1935:. 8 May 2014.
1934:
1930:
1924:
1913:
1906:
1899:
1884:
1880:
1874:
1866:
1862:
1856:
1848:
1844:
1838:
1827:
1820:
1804:
1800:
1796:
1790:
1782:
1778:
1772:
1764:
1758:
1750:
1744:
1736:
1730:
1722:
1716:
1701:
1697:
1691:
1676:
1672:
1666:
1651:
1647:
1641:
1634:
1629:
1622:
1617:
1610:
1605:
1597:
1593:
1586:
1572:
1568:
1562:
1554:
1550:
1546:
1544:0-8186-7029-0
1540:
1536:
1532:
1528:
1521:
1513:
1509:
1505:
1504:11381/2297671
1501:
1497:
1493:
1489:
1482:
1471:
1465:
1457:
1453:
1449:
1445:
1444:
1436:
1432:
1426:
1422:
1412:
1409:
1407:
1404:
1402:
1399:
1397:
1394:
1392:
1389:
1387:
1383:
1379:
1375:
1371:
1367:
1364:
1363:
1357:
1355:
1351:
1346:
1343:
1340:
1336:
1332:
1328:
1324:
1320:
1315:
1313:
1309:
1304:
1300:
1299:PlayStation 2
1296:
1292:
1288:
1283:
1281:
1277:
1273:
1272:deinterlacing
1269:
1265:
1261:
1257:
1253:
1249:
1245:
1240:
1232:
1228:
1226:
1222:
1218:
1214:
1209:
1207:
1203:
1199:
1196:
1187:
1184:
1183:
1182:
1180:
1170:
1168:
1163:
1161:
1152:target_clones
1149:
1145:
1141:
1133:
1129:
1126:
1122:
1121:
1120:
1118:
1107:
SIMD on the web

In 2013, John McCutchan announced a high-performance interface to SIMD instruction sets for the Dart programming language, bringing the benefits of SIMD to web programs for the first time. The interface consists of immutable types that, in optimized code, map directly to SIMD registers, so operations expressed in Dart typically compile to a single instruction with no overhead, much like C and C++ intrinsics. Benchmarks for 4×4 matrix multiplication, 3D vertex transformation, and Mandelbrot set visualization showed near 400% speedup over scalar Dart code. McCutchan's work was adopted by ECMAScript, and Intel announced an implementation of the specification ("SIMD in JavaScript") for both V8 and SpiderMonkey; the effort was later superseded by the portable 128-bit SIMD support in WebAssembly.

Commercial applications

One SIMD-only processor that found sustained commercial use is the GAPP, developed by Lockheed Martin and taken to the commercial sector by its spin-off Teranex. Its recent incarnations have become a powerful tool in real-time video processing applications such as conversion between video standards and frame rates (NTSC to/from PAL, NTSC to/from HDTV formats, etc.), deinterlacing, image noise reduction, adaptive video compression, and image enhancement.

A more ubiquitous application for SIMD is found in video games: nearly every modern video game console since 1998 has incorporated a SIMD processor somewhere in its architecture. The PlayStation 2 was unusual in that one of its vector-float units could function as an autonomous DSP executing its own instruction stream, or as a coprocessor driven by ordinary CPU instructions. Microsoft's Direct3D 9.0 chooses at run time among processor-specific implementations of its own math operations, including SIMD-capable instructions.

The Cell Processor, developed by IBM in cooperation with Toshiba and Sony, uses a number of SIMD processors in a NUMA architecture, each with its own local store and controlled by a general-purpose CPU, and is geared towards the huge datasets required by 3D and video processing applications. Larger-scale commercial SIMD processors have been produced by ClearSpeed and by Stream Processors, Inc., the latter headed by computer-architecture pioneer Bill Dally.
See also

Streaming SIMD Extensions, MMX, SSE2, SSE3, AVX-512
Flynn's taxonomy
SIMD within a register (SWAR)
Single program, multiple data (SPMD)
OpenCL