and an accumulator register that stores the result. The output of the register is fed back to one input of the adder, so that on each clock cycle, the output of the multiplier is added to the register. Combinational multipliers require a large amount of logic, but can compute a product much more
using a succession of multiply and add steps. The instruction descriptions do not specify whether the multiply and add are performed using a single FMA step. This instruction has been part of the VAX instruction set since its original 11/780 implementation in 1977.
standard math library function and the automatic transformation of a multiplication followed by an addition (contraction of floating-point expressions), which can be explicitly enabled or disabled with standard pragmas
C compilers do such transformations by default for processor architectures that support FMA instructions. With GCC, which does not support the aforementioned pragma, this can be globally controlled by the
was the first to conceive a MAC in his Analytical Machine of 1909, and the first to exploit a MAC for division (using multiplication seeded by reciprocal, via the convergent series
.) Therefore, it makes a difference to the result whether the multiply–add is performed with two roundings, or in one operation with a single rounding (a fused multiply–add).
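The difference between two roundings and one can be made concrete with a small sketch. The helper `emulated_fma` is not a library function; it is a hypothetical stand-in that assumes IEEE 754 binary64 floats and uses exact rational arithmetic so that only one rounding occurs. The inputs are chosen purely for illustration: their exact product, 1 − 2⁻⁶⁰, does not fit in 53 significant bits.

```python
from fractions import Fraction


def emulated_fma(a: float, b: float, c: float) -> float:
    """Return a*b + c rounded once (hypothetical helper, not a hardware FMA).

    Fraction represents each float input exactly, so the only rounding
    happens in the final conversion back to float.
    """
    return float(Fraction(a) * Fraction(b) + Fraction(c))


# Illustrative inputs: the exact product a*b is 1 - 2**-60.
a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

unfused = a * b + c            # the product rounds to 1.0 first, then the add gives 0.0
fused = emulated_fma(a, b, c)  # the low-order bits survive: -2**-60

print(unfused, fused)
```

With these inputs the unfused form loses the whole result to the intermediate rounding, while the single-rounding form returns the exact value.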
, an FMA can be faster than a multiply operation followed by an add. However, standard industrial implementations based on the original IBM RS/6000 design require a 2
due to the first multiplication discarding low-significance bits. This could lead to an error if, for instance, the square root of the result is then evaluated.
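The pitfall described here can be reproduced with a short sketch. As an assumption for illustration, `emulated_fma` is a hypothetical helper that mimics a single-rounding multiply–add on IEEE 754 binary64 floats through exact rational arithmetic; the value of `x` is picked so that its square is not exactly representable.

```python
import math
from fractions import Fraction


def emulated_fma(a: float, b: float, c: float) -> float:
    """Return a*b + c rounded once (hypothetical helper, not a hardware FMA)."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


x = y = 1.0 + 2.0**-30  # illustrative value; x*x needs more than 53 bits

plain = x * x - y * y            # both products round the same way: 0.0
xx = x * x                       # rounded product drops the 2**-60 term
mixed = emulated_fma(-y, y, xx)  # xx - y*y with y*y kept exact: -2**-60

# A later square root then fails even though x == y.
try:
    math.sqrt(mixed)
    sqrt_failed = False
except ValueError:
    sqrt_failed = True

print(plain, mixed, sqrt_failed)
```

Only the second product is fused, so the rounding error of the first product is exposed instead of cancelled.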
Some machines combine multiple fused multiply add operations into a single step, e.g. performing a four-element dot-product on two 128-bit
); the operation itself is also often called a MAC or a MAD operation. The MAC operation modifies an accumulator
A fast FMA can speed up and improve the accuracy of many computations that involve the accumulation of products:
Another benefit of including this instruction is that it allows an efficient software implementation of
(following Kahan's suggested notation in which redundant parentheses direct the compiler to round the
) operation is a common step that computes the product of two numbers and adds that product to an
), with a single rounding. That is, where an unfused multiply–add would compute the product
specifies that it must be performed with one rounding, yielding a more accurate result.
Modern computers may contain a dedicated MAC, consisting of a multiplier implemented in
) operations, thus eliminating the need for dedicated hardware for those operations.
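As a rough sketch of how a software routine can build on the instruction, the following refines a reciprocal estimate with Newton–Raphson steps in which the residual is formed by a single-rounding multiply–add. The names `emulated_fma` and `reciprocal`, the seed value, and the step count are assumptions made for illustration; a production division routine would also need a final correction step to guarantee a correctly rounded quotient, which is not shown.

```python
from fractions import Fraction


def emulated_fma(a: float, b: float, c: float) -> float:
    """Return a*b + c rounded once (hypothetical helper, not a hardware FMA)."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def reciprocal(d: float, seed: float, steps: int = 6) -> float:
    """Refine an initial estimate of 1/d using only multiply-add operations."""
    r = seed
    for _ in range(steps):
        e = emulated_fma(-d, r, 1.0)  # residual 1 - d*r, rounded once
        r = emulated_fma(r, e, r)     # corrected estimate r + r*e
    return r


approx = reciprocal(3.0, seed=0.3)
print(approx)
```

Because the residual is computed with one rounding, it stays meaningful even when `d * r` is extremely close to 1, which is the situation in the last iterations.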
), or with a single rounding. When performed with a single rounding, it is called a
Fused multiply–add can usually be relied on to give more accurate results. However,
The fused multiply–add operation was introduced as "multiply–add fused" in the IBM
term first) using fused multiply–add, then the result may be negative even when
(1990) processor, but has been added to numerous other processors since then:
significant bits, a fused multiply–add would compute the entire expression
, but the technique is now also common in general-purpose processors.
) is a floating-point multiply–add operation performed in one step (
has pointed out that it can give problems if used unthinkingly. If
121:
). The first modern processors to be equipped with MAC units were
to its full precision before rounding the final result down to
58:. The hardware unit that performs the operation is known as a
204:. That is, digital floating-point arithmetic is generally not
instruction is used for evaluating polynomials with
315:
for evaluating functions (from the inverse function)
1001:
200:numbers have only a certain amount of mathematical
1141:"Bug 20785 - Pragma STDC * (C99 FP) unimplemented"
214:Floating-point arithmetic § Accuracy problems
691:has "Four-operand FMA with Prefix Instruction".
179:
27:Operation common in numerical signal processing
188:, the operation is typically exact (computed
639:ARM processors with VFPv4 and/or NEONv2:
1064:
401:-bit adder to compute the sum properly.
120:numbers, it might be performed with two
479:supports the FMA operation through the
258:significant bits, add the result to
a ← a + (b × c)
648:STM32 Cortex-M33 (VFMA operation)
444:The FMA operation is included in
418:methods of computing square roots
1207:
585:FMA3 and/or FMA4 instruction set
436:with single cycle throughput.
160:typical of earlier computers.
453:Digital Equipment Corporation
434:a0×b0 + a1×b1 + a2×b2 + a3×b3
158:method of shifting and adding
180:In floating-point arithmetic
393:When implemented inside a
323:artificial neural networks
174:digital signal processors
138:fused multiply–accumulate
36:digital signal processing
1682:Video display controller
1335:Graphics processing unit
1240:. AMD Developer Central.
486:#pragma STDC FP_CONTRACT
329:double-double arithmetic
717:TeraScale 2 "Evergreen"
707:GPUs and GPGPU boards:
424:Dot product instruction
1797:Shared graphics memory
801:instruction set (2010)
477:C programming language
108:
60:multiplier–accumulator
1983:Hardware acceleration
1687:Video processing unit
792:NEC SX-Aurora TSUBASA
600:(2012, FMA3 and FMA4)
501:command line option.
303:Polynomial evaluation
298:Matrix multiplication
109:
1908:Performance per watt
1677:Texture mapping unit
1626:Unified shader model
781:ARM Mali T600 Series
583:x86 processors with
262:, and round back to
74:
2043:Computer arithmetic
1850:Integrated graphics
1202:10.1147/rd.341.0059
875:10.3390/app10249052
788:Vector Processors:
149:combinational logic
40:multiply–accumulate
18:Multiply–accumulate
2000:Parallel computing
1876:Display resolution
1657:Render output unit
1647:Geometry processor
984:. 20 November 2019
723:Graphics Core Next
410:division algorithm
327:Multiplication in
285:significant bits.
230:fused multiply–add
224:Fused multiply–add
130:fused multiply–add
812:Compound operator
768:Intel GPUs since
733:(2010) and newer
713:(2009) and newer
634:(2015, FMA3 only)
627:(2013, FMA3 only)
621:(2017, FMA3 only)
593:(2011, FMA4 only)
156:quickly than the
124:(typical in many
783:(2012) and above
563:(2007) and above
518:(1996) and above
348:is evaluated as
313:Newton's method
267:
245:
242:fused operation
226:
184:When done with
182:
165:
151:followed by an
116:When done with
702:z/Architecture
670:Qualcomm Krait
667:
664:ARM Cortex-A15
661:
655:
649:
646:
643:ARM Cortex-M4F
637:
636:
635:
628:
622:
615:
608:
601:
594:
581:
575:
564:
555:
548:
541:
538:Emotion Engine
528:
519:
441:
438:
425:
422:
395:microprocessor
332:
331:
325:
316:
310:
300:
295:
254:, round it to
225:
222:
198:floating-point
181:
178:
118:floating-point
763:-based (2017)
762:
759:
757:-based (2016)
756:
753:
751:-based (2014)
750:
747:
745:-based (2012)
744:
741:
739:-based (2010)
738:
735:
734:
732:
729:
724:
721:
719:-series based
718:
715:
714:
712:
709:
708:
706:
703:
700:
697:
690:
689:Fujitsu A64FX
687:
686:
684:
680:
677:
674:
671:
668:
665:
662:
659:
658:ARM Cortex-A7
656:
653:
652:ARM Cortex-A5
650:
647:
644:
641:
640:
638:
633:
629:
626:
625:Intel Haswell
623:
620:
616:
613:
609:
606:
602:
599:
595:
592:
588:
587:
586:
582:
579:
576:
573:
570:-compatible)
569:
565:
562:
559:
556:
553:
549:
546:
542:
539:
536:
532:
529:
526:
523:
520:
517:
514:
511:
510:
509:
507:
502:
499:-ffp-contract
495:
491:
478:
474:
473:1999 standard
469:
466:
465:Horner's rule
458:
454:
449:
447:
446:IEEE 754-2008
437:
431:
421:
419:
415:
411:
407:
402:
400:
396:
391:
388:
384:
377:
373:
365:
361:
357:
353:
346:
342:
337:
336:William Kahan
330:
326:
324:
320:
317:
314:
311:
308:
307:Horner's rule
304:
301:
299:
296:
294:
291:
290:
289:
286:
284:
278:
274:
270:
265:
261:
257:
252:
248:
243:
239:
235:
231:
221:
219:
218:IEEE 754-2008
215:
211:
207:
203:
199:
195:
191:
187:
177:
175:
169:
163:
162:Percy Ludgate
159:
154:
150:
145:
143:
139:
135:
131:
127:
123:
119:
114:
98:
95:
92:
86:
83:
77:
69:
65:
61:
57:
53:
49:
45:
41:
37:
34:, especially
33:
19:
770:Sandy Bridge
704:(since 1998)
503:
470:
450:
443:
427:
403:
398:
392:
386:
382:
375:
371:
363:
359:
355:
351:
344:
340:
333:
319:Convolutions
305:(e.g., with
287:
282:
276:
272:
268:
263:
259:
255:
250:
246:
237:
233:
229:
227:
210:distributive
196:). However,
194:power of two
183:
167:
146:
141:
137:
133:
129:
115:
67:
63:
59:
51:
48:multiply-add
47:
43:
39:
29:
731:Nvidia GPUs
685:processors
605:Steamroller
525:SuperH SH-4
414:square root
293:Dot product
206:associative
56:accumulator
598:Piledriver
578:Elbrus-8SV
574:-2F (2008)
561:SPARC64 VI
432:registers
775:Intel MIC
612:Excavator
591:Bulldozer
202:precision
122:roundings
96:×
81:←
32:computing
806:See also
711:AMD GPUs
676:Apple A6
572:Loongson
406:division
186:integers
64:MAC unit
749:Maxwell
632:Skylake
558:Fujitsu
545:Itanium
535:Toshiba
522:Hitachi
516:PA-8000
488:). The
475:of the
440:Support
212:. (See
799:RISC-V
777:(2012)
755:Pascal
743:Kepler
725:-based
678:(2012)
672:(2012)
666:(2012)
660:(2013)
654:(2012)
645:(2010)
630:Intel
614:(2015)
607:(2014)
580:(2018)
554:(2006)
547:(2001)
543:Intel
540:(1999)
527:(1998)
506:POWER1
455:(DEC)
412:) and
190:modulo
38:, the
761:Volta
737:Fermi
683:ARMv8
494:Clang
481:fma()
416:(see
408:(see
238:fmadd
192:some
153:adder
136:) or
46:) or
681:All
617:AMD
610:AMD
603:AMD
596:AMD
589:AMD
568:MIPS
552:Cell
550:STI
492:and
471:The
461:POLY
451:The
430:SIMD
358:) −
321:and
142:FMAC
126:DSPs
699:IBM
619:Zen
531:SCE
490:GCC
459:'s
457:VAX
271:+ (
236:or
234:FMA
208:or
166:(1+
144:).
134:FMA
52:MAD
44:MAC
30:In
513:HP
448:.
385:=
374:×
362:×
354:×
350:((
343:−
275:×
249:×
228:A
70::
566:(
533:-
484:(
399:N
387:y
383:x
378:)
376:x
372:x
370:(
366:)
364:y
360:y
356:x
352:x
345:y
341:x
309:)
283:N
279:)
277:c
273:b
269:a
264:N
260:a
256:N
251:c
247:b
232:(
170:)
168:x
140:(
132:(
102:)
99:c
93:b
90:(
87:+
84:a
78:a
68:a
62:(
50:(
42:(
20:)