Loop fission and fusion

In computer science, loop fission (or loop distribution) is a compiler optimization in which a loop is broken into multiple loops over the same index range, with each taking only a part of the original loop's body. The goal is to break down a large loop body into smaller ones to achieve better utilization of locality of reference. This optimization is most efficient in multi-core processors that can split a task into multiple tasks for each processor.

Conversely, loop fusion (or loop jamming) is a compiler optimization and loop transformation which replaces multiple loops with a single one. Loop fusion does not always improve run-time speed. On some architectures, two loops may actually perform better than one loop because, for example, there is increased data locality within each loop. One of the main benefits of loop fusion is that it allows temporary allocations to be avoided, which can lead to huge performance gains in numerical computing languages such as Julia when doing elementwise operations on arrays (however, Julia's loop fusion is not technically a compiler optimization, but a syntactic guarantee of the language).

Other benefits of loop fusion are that it avoids the overhead of the loop control structures, and also that it allows the loop body to be parallelized by the processor by taking advantage of instruction-level parallelism. This is possible when there are no data dependencies between the bodies of the two loops (this is in stark contrast to the other main benefit of loop fusion described above, which only presents itself when there are data dependencies that require an intermediate allocation to store the results). If loop fusion is able to remove redundant allocations, performance increases can be large. Otherwise, there is a more complex trade-off between data locality, instruction-level parallelism, and loop overhead (branching, incrementing, etc.) that may make loop fusion, loop fission, or neither, the preferable optimization.

Fission

Example in C

    int i, a[100], b[100];
    for (i = 0; i < 100; i++) {
        a[i] = 1;
        b[i] = 2;
    }

is equivalent to:

    int i, a[100], b[100];
    for (i = 0; i < 100; i++) {
        a[i] = 1;
    }
    for (i = 0; i < 100; i++) {
        b[i] = 2;
    }

Fusion

Example in C++ and MATLAB

Consider the following MATLAB code:

    x = 0:999;       % Create an array of numbers from 0 to 999 (range is inclusive)
    y = sin(x) + 4;  % Take the sine of x (element-wise) and add 4 to each element

You could achieve the same syntax in C++ by using function and operator overloading:

    #include <cmath>
    #include <cassert>
    #include <memory>
    #include <iostream>

    class Array {
        size_t length;
        std::unique_ptr<float[]> data;

        // Internal constructor that produces an uninitialized array
        Array(size_t n) : length(n), data(new float[n]) { }

    public:
        // Factory method to produce an array over an integer range (the upper
        // bound is exclusive, unlike MATLAB's ranges).
        static Array Range(size_t start, size_t end) {
            assert(end > start);
            size_t length = end - start;
            Array a(length);
            for (size_t i = 0; i < length; ++i) {
                a[i] = start + i;
            }
            return a;
        }

        // Basic array operations
        size_t size() const { return length; }
        float& operator[](size_t i) { return data[i]; }
        const float& operator[](size_t i) const { return data[i]; }

        // Declare an overloaded addition operator as a free friend function (this
        // syntax defines operator+ as a free function that is a friend of this
        // class, despite it appearing as a member function declaration).
        friend Array operator+(const Array& a, float b) {
            Array c(a.size());
            for (size_t i = 0; i < a.size(); ++i) {
                c[i] = a[i] + b;
            }
            return c;
        }

        // Similarly, we can define an overload for the sin() function. In practice,
        // it would be unwieldy to define all possible overloaded math operations as
        // friends inside the class like this, but this is just an example.
        friend Array sin(const Array& a) {
            Array b(a.size());
            for (size_t i = 0; i < a.size(); ++i) {
                b[i] = std::sin(a[i]);
            }
            return b;
        }
    };

    int main(int argc, char* argv[]) {
        // Here, we perform the same computation as the MATLAB example
        auto x = Array::Range(0, 1000);
        auto y = sin(x) + 4;

        // Print the result out - just to make sure the optimizer doesn't remove
        // everything (if it's smart enough to do so).
        std::cout << "The result is: " << std::endl;
        for (size_t i = 0; i < y.size(); ++i) {
            std::cout << y[i] << std::endl;
        }

        return 0;
    }

However, the above example unnecessarily allocates a temporary array for the result of sin(x). A more efficient implementation would allocate a single array for y, and compute y in a single loop. To optimize this, a C++ compiler would need to:

1. Inline the sin and operator+ function calls.
2. Fuse the loops into a single loop.
3. Remove the unused stores into the temporary arrays (can use a register or stack variable instead).
4. Remove the unused allocation and free.

All of these steps are individually possible. Even step four is possible despite the fact that functions like malloc and free have global side effects, since some compilers hardcode symbols such as malloc and free so that they can remove unused allocations from the code. However, as of clang 12.0.0 and gcc 11.1, this loop fusion and redundant allocation removal does not occur, even on the highest optimization level.

Some languages specifically targeted towards numerical computing, such as Julia, might have the concept of loop fusion built into them at a high level, where the compiler will notice adjacent elementwise operations and fuse them into a single loop. Currently, to achieve the same syntax in general purpose languages like C++, the sin and operator+ functions must pessimistically allocate arrays to store their results, since they do not know what context they will be called from. This issue can be avoided in C++ by using a different syntax that does not rely on the compiler to remove unnecessary temporary allocations (e.g., using functions and overloads for in-place operations, such as operator+= or std::transform).
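The in-place alternative mentioned above can be sketched as follows. This is a minimal illustration rather than the article's own implementation: it assumes std::vector<float> in place of the Array class, and the names make_range, sin_plus, and sin_plus_in_place are hypothetical helpers introduced only for this example. The first function fuses the two elementwise operations by hand into one loop with a single result allocation; the second uses std::transform to overwrite the input, so no extra array is needed at all.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: builds {start, start + 1, ..., end - 1}
// (upper bound exclusive, like Array::Range in the example above).
std::vector<float> make_range(std::size_t start, std::size_t end) {
    assert(end > start);
    std::vector<float> v(end - start);
    for (std::size_t i = 0; i < v.size(); ++i) {
        v[i] = static_cast<float>(start + i);
    }
    return v;
}

// Hand-fused version of y = sin(x) + c: one allocation for the result
// and one loop, with no temporary array holding sin(x).
std::vector<float> sin_plus(const std::vector<float>& x, float c) {
    std::vector<float> y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] = std::sin(x[i]) + c;
    }
    return y;
}

// In-place variant: std::transform reads each element of x and writes
// the result back to the same position, so no new array is allocated.
void sin_plus_in_place(std::vector<float>& x, float c) {
    std::transform(x.begin(), x.end(), x.begin(),
                   [c](float v) { return std::sin(v) + c; });
}
```

Calling sin_plus(make_range(0, 1000), 4.0f) would correspond to the MATLAB expression sin(x) + 4 from the example. The cost of this approach is the one the text describes: the caller has to choose the fused or in-place form explicitly, instead of writing sin(x) + 4 and relying on the compiler.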

See also

Expression templates
Loop transformation

Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.
