Loop fission and fusion

In computer science, loop fission (or loop distribution) is a compiler optimization in which a loop is broken into multiple loops over the same index range with each taking only a part of the original loop's body. The goal is to break down a large loop body into smaller ones to achieve better utilization of locality of reference. This optimization is most efficient in multi-core processors that can split a task into multiple tasks for each processor.

Conversely, loop fusion (or loop jamming) is a compiler optimization and loop transformation which replaces multiple loops with a single one. Loop fusion does not always improve run-time speed. On some architectures, two loops may actually perform better than one loop because, for example, there is increased data locality within each loop. One of the main benefits of loop fusion is that it allows temporary allocations to be avoided, which can lead to huge performance gains in numerical computing languages such as Julia when doing elementwise operations on arrays (however, Julia's loop fusion is not technically a compiler optimization, but a syntactic guarantee of the language).

Other benefits of loop fusion are that it avoids the overhead of the loop control structures, and also that it allows the loop body to be parallelized by the processor by taking advantage of instruction-level parallelism. This is possible when there are no data dependencies between the bodies of the two loops (this is in stark contrast to the other main benefit of loop fusion described above, which only presents itself when there are data dependencies that require an intermediate allocation to store the results). If loop fusion is able to remove redundant allocations, performance increases can be large. Otherwise, there is a more complex trade-off between data locality, instruction-level parallelism, and loop overhead (branching, incrementing, etc.) that may make loop fusion, loop fission, or neither, the preferable optimization.

Fission

Example in C

    int i, a[100], b[100];
    for (i = 0; i < 100; i++) {
        a[i] = 1;
        b[i] = 2;
    }

is equivalent to:

    int i, a[100], b[100];
    for (i = 0; i < 100; i++) {
        a[i] = 1;
    }
    for (i = 0; i < 100; i++) {
        b[i] = 2;
    }

Fusion

Example in C++ and MATLAB

Consider the following MATLAB code:

    x = 0:999; % Create an array of numbers from 0 to 999 (range is inclusive)
    y = sin(x) + 4; % Take the sine of x (element-wise) and add 4 to each element

You could achieve the same syntax in C++ by using function and operator overloading:

    #include <cmath>
    #include <cassert>
    #include <memory>
    #include <iostream>

    class Array {
        size_t length;
        std::unique_ptr<float[]> data;

        // Internal constructor that produces an uninitialized array
        Array(size_t n) : length(n), data(new float[n]) { }

    public:
        // Factory method to produce an array over an integer range (the upper
        // bound is exclusive, unlike MATLAB's ranges).
        static Array Range(size_t start, size_t end) {
            assert(end > start);
            size_t length = end - start;
            Array a(length);
            for (size_t i = 0; i < length; ++i) {
                a[i] = start + i;
            }
            return a;
        }

        // Basic array operations
        size_t size() const { return length; }
        float& operator[](size_t i) { return data[i]; }
        const float& operator[](size_t i) const { return data[i]; }

        // Declare an overloaded addition operator as a free friend function (this
        // syntax defines operator+ as a free function that is a friend of this
        // class, despite it appearing as a member function declaration).
        friend Array operator+(const Array& a, float b) {
            Array c(a.size());
            for (size_t i = 0; i < a.size(); ++i) {
                c[i] = a[i] + b;
            }
            return c;
        }

        // Similarly, we can define an overload for the sin() function. In practice,
        // it would be unwieldy to define all possible overloaded math operations as
        // friends inside the class like this, but this is just an example.
        friend Array sin(const Array& a) {
            Array b(a.size());
            for (size_t i = 0; i < a.size(); ++i) {
                b[i] = std::sin(a[i]);
            }
            return b;
        }
    };

    int main(int argc, char* argv[]) {
        // Here, we perform the same computation as the MATLAB example
        auto x = Array::Range(0, 1000);
        auto y = sin(x) + 4;

        // Print the result out - just to make sure the optimizer doesn't remove
        // everything (if it's smart enough to do so).
        std::cout << "The result is: " << std::endl;
        for (size_t i = 0; i < y.size(); ++i) {
            std::cout << y[i] << std::endl;
        }

        return 0;
    }

However, the above example unnecessarily allocates a temporary array for the result of sin(x). A more efficient implementation would allocate a single array for y, and compute y in a single loop. To optimize this, a C++ compiler would need to:

1. Inline the sin and operator+ function calls.
2. Fuse the loops into a single loop.
3. Remove the unused stores into the temporary arrays (can use a register or stack variable instead).
4. Remove the unused allocation and free.

All of these steps are individually possible. Even step four is possible despite the fact that functions like malloc and free have global side effects, since some compilers hardcode symbols such as malloc and free so that they can remove unused allocations from the code. However, as of clang 12.0.0 and gcc 11.1, this loop fusion and redundant allocation removal does not occur, even on the highest optimization level.

Some languages specifically targeted towards numerical computing such as Julia might have the concept of loop fusion built into it at a high level, where the compiler will notice adjacent elementwise operations and fuse them into a single loop. Currently, to achieve the same syntax in general purpose languages like C++, the sin and operator+ functions must pessimistically allocate arrays to store their results, since they do not know what context they will be called from. This issue can be avoided in C++ by using a different syntax that does not rely on the compiler to remove unnecessary temporary allocations (e.g., using functions and overloads for in-place operations, such as operator+= or std::transform).
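As a rough illustration of the std::transform alternative mentioned in the text, the sketch below computes y = sin(x) + 4 in two ways: once with two loops and a temporary array, and once in a single pass into a caller-provided buffer. It uses std::vector<float> in place of the article's Array class, and the names make_range, sin_plus_unfused and sin_plus_into are hypothetical helpers introduced only for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: builds {0, 1, ..., n-1}, similar in spirit to
// Array::Range(0, n) in the article's example.
std::vector<float> make_range(std::size_t n) {
    std::vector<float> x(n);
    for (std::size_t i = 0; i < n; ++i) {
        x[i] = static_cast<float>(i);
    }
    return x;
}

// Unfused version: two loops and a temporary array, mirroring what
// sin(x) + 4 does when each overload allocates its own result.
std::vector<float> sin_plus_unfused(const std::vector<float>& x, float c) {
    std::vector<float> tmp(x.size());  // temporary holding sin(x)
    for (std::size_t i = 0; i < x.size(); ++i) {
        tmp[i] = std::sin(x[i]);
    }
    std::vector<float> y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] = tmp[i] + c;
    }
    return y;
}

// Manually fused version: one pass over x, no temporary array. The caller
// owns the output buffer, so nothing is allocated inside the function.
void sin_plus_into(const std::vector<float>& x, float c, std::vector<float>& y) {
    assert(y.size() == x.size());
    std::transform(x.begin(), x.end(), y.begin(),
                   [c](float v) { return std::sin(v) + c; });
}
```

A caller would allocate y once, for example `std::vector<float> y(x.size());`, and then call `sin_plus_into(x, 4.0f, y);`. The trade-off is that the MATLAB-like `sin(x) + 4` syntax is given up in exchange for not depending on the compiler to fuse the loops and remove the temporary.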

See also

Expression templates
Loop transformation

Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.
