
Granularity (parallel computing)

Measure of the amount of work needed to perform a computing task

In parallel computing, granularity (or grain size) of a task is a measure of the amount of work (or computation) which is performed by that task.

Another definition of granularity takes into account the communication overhead between multiple processors or processing elements. It defines granularity as the ratio of computation time to communication time, where computation time is the time required to perform the computation of a task and communication time is the time required to exchange data between processors.

If T_comp is the computation time and T_comm denotes the communication time, then the granularity G of a task can be calculated as

G = \frac{T_{\mathrm{comp}}}{T_{\mathrm{comm}}}

Granularity is usually measured in terms of the number of instructions executed in a particular task. Alternatively, granularity can also be specified in terms of the execution time of a program, combining the computation time and the communication time.
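
As a concrete (if simplified) illustration of the ratio definition, the C sketch below computes the granularity of a task from measured computation and communication times. The function name and the example timings are hypothetical, not taken from the article or its sources.

#include <stdio.h>

/* Granularity as the ratio of computation time to communication time.
 * Any time unit works, as long as both measurements use the same one. */
double granularity(double t_comp, double t_comm)
{
    return t_comp / t_comm;
}

int main(void)
{
    double t_comp = 500.0;  /* e.g. microseconds spent computing       */
    double t_comm = 100.0;  /* e.g. microseconds spent exchanging data */

    /* A large ratio suggests a relatively coarse-grained task;
     * a ratio near (or below) 1 suggests a fine-grained one. */
    printf("granularity G = %.2f\n", granularity(t_comp, t_comm));
    return 0;
}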

Types of parallelism

Depending on the amount of work which is performed by a parallel task, parallelism can be classified into three categories: fine-grained, medium-grained and coarse-grained parallelism.

Fine-grained parallelism

In fine-grained parallelism, a program is broken down into a large number of small tasks. These tasks are assigned individually to many processors. The amount of work associated with a parallel task is low, and the work is evenly distributed among the processors. Hence, fine-grained parallelism facilitates load balancing.

As each task processes less data, the number of processors required to perform the complete processing is high. This, in turn, increases the communication and synchronization overhead.

Fine-grained parallelism is best exploited in architectures which support fast communication. Shared memory architecture, which has a low communication overhead, is most suitable for fine-grained parallelism. It is difficult for programmers to detect parallelism in a program, so it is usually the compilers' responsibility to detect fine-grained parallelism.

An example of a fine-grained system (from outside the parallel computing domain) is the system of neurons in our brain. Connection Machine (CM-2) and J-Machine are examples of fine-grain parallel computers that have grain sizes in the range of 4-5 μs.

Coarse-grained parallelism

In coarse-grained parallelism, a program is split into large tasks. Due to this, a large amount of computation takes place in each processor. This might result in load imbalance, wherein certain tasks process the bulk of the data while others might be idle. Further, coarse-grained parallelism fails to exploit the parallelism in the program, as most of the computation is performed sequentially on a processor. The advantage of this type of parallelism is its low communication and synchronization overhead. Message-passing architecture takes a long time to communicate data among processes, which makes it suitable for coarse-grained parallelism. Cray Y-MP is an example of a coarse-grained parallel computer, with a grain size of about 20 s.

Medium-grained parallelism

Medium-grained parallelism is defined relative to fine-grained and coarse-grained parallelism. It is a compromise between the two, in which the task size and the communication time are greater than in fine-grained parallelism and lower than in coarse-grained parallelism. Most general-purpose parallel computers fall in this category. Intel iPSC is an example of a medium-grained parallel computer, with a grain size of about 10 ms.

Example

Consider a 10*10 image that needs to be processed, given that processing of the 100 pixels is independent of each other.

Fine-grained parallelism: Assume there are 100 processors that are responsible for processing the 10*10 image. Ignoring the communication overhead, the 100 processors can process the 10*10 image in 1 clock cycle. Each processor works on 1 pixel of the image and then communicates its output to the other processors. This is an example of fine-grained parallelism.

Medium-grained parallelism: Consider that there are 25 processors processing the 10*10 image. The processing of the image will now take 4 clock cycles. This is an example of medium-grained parallelism.

Coarse-grained parallelism: Further, if we reduce the number of processors to 2, the processing will take 50 clock cycles. Each processor needs to process 50 elements, which increases the computation time, but the communication overhead decreases as the number of processors which share data decreases. This case illustrates coarse-grained parallelism.

Fine-grain: pseudocode for 100 processors (computation time - 1 clock cycle)

void main()
{
    switch (Processor_ID)
    {
        case 1:   Compute element 1;   break;
        case 2:   Compute element 2;   break;
        case 3:   Compute element 3;   break;
        ...
        case 100: Compute element 100; break;
    }
}

Medium-grain: pseudocode for 25 processors (computation time - 4 clock cycles)

void main()
{
    switch (Processor_ID)
    {
        case 1:  Compute elements 1-4;    break;
        case 2:  Compute elements 5-8;    break;
        case 3:  Compute elements 9-12;   break;
        ...
        case 25: Compute elements 97-100; break;
    }
}

Coarse-grain: pseudocode for 2 processors (computation time - 50 clock cycles)

void main()
{
    switch (Processor_ID)
    {
        case 1: Compute elements 1-50;   break;
        case 2: Compute elements 51-100; break;
    }
}
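
The pseudocode above is schematic rather than runnable. Below is a minimal, self-contained C sketch of the same idea using POSIX threads: the 100 pixels are split into contiguous chunks, and the number of workers controls the grain size (pixels per task). The process_pixel placeholder and the thread-based framing are illustrative assumptions, not part of the original example.

#include <pthread.h>
#include <stdio.h>

#define PIXELS 100                       /* the 10*10 image */

/* Placeholder for the per-pixel work (illustrative only). */
static void process_pixel(int p) { (void)p; }

struct chunk { int first, last; };

/* Each worker processes one contiguous chunk of pixels. */
static void *worker(void *arg)
{
    struct chunk *c = arg;
    for (int p = c->first; p <= c->last; p++)
        process_pixel(p);
    return NULL;
}

int main(void)
{
    int nworkers = 4;                    /* 100 workers ~ fine grain, 2 ~ coarse grain */
    int grain = PIXELS / nworkers;       /* pixels per task = grain size */

    pthread_t tid[PIXELS];
    struct chunk chunks[PIXELS];

    for (int i = 0; i < nworkers; i++) {
        chunks[i].first = i * grain;
        chunks[i].last  = (i == nworkers - 1) ? PIXELS - 1 : (i + 1) * grain - 1;
        pthread_create(&tid[i], NULL, worker, &chunks[i]);
    }
    for (int i = 0; i < nworkers; i++)
        pthread_join(tid[i], NULL);

    printf("processed %d pixels with %d workers (grain = %d pixels)\n",
           PIXELS, nworkers, grain);
    return 0;
}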

Levels of parallelism

Granularity is closely tied to the level of processing. A program can be broken down into 4 levels of parallelism:

- Instruction level
- Loop level
- Sub-routine level
- Program level

The highest amount of parallelism is achieved at the instruction level, followed by loop-level parallelism. At the instruction and loop levels, fine-grained parallelism is achieved. The typical grain size at the instruction level is 20 instructions, while the grain size at the loop level is 500 instructions.

At the sub-routine (or procedure) level, the grain size is typically a few thousand instructions. Medium-grained parallelism is achieved at the sub-routine level.

At the program level, parallel execution of programs takes place. Granularity can be in the range of tens of thousands of instructions. Coarse-grained parallelism is used at this level.

The table below shows the relationship between the levels of parallelism, grain size and degree of parallelism.

Levels              Grain size   Parallelism
Instruction level   Fine         Highest
Loop level          Fine         Moderate
Sub-routine level   Medium       Moderate
Program level       Coarse       Least

Impact of granularity on performance

Granularity affects the performance of parallel computers. Using fine grains or small tasks results in more parallelism and hence increases the speedup. However, synchronization overhead, scheduling strategies, etc. can negatively impact the performance of fine-grained tasks. Increasing parallelism alone cannot give the best performance.

In order to reduce the communication overhead, granularity can be increased. Coarse-grained tasks have less communication overhead, but they often cause load imbalance. Hence, optimal performance is achieved between the two extremes of fine-grained and coarse-grained parallelism.
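
One way to see why the optimum lies between the extremes is a simple cost model; this is an illustrative sketch, not a model taken from the sources. Let W be the total work, P the number of processors, g the grain size (work per task) and h a fixed communication/synchronization overhead paid per task. The parallel execution time then behaves roughly as

T(g) \approx \frac{W}{P} + \frac{W}{g}\cdot\frac{h}{P} + \mathrm{imbalance}(g)

The middle term, the total per-task overhead spread over P processors, shrinks as the grain g grows, while the imbalance term grows with g because large tasks are harder to distribute evenly. The best grain size therefore lies between the fine-grained and coarse-grained extremes.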

Various studies have proposed solutions to help determine the best granularity for parallel processing. Finding the best grain size depends on a number of factors and varies greatly from problem to problem.

See also

- Instruction-level parallelism
- Data parallelism
