AoS and SoA

In computing, an array of structures (AoS), a structure of arrays (SoA) and an array of structures of arrays (AoSoA) are contrasting ways to arrange a sequence of records in memory with regard to interleaving. They are of interest in SIMD and SIMT programming.

Structure of arrays

Main article: Parallel array
See also: Planar image format

Structure of arrays (SoA) is a layout that separates the elements of a record (or 'struct' in the C programming language) into one parallel array per field. The motivation is easier manipulation with packed SIMD instructions in most instruction set architectures. A single SIMD register can load homogeneous data, possibly transferred by a wide internal datapath (e.g. 128-bit). If only a specific part of the record is needed, only those parts need to be iterated over, which lets more useful data fit onto a single cache line. The downsides are that traversing the data requires more cache ways (one memory stream per field) and that indexed addressing is inefficient.

For example, to store N points in 3D space using a structure of arrays:

    #define N 1024  /* number of points (example value) */

    struct pointlist3D {
        float x[N];
        float y[N];
        float z[N];
    };
    struct pointlist3D points;
    float get_point_x(int i) { return points.x[i]; }

Array of structures

Array of structures (AoS) is the opposite (and more conventional) layout, in which data for different fields is interleaved. This is often more intuitive and is supported directly by most programming languages.

For example, to store N points in 3D space using an array of structures:

    #define N 1024  /* number of points (example value) */

    struct point3D {
        float x;
        float y;
        float z;
    };
    struct point3D points[N];
    float get_point_x(int i) { return points[i].x; }

Array of structures of arrays

Array of structures of arrays (AoSoA), or tiled array of structs, is a hybrid of the previous layouts. Data for different fields is interleaved using tiles or blocks whose size equals the SIMD vector size. This is often less intuitive, but it can achieve the memory throughput of the SoA approach while being friendlier to the cache locality and load-port architectures of modern processors. In particular, memory requests in modern processors are fulfilled in fixed-width units (e.g., the size of a cache line). The tiled storage of AoSoA aligns the memory access pattern to this fixed width. As a result, fewer access operations are needed to complete a memory request, which increases efficiency.

For example, to store N points in 3D space using an array of structures of arrays with a SIMD register width of 8 floats (or 8×32 = 256 bits):

    #define N 1024  /* number of points (example value) */

    struct point3Dx8 {
        float x[8];
        float y[8];
        float z[8];
    };
    struct point3Dx8 points[(N + 7) / 8];  /* round up to whole tiles */
    float get_point_x(int i) { return points[i / 8].x[i % 8]; }

A different width may be needed depending on the actual SIMD register width. The interior arrays may be replaced with SIMD types such as float32x8 for languages with such support.

Alternatives

[This section possibly contains original research. Please improve it by verifying the claims made and adding inline citations. Statements consisting only of original research should be removed. (August 2019)]

It is possible to split some subset of a structure (rather than each individual field) into a parallel array. This can actually improve locality of reference if different pieces of fields are used at different times in the program (see data-oriented design).

Some SIMD architectures provide strided load/store instructions to load homogeneous data (one field from consecutive records) directly from the AoS format. Yet another option used in some Cell libraries is to de-interleave data from the AoS format when loading sources into registers, and to interleave it again when writing out results (facilitated by the superscalar issue of permutes). Some vector maths libraries align floating-point 4D vectors with the SIMD register to leverage the associated data path and instructions while still providing programmer convenience. However, this does not scale to SIMD units wider than four lanes.

4D vectors

AoS vs. SoA presents a choice when considering 3D or 4D vector data on machines with four-lane SIMD hardware. SIMD ISAs are usually designed for homogeneous data; however, some provide a dot product instruction and additional permutes, making the AoS case easier to handle.

Although most GPU hardware has moved away from 4D instructions to scalar SIMT pipelines, modern compute kernels that use SoA instead of AoS can still give better performance due to memory coalescing.

Software support

[This article needs additional citations for verification. Please help improve this article by adding citations to reliable sources. Unsourced material may be challenged and removed. (July 2023)]

Most languages support the AoS format more naturally by combining records and various array abstract data types.

SoA is mostly found in languages, libraries, or metaprogramming tools used to support a data-oriented design. Examples include:

- "Data frames," as implemented in R, Python's Pandas package, and Julia's DataFrames.jl package, are interfaces to access SoA like AoS.
- The Julia package StructArrays.jl allows accessing SoA as AoS, combining the performance of SoA with the intuitiveness of AoS.
- Code generators for the C language, including Datadraw and the X Macro technique.

Automated creation of AoSoA is more complex. An example of AoSoA in metaprogramming is found in LANL's Cabana library, written in C++; it assumes a vector width of 16 lanes by default.
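The X Macro technique listed under Software support can be sketched as follows. A single field list is expanded several times to generate an AoS record type and an SoA container. It also generates the functions that copy a record into or out of the container, so the two layouts cannot drift apart when a field is added. The field list macro POINT_FIELDS, the fixed capacity POINT_CAPACITY and all other names are hypothetical and chosen for illustration. This is a minimal sketch, not the output of Datadraw or any other tool.

```c
#include <stddef.h>

/* Hypothetical field list: each entry is X(type, name). */
#define POINT_FIELDS \
    X(float, x)      \
    X(float, y)      \
    X(float, z)

#define POINT_CAPACITY 1024 /* example capacity, chosen for illustration */

/* AoS record: the fields of one point are stored together. */
struct point {
#define X(type, name) type name;
    POINT_FIELDS
#undef X
};

/* SoA container: one parallel array per field. */
struct point_soa {
#define X(type, name) type name[POINT_CAPACITY];
    POINT_FIELDS
#undef X
};

/* Scatter one AoS record into slot i of the SoA container. */
static void soa_set(struct point_soa *soa, size_t i, struct point p)
{
#define X(type, name) soa->name[i] = p.name;
    POINT_FIELDS
#undef X
}

/* Gather slot i of the SoA container back into an AoS record. */
static struct point soa_get(const struct point_soa *soa, size_t i)
{
    struct point p;
#define X(type, name) p.name = soa->name[i];
    POINT_FIELDS
#undef X
    return p;
}
```

Adding an entry such as X(float, w) to POINT_FIELDS updates both types and both copy functions with no further edits.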

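The sketch below recaps the Structure of arrays, Array of structures and Array of structures of arrays sections. It stores the same points in all three layouts and sums their x coordinates. The names NPTS, TILE, NTILES, fill and the sum_x_* functions are hypothetical, and TILE = 8 mirrors the 8-float AoSoA example. The loops are plain scalar C; what they illustrate is the access pattern. The SoA and AoSoA loops read x values from consecutive addresses, which is what contiguous SIMD loads need. The AoS loop reads them with a stride of sizeof(struct point3D).

```c
#include <stddef.h>

#define NPTS 20 /* example point count, deliberately not a multiple of TILE */
#define TILE 8  /* SIMD width in floats assumed for the AoSoA tiles */
#define NTILES ((NPTS + TILE - 1) / TILE) /* round up to whole tiles */

/* AoS: one struct per point. */
struct point3D { float x, y, z; };

/* SoA: one array per field. */
struct pointlist3D { float x[NPTS], y[NPTS], z[NPTS]; };

/* AoSoA: tiles of TILE points, each tile stored as a small SoA. */
struct point3Dx8 { float x[TILE], y[TILE], z[TILE]; };

/* Fill all three layouts with the same data: point i = (i, 2i, 3i). */
static void fill(struct point3D aos[NPTS], struct pointlist3D *soa,
                 struct point3Dx8 aosoa[NTILES])
{
    for (size_t i = 0; i < NPTS; i++) {
        float x = (float)i, y = 2.0f * i, z = 3.0f * i;
        aos[i] = (struct point3D){x, y, z};
        soa->x[i] = x; soa->y[i] = y; soa->z[i] = z;
        aosoa[i / TILE].x[i % TILE] = x;
        aosoa[i / TILE].y[i % TILE] = y;
        aosoa[i / TILE].z[i % TILE] = z;
    }
}

/* AoS: consecutive x values are sizeof(struct point3D) bytes apart. */
static float sum_x_aos(const struct point3D aos[NPTS])
{
    float s = 0.0f;
    for (size_t i = 0; i < NPTS; i++) s += aos[i].x;
    return s;
}

/* SoA: all x values are contiguous. */
static float sum_x_soa(const struct pointlist3D *soa)
{
    float s = 0.0f;
    for (size_t i = 0; i < NPTS; i++) s += soa->x[i];
    return s;
}

/* AoSoA: contiguous runs of TILE x values; the last tile may be partial. */
static float sum_x_aosoa(const struct point3Dx8 aosoa[NTILES])
{
    float s = 0.0f;
    for (size_t t = 0; t < NTILES; t++) {
        size_t n = (t == NTILES - 1 && NPTS % TILE) ? NPTS % TILE : TILE;
        for (size_t k = 0; k < n; k++) s += aosoa[t].x[k];
    }
    return s;
}
```

Because NPTS = 20 is not a multiple of 8, the last AoSoA tile is only half used. AoSoA code has to either pad the final tile or handle it separately, as sum_x_aosoa does here.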

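The de-interleaving approach mentioned under Alternatives can be modeled in scalar C. Data is converted from AoS into per-field vectors on load and interleaved again on store. Here a struct of four-element arrays stands in for a group of four-lane SIMD registers. Real Cell code would perform the transposition with shuffle or permute instructions, which this sketch does not model. The names LANES, vec3x4, deinterleave, interleave and translate_points are hypothetical.

```c
#include <stddef.h>

#define LANES 4 /* modeled SIMD width (four float lanes), an assumption */

/* AoS record as stored in memory. */
struct point3D { float x, y, z; };

/* Modeled register group: one LANES-wide vector per field (SoA). */
struct vec3x4 { float x[LANES], y[LANES], z[LANES]; };

/* De-interleave LANES consecutive AoS records into per-field vectors. */
static struct vec3x4 deinterleave(const struct point3D *p)
{
    struct vec3x4 v;
    for (size_t k = 0; k < LANES; k++) {
        v.x[k] = p[k].x;
        v.y[k] = p[k].y;
        v.z[k] = p[k].z;
    }
    return v;
}

/* Interleave per-field vectors back into LANES consecutive AoS records. */
static void interleave(struct point3D *p, const struct vec3x4 *v)
{
    for (size_t k = 0; k < LANES; k++)
        p[k] = (struct point3D){v->x[k], v->y[k], v->z[k]};
}

/* Translate n points by (dx, dy, dz); n must be a multiple of LANES.
   The arithmetic runs lane by lane on homogeneous per-field vectors. */
static void translate_points(struct point3D *pts, size_t n,
                             float dx, float dy, float dz)
{
    for (size_t i = 0; i < n; i += LANES) {
        struct vec3x4 v = deinterleave(&pts[i]);
        for (size_t k = 0; k < LANES; k++) {
            v.x[k] += dx;
            v.y[k] += dy;
            v.z[k] += dz;
        }
        interleave(&pts[i], &v);
    }
}
```

Data stays in AoS form in memory, while the computation sees SoA-shaped vectors. A point count that is not a multiple of LANES would need an extra scalar loop for the remainder.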
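The trade-off described under 4D vectors can be modeled in scalar C. With AoS, one 4D vector occupies a four-lane register, and a dot product ends in a horizontal reduction across lanes. That reduction is the step a dedicated dot-product instruction (such as SSE4.1's DPPS) performs. With SoA, four dot products are computed at once using only lane-wise multiplies and adds. The type and function names are hypothetical, and the loops model lanes rather than calling SIMD intrinsics.

```c
#include <stddef.h>

#define LANES 4 /* four-lane SIMD hardware, as in the text */

/* AoS: one 4D vector per record; it fills one four-lane register. */
struct vec4 { float v[LANES]; };

/* SoA: four 4D vectors stored component-wise; each array is one register. */
struct vec4x4 { float x[LANES], y[LANES], z[LANES], w[LANES]; };

/* AoS dot product: a lane-wise multiply followed by a horizontal
   reduction across the lanes of a single register. */
static float dot_aos(const struct vec4 *a, const struct vec4 *b)
{
    float prod[LANES];
    for (size_t k = 0; k < LANES; k++)
        prod[k] = a->v[k] * b->v[k];
    /* Cross-lane step, written out for exactly four lanes. */
    return (prod[0] + prod[1]) + (prod[2] + prod[3]);
}

/* SoA dot products: four results at once. Each lane is independent and
   only lane-wise multiplies and adds are used (no cross-lane movement). */
static void dot_soa(const struct vec4x4 *a, const struct vec4x4 *b,
                    float out[LANES])
{
    for (size_t k = 0; k < LANES; k++)
        out[k] = a->x[k] * b->x[k] + a->y[k] * b->y[k]
               + a->z[k] * b->z[k] + a->w[k] * b->w[k];
}
```

The AoS version produces one result per call and needs the cross-lane reduction. The SoA version keeps every lane busy with a different vector pair, which is why SoA maps more naturally onto ISAs designed for homogeneous data.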

Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.