and load port architectures of modern processors. In particular, memory requests in modern processors are fulfilled in fixed-width units (e.g., the size of a cache line). The tiled storage of AoSoA aligns the memory access pattern with this fixed width, so fewer access operations are needed to complete a memory request, which increases efficiency.
is a hybrid of the previous two layouts, in which data for different fields is interleaved in tiles or blocks whose size equals the SIMD vector size. This is often less intuitive, but it can achieve the memory throughput of the SoA approach while being friendlier to the cache locality
4D vectors with the SIMD register to leverage the associated data path and instructions, while still providing programmer convenience, although this does not scale to SIMD units wider than four lanes.
118:). If only a specific part of the record is needed, only that part needs to be iterated over, allowing more useful data to fit in a single cache line. The downside is that this requires more
238:) is the opposite (and more conventional) layout, in which data for different fields is interleaved. This is often more intuitive, and supported directly by most
libraries is to de-interleave data from the AoS format when loading sources into registers, and interleave when writing out results (facilitated by the
Fei, Yun (Raymond); Huang, Yuhan; Gao, Ming (2021), "Principles towards Real-Time Simulation of Material Point Method on Modern GPUs", pp. 1–16,
For example, to store N points in 3D space using an array of structures of arrays with a SIMD register width of 8 floats (or 8×32 = 256 bits):
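A minimal C sketch of this tiled layout, reusing the identifiers that appear in the article's code fragments (`point3Dx8`, `points`, `get_point_x`). The constants `N` and `LANES` are assumptions added for illustration: `N` is an arbitrary point count and `LANES` names the 8-float tile width mentioned above.

```c
#define N 1024      /* hypothetical point count, chosen for illustration */
#define LANES 8     /* assumed tile width: 8 floats = one 256-bit SIMD register */

/* One tile holds the x, y and z fields of 8 consecutive points. */
struct point3Dx8 {
    float x[LANES];
    float y[LANES];
    float z[LANES];
};

/* Round up so that a partially filled last tile still has storage. */
struct point3Dx8 points[(N + LANES - 1) / LANES];

/* Read the x coordinate of point i: pick the tile, then the lane within it. */
float get_point_x(int i) {
    return points[i / LANES].x[i % LANES];
}
```

With these values, point 10 lives in tile 1, lane 2. A different `LANES` value would be the natural knob to turn for a different register width.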
A different width may be needed depending on the actual SIMD register width. The interior arrays may be replaced with SIMD types such as
The Julia package StructArrays.jl allows SoA data to be accessed as if it were AoS, combining the performance of SoA with the intuitiveness of AoS.
data on machines with four-lane SIMD hardware. SIMD ISAs are usually designed for homogeneous data; however, some provide a
Automated creation of AoSoA is more complex. An example of AoSoA in metaprogramming is found in
It is possible to split some subset of a structure (rather than each individual field) into a
using SoA instead of AoS can still give better performance due to memory coalescing.
775:'s Cabana library written in C++; it assumes a vector width of 16 lanes by default.
to load homogeneous data from the SoA format. Yet another option used in some
if different groups of fields are used at different times in the program (see
instruction and additional permutes, making the AoS case easier to handle.
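The trade-off for 4D vectors can be illustrated with a short sketch in plain scalar C, with no SIMD intrinsics; whether a compiler maps either function onto SIMD instructions is not assumed here. The names `vec4`, `vec4x4`, `dot_aos` and `dot_soa` are hypothetical and chosen only for this example. In the AoS form, one dot product is a horizontal sum across the fields of a single record. In the SoA form, four dot products proceed in lockstep, one per lane, and no horizontal step is needed.

```c
/* AoS: one 4D vector per record. */
struct vec4 {
    float x, y, z, w;
};

/* A single dot product sums across the four fields of one record. */
float dot_aos(struct vec4 a, struct vec4 b) {
    return a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
}

/* SoA: four vectors stored field by field, one vector per lane. */
struct vec4x4 {
    float x[4], y[4], z[4], w[4];
};

/* Four independent dot products; each lane only touches its own elements. */
void dot_soa(const struct vec4x4 *a, const struct vec4x4 *b, float out[4]) {
    for (int lane = 0; lane < 4; ++lane) {
        out[lane] = a->x[lane] * b->x[lane]
                  + a->y[lane] * b->y[lane]
                  + a->z[lane] * b->z[lane]
                  + a->w[lane] * b->w[lane];
    }
}
```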
For example, to store N points in 3D space using an array of structures:
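A minimal C sketch of this layout, reusing the identifiers that appear in the article's code fragments (`point3D`, `points`, `get_point_x`). The point count `N` is an arbitrary value assumed for illustration.

```c
#define N 1024  /* hypothetical point count, chosen for illustration */

/* One record per point: the x, y and z fields are interleaved in memory. */
struct point3D {
    float x;
    float y;
    float z;
};

struct point3D points[N];

/* Read the x coordinate of point i. */
float get_point_x(int i) {
    return points[i].x;
}
```

Consecutive x values are separated by the size of a whole record, so a loop that reads only `x` also pulls the unused `y` and `z` fields into the cache.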
For example, to store N points in 3D space using a structure of arrays:
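A minimal C sketch of this layout, reusing the identifiers that appear in the article's code fragments (`pointlist3D`, `points`, `get_point_x`). The point count `N` is an arbitrary value assumed for illustration.

```c
#define N 1024  /* hypothetical point count, chosen for illustration */

/* One array per field: all x values are contiguous, then all y, then all z. */
struct pointlist3D {
    float x[N];
    float y[N];
    float z[N];
};

struct pointlist3D points;

/* Read the x coordinate of point i. */
float get_point_x(int i) {
    return points.x[i];
}
```

Because neighbouring x values are adjacent in memory, a run of them can be loaded into a SIMD register with a single contiguous load.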
Most languages support the AoS format more naturally by combining
753:'s DataFrames.jl package, are interfaces for accessing SoA like AoS.
793:"How to Manipulate Data Structure to Optimize Memory Use"
905:"Modern GPU Architecture (See Scalar Unified Pipelines)"
hardware has moved away from 4D instructions to scalar
AoS vs. SoA presents a choice when considering 3D or
94:. The motivation is easier manipulation with packed
880:"Intel SSE4 Floating Point Dot Product Intrinsics"
537: – and this can actually improve
729:SoA is mostly found in languages, libraries, or
759:Code generators for the C language, including
35:are contrasting ways to arrange a sequence of
82:) is a layout separating elements of a
122:when traversing data, and inefficient
16:Parallel computing data layout methods
33:array of structures of arrays (AoSoA)
741:"Data frames," as implemented in
463:for languages with such support.
110:, possibly transferred by a wide
814:"Memory Layout Transformations"
554:strided load/store instructions
940:"CUDA Optimization Strategies"
90:) into one parallel array per
348:Array of structures of arrays
343:Array of structures of arrays
100:instruction set architectures
947:CS4803 Design Game Consoles
47:, and are of interest in
29:structure of arrays (SoA)
25:array of structures (AoS)
967:"ECP-copa/Cabana: AoSoA"
910:. NVIDIA. Archived from
835:"Kernel Profiling Guide"
733:tools used to support a
882:. Intel. Archived from
749:'s Pandas package, and
570:vector maths libraries
552:architectures provide
356:tiled array of structs
88:C programming language
539:locality of reference
240:programming languages
840:. NVIDIA. 2022-12-01
737:. Examples include:
735:data-oriented design
543:data-oriented design
86:(or 'struct' in the
816:. Intel. 2019-03-26
795:. Intel. 2012-02-09
724:abstract data types
232:Array of structures
227:Array of structures
76:Structure of arrays
71:Planar image format
59:Structure of arrays
722:and various array
603:pipelines, modern
124:indexed addressing
562:superscalar issue
112:internal datapath
102:, since a single
96:SIMD instructions
43:, with regard to
611:Software support
108:homogeneous data
731:metaprogramming
605:compute kernels
991:SIMD computing
938:(2010-02-08).
595:Although most
574:floating point
535:parallel array
65:Parallel array
63:Main article:
917:on 2018-05-17
886:on 2016-06-24
104:SIMD register
55:programming.
950:. Retrieved
936:Kim, Hyesoon
919:. Retrieved
912:the original
888:. Retrieved
884:the original
842:. Retrieved
818:. Retrieved
797:. Retrieved
467:Alternatives
45:interleaving
32:
28:
24:
18:
590:dot product
421:get_point_x
304:get_point_x
188:get_point_x
176:pointlist3D
137:pointlist3D
952:2019-03-17
921:2019-03-17
890:2019-03-17
866:2111.00699
844:2022-01-14
820:2019-06-02
799:2019-03-17
779:References
767:technique.
580:4D vectors
120:cache ways
69:See also:
586:4D vector
461:float32x8
409:point3Dx8
370:point3Dx8
106:can load
21:computing
985:Category
763:and the
761:Datadraw
568:). Some
566:permutes
98:in most
765:X Macro
720:records
292:point3D
253:point3D
116:128-bit
37:records
971:GitHub
747:Python
572:align
442:points
439:return
412:points
406:struct
367:struct
325:points
322:return
295:points
289:struct
250:struct
209:points
206:return
179:points
173:struct
134:struct
114:(e.g.
84:record
41:memory
943:(PDF)
915:(PDF)
908:(PDF)
861:arXiv
838:(PDF)
751:Julia
548:Some
418:float
394:float
385:float
376:float
354:) or
352:AoSoA
301:float
277:float
268:float
259:float
185:float
161:float
152:float
143:float
92:field
23:, an
773:LANL
601:SIMT
558:Cell
550:SIMD
53:SIMT
51:and
49:SIMD
597:GPU
564:of
545:).
427:int
310:int
236:AoS
194:int
80:SoA
39:in
31:or
19:In
987::
969:.
945:.
745:,
726:.
403:};
286:};
242:.
170:};
126:.
27:,
973:.
955:.
924:.
893:.
863::
849:)
847:.
823:.
802:.
743:R
454:}
451:;
448:x
445:.
436:{
433:)
430:i
424:(
415:;
400:;
397:z
391:;
388:y
382:;
379:x
373:{
350:(
337:}
334:;
331:x
328:.
319:{
316:)
313:i
307:(
298:;
283:;
280:z
274:;
271:y
265:;
262:x
256:{
234:(
221:}
218:;
215:x
212:.
203:{
200:)
197:i
191:(
182:;
167:;
164:z
158:;
155:y
149:;
146:x
140:{
78:(