64:
184:
Loop interchange on this example can improve the cache performance of accessing b(j,i), but it will ruin the reuse of a(i) and c(i) in the inner loop, as it introduces two extra loads (for a(i) and for c(i)) and one extra store (for a(i)) during each iteration. As a result, the overall performance
80:
occur if the contiguously accessed array elements within the loop come from a different cache block, and loop interchange can help prevent this. The effectiveness of loop interchange depends on and must be considered in light of the cache model used by the underlying hardware and the array model
75:
when accessing array elements. When a processor accesses an array element for the first time, it will retrieve an entire block of data from memory to cache. That block is likely to have many more consecutive elements after the first one, so on the next array element access, it will be brought
193:
It is not always safe to exchange the iteration variables due to dependencies between statements for the order in which they must execute. To determine whether a compiler can safely interchange loops,
28:. The variable used in the inner loop switches to the outer loop, and vice versa. It is often done to ensure that the elements of a multi-dimensional
329:
476:
102:. Thus the order of two iteration variables in the first example is suitable for a C program while the second example is better for FORTRAN.
510:
114:
Loop interchange may lead to worse performance because cache performance is only part of the story. Take the following example:
290:
322:
682:
559:
460:
266:
708:
601:
315:
106:
can detect the improper ordering by programmers and interchange the order to achieve better cache performance.
496:
580:
631:
596:
520:
395:
278:
375:
85:
380:
216:
52:
528:
505:
481:
410:
657:
652:
606:
533:
357:
352:
33:
687:
611:
455:
241:
29:
8:
667:
538:
430:
338:
194:
103:
51:
On occasion, such a transformation may create opportunities to further optimize, such as
662:
626:
445:
435:
385:
543:
367:
296:
286:
677:
616:
486:
465:
425:
88:, array elements in the same row are stored consecutively in memory (a, a, a) ‒ in
25:
24:
is the process of exchanging the order of two iteration variables used by a nested
672:
90:
17:
77:
647:
621:
420:
415:
400:
274:
221:
206:
702:
98:
programs store array elements from the same column together (a, a, a), using
76:
directly from cache (which is faster than getting it from slow main memory).
63:
211:
271:
Optimizing
Compilers for Modern Architectures: A Dependence-Based Approach
390:
307:
32:
are accessed in the order in which they are present in memory, improving
470:
564:
72:
282:
300:
71:
The major purpose of loop interchange is to take advantage of the
95:
48:
for i from 0 to 10 for j from 0 to 20 a = i + j
42:
for j from 0 to 20 for i from 0 to 10 a = i + j
58:
700:
477:Induction variable recognition and elimination
323:
265:
249:Parallel Programming Guide for HP-UX Systems
67:Illustration of row- and column-major order
337:
330:
316:
185:may be degraded after loop interchange.
62:
511:Sparse conditional constant propagation
701:
273:(2011 digital print of 1st ed.).
311:
39:For example, in the code fragment:
13:
259:
45:loop interchange would result in:
14:
720:
461:Common subexpression elimination
602:Compile-time function execution
59:The utility of loop interchange
234:
1:
227:
581:Interprocedural optimization
7:
632:Profile-guided optimization
597:Bounds-checking elimination
200:
10:
725:
396:Loop-invariant code motion
279:Morgan Kaufmann Publishers
55:of the array assignments.
640:
589:
573:
552:
519:
495:
444:
376:Automatic parallelization
366:
345:
188:
109:
116:
381:Automatic vectorization
269:; Allen, Randy (2002).
217:Loop fission and fusion
53:automatic vectorization
709:Compiler optimizations
529:Instruction scheduling
506:Global value numbering
482:Live-variable analysis
411:Loop nest optimization
339:Compiler optimizations
86:C programming language
81:used by the compiler.
68:
658:Control-flow analysis
653:Array-access analysis
607:Dead-code elimination
565:Tail-call elimination
534:Instruction selection
358:Local value numbering
353:Peephole optimization
94:. On the other hand,
66:
34:locality of reference
688:Value range analysis
612:Expression templates
456:Available expression
104:Optimizing compilers
668:Dependence analysis
539:Register allocation
431:Software pipelining
195:dependence analysis
663:Data-flow analysis
627:Partial evaluation
436:Strength reduction
386:Induction variable
251:. HP. August 2003.
242:"Loop interchange"
69:
696:
695:
544:Rematerialization
292:978-1-55860-286-1
716:
678:Pointer analysis
617:Inline expansion
487:Use-define chain
466:Constant folding
426:Loop unswitching
406:Loop interchange
332:
325:
318:
309:
308:
304:
253:
252:
246:
238:
180:
177:
174:
171:
168:
165:
162:
159:
156:
153:
150:
147:
144:
141:
138:
135:
132:
129:
126:
123:
120:
22:loop interchange
724:
723:
719:
718:
717:
715:
714:
713:
699:
698:
697:
692:
673:Escape analysis
641:Static analysis
636:
585:
569:
548:
521:Code generation
515:
491:
447:
440:
362:
341:
336:
293:
262:
260:Further reading
257:
256:
244:
240:
239:
235:
230:
203:
191:
182:
181:
178:
175:
172:
169:
166:
163:
160:
157:
154:
151:
148:
145:
142:
139:
136:
133:
130:
127:
124:
121:
118:
112:
91:row-major order
61:
49:
43:
18:compiler theory
12:
11:
5:
722:
712:
711:
694:
693:
691:
690:
685:
683:Shape analysis
680:
675:
670:
665:
660:
655:
650:
648:Alias analysis
644:
642:
638:
637:
635:
634:
629:
624:
622:Jump threading
619:
614:
609:
604:
599:
593:
591:
587:
586:
584:
583:
577:
575:
571:
570:
568:
567:
562:
556:
554:
550:
549:
547:
546:
541:
536:
531:
525:
523:
517:
516:
514:
513:
508:
502:
500:
493:
492:
490:
489:
484:
479:
474:
468:
463:
458:
452:
450:
442:
441:
439:
438:
433:
428:
423:
421:Loop unrolling
418:
416:Loop splitting
413:
408:
403:
401:Loop inversion
398:
393:
388:
383:
378:
372:
370:
364:
363:
361:
360:
355:
349:
347:
343:
342:
335:
334:
327:
320:
312:
306:
305:
291:
275:Academic Press
261:
258:
255:
254:
232:
231:
229:
226:
225:
224:
222:Loop unrolling
219:
214:
209:
207:Loop splitting
202:
199:
190:
187:
117:
111:
108:
60:
57:
47:
41:
9:
6:
4:
3:
2:
721:
710:
707:
706:
704:
689:
686:
684:
681:
679:
676:
674:
671:
669:
666:
664:
661:
659:
656:
654:
651:
649:
646:
645:
643:
639:
633:
630:
628:
625:
623:
620:
618:
615:
613:
610:
608:
605:
603:
600:
598:
595:
594:
592:
588:
582:
579:
578:
576:
572:
566:
563:
561:
560:Deforestation
558:
557:
555:
551:
545:
542:
540:
537:
535:
532:
530:
527:
526:
524:
522:
518:
512:
509:
507:
504:
503:
501:
498:
494:
488:
485:
483:
480:
478:
475:
472:
469:
467:
464:
462:
459:
457:
454:
453:
451:
449:
443:
437:
434:
432:
429:
427:
424:
422:
419:
417:
414:
412:
409:
407:
404:
402:
399:
397:
394:
392:
389:
387:
384:
382:
379:
377:
374:
373:
371:
369:
365:
359:
356:
354:
351:
350:
348:
344:
340:
333:
328:
326:
321:
319:
314:
313:
310:
302:
298:
294:
288:
284:
280:
276:
272:
268:
264:
263:
250:
243:
237:
233:
223:
220:
218:
215:
213:
210:
208:
205:
204:
198:
197:is required.
196:
186:
115:
107:
105:
101:
97:
93:
92:
87:
82:
79:
74:
65:
56:
54:
46:
40:
37:
35:
31:
27:
23:
19:
405:
270:
267:Kennedy, Ken
248:
236:
212:Loop skewing
192:
183:
113:
100:column-major
99:
89:
83:
78:Cache misses
70:
50:
44:
38:
21:
15:
473:elimination
391:Loop fusion
346:Basic block
553:Functional
471:Dead store
301:2001092381
228:References
446:Data-flow
73:CPU cache
703:Category
448:analysis
283:Elsevier
201:See also
96:FORTRAN
574:Global
499:-based
299:
289:
189:Safety
179:end do
176:end do
110:Caveat
590:Other
245:(PDF)
134:10000
30:array
368:Loop
297:LCCN
287:ISBN
152:1000
26:loop
497:SSA
137:do
119:do
84:In
16:In
705::
295:.
285:.
281:/
277:/
247:.
36:.
20:,
331:e
324:t
317:v
303:.
173:c
170:*
167:b
164:+
161:a
158:=
155:a
149:,
146:1
143:=
140:j
131:,
128:1
125:=
122:i
Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.