
Cache control instruction


In computing, a cache control instruction is a hint embedded in the instruction stream of a processor, intended to improve the performance of hardware caches using foreknowledge of the memory access pattern supplied by the programmer or compiler. Such hints may reduce cache pollution, reduce bandwidth requirements, and bypass latencies by providing better control over the working set. Most cache control instructions do not affect the semantics of a program, although some can.

Examples

Several such instructions, with variants, are supported by several processor instruction set architectures, such as ARM, MIPS, PowerPC, and x86.

Prefetch

Also termed data cache block touch, the effect is to request loading the cache line associated with a given address. This is performed by the PREFETCH instruction in the x86 instruction set. Some variants bypass higher levels of the cache hierarchy, which is useful in a 'streaming' context for data that is traversed once rather than held in the working set. The prefetch should occur sufficiently far ahead in time to mitigate the latency of memory access, for example in a loop traversing memory linearly. The GNU Compiler Collection intrinsic function __builtin_prefetch can be used to invoke this in the programming languages C and C++.
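A sketch of how the GCC intrinsic is typically used in such a linear loop (the prefetch distance of 16 elements and the locality hint are illustrative guesses, not tuned values):

```c
#include <stddef.h>

/* Sum an array, prefetching ahead of the element currently being read.
 * __builtin_prefetch(addr, rw, locality): rw 0 = read, 1 = write;
 * locality 0..3, where 3 asks to keep the line in all cache levels.
 * The distance of 16 elements is an illustrative guess; real code
 * tunes it per processor and access pattern. */
double sum_with_prefetch(const double *a, size_t n)
{
    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (i + 16 < n)
            __builtin_prefetch(&a[i + 16], 0, 3);
        total += a[i];
    }
    return total;
}
```

The hint is advisory: deleting the __builtin_prefetch call can change performance, never the result.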
Instruction prefetch

A variant of prefetch for the instruction cache.

Data cache block allocate zero

This hint is used to prepare cache lines before overwriting the contents completely. In this case, the CPU needn't load anything from main memory. The semantic effect is equivalent to an aligned memset of a cache-line-sized block to zero, but the operation is effectively free.
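That equivalence can be modelled in portable C. This is only a sketch: the 128-byte line size is an assumption (PowerPC implementations vary, one reason such hints are tied to a particular processor generation), and the real instruction (dcbz on PowerPC) establishes the line in cache already zeroed, with no memory read, which memset cannot express:

```c
#include <stdint.h>
#include <string.h>

/* Assumed cache-line size; implementation-specific in practice. */
#define LINE_SIZE 128u

/* Software model of 'data cache block allocate zero': zero the
 * aligned cache-line-sized block containing p.  The hardware version
 * performs no load from main memory. */
void allocate_zero(void *p)
{
    uintptr_t base = (uintptr_t)p & ~(uintptr_t)(LINE_SIZE - 1);
    memset((void *)base, 0, LINE_SIZE);
}
```

Note that the whole aligned block is zeroed, including bytes before p, which is why the hint is only safe when the contents are about to be overwritten completely.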
Data cache block invalidate

This hint is used to discard cache lines without committing their contents to main memory. Care is needed, since incorrect results are possible: unlike other cache hints, the semantics of the program are significantly modified. It is used in conjunction with allocate zero for managing temporary data. This saves unneeded main memory bandwidth and cache pollution.

Data cache block flush

This hint requests the immediate eviction of a cache line, making way for future allocations. It is used when it is known that data is no longer part of the working set.
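The closest x86 analogue of the flush hint is the CLFLUSH instruction, reachable from C via the _mm_clflush intrinsic. A minimal sketch (the 64-byte line size, a line-aligned buffer, and the x86-only guard are assumptions):

```c
#include <stddef.h>
#if defined(__x86_64__) || defined(__i386__)
#include <emmintrin.h>   /* _mm_clflush, SSE2 */
#endif

/* Flush the cache lines backing buf[0..len) to memory and evict them.
 * Assumes buf is 64-byte aligned; on non-x86 builds this sketch is a
 * no-op, since the hint is architecture-specific. */
void flush_buffer(const void *buf, size_t len)
{
#if defined(__x86_64__) || defined(__i386__)
    const char *p = (const char *)buf;
    for (size_t off = 0; off < len; off += 64)
        _mm_clflush(p + off);
#else
    (void)buf;
    (void)len;
#endif
}
```

Unlike invalidate, flushing writes dirty data back first, so program results are unchanged; only future accesses pay a miss.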
Other hints

Some processors support a variant of load–store instructions that also imply cache hints. An example is load last in the PowerPC instruction set, which suggests that data will only be used once, i.e. the cache line in question may be pushed to the head of the eviction queue, whilst keeping it in use if still directly needed.

Alternatives

Automatic prefetch

In recent times, cache control instructions have become less popular as increasingly advanced application processor designs from Intel and ARM devote more transistors to accelerating code written in traditional languages, e.g. performing automatic prefetch, with hardware to detect linear access patterns on the fly. However, the techniques may remain valid for throughput-oriented processors, which have a different throughput-versus-latency tradeoff and may prefer to devote more area to execution units.
Scratchpad memory

Some processors support scratchpad memory, into which temporaries may be put, and direct memory access (DMA) to transfer data to and from main memory when needed. This approach is used by the Cell processor and some embedded systems. These allow greater control over memory traffic and locality (as the working set is managed by explicit transfers), and eliminate the need for expensive cache coherency in a manycore machine. The disadvantage is that significantly different programming techniques are required to use it. It is very hard to adapt programs written in traditional languages such as C and C++, which present the programmer with a uniform view of a large address space (an illusion simulated by caches). A traditional microprocessor can more easily run legacy code, which may then be accelerated by cache control instructions, whilst a scratchpad-based machine requires dedicated coding from the ground up to even function. Cache control instructions are specific to a certain cache line size, which in practice may vary between generations of processors in the same architectural family. Caches may also help coalesce reads and writes from less predictable access patterns (e.g. during texture mapping), whilst scratchpad DMA requires reworking algorithms for more predictable 'linear' traversals. As such, scratchpads are generally harder to use with traditional programming models, although dataflow models (such as TensorFlow) might be more suitable.
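The scratchpad style can be caricatured in plain C. In this sketch, memcpy stands in for the DMA engine and a small array for the on-chip memory; the tile size and names are illustrative, and a real Cell program would issue asynchronous DMA and double-buffer it so transfers overlap compute:

```c
#include <stddef.h>
#include <string.h>

#define TILE 64  /* elements per scratchpad tile; size is an assumption */

/* Scratchpad style: stage data through a small local buffer with
 * explicit transfers, so the working set is managed by software
 * rather than by a cache. */
double sum_tiled(const double *src, size_t n)
{
    double scratch[TILE];   /* stands in for on-chip scratchpad memory */
    double total = 0.0;
    for (size_t base = 0; base < n; base += TILE) {
        size_t chunk = (n - base < TILE) ? (n - base) : TILE;
        memcpy(scratch, src + base, chunk * sizeof *scratch); /* "DMA in" */
        for (size_t i = 0; i < chunk; i++)
            total += scratch[i];
    }
    return total;
}
```

The point of the structure is that every byte entering `scratch` does so by an explicit, predictable transfer — the property that lets real scratchpad hardware dispense with tags and coherency.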
Vector fetch

Vector processors (for example modern graphics processing units (GPUs) and Xeon Phi) use massive parallelism to achieve high throughput whilst working around memory latency (reducing the need for prefetching). Many read operations are issued in parallel, for subsequent invocations of a compute kernel; calculations may be put on hold awaiting future data, whilst the execution units are devoted to working on data from past requests that has already turned up. This is easier for programmers to leverage in conjunction with the appropriate programming models (compute kernels), but harder to apply to general-purpose programming. The disadvantage is that many copies of temporary state may be held in the local memory of a processing element, awaiting data in flight.

References

1. "Power PC manual, see 1.10.3 Cache Control Instructions" (PDF). Archived from the original on 2016-10-13. Retrieved 2016-06-11.
