Hardware performance counter

In computers, hardware performance counters (HPC), or hardware counters, are a set of special-purpose registers built into modern microprocessors to store the counts of hardware-related activities within computer systems. Advanced users often rely on those counters to conduct low-level performance analysis or tuning.

Implementations

The number of available hardware counters in a processor is limited, while each CPU model might have a lot of different events that a developer might like to measure. Each counter can be programmed with the index of an event type to be monitored, like an L1 cache miss or a branch misprediction.

One of the first processors to implement such counters, with an associated instruction RDPMC to access them, was the Intel Pentium, but they were not documented until Terje Mathisen wrote an article about reverse engineering them in Byte, July 1994.

The following table shows some examples of CPUs and the number of available hardware counters:

Processor               Available HW counters
UltraSparc II           2
Pentium III             2
ARM11                   2
AMD Athlon              4
IA-64                   4
ARM Cortex-A5           2
ARM Cortex-A8           4
ARM Cortex-A9 MPCore    6
POWER4                  8
Pentium 4               18

Versus software techniques

Compared to software profilers, hardware counters provide low-overhead access to a wealth of detailed performance information related to the CPU's functional units, caches, and main memory. Another benefit of using them is that no source code modifications are needed in general. However, the types and meanings of hardware counters vary from one kind of architecture to another due to the variation in hardware organizations.

There can be difficulties correlating the low-level performance metrics back to source code. The limited number of registers to store the counters often forces users to conduct multiple measurements to collect all desired performance metrics.

Instruction based sampling

[Figure: Output of an IBS profile from CodeAnalyst]

Modern superscalar processors schedule and execute multiple instructions out-of-order at one time. These "in-flight" instructions can retire at any time, depending on memory access, hits in cache, stalls in the pipeline and many other factors. This can cause performance counter events to be attributed to the wrong instructions, making precise performance analysis difficult or impossible.

AMD introduced methods to mitigate some of these drawbacks. For example, the Opteron processors implemented in 2007 a technique known as Instruction Based Sampling (or IBS). AMD's implementation of IBS provides hardware counters for both fetch sampling (the front of the superscalar pipeline) and op sampling (the back of the pipeline). This results in discrete performance data associating retired instructions with the "parent" AMD64 instruction.

See also

perf (Linux)
Row hammer

References

Malone, Corey; Zahran, Mohamed; Karri, Ramesh (2011). "Are hardware performance counters a cost effective way for integrity checking of programs" (PDF). Proceedings of the sixth ACM workshop on Scalable trusted computing. pp. 71–76. doi:10.1145/2046582.2046596. ISBN 9781450310017. S2CID 16409864. Retrieved 17 November 2022.
"Pentium Secrets". Gamedev.net. Retrieved 2012-02-14.
"Documentation – Arm Developer". developer.arm.com.
"Instruction-Based Sampling: A New Performance Analysis Technique for AMD Family 10h Processors" (PDF). AMD. Retrieved 2015-10-16.
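
As a rough illustration of why a limited number of counter registers forces multiple measurement runs, the following Python sketch splits a list of desired events into groups that fit the available counters. The counter counts come from the table in this article; the event names and the `plan_runs` helper are hypothetical and exist only for this example. It also assumes that any event can be placed on any counter, which real processors do not always allow.

```python
from math import ceil

# Counter counts taken from the table in this article.
AVAILABLE_COUNTERS = {
    "Pentium III": 2,
    "AMD Athlon": 4,
    "POWER4": 8,
}


def plan_runs(events, num_counters):
    """Split events into consecutive groups of at most num_counters.

    Simplifying assumption: any event may be programmed into any counter.
    """
    if num_counters < 1:
        raise ValueError("num_counters must be at least 1")
    return [events[i:i + num_counters]
            for i in range(0, len(events), num_counters)]


# Hypothetical event names, for illustration only.
events = ["l1d_miss", "branch_mispredict", "instructions", "cycles", "tlb_miss"]

for cpu, n in AVAILABLE_COUNTERS.items():
    runs = plan_runs(events, n)
    # The number of runs is the event count divided by the counter count,
    # rounded up.
    assert len(runs) == ceil(len(events) / n)
    print(f"{cpu}: {len(runs)} run(s) -> {runs}")
```

With five events, a processor with two counters would need three separate runs under these assumptions, while one with eight counters could collect everything in a single run.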

Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.