In computers, hardware performance counters (HPC), or hardware counters, are a set of special-purpose registers built into modern microprocessors to store the counts of hardware-related activities within computer systems. Advanced users often rely on those counters to conduct low-level performance analysis or tuning.

Implementations

The number of available hardware counters in a processor is limited, while each CPU model might have many different events that a developer might like to measure. Each counter can be programmed with the index of an event type to be monitored, such as an L1 cache miss or a branch misprediction.
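Below is a minimal sketch of this programming model, written as a toy PMU in Python. Each counter holds an event-select value and increments only when a matching event occurs. The class `ToyPMU`, its methods and the event codes are hypothetical names used only for illustration. Real event encodings are model-specific and are documented by each CPU vendor.

```python
# Hypothetical event codes for illustration only; real encodings are
# model-specific and listed in each vendor's manual.
L1_MISS = 0x01
BRANCH_MISPREDICT = 0x02
TLB_MISS = 0x03


class ToyPMU:
    """Toy model of a PMU with a fixed number of programmable counters."""

    def __init__(self, num_counters: int):
        # event_select[i] is the event index counter i monitors (None = unused)
        self.event_select = [None] * num_counters
        self.counts = [0] * num_counters

    def program(self, counter: int, event: int) -> None:
        """Select which event type a counter monitors and reset its count."""
        self.event_select[counter] = event
        self.counts[counter] = 0

    def observe(self, event: int) -> None:
        """Called once per hardware event; bumps every counter selecting it."""
        for i, selected in enumerate(self.event_select):
            if selected == event:
                self.counts[i] += 1


pmu = ToyPMU(num_counters=2)
pmu.program(0, L1_MISS)
pmu.program(1, BRANCH_MISPREDICT)
for ev in [L1_MISS, TLB_MISS, BRANCH_MISPREDICT, L1_MISS]:
    pmu.observe(ev)
print(pmu.counts)  # [2, 1]; the TLB miss is lost because no counter selects it
```

With only two counters, measuring the TLB miss as well would mean reprogramming a counter and running again.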
One of the first processors to implement such counters, along with an associated instruction, RDPMC, to access them, was the Intel Pentium. However, the counters were not documented until Terje Mathisen wrote an article about reverse engineering them in Byte in July 1994.
The following table shows some examples of CPUs and the number of available hardware counters:

Processor               Available HW counters
UltraSparc II           2
Pentium III             2
ARM11                   2
AMD Athlon              4
IA-64                   4
ARM Cortex-A5           2
ARM Cortex-A8           4
ARM Cortex-A9 MPCore    6
POWER4                  8
Pentium 4               18
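Each counter tracks one event at a time, so the counts in the table give a rough lower bound on how many measurement runs are needed to collect a given set of events. The sketch below assumes that every counter can count every event. Real PMUs often do not guarantee this, because some events are restricted to particular counters, so actual schedules can need more runs. `min_passes` is a hypothetical helper.

```python
# Counter counts from the table above (Processor -> available HW counters).
AVAILABLE_COUNTERS = {
    "UltraSparc II": 2,
    "Pentium III": 2,
    "ARM11": 2,
    "AMD Athlon": 4,
    "IA-64": 4,
    "ARM Cortex-A5": 2,
    "ARM Cortex-A8": 4,
    "ARM Cortex-A9 MPCore": 6,
    "POWER4": 8,
    "Pentium 4": 18,
}


def min_passes(num_events: int, num_counters: int) -> int:
    """Lower bound on runs needed to count `num_events` distinct events
    when at most `num_counters` can be counted at once.

    Assumes any counter can count any event (often untrue on real PMUs).
    """
    if num_counters <= 0:
        raise ValueError("num_counters must be positive")
    # Integer ceiling division: ceil(num_events / num_counters)
    return -(-num_events // num_counters)


for cpu, counters in AVAILABLE_COUNTERS.items():
    print(f"{cpu:<22} {min_passes(10, counters)} run(s) for 10 events")
```

Under this assumption, ten events need five runs on a two-counter CPU such as the Pentium III, but only one run on the Pentium 4.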
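Versus software techniques

Compared to software profilers, hardware counters provide low-overhead access to a wealth of detailed performance information related to the CPU's functional units, caches, main memory, and so on. Another benefit is that, in general, no source code modifications are needed. However, the types and meanings of hardware counters vary from one architecture to another because hardware organizations differ.

It can be difficult to correlate the low-level performance metrics back to source code. The limited number of registers available to hold counters often forces users to conduct multiple measurements to collect all the desired performance metrics.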
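A common way to get from counter data back to source code is sampling. The counter is set to overflow after a fixed number of events, the overflow interrupt records the current instruction address, and a profiler maps each address to a source line using a line table derived from debug information. The sketch below covers only that last mapping step. The sorted line table and the sample addresses are hypothetical.

```python
import bisect
from collections import Counter

# Hypothetical line table: (start_address, source_location), sorted by address.
LINE_TABLE = [
    (0x1000, "main.c:10"),
    (0x1010, "main.c:11"),
    (0x1024, "main.c:12"),
    (0x1040, "main.c:15"),
]
STARTS = [start for start, _ in LINE_TABLE]


def address_to_line(addr: int) -> str:
    """Map an instruction address to the line whose range contains it."""
    i = bisect.bisect_right(STARTS, addr) - 1
    if i < 0:
        return "<unknown>"  # address precedes every known line
    return LINE_TABLE[i][1]


def attribute(samples):
    """Build a per-line histogram from sampled instruction addresses."""
    return Counter(address_to_line(a) for a in samples)


# Hypothetical instruction pointers recorded on counter-overflow interrupts.
samples = [0x1012, 0x1014, 0x1030, 0x1012, 0x0FF0]
print(attribute(samples).most_common())
```

Real mappings are messier. Inlining, instruction scheduling and out-of-order execution all blur the relationship between addresses and source lines, which is part of why this correlation is difficult.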
Instruction based sampling

[Figure: Output of an IBS profile from CodeAnalyst]

Modern superscalar processors schedule and execute multiple instructions out-of-order at one time. These "in-flight" instructions can retire at any time, depending on memory accesses, cache hits, pipeline stalls and many other factors. This can cause performance counter events to be attributed to the wrong instructions, making precise performance analysis difficult or impossible.
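The toy simulation below shows this attribution problem. It assumes a fixed "skid", meaning the number of instructions that retire between the event and the moment the sampled address is captured. Real skid varies from sample to sample, so the constant here is a simplifying assumption. The loop body and instruction names are hypothetical.

```python
from collections import Counter

# Hypothetical loop body; in this toy model only "load" triggers the event
# (say, a cache miss), and it does so every time it executes.
LOOP_BODY = ["load", "add", "cmp", "jne"]


def blamed_instructions(iterations: int, period: int, skid: int) -> Counter:
    """Count how often each instruction is blamed for sampled events.

    The retirement stream is the loop body repeated `iterations` times.
    Every `period`-th event raises an overflow interrupt, and the sampled
    instruction is assumed to be `skid` positions later in the stream.
    """
    trace = LOOP_BODY * iterations
    blamed = Counter()
    events = 0
    for pos, insn in enumerate(trace):
        if insn != "load":
            continue
        events += 1
        if events % period == 0:
            sampled = min(pos + skid, len(trace) - 1)  # clamp at end of trace
            blamed[trace[sampled]] += 1
    return blamed


print(blamed_instructions(iterations=1000, period=10, skid=0))  # all on "load"
print(blamed_instructions(iterations=1000, period=10, skid=2))  # all on "cmp"
```

With a skid of two, every sample blames `cmp`, even though `cmp` never caused the event.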
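AMD introduced methods to mitigate some of these drawbacks. For example, Opteron processors implemented a technique known as Instruction Based Sampling (IBS) in 2007. AMD's implementation of IBS provides hardware counters for both fetch sampling (the front of the superscalar pipeline) and op sampling (the back of the pipeline). This results in discrete performance data associating retired instructions with the "parent" AMD64 instruction.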
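An IBS-style sampler works differently. It tags a specific op and records data belonging to that op, so the sample cannot drift onto a neighboring instruction. The sketch below is a toy model of that idea only, not AMD's hardware interface. `OpRecord`, its fields and the sampling period are assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class OpRecord:
    """Data captured for one tagged op (hypothetical fields)."""
    address: int       # address of the parent instruction
    dcache_miss: bool  # whether this op missed in the data cache


def ibs_like_sample(ops, period):
    """Tag every `period`-th op and record that op's own data.

    `ops` is a list of (address, dcache_miss) pairs in execution order.
    Because each record is built from the tagged op itself, the sampled
    address is always the one that produced the event (no skid).
    """
    samples = []
    for n, (address, miss) in enumerate(ops, start=1):
        if n % period == 0:
            samples.append(OpRecord(address, miss))
    return samples


def misses_by_address(samples):
    """Histogram of sampled data-cache misses per instruction address."""
    return Counter(s.address for s in samples if s.dcache_miss)


# Hypothetical four-instruction loop in which only the load at 0x400 misses.
ops = [(0x400, True), (0x404, False), (0x408, False), (0x40C, False)] * 1000
samples = ibs_like_sample(ops, period=7)
print(misses_by_address(samples))
```

The period of 7 is chosen to be coprime with the four-instruction loop, so every instruction in the loop is eventually tagged. All sampled misses land on `0x400`, the instruction that actually caused them.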
See also

perf (Linux)
Row hammer

References

1. Malone, Corey; Zahran, Mohamed; Karri, Ramesh (2011). "Are hardware performance counters a cost effective way for integrity checking of programs" (PDF). Proceedings of the sixth ACM workshop on Scalable trusted computing. pp. 71–76. doi:10.1145/2046582.2046596. ISBN 9781450310017. S2CID 16409864. Retrieved 2022-11-17.
2. "Pentium Secrets". Gamedev.net. Retrieved 2012-02-14.
3. "Documentation – Arm Developer". developer.arm.com.
4. "Instruction-Based Sampling: A New Performance Analysis Technique for AMD Family 10h Processors" (PDF). AMD. Retrieved 2015-10-16.
Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.