Platt scaling

In machine learning, Platt scaling or Platt calibration is a way of transforming the outputs of a classification model into a probability distribution over classes. The method was invented by John Platt in the context of support vector machines, replacing an earlier method by Vapnik, but it can be applied to other classification models. Platt scaling works by fitting a logistic regression model to a classifier's scores.

[Figure: the standard logistic function, with L = 1, k = 1, x0 = 0.]

Description

Consider the problem of binary classification: for inputs x, we want to determine whether they belong to one of two classes, arbitrarily labeled +1 and −1. We assume that the classification problem will be solved by a real-valued function f, by predicting a class label y = sign(f(x)). For many problems, it is convenient to get a probability P(y = 1 | x), i.e. a classification that not only gives an answer, but also a degree of certainty about the answer. Some classification models do not provide such a probability, or give poor probability estimates.

Platt scaling is an algorithm to solve the aforementioned problem. It produces probability estimates

    P(y = 1 | x) = 1 / (1 + exp(A f(x) + B)),

i.e., a logistic transformation of the classifier scores f(x), where A and B are two scalar parameters that are learned by the algorithm. Note that predictions can now be made according to

    y = 1  iff  P(y = 1 | x) > 1/2;

if B ≠ 0, the probability estimates contain a correction compared to the old decision function y = sign(f(x)).
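The transformation above can be sketched directly; the function name `platt_prob` and the example values are illustrative, not from the article:

```python
import math

def platt_prob(f_x, A, B):
    """Platt-scaled probability P(y=1|x) = 1 / (1 + exp(A*f(x) + B))."""
    return 1.0 / (1.0 + math.exp(A * f_x + B))

# With A = -1 and B = 0 this is just the logistic function of the raw
# score, so the decision boundary stays at f(x) = 0.  A nonzero B shifts
# the boundary: the sign of A*f(x) + B, not of f(x), decides the label.
p = platt_prob(2.0, A=-1.0, B=0.0)
```

Note that A must come out negative for larger scores to map to larger probabilities, since the score enters the exponent with a plus sign.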
The parameters A and B are estimated using a maximum likelihood method that optimizes on the same training set as that for the original classifier f. To avoid overfitting to this set, a held-out calibration set or cross-validation can be used, but Platt additionally suggests transforming the labels y to target probabilities

    t+ = (N+ + 1) / (N+ + 2)    for positive samples (y = 1), and

    t− = 1 / (N− + 2)    for negative samples, y = −1.

Here, N+ and N− are the number of positive and negative samples, respectively. This transformation follows by applying Bayes' rule to a model of out-of-sample data that has a uniform prior over the labels. The constants 1 and 2, on the numerator and denominator respectively, are derived from the application of Laplace smoothing.

Platt himself suggested using the Levenberg–Marquardt algorithm to optimize the parameters, but a Newton algorithm was later proposed that should be more numerically stable.
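As an illustrative sketch, the smoothed targets and a maximum-likelihood fit of A and B might look as follows. This uses plain gradient descent for brevity rather than the Levenberg–Marquardt or Newton optimizers discussed above; `fit_platt` and the toy data are assumptions, not from the article:

```python
import math

def fit_platt(scores, labels, lr=0.01, steps=2000):
    """Fit A, B of P(y=1|x) = 1/(1+exp(A*f(x)+B)) by minimizing the
    cross-entropy against Platt's smoothed targets.  `labels` are in
    {+1, -1}.  Gradient descent is a simplification of Platt's method."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    t_pos = (n_pos + 1.0) / (n_pos + 2.0)   # target for y = +1
    t_neg = 1.0 / (n_neg + 2.0)             # target for y = -1
    targets = [t_pos if y == 1 else t_neg for y in labels]

    A, B = -1.0, 0.0   # A < 0: larger scores -> larger probabilities
    for _ in range(steps):
        gA = gB = 0.0
        for f, t in zip(scores, targets):
            p = 1.0 / (1.0 + math.exp(A * f + B))
            # gradient of cross-entropy w.r.t. z = A*f + B is (t - p)
            gA += (t - p) * f
            gB += (t - p)
        A -= lr * gA
        B -= lr * gB
    return A, B

scores = [2.0, 1.0, 0.5, -0.5, -1.0, -2.0]   # toy classifier scores
labels = [1, 1, 1, -1, -1, -1]
A, B = fit_platt(scores, labels)
```

On this symmetric toy data the fitted B stays near zero, so the calibrated probabilities agree with the original sign decision; the smoothed targets keep the fitted probabilities away from 0 and 1.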
Analysis

Platt scaling has been shown to be effective for SVMs as well as other types of classification models, including boosted models and even naive Bayes classifiers, which produce distorted probability distributions. It is particularly effective for max-margin methods such as SVMs and boosted trees, which show sigmoidal distortions in their predicted probabilities, but has less of an effect with well-calibrated models such as logistic regression, multilayer perceptrons, and random forests.

An alternative approach to probability calibration is to fit an isotonic regression model to an ill-calibrated probability model. This has been shown to work better than Platt scaling, in particular when enough training data is available.
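Isotonic regression is classically fitted with the pool-adjacent-violators (PAV) algorithm; a minimal sketch (the helper name `isotonic_fit` and the toy data are illustrative, not from the article) that fits a non-decreasing step function to score-sorted 0/1 labels:

```python
def isotonic_fit(scores, labels):
    """Pool-adjacent-violators: returns non-decreasing calibrated
    probabilities, one per sample, ordered by ascending score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    # each block holds [sum_of_labels, count]; block mean = calibrated prob
    merged = []
    for i in order:
        merged.append([float(labels[i]), 1])
        # merge while a block's mean drops below its predecessor's
        while (len(merged) > 1 and
               merged[-2][0] / merged[-2][1] > merged[-1][0] / merged[-1][1]):
            s, c = merged.pop()
            merged[-1][0] += s
            merged[-1][1] += c
    fitted = []
    for s, c in merged:
        fitted.extend([s / c] * c)
    return fitted

# a violation (label 0 after label 1) is pooled into a flat step
fitted = isotonic_fit([0.1, 0.2, 0.3, 0.4], [0, 1, 0, 1])
```

Unlike Platt scaling's two-parameter sigmoid, the step function can fit any monotone distortion, which is why isotonic regression tends to win once the calibration set is large enough.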
Platt scaling can also be applied to deep neural network classifiers. For image classification tasks such as CIFAR-100, small networks like LeNet-5 have good calibration but low accuracy, while large networks like ResNet have high accuracy but are overconfident in their predictions. A 2017 paper proposed temperature scaling, which simply multiplies the output logits of a network by a constant 1/T before taking the softmax. During training, T is set to 1; after training, T is optimized on a held-out calibration set to minimize the calibration loss.
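A minimal sketch of the temperature-scaled softmax (the function name and example logits are illustrative; in practice T is found by minimizing the calibration loss on the held-out set, as described above):

```python
import math

def softmax(logits, T=1.0):
    """Softmax with temperature: logits are multiplied by 1/T before
    exponentiation.  T > 1 softens an overconfident distribution;
    T = 1 recovers the ordinary softmax used during training."""
    scaled = [z / T for z in logits]
    m = max(scaled)                          # subtract max for stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [4.0, 1.0, 0.0]
sharp = softmax(logits, T=1.0)   # raw, overconfident network output
soft = softmax(logits, T=2.0)    # tempered, less peaked
```

Because every logit is divided by the same T, the argmax (and hence the predicted class and accuracy) is unchanged; only the confidence of the distribution moves.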
Notes

    See sign function. The label for f(x) = 0 is arbitrarily chosen to be either zero, or one.

See also

    Relevance vector machine: probabilistic alternative to the support vector machine

References

    Platt, John (1999). "Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods". Advances in Large Margin Classifiers. 10 (3): 61–74.
    Niculescu-Mizil, Alexandru; Caruana, Rich (2005). "Predicting good probabilities with supervised learning" (PDF). ICML. doi:10.1145/1102351.1102430.
    Lin, Hsuan-Tien; Lin, Chih-Jen; Weng, Ruby C. (2007). "A note on Platt's probabilistic outputs for support vector machines" (PDF). Machine Learning. 68 (3): 267–276. doi:10.1007/s10994-007-5018-6.
    Chapelle, Olivier; Vapnik, Vladimir; Bousquet, Olivier; Mukherjee, Sayan (2002). "Choosing multiple parameters for support vector machines" (PDF). Machine Learning. 46: 131–159. doi:10.1023/a:1012450327387.
    Guo, Chuan; Pleiss, Geoff; Sun, Yu; Weinberger, Kilian Q. (2017). "On Calibration of Modern Neural Networks". Proceedings of the 34th International Conference on Machine Learning. PMLR: 1321–1330.

Categories: Statistical classification; Probabilistic models