Platt scaling

In machine learning, Platt scaling or Platt calibration is a way of transforming the outputs of a classification model into a probability distribution over classes. The method was invented by John Platt in the context of support vector machines, replacing an earlier method by Vapnik, but it can be applied to other classification models. Platt scaling works by fitting a logistic regression model to a classifier's scores.

[Figure: the standard logistic function, with L = 1, k = 1, x0 = 0.]

Description

Consider the problem of binary classification: for inputs x, we want to determine whether they belong to one of two classes, arbitrarily labeled +1 and −1. We assume that the classification problem will be solved by a real-valued function f, by predicting a class label y = sign(f(x)). For many problems, it is convenient to get a probability P(y = 1 | x), i.e. a classification that not only gives an answer, but also a degree of certainty about the answer. Some classification models do not provide such a probability, or give poor probability estimates.

Platt scaling is an algorithm to solve the aforementioned problem. It produces probability estimates

    P(y = 1 | x) = 1 / (1 + exp(A f(x) + B)),

i.e., a logistic transformation of the classifier scores f(x), where A and B are two scalar parameters that are learned by the algorithm. Note that predictions can now be made according to

    y = 1  iff  P(y = 1 | x) > 1/2;

if B ≠ 0, the probability estimates contain a correction compared to the old decision function y = sign(f(x)).
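The transformation above can be sketched directly; the function name `platt_prob` and the example values are illustrative, not from the article:

```python
import math

def platt_prob(f_x, A, B):
    """Platt-scaled probability P(y=1|x) = 1 / (1 + exp(A*f(x) + B))."""
    return 1.0 / (1.0 + math.exp(A * f_x + B))

# With A = -1 and B = 0 this is just the logistic function of the raw
# score, so the decision boundary stays at f(x) = 0.  A nonzero B shifts
# the boundary: the sign of A*f(x) + B, not of f(x), decides the label.
p = platt_prob(2.0, A=-1.0, B=0.0)
```

Note that A must come out negative for larger scores to map to larger probabilities, since the score enters the exponent with a plus sign.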
The parameters A and B are estimated using a maximum likelihood method that optimizes on the same training set as that for the original classifier f. To avoid overfitting to this set, a held-out calibration set or cross-validation can be used, but Platt additionally suggests transforming the labels y to target probabilities

    t+ = (N+ + 1) / (N+ + 2)    for positive samples (y = 1), and

    t− = 1 / (N− + 2)    for negative samples, y = −1.

Here, N+ and N− are the number of positive and negative samples, respectively. This transformation follows by applying Bayes' rule to a model of out-of-sample data that has a uniform prior over the labels. The constants 1 and 2, on the numerator and denominator respectively, are derived from the application of Laplace smoothing.

Platt himself suggested using the Levenberg–Marquardt algorithm to optimize the parameters, but a Newton algorithm was later proposed that should be more numerically stable.
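As an illustrative sketch, the smoothed targets and a maximum-likelihood fit of A and B might look as follows. This uses plain gradient descent for brevity rather than the Levenberg–Marquardt or Newton optimizers discussed above; `fit_platt` and the toy data are assumptions, not from the article:

```python
import math

def fit_platt(scores, labels, lr=0.01, steps=2000):
    """Fit A, B of P(y=1|x) = 1/(1+exp(A*f(x)+B)) by minimizing the
    cross-entropy against Platt's smoothed targets.  `labels` are in
    {+1, -1}.  Gradient descent is a simplification of Platt's method."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    t_pos = (n_pos + 1.0) / (n_pos + 2.0)   # target for y = +1
    t_neg = 1.0 / (n_neg + 2.0)             # target for y = -1
    targets = [t_pos if y == 1 else t_neg for y in labels]

    A, B = -1.0, 0.0   # A < 0: larger scores -> larger probabilities
    for _ in range(steps):
        gA = gB = 0.0
        for f, t in zip(scores, targets):
            p = 1.0 / (1.0 + math.exp(A * f + B))
            # gradient of cross-entropy w.r.t. z = A*f + B is (t - p)
            gA += (t - p) * f
            gB += (t - p)
        A -= lr * gA
        B -= lr * gB
    return A, B

scores = [2.0, 1.0, 0.5, -0.5, -1.0, -2.0]   # toy classifier scores
labels = [1, 1, 1, -1, -1, -1]
A, B = fit_platt(scores, labels)
```

On this symmetric toy data the fitted B stays near zero, so the calibrated probabilities agree with the original sign decision; the smoothed targets keep the fitted probabilities away from 0 and 1.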
Analysis

Platt scaling has been shown to be effective for SVMs as well as other types of classification models, including boosted models and even naive Bayes classifiers, which produce distorted probability distributions. It is particularly effective for max-margin methods such as SVMs and boosted trees, which show sigmoidal distortions in their predicted probabilities, but has less of an effect with well-calibrated models such as logistic regression, multilayer perceptrons, and random forests.

An alternative approach to probability calibration is to fit an isotonic regression model to an ill-calibrated probability model. This has been shown to work better than Platt scaling, in particular when enough training data is available.
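Isotonic regression is classically fitted with the pool-adjacent-violators (PAV) algorithm; a minimal sketch (the helper name `isotonic_fit` and the toy data are illustrative, not from the article) that fits a non-decreasing step function to score-sorted 0/1 labels:

```python
def isotonic_fit(scores, labels):
    """Pool-adjacent-violators: returns non-decreasing calibrated
    probabilities, one per sample, ordered by ascending score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    # each block holds [sum_of_labels, count]; block mean = calibrated prob
    merged = []
    for i in order:
        merged.append([float(labels[i]), 1])
        # merge while a block's mean drops below its predecessor's
        while (len(merged) > 1 and
               merged[-2][0] / merged[-2][1] > merged[-1][0] / merged[-1][1]):
            s, c = merged.pop()
            merged[-1][0] += s
            merged[-1][1] += c
    fitted = []
    for s, c in merged:
        fitted.extend([s / c] * c)
    return fitted

# a violation (label 0 after label 1) is pooled into a flat step
fitted = isotonic_fit([0.1, 0.2, 0.3, 0.4], [0, 1, 0, 1])
```

Unlike Platt scaling's two-parameter sigmoid, the step function can fit any monotone distortion, which is why isotonic regression tends to win once the calibration set is large enough.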
Platt scaling can also be applied to deep neural network classifiers. For image classification tasks such as CIFAR-100, small networks like LeNet-5 have good calibration but low accuracy, while large networks like ResNet have high accuracy but are overconfident in their predictions. A 2017 paper proposed temperature scaling, which simply multiplies the output logits of a network by a constant 1/T before taking the softmax. During training, T is set to 1; after training, T is optimized on a held-out calibration set to minimize the calibration loss.
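A minimal sketch of the temperature-scaled softmax (the function name and example logits are illustrative; in practice T is found by minimizing the calibration loss on the held-out set, as described above):

```python
import math

def softmax(logits, T=1.0):
    """Softmax with temperature: logits are multiplied by 1/T before
    exponentiation.  T > 1 softens an overconfident distribution;
    T = 1 recovers the ordinary softmax used during training."""
    scaled = [z / T for z in logits]
    m = max(scaled)                          # subtract max for stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [4.0, 1.0, 0.0]
sharp = softmax(logits, T=1.0)   # raw, overconfident network output
soft = softmax(logits, T=2.0)    # tempered, less peaked
```

Because every logit is divided by the same T, the argmax (and hence the predicted class and accuracy) is unchanged; only the confidence of the distribution moves.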
Notes

    See sign function. The label for f(x) = 0 is arbitrarily chosen to be either zero, or one.

See also

    Relevance vector machine: probabilistic alternative to the support vector machine

References

    Platt, John (1999). "Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods". Advances in Large Margin Classifiers. 10 (3): 61–74.
    Niculescu-Mizil, Alexandru; Caruana, Rich (2005). "Predicting good probabilities with supervised learning" (PDF). ICML. doi:10.1145/1102351.1102430.
    Lin, Hsuan-Tien; Lin, Chih-Jen; Weng, Ruby C. (2007). "A note on Platt's probabilistic outputs for support vector machines" (PDF). Machine Learning. 68 (3): 267–276. doi:10.1007/s10994-007-5018-6.
    Chapelle, Olivier; Vapnik, Vladimir; Bousquet, Olivier; Mukherjee, Sayan (2002). "Choosing multiple parameters for support vector machines" (PDF). Machine Learning. 46: 131–159. doi:10.1023/a:1012450327387.
    Guo, Chuan; Pleiss, Geoff; Sun, Yu; Weinberger, Kilian Q. (2017). "On Calibration of Modern Neural Networks". Proceedings of the 34th International Conference on Machine Learning. PMLR: 1321–1330.

Categories: Statistical classification; Probabilistic models