Knowledge

Optimal matching

Source 📝

1895:
Although optimal matching techniques are widely used in sociology and demography, such techniques also have their flaws. As was pointed out by several authors (for example L. L. Wu), the main problem in the application of optimal matching is to appropriately define the costs
854: 454: 41:, to assess the dissimilarity of ordered arrays of tokens that usually represent a time-ordered sequence of socio-economic states two individuals have experienced. Once such distances have been calculated for a set of observations (e.g. individuals in a 609: 1597: 1257: 249: 919: 144: 1874: 1369: 1095: 674: 278: 1756: 1297: 1701: 195: 1796: 669: 1930: 461: 1652: 1625: 1443: 1416: 1169: 1142: 1031: 1004: 973: 946: 639: 171: 1449: 1389: 1115: 273: 49:) can be used. The method was tailored to social sciences from a technique originally introduced to study molecular biology (protein or genetic) sequences (see 1883:
Considering a set composed of only the three basic operations described above, this proximity measure satisfies the triangular inequality.
1174: 1371:, that represents the total cost of the transformation. One should consider at this point that there might exist different such sequences 207: 2003: 34: 866: 68: 1805: 1302: 1036: 2043: 849:{\displaystyle a_{s_{1},s'_{1}}^{\rm {Sub}}(s_{1},s_{2},s_{3},\ldots s_{T})=(s'_{1},s_{2},s_{3},\ldots s_{T})} 449:{\displaystyle a_{s'}^{\rm {Ins}}(s_{1},s_{2},s_{3},\ldots s_{T})=(s_{1},s_{2},s_{3},\ldots ,s',\ldots s_{T})} 251:. In the most simple approach, a set composed of only three basic operations to transform sequences is used: 2038: 20: 1960:-package for analyzing and visualizing states and events sequences, including optimal matching analysis. 54: 1944:
is a powerful program, offering access to some of the latest developments in transition data analysis.
1706: 1262: 1657: 176: 2001:
Some Comments on "Sequence Analysis and Optimal Matching Methods in Sociology: Review and Prospect"
1957: 1761: 604:{\displaystyle a_{s_{2}}^{\rm {Del}}(s_{1},s_{2},s_{3},\ldots s_{T})=(s_{1},s_{3},\ldots s_{T})} 1899: 1947: 24: 1592:{\displaystyle d(S_{1},S_{2})=\min _{A}\left\{c(A)~{\rm {such~that}}~S_{2}=A(S_{1})\right\}} 644: 2033: 1630: 1603: 1421: 1394: 1147: 1120: 1097:
be a sequence of operators such that the application of all the operators of this sequence
1009: 982: 951: 924: 617: 149: 8: 1884: 1445:; a reasonable choice is to select the cheapest of such sequences. We thus call distance 42: 1374: 1100: 258: 50: 1978: 2012: 1983: 46: 2000: 1703:
is by definition nonnegative since it is the sum of positive costs, and trivially
2007: 2016: 1987: 1979:
Sequence Analysis and Optimal Matching Methods in Sociology: Review and Prospect
38: 1941: 2027: 1600:
that is, the cost of the least expensive set of transformations that turn
1887:
however, depends on the definition of the set of elementary operations.
197:
the sequence space, i.e. the set of all possible sequences of states.
1799: 1252:{\displaystyle S_{2}=a_{1}\circ a_{2}\circ \ldots \circ a_{n}(S_{1})} 201: 1880:
cost usually refers to the common cost of insertion and deletion.
1299:
denotes the compound operator. To this set we associate the cost
244:{\displaystyle a_{i}:{\mathbf {S} }\rightarrow {\mathbf {S} }} 200:
Optimal matching algorithms work by defining simple operator
1953: 173:
belonging to a finite set of possible states. Let us denote
1950:
has implemented a package to run optimal matching analysis.
23:
in graph theory or the statistical problem of finding an
1982:
Sociological Methods & Research], Vol. 29, 3-33.
1902: 1808: 1798:, that is there is no cost. The distance function is 1764: 1709: 1660: 1633: 1606: 1452: 1424: 1397: 1377: 1305: 1265: 1177: 1150: 1123: 1103: 1039: 1012: 985: 954: 927: 869: 677: 647: 620: 464: 281: 261: 210: 179: 152: 71: 921:
is associated to each operator. Given two sequences
204:that manipulate sequences, i.e. a set of operators 1924: 1868: 1790: 1750: 1695: 1646: 1619: 1591: 1437: 1410: 1383: 1363: 1291: 1251: 1163: 1136: 1109: 1089: 1025: 998: 967: 940: 914:{\displaystyle c(a_{i})\in {\mathbf {R} }_{0}^{+}} 913: 848: 663: 633: 603: 448: 267: 243: 189: 165: 139:{\displaystyle S=(s_{1},s_{2},s_{3},\ldots s_{T})} 138: 1869:{\displaystyle c(a^{\rm {Ins}})=c(a^{\rm {Del}})} 2025: 1489: 2011:Sociological Methods & Research, 29 41-64. 1364:{\displaystyle c(A)=\sum _{i=1}^{n}c(a_{i})} 1090:{\displaystyle A={a_{1},a_{2},\ldots a_{n}}} 1802:if insertion and deletion costs are equal 1964: 458:one state is deleted from the sequence 2026: 1033:using operators from the algebra. Let 641:is replaced (substituted) by state 16:Sequence analysis in social science 13: 1857: 1854: 1851: 1827: 1824: 1821: 1544: 1541: 1538: 1535: 1529: 1526: 1523: 1520: 718: 715: 712: 489: 486: 483: 304: 301: 298: 14: 2055: 1751:{\displaystyle d(S_{1},S_{2})=0} 1292:{\displaystyle a_{1}\circ a_{2}} 895: 236: 226: 182: 1992: 1976:A. Abbott and A. Tsay, (2000) 1970: 1919: 1906: 1863: 1842: 1833: 1812: 1739: 1713: 1696:{\displaystyle d(S_{1},S_{2})} 1690: 1664: 1581: 1568: 1512: 1506: 1482: 1456: 1358: 1345: 1315: 1309: 1246: 1233: 886: 873: 843: 785: 779: 724: 598: 556: 550: 495: 443: 371: 365: 310: 231: 190:{\displaystyle {\mathbf {S} }} 133: 78: 1: 975:, the idea is to measure the 53:). Optimal matching uses the 1890: 275:is inserted in the sequence 60: 7: 2017:10.1177/0049124100029001003 1988:10.1177/0049124100029001001 1935: 1791:{\displaystyle S_{1}=S_{2}} 45:) classical tools (such as 10: 2060: 1144:gives the second sequence 55:Needleman-Wunsch algorithm 18: 1925:{\displaystyle c(a_{i})} 146:be a sequence of states 19:Not to be confused with 1926: 1870: 1792: 1752: 1697: 1648: 1621: 1593: 1439: 1412: 1385: 1365: 1341: 1293: 1253: 1165: 1138: 1117:to the first sequence 1111: 1091: 1027: 1000: 969: 942: 915: 850: 665: 664:{\displaystyle s'_{1}} 635: 605: 450: 269: 245: 191: 167: 140: 2044:Quantitative research 1927: 1871: 1793: 1753: 1698: 1649: 1647:{\displaystyle S_{2}} 1622: 1620:{\displaystyle S_{1}} 1594: 1440: 1438:{\displaystyle S_{2}} 1413: 1411:{\displaystyle S_{1}} 1386: 1366: 1321: 1294: 1254: 1166: 1164:{\displaystyle S_{2}} 1139: 1137:{\displaystyle S_{1}} 1112: 1092: 1028: 1026:{\displaystyle S_{1}} 1001: 999:{\displaystyle S_{2}} 970: 968:{\displaystyle S_{2}} 943: 941:{\displaystyle S_{1}} 916: 851: 666: 636: 634:{\displaystyle s_{1}} 606: 451: 270: 246: 192: 168: 166:{\displaystyle s_{i}} 141: 27:for causal inference. 2039:Statistical distance 1965:References and notes 1900: 1806: 1762: 1707: 1658: 1631: 1604: 1450: 1422: 1395: 1375: 1303: 1263: 1175: 1148: 1121: 1101: 1037: 1010: 983: 952: 925: 867: 675: 645: 618: 462: 279: 259: 208: 177: 150: 69: 910: 860:Imagine now that a 800: 723: 708: 660: 494: 309: 2006:2006-10-24 at the 1956:is an open source 1922: 1866: 1788: 1748: 1693: 1644: 1617: 1589: 1497: 1435: 1408: 1381: 1361: 1289: 1249: 1161: 1134: 1107: 1087: 1023: 996: 965: 938: 911: 892: 846: 788: 696: 678: 661: 648: 631: 601: 465: 446: 282: 265: 241: 187: 163: 136: 51:sequence alignment 1998:L. L. Wu. (2000) 1551: 1534: 1517: 1488: 1384:{\displaystyle A} 1110:{\displaystyle A} 268:{\displaystyle s} 35:sequence analysis 2051: 2019: 1996: 1990: 1974: 1931: 1929: 1928: 1923: 1918: 1917: 1875: 1873: 1872: 1867: 1862: 1861: 1860: 1832: 1831: 1830: 1797: 1795: 1794: 1789: 1787: 1786: 1774: 1773: 1757: 1755: 1754: 1749: 1738: 1737: 1725: 1724: 1702: 1700: 1699: 1694: 1689: 1688: 1676: 1675: 1653: 1651: 1650: 1645: 1643: 1642: 1626: 1624: 1623: 1618: 1616: 1615: 1598: 1596: 1595: 1590: 1588: 1584: 1580: 1579: 1561: 1560: 1549: 1548: 1547: 1532: 1515: 1496: 1481: 1480: 1468: 1467: 1444: 1442: 1441: 1436: 1434: 1433: 1417: 1415: 1414: 1409: 1407: 1406: 1391:that transform 1390: 1388: 1387: 1382: 1370: 1368: 1367: 1362: 1357: 1356: 1340: 1335: 1298: 1296: 1295: 1290: 1288: 1287: 1275: 1274: 1258: 1256: 1255: 1250: 1245: 1244: 1232: 1231: 1213: 1212: 1200: 1199: 1187: 1186: 1170: 1168: 1167: 1162: 1160: 1159: 1143: 1141: 1140: 1135: 1133: 1132: 1116: 1114: 1113: 1108: 1096: 1094: 1093: 1088: 1086: 1085: 1084: 1069: 1068: 1056: 1055: 1032: 1030: 1029: 1024: 1022: 1021: 1005: 1003: 1002: 997: 995: 994: 974: 972: 971: 966: 964: 963: 947: 945: 944: 939: 937: 936: 920: 918: 917: 912: 909: 904: 899: 898: 885: 884: 855: 853: 852: 847: 842: 841: 826: 825: 813: 812: 796: 778: 777: 762: 761: 749: 748: 736: 735: 722: 721: 709: 704: 692: 691: 670: 668: 667: 662: 656: 640: 638: 637: 632: 630: 629: 610: 608: 607: 602: 597: 596: 581: 580: 568: 567: 549: 548: 533: 532: 520: 519: 507: 506: 493: 492: 480: 479: 478: 455: 453: 452: 447: 442: 441: 426: 409: 408: 396: 395: 383: 382: 364: 363: 348: 347: 335: 334: 322: 321: 308: 307: 295: 294: 274: 272: 271: 266: 250: 248: 247: 242: 240: 239: 230: 229: 220: 219: 196: 194: 193: 188: 186: 185: 172: 170: 169: 164: 162: 161: 145: 143: 142: 137: 132: 131: 116: 115: 103: 102: 90: 89: 47:cluster analysis 31:Optimal matching 21:maximum matching 2059: 2058: 2054: 2053: 2052: 2050: 2049: 2048: 2024: 2023: 2022: 2008:Wayback Machine 1997: 1993: 1975: 1971: 1967: 1938: 1913: 1909: 1901: 1898: 1897: 1893: 1850: 1849: 1845: 1820: 1819: 1815: 1807: 1804: 1803: 1782: 1778: 1769: 1765: 1763: 1760: 1759: 1758:if and only if 1733: 1729: 1720: 1716: 1708: 1705: 1704: 1684: 1680: 1671: 1667: 1659: 1656: 1655: 1638: 1634: 1632: 1629: 1628: 1611: 1607: 1605: 1602: 1601: 1599: 1575: 1571: 1556: 1552: 1519: 1518: 1502: 1498: 1492: 1476: 1472: 1463: 1459: 1451: 1448: 1447: 1446: 1429: 1425: 1423: 1420: 1419: 1402: 1398: 1396: 1393: 1392: 1376: 1373: 1372: 1352: 1348: 1336: 1325: 1304: 1301: 1300: 1283: 1279: 1270: 1266: 1264: 1261: 1260: 1240: 1236: 1227: 1223: 1208: 1204: 1195: 1191: 1182: 1178: 1176: 1173: 1172: 1155: 1151: 1149: 1146: 1145: 1128: 1124: 1122: 1119: 1118: 1102: 1099: 1098: 1080: 1076: 1064: 1060: 1051: 1047: 1046: 1038: 1035: 1034: 1017: 1013: 1011: 1008: 1007: 990: 986: 984: 981: 980: 959: 955: 953: 950: 949: 932: 928: 926: 923: 922: 905: 900: 894: 893: 880: 876: 868: 865: 864: 837: 833: 821: 817: 808: 804: 792: 773: 769: 757: 753: 744: 740: 731: 727: 711: 710: 700: 687: 683: 682: 676: 673: 672: 652: 646: 643: 642: 625: 621: 619: 616: 615: 592: 588: 576: 572: 563: 559: 544: 540: 528: 524: 515: 511: 502: 498: 482: 481: 474: 470: 469: 463: 460: 459: 437: 433: 419: 404: 400: 391: 387: 378: 374: 359: 355: 343: 339: 330: 326: 317: 313: 297: 296: 287: 286: 280: 277: 276: 260: 257: 256: 235: 234: 225: 224: 215: 211: 209: 206: 205: 181: 180: 178: 175: 174: 157: 153: 151: 148: 147: 127: 123: 111: 107: 98: 94: 85: 81: 70: 67: 66: 63: 37:method used in 28: 17: 12: 11: 5: 2057: 2047: 2046: 2041: 2036: 2021: 2020: 1991: 1968: 1966: 1963: 1962: 1961: 1951: 1945: 1937: 1934: 1921: 1916: 1912: 1908: 1905: 1892: 1889: 1865: 1859: 1856: 1853: 1848: 1844: 1841: 1838: 1835: 1829: 1826: 1823: 1818: 1814: 1811: 1785: 1781: 1777: 1772: 1768: 1747: 1744: 1741: 1736: 1732: 1728: 1723: 1719: 1715: 1712: 1692: 1687: 1683: 1679: 1674: 1670: 1666: 1663: 1654:. Notice that 1641: 1637: 1614: 1610: 1587: 1583: 1578: 1574: 1570: 1567: 1564: 1559: 1555: 1546: 1543: 1540: 1537: 1531: 1528: 1525: 1522: 1514: 1511: 1508: 1505: 1501: 1495: 1491: 1487: 1484: 1479: 1475: 1471: 1466: 1462: 1458: 1455: 1432: 1428: 1405: 1401: 1380: 1360: 1355: 1351: 1347: 1344: 1339: 1334: 1331: 1328: 1324: 1320: 1317: 1314: 1311: 1308: 1286: 1282: 1278: 1273: 1269: 1248: 1243: 1239: 1235: 1230: 1226: 1222: 1219: 1216: 1211: 1207: 1203: 1198: 1194: 1190: 1185: 1181: 1158: 1154: 1131: 1127: 1106: 1083: 1079: 1075: 1072: 1067: 1063: 1059: 1054: 1050: 1045: 1042: 1020: 1016: 993: 989: 979:of obtaining 962: 958: 935: 931: 908: 903: 897: 891: 888: 883: 879: 875: 872: 858: 857: 845: 840: 836: 832: 829: 824: 820: 816: 811: 807: 803: 799: 795: 791: 787: 784: 781: 776: 772: 768: 765: 760: 756: 752: 747: 743: 739: 734: 730: 726: 720: 717: 714: 707: 703: 699: 695: 690: 686: 681: 659: 655: 651: 628: 624: 612: 600: 595: 591: 587: 584: 579: 575: 571: 566: 562: 558: 555: 552: 547: 543: 539: 536: 531: 527: 523: 518: 514: 510: 505: 501: 497: 491: 488: 485: 477: 473: 468: 456: 445: 440: 436: 432: 429: 425: 422: 418: 415: 412: 407: 403: 399: 394: 390: 386: 381: 377: 373: 370: 367: 362: 358: 354: 351: 346: 342: 338: 333: 329: 325: 320: 316: 312: 306: 303: 300: 293: 290: 285: 264: 238: 233: 228: 223: 218: 214: 184: 160: 156: 135: 130: 126: 122: 119: 114: 110: 106: 101: 97: 93: 88: 84: 80: 77: 74: 62: 59: 39:social science 15: 9: 6: 4: 3: 2: 2056: 2045: 2042: 2040: 2037: 2035: 2032: 2031: 2029: 2018: 2014: 2010: 2009: 2005: 2002: 1995: 1989: 1985: 1981: 1980: 1973: 1969: 1959: 1955: 1952: 1949: 1946: 1943: 1940: 1939: 1933: 1914: 1910: 1903: 1888: 1886: 1881: 1879: 1846: 1839: 1836: 1816: 1809: 1801: 1783: 1779: 1775: 1770: 1766: 1745: 1742: 1734: 1730: 1726: 1721: 1717: 1710: 1685: 1681: 1677: 1672: 1668: 1661: 1639: 1635: 1612: 1608: 1585: 1576: 1572: 1565: 1562: 1557: 1553: 1509: 1503: 1499: 1493: 1485: 1477: 1473: 1469: 1464: 1460: 1453: 1430: 1426: 1403: 1399: 1378: 1353: 1349: 1342: 1337: 1332: 1329: 1326: 1322: 1318: 1312: 1306: 1284: 1280: 1276: 1271: 1267: 1241: 1237: 1228: 1224: 1220: 1217: 1214: 1209: 1205: 1201: 1196: 1192: 1188: 1183: 1179: 1156: 1152: 1129: 1125: 1104: 1081: 1077: 1073: 1070: 1065: 1061: 1057: 1052: 1048: 1043: 1040: 1018: 1014: 991: 987: 978: 960: 956: 933: 929: 906: 901: 889: 881: 877: 870: 863: 838: 834: 830: 827: 822: 818: 814: 809: 805: 801: 797: 793: 789: 782: 774: 770: 766: 763: 758: 754: 750: 745: 741: 737: 732: 728: 705: 701: 697: 693: 688: 684: 679: 657: 653: 649: 626: 622: 613: 593: 589: 585: 582: 577: 573: 569: 564: 560: 553: 545: 541: 537: 534: 529: 525: 521: 516: 512: 508: 503: 499: 475: 471: 466: 457: 438: 434: 430: 427: 423: 420: 416: 413: 410: 405: 401: 397: 392: 388: 384: 379: 375: 368: 360: 356: 352: 349: 344: 340: 336: 331: 327: 323: 318: 314: 291: 288: 283: 262: 254: 253: 252: 221: 216: 212: 203: 198: 158: 154: 128: 124: 120: 117: 112: 108: 104: 99: 95: 91: 86: 82: 75: 72: 58: 56: 52: 48: 44: 40: 36: 32: 26: 25:optimal match 22: 1999: 1994: 1977: 1972: 1894: 1885:Transitivity 1882: 1877: 976: 861: 859: 199: 64: 30: 29: 2034:Data mining 1876:; the term 2028:Categories 255:one state 1891:Criticism 1800:symmetric 1323:∑ 1277:∘ 1221:∘ 1218:… 1215:∘ 1202:∘ 1074:… 890:∈ 831:… 767:… 586:… 538:… 431:… 414:… 353:… 232:→ 121:… 61:Algorithm 2004:Archived 1954:TraMineR 1936:Software 798:′ 706:′ 658:′ 614:a state 424:′ 292:′ 202:algebras 1550:  1533:  1516:  1259:where 1006:from 43:cohort 1948:STATA 1878:indel 1627:into 1418:into 33:is a 977:cost 948:and 862:cost 65:Let 2013:doi 1984:doi 1942:TDA 1490:min 611:and 2030:: 1932:. 1171:: 671:, 57:. 2015:: 1986:: 1958:R 1920:) 1915:i 1911:a 1907:( 1904:c 1864:) 1858:l 1855:e 1852:D 1847:a 1843:( 1840:c 1837:= 1834:) 1828:s 1825:n 1822:I 1817:a 1813:( 1810:c 1784:2 1780:S 1776:= 1771:1 1767:S 1746:0 1743:= 1740:) 1735:2 1731:S 1727:, 1722:1 1718:S 1714:( 1711:d 1691:) 1686:2 1682:S 1678:, 1673:1 1669:S 1665:( 1662:d 1640:2 1636:S 1613:1 1609:S 1586:} 1582:) 1577:1 1573:S 1569:( 1566:A 1563:= 1558:2 1554:S 1545:t 1542:a 1539:h 1536:t 1530:h 1527:c 1524:u 1521:s 1513:) 1510:A 1507:( 1504:c 1500:{ 1494:A 1486:= 1483:) 1478:2 1474:S 1470:, 1465:1 1461:S 1457:( 1454:d 1431:2 1427:S 1404:1 1400:S 1379:A 1359:) 1354:i 1350:a 1346:( 1343:c 1338:n 1333:1 1330:= 1327:i 1319:= 1316:) 1313:A 1310:( 1307:c 1285:2 1281:a 1272:1 1268:a 1247:) 1242:1 1238:S 1234:( 1229:n 1225:a 1210:2 1206:a 1197:1 1193:a 1189:= 1184:2 1180:S 1157:2 1153:S 1130:1 1126:S 1105:A 1082:n 1078:a 1071:, 1066:2 1062:a 1058:, 1053:1 1049:a 1044:= 1041:A 1019:1 1015:S 992:2 988:S 961:2 957:S 934:1 930:S 907:+ 902:0 896:R 887:) 882:i 878:a 874:( 871:c 856:. 844:) 839:T 835:s 828:, 823:3 819:s 815:, 810:2 806:s 802:, 794:1 790:s 786:( 783:= 780:) 775:T 771:s 764:, 759:3 755:s 751:, 746:2 742:s 738:, 733:1 729:s 725:( 719:b 716:u 713:S 702:1 698:s 694:, 689:1 685:s 680:a 654:1 650:s 627:1 623:s 599:) 594:T 590:s 583:, 578:3 574:s 570:, 565:1 561:s 557:( 554:= 551:) 546:T 542:s 535:, 530:3 526:s 522:, 517:2 513:s 509:, 504:1 500:s 496:( 490:l 487:e 484:D 476:2 472:s 467:a 444:) 439:T 435:s 428:, 421:s 417:, 411:, 406:3 402:s 398:, 393:2 389:s 385:, 380:1 376:s 372:( 369:= 366:) 361:T 357:s 350:, 345:3 341:s 337:, 332:2 328:s 324:, 319:1 315:s 311:( 305:s 302:n 299:I 289:s 284:a 263:s 237:S 227:S 222:: 217:i 213:a 183:S 159:i 155:s 134:) 129:T 125:s 118:, 113:3 109:s 105:, 100:2 96:s 92:, 87:1 83:s 79:( 76:= 73:S

Index

maximum matching
optimal match
sequence analysis
social science
cohort
cluster analysis
sequence alignment
Needleman-Wunsch algorithm
algebras
symmetric
Transitivity
TDA
STATA
TraMineR
R
Sequence Analysis and Optimal Matching Methods in Sociology: Review and Prospect
doi
10.1177/0049124100029001001
Some Comments on "Sequence Analysis and Optimal Matching Methods in Sociology: Review and Prospect"
Archived
Wayback Machine
doi
10.1177/0049124100029001003
Categories
Data mining
Statistical distance
Quantitative research

Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.