1895:
Although optimal matching techniques are widely used in sociology and demography, such techniques also have their flaws. As was pointed out by several authors (for example L. L. Wu), the main problem in the application of optimal matching is to appropriately define the costs
854:
454:
41:, to assess the dissimilarity of ordered arrays of tokens that usually represent a time-ordered sequence of socio-economic states two individuals have experienced. Once such distances have been calculated for a set of observations (e.g. individuals in a
609:
1597:
1257:
249:
919:
144:
1874:
1369:
1095:
674:
278:
1756:
1297:
1701:
195:
1796:
669:
1930:
461:
1652:
1625:
1443:
1416:
1169:
1142:
1031:
1004:
973:
946:
639:
171:
1449:
1389:
1115:
273:
49:) can be used. The method was tailored to social sciences from a technique originally introduced to study molecular biology (protein or genetic) sequences (see
1883:
Considering a set composed of only the three basic operations described above, this proximity measure satisfies the triangular inequality.
1174:
1371:, that represents the total cost of the transformation. One should consider at this point that there might exist different such sequences
207:
2003:
34:
866:
68:
1805:
1302:
1036:
2043:
849:{\displaystyle a_{s_{1},s'_{1}}^{\rm {Sub}}(s_{1},s_{2},s_{3},\ldots s_{T})=(s'_{1},s_{2},s_{3},\ldots s_{T})}
449:{\displaystyle a_{s'}^{\rm {Ins}}(s_{1},s_{2},s_{3},\ldots s_{T})=(s_{1},s_{2},s_{3},\ldots ,s',\ldots s_{T})}
251:. In the most simple approach, a set composed of only three basic operations to transform sequences is used:
2038:
20:
1960:-package for analyzing and visualizing states and events sequences, including optimal matching analysis.
54:
1944:
is a powerful program, offering access to some of the latest developments in transition data analysis.
1706:
1262:
1657:
176:
2001:
Some
Comments on "Sequence Analysis and Optimal Matching Methods in Sociology: Review and Prospect"
1957:
1761:
604:{\displaystyle a_{s_{2}}^{\rm {Del}}(s_{1},s_{2},s_{3},\ldots s_{T})=(s_{1},s_{3},\ldots s_{T})}
1899:
1947:
24:
1592:{\displaystyle d(S_{1},S_{2})=\min _{A}\left\{c(A)~{\rm {such~that}}~S_{2}=A(S_{1})\right\}}
644:
2033:
1630:
1603:
1421:
1394:
1147:
1120:
1097:
be a sequence of operators such that the application of all the operators of this sequence
1009:
982:
951:
924:
617:
149:
8:
1884:
1445:; a reasonable choice is to select the cheapest of such sequences. We thus call distance
42:
1374:
1100:
258:
50:
1978:
2012:
1983:
46:
2000:
1703:
is by definition nonnegative since it is the sum of positive costs, and trivially
2007:
2016:
1987:
1979:
Sequence
Analysis and Optimal Matching Methods in Sociology: Review and Prospect
38:
1941:
2027:
1600:
that is, the cost of the least expensive set of transformations that turn
1887:
however, depends on the definition of the set of elementary operations.
197:
the sequence space, i.e. the set of all possible sequences of states.
1799:
1252:{\displaystyle S_{2}=a_{1}\circ a_{2}\circ \ldots \circ a_{n}(S_{1})}
201:
1880:
cost usually refers to the common cost of insertion and deletion.
1299:
denotes the compound operator. To this set we associate the cost
244:{\displaystyle a_{i}:{\mathbf {S} }\rightarrow {\mathbf {S} }}
200:
Optimal matching algorithms work by defining simple operator
1953:
173:
belonging to a finite set of possible states. Let us denote
1950:
has implemented a package to run optimal matching analysis.
23:
in graph theory or the statistical problem of finding an
1982:
Sociological
Methods & Research], Vol. 29, 3-33.
1902:
1808:
1798:, that is there is no cost. The distance function is
1764:
1709:
1660:
1633:
1606:
1452:
1424:
1397:
1377:
1305:
1265:
1177:
1150:
1123:
1103:
1039:
1012:
985:
954:
927:
869:
677:
647:
620:
464:
281:
261:
210:
179:
152:
71:
921:
is associated to each operator. Given two sequences
204:that manipulate sequences, i.e. a set of operators
1924:
1868:
1790:
1750:
1695:
1646:
1619:
1591:
1437:
1410:
1383:
1363:
1291:
1251:
1163:
1136:
1109:
1089:
1025:
998:
967:
940:
914:{\displaystyle c(a_{i})\in {\mathbf {R} }_{0}^{+}}
913:
848:
663:
633:
603:
448:
267:
243:
189:
165:
139:{\displaystyle S=(s_{1},s_{2},s_{3},\ldots s_{T})}
138:
1869:{\displaystyle c(a^{\rm {Ins}})=c(a^{\rm {Del}})}
2025:
1489:
2011:Sociological Methods & Research, 29 41-64.
1364:{\displaystyle c(A)=\sum _{i=1}^{n}c(a_{i})}
1090:{\displaystyle A={a_{1},a_{2},\ldots a_{n}}}
1802:if insertion and deletion costs are equal
1964:
458:one state is deleted from the sequence
2026:
1033:using operators from the algebra. Let
641:is replaced (substituted) by state
16:Sequence analysis in social science
13:
1857:
1854:
1851:
1827:
1824:
1821:
1544:
1541:
1538:
1535:
1529:
1526:
1523:
1520:
718:
715:
712:
489:
486:
483:
304:
301:
298:
14:
2055:
1751:{\displaystyle d(S_{1},S_{2})=0}
1292:{\displaystyle a_{1}\circ a_{2}}
895:
236:
226:
182:
1992:
1976:A. Abbott and A. Tsay, (2000)
1970:
1919:
1906:
1863:
1842:
1833:
1812:
1739:
1713:
1696:{\displaystyle d(S_{1},S_{2})}
1690:
1664:
1581:
1568:
1512:
1506:
1482:
1456:
1358:
1345:
1315:
1309:
1246:
1233:
886:
873:
843:
785:
779:
724:
598:
556:
550:
495:
443:
371:
365:
310:
231:
190:{\displaystyle {\mathbf {S} }}
133:
78:
1:
975:, the idea is to measure the
53:). Optimal matching uses the
1890:
275:is inserted in the sequence
60:
7:
2017:10.1177/0049124100029001003
1988:10.1177/0049124100029001001
1935:
1791:{\displaystyle S_{1}=S_{2}}
45:) classical tools (such as
10:
2060:
1144:gives the second sequence
55:Needleman-Wunsch algorithm
18:
1925:{\displaystyle c(a_{i})}
146:be a sequence of states
19:Not to be confused with
1926:
1870:
1792:
1752:
1697:
1648:
1621:
1593:
1439:
1412:
1385:
1365:
1341:
1293:
1253:
1165:
1138:
1117:to the first sequence
1111:
1091:
1027:
1000:
969:
942:
915:
850:
665:
664:{\displaystyle s'_{1}}
635:
605:
450:
269:
245:
191:
167:
140:
2044:Quantitative research
1927:
1871:
1793:
1753:
1698:
1649:
1647:{\displaystyle S_{2}}
1622:
1620:{\displaystyle S_{1}}
1594:
1440:
1438:{\displaystyle S_{2}}
1413:
1411:{\displaystyle S_{1}}
1386:
1366:
1321:
1294:
1254:
1166:
1164:{\displaystyle S_{2}}
1139:
1137:{\displaystyle S_{1}}
1112:
1092:
1028:
1026:{\displaystyle S_{1}}
1001:
999:{\displaystyle S_{2}}
970:
968:{\displaystyle S_{2}}
943:
941:{\displaystyle S_{1}}
916:
851:
666:
636:
634:{\displaystyle s_{1}}
606:
451:
270:
246:
192:
168:
166:{\displaystyle s_{i}}
141:
27:for causal inference.
2039:Statistical distance
1965:References and notes
1900:
1806:
1762:
1707:
1658:
1631:
1604:
1450:
1422:
1395:
1375:
1303:
1263:
1175:
1148:
1121:
1101:
1037:
1010:
983:
952:
925:
867:
675:
645:
618:
462:
279:
259:
208:
177:
150:
69:
910:
860:Imagine now that a
800:
723:
708:
660:
494:
309:
2006:2006-10-24 at the
1956:is an open source
1922:
1866:
1788:
1748:
1693:
1644:
1617:
1589:
1497:
1435:
1408:
1381:
1361:
1289:
1249:
1161:
1134:
1107:
1087:
1023:
996:
965:
938:
911:
892:
846:
788:
696:
678:
661:
648:
631:
601:
465:
446:
282:
265:
241:
187:
163:
136:
51:sequence alignment
1998:L. L. Wu. (2000)
1551:
1534:
1517:
1488:
1384:{\displaystyle A}
1110:{\displaystyle A}
268:{\displaystyle s}
35:sequence analysis
2051:
2019:
1996:
1990:
1974:
1931:
1929:
1928:
1923:
1918:
1917:
1875:
1873:
1872:
1867:
1862:
1861:
1860:
1832:
1831:
1830:
1797:
1795:
1794:
1789:
1787:
1786:
1774:
1773:
1757:
1755:
1754:
1749:
1738:
1737:
1725:
1724:
1702:
1700:
1699:
1694:
1689:
1688:
1676:
1675:
1653:
1651:
1650:
1645:
1643:
1642:
1626:
1624:
1623:
1618:
1616:
1615:
1598:
1596:
1595:
1590:
1588:
1584:
1580:
1579:
1561:
1560:
1549:
1548:
1547:
1532:
1515:
1496:
1481:
1480:
1468:
1467:
1444:
1442:
1441:
1436:
1434:
1433:
1417:
1415:
1414:
1409:
1407:
1406:
1391:that transform
1390:
1388:
1387:
1382:
1370:
1368:
1367:
1362:
1357:
1356:
1340:
1335:
1298:
1296:
1295:
1290:
1288:
1287:
1275:
1274:
1258:
1256:
1255:
1250:
1245:
1244:
1232:
1231:
1213:
1212:
1200:
1199:
1187:
1186:
1170:
1168:
1167:
1162:
1160:
1159:
1143:
1141:
1140:
1135:
1133:
1132:
1116:
1114:
1113:
1108:
1096:
1094:
1093:
1088:
1086:
1085:
1084:
1069:
1068:
1056:
1055:
1032:
1030:
1029:
1024:
1022:
1021:
1005:
1003:
1002:
997:
995:
994:
974:
972:
971:
966:
964:
963:
947:
945:
944:
939:
937:
936:
920:
918:
917:
912:
909:
904:
899:
898:
885:
884:
855:
853:
852:
847:
842:
841:
826:
825:
813:
812:
796:
778:
777:
762:
761:
749:
748:
736:
735:
722:
721:
709:
704:
692:
691:
670:
668:
667:
662:
656:
640:
638:
637:
632:
630:
629:
610:
608:
607:
602:
597:
596:
581:
580:
568:
567:
549:
548:
533:
532:
520:
519:
507:
506:
493:
492:
480:
479:
478:
455:
453:
452:
447:
442:
441:
426:
409:
408:
396:
395:
383:
382:
364:
363:
348:
347:
335:
334:
322:
321:
308:
307:
295:
294:
274:
272:
271:
266:
250:
248:
247:
242:
240:
239:
230:
229:
220:
219:
196:
194:
193:
188:
186:
185:
172:
170:
169:
164:
162:
161:
145:
143:
142:
137:
132:
131:
116:
115:
103:
102:
90:
89:
47:cluster analysis
31:Optimal matching
21:maximum matching
2059:
2058:
2054:
2053:
2052:
2050:
2049:
2048:
2024:
2023:
2022:
2008:Wayback Machine
1997:
1993:
1975:
1971:
1967:
1938:
1913:
1909:
1901:
1898:
1897:
1893:
1850:
1849:
1845:
1820:
1819:
1815:
1807:
1804:
1803:
1782:
1778:
1769:
1765:
1763:
1760:
1759:
1758:if and only if
1733:
1729:
1720:
1716:
1708:
1705:
1704:
1684:
1680:
1671:
1667:
1659:
1656:
1655:
1638:
1634:
1632:
1629:
1628:
1611:
1607:
1605:
1602:
1601:
1599:
1575:
1571:
1556:
1552:
1519:
1518:
1502:
1498:
1492:
1476:
1472:
1463:
1459:
1451:
1448:
1447:
1446:
1429:
1425:
1423:
1420:
1419:
1402:
1398:
1396:
1393:
1392:
1376:
1373:
1372:
1352:
1348:
1336:
1325:
1304:
1301:
1300:
1283:
1279:
1270:
1266:
1264:
1261:
1260:
1240:
1236:
1227:
1223:
1208:
1204:
1195:
1191:
1182:
1178:
1176:
1173:
1172:
1155:
1151:
1149:
1146:
1145:
1128:
1124:
1122:
1119:
1118:
1102:
1099:
1098:
1080:
1076:
1064:
1060:
1051:
1047:
1046:
1038:
1035:
1034:
1017:
1013:
1011:
1008:
1007:
990:
986:
984:
981:
980:
959:
955:
953:
950:
949:
932:
928:
926:
923:
922:
905:
900:
894:
893:
880:
876:
868:
865:
864:
837:
833:
821:
817:
808:
804:
792:
773:
769:
757:
753:
744:
740:
731:
727:
711:
710:
700:
687:
683:
682:
676:
673:
672:
652:
646:
643:
642:
625:
621:
619:
616:
615:
592:
588:
576:
572:
563:
559:
544:
540:
528:
524:
515:
511:
502:
498:
482:
481:
474:
470:
469:
463:
460:
459:
437:
433:
419:
404:
400:
391:
387:
378:
374:
359:
355:
343:
339:
330:
326:
317:
313:
297:
296:
287:
286:
280:
277:
276:
260:
257:
256:
235:
234:
225:
224:
215:
211:
209:
206:
205:
181:
180:
178:
175:
174:
157:
153:
151:
148:
147:
127:
123:
111:
107:
98:
94:
85:
81:
70:
67:
66:
63:
37:method used in
28:
17:
12:
11:
5:
2057:
2047:
2046:
2041:
2036:
2021:
2020:
1991:
1968:
1966:
1963:
1962:
1961:
1951:
1945:
1937:
1934:
1921:
1916:
1912:
1908:
1905:
1892:
1889:
1865:
1859:
1856:
1853:
1848:
1844:
1841:
1838:
1835:
1829:
1826:
1823:
1818:
1814:
1811:
1785:
1781:
1777:
1772:
1768:
1747:
1744:
1741:
1736:
1732:
1728:
1723:
1719:
1715:
1712:
1692:
1687:
1683:
1679:
1674:
1670:
1666:
1663:
1654:. Notice that
1641:
1637:
1614:
1610:
1587:
1583:
1578:
1574:
1570:
1567:
1564:
1559:
1555:
1546:
1543:
1540:
1537:
1531:
1528:
1525:
1522:
1514:
1511:
1508:
1505:
1501:
1495:
1491:
1487:
1484:
1479:
1475:
1471:
1466:
1462:
1458:
1455:
1432:
1428:
1405:
1401:
1380:
1360:
1355:
1351:
1347:
1344:
1339:
1334:
1331:
1328:
1324:
1320:
1317:
1314:
1311:
1308:
1286:
1282:
1278:
1273:
1269:
1248:
1243:
1239:
1235:
1230:
1226:
1222:
1219:
1216:
1211:
1207:
1203:
1198:
1194:
1190:
1185:
1181:
1158:
1154:
1131:
1127:
1106:
1083:
1079:
1075:
1072:
1067:
1063:
1059:
1054:
1050:
1045:
1042:
1020:
1016:
993:
989:
979:of obtaining
962:
958:
935:
931:
908:
903:
897:
891:
888:
883:
879:
875:
872:
858:
857:
845:
840:
836:
832:
829:
824:
820:
816:
811:
807:
803:
799:
795:
791:
787:
784:
781:
776:
772:
768:
765:
760:
756:
752:
747:
743:
739:
734:
730:
726:
720:
717:
714:
707:
703:
699:
695:
690:
686:
681:
659:
655:
651:
628:
624:
612:
600:
595:
591:
587:
584:
579:
575:
571:
566:
562:
558:
555:
552:
547:
543:
539:
536:
531:
527:
523:
518:
514:
510:
505:
501:
497:
491:
488:
485:
477:
473:
468:
456:
445:
440:
436:
432:
429:
425:
422:
418:
415:
412:
407:
403:
399:
394:
390:
386:
381:
377:
373:
370:
367:
362:
358:
354:
351:
346:
342:
338:
333:
329:
325:
320:
316:
312:
306:
303:
300:
293:
290:
285:
264:
238:
233:
228:
223:
218:
214:
184:
160:
156:
135:
130:
126:
122:
119:
114:
110:
106:
101:
97:
93:
88:
84:
80:
77:
74:
62:
59:
39:social science
15:
9:
6:
4:
3:
2:
2056:
2045:
2042:
2040:
2037:
2035:
2032:
2031:
2029:
2018:
2014:
2010:
2009:
2005:
2002:
1995:
1989:
1985:
1981:
1980:
1973:
1969:
1959:
1955:
1952:
1949:
1946:
1943:
1940:
1939:
1933:
1914:
1910:
1903:
1888:
1886:
1881:
1879:
1846:
1839:
1836:
1816:
1809:
1801:
1783:
1779:
1775:
1770:
1766:
1745:
1742:
1734:
1730:
1726:
1721:
1717:
1710:
1685:
1681:
1677:
1672:
1668:
1661:
1639:
1635:
1612:
1608:
1585:
1576:
1572:
1565:
1562:
1557:
1553:
1509:
1503:
1499:
1493:
1485:
1477:
1473:
1469:
1464:
1460:
1453:
1430:
1426:
1403:
1399:
1378:
1353:
1349:
1342:
1337:
1332:
1329:
1326:
1322:
1318:
1312:
1306:
1284:
1280:
1276:
1271:
1267:
1241:
1237:
1228:
1224:
1220:
1217:
1214:
1209:
1205:
1201:
1196:
1192:
1188:
1183:
1179:
1156:
1152:
1129:
1125:
1104:
1081:
1077:
1073:
1070:
1065:
1061:
1057:
1052:
1048:
1043:
1040:
1018:
1014:
991:
987:
978:
960:
956:
933:
929:
906:
901:
889:
881:
877:
870:
863:
838:
834:
830:
827:
822:
818:
814:
809:
805:
801:
797:
793:
789:
782:
774:
770:
766:
763:
758:
754:
750:
745:
741:
737:
732:
728:
705:
701:
697:
693:
688:
684:
679:
657:
653:
649:
626:
622:
613:
593:
589:
585:
582:
577:
573:
569:
564:
560:
553:
545:
541:
537:
534:
529:
525:
521:
516:
512:
508:
503:
499:
475:
471:
466:
457:
438:
434:
430:
427:
423:
420:
416:
413:
410:
405:
401:
397:
392:
388:
384:
379:
375:
368:
360:
356:
352:
349:
344:
340:
336:
331:
327:
323:
318:
314:
291:
288:
283:
262:
254:
253:
252:
221:
216:
212:
203:
198:
158:
154:
128:
124:
120:
117:
112:
108:
104:
99:
95:
91:
86:
82:
75:
72:
58:
56:
52:
48:
44:
40:
36:
32:
26:
25:optimal match
22:
1999:
1994:
1977:
1972:
1894:
1885:Transitivity
1882:
1877:
976:
861:
859:
199:
64:
30:
29:
2034:Data mining
1876:; the term
2028:Categories
255:one state
1891:Criticism
1800:symmetric
1323:∑
1277:∘
1221:∘
1218:…
1215:∘
1202:∘
1074:…
890:∈
831:…
767:…
586:…
538:…
431:…
414:…
353:…
232:→
121:…
61:Algorithm
2004:Archived
1954:TraMineR
1936:Software
798:′
706:′
658:′
614:a state
424:′
292:′
202:algebras
1550:
1533:
1516:
1259:where
1006:from
43:cohort
1948:STATA
1878:indel
1627:into
1418:into
33:is a
977:cost
948:and
862:cost
65:Let
2013:doi
1984:doi
1942:TDA
1490:min
611:and
2030::
1932:.
1171::
671:,
57:.
2015::
1986::
1958:R
1920:)
1915:i
1911:a
1907:(
1904:c
1864:)
1858:l
1855:e
1852:D
1847:a
1843:(
1840:c
1837:=
1834:)
1828:s
1825:n
1822:I
1817:a
1813:(
1810:c
1784:2
1780:S
1776:=
1771:1
1767:S
1746:0
1743:=
1740:)
1735:2
1731:S
1727:,
1722:1
1718:S
1714:(
1711:d
1691:)
1686:2
1682:S
1678:,
1673:1
1669:S
1665:(
1662:d
1640:2
1636:S
1613:1
1609:S
1586:}
1582:)
1577:1
1573:S
1569:(
1566:A
1563:=
1558:2
1554:S
1545:t
1542:a
1539:h
1536:t
1530:h
1527:c
1524:u
1521:s
1513:)
1510:A
1507:(
1504:c
1500:{
1494:A
1486:=
1483:)
1478:2
1474:S
1470:,
1465:1
1461:S
1457:(
1454:d
1431:2
1427:S
1404:1
1400:S
1379:A
1359:)
1354:i
1350:a
1346:(
1343:c
1338:n
1333:1
1330:=
1327:i
1319:=
1316:)
1313:A
1310:(
1307:c
1285:2
1281:a
1272:1
1268:a
1247:)
1242:1
1238:S
1234:(
1229:n
1225:a
1210:2
1206:a
1197:1
1193:a
1189:=
1184:2
1180:S
1157:2
1153:S
1130:1
1126:S
1105:A
1082:n
1078:a
1071:,
1066:2
1062:a
1058:,
1053:1
1049:a
1044:=
1041:A
1019:1
1015:S
992:2
988:S
961:2
957:S
934:1
930:S
907:+
902:0
896:R
887:)
882:i
878:a
874:(
871:c
856:.
844:)
839:T
835:s
828:,
823:3
819:s
815:,
810:2
806:s
802:,
794:1
790:s
786:(
783:=
780:)
775:T
771:s
764:,
759:3
755:s
751:,
746:2
742:s
738:,
733:1
729:s
725:(
719:b
716:u
713:S
702:1
698:s
694:,
689:1
685:s
680:a
654:1
650:s
627:1
623:s
599:)
594:T
590:s
583:,
578:3
574:s
570:,
565:1
561:s
557:(
554:=
551:)
546:T
542:s
535:,
530:3
526:s
522:,
517:2
513:s
509:,
504:1
500:s
496:(
490:l
487:e
484:D
476:2
472:s
467:a
444:)
439:T
435:s
428:,
421:s
417:,
411:,
406:3
402:s
398:,
393:2
389:s
385:,
380:1
376:s
372:(
369:=
366:)
361:T
357:s
350:,
345:3
341:s
337:,
332:2
328:s
324:,
319:1
315:s
311:(
305:s
302:n
299:I
289:s
284:a
263:s
237:S
227:S
222::
217:i
213:a
183:S
159:i
155:s
134:)
129:T
125:s
118:,
113:3
109:s
105:,
100:2
96:s
92:,
87:1
83:s
79:(
76:=
73:S
Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.