Truth discovery (also known as truth finding) is the process of choosing the actual true value for a data item when different data sources provide conflicting information on it.

Several algorithms have been proposed to tackle this problem, ranging from simple methods like majority voting to more complex ones able to estimate the trustworthiness of data sources.

Truth discovery problems can be divided into two sub-classes: single-truth and multi-truth. In the first case, only one true value is allowed for a data item (e.g. the birthday of a person, the capital city of a country), while in the second case multiple true values are allowed (e.g. the cast of a movie, the authors of a book).

Typically, truth discovery is the last step of a data integration pipeline, when the schemas of different data sources have been unified and the records referring to the same data item have been detected.

General principles

The abundance of data available on the web makes it increasingly likely that different sources provide (partially or completely) different values for the same data item. This, together with our increasing reliance on data to make important decisions, motivates the need to develop good truth discovery algorithms.

Many currently available methods rely on a voting strategy to define the true value of a data item. Nevertheless, recent studies have shown that, if we rely only on majority voting, we could get wrong results for as many as 30% of the data items.

The solution to this problem is to assess the trustworthiness of the sources and give more importance to votes coming from trusted sources.

Ideally, supervised learning techniques could be exploited to assign a reliability score to sources after hand-crafted labeling of the provided values; unfortunately, this is not feasible, since the number of needed labeled examples should be proportional to the number of sources, and in many applications the number of sources can be prohibitive.

Single-truth vs multi-truth discovery

Single-truth and multi-truth discovery are two very different problems.

Single-truth discovery is characterized by the following properties:
- only one true value is allowed for each data item;
- different values provided for a given data item oppose each other;
- values and sources can either be correct or erroneous.

In the multi-truth case, instead, the following properties hold:
- the truth is composed of a set of values;
- different values could provide a partial truth;
- claiming one value for a given data item does not imply opposing all the other values;
- the number of true values for each data item is not known a priori.

Multi-truth discovery has unique features that make the problem more complex and that should be taken into consideration when developing truth discovery solutions.

The examples below point out the main differences between the two cases. In both examples the truth is provided by source 1. In the single-truth case (first table), sources 2 and 3 oppose the truth and, as a result, provide wrong values. In the multi-truth case (second table), sources 2 and 3 are neither correct nor erroneous: they provide a subset of the true values and, at the same time, do not oppose the truth.

When was George Washington born?

Source  Name               Birthdate   Assessment
S1      George Washington  1732-02-22  Correct
S2      George Washington  1738-09-17  Erroneous
S3      George Washington  1734-10-23  Erroneous

Who wrote “The nature of space and time”?

Source  Title                         Authors                         Assessment
S1      The nature of space and time  Stephen Hawking, Roger Penrose  Correct
S2      The nature of space and time  Stephen Hawking                 Partial truth
S3      The nature of space and time  Roger Penrose                   Partial truth
S4      The nature of space and time  J. K. Rowling                   Erroneous
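The labels in the two example tables follow from comparing each source's claim with the truth. The Python sketch below reproduces them. It assumes the ground truth is already known (here, source 1's values). `single_truth_label` and `multi_truth_label` are hypothetical helper names. Treating any claim that mixes true and false values as erroneous is a simplification made only for this sketch.

```python
def single_truth_label(claim: str, truth: str) -> str:
    # Single truth: any value other than the true one opposes it.
    return "Correct" if claim == truth else "Erroneous"


def multi_truth_label(claim: set, truth: set) -> str:
    # Multi truth: compare the set of claimed values with the set of true ones.
    if claim == truth:
        return "Correct"
    if claim and claim <= truth:
        # A non-empty proper subset of the truth does not oppose it.
        return "Partial truth"
    # Simplification: any false value makes the whole claim erroneous.
    return "Erroneous"


birthdates = {"S1": "1732-02-22", "S2": "1738-09-17", "S3": "1734-10-23"}
authors = {
    "S1": {"Stephen Hawking", "Roger Penrose"},
    "S2": {"Stephen Hawking"},
    "S3": {"Roger Penrose"},
    "S4": {"J. K. Rowling"},
}

for source, claim in birthdates.items():
    print(source, single_truth_label(claim, birthdates["S1"]))
for source, claim in authors.items():
    print(source, multi_truth_label(claim, authors["S1"]))
```

Running it prints the Assessment column of both tables. In the single-truth loop, S2 and S3 are erroneous. In the multi-truth loop, S2 and S3 are partial truths and only S4 is erroneous.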
Source trustworthiness

The vast majority of truth discovery methods are based on a voting approach: each source votes for a value of a certain data item and, at the end, the value with the highest vote is selected as the true one. In the more sophisticated methods, votes do not have the same weight for all the data sources: more importance is given to votes coming from trusted sources.

Source trustworthiness is usually not known a priori but is estimated with an iterative approach. At each step of the truth discovery algorithm, the trustworthiness score of each data source is refined. This improves the assessment of the true values, which in turn leads to a better estimation of the trustworthiness of the sources. The process usually ends when all the values reach a convergence state.

Source trustworthiness can be based on different metrics, such as the accuracy of the provided values, the copying of values from other sources, and domain coverage.

Detecting copying behaviors is very important. Copying allows false values to spread easily, which makes truth discovery very hard, since many sources would vote for the wrong values. Systems usually decrease the weight of votes associated with copied values, or do not count them at all.
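The iterative refinement described above can be sketched in a few lines of Python. This is a minimal illustration, not any specific published algorithm. It assumes that every source starts with trust 1.0 and that trust is the fraction of a source's claims agreeing with the current winning values. Iteration stops when no trust score changes by more than `tol`. The names `iterative_truth_discovery`, `claims`, `tol` and `max_iter`, and the example data, are hypothetical.

```python
from collections import defaultdict


def iterative_truth_discovery(claims, tol=1e-6, max_iter=100):
    """claims maps (source, item) -> value; returns (truths, trust)."""
    sources = {s for s, _ in claims}
    trust = {s: 1.0 for s in sources}  # nothing is known a priori: equal trust
    truths = {}
    for _ in range(max_iter):
        # Step 1: weighted vote; each source votes with its current trust.
        votes = defaultdict(lambda: defaultdict(float))
        for (s, item), value in claims.items():
            votes[item][value] += trust[s]
        truths = {item: max(vs, key=vs.get) for item, vs in votes.items()}
        # Step 2: refine trust as the source's accuracy w.r.t. current truths.
        hits, totals = defaultdict(int), defaultdict(int)
        for (s, item), value in claims.items():
            totals[s] += 1
            hits[s] += int(value == truths[item])
        new_trust = {s: hits[s] / totals[s] for s in sources}
        converged = all(abs(new_trust[s] - trust[s]) <= tol for s in sources)
        trust = new_trust
        if converged:  # stop once no trust score changes noticeably
            break
    return truths, trust


claims = {
    ("A", "i1"): "x", ("A", "i2"): "y", ("A", "i3"): "z",
    ("B", "i1"): "x", ("B", "i2"): "y", ("B", "i3"): "w",
    ("C", "i1"): "q", ("C", "i2"): "r", ("C", "i3"): "w",
}
truths, trust = iterative_truth_discovery(claims)
print(truths)  # {'i1': 'x', 'i2': 'y', 'i3': 'w'}
```

In this run, source B ends with trust 1.0, A with 2/3 and C with 1/3. In any later voting round, C's vote therefore counts only a third as much as B's.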
Single-truth methods

Most of the currently available truth discovery methods have been designed to work well only in the single-truth case.

Below are some of the characteristics of the most relevant types of single-truth methods, along with how different systems model source trustworthiness.

Majority voting

Majority voting is the simplest method: the most popular value is selected as the true one. Majority voting is commonly used as a baseline when assessing the performance of more complex methods.
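Majority voting fits in a few lines of Python. In this sketch, `majority_vote` is a hypothetical name. Ties are broken by first occurrence, which is the documented ordering of `collections.Counter.most_common` for equal counts. The input reuses the birthdates from the George Washington example, plus a fourth value from a hypothetical source that copies S2. This shows how copying can make a wrong value win a plain vote.

```python
from collections import Counter


def majority_vote(values):
    """Return the most frequent value among those provided by the sources."""
    # most_common(1) returns [(value, count)] for the top value.
    return Counter(values).most_common(1)[0][0]


# S1, S2, S3 from the birthdate example, plus a hypothetical copier of S2.
print(majority_vote(["1732-02-22", "1738-09-17", "1734-10-23", "1738-09-17"]))
# 1738-09-17
```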
Web-link based

These methods estimate source trustworthiness by exploiting a technique similar to the one used to measure the authority of web pages based on web links. The vote assigned to a value is computed as the sum of the trustworthiness of the sources that provide that particular value. The trustworthiness of a source, in turn, is computed as the sum of the votes assigned to the values that the source provides.
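The mutual reinforcement between votes and trustworthiness resembles the hub/authority computation on web links. The minimal Python sketch below alternates the two sums. The function names and the fixed number of `iterations` are assumptions of this sketch, not details of any specific system. So is the choice to divide the scores by their maximum after each step, which is needed because plain sums would grow without bound. The example claims are made up.

```python
def web_link_scores(provides, iterations=20):
    """provides maps each source to the set of (item, value) pairs it claims."""
    claims = {c for cs in provides.values() for c in cs}
    trust = {s: 1.0 for s in provides}
    vote = {}
    for _ in range(iterations):
        # Vote of a value: sum of the trust of the sources providing it.
        vote = {c: sum(trust[s] for s, cs in provides.items() if c in cs)
                for c in claims}
        top = max(vote.values())
        vote = {c: v / top for c, v in vote.items()}  # keep scores bounded
        # Trust of a source: sum of the votes of the values it provides.
        trust = {s: sum(vote[c] for c in cs) for s, cs in provides.items()}
        top = max(trust.values())
        trust = {s: t / top for s, t in trust.items()}
    return trust, vote


def pick_truths(vote):
    """For every data item, keep the value with the highest vote."""
    best = {}
    for (item, value), score in vote.items():
        if item not in best or score > best[item][1]:
            best[item] = (value, score)
    return {item: value for item, (value, _) in best.items()}


provides = {
    "S1": {("gw_birth", "1732-02-22"), ("capital_fr", "Paris")},
    "S2": {("gw_birth", "1738-09-17"), ("capital_fr", "Paris")},
    "S3": {("gw_birth", "1732-02-22"), ("capital_fr", "Lyon")},
}
trust, vote = web_link_scores(provides)
print(pick_truths(vote))
```

S1 agrees with the majority on both items, so it ends with the highest trust (1.0 after normalization). Its values, 1732-02-22 and Paris, receive the highest votes.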
Information-retrieval based

These methods estimate source trustworthiness using similarity measures typically used in information retrieval. Source trustworthiness is computed as the cosine similarity (or other similarity measures) between the set of values provided by the source and the set of values considered true (either selected in a probabilistic way or obtained from a ground truth).
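Treating each (data item, value) pair as one dimension turns a set of values into a binary vector, so the cosine similarity reduces to a simple set formula. In this Python sketch, `cosine_trust` is a hypothetical name. The set of values considered true is assumed to be already available, for example from a ground truth. The example data are made up.

```python
import math


def cosine_trust(source_claims, true_claims):
    """Cosine similarity of two sets of (item, value) pairs seen as 0/1 vectors."""
    if not source_claims or not true_claims:
        return 0.0
    # The dot product of two 0/1 vectors is the size of the intersection,
    # and each vector's norm is the square root of the set size.
    overlap = len(source_claims & true_claims)
    return overlap / math.sqrt(len(source_claims) * len(true_claims))


truth = {("gw_birth", "1732-02-22"), ("capital_fr", "Paris"), ("capital_it", "Rome")}
s2 = {("gw_birth", "1738-09-17"), ("capital_fr", "Paris")}
print(round(cosine_trust(s2, truth), 3))  # 0.408
```

A source that provides exactly the true set scores 1.0. A source sharing one of its two values with a three-value truth scores 1/√6.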
Bayesian based

These methods use Bayesian inference to define the probability of a value being true conditioned on the values provided by all the sources:

$$P(v \mid \psi(o)) = \frac{P(\psi(o) \mid v) \cdot P(v)}{P(\psi(o))}$$

where $v$ is a value provided for a data item $o$, and $\psi(o)$ is the set of the observed values provided by all the sources for that specific data item.

The trustworthiness of a source is then computed based on the accuracy of the values that it provides. Other, more complex methods exploit Bayesian inference to detect copying behaviors and use these insights to better assess source trustworthiness.
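The formula can only be evaluated once a likelihood model for $P(\psi(o) \mid v)$ is fixed. The Python sketch below assumes a simple model that the text does not specify:

- sources are independent, and each has an accuracy;
- there is a hypothetical number `n_false` of equally likely false values;
- the prior $P(v)$ is uniform over the observed values;
- $P(\psi(o))$ is obtained by summing over those candidates.

The function name and the accuracies in the example are made up for illustration.

```python
def value_posteriors(observations, accuracy, n_false=10):
    """observations maps source -> value it provides for one data item o.

    Assumptions of this sketch: sources are independent; a source with
    accuracy A provides the true value with probability A and each of
    `n_false` possible false values with probability (1 - A) / n_false;
    the prior P(v) is uniform over the values actually observed, and the
    true value is assumed to be one of them.
    """
    candidates = sorted(set(observations.values()))
    likelihood = {}
    for v in candidates:  # P(psi(o) | v): v is the hypothesised true value
        p = 1.0
        for source, observed in observations.items():
            a = accuracy[source]
            p *= a if observed == v else (1 - a) / n_false
        likelihood[v] = p
    prior = 1 / len(candidates)
    evidence = sum(likelihood[v] * prior for v in candidates)  # P(psi(o))
    return {v: likelihood[v] * prior / evidence for v in candidates}


observations = {"S1": "1732-02-22", "S2": "1738-09-17", "S3": "1734-10-23"}
accuracy = {"S1": 0.9, "S2": 0.6, "S3": 0.6}  # hypothetical accuracies
post = value_posteriors(observations, accuracy)
print(round(post["1732-02-22"], 3))  # 0.75
```

Every birthdate has exactly one supporter, yet S1's value gets three quarters of the probability mass. The only difference is that S1 is assumed to be more accurate.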
Multi-truth methods

Due to its complexity, less attention has been devoted to the study of multi-truth discovery.

Below are two types of multi-truth methods and their characteristics.

Bayesian based

These methods use Bayesian inference to define the probability of a group of values being true conditioned on the values provided by all the data sources. In this case, values cannot be considered individually: there could be multiple true values for each data item, and sources can provide multiple values for a single data item. An alternative is to consider mappings and relations between sets of provided values and the sources providing them. The trustworthiness of a source is then computed based on the accuracy of the values that it provides.

More sophisticated methods also consider domain coverage and copying behaviors to better estimate source trustworthiness.
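To make "the probability of a group of values being true" concrete, the Python sketch below enumerates every non-empty group of candidate values for one data item. It then applies Bayes' rule with a uniform prior over groups. The per-source `recall` and `false_pos` rates, their values in the example, and the independence assumption in the likelihood are all hypothetical. Published multi-truth models are more elaborate, and exhaustive enumeration is only practical for a handful of candidates.

```python
from itertools import combinations


def group_posteriors(claims, recall, false_pos):
    """claims maps source -> set of values it provides for one data item.

    Returns P(T | claims) for every non-empty candidate group T. Assumptions
    of this sketch: a uniform prior over groups; given the true group T,
    each source claims each true value with probability recall[s] and each
    false candidate with probability false_pos[s], independently.
    """
    candidates = sorted(set().union(*claims.values()))
    groups = [frozenset(c) for k in range(1, len(candidates) + 1)
              for c in combinations(candidates, k)]
    scores = {}
    for group in groups:
        p = 1.0
        for source, provided in claims.items():
            for v in candidates:
                q = recall[source] if v in group else false_pos[source]
                p *= q if v in provided else 1 - q
        scores[group] = p
    total = sum(scores.values())  # the uniform prior cancels out here
    return {g: s / total for g, s in scores.items()}


claims = {
    "S1": {"Stephen Hawking", "Roger Penrose"},
    "S2": {"Stephen Hawking"},
    "S3": {"Roger Penrose"},
    "S4": {"J. K. Rowling"},
}
post = group_posteriors(claims, recall={s: 0.6 for s in claims},
                        false_pos={s: 0.1 for s in claims})
best = max(post, key=post.get)
print(sorted(best), round(post[best], 2))  # ['Roger Penrose', 'Stephen Hawking'] 0.51
```

With these made-up rates, the two real authors form the most probable group. S2 and S3 are not penalized for omitting a co-author as heavily as S4 is for asserting a value nobody else supports.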
Probabilistic Graphical Models based

These methods use probabilistic graphical models to automatically define the set of true values of a given data item and also to assess source quality without the need for any supervision.
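The graphical models used in practice are richer than can be shown here. However, the core idea of learning source quality and the true values together, without labels, can be sketched with a small expectation-maximization loop over a two-class naive Bayes model.

In this Python sketch, the following are all assumptions, and the code is not a reproduction of any model in the references:

- every function and parameter name;
- the initial rates and the `prior`;
- the clipping bounds;
- the rule that a source not asserting a candidate value counts as a vote against it.

```python
def clip(x, lo=0.01, hi=0.99):
    # Keep probabilities away from 0 and 1 so no factor can zero out a product.
    return min(hi, max(lo, x))


def em_truth(claims, facts, iters=50, prior=0.5):
    """claims maps source -> set of facts it asserts; facts lists all candidates.

    A source that does not assert a candidate fact is read as a vote against it.
    """
    sens = {s: 0.8 for s in claims}  # P(asserts f | f true), initial guess
    fpr = {s: 0.2 for s in claims}   # P(asserts f | f false), initial guess
    prob = {}
    for _ in range(iters):
        # E-step: posterior probability that each fact is true.
        for f in facts:
            p_true, p_false = prior, 1 - prior
            for s, asserted in claims.items():
                if f in asserted:
                    p_true, p_false = p_true * sens[s], p_false * fpr[s]
                else:
                    p_true, p_false = p_true * (1 - sens[s]), p_false * (1 - fpr[s])
            prob[f] = p_true / (p_true + p_false)
        # M-step: re-estimate each source's rates from the expected truth labels.
        n_true = max(sum(prob.values()), 1e-9)
        n_false = max(len(facts) - n_true, 1e-9)
        for s, asserted in claims.items():
            hit = sum(prob[f] for f in asserted)
            sens[s] = clip(hit / n_true)
            fpr[s] = clip((len(asserted) - hit) / n_false)
    return prob, sens, fpr


facts = ["Stephen Hawking", "Roger Penrose", "J. K. Rowling"]
claims = {
    "S1": {"Stephen Hawking", "Roger Penrose"},
    "S2": {"Stephen Hawking"},
    "S3": {"Roger Penrose"},
    "S4": {"J. K. Rowling"},
}
prob, sens, fpr = em_truth(claims, facts)
print({f: round(p, 2) for f, p in prob.items()})
```

Values whose probability exceeds 0.5 form the estimated set of true values. On the authors example, the two real authors end up with a high probability and J. K. Rowling with a low one, while S4 is learned to be an unreliable source.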
Applications

Many real-world applications can benefit from the use of truth discovery algorithms. Typical domains of application include healthcare, crowd/social sensing, crowdsourcing aggregation, information extraction and knowledge base construction.

Truth discovery algorithms could also be used to revolutionize the way in which web pages are ranked in search engines. Ranking would move from current methods based on link analysis, like PageRank, to procedures that rank web pages based on the accuracy of the information they provide.

See also
- Data Integration
- Information Integration
- Data Fusion (data integration)
- Data Quality

References
- Li, Yaliang; Gao, Jing; Meng, Chuishi; Li, Qi; Su, Lu; Zhao, Bo; Fan, Wei; Han, Jiawei (2016-02-25). "A Survey on Truth Discovery". ACM SIGKDD Explorations Newsletter. 17 (2): 1–16. doi:10.1145/2897350.2897352. S2CID 9060471.
- Wang, Xianzhi; Sheng, Quan Z.; Fang, Xiu Susie; Yao, Lina; Xu, Xiaofei; Li, Xue (2015). "An Integrated Bayesian Approach for Effective Multi-Truth Discovery". Proceedings of the 24th ACM International on Conference on Information and Knowledge Management. Melbourne, Australia: ACM Press. pp. 493–502. doi:10.1145/2806416.2806443. hdl:2440/110033. ISBN 9781450337946. S2CID 16207808.
- Lin, Xueling; Chen, Lei (2018). "Domain-aware Multi-truth Discovery from Conflicting Sources". VLDB Endowment. 11 (5): 635–647. doi:10.1145/3187009.3177739.
- Dong, Xin Luna; Srivastava, Divesh (2015-02-15). "Big Data Integration". Synthesis Lectures on Data Management. 7 (1): 1–198. doi:10.2200/S00578ED1V01Y201404DTM040. ISSN 2153-5418.
- Li, Xian; Dong, Xin Luna; Lyons, Kenneth; Meng, Weiyi; Srivastava, Divesh (2012-12-01). "Truth finding on the deep web: is the problem solved?". Proceedings of the VLDB Endowment. 6 (2): 97–108. arXiv:1503.00303. doi:10.14778/2535568.2448943. S2CID 3133027.
- Ng, Andrew Y; Jordan, Michael I. (2001). "On Discriminative vs. Generative Classifiers: A Comparison of Logistic Regression and Naive Bayes". Proceedings of the 14th International Conference on Neural Information Processing Systems: Natural and Synthetic: 841–848.
- Dong, Xin Luna; Berti-Equille, Laure; Srivastava, Divesh (2009-08-01). "Integrating conflicting data: the role of source dependence". Proceedings of the VLDB Endowment. 2 (1): 550–561. doi:10.14778/1687627.1687690. S2CID 9664056.
- Kleinberg, Jon M. (1999-09-01). "Authoritative sources in a hyperlinked environment". Journal of the ACM. 46 (5): 604–632. doi:10.1145/324133.324140. S2CID 221584113.
- Galland, Alban; Abiteboul, Serge; Marian, Amélie; Senellart, Pierre (2010). "Corroborating information from disagreeing views" (PDF). Proceedings of the third ACM international conference on Web search and data mining. New York, New York, USA: ACM Press. pp. 131–140. doi:10.1145/1718487.1718504. ISBN 9781605588896. S2CID 1761360.
- Xiaoxin Yin; Jiawei Han; Yu, P.S. (2008). "Truth Discovery with Multiple Conflicting Information Providers on the Web". IEEE Transactions on Knowledge and Data Engineering. 20 (6): 796–808. doi:10.1109/TKDE.2007.190745. ISSN 1041-4347.
- Zhao, Bo; Rubinstein, Benjamin I. P.; Gemmell, Jim; Han, Jiawei (2012-02-01). "A Bayesian approach to discovering truth from conflicting sources for data integration". Proceedings of the VLDB Endowment. 5 (6): 550–561. arXiv:1203.0058. doi:10.14778/2168651.2168656. S2CID 8837716.
- "The huge implications of Google's idea to rank sites based on their accuracy". www.washingtonpost.com. 2015.
Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.