Data integrity

Data integrity is the maintenance of, and the assurance of, data accuracy and consistency over its entire life-cycle. It is a critical aspect to the design, implementation, and usage of any system that stores, processes, or retrieves data. The term is broad in scope and may have widely different meanings depending on the specific context even under the same general umbrella of computing. It is at times used as a proxy term for data quality, while data validation is a prerequisite for data integrity.

Definition

Data integrity is the opposite of data corruption. The overall intent of any data integrity technique is the same: ensure data is recorded exactly as intended (such as a database correctly rejecting mutually exclusive possibilities). Moreover, upon later retrieval, ensure the data is the same as when it was originally recorded. In short, data integrity aims to prevent unintentional changes to information. Data integrity is not to be confused with data security, the discipline of protecting data from unauthorized parties.

Any unintended changes to data as the result of a storage, retrieval or processing operation, including malicious intent, unexpected hardware failure, and human error, is failure of data integrity. If the changes are the result of unauthorized access, it may also be a failure of data security. Depending on the data involved this could manifest itself as benign as a single pixel in an image appearing a different color than was originally recorded, to the loss of vacation pictures or a business-critical database, to even catastrophic loss of human life in a life-critical system.

Integrity types

Physical integrity

Physical integrity deals with challenges which are associated with correctly storing and fetching the data itself. Challenges with physical integrity may include electromechanical faults, design flaws, material fatigue, corrosion, power outages, natural disasters, and other special environmental hazards such as ionizing radiation, extreme temperatures, pressures and g-forces. Ensuring physical integrity includes methods such as redundant hardware, an uninterruptible power supply, certain types of RAID arrays, radiation hardened chips, error-correcting memory, use of a clustered file system, using file systems that employ block level checksums such as ZFS, storage arrays that compute parity calculations such as exclusive or or use a cryptographic hash function and even having a watchdog timer on critical subsystems.

Physical integrity often makes extensive use of error detecting algorithms known as error-correcting codes. Human-induced data integrity errors are often detected through the use of simpler checks and algorithms, such as the Damm algorithm or Luhn algorithm. These are used to maintain data integrity after manual transcription from one computer system to another by a human intermediary (e.g. credit card or bank routing numbers). Computer-induced transcription errors can be detected through hash functions.

In production systems, these techniques are used together to ensure various degrees of data integrity. For example, a computer file system may be configured on a fault-tolerant RAID array, but might not provide block-level checksums to detect and prevent silent data corruption. As another example, a database management system might be compliant with the ACID properties, but the RAID controller or hard disk drive's internal write cache might not be.

Logical integrity

See also: Mutex and Copy-on-write

This type of integrity is concerned with the correctness or rationality of a piece of data, given a particular context. This includes topics such as referential integrity and entity integrity in a relational database or correctly ignoring impossible sensor data in robotic systems. These concerns involve ensuring that the data "makes sense" given its environment. Challenges include software bugs, design flaws, and human errors. Common methods of ensuring logical integrity include things such as check constraints, foreign key constraints, program assertions, and other run-time sanity checks.

Physical and logical integrity often share many challenges such as human errors and design flaws, and both must appropriately deal with concurrent requests to record and retrieve data, the latter of which is entirely a subject on its own.

If a data sector only has a logical error, it can be reused by overwriting it with new data. In case of a physical error, the affected data sector is permanently unusable.

Databases

Data integrity contains guidelines for data retention, specifying or guaranteeing the length of time data can be retained in a particular database (typically a relational database). To achieve data integrity, these rules are consistently and routinely applied to all data entering the system, and any relaxation of enforcement could cause errors in the data. Implementing checks on the data as close as possible to the source of input (such as human data entry), causes less erroneous data to enter the system. Strict enforcement of data integrity rules results in lower error rates, and time saved troubleshooting and tracing erroneous data and the errors it causes to algorithms.

Data integrity also includes rules defining the relations a piece of data can have to other pieces of data, such as a Customer record being allowed to link to purchased Products, but not to unrelated data such as Corporate Assets. Data integrity often includes checks and correction for invalid data, based on a fixed schema or a predefined set of rules. An example being textual data entered where a date-time value is required. Rules for data derivation are also applicable, specifying how a data value is derived based on algorithm, contributors and conditions. It also specifies the conditions on how the data value could be re-derived.

Types of integrity constraints

Data integrity is normally enforced in a database system by a series of integrity constraints or rules. Three types of integrity constraints are an inherent part of the relational data model: entity integrity, referential integrity and domain integrity.

Entity integrity concerns the concept of a primary key. Entity integrity is an integrity rule which states that every table must have a primary key and that the column or columns chosen to be the primary key should be unique and not null.

Referential integrity concerns the concept of a foreign key. The referential integrity rule states that any foreign-key value can only be in one of two states. The usual state of affairs is that the foreign-key value refers to a primary key value of some table in the database. Occasionally, and this will depend on the rules of the data owner, a foreign-key value can be null. In this case, we are explicitly saying that either there is no relationship between the objects represented in the database or that this relationship is unknown.

Domain integrity specifies that all columns in a relational database must be declared upon a defined domain. The primary unit of data in the relational data model is the data item. Such data items are said to be non-decomposable or atomic. A domain is a set of values of the same type. Domains are therefore pools of values from which actual values appearing in the columns of a table are drawn.

User-defined integrity refers to a set of rules specified by a user, which do not belong to the entity, domain and referential integrity categories.

If a database supports these features, it is the responsibility of the database to ensure data integrity as well as the consistency model for the data storage and retrieval. If a database does not support these features, it is the responsibility of the applications to ensure data integrity while the database supports the consistency model for the data storage and retrieval.

Having a single, well-controlled, and well-defined data-integrity system increases:

stability (one centralized system performs all data integrity operations)
performance (all data integrity operations are performed in the same tier as the consistency model)
re-usability (all applications benefit from a single centralized data integrity system)
maintainability (one centralized system for all data integrity administration).

Modern databases support these features (see Comparison of relational database management systems), and it has become the de facto responsibility of the database to ensure data integrity. Companies, and indeed many database systems, offer products and services to migrate legacy systems to modern databases.

Examples

An example of a data-integrity mechanism is the parent-and-child relationship of related records. If a parent record owns one or more related child records all of the referential integrity processes are handled by the database itself, which automatically ensures the accuracy and integrity of the data so that no child record can exist without a parent (also called being orphaned) and that no parent loses their child records. It also ensures that no parent record can be deleted while the parent record owns any child records. All of this is handled at the database level and does not require coding integrity checks into each application.

File systems

Various research results show that neither widespread filesystems (including UFS, Ext, XFS, JFS and NTFS) nor hardware RAID solutions provide sufficient protection against data integrity problems.

Some filesystems (including Btrfs and ZFS) provide internal data and metadata checksumming that is used for detecting silent data corruption and improving data integrity. If a corruption is detected that way and internal RAID mechanisms provided by those filesystems are also used, such filesystems can additionally reconstruct corrupted data in a transparent way. This approach allows improved data integrity protection covering the entire data paths, which is usually known as end-to-end data protection.

Data integrity as applied to various industries

The U.S. Food and Drug Administration has created draft guidance on data integrity for the pharmaceutical manufacturers required to adhere to U.S. Code of Federal Regulations 21 CFR Parts 210–212. Outside the U.S., similar data integrity guidance has been issued by the United Kingdom (2015), Switzerland (2016), and Australia (2017).

Various standards for the manufacture of medical devices address data integrity either directly or indirectly, including ISO 13485, ISO 14155, and ISO 5840.

In early 2017, the Financial Industry Regulatory Authority (FINRA), noting data integrity problems with automated trading and money movement surveillance systems, stated it would make "the development of a data integrity program to monitor the accuracy of the submitted data" a priority. In early 2018, FINRA said it would expand its approach on data integrity to firms' "technology change management policies and procedures" and Treasury securities reviews.

Other sectors such as mining and product manufacturing are increasingly focusing on the importance of data integrity in associated automation and production monitoring assets.

Cloud storage providers have long faced significant challenges ensuring the integrity or provenance of customer data and tracking violations.

See also

End-to-end data integrity
Message authentication
National Information Assurance Glossary
Single version of the truth
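
The Luhn algorithm, named in this article as one of the simpler checks used against human transcription errors in values such as credit card numbers, can be illustrated with a short sketch. The function name `luhn_is_valid` and the decision to ignore spaces and hyphens are assumptions made for this example, not part of any standard API.

```python
def luhn_is_valid(number: str) -> bool:
    """Return True if the digit string passes the Luhn check.

    Hypothetical helper for illustration; spaces and hyphens are
    ignored, any other non-digit character makes the input invalid.
    """
    cleaned = number.replace(" ", "").replace("-", "")
    if not cleaned or not (cleaned.isascii() and cleaned.isdigit()):
        return False

    total = 0
    # Walk from the rightmost digit; every second digit is doubled,
    # and a doubled value above 9 has 9 subtracted (same as summing
    # its two decimal digits).
    for position, char in enumerate(reversed(cleaned)):
        digit = int(char)
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


if __name__ == "__main__":
    print(luhn_is_valid("79927398713"))  # commonly used valid test number
    print(luhn_is_valid("79927398731"))  # last two digits swapped
```

A check of this kind catches any single mistyped digit and most swaps of adjacent digits, which is why it suits manual data entry. It offers no protection against deliberate tampering.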

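The combination of block-level checksums and parity described in this article (detect silent corruption with a checksum, then reconstruct the damaged block from redundancy) can be sketched as a toy model. This is not how ZFS, Btrfs, or any RAID controller is implemented; the block size, the function names, and the use of SHA-256 with a single XOR parity block are all assumptions chosen to keep the example small.

```python
import hashlib
from functools import reduce

BLOCK_SIZE = 8  # hypothetical, unrealistically small block size


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """Split data into fixed-size blocks, zero-padding the last one."""
    padded = data + b"\x00" * (-len(data) % size)
    return [padded[i:i + size] for i in range(0, len(padded), size)]


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise exclusive or of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def checksum(block: bytes) -> str:
    """Per-block checksum, stored separately from the block itself."""
    return hashlib.sha256(block).hexdigest()


def find_corrupt(blocks: list[bytes], checksums: list[str]) -> list[int]:
    """Return indexes of blocks whose content no longer matches its checksum."""
    return [i for i, (blk, chk) in enumerate(zip(blocks, checksums))
            if checksum(blk) != chk]


def repair(blocks: list[bytes], parity: bytes, bad_index: int) -> bytes:
    """Rebuild one damaged block by XOR-ing the parity with all other blocks."""
    others = [blk for i, blk in enumerate(blocks) if i != bad_index]
    return reduce(xor_blocks, others, parity)


if __name__ == "__main__":
    blocks = split_blocks(b"vacation pictures, 2024")
    parity = reduce(xor_blocks, blocks)
    checksums = [checksum(blk) for blk in blocks]

    # Simulate silent corruption: flip one bit in block 1.
    original = blocks[1]
    blocks[1] = bytes([original[0] ^ 0x01]) + original[1:]

    for index in find_corrupt(blocks, checksums):
        blocks[index] = repair(blocks, parity, index)
    print(blocks[1] == original)
```

The checksum only detects the damage; the repair is possible because the parity block adds redundancy. With a single parity block, only one damaged block per stripe can be rebuilt.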
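
The database constraints discussed in this article (a primary key for entity integrity, a foreign key for referential integrity, and a check constraint as a user-defined rule) can be tried out with Python's built-in `sqlite3` module. The table and column names and the helper `try_sql` are hypothetical. SQLite is used only because it ships with Python; note that it enforces foreign keys only after `PRAGMA foreign_keys = ON`, and that the `CHECK` here is a simple range rule, not a full domain definition.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a parent and a child table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.executescript("""
        CREATE TABLE customer (
            id   INTEGER PRIMARY KEY,           -- entity integrity
            name TEXT NOT NULL
        );
        CREATE TABLE purchase (
            id          INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES customer(id),  -- referential integrity, NULL allowed
            quantity    INTEGER NOT NULL CHECK (quantity > 0)  -- user-defined rule
        );
    """)
    return conn


def try_sql(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> bool:
    """Run one statement; return False if the database rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


if __name__ == "__main__":
    db = make_db()
    print(try_sql(db, "INSERT INTO customer VALUES (1, 'Ada')"))
    print(try_sql(db, "INSERT INTO purchase VALUES (1, 1, 2)"))
    # Orphaned child: no customer 99 exists.
    print(try_sql(db, "INSERT INTO purchase VALUES (2, 99, 1)"))
    # Parent still owns a child record, so the delete is refused.
    print(try_sql(db, "DELETE FROM customer WHERE id = 1"))
```

Because the rules live in the schema, every application that connects with foreign keys enabled gets the same checks without repeating them in its own code, which is the centralization argument the article makes.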

Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.