Knowledge

Linear hashing

Source 📝

812:
their file state, a client calculates the address of a key and sends a request to that bucket. At the bucket, the request is checked and if the record is not at the bucket, it is forwarded. In a reasonably stable system, that is, if there is only one split or merge going on while the request is processed, it can be shown that there are at most two forwards. After a forward, the final bucket sends an Image Adjustment Message to the client whose state is now closer to the state of the distributed file. While forwards are reasonably rare for active clients, their number can be even further reduced by additional information exchange between servers and clients
47:. In LH*, each bucket resides at a different server. LH* itself has been expanded to provide data availability in the presence of failed buckets. Key based operations (inserts, deletes, updates, reads) in LH and LH* take maximum constant time independent of the number of buckets and hence of records. 55:
Records in LH or LH* consists of a key and a content, the latter basically all the other attributes of the record. They are stored in buckets. For example, in Ellis' implementation, a bucket is a linked list of records. The file allows the key based CRUD operations create or insert, read, update,
811:
The main contribution of LH* is to allow a client of an LH* file to find the bucket where the record resides even if the client does not know the file state. Clients in fact store their version of the file state, which is initially just the knowledge of the first bucket, namely Bucket 0. Based on
31:
The file structure of a dynamic hashing data structure adapts itself to changes in the size of the file, so expensive periodic file reorganization is avoided. A Linear Hashing file expands by splitting a predetermined bucket into two and shrinks by merging two predetermined buckets into one. The
726:
For example, if numerical records are inserted into the hash index according to their farthest right binary digits, the bucket corresponding with the appended bucket will be split. Thus, if we have the buckets labelled as 000, 001, 10, 11, 100, 101, we would split the bucket 10 because we are
27:
and grows or shrinks one bucket at a time. It was invented by Witold Litwin in 1980. It has been analyzed by Baeza-Yates and Soza-Pollman. It is the first in a number of schemes known as dynamic hashing such as Larson's Linear Hashing with Partial Extensions, Linear Hashing with Priority
617:
designated bucket is split. Instead of using the load factor, this threshold can also be expressed as an occupancy percentage, in which case, the maximum number of records in the hash index equals (occupancy percentage)*(max records per non-overflowed bucket)*(number of buckets).
36:(i.e., the number of records divided by the number of buckets) moving outside of a predetermined range. In Linear Hashing there are two types of buckets, those that are to be split and those already split. While extendible hashing splits only overflowing buckets, 631:
occurs in some LH algorithm implementations if a controlled split causes the load factor to sink below a threshold. In this case, a merge operation would be triggered which would undo the last split, and reset the file state.
59:
The key distinction from schemes such as Fagin's extendible hashing is that as the file expands due to insertions, only one bucket is split at a time, and the order in which buckets are split is already predetermined.
56:
and delete as well as a scan operations that scans all records, for example to do a database select operation on a non-key attribute. Records are stored in buckets whose numbering starts with 0.
609:, which is monitored by the file, exceeds a predetermined threshold. If the hash index uses controlled splitting, the buckets are allowed to overflow by using linked overflow blocks. When the 464: 40:(a.k.a. spiral storage) distributes records unevenly over the buckets such that buckets with high costs of insertion, deletion, or retrieval are earliest in line for a split. 730:
When a bucket is split, split pointer and possibly the level are updated according to the following, such that the level is 0 when the linear hashing index only has 1 bucket.
950: 329: 102: 721: 269: 209: 688: 379: 236: 176: 149: 889: 490: 909: 863: 843: 661: 513: 399: 352: 289: 122: 1073: 1187:
Ruchte, Willard; Tharp, Alan (Feb 1987), "Linear hashing with Priority Splitting: A method for improving the retrieval performance of linear hashing",
1009: 625:
occurs when a split is performed whenever a bucket overflows, in which case that bucket would be split into two separate buckets.
1407: 727:
appending and creating the next sequential bucket 110. This would give us the buckets 000, 001, 010, 11, 100, 101, 110.
968:
algorithm used in linear hashing, and presented performance comparisons using a list of Icon benchmark applications.
404: 1514: 1092: 1509: 1202:
Manolopoulos, Yannis; Lorentzos, N. (1994), "Performance of linear hashing schemes for primary key retrieval",
606: 33: 1264:
Litwin, Witold; Neimat, Marie-Anne; Schneider, Donavan A. (1993), "LH: Linear Hashing for distributed files",
32:
trigger for a reconstruction depends on the flavor of the scheme; it could be an overflow at a bucket or
1551: 28:
Splitting, Linear Hashing with Partial Expansions and Priority Splitting, or Recursive Linear Hashing.
401:
that are used to segregate the buckets. This dynamic hash function can be expressed arithmetically as
981: 599:
Linear hashing algorithms may use only controlled splits or both controlled and uncontrolled splits.
495:
Complete the calculations below to determine the correct hashing function for the given hashing key
916: 298: 71: 1515:
A C++ Implementation of Linear Hashtable which Supports Both Filesystem and In-Memory storage
693: 241: 181: 980:, which in turn is used by many software systems, using a C implementation derived from the 666: 357: 214: 154: 127: 8: 868: 469: 1531: 1526: 1484: 1443: 1378: 1340: 1281: 1247: 1170: 1131: 1084: 1052: 894: 848: 828: 646: 498: 384: 337: 274: 107: 1403: 1320: 1285: 1215: 1174: 1488: 1464: 1358: 1251: 1056: 1556: 1476: 1447: 1435: 1382: 1370: 1344: 1332: 1273: 1237: 1211: 1160: 1135: 1121: 1088: 1042: 1357:
Fagin, Ronald; Nievergelt, Jurg; Pippenger, Nicholas; Strong, Raymond (Sep 1979),
640:
The index of the next bucket to be split is part of the file state and called the
663:. The split pointer corresponds to the first bucket that uses the hash function 1536: 37: 1465:"The Design and Implementation of Dynamic Hashing for Sets and Tables in Icon" 1439: 43:
Linear Hashing has also been made into a scalable distributed data structure,
1545: 1460: 965: 961: 1336: 1480: 1228:
Ramamohanarao, K.; Sacks-Davis, R. (Sep 1984), "Recursive linear hashing",
1374: 1277: 104:
returns the 0-based index of the bucket that contains the record with key
1126: 977: 1165: 1047: 960:
Griswold and Townsend discussed the adoption of linear hashing in the
211:
for both of those new buckets. At any time, at most two hash functions
24: 1242: 1033:
Ellis, Carla Schlatter (June 1987), "Concurrency in Linear Hashing",
1504: 984:
article and first published on the Usenet in 1988 by Esmond Pitt.
1321:"LH*RS - a highly-available scalable distributed data structure" 1112:
Enbody, Richard; Du, HC (June 1988), "Dynamic hashing schemes",
381:
corresponds to the number of rightmost binary digits of the key
1398:
Silberschatz, Abraham; Korth, Henry F.; Sudarshan, S. (2020).
1356: 466:. Note that when the total number of buckets is equal to one, 1359:"Extendible Hashing - A Fast Access Method for Dynamic Files" 438: 1397: 1010:"Linear hashing: A new tool for file and table addressing" 1402:(Seventh ed.). New York, NY: McGraw-Hill Education. 1319:
Litwin, Witold; Moussa, Rim; Schwarz, Thomas (Sep 2005),
1227: 1189:
IEEE Third International Conference on Data Engineering
1071: 1426:
Chabkinian, Juan; Schwarz, Thomas (2016), "Fast LH*",
407: 1263: 1201: 1151:
Larson, Per-Åke (April 1988), "Dynamic Hash Tables",
919: 897: 871: 851: 831: 696: 669: 649: 501: 472: 387: 360: 340: 301: 277: 244: 217: 184: 157: 130: 110: 74: 964:. They discussed the implementation alternatives of 1318: 1072:Baeza-Yates, Ricardo; Soza-Pollman, Hector (1998), 944: 903: 883: 857: 837: 715: 682: 655: 507: 484: 458: 393: 373: 346: 331:is also referred to as the dynamic hash function. 323: 283: 263: 230: 203: 170: 143: 116: 96: 1425: 151:is split into two new buckets, the hash function 23:) is a dynamic data structure which implements a 1543: 1459: 1510:An in Memory Go Implementation with Explanation 1505:TommyDS, C implementation of a Linear Hashtable 971: 955: 459:{\textstyle h_{i}(c)\mapsto (c{\bmod {2}}^{i})} 1428:International Journal of Parallel Programming 124:. When a bucket which uses the hash function 1017:Proc. 6th Conference on Very Large Databases 605:occurs if a split is performed whenever the 1186: 1241: 1164: 1125: 1046: 825:The file state consists of split pointer 820: 1111: 1314: 1028: 1026: 1544: 1312: 1310: 1308: 1306: 1304: 1302: 1300: 1298: 1296: 1294: 1150: 1007: 1003: 1001: 999: 997: 738:# s represents the split pointer index 523:# s represents the split pointer index 1421: 1419: 1393: 1391: 1350: 1180: 1032: 1363:ACM Transactions on Database Systems 1325:ACM Transactions on Database Systems 1146: 1144: 1107: 1105: 1074:"Analysis of Linear Hashing Revised" 1067: 1065: 1035:ACM Transactions on Database Systems 1023: 891:buckets, then the number of buckets 865:. If the original file started with 50: 1463:; Townsend, Gregg M. (April 1993), 1291: 1257: 994: 911:and the file state are related via 815: 13: 1416: 1388: 14: 1568: 1498: 1469:Software: Practice and Experience 1221: 1141: 1102: 1062: 63: 735:# l represents the current level 635: 594: 520:# l represents the current level 1453: 613:surpasses a set threshold, the 295:. The family of hash functions 1195: 978:Berkeley database system (BDB) 976:Linear hashing is used in the 453: 430: 427: 424: 418: 318: 312: 91: 85: 1: 1230:ACM Transactions on Databases 987: 1216:10.1016/0306-4379(94)90005-1 972:Adoption in database systems 956:Adoption in language systems 7: 1520: 1081:Nordic Journal of Computing 291:corresponds to the current 10: 1573: 1440:10.1007/s10766-015-0371-8 1153:Communications of the ACM 945:{\displaystyle n=2^{l}+s} 1400:Database system concepts 732: 517: 334:Typically, the value of 324:{\displaystyle h_{i}(c)} 97:{\displaystyle h_{i}(c)} 1337:10.1145/1093382.1093386 1008:Litwin, Witold (1980), 716:{\displaystyle h_{l+1}} 264:{\displaystyle h_{l+1}} 204:{\displaystyle h_{i+1}} 1481:10.1002/spe.4380230402 946: 905: 885: 859: 839: 821:File state calculation 806: 717: 684: 657: 509: 486: 460: 395: 375: 348: 325: 285: 265: 232: 205: 172: 145: 118: 98: 1375:10.1145/320083.320092 1278:10.1145/170036.170084 1114:ACM Computing Surveys 947: 906: 886: 860: 840: 718: 685: 683:{\displaystyle h_{l}} 658: 510: 487: 461: 396: 376: 374:{\displaystyle h_{i}} 349: 326: 286: 266: 233: 231:{\displaystyle h_{l}} 206: 173: 171:{\displaystyle h_{i}} 146: 144:{\displaystyle h_{i}} 119: 99: 1461:Griswold, William G. 1127:10.1145/46157.330532 917: 895: 869: 849: 829: 694: 667: 647: 603:Controlled splitting 499: 470: 405: 385: 358: 338: 299: 275: 271:are used; such that 242: 215: 182: 155: 128: 108: 72: 1204:Information Systems 1166:10.1145/42404.42410 1048:10.1145/22952.22954 884:{\displaystyle N=1} 485:{\displaystyle i=0} 1532:Consistent hashing 1527:Extendible hashing 942: 901: 881: 855: 835: 713: 680: 653: 623:uncontrolled split 505: 482: 456: 391: 371: 344: 321: 281: 261: 228: 201: 168: 141: 114: 94: 68:The hash function 1552:Search algorithms 1409:978-1-260-08450-4 1266:ACM SIGMOD Record 1243:10.1145/1270.1285 904:{\displaystyle n} 858:{\displaystyle l} 838:{\displaystyle s} 656:{\displaystyle s} 508:{\displaystyle c} 394:{\displaystyle c} 347:{\displaystyle i} 284:{\displaystyle l} 178:is replaced with 117:{\displaystyle c} 51:Algorithm details 1564: 1492: 1491: 1457: 1451: 1450: 1423: 1414: 1413: 1395: 1386: 1385: 1354: 1348: 1347: 1316: 1289: 1288: 1261: 1255: 1254: 1245: 1225: 1219: 1218: 1199: 1193: 1192: 1184: 1178: 1177: 1168: 1148: 1139: 1138: 1129: 1109: 1100: 1099: 1097: 1091:, archived from 1078: 1069: 1060: 1059: 1050: 1030: 1021: 1020: 1014: 1005: 951: 949: 948: 943: 935: 934: 910: 908: 907: 902: 890: 888: 887: 882: 864: 862: 861: 856: 844: 842: 841: 836: 816:Other properties 802: 799: 796: 793: 790: 787: 784: 781: 778: 775: 772: 769: 766: 763: 760: 757: 754: 751: 748: 745: 742: 739: 736: 722: 720: 719: 714: 712: 711: 689: 687: 686: 681: 679: 678: 662: 660: 659: 654: 629:File contraction 590: 587: 584: 581: 578: 575: 572: 569: 566: 563: 560: 557: 554: 551: 548: 545: 542: 539: 536: 533: 530: 527: 524: 521: 514: 512: 511: 506: 491: 489: 488: 483: 465: 463: 462: 457: 452: 451: 446: 445: 417: 416: 400: 398: 397: 392: 380: 378: 377: 372: 370: 369: 353: 351: 350: 345: 330: 328: 327: 322: 311: 310: 290: 288: 287: 282: 270: 268: 267: 262: 260: 259: 237: 235: 234: 229: 227: 226: 210: 208: 207: 202: 200: 199: 177: 175: 174: 169: 167: 166: 150: 148: 147: 142: 140: 139: 123: 121: 120: 115: 103: 101: 100: 95: 84: 83: 1572: 1571: 1567: 1566: 1565: 1563: 1562: 1561: 1542: 1541: 1523: 1501: 1496: 1495: 1458: 1454: 1424: 1417: 1410: 1396: 1389: 1355: 1351: 1317: 1292: 1262: 1258: 1226: 1222: 1200: 1196: 1185: 1181: 1149: 1142: 1110: 1103: 1095: 1076: 1070: 1063: 1031: 1024: 1012: 1006: 995: 990: 974: 958: 930: 926: 918: 915: 914: 896: 893: 892: 870: 867: 866: 850: 847: 846: 830: 827: 826: 823: 818: 809: 804: 803: 800: 797: 794: 791: 788: 785: 782: 779: 776: 773: 770: 767: 764: 761: 758: 755: 752: 749: 746: 743: 740: 737: 734: 701: 697: 695: 692: 691: 674: 670: 668: 665: 664: 648: 645: 644: 638: 615:split pointer's 597: 592: 591: 588: 585: 582: 579: 576: 573: 570: 567: 564: 561: 558: 555: 552: 549: 546: 543: 540: 537: 534: 531: 528: 525: 522: 519: 500: 497: 496: 471: 468: 467: 447: 441: 437: 436: 412: 408: 406: 403: 402: 386: 383: 382: 365: 361: 359: 356: 355: 339: 336: 335: 306: 302: 300: 297: 296: 276: 273: 272: 249: 245: 243: 240: 239: 222: 218: 216: 213: 212: 189: 185: 183: 180: 179: 162: 158: 156: 153: 152: 135: 131: 129: 126: 125: 109: 106: 105: 79: 75: 73: 70: 69: 66: 53: 12: 11: 5: 1570: 1560: 1559: 1554: 1540: 1539: 1537:Spiral Hashing 1534: 1529: 1522: 1519: 1518: 1517: 1512: 1507: 1500: 1499:External links 1497: 1494: 1493: 1475:(4): 351–367, 1452: 1434:(4): 709–734, 1415: 1408: 1387: 1369:(2): 315–344, 1349: 1331:(3): 769–811, 1290: 1272:(2): 327–336, 1256: 1236:(3): 369–391, 1220: 1210:(5): 433–446, 1194: 1179: 1159:(4): 446–457, 1140: 1101: 1061: 1041:(2): 195–217, 1022: 992: 991: 989: 986: 973: 970: 957: 954: 941: 938: 933: 929: 925: 922: 900: 880: 877: 874: 854: 834: 822: 819: 817: 814: 808: 805: 733: 710: 707: 704: 700: 677: 673: 652: 637: 634: 596: 593: 518: 504: 481: 478: 475: 455: 450: 444: 440: 435: 432: 429: 426: 423: 420: 415: 411: 390: 368: 364: 343: 320: 317: 314: 309: 305: 280: 258: 255: 252: 248: 225: 221: 198: 195: 192: 188: 165: 161: 138: 134: 113: 93: 90: 87: 82: 78: 65: 64:Hash functions 62: 52: 49: 38:spiral hashing 17:Linear hashing 9: 6: 4: 3: 2: 1569: 1558: 1555: 1553: 1550: 1549: 1547: 1538: 1535: 1533: 1530: 1528: 1525: 1524: 1516: 1513: 1511: 1508: 1506: 1503: 1502: 1490: 1486: 1482: 1478: 1474: 1470: 1466: 1462: 1456: 1449: 1445: 1441: 1437: 1433: 1429: 1422: 1420: 1411: 1405: 1401: 1394: 1392: 1384: 1380: 1376: 1372: 1368: 1364: 1360: 1353: 1346: 1342: 1338: 1334: 1330: 1326: 1322: 1315: 1313: 1311: 1309: 1307: 1305: 1303: 1301: 1299: 1297: 1295: 1287: 1283: 1279: 1275: 1271: 1267: 1260: 1253: 1249: 1244: 1239: 1235: 1231: 1224: 1217: 1213: 1209: 1205: 1198: 1190: 1183: 1176: 1172: 1167: 1162: 1158: 1154: 1147: 1145: 1137: 1133: 1128: 1123: 1120:(2): 85–113, 1119: 1115: 1108: 1106: 1098:on 2019-03-07 1094: 1090: 1086: 1082: 1075: 1068: 1066: 1058: 1054: 1049: 1044: 1040: 1036: 1029: 1027: 1018: 1011: 1004: 1002: 1000: 998: 993: 985: 983: 979: 969: 967: 966:dynamic array 963: 962:Icon language 953: 939: 936: 931: 927: 923: 920: 912: 898: 878: 875: 872: 852: 832: 813: 731: 728: 724: 708: 705: 702: 698: 675: 671: 650: 643: 642:split pointer 636:Split pointer 633: 630: 626: 624: 619: 616: 612: 608: 604: 600: 595:Split control 516: 502: 493: 479: 476: 473: 448: 442: 433: 421: 413: 409: 388: 366: 362: 341: 332: 315: 307: 303: 294: 278: 256: 253: 250: 246: 223: 219: 196: 193: 190: 186: 163: 159: 136: 132: 111: 88: 80: 76: 61: 57: 48: 46: 41: 39: 35: 29: 26: 22: 18: 1472: 1468: 1455: 1431: 1427: 1399: 1366: 1362: 1352: 1328: 1324: 1269: 1265: 1259: 1233: 1229: 1223: 1207: 1203: 1197: 1188: 1182: 1156: 1152: 1117: 1113: 1093:the original 1080: 1038: 1034: 1016: 975: 959: 913: 824: 810: 729: 725: 641: 639: 628: 627: 622: 620: 614: 610: 602: 601: 598: 494: 333: 292: 67: 58: 54: 44: 42: 30: 20: 16: 15: 690:instead of 611:load factor 607:load factor 34:load factor 1546:Categories 988:References 845:and level 25:hash table 1286:259938726 1175:207548097 1083:: 70–85, 1019:: 212–223 428:↦ 1521:See also 1489:11595927 1252:18577730 1057:14260177 1557:Hashing 1448:7448240 1383:2723596 1345:1802386 1136:1437123 1089:7497598 1487:  1446:  1406:  1381:  1343:  1284:  1250:  1173:  1134:  1087:  1055:  1485:S2CID 1444:S2CID 1379:S2CID 1341:S2CID 1282:S2CID 1248:S2CID 1191:: 2–9 1171:S2CID 1132:S2CID 1096:(PDF) 1085:S2CID 1077:(PDF) 1053:S2CID 1013:(PDF) 765:>= 293:level 1404:ISBN 982:CACM 553:< 238:and 1477:doi 1436:doi 1371:doi 1333:doi 1274:doi 1238:doi 1212:doi 1161:doi 1122:doi 1043:doi 807:LH* 621:An 532:h_l 439:mod 354:in 45:LH* 1548:: 1483:, 1473:23 1471:, 1467:, 1442:, 1432:44 1430:, 1418:^ 1390:^ 1377:, 1365:, 1361:, 1339:, 1329:30 1327:, 1323:, 1293:^ 1280:, 1270:22 1268:, 1246:, 1232:, 1208:19 1206:, 1169:, 1157:31 1155:, 1143:^ 1130:, 1118:20 1116:, 1104:^ 1079:, 1064:^ 1051:, 1039:12 1037:, 1025:^ 1015:, 996:^ 952:. 777:): 756:if 723:. 583:}( 568:h_ 559:): 544:if 492:. 21:LH 1479:: 1438:: 1412:. 1373:: 1367:4 1335:: 1276:: 1240:: 1234:9 1214:: 1163:: 1124:: 1045:: 940:s 937:+ 932:l 928:2 924:= 921:n 899:n 879:1 876:= 873:N 853:l 833:s 801:0 798:= 795:s 792:1 789:+ 786:l 783:= 780:l 774:l 771:^ 768:2 762:s 759:( 753:1 750:+ 747:s 744:= 741:s 709:1 706:+ 703:l 699:h 676:l 672:h 651:s 589:) 586:c 580:1 577:+ 574:l 571:{ 565:= 562:a 556:s 550:a 547:( 541:) 538:c 535:( 529:= 526:a 515:. 503:c 480:0 477:= 474:i 454:) 449:i 443:2 434:c 431:( 425:) 422:c 419:( 414:i 410:h 389:c 367:i 363:h 342:i 319:) 316:c 313:( 308:i 304:h 279:l 257:1 254:+ 251:l 247:h 224:l 220:h 197:1 194:+ 191:i 187:h 164:i 160:h 137:i 133:h 112:c 92:) 89:c 86:( 81:i 77:h 19:(

Index

hash table
load factor
spiral hashing
load factor
Icon language
dynamic array
Berkeley database system (BDB)
CACM




"Linear hashing: A new tool for file and table addressing"


doi
10.1145/22952.22954
S2CID
14260177


"Analysis of Linear Hashing Revised"
S2CID
7497598
the original


doi
10.1145/46157.330532
S2CID

Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.