Knowledge

Hash join

Source 📝

339:
via a hash function, and writing these partitions out to disk. The algorithm then loads pairs of partitions into memory, builds a hash table for the smaller partitioned relation, and probes the other relation for matches with the current hash table. Because the partitions were formed by hashing on
352:
The hybrid hash join algorithm is a combination of the classical hash join and grace hash join. It uses minimal amount of memory for partitioning like in grace hash join and uses the remaining memory to initialize a classical hash join during partitioning phase. During the partitioning phase, the
447:
Note that this algorithm is memory-sensitive, because there are two competing demands for memory (the hash table for partition 0, and the output buffers for the remaining partitions). Choosing too large a hash table for partition 0 might cause the algorithm to recurse because one of the non-zero
343:
It is possible that one or more of the partitions still does not fit into the available memory, in which case the algorithm is recursively applied: an additional orthogonal hash function is chosen to hash the large partition into sub-partitions, which are then processed as before. Since this is
456:
Hash joins can also be evaluated for an anti-join predicate (a predicate selecting values from one table when no related values are found in the other). Depending on the sizes of the tables, different algorithms can be applied:
532:
Hash semi-join is used to return the records found in the other table. Unlike the plain join, it returns each matching record from the leading table only once, regardless of how many matches there are in the
99:
This algorithm is simple, but it requires that the smaller join relation fits into memory, which is sometimes not the case. A simple approach to handling this situation proceeds as follows:
38:
from the tuples of one or both of the joined relations, and subsequently probing those tables so that only tuples with the same hash code need to be compared for equality in equijoins.
444:
fits nearly fully into memory hybrid hash join has a similar behavior like the classical hash join which is more beneficial. Otherwise hybrid hash join imitates grace hash join.
442: 418: 395: 375: 337: 317: 297: 266: 238: 212: 189: 163: 141: 121: 69:
using the contents of one relation, ideally whichever one is smaller after applying local predicates. This relation is called the build side of the join. The
344:
expensive, the algorithm tries to reduce the chance that it will occur by forming the smallest partitions possible during the initial partitioning phase.
80:
is built, scan the other relation (the probe side). For each row of the probe relation, find the relevant rows from the build relation by looking in the
648:
DeWitt, D.J.; Katz, R.; Olken, F.; Shapiro, L.; Stonebraker, M.; Wood, D. (June 1984). "Implementation techniques for main memory database systems".
96:. Similarly, the join relation on which the hash table is built is called the "build" input, whereas the other input is called the "probe" input. 73:
entries are mappings from the value of the (composite) join attribute to the remaining attributes of that row (whichever ones are needed).
49:
comparing records from one table with those from the other table using a conjunction of equality operators '=' on one or more columns).
41:
Hash joins are typically more efficient than nested loops joins, except when the probe side of the join is very small. They require an
424:
Because partition 0 is never written to disk, hybrid hash join typically performs fewer I/O operations than grace hash join. When
276:
A better approach is known as the "grace hash join", after the GRACE database machine for which it was first implemented.
683: 560:
The records are returned right after they produced a hit. The actual records from the hash table are ignored.
679: 477:
Scan the other table, selecting any rows where the join attribute hashes to an empty entry in the hash table.
31: 726: 340:
the join key, it must be the case that any join output tuples must belong to the same partition.
8: 619: 46: 28: 427: 403: 380: 360: 322: 302: 282: 251: 223: 197: 174: 148: 126: 106: 245: 721: 657: 624: 58: 629: 591:
table, returning the corresponding records from the hash table and removing them.
509:
table, removing the corresponding records from the hash table on each hash hit.
715: 599:
table) can only be returned once, since it is removed after being returned.
661: 694: 467: 81: 77: 70: 66: 35: 24: 42: 647: 556:
Scan the other table, returning any rows that produce a hash hit.
168:
If the size of the hash table equals the maximum in-memory size:
595:
With this algorithm, each record from the hash table (that is,
540:
As with the anti-join, semi-join can also be left and right:
353:
hybrid hash join uses the available memory for two purposes:
684:"An Adaptive Hash Join Algorithm for Multiuser Environments" 194:
Reset the hash table, and continue scanning the build input
240:
and add the resulting join tuples to the output relation
34:. All variants of hash join algorithms involve building 677: 430: 406: 383: 363: 325: 305: 285: 254: 226: 200: 191:, and add matching join tuples to the output relation 177: 151: 129: 109: 436: 412: 389: 369: 331: 311: 291: 260: 232: 206: 183: 157: 135: 115: 713: 512:Return everything that left in the hash table. 448:partitions is too large to fit into memory. 279:This algorithm avoids rescanning the entire 248:join algorithm. This algorithm may scan 691:Proceedings of the 16th VLDB conference 574: 492: 57:The classic hash join algorithm for an 27:and is used in the implementation of a 714: 543: 460: 88:The first phase is usually called the 61:of two relations proceeds as follows: 16:Algorithm used in relational databases 299:relation by first partitioning both 244:This is essentially the same as the 52: 693:. Brisbane: 186–197. Archived from 347: 220:Do a final scan of the probe input 13: 271: 14: 738: 671: 527: 451: 420:in-memory, known as "partition 0" 400:To hold an entire partition from 92:, while the second is called the 602:This is more efficient when the 563:This is more efficient when the 516:This is more efficient when the 481:This is more efficient when the 641: 1: 635: 580:Prepare a hash table for the 549:Prepare a hash table for the 498:Prepare a hash table for the 357:To partition both relations 7: 613: 268:more times than necessary. 165:to the in-memory hash table 10: 743: 567:table is smaller than the 485:table is smaller than the 32:database management system 606:table is larger than the 520:table is larger than the 438: 414: 391: 371: 333: 313: 293: 262: 234: 208: 185: 159: 137: 117: 662:10.1145/971697.602261 650:Proc. ACM SIGMOD Conf 439: 415: 392: 372: 334: 314: 294: 263: 235: 209: 186: 171:Scan the probe input 160: 138: 118: 575:Hash right semi-join 493:Hash right anti-join 428: 404: 381: 361: 323: 303: 283: 252: 224: 198: 175: 149: 127: 107: 620:Symmetric hash join 544:Hash left semi-join 461:Hash left anti-join 123:in the build input 23:is an example of a 434: 410: 387: 367: 329: 309: 289: 258: 230: 204: 181: 155: 133: 113: 678:Hansjörg Zeller; 584:side of the join. 553:side of the join. 502:side of the join. 474:side of the join. 437:{\displaystyle R} 413:{\displaystyle R} 390:{\displaystyle S} 370:{\displaystyle R} 332:{\displaystyle S} 312:{\displaystyle R} 292:{\displaystyle S} 261:{\displaystyle S} 246:block nested loop 233:{\displaystyle S} 207:{\displaystyle R} 184:{\displaystyle S} 158:{\displaystyle r} 136:{\displaystyle R} 116:{\displaystyle r} 65:First, prepare a 53:Classic hash join 734: 708: 706: 705: 699: 688: 666: 665: 645: 625:Nested loop join 443: 441: 440: 435: 419: 417: 416: 411: 396: 394: 393: 388: 376: 374: 373: 368: 348:Hybrid hash join 338: 336: 335: 330: 318: 316: 315: 310: 298: 296: 295: 290: 267: 265: 264: 259: 239: 237: 236: 231: 213: 211: 210: 205: 190: 188: 187: 182: 164: 162: 161: 156: 142: 140: 139: 134: 122: 120: 119: 114: 742: 741: 737: 736: 735: 733: 732: 731: 727:Join algorithms 712: 711: 703: 701: 697: 686: 674: 669: 646: 642: 638: 630:Sort-merge join 616: 577: 546: 530: 495: 463: 454: 429: 426: 425: 405: 402: 401: 382: 379: 378: 362: 359: 358: 350: 324: 321: 320: 304: 301: 300: 284: 281: 280: 274: 272:Grace hash join 253: 250: 249: 225: 222: 221: 199: 196: 195: 176: 173: 172: 150: 147: 146: 128: 125: 124: 108: 105: 104: 103:For each tuple 55: 17: 12: 11: 5: 740: 730: 729: 724: 710: 709: 673: 672:External links 670: 668: 667: 639: 637: 634: 633: 632: 627: 622: 615: 612: 593: 592: 585: 576: 573: 558: 557: 554: 545: 542: 529: 528:Hash semi-join 526: 514: 513: 510: 503: 494: 491: 479: 478: 475: 462: 459: 453: 452:Hash anti-join 450: 433: 422: 421: 409: 398: 386: 366: 349: 346: 328: 308: 288: 273: 270: 257: 242: 241: 229: 218: 217: 216: 215: 214: 203: 192: 180: 166: 154: 132: 112: 86: 85: 74: 54: 51: 25:join algorithm 15: 9: 6: 4: 3: 2: 739: 728: 725: 723: 720: 719: 717: 700:on 2012-03-11 696: 692: 685: 681: 676: 675: 663: 659: 655: 651: 644: 640: 631: 628: 626: 623: 621: 618: 617: 611: 609: 605: 600: 598: 590: 586: 583: 579: 578: 572: 570: 566: 561: 555: 552: 548: 547: 541: 538: 536: 525: 523: 519: 511: 508: 504: 501: 497: 496: 490: 488: 484: 476: 473: 469: 465: 464: 458: 449: 445: 431: 407: 399: 384: 364: 356: 355: 354: 345: 341: 326: 306: 286: 277: 269: 255: 247: 227: 219: 201: 193: 178: 170: 169: 167: 152: 144: 143: 130: 110: 102: 101: 100: 97: 95: 94:"probe" phase 91: 90:"build" phase 83: 79: 75: 72: 68: 64: 63: 62: 60: 50: 48: 45:predicate (a 44: 39: 37: 33: 30: 26: 22: 702:. Retrieved 695:the original 690: 653: 649: 643: 607: 603: 601: 596: 594: 588: 581: 568: 564: 562: 559: 550: 539: 534: 531: 521: 517: 515: 506: 499: 486: 482: 480: 471: 455: 446: 423: 351: 342: 278: 275: 243: 98: 93: 89: 87: 56: 40: 20: 18: 36:hash tables 716:Categories 704:2008-09-21 656:(4): 1–8. 636:References 468:hash table 466:Prepare a 82:hash table 78:hash table 71:hash table 67:hash table 59:inner join 29:relational 587:Scan the 505:Scan the 76:Once the 47:predicate 21:hash join 682:(1990). 680:Jim Gray 614:See also 470:for the 43:equijoin 722:Hashing 610:table. 571:table. 537:table. 524:table. 489:table. 518:NOT IN 507:NOT IN 483:NOT IN 472:NOT IN 698:(PDF) 687:(PDF) 608:FROM 597:FROM 582:FROM 569:FROM 522:FROM 500:FROM 487:FROM 377:and 319:and 145:Add 19:The 658:doi 397:and 718:: 689:. 654:14 652:. 604:IN 589:IN 565:IN 551:IN 535:IN 707:. 664:. 660:: 432:R 408:R 385:S 365:R 327:S 307:R 287:S 256:S 228:S 202:R 179:S 153:r 131:R 111:r 84:.

Index

join algorithm
relational
database management system
hash tables
equijoin
predicate
inner join
hash table
hash table
hash table
hash table
block nested loop
hash table
Symmetric hash join
Nested loop join
Sort-merge join
doi
10.1145/971697.602261
Jim Gray
"An Adaptive Hash Join Algorithm for Multiuser Environments"
the original
Categories
Hashing
Join algorithms

Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.