Difference between Hash Join and Sort Merge Join

Last modified: 2021-09-09 10:14:46             Author: Mango

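1. Hash Join:
Among join operators, hash join is known as the "go-to guy". That is, when no other join is preferred (for example, because the inputs are not sorted or indexed), hash join is used. Hash join is generally the best choice when large, unsorted, non-indexed data (residing in tables) has to be joined. The hash join algorithm consists of a build phase and a probe phase.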
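The build and probe phases can be illustrated with a minimal in-memory sketch. It assumes both inputs fit in memory and are lists of dictionaries; the function name `hash_join`, the sample tables, and the column names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Equi-join two lists of dicts using a build phase and a probe phase."""
    # Build phase: index the (ideally smaller) input by its join key.
    table = defaultdict(list)
    for row in build_rows:
        table[row[build_key]].append(row)

    # Probe phase: look up every row of the other input in the hash table.
    result = []
    for row in probe_rows:
        for match in table.get(row[probe_key], []):
            result.append({**match, **row})
    return result


# Hypothetical sample data.
departments = [
    {"dept_id": 1, "dept": "Sales"},
    {"dept_id": 2, "dept": "Ops"},
]
employees = [
    {"name": "Ann", "dept_ref": 2},
    {"name": "Bob", "dept_ref": 1},
    {"name": "Cy", "dept_ref": 3},  # no matching department, so no output row
]

joined = hash_join(departments, employees, "dept_id", "dept_ref")
for row in joined:
    print(row["name"], row["dept"])
```

Neither input needs to be sorted or indexed, which is why this approach suits the case described above. Building on the smaller input keeps the hash table small.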

For two relations named R and S, the hash join algorithm is as follows:

Hash records of R, one by one, using A values
Hash records of S, one by one, using B values
(Use same M buckets and same hash function h)
Matching pairs of records hash into the same bucket
Join the records of R and S within each bucket
End
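The bucket-based steps above can be sketched in Python as follows. This is an illustrative in-memory version, not a database implementation: the helper names `partition` and `bucket_hash_join`, the choice of M = 4, and the sample records are all assumptions, and Python's built-in `hash` stands in for the hash function h.

```python
def partition(records, key, m):
    """Distribute records into m buckets using h(v) = hash(v) % m."""
    buckets = [[] for _ in range(m)]
    for rec in records:
        buckets[hash(rec[key]) % m].append(rec)
    return buckets


def bucket_hash_join(r, s, a, b, m=4):
    """Join r and s on r[a] == s[b] by hashing both into the same m buckets."""
    r_buckets = partition(r, a, m)
    s_buckets = partition(s, b, m)  # same m and same hash function as for r

    result = []
    # Records with equal join values always land in buckets with the same index,
    # so only bucket i of R has to be compared with bucket i of S.
    for r_bucket, s_bucket in zip(r_buckets, s_buckets):
        for r_rec in r_bucket:
            for s_rec in s_bucket:
                # Different values can share a bucket, so compare explicitly.
                if r_rec[a] == s_rec[b]:
                    result.append((r_rec, s_rec))
    return result


# Hypothetical sample relations.
R = [{"A": 1, "x": "r1"}, {"A": 2, "x": "r2"}, {"A": 5, "x": "r5"}]
S = [{"B": 2, "y": "s2"}, {"B": 5, "y": "s5"}, {"B": 9, "y": "s9"}]

pairs = bucket_hash_join(R, S, "A", "B")
for r_rec, s_rec in pairs:
    print(r_rec["x"], s_rec["y"])
```

The explicit equality check inside each bucket matters because a bucket can hold records with different join values that happen to collide.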

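2. Sort Merge Join:
As the name suggests, the sort merge join algorithm has two phases: a sort phase and a merge phase. Merging two sorted relations needs only a single pass over each input, which is why sort merge join is the fastest join when the relations are already sorted. Suppose two sorted relations R and S need to be merged; the algorithm is as follows: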

If R is sorted on A and S is sorted on B then
Merge R and S to get join result
End
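The merge phase can be sketched in Python as shown below. This is a minimal illustration that assumes both inputs are in-memory lists of dictionaries already sorted on their join attributes; the function name `merge_join` and the sample records are hypothetical. The `sorted` calls in the example stand in for the sort phase.

```python
def merge_join(r, s, a, b):
    """Join r (sorted on a) with s (sorted on b) in one pass over each input."""
    result = []
    i = j = 0
    while i < len(r) and j < len(s):
        if r[i][a] < s[j][b]:
            i += 1  # r is behind: advance r
        elif r[i][a] > s[j][b]:
            j += 1  # s is behind: advance s
        else:
            key = r[i][a]
            # Find the run of rows carrying this key on each side.
            i_end = i
            while i_end < len(r) and r[i_end][a] == key:
                i_end += 1
            j_end = j
            while j_end < len(s) and s[j_end][b] == key:
                j_end += 1
            # Emit every combination of the two runs (handles duplicate keys).
            for r_rec in r[i:i_end]:
                for s_rec in s[j:j_end]:
                    result.append((r_rec, s_rec))
            i, j = i_end, j_end
    return result


# Hypothetical sample relations; sorting them here plays the role of the sort phase.
R = sorted(
    [{"A": 3, "x": "r3"}, {"A": 1, "x": "r1"}, {"A": 3, "x": "r3b"}],
    key=lambda rec: rec["A"],
)
S = sorted(
    [{"B": 3, "y": "s3"}, {"B": 4, "y": "s4"}, {"B": 1, "y": "s1"}],
    key=lambda rec: rec["B"],
)

pairs = merge_join(R, S, "A", "B")
for r_rec, s_rec in pairs:
    print(r_rec["x"], s_rec["y"])
```

If the inputs arrive already sorted, the `sorted` calls can be dropped and only the linear merge remains, which is the case where this join performs best.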

Difference between Hash Join and Sort Merge Join:

| S.No. | Hash Join | Sort Merge Join |
|---|---|---|
| 1. | It is used specifically for joining large tables. | It is usually used to join two independent sources of data represented as tables. |
| 2. | It performs best on large, unsorted, non-indexed inputs. | It can outperform hash join on large tables, particularly when the inputs are already sorted. |
| 3. | Its two phases are build and probe. | Its two phases are the sort operation and the merge operation. |
| 4. | A hash table is built on the smaller table. The hash table is then probed with the hash value of each row of the second table. | The first row of each table is taken. While neither table is exhausted, the selected rows are checked for a match. If they can be merged, the merged row is returned; otherwise the next rows are taken from the tables. These steps repeat until the rows are exhausted. |
| 5. | It is not as fast as sort merge join when the tables are already sorted. | It is the fastest join operation when the tables are already sorted: with the sort phase already done, only the merge phase remains, and that is a single pass over both inputs. |
| 6. | Its variants include classic hash join, Grace hash join, hybrid hash join, hash anti join, hash semi-join, recursive hash join and hash bailout. | It has no further classifications. |
| 7. | It is selected automatically when there is no specific reason to adopt another join algorithm, which is why it is known as the go-to guy of the join operators. | It is not selected automatically. |