📜  Pipelining in Query Processing

📅  Last modified: 2020-12-13 05:19:56             🧑  Author: Mango

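In the previous section, we learned about materialization, in which the multiple operations of a given expression are evaluated through temporary relations. Its drawback is that it produces a large number of temporary files, which makes query evaluation less efficient. Query evaluation, however, should produce its output as efficiently as possible.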
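The following is a minimal Python sketch of materialized evaluation for a query of the form π_name(σ_salary>4000(employees)). It assumes a hypothetical EMPLOYEES relation stored as (id, name, salary) tuples. The helpers materialize, load and materialized_query are illustrative names. A pickled temporary file stands in for a temporary relation on disk:

```python
import os
import pickle
import tempfile


def materialize(rows):
    """Write an intermediate result to a temporary file (a 'temporary relation')."""
    fd, path = tempfile.mkstemp(suffix=".tmp")
    with os.fdopen(fd, "wb") as f:
        pickle.dump(list(rows), f)
    return path


def load(path):
    """Read a temporary relation back from disk, then delete the file."""
    with open(path, "rb") as f:
        rows = pickle.load(f)
    os.remove(path)
    return rows


def materialized_query(employees, min_salary):
    # Step 1: evaluate the selection completely and write it to a temporary relation.
    tmp = materialize(r for r in employees if r[2] > min_salary)
    # Step 2: the projection can only start after step 1 has finished;
    # it reads the whole temporary relation back from disk.
    return [(r[1],) for r in load(tmp)]


# Hypothetical relation: (id, name, salary)
EMPLOYEES = [(1, "Ann", 5200), (2, "Bob", 3100), (3, "Cid", 6100), (4, "Dee", 2800)]
print(materialized_query(EMPLOYEES, 4000))  # [('Ann',), ('Cid',)]
```

The projection cannot begin until the selection has written its entire result to the temporary file. Every intermediate tuple therefore costs one write and one read.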

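Here, we discuss another way of evaluating the multiple operations of an expression that is more efficient than materialization. This approach is called pipelining. Pipelining improves the efficiency of query evaluation by reducing the number of temporary files that are produced. We achieve this by combining several operations into a pipeline. The result of the operation currently executing is passed directly to the next operation, and the chain continues until all operations have completed. At that point we obtain the final output of the expression. This evaluation process is called pipelined evaluation.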
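The same query can be sketched as a pipeline using Python generators. Each operator pulls tuples from its input one at a time and hands each result straight to the next operator. The operator names scan, select and project and the EMPLOYEES data are hypothetical, chosen only for illustration:

```python
def scan(relation):
    """Leaf operator: produce the stored tuples one at a time."""
    for row in relation:
        yield row


def select(rows, predicate):
    """Selection: pass on only the tuples that satisfy the predicate."""
    for row in rows:
        if predicate(row):
            yield row


def project(rows, columns):
    """Projection: keep only the requested column positions."""
    for row in rows:
        yield tuple(row[i] for i in columns)


# Hypothetical relation: (id, name, salary)
EMPLOYEES = [(1, "Ann", 5200), (2, "Bob", 3100), (3, "Cid", 6100), (4, "Dee", 2800)]

# pi_name(sigma_{salary > 4000}(employees)) as a pipeline: each tuple flows
# scan -> select -> project, and no intermediate relation is ever stored.
pipeline = project(select(scan(EMPLOYEES), lambda r: r[2] > 4000), [1])
print(list(pipeline))  # [('Ann',), ('Cid',)]
```

The result matches the materialized version, but only one tuple at a time is held between operators.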

Advantages of Pipelining

Creating a pipeline of operations has the following advantages:

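  • Unlike materialization, it lowers the cost of query evaluation by eliminating the cost of reading and writing temporary relations.
  • If the root operator of the query evaluation plan is combined into a pipeline together with its inputs, query results are generated quickly. This benefits users, who can view the results of their query as soon as output is produced. Otherwise, they would have to wait for the whole evaluation to finish before seeing any result.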
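The following sketch illustrates the second advantage by fetching only the first result of a pipelined selection. It assumes a hypothetical 10,000-row relation and a hypothetical counting_scan helper that records how many input tuples have been read:

```python
def counting_scan(relation, counter):
    """Scan that records how many tuples have been read from its input."""
    for row in relation:
        counter["read"] += 1
        yield row


def select(rows, predicate):
    """Selection operator: yield only matching tuples."""
    for row in rows:
        if predicate(row):
            yield row


# Hypothetical relation of 10,000 (id, name, salary) tuples.
EMPLOYEES = [(i, f"emp{i}", 1000 + 100 * i) for i in range(1, 10001)]

counter = {"read": 0}
pipeline = select(counting_scan(EMPLOYEES, counter), lambda r: r[2] > 1500)
first = next(pipeline)        # the first result is available right away
print(first, counter["read"])  # (6, 'emp6', 1600) 6
```

Only 6 of the 10,000 input tuples are read before the first result appears. A materialized plan would first have to finish the selection over all of them.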

Pipelining vs. Materialization

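Although both approaches are used to evaluate the multiple operations of an expression, there are several differences between them. The following table describes them: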

| Pipelining | Materialization |
| --- | --- |
| A modern approach to evaluating multiple operations. | A traditional approach to evaluating multiple operations. |
| Does not use temporary relations to store the results of evaluated operations. | Uses temporary relations to store the results of evaluated operations, so it needs more temporary files and more I/O. |
| A more efficient way of evaluating queries, because it produces results quickly. | Less efficient, because it takes longer to produce query results. |
| Needs a large number of memory buffers to produce output; insufficient buffers cause thrashing. | Places no special demand on memory buffers during query evaluation. |
| Performs poorly if thrashing occurs. | Thrashing does not occur, so in such cases materialization performs better. |
| Lowers the cost of query evaluation, because the cost of reading and writing temporary storage is not incurred. | The overall cost is the cost of the operations plus the cost of reading and writing results to temporary storage. |
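The cost comparison in the last row can be written as a toy cost model. This is a simplified sketch under the following assumptions:

- Costs are counted in block transfers.
- op_costs excludes any temporary-file I/O.
- Each temporary relation is written once and read back once.
- The numbers in the example are hypothetical.

```python
def materialization_cost(op_costs, temp_blocks):
    """Cost of the operations plus writing and reading each temporary relation.

    op_costs    -- cost of each operation itself, excluding temporary-file I/O
    temp_blocks -- size in blocks of each materialized intermediate result
    """
    # Each intermediate result is written to disk once and read back once.
    return sum(op_costs) + sum(2 * blocks for blocks in temp_blocks)


def pipelining_cost(op_costs):
    """No temporary relations exist, so only the operations' own cost remains."""
    return sum(op_costs)


# Hypothetical plan: two operations whose intermediate result occupies 200 blocks.
ops = [1000, 50]
print(materialization_cost(ops, [200]))  # 1450
print(pipelining_cost(ops))              # 1050
```

The model deliberately ignores buffer memory. That omission matters: as the table notes, pipelining can lose its advantage when too few buffers cause thrashing.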