Difference Between Hadoop and Hive

Last modified: 2021-10-27 06:32:25 | Author: Mango

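Hadoop: Hadoop is a software framework created to manage very large volumes of data, commonly called big data. It stores and processes that data across clusters of commodity servers. Hadoop stores data in the Hadoop Distributed File System (HDFS) and processes or queries it with the MapReduce programming model.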
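The MapReduce model mentioned above can be sketched in a few lines of plain Python. This is only a single-process illustration of the map, shuffle, and reduce steps, not Hadoop's actual Java API; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample input are assumptions made for this example.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three steps in sequence on an in-memory list of lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input standing in for files stored in HDFS.
    sample = ["big data needs big clusters", "Hadoop stores big data"]
    print(word_count(sample))
```

On a real cluster, the map and reduce steps would run in parallel on many machines and the shuffle would move data over the network; the sketch keeps everything in one process so the data flow is easy to follow.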

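Hive: Hive is an application that runs on top of the Hadoop framework and provides an SQL-like interface for processing and querying data. It was designed and developed at Facebook before becoming part of the Apache Hadoop project. Hive runs its queries in HQL (Hive Query Language). Its data model resembles that of an RDBMS (databases, tables, rows, and columns), and most familiar SQL commands can be used in Hive with little or no change. Hive can also keep data in external tables, so storing data in HDFS is not mandatory, and it supports file formats such as ORC, Avro, sequence files, and text files.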

Hadoop vs. Hive

The following table lists the differences between Hadoop and Hive:

| Hadoop | Hive |
|---|---|
| Hadoop is a framework for processing and querying big data. | Hive is an SQL-based tool built on top of Hadoop for processing that data. |
| Hadoop natively understands only MapReduce. | Hive processes and queries data using HQL (Hive Query Language), an SQL-like language. |
| MapReduce is an integral part of Hadoop. | A Hive query is first converted into MapReduce jobs, which Hadoop then executes. |
| Hadoop has no native SQL support; SQL-style logic must be expressed as Java-based MapReduce. | Hive works with SQL-like queries. |
| In Hadoop, you have to write complex MapReduce programs in Java, which differ from conventional Java programs. | In Hive, familiar relational database commands can be used to query big data. |
| Hadoop handles all types of data: structured, unstructured, and semi-structured. | Hive is intended mainly for processing and querying structured data. |
| In a plain Hadoop ecosystem, complex Java programs must be written to process the data. | With Hive, the same data can be processed and queried without complex programming. |
| A Java-based MapReduce program on Hadoop can take hundreds of lines. | Hive can query the same data with roughly 8 to 10 lines of HQL. |
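The contrast in the table between hand-written MapReduce and a declarative query can be illustrated with a small sketch. It computes the same per-region total twice: once with explicit map, shuffle, and reduce steps, and once with a single SQL statement. Python's built-in `sqlite3` module is used here purely as a stand-in for Hive, since running real HQL needs a Hive installation; the `sales` table, the `SALES` rows, and both function names are hypothetical. The `SELECT ... GROUP BY` statement shown should also be accepted by Hive, but treat that as an assumption to verify against your Hive version.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: (region, amount) rows.
SALES = [("north", 120), ("south", 80), ("north", 30), ("east", 50), ("south", 20)]


def total_by_region_mapreduce(rows):
    """Procedural style: spell out the map, shuffle, and reduce steps by hand."""
    # Map: emit one (key, value) pair per input row.
    mapped = [(region, amount) for region, amount in rows]
    # Shuffle: group the emitted values by key.
    grouped = defaultdict(list)
    for region, amount in mapped:
        grouped[region].append(amount)
    # Reduce: sum the values collected for each key.
    return {region: sum(amounts) for region, amounts in grouped.items()}


def total_by_region_sql(rows):
    """Declarative style: one SQL statement, with sqlite3 standing in for Hive."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
        conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
        cursor = conn.execute(
            "SELECT region, SUM(amount) FROM sales GROUP BY region"
        )
        return dict(cursor.fetchall())
    finally:
        conn.close()


if __name__ == "__main__":
    print(total_by_region_mapreduce(SALES))
    print(total_by_region_sql(SALES))
```

Both functions return the same totals. The difference is in who writes the grouping and aggregation logic: the programmer in the first case, the query engine in the second, which is the trade-off the table describes.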