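Hadoop: Hadoop is a framework built to manage very large volumes of data, or big data. It stores and processes that data across a cluster of commodity servers, using the Hadoop Distributed File System (HDFS) for storage and the MapReduce programming model for processing and querying.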
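To make the MapReduce model more concrete, here is a minimal single-process sketch of its three stages (map, shuffle, reduce) applied to a word count. It is plain Python and does not use any Hadoop API; the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical names chosen for illustration, and a real Hadoop job would distribute these stages across the cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence on an iterable of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data needs big clusters", "hive runs on hadoop"]
    print(word_count(sample))
```

Even in this toy form, the programmer has to spell out how records are emitted, grouped, and aggregated, which is the part Hive hides behind a query.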
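Hive: Hive is an application that runs on top of the Hadoop framework and provides an SQL-like interface for processing and querying data. It was designed and developed at Facebook before becoming part of the Apache Hadoop project. Hive runs its queries in HQL (Hive Query Language). Its structure resembles that of an RDBMS, and almost the same commands can be used in Hive. Hive can keep data in external tables, so storing it in HDFS is not mandatory, and it supports file formats such as ORC, Avro files, sequence files, and text files.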
The following table lists the differences between Hadoop and Hive:
Hadoop | Hive |
---|---|
Hadoop is a framework to process and query big data. | Hive is an SQL-based tool built on top of Hadoop to process the data. |
Hadoop understands only MapReduce. | Hive processes and queries all data using HQL (Hive Query Language), an SQL-like language. |
MapReduce is an integral part of Hadoop. | A Hive query is first converted into MapReduce jobs and then processed by Hadoop to query the data. |
Hadoop has no native SQL support; SQL-style logic must be expressed as Java-based MapReduce. | Hive works with SQL-like queries. |
In Hadoop, you have to write complex MapReduce programs in Java, in a style that differs from conventional Java programming. | In Hive, commands familiar from traditional relational databases can also be used to query big data. |
Hadoop is meant for all types of data: structured, unstructured, or semi-structured. | Hive can only process and query structured data. |
In a plain Hadoop ecosystem, you need to write complex Java programs to process the same data. | With Hive, you can process and query the data without complex programming. |
A Java-based MapReduce program on Hadoop can need hundreds of lines. | Hive can query the same data using 8 to 10 lines of HQL. |
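
The brevity of the declarative approach in the right-hand column can be sketched with a short aggregation query. The example below uses Python's built-in `sqlite3` module purely as a stand-in for an SQL-like engine; it is not Hive and does not run on Hadoop. The table `page_views`, its columns, and the function `page_views_per_user` are hypothetical names introduced for illustration. In Hive, a comparable `GROUP BY` query would be written in HQL and translated into jobs on the cluster.

```python
import sqlite3


def page_views_per_user(rows):
    """Return {user_name: view_count} using a declarative GROUP BY query.

    SQLite stands in for an SQL-like engine here; this is not Hive.
    """
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical table layout, assumed only for this sketch.
        conn.execute("CREATE TABLE page_views (user_name TEXT, url TEXT)")
        conn.executemany("INSERT INTO page_views VALUES (?, ?)", rows)
        query = """
            SELECT user_name, COUNT(*) AS views
            FROM page_views
            GROUP BY user_name
            ORDER BY user_name
        """
        return dict(conn.execute(query).fetchall())
    finally:
        conn.close()


if __name__ == "__main__":
    sample = [("ann", "/home"), ("bob", "/docs"), ("ann", "/pricing")]
    print(page_views_per_user(sample))
```

The grouping and counting logic fits in a four-line query, whereas a hand-written MapReduce program has to implement the emit, group, and aggregate steps explicitly.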