1.猪:
Pig用于分析大量数据。它是MapReduce的抽象。 Pig用于在Hadoop中执行各种数据操作操作。它提供了Pig-Latin语言来编写包含许多内置函数(如join,filter等)的代码。Apache Pig的两个部分是Pig-Latin和Pig-Engine。 Pig Engine用于将所有这些脚本转换为特定的映射并减少任务。猪的抽象处于更高的水平。与MapReduce相比,它包含的代码行更少。
2.Hive:
Hive构建在Hadoop的顶部,用于处理Hadoop中的结构化数据。 Hive由Facebook开发。它提供了各种类型的查询语言,通常称为Hive查询语言。 Apache Hive是一个数据仓库,它在用户和集成了Hadoop的Hadoop分布式文件系统(HDFS)之间提供类似于SQL的界面。
Pig和Hive之间的区别:
S.No. | Pig | Hive |
---|---|---|
1. | Pig operates on the client side of a cluster. | Hive operates on the server side of a cluster. |
2. | Pig uses pig-latin language. | Hive uses HiveQL language. |
3. | Pig is a Procedural Data Flow Language. | Hive is a Declarative SQLish Language. |
4. | It was developed by Yahoo. | It was developed by Facebook. |
5. | It is used by Researchers and Programmers. | It is mainly used by Data Analysts. |
6. | It is used to handle structured and semi-structured data. | It is mainly used to handle structured data. |
7. | It is used for programming. | It is used for creating reports. |
8. | Pig scripts end with .pig extension. | In HIve, all extensions are supported. |
9. | It does not support partitioning. | It supports partitioning. |
10. | It loads data quickly. | It loads data slowly. |
11. | It does not support JDBC. | It supports JDBC. |
12. | It does not support ODBC. | It supports ODBC. |
13. | Pig does not have a dedicated metadata database. | Hive makes use of the exact variation of dedicated SQL-DDL language by defining tables beforehand. |
14. | It supports Avro file format. | It does not support Avro file format. |
15. | Pig is suitable for complex and nested data structures. | Hive is suitable for batch-processing OLAP systems. |
16. | Pig does not support schema to store data. | Hive supports schema for data insertion in tables. |