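MapReduce is a programming model that runs on Hadoop and efficiently processes big data stored in HDFS (Hadoop Distributed File System). It is a core component of Hadoop: it splits a large data set into small chunks and processes those chunks in parallel.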
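Features of MapReduce:
- It distributes the processing of very large data sets across many servers.
- It lets users express processing as map and reduce functions that operate on key-value pairs.
- It runs within Hadoop's security framework, which helps protect the system against unauthorized access.
- It supports a parallel processing model.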
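
The map, shuffle, and reduce phases described above can be sketched in plain Python. This is a minimal in-memory illustration, not Hadoop's API: the function names (`map_chunk`, `shuffle`, `reduce_counts`, `word_count`) and the `chunk_size` parameter are hypothetical, and a thread pool stands in for the cluster nodes that would process chunks in parallel.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_chunk(chunk):
    """Map phase: emit a (word, 1) pair for every word in one chunk of lines."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle(mapped_chunks):
    """Shuffle phase: group all emitted values by their key."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_counts(groups):
    """Reduce phase: collapse each key's list of values into one total."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines, chunk_size=2):
    # Split the input into small chunks, much as a framework splits a file.
    chunks = [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]
    # Apply the map function to the chunks concurrently (assumption: threads
    # stand in for worker nodes; this is not true distributed execution).
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(map_chunk, chunks))
    return reduce_counts(shuffle(mapped))


if __name__ == "__main__":
    sample = ["big data on hadoop", "hadoop stores data", "data data data"]
    print(word_count(sample))
```

Even for a task as small as counting words, the programmer has to write every phase by hand, which is the low level of abstraction that tools built on top of MapReduce aim to hide.
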
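Hive is an initiative started by Facebook to provide a traditional data warehouse interface for MapReduce programming. Users write queries in a SQL-like style, and the Hive compiler converts them behind the scenes into MapReduce jobs that execute on the Hadoop cluster. This lets programmers apply the SQL they already know instead of learning a new language.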
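Features of Hive:
- It provides a SQL-like language called HQL (Hive Query Language).
- It helps query large data sets stored in HDFS (Hadoop Distributed File System).
- It is an open-source tool.
- It supports flexible views of the data, which makes the data easier to explore and visualize.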
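
To give a feel for the SQL-style approach, the sketch below expresses a grouped count as one declarative statement. It uses Python's built-in `sqlite3` module purely as a stand-in so the example runs without a cluster; it is not Hive, and the `page_views` table, its columns, and the `views_per_country` helper are hypothetical. In Hive, a comparable statement would be compiled into jobs on the Hadoop cluster rather than executed in-process.

```python
import sqlite3


def views_per_country(rows):
    """Count page views per country with a single SQL statement.

    `rows` is an iterable of (user_id, country) tuples. SQLite is only a
    local stand-in here for a SQL-like engine such as Hive.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE page_views (user_id TEXT, country TEXT)")
        conn.executemany("INSERT INTO page_views VALUES (?, ?)", rows)
        # The whole aggregation is stated declaratively; the engine decides
        # how to group and count the rows.
        cursor = conn.execute(
            "SELECT country, COUNT(*) AS views "
            "FROM page_views GROUP BY country ORDER BY country"
        )
        return cursor.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(views_per_country([("u1", "IN"), ("u2", "US"), ("u3", "IN")]))
```

The query states what result is wanted and leaves the grouping and counting strategy to the engine, which is the kind of abstraction a SQL-like interface provides.
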
MapReduce vs. Hive
S.No | MapReduce | Hive |
---|---|---|
1. | It is a data processing model. | It provides a SQL-like query language (HQL). |
2. | Jobs are written directly as map and reduce functions. | It converts HQL queries into MapReduce jobs. |
3. | It provides a low level of abstraction. | It provides a high level of abstraction. |
4. | Join operations are difficult for the user to implement. | It makes it easy for the user to perform SQL-like operations, including joins, on data in HDFS. |
5. | The user has to write roughly 10 times more lines of code than in Hive to perform a similar task. | The user has to write far fewer lines of code than in MapReduce. |
6. | A task often needs several chained jobs, which increases execution time. | Execution time can be longer, because queries are first compiled into MapReduce jobs, but development effort is lower. |
7. | It is supported by all versions of Hadoop. | It is also supported by recent versions of Hadoop. |
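
Row 4 of the table is easier to appreciate with an example. The sketch below imitates a reduce-side inner join in plain Python: each record is tagged with its source in the map step, grouped by key, and paired up in the reduce step. The data sets (`users`, `orders`) and all function names are hypothetical, and everything runs in memory rather than on Hadoop.

```python
from collections import defaultdict


def map_users(users):
    # Tag each record with its source so the reducer can tell them apart.
    return [(user_id, ("user", name)) for user_id, name in users]


def map_orders(orders):
    return [(user_id, ("order", amount)) for user_id, amount in orders]


def reduce_side_join(users, orders):
    """Inner-join users and orders on user_id, map-reduce style."""
    # Shuffle: bring every tagged record with the same key together.
    groups = defaultdict(list)
    for key, tagged in map_users(users) + map_orders(orders):
        groups[key].append(tagged)

    # Reduce: for each key, pair every user record with every order record.
    joined = []
    for key in sorted(groups):
        names = [value for tag, value in groups[key] if tag == "user"]
        amounts = [value for tag, value in groups[key] if tag == "order"]
        for name in names:
            for amount in amounts:
                joined.append((key, name, amount))
    return joined


if __name__ == "__main__":
    users = [(1, "Ana"), (2, "Bo")]
    orders = [(1, 30), (1, 12), (3, 99)]
    print(reduce_side_join(users, orders))
```

In a SQL-like language the same result is a single statement along the lines of `SELECT u.user_id, u.name, o.amount FROM users u JOIN orders o ON u.user_id = o.user_id`, which is why joins are usually far less work to write in Hive.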