Hadoop: Hadoop 软件是一个框架,允许使用简单的编程模型跨计算机集群分布式处理大量数据集。简单来说,Hadoop 是一个处理“大数据”的框架。 Hadoop 是由 Doug Cutting 创建的。它也由 Mike Cafarella 创建。它旨在将单个服务器划分为数千台机器,每台机器都有本地计算和存储。 Hadoop 是一个开源软件。 Apache Hadoop 的核心由称为 Hadoop 分布式文件系统 (HDFS) 的存储部分和可以是 Map-Reduce 编程模型的处理部分组成。 Hadoop 将文件拆分为大块,并在集群期间将它们分布在节点之间。然后它将打包的代码传输到节点以并行处理信息。
MapReduce : MapReduce 是一种编程模型,用于在计算机集群上处理和生成大型数据集。它是由谷歌推出的。 Mapreduce 是一种大规模并行化的概念或方法。它的灵感来自于函数式编程的map()
和reduce()
函数。
MapReduce 程序分三个阶段执行,它们是:
- 映射: Mapper 的工作是处理输入数据。每个节点将映射函数应用于本地数据。
- Shuffle:这里的节点被重新分配,其中数据基于输出键。(输出键由 map函数产生)。
- 减少:节点现在被处理成每组输出数据,每个键并行。
下表列出了 Hadoop 和 MapReduce 之间的差异:
Based on | Hadoop | MapReduce | |||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Defination | The Apache Hadoop is a software that allows all the distributed processing of large data sets across clusters of computers using simple programming | MapReduce is a programming model which is an implementation for processing and generating big data sets with distributed algorithm on a cluster. | |||||||||
Meaning | The name “Hadoop” was the named after Doug cutting’s son’s toy elephant. He named this project as “Hadoop” as it was easy to pronounce it. | The “MapReduce” name came into existence as per the functionality itself of mapping and reducing in key-value pairs. | |||||||||
Framework | Hadoop not only has storage framework which stores the data but creating name node’s and data node’s it also has other frameworks which include MapReduce itself. | MapReduce is a programming framework which uses a key, value mappings to sort/process the data | |||||||||
Invention | Hadoop was created by Doug Cutting and Mike Cafarella. | Mapreduce is invented by Google. | |||||||||
Features |
|
Concept |
The Apache Hadoop is an eco-system which provides an environment which is reliable, scalable and ready for distributed computing. |
MapReduce is a submodule of this project which is a programming model and is used to process huge datasets which sits on HDFS (Hadoop distributed file system). |
Language |
Hadoop is a collection of all modules and hence may include other programming/scripting languages too |
MapReduce is basically written in Java programming language |
Pre-requisites |
Hadoop runs on HDFS (Hadoop Distributed File System) |
MapReduce can run on HDFS/GFS/NDFS or any other distributed system for example MapR-FS |
|