📜  Difference Between Hadoop and MapReduce

📅  Last modified: 2021-09-16 10:25:51    🧑  Author: Mango

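Hadoop: Hadoop is a software framework that allows the distributed processing of large data sets across clusters of computers using simple programming models. Put simply, Hadoop is a framework for processing "big data". Hadoop was created by Doug Cutting and Mike Cafarella. It is designed to scale from a single server to thousands of machines, each offering local computation and storage. Hadoop is open-source software. The core of Apache Hadoop consists of a storage part, known as the Hadoop Distributed File System (HDFS), and a processing part, which is the MapReduce programming model. Hadoop splits files into large blocks and distributes them across the nodes of a cluster. It then ships packaged code to the nodes so that they process the data in parallel.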
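To make the block-splitting idea concrete, here is a minimal sketch in Python. It is not Hadoop code: the function names, the node names, and the round-robin placement are assumptions chosen for illustration. Real HDFS placement also replicates each block and takes rack topology into account. The 128 MB block size is used here only as a typical value.

```python
def split_into_blocks(file_size, block_size):
    """Return (offset, length) pairs that cover a file of file_size bytes."""
    blocks = []
    offset = 0
    while offset < file_size:
        # The last block may be shorter than block_size.
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks


def assign_round_robin(blocks, nodes):
    """Assign blocks to nodes in turn (hypothetical policy, not HDFS's)."""
    placement = {node: [] for node in nodes}
    for i, block in enumerate(blocks):
        placement[nodes[i % len(nodes)]].append(block)
    return placement


MB = 1024 * 1024
blocks = split_into_blocks(300 * MB, 128 * MB)
placement = assign_round_robin(blocks, ["node-a", "node-b"])

for node, stored in placement.items():
    print(node, stored)
```

With a 300 MB file, the sketch yields two full 128 MB blocks and one 44 MB block, spread over the two hypothetical nodes.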

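MapReduce: MapReduce is a programming model for processing and generating large data sets on a cluster of computers. It was introduced by Google. MapReduce is a concept, or method, for large-scale parallelization. It is inspired by the map() and reduce() functions of functional programming.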
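For readers unfamiliar with the functional-programming origin, the short Python sketch below shows the two functions on a single machine. It only illustrates the idea that inspired the model and is not how a distributed MapReduce job is written. The sample list is made up for the example.

```python
from functools import reduce

numbers = [1, 2, 3, 4]

# map(): apply one function to every element independently.
squares = list(map(lambda x: x * x, numbers))

# reduce(): fold the mapped values into a single result.
total = reduce(lambda acc, x: acc + x, squares, 0)

print(squares, total)
```

Because each call made by map() is independent of the others, that step is the part that can be spread across many machines.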
A MapReduce program executes in three phases:

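  • Map: The mapper's job is to process the input data. Each node applies the map function to its local data.
  • Shuffle: Nodes redistribute the data based on the output keys (produced by the map function), so that all data belonging to one key ends up on the same node.
  • Reduce: Nodes now process each group of output data, one group per key, in parallel.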
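The three phases can be sketched as a single-process word count in Python. This is an illustrative simulation under simplifying assumptions, not the Hadoop API: the function names map_phase, shuffle_phase, and reduce_phase are hypothetical, the two input strings stand in for data held on two nodes, and everything runs in one process rather than on a cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all values that share the same output key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)


# Each string stands in for the local data of one node.
splits = ["big data big cluster", "data node"]

mapped = [pair for line in splits for pair in map_phase(line)]
grouped = shuffle_phase(mapped)
counts = dict(reduce_phase(key, values) for key, values in grouped.items())

print(counts)
```

In a real cluster the map calls run on the nodes that hold the data, and the reduce calls for different keys run in parallel. The sketch keeps the same data flow but executes it sequentially.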

Hadoop vs. MapReduce

The following table lists the differences between Hadoop and MapReduce:

| Basis | Hadoop | MapReduce |
|---|---|---|
| Definition | Apache Hadoop is software that allows the distributed processing of large data sets across clusters of computers using simple programming models. | MapReduce is a programming model, with an associated implementation, for processing and generating big data sets with a distributed algorithm on a cluster. |
| Meaning | The name "Hadoop" comes from a toy elephant that belonged to Doug Cutting's son. Cutting chose the name because it was easy to pronounce. | The name "MapReduce" comes from the functionality itself: mapping and reducing over key-value pairs. |
| Framework | Hadoop has a storage framework, which stores data by creating NameNodes and DataNodes, and it also has other frameworks, including MapReduce itself. | MapReduce is a programming framework that uses key-value mappings to sort and process data. |
| Invention | Hadoop was created by Doug Cutting and Mike Cafarella. | MapReduce was invented by Google. |
| Features | Hadoop is open source. A Hadoop cluster is highly scalable. | MapReduce provides fault tolerance. MapReduce provides high availability. |
| Concept | Apache Hadoop is an ecosystem that provides a reliable, scalable environment ready for distributed computing. | MapReduce is a submodule of the Hadoop project. It is a programming model used to process huge data sets that sit on HDFS (Hadoop Distributed File System). |
| Language | Hadoop is a collection of modules and may therefore include other programming or scripting languages. | MapReduce is written mainly in the Java programming language. |
| Prerequisites | Hadoop runs on HDFS (Hadoop Distributed File System). | MapReduce can run on HDFS, GFS, NDFS, or another distributed file system such as MapR-FS. |