Difference Between Hadoop 2.x and Hadoop 3.x

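Hadoop was started by Doug Cutting and Mike Cafarella in 2005 as an open-source software framework for processing large data sets. This article aims to familiarize you with the differences between Hadoop 2.x and Hadoop 3.x. Compared with the older Hadoop 2.x line, Hadoop 3.x adds several more advanced features and broader compatibility.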

Hadoop 2.x vs. Hadoop 3.x

S.No.	Feature	Hadoop 2.x	Hadoop 3.x
1	License	Apache License 2.0 (open source).	Apache License 2.0 (open source).
2	Minimum supported Java version	Java 7 is the minimum supported version.	Java 8 is the minimum supported version.
3	Fault Tolerance	Replication is the only fault-tolerance mechanism, and it is not space-efficient.	Erasure coding is supported in addition to replication.
4	Data Balancing	The HDFS balancer evens out data across DataNodes.	An intra-DataNode disk balancer is added, invoked through the hdfs diskbalancer command-line interface, to even out data across the disks within a single DataNode.
5	Storage Scheme	A 3x replication scheme is used.	Erasure coding is available in HDFS alongside replication.
6	Storage Overhead	200% storage overhead with 3x replication.	As low as 50% storage overhead with erasure coding, which leaves more usable space.
7	YARN Timeline Service	Uses a Timeline Service that has scalability issues.	Uses an improved Timeline Service (v.2) with better scalability and reliability.
8	Scalability	Limited scalability: up to 10,000 nodes per cluster.	Improved scalability: more than 10,000 nodes per cluster.
9	Default Service Ports	Several default ports fall within the Linux ephemeral port range (32768-61000), so services can fail to bind at startup.	Default ports have been moved out of the ephemeral port range.
10	Compatible File Systems	HDFS (default), FTP, Amazon S3, and Windows Azure Storage Blobs (WASB).	All of the file systems supported in 2.x, plus Microsoft Azure Data Lake.
11	NameNode Recovery	Manual intervention is needed for NameNode recovery.	No manual intervention is needed for NameNode recovery.
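
The storage overhead figures in row 6 follow from simple arithmetic: overhead is the extra raw storage beyond the logical data, divided by the logical data. The sketch below is only illustrative. It assumes a Reed-Solomon layout with 6 data blocks and 3 parity blocks (the RS-6-3 policy available in Hadoop 3.x); the function names are made up for this example and are not part of any Hadoop API.

```python
def replication_overhead(replicas: int) -> float:
    """Extra storage, as a fraction of logical data, under N-way replication."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    # One copy is the data itself; every additional copy is overhead.
    return float(replicas - 1)


def erasure_coding_overhead(data_blocks: int, parity_blocks: int) -> float:
    """Extra storage, as a fraction of logical data, under an RS(data, parity) layout."""
    if data_blocks < 1 or parity_blocks < 0:
        raise ValueError("need at least 1 data block and a non-negative parity count")
    # Only the parity blocks are overhead; the data blocks are stored once.
    return parity_blocks / data_blocks


if __name__ == "__main__":
    # Assumed layouts: 3x replication (Hadoop 2.x default) and RS(6,3).
    print(f"3x replication: {replication_overhead(3):.0%}")
    print(f"RS(6,3) erasure coding: {erasure_coding_overhead(6, 3):.0%}")
```

Under these assumptions, 6 TB of logical data occupies 18 TB of raw storage with 3x replication but 9 TB with RS(6,3).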