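Hadoop's journey began in 2005 with Doug Cutting and Mike Cafarella. Hadoop is an open-source software framework for storing and processing very large data sets. This article aims to familiarize you with the differences between Hadoop 2.x and Hadoop 3.x. Hadoop 3.x adds a number of more advanced and compatible features over the older Hadoop 2.x.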
Hadoop 2.x vs. Hadoop 3.x
S.No. | Feature | Hadoop 2.x | Hadoop 3.x |
---|---|---|---|
1 | License | Apache License 2.0 (open source). | Apache License 2.0 (open source). |
2 | Minimum supported Java version | Java 7 | Java 8 |
3 | Fault Tolerance | Replication is the only fault-tolerance mechanism, and it is not space-efficient. | Erasure coding is supported in addition to replication, providing fault tolerance with far less storage. |
4 | Data Balancing | The HDFS balancer balances data across DataNodes. | Adds an intra-DataNode (disk) balancer, invoked through the HDFS disk balancer command-line interface (`hdfs diskbalancer`). |
5 | Storage Scheme | 3x replication. | Erasure coding in HDFS, configurable per directory, alongside replication. |
6 | Storage Overhead | 200% storage overhead: every block is stored as three copies. | 50% storage overhead with erasure coding (e.g., RS(6,3): 3 parity blocks per 6 data blocks), leaving more usable space. |
7 | YARN Timeline Service | Uses Timeline Service v1.x, which has scalability limitations. | Introduces Timeline Service v.2, which improves the scalability and reliability of the service. |
8 | Scalability | Limited scalability: up to about 10,000 nodes per cluster. | Improved scalability: more than 10,000 nodes per cluster. |
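9 | Default ports vs. Linux ephemeral port range (32768-61000) | Several default ports (e.g., 50070 for the NameNode web UI) fall inside the Linux ephemeral port range, so services can fail to bind at startup. | Default ports were moved out of the ephemeral range (e.g., the NameNode web UI moved to 9870). |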
10 | Compatible File Systems | HDFS (default), FTP, Amazon S3, and Windows Azure Storage Blobs (WASB). | All of the above, plus the Microsoft Azure Data Lake filesystem. |
11 | NameNode Recovery | NameNode recovery requires manual intervention. | NameNode recovery does not require manual intervention. |
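The 200% vs. 50% storage-overhead figures follow from simple arithmetic. The sketch below assumes the RS(6,3) Reed-Solomon layout (6 data blocks plus 3 parity blocks per block group), which is one of the built-in HDFS erasure-coding policies. The `StorageScheme` class is a hypothetical helper for illustration only, not part of any Hadoop API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageScheme:
    """Hypothetical model: `data_units` of user data occupy `stored_units` on disk."""
    name: str
    data_units: int    # units of user data
    stored_units: int  # units actually written to disk (data + redundancy)

    @property
    def overhead_pct(self) -> float:
        # Extra space beyond the raw data, as a percentage of the raw data.
        return (self.stored_units - self.data_units) / self.data_units * 100

    def raw_capacity_needed(self, data_tb: float) -> float:
        # Disk capacity needed to hold `data_tb` terabytes of user data.
        return data_tb * self.stored_units / self.data_units


# 3x replication: each block is stored three times (1 original + 2 copies).
REPLICATION_3X = StorageScheme("3x replication", data_units=1, stored_units=3)
# Assumed RS(6,3) erasure coding: 6 data blocks + 3 parity blocks per group.
RS_6_3 = StorageScheme("RS(6,3) erasure coding", data_units=6, stored_units=9)

if __name__ == "__main__":
    for scheme in (REPLICATION_3X, RS_6_3):
        print(f"{scheme.name}: {scheme.overhead_pct:.0f}% overhead, "
              f"{scheme.raw_capacity_needed(100):.0f} TB raw for 100 TB of data")
```

Under these assumptions, storing 100 TB of data takes 300 TB of disk with 3x replication but only 150 TB with RS(6,3). Other erasure-coding layouts give different ratios.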
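To show how erasure coding provides fault tolerance without full copies, the sketch below deliberately swaps in the simplest erasure code: a single XOR parity block (as in RAID 5). HDFS itself uses Reed-Solomon codes, which can survive the loss of several blocks per group. XOR parity tolerates the loss of exactly one block, so this illustrates the idea only and is not how HDFS encodes data. The function names and sample blocks are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(data_blocks: list[bytes]) -> bytes:
    """Compute one parity block for a group of equal-length data blocks."""
    return xor_blocks(data_blocks)


def recover(surviving: list[bytes], parity: bytes) -> bytes:
    """Rebuild the single missing data block from the survivors plus the parity."""
    return xor_blocks(surviving + [parity])


if __name__ == "__main__":
    group = [b"HDFS", b"EC!!", b"demo"]  # hypothetical equal-length data blocks
    parity = encode(group)
    lost = group.pop(1)                  # simulate losing the second block
    rebuilt = recover(group, parity)
    print(rebuilt == lost, rebuilt)
```

Here three data blocks are protected by one extra block (about 33% overhead). Keeping two full extra copies of each block, as 3x replication does, would cost 200%.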
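The port-range row can be checked directly. The sketch below uses the 32768-61000 ephemeral range quoted in the table and a selection of well-known HDFS default ports from Hadoop 2.x and 3.x. The dictionary and helper function are illustrative, not a Hadoop API.

```python
# Linux ephemeral port range as quoted in the table (inclusive bounds).
EPHEMERAL_RANGE = range(32768, 61001)

# Selected HDFS default ports: (Hadoop 2.x, Hadoop 3.x).
HDFS_DEFAULT_PORTS = {
    "NameNode web UI": (50070, 9870),
    "Secondary NameNode web UI": (50090, 9868),
    "DataNode data transfer": (50010, 9866),
    "DataNode IPC": (50020, 9867),
    "DataNode web UI": (50075, 9864),
}


def in_ephemeral_range(port: int) -> bool:
    """True if the OS may hand this port to an outgoing connection, causing bind conflicts."""
    return port in EPHEMERAL_RANGE


if __name__ == "__main__":
    for service, (old, new) in HDFS_DEFAULT_PORTS.items():
        print(f"{service}: 2.x port {old} in range={in_ephemeral_range(old)}, "
              f"3.x port {new} in range={in_ephemeral_range(new)}")
```

Every listed Hadoop 2.x port falls inside the ephemeral range, and every Hadoop 3.x replacement falls outside it.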