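HDFS: The Hadoop Distributed File System (HDFS) is a distributed file system designed to run across many machines connected as nodes in a cluster and to store data reliably. Files are split into blocks that are replicated across several machines, so data survives the failure of individual nodes. Each HDFS cluster has a single NameNode, running on its own machine, which manages the cluster's file-system namespace and controls how users access files.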
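To make the NameNode's role concrete, here is a toy Python model. It is not HDFS code, and the class and method names are hypothetical. A file is split into fixed-size blocks, and a single metadata service (standing in for the NameNode) records which DataNodes hold each block's replicas. Real HDFS defaults to 128 MB blocks and a replication factor of 3 and uses a rack-aware placement policy. The round-robin placement and the tiny block size below are simplifying assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNameNode:
    """Hypothetical stand-in for the NameNode: it keeps only metadata."""
    datanodes: list[str]
    block_size: int = 4      # bytes per block (HDFS defaults to 128 MB)
    replication: int = 3     # copies of each block (HDFS defaults to 3)
    # file path -> list of (block_id, [DataNodes holding a replica])
    namespace: dict[str, list[tuple[int, list[str]]]] = field(default_factory=dict)
    _next_block_id: int = 0

    def create(self, path: str, data: bytes) -> dict[str, bytes]:
        """Split `data` into blocks and choose replica locations.

        Returns the block contents to ship to DataNodes, keyed by
        "<block_id>@<datanode>". The NameNode itself stores only the mapping.
        """
        if path in self.namespace:
            raise FileExistsError(path)  # write-once: no overwriting
        if self.replication > len(self.datanodes):
            raise ValueError("not enough DataNodes for the replication factor")
        blocks, shipments = [], {}
        for offset in range(0, len(data), self.block_size):
            chunk = data[offset:offset + self.block_size]
            block_id = self._next_block_id
            self._next_block_id += 1
            # Simplified round-robin placement (real HDFS is rack-aware).
            start = block_id % len(self.datanodes)
            replicas = [self.datanodes[(start + i) % len(self.datanodes)]
                        for i in range(self.replication)]
            blocks.append((block_id, replicas))
            for dn in replicas:
                shipments[f"{block_id}@{dn}"] = chunk
        self.namespace[path] = blocks
        return shipments

    def read(self, path: str, storage: dict[str, bytes], failed=frozenset()) -> bytes:
        """Reassemble a file, reading each block from any live replica."""
        parts = []
        for block_id, replicas in self.namespace[path]:
            live = [dn for dn in replicas if dn not in failed]
            if not live:
                raise OSError(f"all replicas of block {block_id} are lost")
            parts.append(storage[f"{block_id}@{live[0]}"])
        return b"".join(parts)


nn = ToyNameNode(datanodes=["dn1", "dn2", "dn3", "dn4"])
storage = nn.create("/logs/app.log", b"hello hdfs!")
print(nn.namespace["/logs/app.log"])
# Still readable after losing dn3, because every block has other replicas.
print(nn.read("/logs/app.log", storage, failed={"dn3"}))  # b'hello hdfs!'
```

The point of the split is that losing one machine loses only some replicas. As long as each block keeps at least one live copy, the NameNode's metadata is enough to rebuild the file.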
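HBase: HBase is a top-level Apache project written in Java that addresses the need to read and write data in real time. It provides a simple interface to distributed data, can be accessed from Apache Hive, Apache Pig, and MapReduce, and stores its data in HDFS.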
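HBase stores a table as a sparse map sorted by row key. Each cell is addressed by row key, column family, column qualifier, and timestamp. The following Python sketch models that layout in memory to show why single-row reads and writes are cheap. It is an illustration only and not the HBase client API, which is Java and uses classes such as `Table`, `Put`, `Get`, and `Scan`. `ToyHTable` and its methods are hypothetical, and the `max_versions=3` default is an arbitrary choice for the example.

```python
import bisect
import time


class ToyHTable:
    """Hypothetical in-memory model of an HBase table (not the real API).

    Rows are kept sorted by row key; each cell is addressed by
    (row key, "family:qualifier") and keeps several timestamped versions.
    """

    def __init__(self, families, max_versions=3):
        self.families = set(families)   # column families are declared up front
        self.max_versions = max_versions
        self._keys = []                 # row keys, always in sorted order
        self._rows = {}                 # row key -> {column: [(ts, value), ...]}

    def put(self, row, column, value, ts=None):
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family {family!r}")
        if row not in self._rows:
            bisect.insort(self._keys, row)   # insert while keeping keys sorted
            self._rows[row] = {}
        # New qualifiers need no schema change; only the family must exist.
        versions = self._rows[row].setdefault(column, [])
        versions.append((time.time_ns() if ts is None else ts, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)  # newest first
        del versions[self.max_versions:]                    # drop oldest versions

    def get(self, row, column):
        """Return the newest value of one cell, or None if it does not exist."""
        versions = self._rows.get(row, {}).get(column)
        return versions[0][1] if versions else None

    def scan(self, start_row, stop_row):
        """Yield (row, newest cells) for start_row <= row < stop_row, in key order."""
        lo = bisect.bisect_left(self._keys, start_row)
        hi = bisect.bisect_left(self._keys, stop_row)
        for row in self._keys[lo:hi]:
            yield row, {col: v[0][1] for col, v in self._rows[row].items()}


users = ToyHTable(families=["info"])
users.put("user#002", "info:name", "Bo", ts=1)
users.put("user#001", "info:name", "Ann", ts=1)
users.put("user#001", "info:city", "Oslo", ts=2)   # new qualifier on the fly
users.put("user#001", "info:name", "Anna", ts=3)   # newer version of the same cell
print(users.get("user#001", "info:name"))           # Anna
print([row for row, _ in users.scan("user#000", "user#999")])  # ['user#001', 'user#002']
```

Keeping rows sorted is what makes both point lookups and range scans over neighbouring row keys efficient, which is why row-key design matters so much in HBase.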
The following table summarizes the differences between HDFS and HBase:
HDFS | HBase |
---|---|
HDFS is a Java-based distributed file system | HBase is a Hadoop database that runs on top of HDFS |
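HDFS is highly fault-tolerant and cost-effective, running on commodity hardware | HBase is strongly consistent and partition-tolerant (CP in CAP terms), at the cost of brief unavailability while a failed RegionServer's regions are reassigned |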
HDFS is optimized for sequential (streaming) reads and writes | HBase supports fast random access because rows are stored sorted by row key and indexed |
HDFS follows a write-once, read-many model: files can be appended to but not modified in place | HBase supports random reads and writes of individual rows |
HDFS has a rigid architecture | HBase supports dynamic changes, such as adding new columns to an existing column family on the fly |
HDFS is preferable for offline batch processing | HBase is preferable for real-time processing |
HDFS has high latency for individual access operations | HBase provides low-latency access to small amounts of data |
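Several rows of the table come down to one difference: HDFS files are append-only and read as a stream, while HBase keeps data sorted by row key so that a single row can be located directly. The Python sketch below contrasts the two access patterns by counting how many records each must touch to find the latest value for one key. Both classes are hypothetical simplifications, and the 1,000-row dataset and key format are arbitrary. Real HBase combines an in-memory MemStore with sorted, indexed HFiles stored on HDFS rather than a single sorted list.

```python
import bisect


class AppendOnlyFile:
    """HDFS-like: records can only be appended, never modified in place."""

    def __init__(self):
        self._records = []

    def append(self, key, value):
        self._records.append((key, value))

    def lookup(self, key):
        """Scan the whole file; a later record for the same key wins.

        Returns (value, records_read).
        """
        value, read = None, 0
        for k, v in self._records:      # sequential read of every record
            read += 1
            if k == key:
                value = v
        return value, read


class SortedStore:
    """HBase-like (simplified): records kept sorted by key, updated in place."""

    def __init__(self):
        self._keys, self._values = [], []

    def put(self, key, value):
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._values[i] = value     # random write: overwrite the row
        else:
            self._keys.insert(i, key)
            self._values.insert(i, value)

    def lookup(self, key):
        """Binary search; returns (value, keys_compared)."""
        lo, hi, probes = 0, len(self._keys), 0
        while lo < hi:
            mid = (lo + hi) // 2
            probes += 1
            if self._keys[mid] < key:
                lo = mid + 1
            else:
                hi = mid
        if lo < len(self._keys) and self._keys[lo] == key:
            return self._values[lo], probes
        return None, probes


log, table = AppendOnlyFile(), SortedStore()
for i in range(1000):
    key = f"row{i:04d}"
    log.append(key, i)
    table.put(key, i)
log.append("row0500", -1)     # an "update" to an HDFS file is a new appended record
table.put("row0500", -1)      # the sorted store overwrites the row

print(log.lookup("row0500"))    # (-1, 1001)
print(table.lookup("row0500"))  # (-1, 9)
```

The append-only file must read every record to be sure it has the newest value, so its cost grows linearly with the data, which suits large batch scans. The sorted store finds the row in a logarithmic number of comparisons, which is what makes low-latency access to small amounts of data practical.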