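HDFS: The Hadoop Distributed File System is a distributed file system designed to store data across multiple computers connected as nodes, and to run on them while providing data reliability by replicating data across nodes. An HDFS cluster is accessed through a single NameNode, a piece of software installed on a separate machine, which monitors and manages the cluster's file system and controls how users access files.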
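To make the NameNode's role more concrete, here is a toy Python model of the kind of metadata it keeps. The model records which blocks make up each file and which DataNodes hold a replica of each block, and it shows why replication keeps data readable when a node fails. `ToyNameNode` and its methods are hypothetical and are not the HDFS API. The 128 MB block size and replication factor of 3 mirror the HDFS defaults (`dfs.blocksize`, `dfs.replication`). The round-robin replica placement is a simplification, because real HDFS uses a rack-aware placement policy.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNameNode:
    """Toy model of NameNode metadata: the blocks that make up each file and
    the DataNodes holding a replica of each block.
    Hypothetical class for illustration; this is not the HDFS API."""
    datanodes: list
    block_size: int = 128 * 1024 * 1024  # mirrors the HDFS default block size
    replication: int = 3                 # mirrors the HDFS default replication factor
    files: dict = field(default_factory=dict)            # path -> [block ids]
    block_locations: dict = field(default_factory=dict)  # block id -> [DataNode names]

    def create(self, path, size):
        """Register a new file of `size` bytes and choose DataNodes for its block replicas."""
        if path in self.files:
            raise FileExistsError(path)  # write-once: existing files are not overwritten
        if self.replication > len(self.datanodes):
            raise ValueError("fewer DataNodes than the replication factor")
        num_blocks = max(1, -(-size // self.block_size))  # ceiling division
        blocks = []
        for _ in range(num_blocks):
            block_id = len(self.block_locations)
            # Simple round-robin placement on distinct nodes (real HDFS is rack-aware).
            start = block_id % len(self.datanodes)
            self.block_locations[block_id] = [
                self.datanodes[(start + i) % len(self.datanodes)]
                for i in range(self.replication)
            ]
            blocks.append(block_id)
        self.files[path] = blocks
        return blocks

    def readable_after_failure(self, path, failed):
        """True if every block of `path` still has a replica on a DataNode not in `failed`."""
        return all(
            any(node not in failed for node in self.block_locations[b])
            for b in self.files[path]
        )


nn = ToyNameNode(datanodes=["dn1", "dn2", "dn3", "dn4"], block_size=64)
print(nn.create("/logs/day1.txt", size=200))  # [0, 1, 2, 3]: 200 bytes -> 4 blocks
print(nn.block_locations[0])                  # ['dn1', 'dn2', 'dn3']
print(nn.readable_after_failure("/logs/day1.txt", failed={"dn1", "dn2"}))  # True
```

With three replicas per block, the file in this example stays readable after any two DataNodes fail. Losing three nodes can leave some block with no live replica.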
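HBase: HBase is a top-level Apache project written in Java that meets the need to read and write data in real time. It provides a simple interface to distributed data. It can be accessed from Apache Hive, Apache Pig, and MapReduce, and it stores its data in HDFS.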
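HBase's logical data model can be pictured as a table whose rows are kept sorted by row key. Each row maps a "family:qualifier" column to timestamped versions of a value. The in-memory Python sketch below illustrates that model and the real-time point reads, writes, and range scans it enables. `ToyHBaseTable`, `put`, `get`, and `scan` are hypothetical names chosen for illustration. They are not the HBase client API, and the sketch ignores distribution and persistence entirely.

```python
import bisect
import time


class ToyHBaseTable:
    """Toy in-memory model of HBase's logical data model: rows sorted by row key,
    each row mapping "family:qualifier" to timestamped versions of a value.
    Hypothetical class for illustration; not the HBase client API."""

    def __init__(self, families):
        self.families = set(families)  # column families are declared up front
        self.row_keys = []             # kept sorted, enabling point lookups and range scans
        self.rows = {}                 # row key -> {"family:qualifier": [(ts, value), ...]}

    def put(self, row, column, value, ts=None):
        """Write one cell; new qualifiers inside an existing family need no schema change."""
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family {family!r}")
        if row not in self.rows:
            bisect.insort(self.row_keys, row)
            self.rows[row] = {}
        versions = self.rows[row].setdefault(column, [])
        versions.append((time.time_ns() if ts is None else ts, value))
        versions.sort(key=lambda v: v[0], reverse=True)  # newest version first

    def get(self, row, column):
        """Return the newest value of a cell, or None if the cell does not exist."""
        versions = self.rows.get(row, {}).get(column)
        return versions[0][1] if versions else None

    def scan(self, start_row, stop_row):
        """Yield row keys in [start_row, stop_row) in sorted order."""
        i = bisect.bisect_left(self.row_keys, start_row)
        while i < len(self.row_keys) and self.row_keys[i] < stop_row:
            yield self.row_keys[i]
            i += 1


t = ToyHBaseTable(families=["info"])
t.put("user#002", "info:name", "Bob", ts=1)
t.put("user#001", "info:name", "Alice", ts=1)
t.put("user#001", "info:name", "Alicia", ts=2)          # newer version of an existing cell
t.put("user#001", "info:email", "a@example.com", ts=1)  # new column added on the fly
print(t.get("user#001", "info:name"))          # Alicia
print(list(t.scan("user#001", "user#003")))    # ['user#001', 'user#002']
```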
The following table summarizes the differences between HDFS and HBase:
HDFS | HBase |
---|---|
HDFS is a Java-based distributed file system | HBase is the Hadoop database, which runs on top of HDFS |
HDFS is highly fault-tolerant and cost-effective | HBase is partially fault-tolerant and strongly consistent |
HDFS supports only sequential, append-only writes and is optimized for streaming reads | HBase supports random access because rows are stored sorted and indexed by row key |
HDFS follows a write-once, read-many access model | HBase supports random reads and writes of individual records |
HDFS has a rigid architecture: files cannot be modified in place | HBase supports dynamic changes, such as adding new columns to a column family at any time |
HDFS is preferable for offline batch processing | HBase is preferable for real-time processing |
HDFS has high latency for access operations | HBase provides low-latency access to small amounts of data |
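Several rows of the table meet in one question: how can HBase offer low-latency random writes when HDFS files are write-once? HBase buffers writes in memory (the MemStore) and periodically flushes them to HDFS as immutable, sorted files (HFiles). A read checks the memory buffer first and then the files, newest first. The Python sketch below models only that log-structured idea under simplifying assumptions. `ToyLSMStore` and its methods are hypothetical, and the sketch omits HBase's write-ahead log, compactions, caching, and distribution across RegionServers.

```python
import bisect


class ToyLSMStore:
    """Simplified sketch of layering random writes over write-once files:
    writes land in an in-memory buffer (like HBase's MemStore) and are flushed
    as immutable sorted files (like HFiles on HDFS).
    Hypothetical class; omits the write-ahead log, compactions and caching."""

    def __init__(self, flush_threshold=3):
        self.flush_threshold = flush_threshold
        self.memstore = {}  # mutable, in memory
        self.files = []     # immutable sorted tuples of (key, value), oldest first

    def put(self, key, value):
        self.memstore[key] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write the buffer out once as a new sorted file; files are never modified afterwards.
        if self.memstore:
            self.files.append(tuple(sorted(self.memstore.items())))
            self.memstore = {}

    def get(self, key):
        if key in self.memstore:
            return self.memstore[key]
        for f in reversed(self.files):  # the newest file holds the latest value
            i = bisect.bisect_left(f, (key,))  # binary search in the sorted file
            if i < len(f) and f[i][0] == key:
                return f[i][1]
        return None


store = ToyLSMStore(flush_threshold=2)
store.put("row1", "v1")
store.put("row2", "v1")   # threshold reached: flushed to an immutable file
store.put("row1", "v2")   # the "update" goes to the memstore; the old file is untouched
print(store.get("row1"))  # v2
print(store.files[0])     # (('row1', 'v1'), ('row2', 'v1'))
```

In this sketch, an update never rewrites an existing file. It only shadows the older value, which keeps the underlying files compatible with a write-once, read-many file system.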