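Hadoop is an open-source software framework written in Java, along with some shell scripts and C code, for running computations over very large data sets. It is used for batch (offline) processing across a network of many machines that together form a physical cluster. The framework provides both distributed storage and distributed processing on the same cluster. It is designed to run on inexpensive systems, commonly called commodity hardware, where each machine contributes its own local storage and compute capacity.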
Hadoop can work with a variety of filesystems, and HDFS is only one of them. The Java abstract class org.apache.hadoop.fs.FileSystem represents a filesystem in Hadoop, and it has several concrete implementations:
Filesystem | URI scheme | Java implementation (all under org.apache.hadoop) | Description |
---|---|---|---|
Local | file | fs.LocalFileSystem | A filesystem for a locally connected disk with client-side checksums. Use RawLocalFileSystem for a local filesystem with no checksums. |
HDFS | hdfs | hdfs.DistributedFileSystem | HDFS (Hadoop Distributed File System) is designed to work efficiently with MapReduce. |
HFTP | hftp | hdfs.HftpFileSystem | A filesystem providing read-only access to HDFS over HTTP. Despite its name, HFTP has no connection with FTP. It is commonly used with distcp to copy data between HDFS clusters running different versions. |
HSFTP | hsftp | hdfs.HsftpFileSystem | A filesystem providing read-only access to HDFS over HTTPS. Like HFTP, it has no connection with FTP. |
HAR | har | fs.HarFileSystem | A filesystem layered on another filesystem for archiving files. Hadoop Archives are mainly used to archive files in HDFS and so reduce the NameNode's memory usage. |
KFS (Cloud-Store) | kfs | fs.kfs.KosmosFileSystem | CloudStore, or KFS (KosmosFileSystem), is a distributed filesystem written in C++. It is very similar to HDFS and GFS (Google File System). |
FTP | ftp | fs.ftp.FTPFileSystem | A filesystem backed by an FTP server. |
S3 (native) | s3n | fs.s3native.NativeS3FileSystem | A filesystem backed by Amazon S3. |
S3 (block-based) | s3 | fs.s3.S3FileSystem | A filesystem backed by Amazon S3 that stores files in blocks (much like HDFS) to overcome S3's 5 GB file size limit. |
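Hadoop exposes many interfaces to its filesystems, and it generally uses the URI scheme to pick the correct filesystem instance to communicate with. You can use any of these filesystems to run MapReduce over very large data sets, but distributed filesystems with data locality support, such as HDFS and KFS (KosmosFileSystem), are preferable.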
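To make the idea of scheme-based selection concrete, here is a minimal sketch that uses only the Java standard library. It is not Hadoop's own code: `SchemeResolver`, `resolve`, and `DEFAULT_SCHEME` are hypothetical names invented for illustration, and the example URIs (host, bucket, paths) are made up. The lookup data is copied from the table above. The fallback to `file` for paths without a scheme is an assumption of this sketch; a real cluster takes its default filesystem from its configuration.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Hadoop: maps a URI scheme to the
// implementation class name listed in the table above.
class SchemeResolver {
    private static final String BASE_PACKAGE = "org.apache.hadoop.";

    // Assumed fallback for paths that carry no scheme (illustration only).
    static final String DEFAULT_SCHEME = "file";

    private static final Map<String, String> IMPLEMENTATIONS = new LinkedHashMap<>();

    static {
        IMPLEMENTATIONS.put("file", "fs.LocalFileSystem");
        IMPLEMENTATIONS.put("hdfs", "hdfs.DistributedFileSystem");
        IMPLEMENTATIONS.put("hftp", "hdfs.HftpFileSystem");
        IMPLEMENTATIONS.put("hsftp", "hdfs.HsftpFileSystem");
        IMPLEMENTATIONS.put("har", "fs.HarFileSystem");
        IMPLEMENTATIONS.put("kfs", "fs.kfs.KosmosFileSystem");
        IMPLEMENTATIONS.put("ftp", "fs.ftp.FTPFileSystem");
        IMPLEMENTATIONS.put("s3n", "fs.s3native.NativeS3FileSystem");
        IMPLEMENTATIONS.put("s3", "fs.s3.S3FileSystem");
    }

    // Returns the fully qualified class name registered for the URI's scheme.
    static String resolve(String uriText) {
        String scheme = URI.create(uriText).getScheme();
        if (scheme == null) {
            scheme = DEFAULT_SCHEME; // plain path such as /user/alice/data
        }
        String impl = IMPLEMENTATIONS.get(scheme.toLowerCase(Locale.ROOT));
        if (impl == null) {
            throw new IllegalArgumentException(
                    "No filesystem registered for scheme: " + scheme);
        }
        return BASE_PACKAGE + impl;
    }

    public static void main(String[] args) {
        String[] examples = {
            "hdfs://namenode:8020/data/input",
            "file:///tmp/input",
            "s3n://example-bucket/logs",
            "/user/alice/part-00000"
        };
        for (String uri : examples) {
            System.out.println(uri + " -> " + resolve(uri));
        }
    }
}
```

In an actual Hadoop program you would not keep such a table yourself. You would typically call `FileSystem.get(uri, conf)` and let the framework return the matching implementation.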