Difference Between Big Data and Apache Hadoop

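Big Data: the huge volumes of data, information, and related statistics acquired by large organizations and enterprises. Because data at this scale is too large to process manually, many software tools and data stores have been built to handle it. Big Data is used to discover patterns and trends and to make decisions related to human behavior and interaction with technology.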

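Applications and uses of Big Data:

Social networking sites such as Facebook and Twitter.
Transportation, such as airlines and railways.
Healthcare and education systems.
Agriculture.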

Big Data vs. Apache Hadoop

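Apache Hadoop: an open-source software framework that runs on a cluster of machines. It provides distributed storage and distributed processing for very large data sets (that is, Big Data), with processing expressed in the MapReduce programming model. Implemented in Java, it is a developer-friendly tool that backs Big Data applications. It handles massive volumes of data on clusters of commodity servers, can mine data in any form (structured, unstructured, or semi-structured), and is highly scalable.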
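To make the MapReduce programming model more concrete, here is a minimal single-process sketch of a word count in Python. It does not use Hadoop itself; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and chosen for illustration only. On a real cluster the framework would run the map and reduce steps in parallel on many machines and perform the grouping step between them.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key (done by the framework on a cluster)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data needs Hadoop", "Hadoop stores big data"]))
```

Because each input line is mapped independently and each key is reduced independently, both steps can be spread across machines, which is what lets this model scale to very large data sets.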

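Hadoop consists of three components:

HDFS: the reliable storage layer, a distributed file system that splits files into blocks and replicates them across the machines of the cluster.
MapReduce: the distributed processing layer.
YARN: the resource management layer.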
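As a rough illustration of how the storage layer spreads a file over a cluster, the sketch below estimates how many blocks and block replicas a file would occupy. It assumes a 128 MB block size and a replication factor of 3, which are common HDFS defaults but are configurable per cluster; the function name `hdfs_footprint` is hypothetical.

```python
import math

BLOCK_SIZE_MB = 128      # assumption: common HDFS default block size
REPLICATION_FACTOR = 3   # assumption: common HDFS default replication factor


def hdfs_footprint(file_size_mb, block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION_FACTOR):
    """Return (number of blocks, total block replicas, raw storage used in MB)."""
    blocks = math.ceil(file_size_mb / block_size_mb)
    replicas = blocks * replication
    # The last block is not padded, so raw usage is the file size times the replication factor.
    raw_storage_mb = file_size_mb * replication
    return blocks, replicas, raw_storage_mb


if __name__ == "__main__":
    print(hdfs_footprint(1000))
```

Under these assumptions a 1000 MB file is split into 8 blocks, stored as 24 block replicas, and takes about 3000 MB of raw disk space across the cluster.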

The table below lists the differences between Big Data and Apache Hadoop:

No.	Big Data	Apache Hadoop
1	Big Data is a huge collection of data that keeps multiplying; the term also covers the group of technologies built around it.	Apache Hadoop is an open-source, Java-based framework that implements some Big Data principles.
2	It is a collection of assets that is complex and ambiguous.	It achieves a set of goals and objectives for dealing with that collection of assets.
3	It is the problem: a huge amount of raw data.	It is a solution: the machinery that processes that data.
4	Big Data is hard to access.	It allows the data to be accessed and processed faster.
5	It is hard to store because the huge amount of data comes in all forms, i.e. structured, unstructured, and semi-structured.	It implements the Hadoop Distributed File System (HDFS), which allows many varieties of data to be stored.
6	It describes the size of the data set.	It is where the data set is stored and processed.