Difference Between Big Data and Apache Hadoop

Last modified: 2021-09-15 01:44:46             Author: Mango

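Big Data: the huge volumes of data, information, and related statistics collected by large organizations and enterprises. Because data at this scale is impractical to process manually, dedicated software and data stores have been built to handle it. Big Data is used to discover patterns and trends and to support decisions related to human behavior and interaction with technology.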

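Applications and uses of Big Data:

  • Social networking sites such as Facebook and Twitter.
  • Transportation, such as airlines and railways.
  • Healthcare and education systems.
  • Agriculture.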


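Apache Hadoop: an open-source software framework that runs on a cluster of machines. It provides distributed storage and distributed processing for very large data sets (that is, Big Data), with the processing expressed in the MapReduce programming model. It is implemented in Java and offers developer-friendly tooling for Big Data applications. It can process massive amounts of data on clusters of commodity servers, it handles data in any form (structured, unstructured, or semi-structured), and it is highly scalable.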
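To make the MapReduce programming model mentioned above more concrete, here is a minimal single-process sketch of a word count in Python. It does not use Hadoop or any Hadoop API; the function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical and chosen only for illustration. In a real cluster, the map and reduce steps would run in parallel on different machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by their key (the word)."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse all values of one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence (illustrative names, not a Hadoop API)."""
    # On a cluster, each input split would be mapped on a different node.
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(mapped)
    # On a cluster, each key group would be reduced independently.
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data needs hadoop", "hadoop stores big data"]
    print(word_count(sample))
```

The point of the split into three functions is that map and reduce each work on one line or one key at a time, which is what allows a framework to distribute the work across machines.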

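It consists of three components:

  • HDFS: the reliable, distributed storage layer that holds very large data sets across the cluster.
  • MapReduce: the distributed processing layer.
  • YARN: the resource management layer.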
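As a rough illustration of how a distributed storage layer such as HDFS can be reliable, the sketch below splits a file into fixed-size blocks and assigns each block to several nodes. The 128 MB block size and replication factor of 3 match the HDFS defaults in Hadoop 2 and later. Everything else is an assumption made for illustration: the function name place_blocks is hypothetical, and the round-robin placement is a simplification, not HDFS's actual rack-aware placement policy.

```python
import math
from typing import Dict, List

BLOCK_SIZE_MB = 128  # HDFS default block size (dfs.blocksize) in Hadoop 2 and later
REPLICATION = 3      # HDFS default replication factor (dfs.replication)


def place_blocks(file_size_mb: int, nodes: List[str],
                 block_size_mb: int = BLOCK_SIZE_MB,
                 replication: int = REPLICATION) -> Dict[int, List[str]]:
    """Split a file into blocks and assign each block to `replication` distinct nodes.

    Assumption: simple round-robin placement, NOT HDFS's real rack-aware policy.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    num_blocks = math.ceil(file_size_mb / block_size_mb)
    placement: Dict[int, List[str]] = {}
    for block in range(num_blocks):
        # Consecutive offsets modulo len(nodes) give distinct nodes per block.
        placement[block] = [nodes[(block + r) % len(nodes)]
                            for r in range(replication)]
    return placement


if __name__ == "__main__":
    # A 300 MB file on a hypothetical four-node cluster needs three blocks.
    for block, replicas in place_blocks(300, ["n1", "n2", "n3", "n4"]).items():
        print(block, replicas)
```

Because every block lives on several nodes, losing a single machine does not lose data, and processing can be scheduled on whichever node already holds a copy of the block.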

The following table lists the differences between Big Data and Apache Hadoop:

No. | Big Data | Apache Hadoop
1 | A huge, continuously growing collection of data; the term also covers the group of technologies used to handle it. | An open-source, Java-based framework that implements some Big Data principles.
2 | A collection of assets that is complex and ambiguous. | Achieves a set of goals and objectives for dealing with that collection of assets.
3 | The problem: a huge amount of raw data. | A solution: the machinery that processes that data.
4 | Hard to access. | Allows the data to be accessed and processed faster.
5 | Hard to store, because it includes every form of data: structured, unstructured, and semi-structured. | Implements the Hadoop Distributed File System (HDFS), which can store all of these varieties of data.
6 | Defines the size of the data set. | Is where the data set is stored and processed.