Difference Between Apache Hadoop and Amazon Redshift

Last modified: 2021-10-27 06:51:30    Author: Mango

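Hadoop is an open-source software framework that runs on clusters of machines. It provides distributed storage and distributed processing of very large datasets (big data) using the MapReduce programming model. Implemented in Java, it is a developer-friendly tool for building big data applications. It can process massive amounts of data on clusters of commodity servers, and it can mine data in any form: structured, unstructured, or semi-structured. It is highly scalable. It consists of three components: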

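  • HDFS (Hadoop Distributed File System): the reliable, distributed storage layer that holds the cluster's data.
  • MapReduce: the distributed processing layer.
  • YARN: the resource-management layer.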
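The MapReduce model splits a job into a map step, a shuffle that groups the intermediate key/value pairs by key, and a reduce step. The sketch below runs those three phases in a single Python process on a word-count job. It only illustrates the programming model: it is not Hadoop's Java API, and the function names are hypothetical. On a real cluster, YARN would schedule mappers and reducers across nodes and HDFS would supply the input splits.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    # Each line stands in for an input split that Hadoop would hand to a separate mapper.
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    # Keys are sorted here, mirroring the sort that precedes the reduce step in Hadoop.
    return dict(reduce_phase(key, values) for key, values in sorted(grouped.items()))


if __name__ == "__main__":
    docs = ["big data needs big clusters", "Hadoop stores big data"]
    print(word_count(docs))
```

Running it prints `{'big': 3, 'clusters': 1, 'data': 2, 'hadoop': 1, 'needs': 1, 'stores': 1}`. Because each mapper works on its own split and each reducer on its own key, both phases can be spread across many machines without coordination beyond the shuffle.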

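Amazon Redshift is a cloud-based, large-scale data warehouse service. It is commercially licensed and is part of Amazon Web Services (AWS). It handles large volumes of data, is known for its scalability, and processes data in parallel across the nodes of a cluster. It is ACID-compliant and very popular. It is implemented in C and offers high availability. In short, Amazon Redshift is a fast, simple, and cost-effective data warehouse service.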

The following table lists the differences between Apache Hadoop and Amazon Redshift:

| Apache Hadoop | Amazon Redshift |
|---|---|
| Hadoop is about 10 times costlier than Redshift, at roughly $200 per month. | Redshift is cheaper, at roughly $20 per month, although the price depends on the AWS region of the server. |
| MapReduce jobs run more slowly in Hadoop. | Redshift performs much faster than a Hadoop cluster. For example, a 16-node Redshift cluster ran much faster than a 44-node Hive/Elastic MapReduce cluster. |
| Hadoop has a storage layer that stores data as files, without regard to any underlying data structure. | Redshift is a columnar database designed for complex queries spanning millions of rows. Data is organized in tables, and Redshift supports structures based on the PostgreSQL standard. |
| Data is copied to and from the cluster with the HDFS shell commands `put` and `get`. | Data is first staged in Amazon S3 and then loaded into Redshift with the `COPY` command. |
| Scaling is not a limiting factor: storage can grow to any size, provided the nodes are added and integrated properly. | Redshift can scale only up to a fixed maximum cluster size (about 2 PB, depending on the node type). |
| Slower than Redshift. | About ten times faster than Hadoop. |
| Hadoop is an open-source framework from the Apache Software Foundation. | Redshift is a paid service provided by Amazon. |
| Hadoop is more flexible: it works with the local file system and with any database. | Redshift loads data mainly from Amazon S3 or DynamoDB; its `COPY` command also accepts Amazon EMR and remote hosts over SSH. |
| Administration is complex and tricky to handle. | Redshift automates backups to Amazon S3 as well as data warehouse administration. |
| Hadoop is distributed by vendors such as Hortonworks and Cloudera. | Redshift is developed and provided by Amazon Web Services. |
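To illustrate the row-oriented versus columnar contrast in the table, here is a toy Python sketch that assumes a small in-memory list of sales records (the column names and values are hypothetical). It pivots the rows into one list per column, shows that an aggregate over `amount` only needs that single column, and applies run-length encoding, one of the compression encodings Redshift offers, to a sorted column. Redshift's real on-disk format (blocks, zone maps, per-column encodings) is far more involved than this.

```python
# Hypothetical sample records, stored row by row (one dict per record).
rows = [
    {"order_id": 1, "region": "eu-west", "amount": 75.5},
    {"order_id": 2, "region": "us-east", "amount": 120.0},
    {"order_id": 3, "region": "us-east", "amount": 30.0},
]


def to_columnar(records):
    """Pivot row-oriented records into a dict of column name -> list of values."""
    columns = {name: [] for name in records[0]}
    for record in records:
        for name, value in record.items():
            columns[name].append(value)
    return columns


def total_amount_row_store(records):
    # Models a row store: each whole record is fetched just to read its "amount" field.
    return sum(record["amount"] for record in records)


def total_amount_column_store(columns):
    # Models a column store: only the "amount" column is read; other columns are skipped.
    return sum(columns["amount"])


def run_length_encode(values):
    """Compress a column into (value, run_length) pairs; effective on sorted, repetitive data."""
    encoded = []
    for value in values:
        if encoded and encoded[-1][0] == value:
            encoded[-1] = (value, encoded[-1][1] + 1)
        else:
            encoded.append((value, 1))
    return encoded


if __name__ == "__main__":
    columns = to_columnar(rows)
    print(total_amount_row_store(rows))
    print(total_amount_column_store(columns))
    print(run_length_encode(columns["region"]))
```

Both totals print `225.5`, and the encoded `region` column is `[('eu-west', 1), ('us-east', 2)]`. The answer is the same either way; the difference is how much data has to be read, which is why columnar storage suits analytic queries that scan a few columns over many rows.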
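The two loading paths in the table can be sketched as small Python helpers that only build the commands and do not execute anything. The HDFS side uses the `hdfs dfs -put` and `hdfs dfs -get` shell commands. The Redshift side assumes the CSV files have already been uploaded to S3 (for example with `aws s3 cp`) and that an IAM role with read access to the bucket is attached to the cluster. The table name, file paths, bucket, and role ARN are hypothetical placeholders.

```python
import shlex


def hdfs_put_command(local_path, hdfs_path):
    """Build the HDFS shell command that copies a local file into the cluster."""
    return ["hdfs", "dfs", "-put", local_path, hdfs_path]


def hdfs_get_command(hdfs_path, local_path):
    """Build the HDFS shell command that copies a file out of the cluster."""
    return ["hdfs", "dfs", "-get", hdfs_path, local_path]


def sql_literal(text):
    """Quote a value as an SQL string literal, doubling any embedded single quotes."""
    return "'" + text.replace("'", "''") + "'"


def redshift_copy_sql(table, s3_uri, iam_role_arn):
    """Build a Redshift COPY statement that loads CSV files already staged in S3."""
    return (
        f"COPY {table} FROM {sql_literal(s3_uri)} "
        f"IAM_ROLE {sql_literal(iam_role_arn)} FORMAT AS CSV;"
    )


if __name__ == "__main__":
    # All names below are hypothetical placeholders.
    print(shlex.join(hdfs_put_command("sales.csv", "/data/sales/")))
    print(redshift_copy_sql(
        "public.sales",
        "s3://example-bucket/sales/",
        "arn:aws:iam::123456789012:role/ExampleRedshiftRole",
    ))
```

The first line printed is `hdfs dfs -put sales.csv /data/sales/`, which could be passed to `subprocess.run` on a machine with the Hadoop client installed; the COPY statement would be sent to Redshift through any PostgreSQL-compatible client. Keeping the builders separate from execution makes them easy to inspect and test without a cluster.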