Hadoop是一个开源软件编程框架。 Hadoop 的框架基于Java编程语言,带有一些 shell 脚本和 C 的本机代码。
该框架用于管理、存储和处理集群系统下运行的大数据不同应用的数据和计算。 Hadoop 的主要组件是 HDFS、MapReduce 和 YARN。
Cassandra是一个开源分布式数据管理系统,具有广泛的列存储和 NoSQL 数据库。在此 NoSQL 数据库中,能够处理跨许多商用硬件的大量数据,且无单点故障和高可用性。该代码是用Java编写的,由 Apache 软件基金会开发。
Hadoop 和 Cassandra 的区别
S.NO. | HADOOP | CASSANDRA |
---|---|---|
1 | Hadoop is a scalable framework that is designed to be deployed on low-cost hardware. | It is deployed in a very distributed fashion as a cluster of instances that are all aware of each other. |
2 | Hadoop is a big data processing framework based on the famous MapReduce programming model. | Cassandra is mainly used for real-time data processing. |
3 | Hadoop supports a variety of formats. | Cassandra does not support images. |
4 | Hadoop follows a master slave architecture. | Cassandra follows a peer-to-peer architecture |
5 | Hadoop is deployed in a single data center. | Cassandra is deployed in a very distributed fashion. |
6 | It used map reduce to read/write. | This uses Cassandra query language. |
7 | In hadoop, data is directly written to data node. | While in Cassandra, data is first written to mem-table and then it is written to disk. |
8 | Hadoop has a fixed replication factor of 3. | Replication factor in Cassandra depend on the number of nodes. |
9 | It has high latency rate. | It has less latency rate. |
10 | Hadoop uses TCP and UDP for communication. | In Cassandra, gossip protocol is used for communication. |
11 | It is for data batch processing. | It is for real-time processing. |