Difference Between Hadoop and SQL Performance

Last modified: 2021-10-27 06:31:04 | Author: Mango

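Hadoop: Hadoop is an open-source software framework written in Java for storing and processing large data sets, ranging from gigabytes to petabytes. It is a distributed system that stores and processes massive data sets across a cluster of computers. Because it is Java-based, Hadoop runs on any platform that provides a Java runtime. Hadoop has two core layers: the processing/computation layer (MapReduce) and the storage layer (Hadoop Distributed File System, HDFS). It runs code across a cluster of machines and performs offline batch processing of huge data sets on clusters of commodity servers. However, Hadoop is not a replacement for SQL; which one to use depends on the requirements. In terms of performance, Hadoop has the advantage over SQL on very large data sets, because it processes data in parallel and handles structured, semi-structured, and unstructured data equally well.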
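To make the MapReduce layer described above more concrete, here is a minimal sketch of the map, shuffle, and reduce steps for a word count. It is a single-process simulation in plain Python, not Hadoop's actual Java API; the function names `map_phase`, `shuffle_phase`, `reduce_phase`, and `word_count` are hypothetical and chosen only for illustration. On a real cluster the framework would run many mappers and reducers in parallel over blocks stored in HDFS.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper (illustrative): emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle (illustrative): group all emitted values by their key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer (illustrative): collapse all counts for one word into a total."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run the three phases in sequence inside one process."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["Hadoop stores data", "Hadoop processes data"]
    print(word_count(sample))
```

The point of the split is that each mapper only needs its own slice of the input and each reducer only needs the values for its own keys, which is what lets the work spread across many machines.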

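SQL performance: Structured Query Language (SQL) is a standard language for manipulating, retrieving, and storing data in a database. Relational databases use SQL as the standard for maintaining and manipulating data. SQL commands such as SELECT, INSERT, UPDATE, DELETE, CREATE, and DROP are used to store, update, or retrieve data in a database. Common relational database management systems that use SQL include Oracle, Microsoft SQL Server, Sybase, Access, and Ingres. However, as data volumes grow (big data), storing that much data in a relational database becomes difficult. Relational databases work well with a structured, fixed schema, but big data often has no fixed schema and is semi-structured. The three Vs of big data (volume, variety, and velocity), which an RDBMS struggles to handle, are the main reason NoSQL databases emerged. As the name suggests, NoSQL databases generally do not use SQL as their primary data manipulation language. In such cases, Hadoop has the advantage over SQL.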
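The SQL commands listed above can be tried out with a short sketch. It uses Python's built-in `sqlite3` module and an in-memory database purely for convenience, rather than any of the systems named above; the `employees` table, its columns, and the function name `run_sql_demo` are assumptions made up for this example.

```python
import sqlite3
from typing import List, Tuple


def run_sql_demo() -> List[Tuple[int, str, str]]:
    """Exercise CREATE, INSERT, UPDATE, DELETE, SELECT, and DROP on a toy table."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    cur = conn.cursor()

    # CREATE: define a fixed schema up front.
    cur.execute(
        "CREATE TABLE employees ("
        "id INTEGER PRIMARY KEY, name TEXT NOT NULL, dept TEXT NOT NULL)"
    )

    # INSERT: add rows that conform to the schema.
    cur.executemany(
        "INSERT INTO employees (id, name, dept) VALUES (?, ?, ?)",
        [(1, "Asha", "Sales"), (2, "Ben", "Ops"), (3, "Chen", "Ops")],
    )

    # UPDATE: change one row in place.
    cur.execute("UPDATE employees SET dept = ? WHERE id = ?", ("Sales", 2))

    # DELETE: remove one row.
    cur.execute("DELETE FROM employees WHERE id = ?", (3,))

    # SELECT: read back what is left.
    cur.execute("SELECT id, name, dept FROM employees ORDER BY id")
    rows = cur.fetchall()

    # DROP: remove the table itself.
    cur.execute("DROP TABLE employees")
    conn.close()
    return rows


if __name__ == "__main__":
    for row in run_sql_demo():
        print(row)
```

Note that every row must fit the columns declared in CREATE TABLE, which is the fixed-schema property the comparison relies on.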

The following table lists the differences between Hadoop and SQL performance:

| Feature | Hadoop | SQL |
|---|---|---|
| Structure | No fixed schema | Fixed schema |
| Data format | Structured, semi-structured, or unstructured data | Structured data |
| Data volume | Handles high volumes of data well (gigabytes to petabytes) | Works better on lower volumes of data |
| Data processing | Large-scale offline batch processing, typical of analytical (OLAP-style) workloads | Real-time transactional processing (OLTP) |
| Speed | Faster on very large data sets processed in bulk | Slower as data volume grows |
| Throughput | Higher throughput | Lower throughput |
| Latency | High latency: Hadoop cannot fetch a particular record from the data set quickly | Low latency: SQL can fetch a particular record from the data set quickly |
| Scalability | Horizontal: more machines can be added to the cluster for parallel processing | Primarily vertical: more CPU, memory, or storage is added to the existing machine |
| Data storage | Data can be stored as tables, key-value pairs, and other forms | Data is stored in tables only |
| Integrity | Low integrity | High integrity |
| Data variety | Deals with big data and supports a variety of data | Does not support a variety of data |
| Updates | Designed around write once, read many, so in-place updates are impractical | Designed for frequent reads and writes, so updates are easy |
| ACID properties | Does not fully comply with ACID properties | Fully complies with ACID properties |
| License | Free, open-source software | Many SQL databases are commercially licensed (e.g. Oracle, Microsoft SQL Server), though open-source options exist |
| Example | HBase (part of the Hadoop ecosystem); related NoSQL stores such as MongoDB | Oracle, Microsoft SQL Server, etc. |
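The ACID row in the table can be illustrated with a small sketch of atomicity: either every statement in a transaction takes effect or none does. This again uses Python's built-in `sqlite3` module as a stand-in for a relational database; the `accounts` table and the helper names `make_bank`, `transfer`, and `balances` are hypothetical.

```python
import sqlite3
from typing import Dict


def make_bank() -> sqlite3.Connection:
    """Create a toy accounts table whose balances may never go negative."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        "id TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
    )
    conn.commit()
    return conn


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Credit dst, then debit src, as one transaction. Returns True on success."""
    try:
        # Used as a context manager, the connection commits if the block
        # succeeds and rolls back if an exception escapes it.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected the debit; the credit was rolled back too.
        return False


def balances(conn: sqlite3.Connection) -> Dict[str, int]:
    """Return the current balance of every account."""
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))


if __name__ == "__main__":
    bank = make_bank()
    print(transfer(bank, "alice", "bob", 30), balances(bank))
    print(transfer(bank, "alice", "bob", 500), balances(bank))
```

In the second transfer the credit to `bob` succeeds first and is then undone when the debit from `alice` violates the constraint, so no partial update is ever visible. A write-once, batch-oriented store gives no comparable guarantee for in-place updates.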