Hadoop:它是一个将大数据存储在分布式系统中然后并行处理的框架。 Hadoop 的四个主要组件是 Hadoop 分布式文件系统 (HDFS)、Yarn、MapReduce 和库。它不仅涉及大数据,还涉及结构化、半结构化和非结构化信息的混合。亚马逊、IBM、微软、Cloudera、ScienceSoft、Pivotal、Hortonworks 是一些使用 Hadoop 技术的公司。
SQL:结构化查询语言是一种领域特定语言,用于计算和处理关系数据库管理系统中的数据管理,它还处理关系数据流管理系统中的数据流。简而言之,SQL 是一种标准的数据库语言,用于从 MySQL、Oracle、SQL Server 等关系数据库中创建、存储和提取数据。
下表列出了 Hadoop 和 SQL 之间的差异:
Feature | Hadoop | SQL |
---|---|---|
Technology | Modern | Traditional |
Volume | Usually in PetaBytes | Usually in GigaBytes |
Operations | Storage, processing, retrieval and pattern extraction from data | Storage, processing, retrieval and pattern mining of data |
Fault Tolerance | Hadoop is highly fault tolerant | SQL has good fault tolerance |
Storage | Stores data in the form of key-value pairs, tables, hash map etc in distributed systems. | Stores structured data in tabular format with fixed schema in cloud |
Scaling | Linear | Non linear |
Providers | Cloudera, Horton work, AWS etc. provides Hadoop systems. | Well-known industry leaders in SQL systems are Microsoft, SAP, Oracle etc. |
Data Access | Batch oriented data access | Interactive and batch oriented data access |
Cost | It is open source and systems can be cost effectively scaled | It is licensed and costs a fortune to buy a SQL server, moreover if system runs out of storage additional charges also emerge |
Time | Statements are executed very quickly | SQL syntax is slow when executed in millions of rows |
Optimization | It stores data in HDFS and process though Map Reduce with huge optimization techniques. | It does not have any advanced optimization techniques |
Structure | Dynamic schema, capable of storing and processing log data, real-time data, images, videos, sensor data etc.(both structured and unstructured) | Static Schema, capable of storing data(fixed schema) in tabular format only(structured) |
Data Update | Write data once, read data multiple times | Read and Write data multiple times |
Integrity | Low | High |
Interaction | Hadoop uses JDBC(Java Database Connectivity) to communicate with SQL systems to send and receive data | SQL systems can read and write data to Hadoop systems |
Hardware | Uses commodity hardware | Uses propriety hardware |
Training | Learning Hadoop for entry-level as well as seasoned profession is moderately hard | Learning SQL is easy for even entry-level professionals |