1. Apache Hive:
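Apache Hive is a data warehouse system built on top of Apache Hadoop. It provides data summarization, ad hoc querying and analysis of large datasets stored in the various databases and file systems that integrate with Hadoop, including the MapR Data Platform with MapR XD and MapR Database. Hive offers a simple way to impose structure on large amounts of unstructured data and then run batch SQL-like queries against that data.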
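Hive's approach of projecting a schema onto raw data when it is read, and then querying it with SQL, can be illustrated in miniature. The sketch below is only an analogy, not Hive itself: it assumes a few hypothetical tab-separated log lines, a hypothetical `SCHEMA` tuple and hypothetical `apply_schema`/`page_latency_report` helpers, and it uses Python's built-in `sqlite3` module in place of Hive's query engine. In Hive, the schema step would instead be declared with a `CREATE EXTERNAL TABLE ... ROW FORMAT DELIMITED` statement over files stored in Hadoop.

```python
import sqlite3

# Hypothetical raw input: tab-separated log lines with no schema attached.
RAW_LINES = [
    "2024-01-01\tuser1\t/home\t120",
    "2024-01-01\tuser2\t/search\t340",
    "2024-01-02\tuser1\t/search\t200",
    "2024-01-02\tuser3\t/home\t80",
    "malformed line without tabs",
]

# Hypothetical schema projected onto the raw lines at read time: (column, type cast).
SCHEMA = (("day", str), ("user_id", str), ("page", str), ("latency_ms", int))


def apply_schema(lines, schema):
    """Split each raw line on tabs and cast the fields; skip lines that do not fit."""
    rows = []
    for line in lines:
        fields = line.split("\t")
        if len(fields) != len(schema):
            continue  # this sketch simply drops lines with the wrong column count
        rows.append(tuple(cast(value) for (_, cast), value in zip(schema, fields)))
    return rows


def page_latency_report(lines):
    """Load the structured rows into a table and run a batch SQL aggregate.

    Analogy only: an in-memory sqlite3 database stands in for Hive.
    """
    conn = sqlite3.connect(":memory:")
    columns = ", ".join(name for name, _ in SCHEMA)
    placeholders = ", ".join("?" for _ in SCHEMA)
    conn.execute(f"CREATE TABLE page_views ({columns})")
    conn.executemany(
        f"INSERT INTO page_views VALUES ({placeholders})", apply_schema(lines, SCHEMA)
    )
    result = conn.execute(
        "SELECT page, COUNT(*) AS hits, AVG(latency_ms) AS avg_latency "
        "FROM page_views GROUP BY page ORDER BY page"
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    # prints ('/home', 2, 100.0) and then ('/search', 2, 270.0)
    for row in page_latency_report(RAW_LINES):
        print(row)
```

The malformed line is skipped here for simplicity; a real engine may handle rows that do not match the schema differently.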
2. Apache Spark SQL:
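Spark SQL brings native SQL support to Spark and simplifies querying data stored both in RDDs (Spark's Resilient Distributed Datasets) and in external sources. Spark SQL blurs the line between RDDs and relational tables. Unifying these powerful abstractions lets developers mix SQL commands that query external data with complex analytics in a single application.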
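The idea of mixing SQL queries with ordinary program logic over the same data in one application can also be sketched with the standard library. This is an analogy under stated assumptions, not Spark: the `orders` records and the `register_view`/`region_report` helpers are hypothetical, and `sqlite3` stands in for Spark SQL. In Spark itself, a DataFrame is typically exposed to SQL with `DataFrame.createOrReplaceTempView` and queried with `SparkSession.sql`.

```python
import sqlite3
import statistics

# Hypothetical in-memory records standing in for a distributed dataset.
orders = [
    {"customer": "alice", "region": "EU", "amount": 120.0},
    {"customer": "bob", "region": "US", "amount": 80.0},
    {"customer": "carol", "region": "EU", "amount": 200.0},
    {"customer": "dave", "region": "US", "amount": 40.0},
    {"customer": "erin", "region": "EU", "amount": 100.0},
]


def register_view(conn, name, records):
    """Expose a list of dicts as a SQL table (loosely analogous to a temp view).

    `name` is interpolated into SQL, so it must be a trusted identifier.
    """
    columns = list(records[0])
    conn.execute(f"CREATE TABLE {name} ({', '.join(columns)})")
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f"INSERT INTO {name} VALUES ({placeholders})",
        [tuple(record[c] for c in columns) for record in records],
    )


def region_report(records):
    """Combine a declarative SQL step with procedural Python analytics."""
    conn = sqlite3.connect(":memory:")
    register_view(conn, "orders", records)
    # Declarative step: SQL does the grouping and aggregation.
    per_region = conn.execute(
        "SELECT region, SUM(amount) AS total FROM orders GROUP BY region ORDER BY region"
    ).fetchall()
    # Procedural step: ordinary Python analytics on query results in the same program.
    eu_amounts = [
        amount for (amount,) in conn.execute("SELECT amount FROM orders WHERE region = 'EU'")
    ]
    conn.close()
    return per_region, statistics.median(eu_amounts)


if __name__ == "__main__":
    totals, eu_median = region_report(orders)
    print(totals)     # prints [('EU', 420.0), ('US', 120.0)]
    print(eu_median)  # prints 120.0
```

Keeping both steps in one function mirrors the point made above: the SQL result does not need to leave the program before further analysis is applied to it.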
Differences between Apache Hive and Apache Spark SQL:
S.No. | Apache Hive | Apache Spark SQL |
---|---|---|
1. | It is an open-source data warehouse system built on top of Apache Hadoop. | It is Spark's module for structured data processing, which processes data using SQL. |
2. | It manages large datasets stored in Hadoop files for analysis and querying. | It runs compute-heavy workloads and applies query optimization techniques to process a task. |
3. | It was first released in 2012. | It was introduced in 2014. |
4. | It is implemented mainly in Java. | It is implemented in Scala and can be used from several languages, such as R, Python and Scala. |
5. | Its latest version at the time of writing (2.3.2) was released in 2017. | Its latest version at the time of writing (2.3.0) was released in 2018. |
6. | It mainly uses a relational (RDBMS) database model. | It can be integrated with NoSQL databases. |
7. | It runs on any operating system that provides a JVM. | It supports several operating systems, such as Linux and Windows. |
8. | Its access methods include JDBC, ODBC and Thrift. | Its access methods include JDBC and ODBC. |