1. Apache Hive:
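Apache Hive is a data warehouse system built on top of Apache Hadoop. It makes it easy to summarize data, run ad hoc queries, and analyze large datasets stored in the various databases and file systems that integrate with Hadoop, including the MapR Data Platform with MapR XD and MapR Database. Hive offers a simple way to apply structure to large amounts of unstructured data and then run batch, SQL-like queries on that data.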
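The idea of applying structure to raw records and then querying them with SQL can be sketched in a few lines. The example below uses Python's built-in `sqlite3` module as a stand-in; it is not Hive and does not touch Hadoop. The sample log lines, the `page_views` table, and the helper names `load_page_views` and `seconds_per_page` are all assumptions made up for this illustration.

```python
import sqlite3

# Hypothetical raw, tab-delimited log lines: visitor, page, seconds spent.
RAW_LINES = [
    "alice\t/home\t12",
    "bob\t/home\t7",
    "alice\t/docs\t30",
    "carol\t/docs\t5",
]


def load_page_views(conn, lines):
    """Parse raw text lines into typed rows and load them into a table."""
    conn.execute(
        "CREATE TABLE page_views (visitor TEXT, page TEXT, seconds INTEGER)"
    )
    rows = []
    for line in lines:
        visitor, page, seconds = line.split("\t")
        rows.append((visitor, page, int(seconds)))
    conn.executemany("INSERT INTO page_views VALUES (?, ?, ?)", rows)


def seconds_per_page(conn):
    """Run a SQL aggregate over the now-structured rows."""
    cursor = conn.execute(
        "SELECT page, SUM(seconds) FROM page_views GROUP BY page ORDER BY page"
    )
    return cursor.fetchall()


# sqlite3 in-memory database used only as a stand-in for a SQL engine.
conn = sqlite3.connect(":memory:")
load_page_views(conn, RAW_LINES)
print(seconds_per_page(conn))
```

In Hive, the same pattern would typically be expressed as a table definition over files already stored in Hadoop, followed by a HiveQL query, rather than by inserting rows from a client program.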
2. Apache Spark SQL:
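Spark SQL brings native SQL support to Spark and simplifies querying data stored both in RDDs (Spark's distributed datasets) and in external sources. Spark SQL conveniently blurs the line between RDDs and relational tables. Unifying these powerful abstractions makes it easy for developers to mix SQL commands that query external data with complex analytics, all within a single application.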
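A minimal PySpark sketch of mixing a SQL query with ordinary program logic in one application might look like the following. It assumes PySpark is installed and runs Spark in local mode; the `orders` data, the `spend_tier` helper, the threshold of 100, and the `run_demo` function are hypothetical names and values chosen for illustration only.

```python
def spend_tier(total):
    """Plain Python logic applied to each row returned by the SQL query."""
    return "high" if total >= 100 else "low"


def run_demo():
    # Imported here so spend_tier can be used without PySpark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.master("local[*]")
        .appName("sql-mix-sketch")
        .getOrCreate()
    )

    # Hypothetical sample data registered as a temporary SQL view.
    orders = spark.createDataFrame(
        [("alice", 80.0), ("bob", 25.0), ("alice", 40.0)],
        ["customer", "amount"],
    )
    orders.createOrReplaceTempView("orders")

    # Relational step: aggregate with SQL.
    totals = spark.sql(
        "SELECT customer, SUM(amount) AS total FROM orders GROUP BY customer"
    )

    # Programmatic step: drop to the underlying RDD and apply Python code.
    tiers = totals.rdd.map(
        lambda row: (row["customer"], spend_tier(row["total"]))
    ).collect()

    spark.stop()
    return sorted(tiers)
```

Calling `run_demo()` on a machine with PySpark available runs both steps in the same process: the SQL query produces a DataFrame, and its rows are then handled as an RDD by regular Python functions.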
Difference between Apache Hive and Apache Spark SQL:
S.No. | Apache Hive | Apache Spark SQL |
---|---|---|
1. | It is an open-source data warehouse system built on top of Apache Hadoop. | It is a structured data processing system that processes information using SQL. |
2. | It handles large datasets stored in Hadoop files for analysis and querying. | It runs compute-heavy operations and applies optimization techniques when processing a task. |
3. | It was released in 2012. | It first appeared in 2014. |
4. | It is implemented mainly in Java. | It can be used from several languages, such as R, Python, and Scala. |
5. | At the time of writing, its latest version (2.3.2) was released in 2017. | At the time of writing, its latest version (2.3.0) was released in 2018. |
6. | It mainly uses RDBMS as its database model. | It can be integrated with various NoSQL databases. |
7. | It supports any OS, provided a JVM environment is available. | It supports various OSs, such as Linux and Windows. |
8. | Access methods for its processing include JDBC, ODBC, and Thrift. | It can be accessed only through ODBC and JDBC. |