Difference Between Apache Hive and Apache Spark SQL

Last modified: 2021-08-25 18:04:02    Author: Mango

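1. Apache Hive:
Apache Hive is a data warehouse system built on top of Apache Hadoop. It supports data summarization, ad hoc queries, and the analysis of large datasets stored in the various databases and file systems that integrate with Hadoop, including the MapR Data Platform with MapR XD and MapR Database. Hive provides a simple way to apply structure to large amounts of unstructured data and then run batch, SQL-like queries against that data.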
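The idea of applying structure to raw data and then running a batch SQL-like query can be sketched in a few lines. This is only an illustration under stated assumptions: Python's built-in `sqlite3` module stands in for Hive, and the log lines, the `SCHEMA` list, the `access_log` table, and the helper names are all hypothetical. A real Hive deployment would instead declare a table over files in Hadoop and query it with HiveQL.

```python
import sqlite3

# Hypothetical raw log lines: the text itself carries no schema.
RAW_LINES = [
    "2021-08-01|alice|GET|200|512",
    "2021-08-01|bob|GET|404|0",
    "2021-08-02|alice|POST|200|2048",
    "2021-08-02|carol|GET|200|1024",
]

# The structure applied at read time: column name plus a type converter.
SCHEMA = [("day", str), ("username", str), ("method", str),
          ("status", int), ("bytes", int)]


def parse(line):
    """Apply SCHEMA to one '|'-delimited line and return a typed tuple."""
    fields = line.split("|")
    return tuple(convert(value) for (_, convert), value in zip(SCHEMA, fields))


def bytes_per_day(lines):
    """Load the parsed rows into a table and run one batch aggregation."""
    conn = sqlite3.connect(":memory:")  # sqlite3 is a stand-in, not Hive
    conn.execute(
        "CREATE TABLE access_log "
        "(day TEXT, username TEXT, method TEXT, status INTEGER, bytes INTEGER)"
    )
    conn.executemany(
        "INSERT INTO access_log VALUES (?, ?, ?, ?, ?)",
        [parse(line) for line in lines],
    )
    rows = conn.execute(
        "SELECT day, SUM(bytes) FROM access_log "
        "WHERE status = 200 GROUP BY day ORDER BY day"
    ).fetchall()
    conn.close()
    return rows


print(bytes_per_day(RAW_LINES))
```

The parsing step is kept separate from the query on purpose: the raw text stays untouched, and the structure is imposed only when the data is read, which is the pattern the paragraph above describes.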

2. Apache Spark SQL:
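Spark SQL brings native SQL support to Spark and streamlines querying data stored both in RDDs (Spark's distributed datasets) and in external sources. Spark SQL blurs the line between RDDs and relational tables. Unifying these abstractions lets developers mix SQL queries over external data with complex analytics, all within a single application.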
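The pattern of treating an in-program dataset as a relational table, querying it with SQL, and continuing with ordinary analytics code in the same application can be sketched as follows. This is a rough illustration only and does not use Spark: Python's built-in `sqlite3` module stands in for the SQL engine, a plain list stands in for an RDD, and the `ORDERS` data and both function names are hypothetical.

```python
import sqlite3
import statistics

# Hypothetical in-program records (a plain list standing in for an RDD).
ORDERS = [
    ("alice", "books", 12.0),
    ("bob", "books", 30.0),
    ("alice", "games", 50.0),
    ("carol", "games", 20.0),
    ("bob", "music", 8.0),
]


def totals_by_customer(orders):
    """Expose the in-memory records as a table and aggregate them with SQL."""
    conn = sqlite3.connect(":memory:")  # sqlite3 is a stand-in, not Spark SQL
    conn.execute(
        "CREATE TABLE orders (customer TEXT, category TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders)
    rows = conn.execute(
        "SELECT customer, SUM(amount) FROM orders "
        "GROUP BY customer ORDER BY customer"
    ).fetchall()
    conn.close()
    return dict(rows)


def above_average(totals):
    """Analytics step in plain Python, applied directly to the SQL result."""
    mean = statistics.mean(list(totals.values()))
    return sorted(name for name, total in totals.items() if total > mean)


totals = totals_by_customer(ORDERS)
print(totals)
print(above_average(totals))
```

The SQL step and the analytics step live in one program and pass data between them directly, which is the kind of mixing the paragraph above attributes to Spark SQL.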

Differences between Apache Hive and Apache Spark SQL:

| S.No. | Apache Hive | Apache Spark SQL |
|---|---|---|
| 1. | It is an open-source data warehouse system built on top of Apache Hadoop. | It is a structured data processing system that processes data using SQL. |
| 2. | It holds large datasets, stored in Hadoop files, for analysis and querying. | It performs compute-heavy operations and applies optimization techniques when processing a task. |
| 3. | It was released in 2012. | It first appeared in 2014. |
| 4. | It is implemented mainly in Java. | It can be used from various languages such as R, Python and Scala. |
| 5. | Its latest version listed here (2.3.2) was released in 2017. | Its latest version listed here (2.3.0) was released in 2018. |
| 6. | Its database model is mainly RDBMS. | It can be integrated with any NoSQL database. |
| 7. | It runs on any OS that provides a JVM environment. | It supports various operating systems such as Linux and Windows. |
| 8. | Its access methods include JDBC, ODBC and Thrift. | It can be accessed only through ODBC and JDBC. |