📜  Apache Hive – Features and Limitations

📅  Last modified: 2021-10-27 06:50:08             🧑  Author: Mango

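Apache Hive is a data warehouse tool built on top of Hadoop and used to extract meaningful information from data. Data warehousing means storing data from many different sources in one place. Data comes in three main forms: structured (SQL databases), semi-structured (XML or JSON), and unstructured (music or video). Hive is used on top of Hadoop to process structured data that is available in tabular form. It scales to batch queries over petabytes of data.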

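MapReduce is the default programming model on Hadoop, and MapReduce jobs are written in Java or another programming language. Hive was designed mainly for developers who are familiar with SQL. With Hive, people who are not comfortable with Java can also process data on Hadoop. Querying structured data is also easier with Hive, because writing the equivalent Java code is harder than writing a query. HQL, or HiveQL, is the query language used to work with Hive. Its syntax is very similar to SQL, which makes Hive easy to pick up.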
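The gap between hand-written MapReduce-style code and a single declarative query can be sketched in a few lines. This is only an illustration under stated assumptions: SQLite (from the Python standard library) stands in for Hive, the `employees` table and its rows are hypothetical, and the procedural function imitates the map, shuffle, and reduce phases in one process rather than running on Hadoop.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: (department, salary) rows.
ROWS = [("sales", 100), ("sales", 150), ("ops", 80), ("ops", 120), ("hr", 90)]


def total_by_department_procedural(rows):
    """Hand-written map, shuffle, and reduce steps, in the style of a MapReduce job."""
    # Map: emit one (key, value) pair per input row.
    mapped = [(dept, salary) for dept, salary in rows]
    # Shuffle: group the values by key.
    grouped = defaultdict(list)
    for dept, salary in mapped:
        grouped[dept].append(salary)
    # Reduce: sum the values collected for each key.
    return {dept: sum(salaries) for dept, salaries in grouped.items()}


def total_by_department_declarative(rows):
    """The same aggregation as one SQL statement (SQLite stands in for Hive here)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (department TEXT, salary INTEGER)")
    conn.executemany("INSERT INTO employees VALUES (?, ?)", rows)
    query = "SELECT department, SUM(salary) FROM employees GROUP BY department"
    result = dict(conn.execute(query).fetchall())
    conn.close()
    return result
```

Both functions return the same totals per department. The declarative version states only what result is wanted, while the procedural version spells out every step, which is the difference the paragraph above describes.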

Apache Hive Features

| Feature | Explanation |
| --- | --- |
| Supported computing engines | Hive supports the MapReduce, Tez, and Spark computing engines. |
| Framework | Hive is a stable batch-processing framework built on top of the Hadoop Distributed File System (HDFS) and can work as a data warehouse. |
| Easy to code | Hive uses the Hive Query Language (HQL) to query structured data, which is easy to code. A query that takes about 100 lines of Java code can be reduced to about 4 lines of HQL. |
| Declarative | HQL is a declarative language like SQL, which means it is non-procedural. |
| Structure of tables | Tables are structured much like tables in an RDBMS. Hive also supports partitioning and bucketing. |
| Supported data structures | Tables, partitions, and buckets are the 3 data structures that Hive supports. |
| Supports ETL | Apache Hive supports ETL (Extract, Transform, Load). Before Hive, Python was commonly used for ETL. |
| Storage | Hive lets users access files stored in HDFS, Apache HBase, Amazon S3, etc. |
| Capable | Hive can process very large datasets, up to petabytes in size. |
| Helps in processing unstructured data | Custom MapReduce code can be embedded in Hive to process unstructured data. |
| Drivers | JDBC and ODBC drivers are available for Hive. |
| Fault tolerance | Hive data is stored on HDFS, so fault tolerance is provided by Hadoop. |
| Areas of use | Hive can be used for data mining, predictive modeling, and document indexing. |
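Partitioning and bucketing, mentioned among the features above, can be pictured with a small sketch. Hive stores each partition in a directory named `column=value` and divides the rows of a bucketed table into a fixed number of buckets by hashing the bucketing column. The code below is a simplified model, not Hive's implementation: it assumes a non-negative integer key whose hash is the integer itself, whereas the real hash function depends on the column type and the Hive version. The function names and the sample row are hypothetical.

```python
def partition_dir(column, value):
    """Directory name for one partition value, in the column=value layout."""
    return f"{column}={value}"


def bucket_index(key, num_buckets):
    """Simplified bucket assignment for a non-negative integer key.

    Assumption: the hash of the key is the integer itself. Real Hive hashing
    depends on the column type and on the Hive version.
    """
    return (key & 0x7FFFFFFF) % num_buckets


def place_row(row, partition_column, bucket_column, num_buckets):
    """Return (partition directory, bucket number) for one row given as a dict."""
    directory = partition_dir(partition_column, row[partition_column])
    bucket = bucket_index(row[bucket_column], num_buckets)
    return directory, bucket


# Hypothetical row: partitioned by date, bucketed by user_id into 4 buckets.
placement = place_row({"dt": "2021-10-27", "user_id": 10}, "dt", "user_id", 4)
```

Here `placement` is `("dt=2021-10-27", 2)`, because 10 modulo 4 is 2. A query that filters on `dt` only needs to read the matching directory, and rows with the same `user_id` always land in the same bucket.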

Apache Hive Limitations

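| Limitation | Explanation |
| --- | --- |
| Does not support OLTP | Apache Hive does not support online transaction processing (OLTP). It is designed for online analytical processing (OLAP). |
| Limited update and delete | Hive does not support update and delete operations on ordinary tables. Since Hive 0.14, UPDATE and DELETE are available only on tables configured as ACID transactional tables. |
| Limited subquery support | Subqueries are supported only in certain positions, such as the FROM clause and, since Hive 0.13, IN and EXISTS predicates in the WHERE clause. |
| Latency | Hive queries have high latency, because each query is executed as a batch job. |
| Only non-real-time (cold) data is supported | Hive is not used for real-time data querying, since it takes a while to produce a result. |
| Transaction processing is limited | HQL is not designed for transaction processing. ACID support is restricted to transactional tables. |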