📅  Last modified: 2020-11-30 04:27:36             🧑  Author: Mango
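The HCatLoader and HCatStorer APIs are used with Pig scripts to read and write data in HCatalog-managed tables. These interfaces require no HCatalog-specific setup.
Some familiarity with Apache Pig scripts will make this chapter easier to follow. For further reference, please go through our Apache Pig tutorial.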
A = LOAD 'tablename' USING org.apache.hcatalog.pig.HCatLoader();
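You must specify the table name in single quotes: LOAD 'tablename'. If you are using a non-default database, you must specify the input as 'dbname.tablename'.
The Hive metastore lets you create tables without specifying a database. If you created a table this way, the database name is 'default' and it is not required when specifying the table for HCatLoader.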
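As a minimal sketch of the two table-name forms described above, the shell snippet below writes a small Pig script that loads one table from the default database and one from a non-default database. The database name mydb, the table name student, and the file name load_example.pig are hypothetical. The snippet assumes the org.apache.hcatalog.pig package name; newer Hive releases ship the same class under org.apache.hive.hcatalog.pig. The script is only run if a pig executable is found on the PATH.

```shell
#!/bin/sh
# Write a small Pig script (hypothetical file name) showing both table-name forms.
cat > load_example.pig <<'EOF'
-- Table in the default database: the database name can be omitted.
A = LOAD 'student' USING org.apache.hcatalog.pig.HCatLoader();
-- Table in a non-default database (hypothetical name "mydb"): use 'dbname.tablename'.
B = LOAD 'mydb.student' USING org.apache.hcatalog.pig.HCatLoader();
DESCRIBE A;
DESCRIBE B;
EOF

# Run the script only when Pig is installed; -useHCatalog adds the HCatalog jars.
if command -v pig >/dev/null 2>&1; then
  pig -useHCatalog load_example.pig
else
  echo "pig not found; wrote load_example.pig without running it"
fi
```

Because HCatLoader takes the schema from the metastore, no AS clause is needed in either LOAD statement.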
Sr.No. | Method Name & Description |
1 | public InputFormat<?,?> getInputFormat() throws IOException: Returns the input format that the HCatLoader class uses to read the data being loaded. |
2 | public String relativeToAbsolutePath(String location, Path curDir) throws IOException: Returns the absolute path as a String. |
3 | public void setLocation(String location, Job job) throws IOException: Tells the loader the location of the data to be loaded by the job. |
4 | public Tuple getNext() throws IOException: Returns the next tuple to be processed. |
A = LOAD ...
my_processed_data = ...
STORE my_processed_data INTO 'tablename' USING org.apache.hcatalog.pig.HCatStorer();
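You must specify the table name in single quotes: INTO 'tablename'. Both the database and the table must be created before you run your Pig script. If you are using a non-default database, you must specify the output as 'dbname.tablename'.
The Hive metastore lets you create tables without specifying a database. If you created a table this way, the database name is 'default' and you do not need to specify the database name in the STORE statement.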
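The read-then-write pattern described above can be sketched as follows. The shell snippet writes a Pig script that loads one HCatalog table, filters it, and stores the result in a second table. All names (mydb, student, student_adults, the age column, and store_example.pig) are hypothetical, the target table is assumed to exist already with a matching schema, and the org.apache.hcatalog.pig package name is an assumption that depends on the installed release.

```shell
#!/bin/sh
# Write a Pig script (hypothetical file name) that reads one table and writes another.
cat > store_example.pig <<'EOF'
-- Read from an HCatalog table in a non-default database (hypothetical names).
raw = LOAD 'mydb.student' USING org.apache.hcatalog.pig.HCatLoader();
-- Keep only some rows; assumes the table has an integer column named age.
adults = FILTER raw BY age >= 21;
-- Write to a table that must already exist before the script runs.
STORE adults INTO 'mydb.student_adults' USING org.apache.hcatalog.pig.HCatStorer();
EOF

# Run the script only when Pig is installed.
if command -v pig >/dev/null 2>&1; then
  pig -useHCatalog store_example.pig
else
  echo "pig not found; wrote store_example.pig without running it"
fi
```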
Sr.No. | Method Name & Description |
1 | public OutputFormat getOutputFormat() throws IOException: Returns the output format that the HCatStorer class uses to write the stored data. |
2 | public void setStoreLocation(String location, Job job) throws IOException: Sets the location where the job stores its output. |
3 | public void storeSchema(ResourceSchema schema, String arg1, Job job) throws IOException: Stores the schema. |
4 | public void prepareToWrite(RecordWriter writer) throws IOException: Prepares the storer to write data using the given RecordWriter. |
5 | public void putNext(Tuple tuple) throws IOException: Writes the tuple data to the output. |
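Pig does not automatically pick up the HCatalog jars. To bring in the necessary jars, you can either use a flag in the Pig command or set the environment variables PIG_CLASSPATH and PIG_OPTS as described below.
pig -useHCatalog
Use the following CLASSPATH settings to make HCatalog available to Apache Pig.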
export HADOOP_HOME=<path_to_hadoop_install>
export HIVE_HOME=<path_to_hive_install>
export HCAT_HOME=<path_to_hcat_install>
export PIG_CLASSPATH=$HCAT_HOME/share/hcatalog/hcatalog-core*.jar:\
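Before running Pig with these settings, it can help to confirm that the variables are actually set. The snippet below is a small sanity check, not part of the HCatalog setup itself; it assumes the lowercase share/hcatalog directory layout and only reports what it finds.

```shell
#!/bin/sh
# Report whether each variable used by the classpath settings is set.
missing=0
for name in HADOOP_HOME HIVE_HOME HCAT_HOME; do
  eval "value=\${$name:-}"
  if [ -n "$value" ]; then
    echo "$name=$value"
  else
    echo "$name is not set"
    missing=$((missing + 1))
  fi
done

# Look for the HCatalog core jar that PIG_CLASSPATH refers to (assumed layout).
if [ -n "${HCAT_HOME:-}" ] && ls "$HCAT_HOME"/share/hcatalog/hcatalog-core*.jar >/dev/null 2>&1; then
  echo "found hcatalog-core jar"
else
  echo "hcatalog-core jar not found under \$HCAT_HOME/share/hcatalog"
fi
echo "unset variables: $missing"
```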
Assume we have a file named student_details.txt in HDFS with the following content.
001, Rajiv, Reddy, 21, 9848022337, Hyderabad
002, siddarth, Battacharya, 22, 9848022338, Kolkata
003, Rajesh, Khanna, 22, 9848022339, Delhi
004, Preethi, Agarwal, 21, 9848022330, Pune
005, Trupthi, Mohanthy, 23, 9848022336, Bhuwaneshwar
006, Archana, Mishra, 23, 9848022335, Chennai
007, Komal, Nayak, 24, 9848022334, trivendram
008, Bharathi, Nambiayar, 24, 9848022333, Chennai
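One way to get this file into HDFS is sketched below. The snippet creates student_details.txt locally with the eight records listed above and copies it to the /pig_data directory on the hdfs://localhost:9000 name node used in this chapter. The upload step is skipped when no hdfs client is found on the PATH.

```shell
#!/bin/sh
# Create the sample file locally with the eight records from this chapter.
cat > student_details.txt <<'EOF'
001, Rajiv, Reddy, 21, 9848022337, Hyderabad
002, siddarth, Battacharya, 22, 9848022338, Kolkata
003, Rajesh, Khanna, 22, 9848022339, Delhi
004, Preethi, Agarwal, 21, 9848022330, Pune
005, Trupthi, Mohanthy, 23, 9848022336, Bhuwaneshwar
006, Archana, Mishra, 23, 9848022335, Chennai
007, Komal, Nayak, 24, 9848022334, trivendram
008, Bharathi, Nambiayar, 24, 9848022333, Chennai
EOF

# Copy the file into HDFS when the hdfs client is available.
if command -v hdfs >/dev/null 2>&1; then
  hdfs dfs -mkdir -p hdfs://localhost:9000/pig_data
  hdfs dfs -put -f student_details.txt hdfs://localhost:9000/pig_data/
else
  echo "hdfs not found; created student_details.txt locally only"
fi
```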
-- Load the comma-separated file; the fourth field is the age used for ordering below.
student = LOAD 'hdfs://localhost:9000/pig_data/student_details.txt' USING
   PigStorage(',') AS (id:int, firstname:chararray, lastname:chararray, age:int,
   phone:chararray, city:chararray);
student_order = ORDER student BY age DESC;
STORE student_order INTO 'student_order_table' USING org.apache.hcatalog.pig.HCatStorer();
student_limit = LIMIT student_order 4;
DUMP student_limit;
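The first statement of the script loads the data in student_details.txt as a relation named student.
The second statement arranges the tuples of the relation in descending order of age and stores the result as student_order.
The third statement stores student_order in the HCatalog table student_order_table.
The fourth statement stores the first four tuples of student_order as student_limit.
The fifth statement dumps the content of student_limit.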
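To see which rows the ordering and limiting steps select, the same logic can be approximated locally with standard shell tools. This is only an illustration of ORDER ... DESC followed by LIMIT 4, not how Pig executes the script. The file names id_age.csv and top4.csv are hypothetical, and the input keeps just the id and age fields from the sample data.

```shell
#!/bin/sh
# id,age pairs taken from the sample file in this chapter.
printf '%s\n' 001,21 002,22 003,22 004,21 005,23 006,23 007,24 008,24 > id_age.csv

# Sort numerically on the second field (age) in descending order, then keep four rows.
sort -t, -k2,2nr id_age.csv | head -n 4 > top4.csv
cat top4.csv
```

The four selected rows are the two students aged 24 and the two aged 23 (ids 005 to 008). Pig's ORDER BY is not stable, so the relative order of students with the same age may differ between runs.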
Now let us execute sample_script.pig as shown below.
$ ./pig -useHCatalog hdfs://localhost:9000/pig_data/sample_script.pig
Now, check the output directory (hdfs: user/tmp/hive) for the output (part_0000, part_0001).