📅 Last modified: 2020-12-02 05:31:49    🧑 Author: Mango
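Pig Latin is the language used to analyze data in Hadoop with Apache Pig. This chapter covers the basics of Pig Latin: statements, data types, general and relational operators, and Pig Latin UDFs.
As discussed in the previous chapters, Pig's data model is fully nested. A relation is the outermost structure of the Pig Latin data model. It is a bag: a collection of tuples, each of which is an ordered set of fields.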
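When processing data with Pig Latin, statements are the basic constructs.
These statements work with relations. They include expressions and schemas.
Every statement ends with a semicolon (;).
Through statements, we perform various operations using the operators that Pig Latin provides.
Except for LOAD and STORE, every Pig Latin statement takes a relation as input and produces another relation as output.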
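As soon as you enter a LOAD statement in the Grunt shell, its semantic checking is carried out, but no data is read yet. To see the contents of the relation, you need to use the DUMP operator. Only after a DUMP (or STORE) operation is the MapReduce job that actually loads the data from the file system executed.
Given below is a Pig Latin statement that loads data into Apache Pig.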
grunt> Student_data = LOAD 'student_data.txt' USING PigStorage(',') AS
   (id:int, firstname:chararray, lastname:chararray, phone:chararray, city:chararray);
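The deferred execution described above can be seen by following the LOAD with diagnostic operators. This is a minimal sketch that reuses the statement above and assumes `student_data.txt` is present in the current working directory.

```pig
-- Define the relation; only parsing and semantic checks happen here.
Student_data = LOAD 'student_data.txt' USING PigStorage(',')
    AS (id:int, firstname:chararray, lastname:chararray, phone:chararray, city:chararray);

-- Print the schema of the relation. No job is launched for this.
DESCRIBE Student_data;

-- Print the tuples of the relation. This is the step that triggers execution.
DUMP Student_data;
```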
The following table describes the Pig Latin data types.
No. | Data Type | Description & Example |
---|---|---|
1 | int | Represents a signed 32-bit integer. Example: 8 |
2 | long | Represents a signed 64-bit integer. Example: 5L |
3 | float | Represents a signed 32-bit floating point number. Example: 5.5F |
4 | double | Represents a 64-bit floating point number. Example: 10.5 |
5 | chararray | Represents a character array (string) in Unicode UTF-8 format. Example: 'tutorials point' |
6 | bytearray | Represents a byte array (blob). |
7 | boolean | Represents a Boolean value. Example: true / false |
8 | datetime | Represents a date-time. Example: 1970-01-01T00:00:00.000+00:00 |
9 | biginteger | Represents a Java BigInteger. Example: 60708090709 |
10 | bigdecimal | Represents a Java BigDecimal. Example: 185.98376256272893883 |
Complex Types | | |
11 | tuple | A tuple is an ordered set of fields. Example: (Raju, 30) |
12 | bag | A bag is a collection of tuples. Example: {(Raju,30),(Mohammad,45)} |
13 | map | A map is a set of key-value pairs. Example: ['name'#'Raju', 'age'#30] |
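To show how these types appear in practice, here is a sketch of a schema that mixes simple and complex types. The file name `student_details.txt` and all field names are hypothetical and chosen only for illustration.

```pig
-- Hypothetical tab-separated input; names are illustrative assumptions.
records = LOAD 'student_details.txt' USING PigStorage('\t') AS (
    name:chararray,
    age:int,
    gpa:double,
    address:tuple(street:chararray, city:chararray),
    courses:bag{course:tuple(title:chararray, credits:int)},
    attributes:map[chararray]
);

-- Show the declared schema, including the nested types.
DESCRIBE records;
```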
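Values of all the above data types can be NULL. Apache Pig treats null values in much the same way as SQL does.
A null can be an unknown value or a non-existent value. It is used as a placeholder for optional values. Nulls can occur naturally or can be the result of an operation.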
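As a brief illustration of working with nulls, the sketch below uses the `IS NULL` and `IS NOT NULL` tests on the student relation loaded earlier. The relation names `with_city` and `no_phone` are made up for this example.

```pig
students = LOAD 'student_data.txt' USING PigStorage(',')
    AS (id:int, firstname:chararray, lastname:chararray, phone:chararray, city:chararray);

-- Keep only the tuples whose city is known.
with_city = FILTER students BY city IS NOT NULL;

-- Find the tuples whose phone value is missing.
no_phone = FILTER students BY phone IS NULL;
```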
The following table describes the arithmetic operators of Pig Latin. Assume a = 10 and b = 20.
Operator | Description | Example |
---|---|---|
+ | Addition: adds the values on either side of the operator. | a + b gives 30 |
- | Subtraction: subtracts the right-hand operand from the left-hand operand. | a - b gives -10 |
* | Multiplication: multiplies the values on either side of the operator. | a * b gives 200 |
/ | Division: divides the left-hand operand by the right-hand operand. | b / a gives 2 |
% | Modulus: divides the left-hand operand by the right-hand operand and returns the remainder. | b % a gives 0 |
? : | Bincond: evaluates a Boolean expression and returns one of two values. It has three operands: x = (expression) ? value1 if true : value2 if false. | b = (a == 1) ? 20 : 30; if a == 1, the value of b is 20; if a != 1, the value of b is 30. |
CASE WHEN THEN ELSE END | Case: the case operator is equivalent to a nested bincond operator. | CASE f2 % 2 WHEN 0 THEN 'even' WHEN 1 THEN 'odd' END |
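The operators in this table are typically used inside a FOREACH ... GENERATE statement. The following sketch assumes a hypothetical file `numbers.txt` with two comma-separated integer columns named `a` and `b`. The CASE expression requires a Pig release that supports it (0.12 or later, as far as I know).

```pig
-- Hypothetical input: one pair of integers per line, e.g. 10,20
nums = LOAD 'numbers.txt' USING PigStorage(',') AS (a:int, b:int);

results = FOREACH nums GENERATE
    a + b AS total,
    a - b AS difference,
    a * b AS product,
    b / a AS quotient,    -- integer division, because both operands are int
    b % a AS remainder,
    ((a == 1) ? 20 : 30) AS picked,
    (CASE b % 2 WHEN 0 THEN 'even' WHEN 1 THEN 'odd' END) AS parity;

DUMP results;
```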
The following table describes the comparison operators of Pig Latin.
Operator | Description | Example |
---|---|---|
== | Equal: checks whether the values of the two operands are equal. If they are, the condition is true. | (a == b) is not true. |
!= | Not equal: checks whether the values of the two operands are equal. If they are not, the condition is true. | (a != b) is true. |
> | Greater than: checks whether the value of the left operand is greater than the value of the right operand. If it is, the condition is true. | (a > b) is not true. |
< | Less than: checks whether the value of the left operand is less than the value of the right operand. If it is, the condition is true. | (a < b) is true. |
>= | Greater than or equal to: checks whether the value of the left operand is greater than or equal to the value of the right operand. If it is, the condition is true. | (a >= b) is not true. |
<= | Less than or equal to: checks whether the value of the left operand is less than or equal to the value of the right operand. If it is, the condition is true. | (a <= b) is true. |
matches | Pattern matching: checks whether the string on the left-hand side matches the regular expression constant on the right-hand side. | f1 matches '.*tutorial.*' |
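Comparison operators are most often used in FILTER conditions. Here is a small sketch against the student relation loaded earlier. The threshold `3` and the pattern `'R.*'` are arbitrary values chosen for illustration.

```pig
students = LOAD 'student_data.txt' USING PigStorage(',')
    AS (id:int, firstname:chararray, lastname:chararray, phone:chararray, city:chararray);

-- Keep tuples whose id is at least 3 and whose first name starts with "R".
-- matches must match the whole string, hence the trailing .*
selected = FILTER students BY (id >= 3) AND (firstname matches 'R.*');

-- Keep every tuple except the one with id 1.
others = FILTER students BY id != 1;
```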
The following table describes the type construction operators of Pig Latin.
Operator | Description | Example |
---|---|---|
() | Tuple constructor operator: used to construct a tuple. | (Raju, 30) |
{} | Bag constructor operator: used to construct a bag. | {(Raju, 30), (Mohammad, 45)} |
[] | Map constructor operator: used to construct a map. | [name#Raju, age#30] |
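Inside a FOREACH ... GENERATE statement, these constructors build complex values from existing fields. The sketch below follows the pattern shown in the Pig documentation, where the map constructor takes a key expression followed by a value expression. The relation name `built` is hypothetical.

```pig
students = LOAD 'student_data.txt' USING PigStorage(',')
    AS (id:int, firstname:chararray, lastname:chararray, phone:chararray, city:chararray);

built = FOREACH students GENERATE
    (firstname, lastname),   -- tuple constructor: one tuple with two fields
    {(phone), (city)},       -- bag constructor: a bag of two single-field tuples
    [firstname, city];       -- map constructor: firstname is the key, city the value

DESCRIBE built;
```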
The following table describes the relational operators of Pig Latin.
Operator | Description |
---|---|
Loading and Storing | |
LOAD | To load data from the file system (local/HDFS) into a relation. |
STORE | To save a relation to the file system (local/HDFS). |
Filtering | |
FILTER | To remove unwanted rows from a relation. |
DISTINCT | To remove duplicate rows from a relation. |
FOREACH, GENERATE | To generate data transformations based on columns of data. |
STREAM | To transform a relation using an external program. |
Grouping and Joining | |
JOIN | To join two or more relations. |
COGROUP | To group the data in two or more relations. |
GROUP | To group the data in a single relation. |
CROSS | To create the cross product of two or more relations. |
Sorting | |
ORDER | To arrange a relation in a sorted order based on one or more fields (ascending or descending). |
LIMIT | To get a limited number of tuples from a relation. |
Combining and Splitting | |
UNION | To combine two or more relations into a single relation. |
SPLIT | To split a single relation into two or more relations. |
Diagnostic Operators | |
DUMP | To print the contents of a relation on the console. |
DESCRIBE | To describe the schema of a relation. |
EXPLAIN | To view the logical, physical, or MapReduce execution plans to compute a relation. |
ILLUSTRATE | To view the step-by-step execution of a series of statements. |
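Several of these operators are usually chained, each statement taking the previous relation as input. The following sketch counts students per city and keeps the three most common cities. The output directory `top_cities` and the intermediate relation names are hypothetical.

```pig
students = LOAD 'student_data.txt' USING PigStorage(',')
    AS (id:int, firstname:chararray, lastname:chararray, phone:chararray, city:chararray);

-- Group the tuples by city, then count the tuples in each group.
by_city = GROUP students BY city;
counts  = FOREACH by_city GENERATE group AS city, COUNT(students) AS total;

-- Sort by count in descending order and keep the first three tuples.
sorted = ORDER counts BY total DESC;
top3   = LIMIT sorted 3;

-- Inspect the schema and the execution plans before running anything.
DESCRIBE top3;
EXPLAIN top3;

-- Write the result to the file system; this triggers execution.
STORE top3 INTO 'top_cities' USING PigStorage(',');
```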