Pig Latin Basics

Last modified: 2020-12-02 05:31:49    Author: Mango


Pig Latin is the language used to analyze data in Hadoop with Apache Pig. This chapter covers the basics of Pig Latin: statements, data types, general and relational operators, and Pig Latin UDFs.

Pig Latin – Data Model

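As discussed in the previous chapters, Pig's data model is fully nested. A relation is the outermost structure of the Pig Latin data model, and it is a bag, where: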

  • A bag is a collection of tuples.
  • A tuple is an ordered set of fields.
  • A field is a piece of data.
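The nesting described above can be pictured with plain Python containers. This is a minimal sketch that assumes a bag is modeled as a Python list (duplicates allowed, order not meaningful) and a tuple as a Python tuple. The variable names and the `depth` helper are hypothetical and serve only to show how many levels wrap the innermost fields.

```python
# A minimal model of Pig's nested data model using Python containers.
# Assumption: bag -> list (duplicates allowed, order not meaningful),
# tuple -> Python tuple, field -> any single value.

# A field is a single piece of data.
field = "Raju"

# A tuple is an ordered set of fields.
student = ("Raju", 30)

# A bag is a collection of tuples; a relation is the outermost bag.
relation = [("Raju", 30), ("Mohammad", 45), ("Raju", 30)]

# Because the model is fully nested, a field can itself hold a bag.
# This mirrors the shape GROUP produces: (group_key, {tuples...}).
grouped = [
    (30, [("Raju", 30), ("Raju", 30)]),
    (45, [("Mohammad", 45)]),
]

def depth(value):
    """Return how many container levels (bag/tuple) wrap the innermost fields."""
    if isinstance(value, (list, tuple)):
        return 1 + max((depth(v) for v in value), default=0)
    return 0
```

Here `depth(relation)` is 2 (a bag of tuples of fields). `depth(grouped)` is 4, because each grouped tuple carries a whole bag as one of its fields.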

Pig Latin – Statements

Statements are the basic constructs when you process data with Pig Latin.

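  • These statements work with relations. They include expressions and schemas.

  • Every statement ends with a semicolon (;).

  • Through statements, we perform various operations using the operators that Pig Latin provides.

  • Except for LOAD and STORE, every Pig Latin statement takes a relation as input and produces another relation as output.

  • As soon as you enter a LOAD statement in the Grunt shell, Pig performs its semantic checking. To see the contents of the relation, use the DUMP operator. The MapReduce job that actually reads the data from the file system runs only when a DUMP (or STORE) is executed.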
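The deferred execution described above can be mimicked in Python. In this model, statements only record a plan, and nothing is read until the plan is dumped. This is a simplified sketch built around a hypothetical `LazyRelation` class. It is not how Pig is implemented: Pig compiles the plan into MapReduce jobs, and this model skips semantic checking entirely.

```python
class LazyRelation:
    """A toy relation that records operations and runs them only on dump()."""

    def __init__(self, source, steps=()):
        self.source = source      # zero-argument callable that yields tuples
        self.steps = list(steps)  # recorded transformations, not yet executed

    def filter(self, predicate):
        # Like a FILTER statement: returns a new relation and reads nothing.
        step = lambda rows: (r for r in rows if predicate(r))
        return LazyRelation(self.source, self.steps + [step])

    def foreach(self, fn):
        # Like FOREACH ... GENERATE: also deferred.
        step = lambda rows: (fn(r) for r in rows)
        return LazyRelation(self.source, self.steps + [step])

    def dump(self):
        # Only here is the source actually read, like DUMP triggering a job.
        rows = self.source()
        for step in self.steps:
            rows = step(rows)
        return list(rows)


reads = []  # records each time the hypothetical input "file" is read

def load_students():
    reads.append("read")
    return iter([(1, "Rajiv", "Hyderabad"), (2, "Siddarth", "Kolkata")])

students = LazyRelation(load_students)                   # like LOAD
in_hyd = students.filter(lambda r: r[2] == "Hyderabad")  # like FILTER
names = in_hyd.foreach(lambda r: (r[1],))                # like FOREACH ... GENERATE
reads_before_dump = list(reads)                          # still empty here
result = names.dump()                                    # like DUMP
```

The source is read exactly once, and only at `dump()`. Every earlier statement just extends the recorded plan.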

The following Pig Latin statement loads data into Apache Pig.

grunt> Student_data = LOAD 'student_data.txt' USING PigStorage(',') AS
   (id:int, firstname:chararray, lastname:chararray, phone:chararray, city:chararray);
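Roughly, `PigStorage(',')` splits each line of the file on commas, and the `AS` schema names the fields and casts each one to its declared type. The Python sketch below approximates that for a single line. The helper `parse_line`, the `SCHEMA` list, and the sample line are hypothetical. The handling of bad or missing values is an assumption about Pig's usual behavior, not a full specification: a failed cast or a missing trailing field becomes null (`None`), and extra fields are ignored.

```python
# Hypothetical Python stand-in for the AS (...) schema in the LOAD statement.
SCHEMA = [
    ("id", int),
    ("firstname", str),
    ("lastname", str),
    ("phone", str),
    ("city", str),
]

def parse_line(line, delimiter=",", schema=SCHEMA):
    """Split one text line and cast each field according to the schema.

    Assumptions (approximating Pig): a value that cannot be cast becomes None,
    missing trailing fields become None, and extra fields are ignored.
    """
    raw = line.rstrip("\n").split(delimiter)
    record = []
    for i, (_name, cast) in enumerate(schema):
        if i >= len(raw):
            record.append(None)  # missing field -> null
            continue
        try:
            record.append(cast(raw[i]))
        except ValueError:
            record.append(None)  # failed cast -> null
    return tuple(record)
```

For example, `parse_line("1,Rajiv,Reddy,9848022337,Hyderabad")` yields `(1, 'Rajiv', 'Reddy', '9848022337', 'Hyderabad')`. Keeping `phone` as a string preserves leading zeros, which matches the `chararray` type in the schema.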

Pig Latin – Data Types

The following table describes the Pig Latin data types.

S.N.  Data Type   Description                                                     Example
Simple Types
1     int         Represents a signed 32-bit integer.                             8
2     long        Represents a signed 64-bit integer.                             5L
3     float       Represents a signed 32-bit floating-point number.               5.5F
4     double      Represents a signed 64-bit floating-point number.               10.5
5     chararray   Represents a character array (string) in Unicode UTF-8 format.  'tutorials point'
6     bytearray   Represents a byte array (blob).
7     boolean     Represents a Boolean value.                                     true / false
8     datetime    Represents a date-time.                                         1970-01-01T00:00:00.000+00:00
9     biginteger  Represents a Java BigInteger.                                   60708090709
10    bigdecimal  Represents a Java BigDecimal.                                   185.98376256272893883
Complex Types
11    tuple       A tuple is an ordered set of fields.                            (raja, 30)
12    bag         A bag is a collection of tuples.                                {(raju,30),(Mohammad,45)}
13    map         A map is a set of key-value pairs.                              ['name'#'Raju', 'age'#30]
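The examples in the table also show how Pig Latin tells numeric literals apart. An integer with no suffix is an `int`, and an `L` suffix makes it a `long`. A decimal number is a `double` by default, and an `F` suffix makes it a `float`. The hypothetical helper below classifies a literal string by those rules. It is a sketch covering only a subset (no exponents, no complex types), not Pig's parser. The range constants spell out what "signed 32-bit" and "signed 64-bit" mean.

```python
import re

INT_RANGE = (-2**31, 2**31 - 1)    # signed 32-bit, as for Pig's int
LONG_RANGE = (-2**63, 2**63 - 1)   # signed 64-bit, as for Pig's long

def literal_type(text):
    """Classify a simple Pig Latin literal (subset: no exponents or complex types)."""
    t = text.strip()
    if re.fullmatch(r"'[^']*'", t):
        return "chararray"               # single-quoted string
    if t.lower() in ("true", "false"):
        return "boolean"
    if re.fullmatch(r"-?\d+[lL]", t):
        return "long"                    # integer with L suffix
    if re.fullmatch(r"-?\d+", t):
        return "int"                     # plain integer
    if re.fullmatch(r"-?\d+\.\d+[fF]", t):
        return "float"                   # decimal with F suffix
    if re.fullmatch(r"-?\d+\.\d+", t):
        return "double"                  # plain decimal
    raise ValueError(f"unsupported literal: {text!r}")

def fits(value, value_range):
    """Check whether an integer value fits in the given (low, high) range."""
    low, high = value_range
    return low <= value <= high
```

Applied to the table's examples, `literal_type` returns `int` for `8`, `long` for `5L`, `float` for `5.5F`, and `double` for `10.5`.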

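Null Values

Values of all the data types above can be NULL. Apache Pig treats null values in a similar way to SQL.

A null can be an unknown value or a non-existent value. It is used as a placeholder for optional values. Nulls can occur naturally in the data or can be the result of an operation.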
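As in SQL, a null generally propagates through expressions, and a FILTER condition that evaluates to null does not keep the row. The sketch below models this with Python's `None`. The helper names and the sample rows are hypothetical. The rule that comparing with null yields null, not false, follows SQL semantics; this is why Pig Latin provides `IS NULL` and `IS NOT NULL` for testing nulls.

```python
def add(a, b):
    """Pig-style +: the result is null if either operand is null."""
    if a is None or b is None:
        return None
    return a + b

def eq(a, b):
    """Pig-style ==: comparing with null yields null, not True or False."""
    if a is None or b is None:
        return None
    return a == b

def gt(a, b):
    """Pig-style >: null if either operand is null."""
    if a is None or b is None:
        return None
    return a > b

def pig_filter(rows, condition):
    """Keep a row only when the condition is exactly True (null drops the row)."""
    return [r for r in rows if condition(r) is True]

rows = [("Raju", 30), ("Mohammad", None), ("Rajesh", 45)]

# Roughly: older = FILTER rows BY age > 35;
older = pig_filter(rows, lambda r: gt(r[1], 35))

# Roughly: unknown_age = FILTER rows BY age IS NULL;
unknown_age = pig_filter(rows, lambda r: r[1] is None)
```

The row with a null age is dropped by `age > 35` because the condition is null. It is found only by an explicit null test.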

Pig Latin – Arithmetic Operators

The following table describes the arithmetic operators of Pig Latin. Suppose a = 10 and b = 20.

Operator        Description and example

+               Addition: adds the values on either side of the operator.
                Example: a + b gives 30

-               Subtraction: subtracts the right-hand operand from the left-hand operand.
                Example: a - b gives -10

*               Multiplication: multiplies the values on either side of the operator.
                Example: a * b gives 200

/               Division: divides the left-hand operand by the right-hand operand.
                Example: b / a gives 2

%               Modulus: divides the left-hand operand by the right-hand operand and returns the remainder.
                Example: b % a gives 0

? :             Bincond: evaluates a Boolean condition and returns one of two values. It has three operands:
                x = (condition) ? value_if_true : value_if_false
                Example: b = (a == 1) ? 20 : 30;
                If a == 1, the value of b is 20; if a != 1, the value of b is 30.

CASE WHEN THEN  Case: the CASE operator is equivalent to nested bincond operators.
ELSE END        Example:
                CASE f2 % 2
                    WHEN 0 THEN 'even'
                    WHEN 1 THEN 'odd'
                END
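The results in the table can be checked with a small Python model. This sketch assumes Pig follows Java semantics for integer arithmetic: division truncates toward zero, and the remainder takes the sign of the dividend. Python's own `//` and `%` round differently for negative numbers, so the hypothetical helpers `java_div` and `java_mod` emulate the Java behavior. The sketch also writes the CASE example as nested bincond. It assumes that when no WHEN matches and there is no ELSE, the result is null.

```python
def java_div(a, b):
    """Integer division truncating toward zero (Java-style int semantics)."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q

def java_mod(a, b):
    """Remainder whose sign follows the dividend (Java-style int semantics)."""
    return a - b * java_div(a, b)

def bincond(condition, if_true, if_false):
    """Model of Pig's (condition ? if_true : if_false)."""
    return if_true if condition else if_false

def parity(f2):
    """CASE f2 % 2 WHEN 0 THEN 'even' WHEN 1 THEN 'odd' END, as nested bincond.

    Assumption: with no ELSE branch, an unmatched value yields null (None).
    """
    r = java_mod(f2, 2)
    return bincond(r == 0, "even", bincond(r == 1, "odd", None))

a, b = 10, 20
results = {
    "a + b": a + b,
    "a - b": a - b,
    "a * b": a * b,
    "b / a": java_div(b, a),
    "b % a": java_mod(b, a),
    # With a = 10, the condition (a == 1) is false, so the bincond gives 30.
    "(a == 1) ? 20 : 30": bincond(a == 1, 20, 30),
}
```

Under these assumptions, `parity(-3)` is null rather than `'odd'`. With Java-style remainders, `-3 % 2` is `-1`, which neither WHEN clause matches.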

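Pig Latin – Comparison Operators

The following table describes the comparison operators of Pig Latin. The examples again assume a = 10 and b = 20.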

Operator        Description and example

==              Equal: checks whether the values of the two operands are equal; if yes, the condition becomes true.
                Example: (a == b) is not true.

!=              Not equal: checks whether the values of the two operands are equal; if they are not equal, the condition becomes true.
                Example: (a != b) is true.

>               Greater than: checks whether the value of the left operand is greater than the value of the right operand; if yes, the condition becomes true.
                Example: (a > b) is not true.

<               Less than: checks whether the value of the left operand is less than the value of the right operand; if yes, the condition becomes true.
                Example: (a < b) is true.

>=              Greater than or equal to: checks whether the value of the left operand is greater than or equal to the value of the right operand; if yes, the condition becomes true.
                Example: (a >= b) is not true.

<=              Less than or equal to: checks whether the value of the left operand is less than or equal to the value of the right operand; if yes, the condition becomes true.
                Example: (a <= b) is true.

matches         Pattern matching: checks whether the string on the left-hand side matches the constant (a regular expression) on the right-hand side.
                Example: f1 matches '.*tutorial.*'
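With a = 10 and b = 20, the comparison results in the table can be reproduced directly in Python. For `matches`, the sketch assumes Pig uses Java regular expressions that must match the whole string, like Java's `String.matches`. Python's `re.fullmatch` approximates this for simple patterns such as `'.*tutorial.*'`. The hypothetical `matches` helper also returns null for a null input, in line with Pig's other comparison operators. The sample strings are hypothetical.

```python
import re

a, b = 10, 20
comparisons = {
    "a == b": a == b,
    "a != b": a != b,
    "a > b": a > b,
    "a < b": a < b,
    "a >= b": a >= b,
    "a <= b": a <= b,
}

def matches(value, pattern):
    """Approximation of Pig's `value matches 'pattern'` (whole-string match)."""
    if value is None:
        return None  # assumption: a null input gives a null result
    return re.fullmatch(pattern, value) is not None
```

Because the whole string must match, `'.*tutorial.*'` needs the leading and trailing `.*`. A bare `'tutor'` would not match the string `tutorial`, and the comparison is case-sensitive.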

Pig Latin – Type Construction Operators

The following table describes the type construction operators of Pig Latin.

Operator        Description and example

()              Tuple constructor: used to construct a tuple.
                Example: (Raju, 30)

{}              Bag constructor: used to construct a bag.
                Example: {(Raju, 30), (Mohammad, 45)}

[]              Map constructor: used to construct a map.
                Example: [name#Raja, age#30]
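DUMP prints values in the same notation the constructors use: parentheses for tuples, braces for bags, and `key#value` pairs in square brackets for maps. The hypothetical function below renders nested Python values in that notation, as an illustration only, not Pig's own formatter. It assumes a bag is represented by a hypothetical `Bag` subclass of `list` and a map by a `dict` with string keys. It also assumes that a null field prints as an empty position and that booleans print in lowercase.

```python
class Bag(list):
    """Hypothetical marker type: a list of tuples printed as a Pig bag."""

def to_pig_text(value):
    """Render a value in Pig's tuple/bag/map text notation."""
    if isinstance(value, tuple):
        return "(" + ",".join(to_pig_text(v) for v in value) + ")"
    if isinstance(value, Bag):
        return "{" + ",".join(to_pig_text(t) for t in value) + "}"
    if isinstance(value, dict):
        return "[" + ",".join(f"{k}#{to_pig_text(v)}" for k, v in value.items()) + "]"
    if value is None:
        return ""  # assumption: a null field shows as an empty position
    if isinstance(value, bool):
        return "true" if value else "false"  # assumption: lowercase booleans
    return str(value)
```

For example, `to_pig_text(Bag([("Raju", 30), ("Mohammad", 45)]))` produces `{(Raju,30),(Mohammad,45)}`. The output mirrors the bag constructor example without the spaces.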

Pig Latin – Relational Operators

The following table describes the relational operators of Pig Latin.

Operator Description
Loading and Storing
LOAD To load data from the file system (local/HDFS) into a relation.
STORE To save a relation to the file system (local/HDFS).
Filtering
FILTER To remove unwanted rows from a relation.
DISTINCT To remove duplicate rows from a relation.
FOREACH, GENERATE To generate data transformations based on columns of data.
STREAM To transform a relation using an external program.
Grouping and Joining
JOIN To join two or more relations.
COGROUP To group the data in two or more relations.
GROUP To group the data in a single relation.
CROSS To create the cross product of two or more relations.
Sorting
ORDER To arrange a relation in a sorted order based on one or more fields (ascending or descending).
LIMIT To get a limited number of tuples from a relation.
Combining and Splitting
UNION To combine two or more relations into a single relation.
SPLIT To split a single relation into two or more relations.
Diagnostic Operators
DUMP To print the contents of a relation on the console.
DESCRIBE To describe the schema of a relation.
EXPLAIN To view the logical, physical, or MapReduce execution plans to compute a relation.
ILLUSTRATE To view the step-by-step execution of a series of statements.
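To make the table concrete, the following sketch emulates several of these operators on small in-memory relations in Python. It is a simplified model for intuition only. The relation and field names are hypothetical, bags are modeled as lists, and the comments show roughly equivalent Pig Latin statements. Real Pig runs these operators as distributed jobs and does not guarantee the order of results from GROUP, DISTINCT, or JOIN. The model sorts some results only to make them readable.

```python
from itertools import product

# Hypothetical relations: students (id, firstname, city) and cities (city, state).
students = [(1, "Rajiv", "Hyderabad"), (2, "Siddarth", "Kolkata"),
            (3, "Rajesh", "Delhi"), (4, "Preethi", "Hyderabad"),
            (4, "Preethi", "Hyderabad")]
cities = [("Hyderabad", "Telangana"), ("Delhi", "Delhi")]

# FILTER: hyd = FILTER students BY city == 'Hyderabad';
hyd = [s for s in students if s[2] == "Hyderabad"]

# DISTINCT: uniq = DISTINCT students;
uniq = sorted(set(students))

# FOREACH ... GENERATE: names = FOREACH uniq GENERATE firstname;
names = [(s[1],) for s in uniq]

# GROUP: by_city = GROUP uniq BY city;  -> (group, {bag of tuples})
def group_by(rel, key):
    groups = {}
    for t in rel:
        groups.setdefault(key(t), []).append(t)
    return sorted(groups.items())

by_city = group_by(uniq, lambda s: s[2])

# COGROUP: cg = COGROUP uniq BY city, cities BY $0;  -> (group, {uniq...}, {cities...})
keys = sorted({s[2] for s in uniq} | {c[0] for c in cities})
cogrouped = [(k, [s for s in uniq if s[2] == k], [c for c in cities if c[0] == k])
             for k in keys]

# JOIN (inner): joined = JOIN uniq BY city, cities BY $0;
joined = [s + c for s in uniq for c in cities if s[2] == c[0]]

# CROSS: crossed = CROSS names, cities;
crossed = [n + c for n, c in product(names, cities)]

# ORDER and LIMIT: ordered = ORDER uniq BY id DESC; top2 = LIMIT ordered 2;
top2 = sorted(uniq, key=lambda s: s[0], reverse=True)[:2]

# UNION: both = UNION hyd, uniq;  (duplicates are kept, as in a bag union)
both = hyd + uniq

# SPLIT: SPLIT uniq INTO low IF id <= 2, high IF id > 2;
low = [s for s in uniq if s[0] <= 2]
high = [s for s in uniq if s[0] > 2]
```

The model highlights two differences in the table. COGROUP keeps a key even when one side is empty (Kolkata has no row in `cities`), whereas the inner JOIN drops it. GROUP works on a single relation, while COGROUP groups two or more.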