Python PySpark – Union and UnionAll
In this article, we will discuss union() and unionAll() in PySpark in Python.
Union in PySpark
The PySpark union() function is used to combine two or more DataFrames that have the same structure or schema. If the schemas of the DataFrames differ, this function returns an error.
Syntax:
dataFrame1.union(dataFrame2)
Here,
- dataFrame1 and dataFrame2 are the DataFrames to be combined
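One behavior worth noting before the examples: like SQL's UNION ALL, union() keeps duplicate rows rather than removing them. Below is a minimal sketch, with made-up illustrative data, of chaining distinct() when set-style UNION semantics are wanted.
Python3
# Minimal sketch (illustrative data): union() keeps duplicates,
# distinct() removes them afterwards.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('GeeksforGeeks.com').getOrCreate()

df_a = spark.createDataFrame(
    [("Bhuwanesh", 82.98)],
    ["Student Name", "Overall Percentage"]
)
df_b = spark.createDataFrame(
    [("Bhuwanesh", 82.98), ("Harshit", 80.31)],
    ["Student Name", "Overall Percentage"]
)

# 3 rows: the "Bhuwanesh" row appears twice
df_a.union(df_b).show()

# 2 rows: the duplicate row is dropped
df_a.union(df_b).distinct().show()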
Example 1:
In this example, we combine two DataFrames, data_frame1 and data_frame2. Note that the schemas of the two DataFrames are the same.
Python3
# Python program to illustrate the
# working of union() function
import pyspark
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('GeeksforGeeks.com').getOrCreate()
# Creating a dataframe
data_frame1 = spark.createDataFrame(
[("Bhuwanesh", 82.98), ("Harshit", 80.31)],
["Student Name", "Overall Percentage"]
)
# Creating another dataframe
data_frame2 = spark.createDataFrame(
[("Naveen", 91.123), ("Piyush", 90.51)],
["Student Name", "Overall Percentage"]
)
# union()
answer = data_frame1.union(data_frame2)
# Print the result of the union()
answer.show()
Output:
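Running the snippet above with default settings should print something close to the following (four rows, two from each DataFrame):
+------------+------------------+
|Student Name|Overall Percentage|
+------------+------------------+
|   Bhuwanesh|             82.98|
|     Harshit|             80.31|
|      Naveen|            91.123|
|      Piyush|             90.51|
+------------+------------------+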
Example 2:
In this example, we combine two DataFrames, data_frame1 and data_frame2, whose schemas are different. The output is therefore not what we want, because union() works as intended only for DataFrames with the same structure or schema.
Python3
# Python program to illustrate the
# working of union() function
import pyspark
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('GeeksforGeeks.com').getOrCreate()
# Creating a data frame
data_frame1 = spark.createDataFrame(
[("Bhuwanesh", 82.98), ("Harshit", 80.31)],
["Student Name", "Overall Percentage"]
)
# Creating another data frame
data_frame2 = spark.createDataFrame(
[(91.123, "Naveen"), (90.51, "Piyush"), (87.67, "Hitesh")],
["Overall Percentage", "Student Name"]
)
# Union both the dataframes using union() function
answer = data_frame1.union(data_frame2)
# Print the union of both the dataframes
answer.show()
Output:
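As a side note, when the two DataFrames share the same columns but in a different order, as in this example, the positional behavior of union() can be avoided with unionByName(), which matches columns by name (available since Spark 2.3). A minimal sketch, reusing the same data as Example 2:
Python3
# Minimal sketch: unionByName() matches columns by name rather than
# by position, so the differing column order is handled correctly.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('GeeksforGeeks.com').getOrCreate()

data_frame1 = spark.createDataFrame(
    [("Bhuwanesh", 82.98), ("Harshit", 80.31)],
    ["Student Name", "Overall Percentage"]
)
data_frame2 = spark.createDataFrame(
    [(91.123, "Naveen"), (90.51, "Piyush"), (87.67, "Hitesh")],
    ["Overall Percentage", "Student Name"]
)

# Columns are aligned by name, not by position
answer = data_frame1.unionByName(data_frame2)
answer.show()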
unionAll() in PySpark
The unionAll() function performs the same task as union(), but it has been deprecated since Spark version 2.0.0, so using union() is recommended instead.
Syntax:
dataFrame1.unionAll(dataFrame2)
Here,
- dataFrame1 and dataFrame2 are the DataFrames to be combined
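In current PySpark releases, unionAll() is kept as an alias that delegates to union(), so the two calls return the same rows. A minimal sketch, with illustrative data, of this equivalence:
Python3
# Minimal sketch (illustrative data): unionAll() and union()
# combine the rows in the same way; both keep duplicates.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('GeeksforGeeks.com').getOrCreate()

df1 = spark.createDataFrame(
    [("Bhuwanesh", 82.98)],
    ["Student Name", "Overall Percentage"]
)
df2 = spark.createDataFrame(
    [("Naveen", 91.123)],
    ["Student Name", "Overall Percentage"]
)

# Both calls produce the same combined rows
assert df1.union(df2).collect() == df1.unionAll(df2).collect()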
Example 1:
In this example, we combine two DataFrames, data_frame1 and data_frame2. Note that the schemas of the two DataFrames are the same.
Python3
# Python program to illustrate the
# working of unionAll() function
import pyspark
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('GeeksforGeeks.com').getOrCreate()
# Creating a dataframe
data_frame1 = spark.createDataFrame(
[("Bhuwanesh", 82.98), ("Harshit", 80.31)],
["Student Name", "Overall Percentage"]
)
# Creating another dataframe
data_frame2 = spark.createDataFrame(
[("Naveen", 91.123), ("Piyush", 90.51)],
["Student Name", "Overall Percentage"]
)
# Union both the dataframes using unionAll() function
answer = data_frame1.unionAll(data_frame2)
# Print the union of both the dataframes
answer.show()
Output:
Example 2:
In this example, we combine two DataFrames, data_frame1 and data_frame2, whose schemas are different. The output is therefore not what we want, because unionAll() works as intended only for DataFrames with the same structure or schema.
Python3
# Python program to illustrate the
# working of unionAll() function
import pyspark
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('GeeksforGeeks.com').getOrCreate()
# Creating a data frame
data_frame1 = spark.createDataFrame(
[("Bhuwanesh", 82.98), ("Harshit", 80.31)],
["Student Name", "Overall Percentage"]
)
# Creating another data frame
data_frame2 = spark.createDataFrame(
[(91.123, "Naveen"), (90.51, "Piyush"), (87.67, "Hitesh")],
["Overall Percentage", "Student Name"]
)
# Union both the dataframes using unionAll() function
answer = data_frame1.unionAll(data_frame2)
# Print the union of both the dataframes
answer.show()
Output: