📜  How to create a PySpark dataframe from multiple lists?

📅  Last modified: 2022-05-13 01:55:26.442000             🧑  Author: Mango


In this article, we will discuss how to create a PySpark dataframe from multiple lists.

Approach

  • Create the data as multiple lists and keep the column names in a separate list. To pair the data lists element-wise into rows, we will use the zip() function.
  • Pass this zipped data, together with the column names, to the spark.createDataFrame() method.
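
The zipping step can be previewed in plain Python before Spark is involved; zip() pairs the lists element-wise into row tuples, which is the shape spark.createDataFrame() accepts (the sample values below mirror the examples that follow):

```python
# Pair two lists element-wise into row tuples.
ids = [1, 2, 3]
names = ["sravan", "bobby", "ojaswi"]

rows = list(zip(ids, names))
print(rows)  # [(1, 'sravan'), (2, 'bobby'), (3, 'ojaswi')]
```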

Examples

Example 1: Python program to create two lists and build a dataframe from them

Python3
# importing the pyspark module
import pyspark
  
# importing SparkSession from the
# pyspark.sql module
from pyspark.sql import SparkSession
  
# creating a SparkSession and giving
# the app a name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()
  
# two lists of college data with
# three elements each
data = [1, 2, 3]
data1 = ["sravan", "bobby", "ojaswi"]
  
# specify the column names
columns = ['ID', 'NAME']
  
# create a dataframe by zipping the two lists
dataframe = spark.createDataFrame(zip(data, data1), columns)
  
# display the dataframe
dataframe.show()
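
One caveat worth noting: zip() stops at the shortest input, so if the lists differ in length the extra elements are silently dropped rather than raising an error. A quick plain-Python check (the unequal lists below are illustrative):

```python
# zip() truncates to the shortest list, so a dataframe built from
# unequal lists would silently lose the unmatched elements.
ids = [1, 2, 3, 4]           # four ids
names = ["sravan", "bobby"]  # only two names

rows = list(zip(ids, names))
print(rows)  # [(1, 'sravan'), (2, 'bobby')]
```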



Output:

Example 2: Python program to create four lists and build a dataframe from them

Python3

# importing the pyspark module
import pyspark
  
# importing SparkSession from the
# pyspark.sql module
from pyspark.sql import SparkSession
  
# creating a SparkSession and giving
# the app a name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()
  
# four lists of college data with
# three elements each
data = [1, 2, 3]
data1 = ["sravan", "bobby", "ojaswi"]
data2 = ["iit-k", "iit-mumbai", "vignan university"]
data3 = ["AP", "TS", "UP"]
  
# specify the column names
columns = ['ID', 'NAME', 'COLLEGE', 'ADDRESS']
  
# create a dataframe by zipping
# the four lists
dataframe = spark.createDataFrame(
  zip(data, data1, data2, data3), columns)
  
# display the dataframe
dataframe.show()

Output: