How to convert a list of dictionaries into a PySpark DataFrame?
In this article, we will discuss creating a PySpark DataFrame from a list of dictionaries.
We will create the DataFrame in PySpark by passing a list of dictionaries to the createDataFrame() method. When the data is a list of dictionaries, the column names are taken from the dictionary keys, so the method can be called with the data alone:
dataframe = spark.createDataFrame(data)
Example 1:
Python3
# importing module
import pyspark
# importing sparksession from
# pyspark.sql module
from pyspark.sql import SparkSession
# creating sparksession and giving
# an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()
# list of dictionaries of students data
data = [{"Student ID": 1, "Student name": "sravan"},
        {"Student ID": 2, "Student name": "Jyothika"},
        {"Student ID": 3, "Student name": "deepika"},
        {"Student ID": 4, "Student name": "harsha"}]
# creating a dataframe
dataframe = spark.createDataFrame(data)
# display dataframe
dataframe.show()
Output:
Example 2:
Python3
# importing module
import pyspark
# importing sparksession from
# pyspark.sql module
from pyspark.sql import SparkSession
# creating sparksession and giving
# an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()
# list of dictionaries of crop data
data = [{"Crop ID": 1, "name": "rose", "State": "AP"},
        {"Crop ID": 2, "name": "lilly", "State": "TS"},
        {"Crop ID": 3, "name": "lotus", "State": "Maharashtra"},
        {"Crop ID": 4, "name": "jasmine", "State": "AP"}]
# creating a dataframe
dataframe = spark.createDataFrame(data)
# display dataframe
dataframe.show()
Output:
Example 3:
Python3
# importing module
import pyspark
# importing sparksession from
# pyspark.sql module
from pyspark.sql import SparkSession
# creating sparksession and giving
# an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()
# list of dictionaries of crop data
data = [{"Crop ID": 1, "name": "rose", "State": "AP"},
        {"Crop ID": 2, "name": "lilly", "State": "TS"},
        {"Crop ID": 3, "name": "lotus", "State": "Maharashtra"},
        {"Crop ID": 4, "name": "jasmine", "State": "AP"}]
# creating a dataframe
dataframe = spark.createDataFrame(data)
# display dataframe count
dataframe.count()
Output:
4