Creating a PySpark DataFrame from a Nested Dictionary
In this article, we will discuss how to create a PySpark DataFrame from a nested dictionary.
We will use the createDataFrame() method from PySpark to build the DataFrame. To do this, we iterate over the nested dictionary with items(), which yields each outer key together with its inner dictionary, and unpack that pair into a Row:
[Row(**{'': k, **v}) for k,v in data.items()]
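The unpacking itself is plain Python. The sketch below uses ordinary dicts instead of Row objects, purely for illustration, to show how each outer key ends up stored under an empty-string column name next to the inner dictionary's fields:

```python
# Plain-Python illustration of the flattening step (no Spark needed):
# each outer key lands under the '' key, next to the inner fields.
data = {
    'student_1': {'student id': 7058, 'country': 'India'},
    'student_2': {'student id': 7059, 'country': 'Srilanka'},
}

flattened = [{'': k, **v} for k, v in data.items()]
print(flattened[0])
# {'': 'student_1', 'student id': 7058, 'country': 'India'}
```

Row(**{...}) simply turns each of these flattened dictionaries into a Row, which createDataFrame() accepts directly.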
Example 1: Python program to create college data from a dictionary whose nested dictionaries include address fields (state, district)
Python3
# importing module
import pyspark

# importing SparkSession from the pyspark.sql module
from pyspark.sql import SparkSession
from pyspark.sql import Row

# creating a SparkSession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()

# creating a nested dictionary
data = {
    'student_1': {
        'student id': 7058,
        'country': 'India',
        'state': 'AP',
        'district': 'Guntur'
    },
    'student_2': {
        'student id': 7059,
        'country': 'Srilanka',
        'state': 'X',
        'district': 'Y'
    }
}

# building Row objects from the key/value pairs
rowdata = [Row(**{'': k, **v}) for k, v in data.items()]

# creating the pyspark dataframe
final = spark.createDataFrame(rowdata).select(
    'student id', 'country', 'state', 'district')

# display the pyspark dataframe
final.show()
Output:
+----------+--------+-----+--------+
|student id| country|state|district|
+----------+--------+-----+--------+
| 7058| India| AP| Guntur|
| 7059|Srilanka| X| Y|
+----------+--------+-----+--------+
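Note that select() drops the empty-string column holding the outer keys (student_1, student_2). If you want to keep those identifiers as a readable column instead, one option is to use a real key name in the comprehension; 'name' below is a hypothetical column name chosen for this sketch:

```python
# Use a descriptive key ('name' is a hypothetical choice for this
# sketch) so the outer dictionary key survives as a named column.
data = {
    'student_1': {'student id': 7058, 'country': 'India', 'state': 'AP'},
    'student_2': {'student id': 7059, 'country': 'Srilanka', 'state': 'X'},
}

rows = [{'name': k, **v} for k, v in data.items()]
print(rows[1]['name'])  # student_2

# spark.createDataFrame(rows) would then include a 'name' column
# alongside 'student id', 'country' and 'state'.
```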
Example 2: Python program to create a DataFrame from a nested dictionary with 3 columns (3 keys)
Python3
# importing module
import pyspark

# importing SparkSession from the pyspark.sql module
from pyspark.sql import SparkSession
from pyspark.sql import Row

# creating a SparkSession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()

# creating a nested dictionary
data = {
    'student_1': {
        'student id': 7058,
        'country': 'India',
        'state': 'AP'
    },
    'student_2': {
        'student id': 7059,
        'country': 'Srilanka',
        'state': 'X'
    }
}

# building Row objects from the key/value pairs
rowdata = [Row(**{'': k, **v}) for k, v in data.items()]

# creating the pyspark dataframe
final = spark.createDataFrame(rowdata).select(
    'student id', 'country', 'state')

# display the pyspark dataframe
final.show()
Output:
+----------+--------+-----+
|student id| country|state|
+----------+--------+-----+
| 7058| India| AP|
| 7059|Srilanka| X|
+----------+--------+-----+
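As an alternative to building Row objects by hand, the same nested dictionary can be loaded through pandas: DataFrame.from_dict(..., orient='index') turns the outer keys into the index and the inner keys into columns, and the resulting pandas frame can then be handed to spark.createDataFrame(). This assumes pandas is installed alongside PySpark:

```python
import pandas as pd

data = {
    'student_1': {'student id': 7058, 'country': 'India', 'state': 'AP'},
    'student_2': {'student id': 7059, 'country': 'Srilanka', 'state': 'X'},
}

# outer keys become the index, inner keys become columns
pdf = pd.DataFrame.from_dict(data, orient='index')
print(list(pdf.columns))  # ['student id', 'country', 'state']

# spark.createDataFrame(pdf.reset_index(drop=True)) would yield the
# same three-column PySpark DataFrame as Example 2.
```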