📜  Creating a PySpark DataFrame from a Nested Dictionary

📅  Last modified: 2022-05-13 01:55:46.360000  🧑  Author: Mango


This article shows how to create a PySpark DataFrame from a nested dictionary.

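We will use the createDataFrame() method in PySpark to create the DataFrame. The input is a nested dictionary, that is, a dictionary whose values are themselves dictionaries. Calling items() on the outer dictionary yields each outer key together with its inner dictionary, and each such key-value pair is turned into a Row: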

[Row(**{'': k, **v}) for k,v in data.items()]
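To make the expression above concrete, the following sketch runs the same dictionary merge in plain Python, without Spark, so the keyword arguments that each Row would receive are visible. The sample data mirrors the examples in this article; the name `row_kwargs` is introduced here for illustration only.

```python
# Sample nested dictionary: outer key -> inner dictionary of fields.
data = {
    'student_1': {'student id': 7058, 'country': 'India', 'state': 'AP'},
    'student_2': {'student id': 7059, 'country': 'Srilanka', 'state': 'X'},
}

# Same merge as in the Row(...) expression: the outer key is stored under
# the empty-string key, followed by the fields of the inner dictionary.
row_kwargs = [{'': k, **v} for k, v in data.items()]

for kwargs in row_kwargs:
    print(kwargs)
```

Each of these dictionaries is what `Row(**...)` unpacks, so every Row carries the outer key in a field whose name is the empty string. The examples in this article then select only the named columns, which leaves that field out of the final DataFrame.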

Example 1: Python program that creates college data from a nested dictionary whose inner dictionaries hold address fields

Python3
# importing module
import pyspark
  
# importing sparksession from pyspark.sql module
from pyspark.sql import SparkSession
from pyspark.sql import Row
  
# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()
  
# creating nested dictionary
data = {
    'student_1': {
        'student id': 7058,
        'country': 'India',
        'state': 'AP',
        'district': 'Guntur'
    },
    'student_2': {
        'student id': 7059,
        'country': 'Srilanka',
        'state': 'X',
        'district': 'Y'
    }
}
  
# build one Row per student; the outer key is stored under an
# empty field name and is left out by select() below
rowdata = [Row(**{'': k, **v}) for k, v in data.items()]
  
# creating the pyspark dataframe
final = spark.createDataFrame(rowdata).select(
  'student id', 'country', 'state', 'district')
  
# display pyspark dataframe
final.show()




Output:



+----------+--------+-----+--------+
|student id| country|state|district|
+----------+--------+-----+--------+
|      7058|   India|   AP|  Guntur|
|      7059|Srilanka|    X|       Y|
+----------+--------+-----+--------+

Example 2: Python program that creates a DataFrame from a nested dictionary with 3 columns (3 keys)

Python3

# importing module
import pyspark
  
# importing sparksession from pyspark.sql module
from pyspark.sql import SparkSession
from pyspark.sql import Row
  
# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()
  
# creating nested dictionary
data = {
    'student_1': {
        'student id': 7058,
        'country': 'India',
        'state': 'AP'
    },
    'student_2': {
        'student id': 7059,
        'country': 'Srilanka',
        'state': 'X'
  
    }
}
  
# build one Row per student from the outer key and inner dictionary
rowdata = [Row(**{'': k, **v}) for k, v in data.items()]
  
# creating the pyspark dataframe
final = spark.createDataFrame(rowdata).select(
  'student id', 'country', 'state')
  
# display pyspark dataframe
final.show()

Output:

+----------+--------+-----+
|student id| country|state|
+----------+--------+-----+
|      7058|   India|   AP|
|      7059|Srilanka|    X|
+----------+--------+-----+
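
If the outer key (student_1, student_2) should be kept as a regular named column instead of being dropped, one option is to flatten the nested dictionary into a list of column names plus a list of row tuples. This is only a sketch and not part of the original examples: the helper `flatten_nested`, its `key_column` parameter, and the column name `student_key` are hypothetical, and the code assumes a non-empty dictionary in which every inner dictionary has the same keys.

```python
def flatten_nested(data, key_column):
    """Turn {outer_key: {field: value}} into (column_names, row_tuples).

    Assumes `data` is non-empty and all inner dictionaries share the
    same keys (hypothetical helper, for illustration only).
    """
    # Take the field order from the first inner dictionary.
    first = next(iter(data.values()))
    columns = [key_column, *first.keys()]
    # One tuple per outer key: the key itself, then the inner values in order.
    rows = [(k, *(v[c] for c in first)) for k, v in data.items()]
    return columns, rows


data = {
    'student_1': {'student id': 7058, 'country': 'India', 'state': 'AP'},
    'student_2': {'student id': 7059, 'country': 'Srilanka', 'state': 'X'},
}

columns, rows = flatten_nested(data, key_column='student_key')
print(columns)
print(rows)

# With an active SparkSession named `spark`, these could then be passed as:
#   spark.createDataFrame(rows, schema=columns)
```

Passing a list of tuples together with a list of column names is one of the input forms createDataFrame() accepts, so no Row objects are needed in this variant.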