Converting a Row into a list RDD in PySpark
In this article, we will convert Rows into a list RDD in PySpark.
Create an RDD from Rows for demonstration:
Python3
# import Row and SparkSession
from pyspark.sql import SparkSession, Row
# create sparksession
spark = SparkSession.builder.appName('SparkByExamples.com').getOrCreate()
# create student data with Row function
data = [Row(name="sravan kumar",
            subjects=["Java", "python", "C++"],
            state="AP"),
        Row(name="Ojaswi",
            lang=["Spark", "Java", "C++"],
            state="Telangana"),
        Row(name="rohith",
            subjects=["DS", "PHP", ".net"],
            state="AP"),
        Row(name="bobby",
            lang=["Python", "C", "sql"],
            state="Delhi"),
        Row(name="rohith",
            lang=["CSharp", "VB"],
            state="Telangana")]
rdd = spark.sparkContext.parallelize(data)
# display actual rdd
rdd.collect()
Output:
[Row(name='sravan kumar', subjects=['Java', 'python', 'C++'], state='AP'),
Row(name='Ojaswi', lang=['Spark', 'Java', 'C++'], state='Telangana'),
Row(name='rohith', subjects=['DS', 'PHP', '.net'], state='AP'),
Row(name='bobby', lang=['Python', 'C', 'sql'], state='Delhi'),
Row(name='rohith', lang=['CSharp', 'VB'], state='Telangana')]
Using the map() function with list, we can convert the RDD of Rows into a list RDD.
Syntax: rdd_data.map(list)
where rdd_data is an RDD of Row objects.
Finally, by using the collect method, we can display the data in the list RDD.
Python3
# convert rdd to list by using map() method
b = rdd.map(list)
# display the data in b with collect method
for i in b.collect():
    print(i)
Output:
['sravan kumar', ['Java', 'python', 'C++'], 'AP']
['Ojaswi', ['Spark', 'Java', 'C++'], 'Telangana']
['rohith', ['DS', 'PHP', '.net'], 'AP']
['bobby', ['Python', 'C', 'sql'], 'Delhi']
['rohith', ['CSharp', 'VB'], 'Telangana']