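Get the value of a particular cell in a PySpark DataFrame
In this article, we will get the value of a particular cell in a PySpark DataFrame.
To do this, we use the collect() function, which returns all rows of the DataFrame as a list. Indexing that list by row position, and then by column position, gives the value of a single cell.
Creating a DataFrame for demonstration: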
Python3
# importing module
import pyspark
# importing sparksession from pyspark.sql module
from pyspark.sql import SparkSession
# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()
# list of employee data with 5 rows
data = [["1", "sravan", "company 1"],
        ["2", "ojaswi", "company 2"],
        ["3", "bobby", "company 3"],
        ["4", "rohith", "company 2"],
        ["5", "gnanesh", "company 1"]]
# specify column names
columns = ['Employee ID', 'Employee NAME',
           'Company Name']
# creating a dataframe from the list of rows
dataframe = spark.createDataFrame(data, columns)
# display dataframe
dataframe.show()
collect(): This function returns all rows of the DataFrame as a list of Row objects.
Syntax: dataframe.collect()
Example 1: Python program demonstrating the collect() function
Python3
# display dataframe using collect()
dataframe.collect()
Output:
[Row(Employee ID='1', Employee NAME='sravan', Company Name='company 1'),
Row(Employee ID='2', Employee NAME='ojaswi', Company Name='company 2'),
Row(Employee ID='3', Employee NAME='bobby', Company Name='company 3'),
Row(Employee ID='4', Employee NAME='rohith', Company Name='company 2'),
Row(Employee ID='5', Employee NAME='gnanesh', Company Name='company 1')]
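Example 2: Get a particular row
To get a particular row, index the list returned by collect(). It is an ordinary Python list, so indexing starts at 0.
Syntax: dataframe.collect()[index_number]
Python3
# get the first and third rows from the collected list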
print("First row :",dataframe.collect()[0])
print("Third row :",dataframe.collect()[2])
Output:
First row : Row(Employee ID='1', Employee NAME='sravan', Company Name='company 1')
Third row : Row(Employee ID='3', Employee NAME='bobby', Company Name='company 3')
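Example 3: Get a particular cell
To get a single cell, index the result of collect() first by row and then by column.
Syntax: dataframe.collect()[row_index][column_index]
where row_index is the zero-based row position and column_index is the zero-based column position.
Here, we access the values of individual cells in the DataFrame.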
Python3
# first row - second column
print("first row - second column :",
      dataframe.collect()[0][1])
# third row - second column
print("Third row - second column :",
      dataframe.collect()[2][1])
# third row - third column
print("Third row - Third column :",
      dataframe.collect()[2][2])
Output:
first row - second column : sravan
Third row - second column : bobby
Third row - Third column : company 3