📜  Get the value of a particular cell in a PySpark DataFrame

📅  Last modified: 2022-05-13 01:54:24.448000             🧑  Author: Mango

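In this article, we will get the value of a particular cell in a PySpark DataFrame.

To do this, we use the collect() function, which returns all rows of the DataFrame as a list. We can then index into that list by row and column position to reach a single cell. Note that collect() brings every row to the driver, so it is best suited to small DataFrames.

Create a DataFrame for demonstration: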

Python3
# import SparkSession from the pyspark.sql module
from pyspark.sql import SparkSession

# create a SparkSession and give the application a name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()

# employee data: five rows of [id, name, company]
data = [["1", "sravan", "company 1"],
        ["2", "ojaswi", "company 2"],
        ["3", "bobby", "company 3"],
        ["4", "rohith", "company 2"],
        ["5", "gnanesh", "company 1"]]

# column names
columns = ['Employee ID', 'Employee NAME',
           'Company Name']

# create a dataframe from the data and the column names
dataframe = spark.createDataFrame(data, columns)

# display the dataframe
dataframe.show()


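Output:

+-----------+-------------+------------+
|Employee ID|Employee NAME|Company Name|
+-----------+-------------+------------+
|          1|       sravan|   company 1|
|          2|       ojaswi|   company 2|
|          3|        bobby|   company 3|
|          4|       rohith|   company 2|
|          5|      gnanesh|   company 1|
+-----------+-------------+------------+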

collect(): returns all rows of the DataFrame as a Python list of Row objects.

Example 1: Python program demonstrating the collect() function

Python3

# display dataframe using collect()
dataframe.collect()

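Output:

[Row(Employee ID='1', Employee NAME='sravan', Company Name='company 1'),
 Row(Employee ID='2', Employee NAME='ojaswi', Company Name='company 2'),
 Row(Employee ID='3', Employee NAME='bobby', Company Name='company 3'),
 Row(Employee ID='4', Employee NAME='rohith', Company Name='company 2'),
 Row(Employee ID='5', Employee NAME='gnanesh', Company Name='company 1')]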

Example 2: Get a particular row

To get a particular row, index the list returned by collect(). It is an ordinary Python list, so indexing starts at 0.

Python3

# fetch individual rows by indexing the list returned by collect()
print("First row :", dataframe.collect()[0])

print("Third row :", dataframe.collect()[2])

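Output:

First row : Row(Employee ID='1', Employee NAME='sravan', Company Name='company 1')
Third row : Row(Employee ID='3', Employee NAME='bobby', Company Name='company 3')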

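Example 3: Get a particular cell

To get a particular cell, index the result of collect() first by row and then by column: dataframe.collect()[row_index][column_index].

Here we access the values of individual cells in the DataFrame.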

Python3

# first row - second column
print("first row - second column :",
      dataframe.collect()[0][1])

# third row - second column
print("third row - second column :",
      dataframe.collect()[2][1])

# third row - third column
print("third row - third column :",
      dataframe.collect()[2][2])

Output:

first row - second column : sravan
third row - second column : bobby
third row - third column : company 3