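How to select the last row and access a PySpark DataFrame by index?
In this article, we discuss how to select the last row of a PySpark DataFrame and how to access its columns by index.
Creating a DataFrame for demonstration: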
Python3
# importing module
import pyspark
# importing sparksession from pyspark.sql module
from pyspark.sql import SparkSession
# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()
# list of students data
data = [["1", "sravan", "vignan"],
        ["2", "ojaswi", "vvit"],
        ["3", "rohith", "vvit"],
        ["4", "sridevi", "vignan"],
        ["1", "sravan", "vignan"],
        ["5", "gnanesh", "iit"]]
# specify column names
columns = ['student ID','student NAME','college']
# creating a dataframe from the lists of data
dataframe = spark.createDataFrame(data,columns)
# show dataframe
dataframe.show()
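Output:
+----------+------------+-------+
|student ID|student NAME|college|
+----------+------------+-------+
|         1|      sravan| vignan|
|         2|      ojaswi|   vvit|
|         3|      rohith|   vvit|
|         4|     sridevi| vignan|
|         1|      sravan| vignan|
|         5|     gnanesh|    iit|
+----------+------------+-------+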
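Selecting the last row of a DataFrame
Using the tail() function.
This function (available from Spark 3.0) returns the last n rows of the DataFrame to the driver as a list of Row objects.
Syntax: dataframe.tail(n)
where
- n is the number of rows to return, counted from the end.
- dataframe is the input DataFrame.
Passing n = 1 selects only the last row.
Example 1: Selecting the last row.
Python3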
# access last row of the dataframe
dataframe.tail(1)
Output:
[Row(student ID='5', student NAME='gnanesh', college='iit')]
Example 2: Python program to access the last N rows.
Python3
# access last 5 rows of the
# dataframe
dataframe.tail(5)
Output:
[Row(student ID='2', student NAME='ojaswi', college='vvit'),
Row(student ID='3', student NAME='rohith', college='vvit'),
Row(student ID='4', student NAME='sridevi', college='vignan'),
Row(student ID='1', student NAME='sravan', college='vignan'),
Row(student ID='5', student NAME='gnanesh', college='iit')]
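Accessing a DataFrame by column index
Here we select columns by their position. To pick a specific column of a PySpark DataFrame by its column number, we pass the name found at that position to the select() function.
Syntax: dataframe.select(dataframe.columns[column_number]).show()
where,
- dataframe is the DataFrame name
- dataframe.columns is a list of the column names; indexing it with a zero-based column number returns the name of that column
- show() displays the selected column
Example 1: Python program to access a column by its column number
Python3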
# select column with column number 1
dataframe.select(dataframe.columns[1]).show()
Output:
+------------+
|student NAME|
+------------+
|      sravan|
|      ojaswi|
|      rohith|
|     sridevi|
|      sravan|
|     gnanesh|
+------------+
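Example 2: Accessing multiple columns by column number. Here we use the slice operator to select several columns at once, up to n columns.
Syntax: dataframe.select(dataframe.columns[column_start:column_end]).show()
where: column_start is the index of the first column to include and column_end is the index at which the slice stops (the column at column_end itself is not included).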
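The slice bounds follow ordinary Python list slicing, so they can be checked on a plain list of column names before involving Spark. The sketch below uses the column names from the demonstration DataFrame; first_two, all_three and past_end are illustrative names.

```python
# Column names as returned by dataframe.columns in the demonstration
columns = ['student ID', 'student NAME', 'college']

# The end index is exclusive: 0:2 yields the columns at positions 0 and 1
first_two = columns[0:2]

# 0:3 covers positions 0, 1 and 2, which is every column here
all_three = columns[0:3]

# An end index past the list length is clipped rather than raising an error
past_end = columns[1:10]

print(first_two, all_three, past_end)
```

Any of these lists can be passed to dataframe.select(), which accepts a list of column names.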
Python3
# select column with column number slice
# operator
dataframe.select(dataframe.columns[0:3]).show()
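Output:
+----------+------------+-------+
|student ID|student NAME|college|
+----------+------------+-------+
|         1|      sravan| vignan|
|         2|      ojaswi|   vvit|
|         3|      rohith|   vvit|
|         4|     sridevi| vignan|
|         1|      sravan| vignan|
|         5|     gnanesh|    iit|
+----------+------------+-------+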