Select specific columns of a PySpark dataframe by position
In this article, we will discuss how to select specific columns of a PySpark dataframe by position using Python. For this, we will index into the dataframe.columns list inside the dataframe.select() method.
Syntax:
dataframe.select(dataframe.columns[column_number]).show()
where,
- dataframe is the dataframe name
- dataframe.columns is a plain Python list of the dataframe's column names, so indexing it with a column number gives the name of the column at that position
- show() is used to display the selected columns
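Since dataframe.columns is just a Python list, ordinary list indexing and slicing rules apply before anything is passed to select(). A minimal sketch using the column names from the example below (no Spark session needed to see this part):

```python
# dataframe.columns would return a list like this one,
# so indexing and slicing behave exactly as for any Python list
columns = ['student ID', 'student NAME', 'college']

print(columns[1])    # single name at position 1: 'student NAME'
print(columns[1:3])  # sub-list of positions 1 and 2: ['student NAME', 'college']
```

Whatever this indexing yields, a name or a list of names, is what select() receives.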
Let's create an example dataframe.
Python3
# importing module
import pyspark
# importing sparksession from pyspark.sql module
from pyspark.sql import SparkSession
# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()
# list of students data
data = [["1", "sravan", "vignan"], ["2", "ojaswi", "vvit"],
        ["3", "rohith", "vvit"], ["4", "sridevi", "vignan"],
        ["1", "sravan", "vignan"], ["5", "gnanesh", "iit"]]
# specify column names
columns = ['student ID', 'student NAME', 'college']
# creating a dataframe from the lists of data
dataframe = spark.createDataFrame(data, columns)
print("Actual data in dataframe")
# show dataframe
dataframe.show()
Output:
Selecting a column by column number
Python3
# select column with column number 1
dataframe.select(dataframe.columns[1]).show()
Output:
We can also select multiple columns with the same approach by using the slice operator (:). Slicing dataframe.columns[start:end] selects the columns at positions start through end - 1.
Syntax: dataframe.select(dataframe.columns[column_start:column_end]).show()
Python3
# select columns using the slice operator on column numbers
dataframe.select(dataframe.columns[1:3]).show()
Output:
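A slice can only cover contiguous positions. For non-adjacent columns, one option is to build the list of names with a comprehension over dataframe.columns, since select() also accepts a list of column names. A minimal sketch, where the positions [0, 2] are just an illustration:

```python
# the same column names dataframe.columns would return
columns = ['student ID', 'student NAME', 'college']

# pick non-adjacent columns by position
positions = [0, 2]
picked = [columns[i] for i in positions]
print(picked)  # ['student ID', 'college']

# in PySpark the resulting list can be passed straight to select():
# dataframe.select(picked).show()
```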