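How to delete columns in a PySpark dataframe?
In this article, we will delete columns from a PySpark dataframe using the drop() function. This function removes one or more columns from a dataframe and returns the result as a new dataframe.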
Syntax: dataframe.drop('column name')
Python code to create a student dataframe with three columns:
Python3
# importing module
import pyspark
# importing sparksession from pyspark.sql module
from pyspark.sql import SparkSession
# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()
# list of students data
data = [["1", "sravan", "vignan"],
        ["2", "ojaswi", "vvit"],
        ["3", "rohith", "vvit"],
        ["4", "sridevi", "vignan"],
        ["1", "sravan", "vignan"],
        ["5", "gnanesh", "iit"]]
# specify column names
columns = ['student ID', 'student NAME', 'college']
# creating a dataframe from the lists of data
dataframe = spark.createDataFrame(data, columns)
print("Actual data in dataframe")
# show dataframe
dataframe.show()
Output:
Actual data in dataframe
+----------+------------+-------+
|student ID|student NAME|college|
+----------+------------+-------+
|         1|      sravan| vignan|
|         2|      ojaswi|   vvit|
|         3|      rohith|   vvit|
|         4|     sridevi| vignan|
|         1|      sravan| vignan|
|         5|     gnanesh|    iit|
+----------+------------+-------+
Example 1: Python program to delete a single column.
Here we delete the 'student ID' column from the dataframe using drop().
Python3
# delete single column
dataframe=dataframe.drop('student ID')
dataframe.show()
Output:
+------------+-------+
|student NAME|college|
+------------+-------+
|      sravan| vignan|
|      ojaswi|   vvit|
|      rohith|   vvit|
|     sridevi| vignan|
|      sravan| vignan|
|     gnanesh|    iit|
+------------+-------+
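Example 2: Deleting multiple columns
Here we delete multiple columns from the dataframe by passing several column names to drop(). If dataframe still holds the result of Example 1, 'student ID' has already been removed; drop() ignores names that are not in the schema, so the output is the same either way.
Python3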
# delete two columns
dataframe=dataframe.drop(*('student NAME',
'student ID'))
dataframe.show()
Output:
+-------+
|college|
+-------+
| vignan|
|   vvit|
|   vvit|
| vignan|
| vignan|
|    iit|
+-------+
Example 3: Deleting all columns
Here we delete all the columns in the dataframe.
Python3
# delete all three columns
dataframe=dataframe.drop(*('student NAME',
'student ID',
'college'))
dataframe.show()
Output:
++
||
++
||
||
||
||
||
||
++