How to drop multiple columns given in a list from a PySpark DataFrame?
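In this article, we will drop multiple columns, given as a list, from a PySpark DataFrame in Python.
To do this, we use the drop() function. It removes the named columns and returns a new DataFrame; the original DataFrame is left unchanged.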
Syntax: dataframe.drop(*['column 1', 'column 2', 'column n'])
Where,
- dataframe is the input dataframe
- the list holds the names of the columns to drop; the * operator unpacks it so that each name is passed to drop() as a separate argument.
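The `*` in the syntax above is ordinary Python argument unpacking, not something specific to PySpark. The sketch below shows what a variadic method receives with and without `*`; `receive` is a hypothetical stand-in for a method such as `drop(*cols)`, used only for illustration.

```python
# Hypothetical stand-in for a variadic method such as DataFrame.drop(*cols):
# it just returns the positional arguments it received, as a tuple.
def receive(*cols):
    return cols


names = ['student NAME', 'college']

# With *, each list element is passed as its own positional argument.
print(receive(*names))  # ('student NAME', 'college')

# Without *, the whole list arrives as a single argument.
print(receive(names))  # (['student NAME', 'college'],)

# Unpacking a list is the same as writing the names out by hand.
print(receive(*names) == receive('student NAME', 'college'))  # True
```

So `dataframe.drop(*['student NAME', 'college'])` is equivalent to `dataframe.drop('student NAME', 'college')`.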
Python code to create a student DataFrame with three columns:
# importing module
import pyspark
# importing sparksession from pyspark.sql module
from pyspark.sql import SparkSession
# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()
# list of students data
data =[["1","sravan","vignan"],
["2","ojaswi","vvit"],
["3","rohith","vvit"],
["4","sridevi","vignan"],
["1","sravan","vignan"],
["5","gnanesh","iit"]]
# specify column names
columns=['student ID','student NAME','college']
# creating a dataframe from the lists of data
dataframe = spark.createDataFrame(data,columns)
print("Actual data in dataframe")
# show dataframe
dataframe.show()
Output:
Actual data in dataframe
+----------+------------+-------+
|student ID|student NAME|college|
+----------+------------+-------+
|         1|      sravan| vignan|
|         2|      ojaswi|   vvit|
|         3|      rohith|   vvit|
|         4|     sridevi| vignan|
|         1|      sravan| vignan|
|         5|     gnanesh|    iit|
+----------+------------+-------+
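Spark plans and executes drop() itself, but the bookkeeping it performs on the named columns can be modeled in a few lines of plain Python. This is an illustration only: `drop_columns`, `columns`, and `rows` are hypothetical names, a list of tuples stands in for the DataFrame, and only string column names are handled. The model reflects two documented properties of `DataFrame.drop`: it returns a new result instead of modifying its input, and a string name that is not in the schema is silently ignored.

```python
from typing import List, Tuple

Row = Tuple[str, ...]


def drop_columns(columns: List[str], rows: List[Row], *names: str) -> Tuple[List[str], List[Row]]:
    """Hypothetical model of DataFrame.drop(*names) for string column names.

    Returns a new column list and new rows; the inputs are left unchanged.
    Names that are not columns are ignored, mirroring drop()'s documented
    no-op behavior for unknown string names.
    """
    to_drop = set(names)
    # Positions of the surviving columns, in their original order.
    keep = [i for i, name in enumerate(columns) if name not in to_drop]
    new_columns = [columns[i] for i in keep]
    new_rows = [tuple(row[i] for i in keep) for row in rows]
    return new_columns, new_rows


columns = ['student ID', 'student NAME', 'college']
rows = [
    ("1", "sravan", "vignan"),
    ("2", "ojaswi", "vvit"),
    ("3", "rohith", "vvit"),
    ("4", "sridevi", "vignan"),
    ("1", "sravan", "vignan"),
    ("5", "gnanesh", "iit"),
]

cols1, rows1 = drop_columns(columns, rows, *['student NAME', 'college'])
print(cols1)      # ['student ID']
print(rows1[:2])  # [('1',), ('2',)]

# An unknown name is skipped instead of raising an error.
cols2, _ = drop_columns(columns, rows, *['college', 'no such column'])
print(cols2)      # ['student ID', 'student NAME']

# The original column list is untouched.
print(columns)    # ['student ID', 'student NAME', 'college']
```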
Example 1: Drop multiple columns given as a list.
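# names of the columns to drop
columns_to_drop = ['student NAME', 'college']
# drop both columns; drop() returns a new DataFrame, so dataframe itself is unchanged
result = dataframe.drop(*columns_to_drop)
result.show()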
Output:
+----------+
|student ID|
+----------+
|         1|
|         2|
|         3|
|         4|
|         1|
|         5|
+----------+
Example 2: Drop a single column given as a list.
# a list holding a single column name
columns_to_drop = ['college']
# drop one column
result = dataframe.drop(*columns_to_drop)
result.show()
Output:
+----------+------------+
|student ID|student NAME|
+----------+------------+
|         1|      sravan|
|         2|      ojaswi|
|         3|      rohith|
|         4|     sridevi|
|         1|      sravan|
|         5|     gnanesh|
+----------+------------+
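Dropping a list of columns is the complement of selecting the remaining ones. As an alternative that the original examples do not use, the names to keep can be computed from the column list (in PySpark, `dataframe.columns` returns it as a list of strings) and passed to `dataframe.select(*keep)`. The plain-Python sketch below computes only the `keep` list; `to_drop` and `keep` are illustrative names.

```python
columns = ['student ID', 'student NAME', 'college']  # same names as dataframe.columns
to_drop = ['college']

# Every column not listed in to_drop, in the original order.
keep = [name for name in columns if name not in to_drop]
print(keep)  # ['student ID', 'student NAME']
```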
Example 3: Drop all columns given as a list.
# every column name in the dataframe
columns_to_drop = ['student ID', 'student NAME', 'college']
# drop all columns
result = dataframe.drop(*columns_to_drop)
result.show()
Output:
++
||
++
||
||
||
||
||
||
++
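The six `||` lines in this output are the six original rows with no columns left: drop() removes columns, never rows. A minimal plain-Python model, assuming rows are tuples with one value per column (all names below are illustrative only), shows the same shape. On the real result, `result.count()` should likewise still report 6.

```python
# Hypothetical plain-Python model: rows as tuples, one value per column.
columns = ['student ID', 'student NAME', 'college']
rows = [
    ("1", "sravan", "vignan"),
    ("2", "ojaswi", "vvit"),
    ("3", "rohith", "vvit"),
    ("4", "sridevi", "vignan"),
    ("1", "sravan", "vignan"),
    ("5", "gnanesh", "iit"),
]
to_drop = set(columns)  # drop every column

# Positions of surviving columns (none here) and the rows restricted to them.
keep = [i for i, name in enumerate(columns) if name not in to_drop]
new_rows = [tuple(row[i] for i in keep) for row in rows]

print(len(keep))      # 0
print(len(new_rows))  # 6
print(new_rows[0])    # ()
```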