
Last modified: 2022-05-13 01:54:47.953000 · Author: Mango

How to Drop Columns in a PySpark DataFrame

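In this article, we drop columns from a PySpark DataFrame using the drop() function. drop() returns a new DataFrame without the named columns; it removes whole columns, not individual values or rows.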

Python code to create a student DataFrame with three columns:

Python3
# importing module
import pyspark
  
# importing sparksession from pyspark.sql module
from pyspark.sql import SparkSession
  
# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()
  
# list  of students  data 
data =[["1","sravan","vignan"],
       ["2","ojaswi","vvit"],
       ["3","rohith","vvit"],
       ["4","sridevi","vignan"],
       ["1","sravan","vignan"], 
       ["5","gnanesh","iit"]]
  
# specify column names
columns=['student ID','student NAME','college']
  
# creating a dataframe from the lists of data
dataframe = spark.createDataFrame(data,columns)
  
print("Actual data in dataframe")
  
# show dataframe
dataframe.show()




Output:



Actual data in dataframe
+----------+------------+-------+
|student ID|student NAME|college|
+----------+------------+-------+
|         1|      sravan| vignan|
|         2|      ojaswi|   vvit|
|         3|      rohith|   vvit|
|         4|     sridevi| vignan|
|         1|      sravan| vignan|
|         5|     gnanesh|    iit|
+----------+------------+-------+

Example 1: Python program to drop a single column.

Here we drop the 'student ID' column from the DataFrame by passing its name to drop().

Python3

# delete single column
dataframe=dataframe.drop('student ID')
dataframe.show()

Output:

+------------+-------+
|student NAME|college|
+------------+-------+
|      sravan| vignan|
|      ojaswi|   vvit|
|      rohith|   vvit|
|     sridevi| vignan|
|      sravan| vignan|
|     gnanesh|    iit|
+------------+-------+

Example 2: Drop multiple columns

Here we drop multiple columns from the DataFrame by passing several column names to drop().

Python3

# delete two columns
dataframe=dataframe.drop(*('student NAME',
                           'student ID'))
dataframe.show()

Output:

+-------+
|college|
+-------+
| vignan|
|   vvit|
|   vvit|
| vignan|
| vignan|
|    iit|
+-------+

Example 3: Drop all columns

Here we drop all the columns in the DataFrame.

Python3

# delete all three columns
dataframe=dataframe.drop(*('student NAME',
                           'student ID',
                           'college'))
dataframe.show()

Output:

++
||
++
||
||
||
||
||
||
++