Delete One or More Columns from a PySpark DataFrame

Last modified: 2022-05-13 01:54:28.123000             Author: Mango

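In this article, we discuss how to delete columns from a PySpark DataFrame.

In PySpark, the DataFrame drop() method removes one or more columns and returns a new DataFrame.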

Python code to create an employee DataFrame with three columns:

Python3
# importing module
import pyspark
  
# importing sparksession from pyspark.sql module
from pyspark.sql import SparkSession
  
# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()
  
# list of employee data with 7 rows
data =[["1", "sravan", "company 1"],
       ["3", "bobby", "company 3"],
       ["2", "ojaswi", "company 2"],
       ["1", "sravan", "company 1"],
       ["3", "bobby", "company 3"],
       ["4", "rohith", "company 2"],
       ["5", "gnanesh", "company 1"]]
  
# specify column names
columns = ['Employee ID','Employee NAME','Company Name']
  
# creating a dataframe from the lists of data
dataframe = spark.createDataFrame(data,columns)
  
dataframe.show()


Output:

+-----------+-------------+------------+
|Employee ID|Employee NAME|Company Name|
+-----------+-------------+------------+
|          1|       sravan|   company 1|
|          3|        bobby|   company 3|
|          2|       ojaswi|   company 2|
|          1|       sravan|   company 1|
|          3|        bobby|   company 3|
|          4|       rohith|   company 2|
|          5|      gnanesh|   company 1|
+-----------+-------------+------------+

Example 1: Delete a single column.

Here we delete a single column from the DataFrame.

Code:

Python3

# delete single column
dataframe = dataframe.drop('Employee ID')
dataframe.show()

Output:

+-------------+------------+
|Employee NAME|Company Name|
+-------------+------------+
|       sravan|   company 1|
|        bobby|   company 3|
|       ojaswi|   company 2|
|       sravan|   company 1|
|        bobby|   company 3|
|       rohith|   company 2|
|      gnanesh|   company 1|
+-------------+------------+

Example 2: Delete multiple columns.

Here we delete multiple columns from the DataFrame.

Code:

Python3

# delete two columns
dataframe = dataframe.drop(*('Employee NAME',
                             'Employee ID'))
dataframe.show()

Output:

+------------+
|Company Name|
+------------+
|   company 1|
|   company 3|
|   company 2|
|   company 1|
|   company 3|
|   company 2|
|   company 1|
+------------+
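
One detail in this example is easy to miss: Example 1 already reassigned `dataframe` without `Employee ID`, so that name matches nothing here, and drop() silently skips string names that are not in the schema. If a misspelled name should fail loudly instead, a small wrapper can check the names first. The sketch below is an illustration under that assumption, not part of PySpark; `drop_columns_strict` is a hypothetical helper name.

```python
def drop_columns_strict(df, names):
    """Drop `names` from `df`, raising if any name is not a column.

    `df` is expected to be a pyspark.sql.DataFrame; only its `columns`
    attribute and `drop()` method are used. Hypothetical helper, not a
    PySpark API.
    """
    names = list(names)
    if not names:
        # Nothing to drop: hand back the DataFrame unchanged.
        return df
    # DataFrame.columns is a plain Python list of column-name strings.
    # The comparison is exact, so it may be stricter than Spark's own
    # name resolution.
    missing = [name for name in names if name not in df.columns]
    if missing:
        raise ValueError(f"columns not found in DataFrame: {missing}")
    # drop() takes the names as separate positional arguments.
    return df.drop(*names)
```

With the DataFrame left over from Example 1, `drop_columns_strict(dataframe, ['Employee NAME', 'Employee ID'])` would raise a ValueError naming `Employee ID`, while passing `['Employee NAME']` alone would return the single-column result shown above.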

Example 3: Delete all columns.

Here we delete every column from the DataFrame: the column names are collected in a list, which is unpacked into drop().

Python3

# avoid naming the variable `list`, which would shadow the built-in type
columns_to_drop = ['Employee ID', 'Employee NAME', 'Company Name']
  
# delete all columns
dataframe = dataframe.drop(*columns_to_drop)
dataframe.show()

Output:

++
||
++
||
||
||
||
||
||
||
++
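
Typing every name out does not scale to wide DataFrames. Since drop() accepts unpacked names, the list can also be built from `dataframe.columns` at runtime. The following is a minimal sketch under that assumption; `drop_columns_where` and its `predicate` parameter are hypothetical names introduced here, not PySpark API.

```python
def drop_columns_where(df, predicate):
    """Drop every column of `df` whose name satisfies `predicate`.

    `df` is expected to be a pyspark.sql.DataFrame; `predicate` is any
    callable that takes a column name (str) and returns True to drop it.
    Hypothetical helper, not a PySpark API.
    """
    # Build the list of names to remove from the current schema.
    to_drop = [name for name in df.columns if predicate(name)]
    if not to_drop:
        # No column matched: hand back the DataFrame unchanged.
        return df
    # Unpack the list, exactly as in the example above.
    return df.drop(*to_drop)
```

On the original three-column DataFrame, `drop_columns_where(dataframe, lambda name: name.startswith('Employee'))` would leave only `Company Name`, and a predicate that always returns True removes every column, as in this example.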