PySpark DataFrame - Select all columns except one or a set of columns

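In this article, we will extract all columns except one column or a set of columns from a PySpark DataFrame. To do this, we will use the select() and drop() functions.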

But first, let's create a DataFrame for the demonstration.

Python3

# importing module
import pyspark
  
# importing sparksession from pyspark.sql module
from pyspark.sql import SparkSession
  
# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()
  
# list of student data
data = [["1", "sravan", "vignan"],
        ["2", "ojaswi", "vvit"],
        ["3", "rohith", "vvit"],
        ["4", "sridevi", "vignan"],
        ["1", "sravan", "vignan"],
        ["5", "gnanesh", "iit"]]
  
# specify column names
columns = ['student ID', 'student NAME', 'college']
  
# creating a dataframe from the lists of data
dataframe = spark.createDataFrame(data, columns)
  
print('Actual data in dataframe')
dataframe.show()

Output:

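Method 1: Using the drop() function

drop() is used to remove columns from the dataframe. It returns a new dataframe and leaves the original unchanged.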

Syntax: dataframe.drop('column_names')

Where dataframe is the input dataframe and column_names are the names of the columns to be dropped.
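
One thing to keep in mind: when drop() is given string names, a name that is not in the schema is silently ignored rather than raising an error, so a typo can leave a column in place. Below is a minimal sketch of a pre-check, written in plain Python so that it runs without a Spark session. The helper name missing_columns is hypothetical and not part of PySpark, and it uses exact string matching, which may be stricter than Spark's own name resolution.

```python
def missing_columns(all_columns, to_drop):
    """Return the names in to_drop that do not appear in all_columns."""
    return [name for name in to_drop if name not in all_columns]


# column names of the demo dataframe (the same list as dataframe.columns)
columns = ['student ID', 'student NAME', 'college']

# 'colege' is a deliberate typo for 'college'
to_drop = ['student ID', 'colege']

not_found = missing_columns(columns, to_drop)
print(not_found)
```

If the returned list is empty, the names can be passed on with dataframe.drop(*to_drop).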


Example 1: Python program to select data by dropping one column

Python3

# drop student id
dataframe.drop('student ID').show()

Output:

Example 2: Python program to drop multiple columns (a set of columns)

Python3

# drop student id and college
dataframe.drop('student ID','college').show()

Output:

Method 2: Using the select() function

This function is used to select columns from the dataframe.

Syntax: dataframe.select(columns)

Where dataframe is the input dataframe and columns are the columns to keep.
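
Because select() takes the columns to keep, selecting everything except a set of columns means first building the list of remaining names. The following is a minimal sketch of that step, written in plain Python so that it runs without a Spark session. The helper name columns_except is hypothetical and not part of PySpark; the column list is the one used for the demo dataframe above.

```python
def columns_except(all_columns, excluded):
    """Return all_columns, in order, without the names listed in excluded."""
    excluded = set(excluded)
    return [name for name in all_columns if name not in excluded]


# column names of the demo dataframe (the same list as dataframe.columns)
columns = ['student ID', 'student NAME', 'college']

# keep everything except 'student ID' and 'college'
keep = columns_except(columns, ['student ID', 'college'])
print(keep)
```

With the demo dataframe, the result can then be passed to select() as dataframe.select(*keep).show(), which should give the same columns as dropping the excluded ones with drop().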


Example 1: Python program to select a single column from the dataframe

Python3

# select student id 
dataframe.select('student ID').show()

Output:

Example 2: Python program to select two columns, student ID and student NAME

Python3

# select student id and student name
dataframe.select('student ID','student NAME').show()

Output: