How to Rename Multiple PySpark DataFrame Columns

Last modified: 2022-05-13 01:54:20.232000 · Author: Mango

In this article, we discuss how to rename multiple columns in a PySpark DataFrame. We will use two functions for this: withColumnRenamed() and toDF().

Create a DataFrame for the demonstration:

Python3
# importing module
import pyspark
  
# importing sparksession from pyspark.sql module
from pyspark.sql import SparkSession
  
# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()
  
# list of student data with null values
# null values are written as None in Python
data = [[None, "sravan", "vignan"],
        ["2", None, "vvit"],
        ["3", "rohith", None],
        ["4", "sridevi", "vignan"],
        ["1", None, None],
        ["5", "gnanesh", "iit"]]
  
# specify column names
columns = ['ID', 'NAME', 'college']
  
# creating a dataframe from the lists of data
dataframe = spark.createDataFrame(data, columns)
  
# show columns
print(dataframe.columns)
  
# display dataframe
dataframe.show()


Output:



Method 1: Using withColumnRenamed()

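withColumnRenamed(existing, new) returns a new DataFrame in which the column named existing is renamed to new. If the DataFrame has no column with that name, it is returned unchanged.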

To rename several columns, chain the call once per column using the "." operator.
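When the list of renames grows, writing each chained call by hand gets tedious. The sketch below shows one possible way to drive the same chaining from a dictionary. The helper name rename_columns and its mapping parameter are illustrative assumptions, not part of PySpark; the only PySpark call used is withColumnRenamed().

```python
from functools import reduce


def rename_columns(df, mapping):
    """Rename several columns by chaining withColumnRenamed() once per pair.

    df is expected to be a PySpark DataFrame.
    mapping is a dict of {existing_name: new_name} (hypothetical parameter).
    """
    # reduce() feeds the result of each rename into the next call,
    # which is the same as writing df.withColumnRenamed(...).withColumnRenamed(...)
    return reduce(
        lambda acc, pair: acc.withColumnRenamed(pair[0], pair[1]),
        mapping.items(),
        df,
    )
```

With the DataFrame created earlier, rename_columns(dataframe, {"college": "university", "ID": "student_id"}) is meant to behave like writing the two chained withColumnRenamed() calls by hand.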

Example 1: Python program to rename two columns



Python3

# display actual columns
print("Actual columns: ", dataframe.columns)
  
# change the college column name to university 
# and ID to student_id
dataframe = dataframe.withColumnRenamed(
  "college", "university").withColumnRenamed("ID", "student_id")
  
# display modified columns
print("modified columns: ", dataframe.columns)
  
# final dataframe
dataframe.show()

Output:

Example 2: Rename all columns

Python3

# display actual columns
print("Actual columns: ", dataframe.columns)
  
# change the college column name to university,
# ID to student_id and NAME to student_name
dataframe = dataframe.withColumnRenamed(
  "college", "university").withColumnRenamed(
  "ID", "student_id").withColumnRenamed("NAME", "student_name")
  
# display modified columns
print("modified columns: ", dataframe.columns)
  
# final dataframe
dataframe.show()

Output:

Method 2: Using toDF()

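toDF() returns a new DataFrame with all columns renamed at once. The new names are passed in positional order, and there must be one name for each existing column.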
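Because toDF() takes the complete list of names, it pairs naturally with a list built from the existing column names. The following is a minimal sketch of that idea. The helper name rename_all and its transform parameter are hypothetical and only serve the illustration; the PySpark pieces used are the columns attribute and toDF().

```python
def rename_all(df, transform):
    """Rename every column by applying transform to each existing name.

    df is expected to be a PySpark DataFrame.
    transform is any function str -> str (hypothetical parameter).
    """
    # Build one new name per existing column, keeping the original order
    new_names = [transform(name) for name in df.columns]
    # toDF() expects the names as separate positional arguments
    return df.toDF(*new_names)
```

For instance, rename_all(dataframe, str.lower) would pass the lower-cased version of each current column name to toDF().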

Example: Python program to change column names

Python3

# display actual columns
print("Actual columns: ", dataframe.columns)
  
# change column names to A,B,C
dataframe = dataframe.toDF(*("A", "B", "C"))
  
# display new columns
print("New columns: ", dataframe.columns)
  
# display dataframe
dataframe.show()

Output: