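How to rename multiple columns in a PySpark DataFrame?
In this article, we will look at how to rename multiple columns in a PySpark DataFrame.
Before we start, let's create a DataFrame with PySpark: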
Python3
# importing SparkSession from the pyspark.sql module
from pyspark.sql import SparkSession

# creating a SparkSession and giving it an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()

# list of students' data
data = [["1", "sravan", "vignan"],
        ["2", "ojaswi", "vvit"],
        ["3", "rohith", "vvit"],
        ["4", "sridevi", "vignan"],
        ["1", "sravan", "vignan"],
        ["5", "gnanesh", "iit"]]

# specify column names
columns = ['student ID', 'student NAME', 'college']

# creating a dataframe from the list of data
dataframe = spark.createDataFrame(data, columns)
print("Actual data in dataframe")
# show dataframe
dataframe.show()
Output: the original DataFrame, with the columns student ID, student NAME, and college.
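Method 1: Using withColumnRenamed()
Here we use withColumnRenamed() to rename an existing column. It returns a new DataFrame; the original DataFrame is left unchanged.
Syntax: withColumnRenamed(Existing_col, New_col)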
Parameters:
- Existing_col: Old column name.
- New_col: New column name.
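One detail worth knowing: according to the PySpark documentation, withColumnRenamed() is a no-op when the schema does not contain Existing_col, so a misspelled name fails silently. The following is a minimal sketch of a pre-check. `missing_columns` is a hypothetical helper, not part of PySpark, and it works on any list of column names such as `dataframe.columns`.

```python
def missing_columns(existing, renames):
    """Return the old names in `renames` that are not present in `existing`.

    `existing` is a list of column names (for example `dataframe.columns`);
    `renames` maps old column names to new ones.
    Note: the comparison is exact and case-sensitive, which may be stricter
    than Spark's own column-name resolution.
    """
    present = set(existing)
    return [old for old in renames if old not in present]


# Column names used by the DataFrame in this article.
columns = ['student ID', 'student NAME', 'college']

# "collage" is a typo, so a rename using it would silently do nothing.
print(missing_columns(columns, {"collage": "College Name"}))  # prints ['collage']
print(missing_columns(columns, {"college": "College Name"}))  # prints []
```

Running the check before renaming turns a silent no-op into something visible.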
Example 1: Renaming a single column.
Python3
dataframe.withColumnRenamed("college", "College Name").show()
Output: the same rows, with the college column now named College Name.
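Example 2: Renaming multiple columns.
Python3
# chain one withColumnRenamed() call per column to rename
df2 = (dataframe
       .withColumnRenamed("student ID", "Id")
       .withColumnRenamed("college", "College_Name"))
df2.show()
Output: the same rows, with the columns now named Id, student NAME, and College_Name.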
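When there are many columns to rename, writing one chained call per column gets long. A possible alternative is to keep the old and new names in a dictionary and fold over it. This is a sketch only: `rename_columns` is a hypothetical helper, and it assumes `df` is a PySpark DataFrame (or any object with a compatible withColumnRenamed method).

```python
from functools import reduce


def rename_columns(df, renames):
    """Apply every old -> new pair in `renames` with withColumnRenamed.

    `df` is expected to be a PySpark DataFrame; `renames` is a dict that
    maps existing column names to new ones.
    """
    return reduce(
        lambda acc, pair: acc.withColumnRenamed(pair[0], pair[1]),
        renames.items(),
        df,
    )


# Usage, assuming `dataframe` is the DataFrame created at the top of the article:
# df2 = rename_columns(dataframe, {"student ID": "Id", "college": "College_Name"})
# df2.show()
```

PySpark 3.4 and later also provide DataFrame.withColumnsRenamed(), which accepts such a dictionary directly; the helper above is mainly useful on older versions.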
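Method 2: Using toDF()
This function returns a new DataFrame with the newly specified column names.
Syntax: toDF(*col)
where each col is a new column name, given in the same order as the existing columns.
In this example, we create an ordered list of new column names and pass it to toDF().
Python3
# one new name per existing column, in column order
Data_list = ["College Id", "Name", "College"]
new_df = dataframe.toDF(*Data_list)
new_df.show()
Output: the same rows, with the columns now named College Id, Name, and College.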
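Because toDF() assigns names by position, the list needs exactly one entry per existing column, in the same order. The sketch below derives the new names from the old ones instead of typing them by hand, and checks the count before calling toDF(). Both `snake_case` and `rename_all` are hypothetical helpers introduced for illustration, and the snake_case naming rule is an assumption, not something toDF() requires.

```python
import re


def snake_case(name):
    """Turn a column name such as 'student ID' into 'student_id'."""
    # Trim surrounding whitespace, then collapse each run of
    # non-alphanumeric characters into a single underscore.
    return re.sub(r"[^0-9A-Za-z]+", "_", name.strip()).strip("_").lower()


def rename_all(df, new_names):
    """Rename every column of `df` positionally via toDF()."""
    # Fail early with a readable message if the counts differ.
    if len(new_names) != len(df.columns):
        raise ValueError(
            f"expected {len(df.columns)} names, got {len(new_names)}"
        )
    return df.toDF(*new_names)


# Column names used by the DataFrame in this article.
columns = ['student ID', 'student NAME', 'college']
print([snake_case(c) for c in columns])
# prints ['student_id', 'student_name', 'college']

# Usage, assuming `dataframe` is the DataFrame created at the top of the article:
# new_df = rename_all(dataframe, [snake_case(c) for c in dataframe.columns])
# new_df.show()
```

Compared with withColumnRenamed(), this approach suits cases where every column is renamed by the same rule.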