How to Rename a PySpark DataFrame Column by Index?
In this article, we will see how to rename PySpark DataFrame columns by index using Python. We can rename a column by index with the Dataframe.withColumnRenamed() and Dataframe.columns[] methods: Dataframe.columns[] gives us the name of the column at a particular index, and withColumnRenamed() then replaces that name with a new one.
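The same index-to-name lookup can be factored into a small helper. The sketch below is our own illustration (the rename_at name is hypothetical, not a PySpark API); it builds the new list of column names in plain Python, which can then be applied with DataFrame.toDF():

```python
def rename_at(columns, index, new_name):
    # Return a copy of the column-name list with the name at
    # `index` replaced by `new_name`.
    new_cols = list(columns)
    new_cols[index] = new_name
    return new_cols

# Applied to a PySpark DataFrame (assuming `dataframe` exists):
# df = dataframe.toDF(*rename_at(dataframe.columns, 0, "Student Name"))
```

Because the helper only touches the name list, it is easy to test without starting a Spark session.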
Example 1: The following program renames a column by its index.
Python3
# importing required module
import pyspark
from pyspark.sql import SparkSession

# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()

# demo data of college students
data = [["Mukul", 23, "BBA"],
        ["Robin", 21, "BCA"],
        ["Rohit", 24, "MBA"],
        ["Suraj", 25, "MBA"],
        ["Krish", 22, "BCA"]]

# giving column names of dataframe
columns = ["Name", "Age", "Course"]

# creating a dataframe
dataframe = spark.createDataFrame(data, columns)

# Rename the column at index 0
df = dataframe.withColumnRenamed(dataframe.columns[0],
                                 "Student Name")

# display original dataframe
print("Original Dataframe")
dataframe.show()

# display dataframe after renaming the column
print("Dataframe after rename 0 index column")
df.show()
Output:
Example 2: The following program renames multiple columns by their indexes.
Python3
# importing module
import pyspark

# importing sparksession from
# pyspark.sql module
from pyspark.sql import SparkSession

# creating sparksession and giving
# an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()

# list of students data
data = [[123, "Sagar", "Rajveer", 22, "BBA"],
        [124, "Rajeev", "Mukesh", 23, "BBA"],
        [125, "Harish", "Parveen", 25, "BBA"],
        [126, "Gagan", "Rohit", 24, "BBA"],
        [127, "Rakesh", "Mayank", 25, "BBA"],
        [128, "Gnanesh", "Dleep", 26, "BBA"]]

# specify column names
columns = ['ID', 'Name', 'Father Name',
           'Age', 'Course']

# creating a dataframe from the lists of data
dataframe = spark.createDataFrame(data, columns)

# display original dataframe
print('Actual data in dataframe')
dataframe.show()

# Rename the columns at index 1 and 3
df = dataframe.withColumnRenamed(dataframe.columns[1],
                                 "Student Name").withColumnRenamed(
                                     dataframe.columns[3], "Student Age")

# display dataframe after renaming the columns
print('After rename 1 and 3 index column')
df.show()
Output:
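Chaining withColumnRenamed() calls works well for two or three columns. For many index-based renames, one way (our own sketch, not from the examples above; the rename_indices name is hypothetical) is to fold the calls over an {index: new_name} mapping with functools.reduce:

```python
from functools import reduce

def rename_indices(df, renames):
    # `df` is assumed to be a PySpark DataFrame (or any object with a
    # `columns` list and a `withColumnRenamed(old, new)` method);
    # `renames` maps column index -> new name.
    return reduce(
        lambda d, item: d.withColumnRenamed(d.columns[item[0]], item[1]),
        renames.items(),
        df,
    )

# e.g. rename_indices(dataframe, {1: "Student Name", 3: "Student Age"})
```

Each step looks up the current name at the given index and renames it, so the indexes stay valid even after earlier renames in the same mapping.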