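Sorting PySpark DataFrame columns in ascending or descending order
In this article, we sort the columns of a PySpark DataFrame. To do so, we use the sort() and orderBy() functions, each of which can sort in ascending or descending order.
Let's create a sample DataFrame.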
Python3
# importing module
import pyspark
# importing sparksession from
# pyspark.sql module
from pyspark.sql import SparkSession
# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()
# list of employee data
data = [["1", "sravan", "company 1"],
        ["2", "ojaswi", "company 1"],
        ["3", "rohith", "company 2"],
        ["4", "sridevi", "company 1"],
        ["1", "sravan", "company 1"],
        ["4", "sridevi", "company 1"]]
# specify column names
columns = ['Employee_ID', 'Employee NAME', 'Company']
# creating a dataframe from the lists of data
dataframe = spark.createDataFrame(data, columns)
# display data in the dataframe
dataframe.show()
Output:
+-----------+-------------+---------+
|Employee_ID|Employee NAME|  Company|
+-----------+-------------+---------+
|          1|       sravan|company 1|
|          2|       ojaswi|company 1|
|          3|       rohith|company 2|
|          4|      sridevi|company 1|
|          1|       sravan|company 1|
|          4|      sridevi|company 1|
+-----------+-------------+---------+
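Using the sort() function
The sort() function sorts a DataFrame by one or more columns.
Syntax: dataframe.sort(['column name'], ascending=True).show()
Example 1: Sorting one column with sort()
Sort the data by employee name in ascending order:
Python3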
# sort the dataframe based on
# employee name column in ascending order
dataframe.sort(['Employee NAME'],
ascending = True).show()
Output:
+-----------+-------------+---------+
|Employee_ID|Employee NAME|  Company|
+-----------+-------------+---------+
|          2|       ojaswi|company 1|
|          3|       rohith|company 2|
|          1|       sravan|company 1|
|          1|       sravan|company 1|
|          4|      sridevi|company 1|
|          4|      sridevi|company 1|
+-----------+-------------+---------+
Sort the data by employee name in descending order:
Syntax: dataframe.sort(['column name'], ascending=False).show()
Code:
Python3
# sort the dataframe based on
# employee name column in descending order
dataframe.sort(['Employee NAME'],
ascending = False).show()
Output:
+-----------+-------------+---------+
|Employee_ID|Employee NAME|  Company|
+-----------+-------------+---------+
|          4|      sridevi|company 1|
|          4|      sridevi|company 1|
|          1|       sravan|company 1|
|          1|       sravan|company 1|
|          3|       rohith|company 2|
|          2|       ojaswi|company 1|
+-----------+-------------+---------+
Example 2: Sorting multiple columns with sort()
We sort the DataFrame by employee ID and employee name in ascending order.
Python3
# sort the dataframe based on employee ID
# and employee Name columns in ascending order
dataframe.sort(['Employee_ID','Employee NAME'],
ascending = True).show()
Output:
+-----------+-------------+---------+
|Employee_ID|Employee NAME|  Company|
+-----------+-------------+---------+
|          1|       sravan|company 1|
|          1|       sravan|company 1|
|          2|       ojaswi|company 1|
|          3|       rohith|company 2|
|          4|      sridevi|company 1|
|          4|      sridevi|company 1|
+-----------+-------------+---------+
We sort the DataFrame by employee ID, employee name, and company in descending order.
Python3
# sort the dataframe based on employee ID,
# employee name and company columns in descending order
dataframe.sort(['Employee_ID', 'Employee NAME', 'Company'],
               ascending=False).show()
Output:
+-----------+-------------+---------+
|Employee_ID|Employee NAME|  Company|
+-----------+-------------+---------+
|          4|      sridevi|company 1|
|          4|      sridevi|company 1|
|          3|       rohith|company 2|
|          2|       ojaswi|company 1|
|          1|       sravan|company 1|
|          1|       sravan|company 1|
+-----------+-------------+---------+
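Example 3: Sorting with the asc() method
The asc() method of the Column class returns a sort expression based on the ascending order of the given column.
Python3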
dataframe.sort(dataframe.Employee_ID.asc()).show()
Output:
+-----------+-------------+---------+
|Employee_ID|Employee NAME|  Company|
+-----------+-------------+---------+
|          1|       sravan|company 1|
|          1|       sravan|company 1|
|          2|       ojaswi|company 1|
|          3|       rohith|company 2|
|          4|      sridevi|company 1|
|          4|      sridevi|company 1|
+-----------+-------------+---------+
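Example 4: Sorting with the desc() method
The desc() method of the Column class returns a sort expression based on the descending order of the given column.
Python3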
dataframe.sort(dataframe.Employee_ID.desc()).show()
Output:
+-----------+-------------+---------+
|Employee_ID|Employee NAME|  Company|
+-----------+-------------+---------+
|          4|      sridevi|company 1|
|          4|      sridevi|company 1|
|          3|       rohith|company 2|
|          2|       ojaswi|company 1|
|          1|       sravan|company 1|
|          1|       sravan|company 1|
+-----------+-------------+---------+
Using the orderBy() function
The orderBy() function sorts by one or more columns. By default, it sorts in ascending order.
Syntax: orderBy(*cols, ascending=True)
Parameters:
- cols → the columns to sort by.
- ascending → Boolean; True (the default) sorts in ascending order, False in descending order.
Example 1: Sorting one column with orderBy()
Python program to sort the DataFrame by employee ID in ascending order:
Python3
# sort the dataframe based on
# Employee ID in ascending order
dataframe.orderBy(['Employee_ID'],
                  ascending=True).show()
Output:
+-----------+-------------+---------+
|Employee_ID|Employee NAME|  Company|
+-----------+-------------+---------+
|          1|       sravan|company 1|
|          1|       sravan|company 1|
|          2|       ojaswi|company 1|
|          3|       rohith|company 2|
|          4|      sridevi|company 1|
|          4|      sridevi|company 1|
+-----------+-------------+---------+
Python program to sort the DataFrame by employee ID in descending order:
Python3
# sort the dataframe based on
# Employee ID in descending order
dataframe.orderBy(['Employee_ID'],
ascending = False).show()
Output:
+-----------+-------------+---------+
|Employee_ID|Employee NAME|  Company|
+-----------+-------------+---------+
|          4|      sridevi|company 1|
|          4|      sridevi|company 1|
|          3|       rohith|company 2|
|          2|       ojaswi|company 1|
|          1|       sravan|company 1|
|          1|       sravan|company 1|
+-----------+-------------+---------+
Example 2: Sorting multiple columns with orderBy()
Use orderBy() to sort the DataFrame by the employee ID and employee name columns in descending order.
Python3
# sort the dataframe based on employee ID
# and employee Name columns in descending order
dataframe.orderBy(['Employee_ID', 'Employee NAME'],
                  ascending=False).show()
Output:
+-----------+-------------+---------+
|Employee_ID|Employee NAME|  Company|
+-----------+-------------+---------+
|          4|      sridevi|company 1|
|          4|      sridevi|company 1|
|          3|       rohith|company 2|
|          2|       ojaswi|company 1|
|          1|       sravan|company 1|
|          1|       sravan|company 1|
+-----------+-------------+---------+
Sort the DataFrame by the employee ID and employee name columns in ascending order:
Python3
# sort the dataframe based on employee ID
# and employee Name columns in ascending order
dataframe.orderBy(['Employee_ID', 'Employee NAME'],
                  ascending=True).show()
Output:
+-----------+-------------+---------+
|Employee_ID|Employee NAME|  Company|
+-----------+-------------+---------+
|          1|       sravan|company 1|
|          1|       sravan|company 1|
|          2|       ojaswi|company 1|
|          3|       rohith|company 2|
|          4|      sridevi|company 1|
|          4|      sridevi|company 1|
+-----------+-------------+---------+