📅  Last modified: 2022-05-13 01:55:25.319000             🧑  Author: Mango

How to sort a PySpark DataFrame by multiple columns?

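In this article, we sort a PySpark DataFrame by multiple columns using the orderBy() function. Sorting arranges the rows in ascending or descending order of the chosen columns. To demonstrate, we create a DataFrame from a nested list of student records.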

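orderBy() sorts a DataFrame by one or more columns. By default it sorts in ascending order; pass ascending=False to sort in descending order.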
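To make the multi-column behavior concrete, here is a minimal plain-Python sketch (not PySpark) of the ordering that a two-column sort produces: rows are compared on the first key, and the second key only matters when the first one ties. The sample rows and variable names are made up for illustration.

```python
# Plain-Python illustration of multi-column ordering (no Spark involved).
# The rows below are hypothetical sample data, not the article's DataFrame.
rows = [("2", "ojaswi"), ("1", "sravan"), ("1", "asha"), ("3", "rohith")]

# Ascending: compare on the first field, then on the second field for ties.
ascending = sorted(rows, key=lambda r: (r[0], r[1]))

# Descending on both fields: the same comparison, reversed.
descending = sorted(rows, key=lambda r: (r[0], r[1]), reverse=True)

print(ascending)
print(descending)
```

The two rows with ID "1" are ordered by name, which is the role the second column plays in a multi-column sort.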

Example program that creates a DataFrame from student data:

Python3
# import SparkSession from the pyspark.sql module
from pyspark.sql import SparkSession

# create a SparkSession and give the app a name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()

# list of student records: ID, name, college
data = [["1", "sravan", "vignan"],
        ["2", "ojaswi", "vvit"],
        ["3", "rohith", "vvit"],
        ["4", "sridevi", "vignan"],
        ["1", "sravan", "vignan"],
        ["5", "gnanesh", "iit"]]

# specify column names
columns = ['student ID', 'student NAME', 'college']

# create a DataFrame from the list of records
dataframe = spark.createDataFrame(data, columns)

print("Actual data in dataframe")

# show the DataFrame
dataframe.show()


Output:

Actual data in dataframe
+----------+------------+-------+
|student ID|student NAME|college|
+----------+------------+-------+
|         1|      sravan| vignan|
|         2|      ojaswi|   vvit|
|         3|      rohith|   vvit|
|         4|     sridevi| vignan|
|         1|      sravan| vignan|
|         5|     gnanesh|    iit|
+----------+------------+-------+
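
One thing worth noting about the data above: the student IDs are stored as strings, so a sort on 'student ID' compares text rather than numbers. With single-digit IDs the two orders coincide, but they would diverge once an ID such as "10" appears. The plain-Python sketch below (hypothetical values, no Spark involved) shows the difference. In PySpark, casting the column to an integer type before sorting, for example with col('student ID').cast('int'), is one way to get numeric order.

```python
# Hypothetical ID values stored as strings, as in the article's DataFrame.
ids = ["1", "2", "10", "5"]

# Text comparison: "10" sorts before "2" because "1" < "2" character-wise.
as_text = sorted(ids)

# Numeric comparison: convert each ID to int for the sort key only.
as_numbers = sorted(ids, key=int)

print(as_text)
print(as_numbers)
```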

Example 1: Python program that displays the DataFrame sorted by two columns in descending order using the orderBy() function

Python3

# show the DataFrame sorted by two columns
# in descending order using the orderBy() function
dataframe.orderBy(['student ID','student NAME'],
                  ascending=False).show()

Output:

+----------+------------+-------+
|student ID|student NAME|college|
+----------+------------+-------+
|         5|     gnanesh|    iit|
|         4|     sridevi| vignan|
|         3|      rohith|   vvit|
|         2|      ojaswi|   vvit|
|         1|      sravan| vignan|
|         1|      sravan| vignan|
+----------+------------+-------+

Example 2: Python program that displays the DataFrame sorted by two columns in ascending order using the orderBy() function

Python3

# show the DataFrame sorted by two columns
# in ascending order using the orderBy() function
dataframe.orderBy(['student ID','student NAME'],
                  ascending=True).show()

Output:

+----------+------------+-------+
|student ID|student NAME|college|
+----------+------------+-------+
|         1|      sravan| vignan|
|         1|      sravan| vignan|
|         2|      ojaswi|   vvit|
|         3|      rohith|   vvit|
|         4|     sridevi| vignan|
|         5|     gnanesh|    iit|
+----------+------------+-------+