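PySpark – Sorting a Dataframe by Multiple Columns
In this article, we will see how to sort a PySpark dataframe by multiple columns.
This can be done in two ways:
- Using sort()
- Using orderBy()
Creating a dataframe for demonstration: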
Python3
# importing module
import pyspark
# importing sparksession from pyspark.sql module
from pyspark.sql import SparkSession
# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()
# list of students data
data = [["1", "sravan", "vignan"],
        ["2", "ojaswi", "vvit"],
        ["3", "rohith", "vvit"],
        ["4", "sridevi", "vignan"],
        ["1", "sravan", "vignan"],
        ["5", "gnanesh", "iit"]]
# specify column names
columns = ['student ID', 'student NAME', 'college']
# creating a dataframe from the lists of data
dataframe = spark.createDataFrame(data, columns)
print("Actual data in dataframe")
# show dataframe
dataframe.show()
Output: show() prints all six rows, including the duplicate row for student ID 1.
Method 1: Using the sort() function
This function sorts the dataframe by one or more columns.
Syntax: dataframe.sort(['column1', 'column2', 'column n'], ascending=True)
Where,
- dataframe is the dataframe created from the nested lists using PySpark
- the first argument is the list of columns to sort by
- ascending=True sorts the dataframe in increasing order; ascending=False sorts it in decreasing order
Example 1: Python code to sort the dataframe in ascending order by passing a list of two columns.
Python3
# show dataframe by sorting the dataframe
# based on two columns in ascending order
dataframe.sort(['college', 'student ID'],
               ascending=True).show()
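Output: the rows are ordered by college (iit, vignan, vvit), and rows from the same college are ordered by student ID.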
Example 2: Python program to sort the dataframe in descending order by passing a list of columns.
Python3
# show dataframe by sorting the dataframe
# based on two columns in descending order
dataframe.sort(['college', 'student NAME'],
               ascending=False).show()
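Output: the rows are ordered by college in reverse alphabetical order (vvit, vignan, iit), and rows from the same college are ordered by student NAME in descending order.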
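Method 2: Using the orderBy() function
The orderBy() function sorts the dataframe by one or more columns. By default, it sorts in ascending order.
Syntax: dataframe.orderBy(*cols, ascending=True)
Parameters:
- cols: the columns to sort by, given as column names or Column objects.
- ascending: a Boolean, or a list of Booleans with one entry per column. True (the default) sorts in ascending order; False sorts in descending order.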
Example 1: Python program to show the dataframe sorted by two columns in descending order using the orderBy() function.
Python3
# show dataframe by sorting the dataframe
# based on two columns in descending
# order using orderBy() function
dataframe.orderBy(['student ID', 'student NAME'],
                  ascending=False).show()
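Output: the rows are ordered by student ID from 5 down to 1, with the two identical rows for student ID 1 appearing last.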
Example 2: Python program to show the dataframe sorted by two columns in ascending order using the orderBy() function.
Python3
# show dataframe by sorting the dataframe
# based on two columns in ascending
# order using orderBy() function
dataframe.orderBy(['student ID', 'student NAME'],
                  ascending=True).show()
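Output: the rows are ordered by student ID from 1 up to 5, with the two identical rows for student ID 1 appearing first.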