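How to rank Python NumPy arrays with ties?
In this article, we will see how to rank NumPy arrays in Python using tie-breakers.
Ranking is a basic statistical operation used in many fields, such as data science and sociology. A brute-force approach is to sort the indices of the array by their corresponding values. This works well when the given set of numbers contains no equal values. This article goes one step further and explores the rankdata() function from the Python library SciPy, showing how to use it on lists that contain ties.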
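The brute-force approach mentioned above can be sketched as follows. This is a minimal illustration, not code from the original article; the helper name argsort_ranks is hypothetical. It shows that index sorting gives sensible ranks for distinct values, but assigns different ranks to equal values depending only on their position.

```python
import numpy as np

def argsort_ranks(values):
    """Rank values from 1..n by sorting their indices (hypothetical helper)."""
    values = np.asarray(values)
    order = np.argsort(values, kind="stable")      # indices that sort the values
    ranks = np.empty(len(values), dtype=int)
    ranks[order] = np.arange(1, len(values) + 1)   # sorted position becomes the rank
    return ranks

print(argsort_ranks([30, 10, 20]))  # [3 1 2]
print(argsort_ranks([10, 20, 10]))  # [1 3 2]: the two 10s receive different ranks
```

Because the two equal values end up with ranks 1 and 2, this approach cannot express a tie, which is the gap rankdata() fills.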
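The rankdata() function
To compute ranks, we will use the rankdata() function from the scipy.stats module in Python. The function offers five tie-breaking strategies, and its syntax is as follows:
Syntax: scipy.stats.rankdata(arr, method='average', axis=None)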
Parameters:
- arr: An n-dimensional array
- method: A string naming the tie-breaking strategy. There are 5 options:
  - 'average': Each tied value receives the average of the ranks that would have been assigned to all the tied values.
  - 'min': Each tied value receives the minimum of the ranks that would have been assigned to all the tied values.
  - 'max': Each tied value receives the maximum of the ranks that would have been assigned to all the tied values.
  - 'dense': Like 'min', but the next highest element receives the rank immediately after the one assigned to the tied elements, so no ranks are skipped.
  - 'ordinal': All values are given a distinct rank, corresponding to the order in which the values occur in arr.
- axis: Axis along which to perform the ranking. If None, the data array is first flattened.
Returns: A NumPy array of the same size as arr, containing rank scores.
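To make the five strategies concrete, here is a small sketch on a four-element list in which the value 2 appears twice and would occupy ranks 2 and 3. The variable names data, results and flat are illustrative choices, not part of the original article.

```python
from scipy.stats import rankdata

data = [0, 2, 3, 2]  # the two 2s compete for ranks 2 and 3

# Apply every tie-breaking strategy to the same data
results = {m: rankdata(data, method=m).tolist()
           for m in ("average", "min", "max", "dense", "ordinal")}
for method, ranks in results.items():
    print(method, ranks)

# average -> 1, 2.5, 4, 2.5   (mean of ranks 2 and 3)
# min     -> 1, 2, 4, 2       (lowest of ranks 2 and 3)
# max     -> 1, 3, 4, 3       (highest of ranks 2 and 3)
# dense   -> 1, 2, 3, 2       (no gap after the tie)
# ordinal -> 1, 2, 4, 3       (ties broken by position)

# With the default axis=None, a 2-D input is flattened before ranking
flat = rankdata([[0, 2], [3, 2]])
print(flat.shape)
```

Note that only 'average' can produce fractional ranks; the other four strategies always yield whole numbers.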
Example 1: Ranking a 1-D NumPy array
In this example, we will explore all the tie-breaking strategies on a 1-D NumPy array.
Python3
import numpy as np
from scipy.stats import rankdata

arr = np.array([-20, -10, -10, -10, 10,
                20, 20, 50, 50, 60, 60,
                60, 60, 60])

# Ordinal ranking; each value gets a distinct rank
print(f"Ordinal ranking: {rankdata(arr, method='ordinal')}")

# Average ranking; each value's rank is averaged over all ties
print(f"Average ranking: {rankdata(arr, method='average')}")

# Max ranking; each value's rank is the maximum ordinal rank
# within its group of ties
print(f"Max ranking: {rankdata(arr, method='max')}")

# Min ranking; each value's rank is the minimum ordinal rank
# within its group of ties
print(f"Min ranking: {rankdata(arr, method='min')}")

# Dense ranking; like min ranking, but ranks are consecutive
# with no gaps after a group of ties
print(f"Dense ranking: {rankdata(arr, method='dense')}")
Output:
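The ranks for the 1-D array can be worked out by hand from its tie groups: one -20, three -10s, one 10, two 20s, two 50s and five 60s. The sketch below is a check added for illustration, not the article's own listing; the dictionary name expected is an assumption. It asserts the hand-derived ranks against rankdata().

```python
import numpy as np
from scipy.stats import rankdata

arr = np.array([-20, -10, -10, -10, 10, 20, 20, 50, 50, 60, 60, 60, 60, 60])

# Ranks derived by hand from the tie groups in arr
expected = {
    "ordinal": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14],
    "average": [1, 3, 3, 3, 5, 6.5, 6.5, 8.5, 8.5, 12, 12, 12, 12, 12],
    "max":     [1, 4, 4, 4, 5, 7, 7, 9, 9, 14, 14, 14, 14, 14],
    "min":     [1, 2, 2, 2, 5, 6, 6, 8, 8, 10, 10, 10, 10, 10],
    "dense":   [1, 2, 2, 2, 3, 4, 4, 5, 5, 6, 6, 6, 6, 6],
}

for method, ranks in expected.items():
    # Raises AssertionError naming the method if a hand-derived rank is wrong
    assert np.array_equal(rankdata(arr, method=method), ranks), method
```

For instance, the five 60s would occupy ordinal ranks 10 to 14, so they all receive 12 under 'average', 14 under 'max', 10 under 'min' and 6 under 'dense'.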
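Example 2: Ranking a 2-D NumPy array along a specific axis using the 'axis' parameter
In this example, we will explore all the tie-breaking strategies on a 2-D NumPy array along axis 0, that is, down the rows, so that each column is ranked separately.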
Python3
import numpy as np
from scipy.stats import rankdata

arr = np.array([[-20, -10, -10, -10, 10, 20, 20],
                [50, 50, 60, -20, 60, 60, 60],
                [-20, 50, -10, -30, 60, 20, 60]])

# Ordinal ranking; each value gets a distinct rank
print(f"Ordinal ranking:\n {rankdata(arr, method='ordinal', axis=0)}")

# Average ranking; each value's rank is averaged over all ties
print(f"Average ranking:\n {rankdata(arr, method='average', axis=0)}")

# Max ranking; each value's rank is the maximum ordinal rank
# within its group of ties
print(f"Max ranking:\n {rankdata(arr, method='max', axis=0)}")

# Min ranking; each value's rank is the minimum ordinal rank
# within its group of ties
print(f"Min ranking:\n {rankdata(arr, method='min', axis=0)}")

# Dense ranking; like min ranking, but ranks are consecutive
# with no gaps after a group of ties
print(f"Dense ranking:\n {rankdata(arr, method='dense', axis=0)}")
Output:
As we can see, with axis=0 every value in the 2-D array 'arr' is assigned a rank by comparing it with the other entries in the same column.
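To illustrate what the axis argument changes, the following sketch compares axis=0 with axis=1 on the same array, using the 'min' strategy as an arbitrary choice. The names by_column and by_row are illustrative and do not appear in the original article.

```python
import numpy as np
from scipy.stats import rankdata

arr = np.array([[-20, -10, -10, -10, 10, 20, 20],
                [50, 50, 60, -20, 60, 60, 60],
                [-20, 50, -10, -30, 60, 20, 60]])

by_column = rankdata(arr, method="min", axis=0)  # ranks within each column
by_row = rankdata(arr, method="min", axis=1)     # ranks within each row

# axis=0 gives the same result as ranking every column on its own
for j in range(arr.shape[1]):
    assert np.array_equal(by_column[:, j], rankdata(arr[:, j], method="min"))

print(by_column[:, 0].tolist())  # first column holds -20, 50, -20
print(by_row[0].tolist())        # first row holds -20, -10, -10, -10, 10, 20, 20
```

In the first column the two -20s tie for rank 1 and 50 takes rank 3, whereas along the first row the three -10s tie for rank 2 and the two 20s tie for rank 6.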