Python | Pandas Index.factorize()

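Python is a great language for data analysis, primarily because of its rich ecosystem of data-centric packages. Pandas is one of those packages, and it makes importing and analyzing data much easier.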

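The Pandas Index.factorize() function encodes the object as an enumerated type or categorical variable. This method is useful for obtaining a numeric representation of an array when all that matters is identifying distinct values. factorize is available both as the top-level function pandas.factorize() and as the methods Series.factorize() and Index.factorize().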
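As a minimal sketch of the three entry points mentioned above, the snippet below factorizes the same data through pandas.factorize(), Series.factorize() and Index.factorize(). The sample values and variable names are made up for illustration; the input is wrapped in a NumPy array because recent pandas versions prefer array-like input for the top-level function.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, not taken from the article
values = np.array(['b', 'a', 'b', 'c'], dtype=object)

# Top-level function
codes_top, uniques_top = pd.factorize(values)

# Series method
codes_ser, uniques_ser = pd.Series(values).factorize()

# Index method
codes_idx, uniques_idx = pd.Index(values).factorize()

# All three assign codes in order of first appearance
print(codes_top.tolist())   # [0, 1, 0, 2]
print(list(uniques_top))    # ['b', 'a', 'c']
```

The integer codes should be the same for all three calls; only the container type of the returned uniques differs (an ndarray for the top-level call on an ndarray, an Index for the two methods).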

Syntax: Index.factorize(sort=False, na_sentinel=-1)

Parameters :
sort : Sort uniques and shuffle labels to maintain the relationship.
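na_sentinel : Value used to mark missing values (“not found”) in the labels. The default is -1. In pandas 2.0 and later this parameter is no longer available; the boolean use_na_sentinel takes its place.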

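Returns : A tuple (labels, uniques). labels is an integer ndarray that is an indexer into uniques; uniques is an Index holding the distinct values. When there are no missing values, uniques.take(labels) has the same values as the original Index.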
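The relationship between the returned labels and uniques can be checked with a short sketch. The data below is hypothetical, and the second half only relies on the default behaviour of marking missing values with -1; it does not pass na_sentinel, since that parameter is not available in every pandas version.

```python
import pandas as pd

# Hypothetical Index without missing values
idx = pd.Index(['x', 'y', 'x', 'z'])
codes, uniques = idx.factorize()

# Taking the uniques at the positions given by the codes rebuilds the Index
rebuilt = uniques.take(codes)
print(rebuilt.equals(idx))      # True

# Hypothetical Index with a missing value
idx_missing = pd.Index(['x', None, 'y', 'x'])
codes_missing, uniques_missing = idx_missing.factorize()

# By default the missing entry is coded as -1 and is left out of the uniques
print(codes_missing.tolist())   # [0, -1, 1, 0]
print(list(uniques_missing))    # ['x', 'y']
```

Note that the round trip through take() is only meaningful when no code equals -1, because take() would interpret -1 as "last element".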

Example #1: Use the Index.factorize() function to encode the given Index values into categorical form.

# importing pandas as pd
import pandas as pd
  
# Creating the Index
idx = pd.Index(['Labrador', 'Beagle', 'Labrador',
                'Lhasa', 'Husky', 'Beagle'])

# Print the Index
print(idx)

Output: the Index is displayed with its six labels, in the order they were given.

Let us factorize the given Index.

# Encode the labels as integer codes plus the distinct values
print(idx.factorize())

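Output: a tuple whose first element is the integer array [0, 1, 0, 2, 3, 1] and whose second element is the Index of distinct labels ['Labrador', 'Beagle', 'Lhasa', 'Husky'].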

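As we can see in the output, the Index.factorize() function has mapped each distinct label in the Index to a category and assigned it a numeric value. Repeated labels share the same value.

Example #2: Use the Index.factorize() function to factorize the Index values according to their sorted order.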

# importing pandas as pd
import pandas as pd
  
# Creating the Index
idx = pd.Index(['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
               'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'])
  
# Print the Index
print(idx)

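Output: the Index is displayed with the twelve month abbreviations in calendar order, 'Jan' through 'Dec'.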

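Let us factorize it according to sorted order. With sort=True the distinct values are sorted first, and the numeric values are then assigned by position in that sorted order. Because the labels are strings, the sort is alphabetical rather than by calendar month.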

# Factorize the labels, numbering them by sorted order
print(idx.factorize(sort=True))

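Output: a tuple whose first element is the integer array [4, 3, 7, 0, 8, 6, 5, 1, 11, 10, 9, 2] and whose second element is the sorted Index ['Apr', 'Aug', 'Dec', 'Feb', 'Jan', 'Jul', 'Jun', 'Mar', 'May', 'Nov', 'Oct', 'Sep'].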

As we can see in the output, the Index values were sorted before numeric values were assigned to them.
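To make the effect of sort concrete, the following sketch factorizes the same month labels twice, once with the default sort=False and once with sort=True, and compares the results. The variable names are chosen for illustration only.

```python
import pandas as pd

months = pd.Index(['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
                   'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'])

# Default: codes follow the order of first appearance
codes_seen, uniques_seen = months.factorize()

# sort=True: uniques are sorted alphabetically, codes are adjusted to match
codes_sorted, uniques_sorted = months.factorize(sort=True)

print(codes_seen.tolist())     # [0, 1, 2, ..., 11]
print(codes_sorted.tolist())   # [4, 3, 7, 0, 8, 6, 5, 1, 11, 10, 9, 2]
print(list(uniques_sorted))    # ['Apr', 'Aug', 'Dec', ..., 'Sep']

# Either way, the codes still index back into the uniques correctly
print(uniques_sorted.take(codes_sorted).equals(months))   # True
```

In both cases the pair (labels, uniques) describes the same data; sort only changes which integer each label receives.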