Python | Pandas Series.factorize()

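A Pandas Series is a one-dimensional ndarray with axis labels. The labels need not be unique, but they must be of a hashable type. The object supports both integer-based and label-based indexing and provides many methods for performing operations involving the index.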

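The Pandas Series.factorize() function encodes the object as an enumerated type or categorical variable. This method is useful for obtaining a numeric representation of an array when all that matters is identifying distinct values.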
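As a minimal sketch of what "encoding" means here, each distinct value receives an integer code in order of first appearance, and the distinct values are returned alongside the codes. The sample data and the variable names (colors, codes, uniques) are made up for illustration.

```python
import pandas as pd

# Illustrative data, not taken from the examples in this article
colors = pd.Series(['red', 'blue', 'red', 'green', 'blue'])

# factorize() returns a pair: the integer codes and the distinct values
codes, uniques = colors.factorize()

print(codes.tolist())    # [0, 1, 0, 2, 1]
print(uniques.tolist())  # ['red', 'blue', 'green']
```

Equal values always share the same code, which is what makes the result usable as a compact numeric stand-in for the original data.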

Syntax: Series.factorize(sort=False, na_sentinel=-1)

Parameter :
sort : Sort uniques and shuffle labels to maintain the relationship.
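na_sentinel : Value to mark “not found”. This parameter was deprecated in pandas 1.5 and removed in 2.0, where the boolean use_na_sentinel (default True) takes its place.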

Returns :
labels : ndarray
uniques : ndarray, Index, or Categorical
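The effect of the sort parameter and the default handling of missing values can be sketched as follows. The data is hypothetical, and only the default missing-value behavior is shown, since the keyword that controls it differs between pandas versions.

```python
import pandas as pd

# Hypothetical data with one missing value
sr = pd.Series(['b', 'a', None, 'b'])

# Default: codes follow order of first appearance, missing values get -1
codes, uniques = sr.factorize()
print(codes.tolist())    # [0, 1, -1, 0]
print(uniques.tolist())  # ['b', 'a']

# sort=True: uniques are sorted and the codes are remapped to match
sorted_codes, sorted_uniques = sr.factorize(sort=True)
print(sorted_codes.tolist())    # [1, 0, -1, 1]
print(sorted_uniques.tolist())  # ['a', 'b']
```

In both cases the missing value is left out of the uniques, and each code still points at the same value it did before sorting.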


Example #1: Use the Series.factorize() function to encode the underlying data of the given Series object.

# importing pandas as pd
import pandas as pd
  
# Creating the Series
sr = pd.Series(['New York', 'Chicago', 'Toronto', None, 'Rio'])
  
# Create the Index
index_ = ['City 1', 'City 2', 'City 3', 'City 4', 'City 5']
  
# set the index
sr.index = index_
  
# Print the series
print(sr)

Output :

Now we will use the Series.factorize() function to encode the underlying data of the given Series object.

# encode the values
result = sr.factorize()
  
# Print the result
print(result)

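Output :

As we can see in the output, the Series.factorize() function has successfully encoded the underlying data of the given Series object. Note that the missing value has been assigned the code -1.

Example #2: Use the Series.factorize() function to encode the underlying data of a Series object that holds numeric data with repeated values.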

# importing pandas as pd
import pandas as pd
  
# Creating the Series
sr = pd.Series([80, 25, 3, 80, 24, 25])
  
# Create the Index
index_ = ['Coca Cola', 'Sprite', 'Coke', 'Fanta', 'Dew', 'ThumbsUp']
  
# set the index
sr.index = index_
  
# Print the series
print(sr)

Output :

Now we will use the Series.factorize() function to encode the underlying data of the given Series object.

# encode the values
result = sr.factorize()
  
# Print the result
print(result)

Output :

As we can see in the output, the Series.factorize() function has successfully encoded the underlying data of the given Series object.
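To make the relationship between the two returned objects concrete, the sketch below unpacks the result for the same values as Example #2 and rebuilds the original data from it. The variable names (codes, uniques, restored) are illustrative, and the index labels are omitted because factorize() works on the values only.

```python
import pandas as pd

# Same values as Example #2, without the custom index
sr = pd.Series([80, 25, 3, 80, 24, 25])

# Unpack the tuple returned by factorize()
codes, uniques = sr.factorize()

print(codes.tolist())    # [0, 1, 2, 0, 3, 1]
print(uniques.tolist())  # [80, 25, 3, 24]

# Each code is a position in uniques, so take() rebuilds the values
restored = uniques.take(codes)
print(restored.tolist() == sr.tolist())  # True
```

This round trip only holds when no value is missing. A -1 code is not a valid position in uniques and would have to be handled separately.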