创建频率表|生成频率表 - Python (1)

📌 相关文章

📜 创建频率表|生成频率表 - Python (1)

📅 最后修改于: 2023-12-03 15:22:41.618000 🧑 作者: Mango

创建频率表|生成频率表 - Python

在数据分析、文本处理等领域，常常需要统计一些文本中出现单词、字符、词组等的频率。Python 提供了内置函数和第三方库，可以方便地实现这个功能，本文将介绍如何使用 Python 创建频率表。

collections 模块

Python 内置的 collections 模块提供了一些高效的容器类型，其中包括了用于统计频率的 Counter 类。

from collections import Counter

words = ['apple', 'banana', 'cherry', 'apple', 'cherry', 'cherry']
counter = Counter(words)

print(counter)
# Counter({'cherry': 3, 'apple': 2, 'banana': 1})

上述代码中，我们创建了一个包含字符串的列表，使用 Counter 对象统计了每个字符串出现的次数，并打印出来。Counter 对象支持多种常用的操作，例如：most_common(n) 可以返回出现频率前 n 高的元素。

print(counter.most_common(2))
# [('cherry', 3), ('apple', 2)]

pandas 库

pandas 是一款强大的数据处理库，可以实现各种数据的处理和分析任务。其中，pandas.Series 对象提供了 value_counts() 方法，可以方便地统计频率。

import pandas as pd

words = ['apple', 'banana', 'cherry', 'apple', 'cherry', 'cherry']
s = pd.Series(words)
counts = s.value_counts()

print(counts)
# cherry    3
# apple     2
# banana    1
# dtype: int64

上述代码中，我们将列表转换为 Series 对象，使用 value_counts() 方法统计每个字符串出现的次数，并打印出来。

numpy 库

numpy 是一款专门处理数值计算的库，其中的 numpy.unique 函数可以方便地统计每个元素在数组中出现的次数。

import numpy as np

words = ['apple', 'banana', 'cherry', 'apple', 'cherry', 'cherry']
unique, counts = np.unique(words, return_counts=True)

print(dict(zip(unique, counts)))
# {'apple': 2, 'banana': 1, 'cherry': 3}

上述代码中，我们将列表传入 numpy.unique 函数，并将返回的结果用 dict 函数转换为字典形式，打印出来。

小结

本文介绍了三种方法来创建频率表：collections 模块的 Counter 类，pandas 库的 Series 对象，以及 numpy 库的 numpy.unique 函数。这些方法各有优劣，可根据实际需求选择使用。