📅  Last modified: 2023-12-03 15:42:21.620000             🧑  Author: Mango
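This problem asks for a program that counts word frequencies. Given a passage of English text, the program should print the 10 most frequent words together with the number of times each occurs. It must ignore case and exclude common English words (such as "a", "an", "the", and "and").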
import re
from collections import Counter
# Common English words (stop words) to exclude
stopwords = ['a', 'an', 'the', 'and', 'or', 'in', 'on', 'at', 'to', 'of', 'for', 'with', 'by']
# Sample article
article = """
Python is an interpreted high-level general-purpose programming language.
Python's design philosophy emphasizes code readability with its notable use of significant indentation.
Its language constructs and object-oriented approach aim to help programmers write clear, logical code for small and large-scale projects.
"""
# Convert all letters to lowercase, then extract the English words
words = re.findall(r'\b[a-zA-Z]+\b', article.lower())
# Filter out the common words, count occurrences, and take the top 10
word_count = Counter(w for w in words if w not in stopwords)
top10 = word_count.most_common(10)
# Print the results
for word, count in top10:
    print(f"{word}: {count}")
Output:
python: 2
language: 2
code: 2
its: 2
is: 1
interpreted: 1
high: 1
level: 1
general: 1
purpose: 1
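
The script above splits "Python's" at the apostrophe, so a stray "s" is counted as a word, and its short stop-word list lets words such as "is" and "its" through. The following is a minimal sketch of one way to tighten this, not the original solution: the helper name top_words, the constant names, and the extra stop words are assumptions chosen for illustration.

```python
import re
from collections import Counter

# Assumed, illustrative stop-word set; extend it to suit the task.
STOPWORDS = frozenset({
    "a", "an", "the", "and", "or", "in", "on", "at", "to", "of",
    "for", "with", "by", "is", "its", "it",
})

# A word is a run of letters, optionally followed by apostrophe parts ("python's").
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)*")


def top_words(text, n=10, stopwords=STOPWORDS):
    """Return the n most frequent non-stop-words in text, ignoring case."""
    counts = Counter()
    for token in WORD_RE.findall(text.lower()):
        # Drop a trailing possessive so "python's" is counted as "python".
        if token.endswith("'s"):
            token = token[:-2]
        if token and token not in stopwords:
            counts[token] += 1
    return counts.most_common(n)


if __name__ == "__main__":
    sample = "Python's syntax is clear. Python is readable, and the code is short."
    for word, count in top_words(sample, n=3):
        print(f"{word}: {count}")
```

Using a set for the stop words keeps each membership test constant-time, which matters more as the list grows. Contractions such as "don't" are kept as single tokens here; whether that is desirable depends on how the exercise defines a word.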