Word Repetition in Python (1)

📅 Last modified: 2023-12-03 14:46:39.883000 🧑 Author: Mango

Word Repetition in Python

Programs often need to check whether a text contains repeated words. Python offers several ways to do this; three common ones follow:

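Method 1: Using set

Split the text into words. Converting the word list to a set removes duplicates and yields the unique words; to find the words that repeat, keep those that occur more than once in the list. The code: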

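text = "this is a test text with some repeated words test"
words = text.split()  # split on whitespace
# A set keeps one copy of each word
unique_words = set(words)
# Keep the words that occur more than once in the list
repeated_words = set(word for word in words if words.count(word) > 1)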

print(f"Unique words: {unique_words}")
print(f"Repeated words: {repeated_words}")

Output (set ordering is arbitrary and can differ between runs):

Unique words: {'text', 'this', 'repeated', 'is', 'some', 'words', 'with', 'a', 'test'}
Repeated words: {'test'}
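
Calling words.count(word) for every word rescans the whole list each time, so the work grows quadratically with the number of words. A minimal sketch of a single-pass alternative follows; the helper sets seen and repeated are names chosen here for illustration, not part of the original example.

```python
text = "this is a test text with some repeated words test"

seen = set()       # words encountered so far
repeated = set()   # words encountered more than once

for word in text.split():
    if word in seen:
        repeated.add(word)
    else:
        seen.add(word)

print(f"Repeated words: {repeated}")
```

Each word is looked up in a set once, so the loop stays roughly linear in the length of the text.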

Method 2: Using collections.Counter

collections.Counter counts how many times each word occurs; the words whose count is greater than 1 are the repeated ones. The code:

import collections

text = "this is a test text with some repeated words test"
words = text.split()
word_counts = collections.Counter(words)
repeated_words = [word for word, count in word_counts.items() if count > 1]

print(f"Repeated words: {repeated_words}")

Output:

Repeated words: ['test']
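
Because a Counter keeps the counts, it can also report how often each repeated word occurs. A small sketch using Counter.most_common(), which returns (word, count) pairs from most to least frequent; the sample sentence is made up for illustration.

```python
from collections import Counter

text = "the cat and the dog and the bird"
word_counts = Counter(text.split())

# most_common() sorts by count, highest first
repeated = [(word, count) for word, count in word_counts.most_common() if count > 1]

print(repeated)  # [('the', 3), ('and', 2)]
```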

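Method 3: Using regular expressions

A regular expression with a backreference can also find repeated words. The pattern below captures a word and uses a lookahead to require that the same whole word appears again later in the text. (Placing \s+\1 directly after the captured word would only detect adjacent repeats such as "test test".) A word that occurs three or more times is reported more than once, so wrap the result in a set if that matters. The code:

import re

text = "this is a test text with some repeated words test"
# (\w+) captures a word; (?=.*\b\1\b) checks that it occurs again later
repeated_words = re.findall(r'\b(\w+)\b(?=.*\b\1\b)', text)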

print(f"Repeated words: {repeated_words}")

Output:

Repeated words: ['test']
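
A backreference placed directly after the captured word matches only adjacent repeats, which is handy for catching typos such as "the the". The sketch below uses a made-up sample sentence, and re.IGNORECASE is added as an assumption so that "The the" also counts.

```python
import re

text = "The the quick brown fox jumps over over the lazy dog"

# (\w+) captures a word; \s+\1 requires the same word right after it
adjacent_repeats = re.findall(r'\b(\w+)\s+\1\b', text, flags=re.IGNORECASE)

print(adjacent_repeats)  # ['The', 'over']
```

The later, non-adjacent "the" is not reported, because the pattern only looks at the word immediately following the captured one.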

These are a few common approaches; other implementations are possible.

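Note: case and punctuation affect the result. split() only separates on whitespace, so "Test", "test" and "test." are treated as three different words. To ignore case, convert the text to lowercase first. If punctuation has to be handled, whether stripped or kept as part of a token, a more elaborate regular expression can be used.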
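
The normalization described in the note can be sketched as follows. The sample text and the token pattern [a-z']+ are assumptions chosen for illustration: after lowercasing, a word is treated as a run of ASCII letters and apostrophes, which will not suit every language or text.

```python
import re
from collections import Counter

text = "Test this text. This is a test, isn't it? TEST!"

# Lowercase first, then pull out runs of letters/apostrophes,
# which drops the surrounding punctuation.
words = re.findall(r"[a-z']+", text.lower())
word_counts = Counter(words)
repeated_words = sorted(word for word, count in word_counts.items() if count > 1)

print(f"Repeated words: {repeated_words}")  # Repeated words: ['test', 'this']
```

Without the lowercasing and tokenizing steps, "Test", "test," and "TEST!" would each count as a separate word and nothing would be reported as repeated.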