Regular Expressions for Repeated Words in Python

Last modified: 2023-12-03 14:56:22.840000, Author: Mango

Regular Expressions for Repeated Words

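When writing code or processing text data, you may need to find or replace repeated words with a regular expression. This article shows how to do that with Python's re module.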

Matching Consecutive Repeated Words

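To match a word that is immediately repeated, use a backreference. The pattern \b(\w+)\b\s+\b\1\b does this: \b matches a word boundary, \w+ matches one or more word characters, \s+ matches one or more whitespace characters (spaces, tabs, or newlines), and \1 matches exactly the text captured by the first group (the preceding \w+). The pattern matches only two identical words that appear back to back. Because re.findall() returns non-overlapping matches, and returns the group's text when the pattern has a single group, a run of three identical words produces just one match covering its first two words.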

import re

text = "hello hello world world world"
pattern = r'\b(\w+)\b\s+\b\1\b'  # a word, whitespace, then the same word again
result = re.findall(pattern, text)  # group 1 of each non-overlapping match
print(result)  # ['hello', 'world']
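A common follow-up is collapsing each run of repeated words into a single copy. The sketch below assumes that is the goal; collapse_repeats is a hypothetical helper name, and re.IGNORECASE is an added assumption so that "The the" also counts as a repeat. Putting the repeat inside a quantified group, (?:\s+\1\b)+, lets one match consume a whole run such as "world world world", which the two-word pattern above cannot do.

```python
import re

def collapse_repeats(text: str) -> str:
    """Hypothetical helper: collapse runs of an immediately repeated word."""
    # (\w+) captures a word; (?:\s+\1\b)+ consumes one or more immediate
    # repeats of that word; the whole run is replaced by the captured word (\1).
    # re.IGNORECASE (an assumption) lets \1 match regardless of case.
    pattern = r'\b(\w+)(?:\s+\1\b)+'
    return re.sub(pattern, r'\1', text, flags=re.IGNORECASE)

print(collapse_repeats("hello hello world world world"))  # hello world
print(collapse_repeats("The the theory"))  # The theory
```

The trailing \b after \1 matters: it stops "the" from matching the start of "theory", so "the theory" is left alone.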

Matching Words That Repeat Anywhere

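To match words that repeat anywhere later in the text, not only immediately, put the backreference inside a lookahead: \b(\w+)\b(?=.*\b\1\b). The lookahead (?=.*\b\1\b) succeeds only if the same whole word occurs again somewhere after the current position, and it consumes no characters. As a result, the pattern matches every occurrence of a word except its last one, so a word that appears three times is matched twice. Note that . does not match a newline by default, so only repeats on the same line are found unless re.DOTALL is passed.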

import re

text = "hello hello world world world this is a test test test"
pattern = r'\b(\w+)\b(?=.*\b\1\b)'
result = re.findall(pattern, text)
print(result)  # ['hello', 'world', 'world', 'test', 'test']
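Because findall() returns one entry per match, the list above contains duplicates. If only the distinct repeated words are wanted, one option (not part of the original example) is to de-duplicate with dict.fromkeys(), which keeps first-seen order. The sketch also demonstrates the newline caveat: without re.DOTALL, a repeat on a later line is not found.

```python
import re

text = "hello hello world world world this is a test test test"
pattern = r'\b(\w+)\b(?=.*\b\1\b)'

# One entry per matched occurrence (every occurrence except each word's last).
occurrences = re.findall(pattern, text)
print(occurrences)  # ['hello', 'world', 'world', 'test', 'test']

# dict.fromkeys drops duplicates while keeping first-seen order.
unique_repeats = list(dict.fromkeys(occurrences))
print(unique_repeats)  # ['hello', 'world', 'test']

# '.' does not match '\n' by default, so the lookahead cannot reach line 2.
multiline = "hello\nhello"
single_line = re.findall(pattern, multiline)
dotall = re.findall(pattern, multiline, re.DOTALL)
print(single_line)  # []
print(dotall)       # ['hello']
```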

Replacing Repeated Words

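To replace repeated words, use re.sub(). The example below replaces each occurrence matched by the lookahead pattern with <repeat>. The last occurrence of each word has no later duplicate, so it is not matched and stays unchanged.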

import re

text = "hello hello world world world this is a test test test"
pattern = r'\b(\w+)\b(?=.*\b\1\b)'
result = re.sub(pattern, r'<repeat>', text)
print(result)  # <repeat> hello <repeat> <repeat> world this is a <repeat> <repeat> test
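The lookahead-based replacement keeps the last occurrence of each word. If you would rather keep the first occurrence and mark the later ones, a single pattern does not express "this word appeared earlier" easily, because lookbehind in Python's re must be fixed-width. One workaround, sketched here as an alternative technique rather than the article's method, passes a function as the replacement argument of re.sub() and tracks the words already seen. mark_later_repeats and its marker parameter are hypothetical names, and the comparison is case-sensitive.

```python
import re

def mark_later_repeats(text: str, marker: str = "<repeat>") -> str:
    """Hypothetical helper: keep each word's first occurrence, mark the rest."""
    seen = set()

    def repl(match: re.Match) -> str:
        word = match.group(0)
        if word in seen:
            return marker  # word already appeared earlier
        seen.add(word)
        return word        # first occurrence is kept as-is

    # re.sub calls repl once per match, from left to right.
    return re.sub(r'\b\w+\b', repl, text)

text = "hello hello world world world this is a test test test"
print(mark_later_repeats(text))
# hello <repeat> world <repeat> <repeat> this is a test <repeat> <repeat>
```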

Summary

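This article showed how to handle repeated words with Python regular expressions. A backreference matches a word that is immediately repeated, and a backreference inside a lookahead matches every occurrence of a word that appears again later in the text. To replace the matched words, use re.sub().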