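Remove duplicate words from a sentence using a regular expression
Given a string str that represents a sentence, the task is to remove the duplicate words from the sentence using a regular expression in Java.
Examples: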
Input: str = “Good bye bye world world”
Output: Good bye world
Explanation:
We remove the second occurrence of bye and world from Good bye bye world world
Input: str = “Ram went went to to to his home”
Output: Ram went to his home
Explanation:
We remove the second occurrence of went and the second and third occurrences of to from Ram went went to to to his home.
Input: str = “Hello hello world world”
Output: Hello world
Explanation:
We remove the second occurrence of hello (the match is case-insensitive, so hello counts as a repeat of Hello) and the second occurrence of world from Hello hello world world.
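Approach
- Get the sentence.
- Form a regular expression that matches repeated words in the sentence.
regex = "\\b(\\w+)(?:\\W+\\1\\b)+";
- The parts of the above regular expression are:
  - "\\b": A word boundary. Boundaries are needed for special cases. For example, in "My thesis is great", the "is" at the end of "thesis" followed by the word "is" must not be treated as a repeated word.
  - "\\w+": One or more word characters: [a-zA-Z_0-9]
  - "\\W+": One or more non-word characters: [^\w]
  - "\\1": Matches whatever was matched by the first capturing group, which in this case is (\w+)
  - "(?:...)": A non-capturing group, used here only to group "\\W+\\1\\b" so that it can be repeated
  - "+": Matches the preceding element one or more times
- Match the sentence against the regex. In Java, this can be done using Pattern.matcher().
- Return the modified sentence.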
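The role of the word boundaries can be checked with a small Python sketch. It compares the pattern from the text with a variant that drops both "\b" anchors; that variant and the names WITH_BOUNDARIES and WITHOUT_BOUNDARIES are assumptions made only for this illustration.

```python
import re

# Pattern from the text (case-insensitive, as in the implementations).
WITH_BOUNDARIES = re.compile(r'\b(\w+)(?:\W+\1\b)+', re.IGNORECASE)

# Hypothetical variant without the word boundaries, for comparison only.
WITHOUT_BOUNDARIES = re.compile(r'(\w+)(?:\W+\1)+', re.IGNORECASE)

sentence = "My thesis is great"

# With boundaries, "is" inside "thesis" cannot start a match,
# so the sentence is left unchanged.
with_b = WITH_BOUNDARIES.sub(r'\1', sentence)

# Without boundaries, the trailing "is" of "thesis" plus the word "is"
# is treated as a repeat and collapsed.
without_b = WITHOUT_BOUNDARIES.sub(r'\1', sentence)

print(with_b)     # My thesis is great
print(without_b)  # My thesis great
```

The second result shows the kind of damage the boundaries prevent: a real word is deleted because it happens to equal the ending of the previous word.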
Below is the implementation of the above approach:
C++
// C++ program to remove duplicate words
// using a regular expression (regex).
#include <iostream>
#include <regex>
#include <string>
using namespace std;

// Function to collapse each run of consecutive
// repeated words into a single occurrence
string removeDuplicateWords(const string& s)
{
    // Regex matching a word followed by one or more
    // repeats of itself (case-insensitive).
    const regex pattern("\\b(\\w+)(?:\\W+\\1\\b)+", regex_constants::icase);

    // Replace every match with capture group 1,
    // i.e. the first occurrence of the repeated word.
    return regex_replace(s, pattern, "$1");
}

// Driver Code
int main()
{
    // Test Case: 1
    string str1 = "Good bye bye world world";
    cout << removeDuplicateWords(str1) << endl;

    // Test Case: 2
    string str2 = "Ram went went to to his home";
    cout << removeDuplicateWords(str2) << endl;

    // Test Case: 3
    string str3 = "Hello hello world world";
    cout << removeDuplicateWords(str3) << endl;

    return 0;
}
// This code is contributed by yuvraj_chandra
Java
// Java program to remove duplicate words
// using a regular expression (regex).
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GFG {

    // Function to collapse each run of consecutive
    // repeated words into a single occurrence
    public static String removeDuplicateWords(String input)
    {
        // Regex matching a word followed by one or more
        // repeats of itself.
        String regex = "\\b(\\w+)(?:\\W+\\1\\b)+";
        Pattern p = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);

        // Pattern.matcher() creates a Matcher that applies
        // the compiled pattern to the given sentence.
        Matcher m = p.matcher(input);

        // Replace every match with capture group 1,
        // i.e. the first occurrence of the repeated word.
        // This avoids reusing the matched text as a new regex.
        return m.replaceAll("$1");
    }

    // Driver code
    public static void main(String args[])
    {
        // Test Case: 1
        String str1 = "Good bye bye world world";
        System.out.println(removeDuplicateWords(str1));

        // Test Case: 2
        String str2 = "Ram went went to to his home";
        System.out.println(removeDuplicateWords(str2));

        // Test Case: 3
        String str3 = "Hello hello world world";
        System.out.println(removeDuplicateWords(str3));
    }
}
Python3
# Python program to remove duplicate words
# using a regular expression (regex).
import re

# Function to collapse each run of consecutive
# repeated words into a single occurrence
def removeDuplicateWords(sentence):
    # Regex matching a word followed by one or more repeats of itself
    regex = r'\b(\w+)(?:\W+\1\b)+'
    # Replace every match with group 1, the first occurrence of the word
    return re.sub(regex, r'\1', sentence, flags=re.IGNORECASE)

# Driver Code
# Test Case: 1
str1 = "Good bye bye world world"
print(removeDuplicateWords(str1))
# Test Case: 2
str2 = "Ram went went to to his home"
print(removeDuplicateWords(str2))
# Test Case: 3
str3 = "Hello hello world world"
print(removeDuplicateWords(str3))
# This code is contributed by yuvraj_chandra
Output:
Good bye world
Ram went to his home
Hello world
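To see what the pattern actually matches before anything is replaced, the following Python sketch lists each matched run together with the word that is kept. It is a minimal illustration, not part of the original programs; the names DUPLICATE_RUN and find_duplicate_runs are hypothetical. It also checks two properties that follow from the pattern itself: only adjacent repeats are matched, and punctuation between the repeats is consumed by "\W+".

```python
import re

# Same pattern as in the programs above, compiled once (case-insensitive).
DUPLICATE_RUN = re.compile(r'\b(\w+)(?:\W+\1\b)+', re.IGNORECASE)

def find_duplicate_runs(sentence):
    """Return (matched_run, kept_word) pairs for each run of repeated words."""
    return [(m.group(0), m.group(1)) for m in DUPLICATE_RUN.finditer(sentence)]

runs = find_duplicate_runs("Ram went went to to to his home")
for run, kept in runs:
    print(repr(run), "->", repr(kept))
# 'went went' -> 'went'
# 'to to to' -> 'to'

# Repeats that are not adjacent are left alone.
non_adjacent = DUPLICATE_RUN.sub(r'\1', "Good bye world bye")
print(non_adjacent)  # Good bye world bye

# Punctuation between the repeats is part of the match and is removed with it.
with_comma = DUPLICATE_RUN.sub(r'\1', "bye, bye now")
print(with_comma)  # bye now
```

If duplicates anywhere in the sentence should be removed, not only consecutive ones, this pattern is not sufficient and a different technique (for example, tracking the words already seen) would be needed.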