Remove duplicate words from a sentence using regular expressions

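Given a string str representing a sentence, the task is to remove duplicate (consecutively repeated) words from the sentence using a regular expression in Java.
Examples: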

Input: str = "Good bye bye world world"
Output: Good bye world
Explanation:
We remove the second occurrence of bye and world from Good bye bye world world.

Input: str = "Ram went went to to to his home"
Output: Ram went to his home
Explanation:
We remove the second occurrence of went and the second and third occurrences of to from Ram went went to to to his home.

Input: str = "Hello hello world world"
Output: Hello world
Explanation:
We remove the second occurrence of hello and world from Hello hello world world.


Approach

Get the sentence.
Form a regular expression that matches repeated words in the sentence.

regex = "\\b(\\w+)(?:\\W+\\1\\b)+";

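The regular expression above breaks down as follows:
- "\\b": a word boundary. The boundaries prevent matches inside longer words. For example, in "My thesis is great", the "is" at the end of "thesis" is not paired with the "is" that follows it.
- "\\w+": one or more word characters, [a-zA-Z_0-9].
- "\\W+": one or more non-word characters, [^\w].
- "\\1": a backreference that matches whatever the first parenthesized group matched, in this case (\w+).
- "+": matches the preceding element one or more times. After the group (?:\W+\1\b) it lets a single match cover any number of further repeats.
Match the sentence against the regular expression. In Java, this can be done with Pattern.matcher().
Return the modified sentence.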
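To see why both word boundaries matter, here is a minimal Python sketch that compares the regex above with two weakened variants. The variants and the `collapse` helper are hypothetical and exist only for this comparison; they are not part of the original approach.

```python
import re

# The regex from the approach above, written as a Python raw string.
FULL = re.compile(r'\b(\w+)(?:\W+\1\b)+')
# Hypothetical variants for comparison only: each drops one word boundary.
NO_LEADING_B = re.compile(r'(\w+)(?:\W+\1\b)+')
NO_TRAILING_B = re.compile(r'\b(\w+)(?:\W+\1)+')


def collapse(pattern, sentence):
    """Replace every match with its first captured word."""
    return pattern.sub(r'\1', sentence)


# Without the leading \b, the "is" inside "thesis" pairs with the next "is".
print(collapse(NO_LEADING_B, "My thesis is great"))  # My thesis great
print(collapse(FULL, "My thesis is great"))          # My thesis is great

# Without the trailing \b, "to" pairs with the start of "today".
print(collapse(NO_TRAILING_B, "go to today"))        # go today
print(collapse(FULL, "go to today"))                 # go to today
```

With both boundaries in place, neither sentence changes, because neither contains a whole word repeated next to itself.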

Below is the implementation of the above approach:

C++

// C++ program to remove duplicate words
// using a regular expression (regex).
#include <iostream>
#include <regex>
#include <string>
using namespace std;
 
// Function to collapse each run of
// repeated words into a single word
string removeDuplicateWords(string s)
{
 
  // Regex matching a word followed by one or more repeats of itself.
  const regex pattern("\\b(\\w+)(?:\\W+\\1\\b)+", regex_constants::icase);
 
  // Replace every match with its first captured word ($1).
  // regex_replace substitutes at the matched positions, so identical
  // text that sits inside longer words elsewhere is left untouched.
  return regex_replace(s, pattern, "$1");
}
 
// Driver Code
int main()
{
  // Test Case: 1
  string str1
      = "Good bye bye world world";
  cout << removeDuplicateWords(str1) << endl;
 
  // Test Case: 2
  string str2
      = "Ram went went to to his home";
  cout << removeDuplicateWords(str2) << endl;
 
  // Test Case: 3
  string str3
      = "Hello hello world world";
  cout << removeDuplicateWords(str3) << endl;
 
  return 0;
}
 
// This code is contributed by yuvraj_chandra

Java

// Java program to remove duplicate words
// using Regular Expression or ReGex.
 
import java.util.regex.Matcher;
import java.util.regex.Pattern;
 
class GFG {
 
    // Function to collapse each run of
    // repeated words into a single word
    public static String
    removeDuplicateWords(String input)
    {
 
        // Regex matching a word followed by one or more repeats of itself.
        String regex
            = "\\b(\\w+)(?:\\W+\\1\\b)+";
        Pattern p
            = Pattern.compile(
                regex,
                Pattern.CASE_INSENSITIVE);
 
        // Pattern.matcher() creates a Matcher that
        // applies the compiled pattern to the input.
        Matcher m = p.matcher(input);
 
        // Replace every match with its
        // first captured word ($1).
        return m.replaceAll("$1");
    }
 
    // Driver code
    public static void main(String args[])
    {
 
        // Test Case: 1
        String str1
            = "Good bye bye world world";
        System.out.println(
            removeDuplicateWords(str1));
 
        // Test Case: 2
        String str2
            = "Ram went went to to his home";
        System.out.println(
            removeDuplicateWords(str2));
 
        // Test Case: 3
        String str3
            = "Hello hello world world";
        System.out.println(
            removeDuplicateWords(str3));
    }
}

Python3

# Python program to remove duplicate words
# using Regular Expression or ReGex.
import re
 
 
# Function to collapse each run of
# repeated words into a single word
def removeDuplicateWords(sentence):
 
    # Regex matching a word followed by one or more repeats of itself
    regex = r'\b(\w+)(?:\W+\1\b)+'
 
    # Replace every match with its first captured word
    return re.sub(regex, r'\1', sentence, flags=re.IGNORECASE)
 
 
# Driver Code
 
# Test Case: 1
str1 = "Good bye bye world world"
print(removeDuplicateWords(str1))
 
# Test Case: 2
str2 = "Ram went went to to his home"
print(removeDuplicateWords(str2))
 
# Test Case: 3
str3 = "Hello hello world world"
print(removeDuplicateWords(str3))
 
# This code is contributed by yuvraj_chandra

Output:

Good bye world
Ram went to his home
Hello world
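The three test cases do not show some behaviours of this pattern. The Python sketch below probes non-adjacent repeats, mixed casing, and punctuation between repeats. The helper name `remove_duplicate_words` and the sample sentences are illustrative assumptions; the pattern itself is the one used in the implementations above.

```python
import re

# Same pattern as in the implementations above, compiled case-insensitively.
PATTERN = re.compile(r'\b(\w+)(?:\W+\1\b)+', flags=re.IGNORECASE)


def remove_duplicate_words(sentence):
    # Keep only the first word of each run of repeats.
    return PATTERN.sub(r'\1', sentence)


# Only consecutive repeats are collapsed; the later "bye" is kept.
print(remove_duplicate_words("Good bye world bye"))       # Good bye world bye

# Case-insensitive matching keeps the casing of the first occurrence.
print(remove_duplicate_words("hello Hello HELLO world"))  # hello world

# Separators between the repeats are part of the match and are dropped.
print(remove_duplicate_words("Good bye, bye world"))      # Good bye world
```

If repeats that are not adjacent also need to be removed, a single regex substitution of this form is not enough, and a different technique (for example, tracking the words already seen) would be required.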