Word Break Problem using Backtracking

Last modified: 2021-05-24 21:07:56             Author: Mango

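Given a valid sentence with no spaces between its words and a dictionary of valid English words, find all possible ways to break the sentence into individual dictionary words.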

Example

Consider the following dictionary 
{ i, like, sam, sung, samsung, mobile, ice, 
  and, cream, icecream, man, go, mango}

Input: "ilikesamsungmobile"
Output: i like sam sung mobile
        i like samsung mobile

Input: "ilikeicecreamandmango"
Output: i like ice cream and man go
        i like ice cream and mango
        i like icecream and man go
        i like icecream and mango

We discussed a dynamic programming solution in the following post:
Dynamic Programming | Set 32 (Word Break Problem)

The dynamic programming solution only determines whether a word break is possible. Here we need to print all possible word breaks.
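For contrast, the yes/no check mentioned above can be sketched as follows. This is a minimal sketch, not the code from the referenced post; the names can_break and reachable are hypothetical and chosen only for illustration.

```python
def can_break(sentence, dictionary):
    """Return True if `sentence` can be split into words from `dictionary`."""
    n = len(sentence)
    # reachable[i] is True when sentence[:i] can be split into dictionary words.
    reachable = [False] * (n + 1)
    reachable[0] = True  # the empty prefix needs no words
    for end in range(1, n + 1):
        for start in range(end):
            # sentence[:end] is breakable if some breakable prefix
            # is followed by a single dictionary word ending at `end`.
            if reachable[start] and sentence[start:end] in dictionary:
                reachable[end] = True
                break
    return reachable[n]


words = {"i", "like", "sam", "sung", "samsung", "mobile"}
print(can_break("ilikesamsungmobile", words))
print(can_break("ilikesamsungx", words))
```

The table records only whether each prefix is breakable, not which words were used, which is why a different approach is needed to list every segmentation.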
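We scan the sentence from the left. Whenever we find a valid word, we must check whether the rest of the sentence can also be broken into valid words, because the first word found from the left may leave a remainder that cannot be segmented any further. In that case we go back, discard the word just found, and try the next longer prefix. The process is recursive, since deciding whether the remainder can be segmented requires the same logic. We therefore solve the problem with recursion and backtracking. To keep track of the words found so far we use a stack: whenever the remainder of the string cannot be broken into valid words, we pop the top word off the stack and continue searching. In the implementations below, the result string passed down the recursion plays the role of this stack.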
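The push and pop steps described above can be made explicit with a list used as the stack. This is a minimal sketch under that assumption; word_break_all and backtrack are hypothetical names, and unlike the listings in this article the function returns the sentences instead of printing them.

```python
def word_break_all(sentence, dictionary):
    """Return every way to split `sentence` into words from `dictionary`."""
    results = []
    stack = []  # words chosen so far, in left-to-right order

    def backtrack(start):
        # Whole sentence consumed: the stack holds one complete segmentation.
        if start == len(sentence):
            results.append(" ".join(stack))
            return
        # Try every prefix of the part that has not been processed yet.
        for end in range(start + 1, len(sentence) + 1):
            word = sentence[start:end]
            if word in dictionary:
                stack.append(word)  # choose this word
                backtrack(end)      # try to segment the remainder
                stack.pop()         # undo the choice and keep searching

    backtrack(0)
    return results


words = {"i", "like", "sam", "sung", "samsung", "mobile"}
for line in word_break_all("ilikesamsungmobile", words):
    print(line)
```

The pop runs whether or not the recursive call found a segmentation, so the same stack is reused for every alternative word at a given position.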

Below are implementations of the above idea:

CPP
// A recursive program to print all possible
// partitions of a given string into dictionary
// words
#include <iostream>
#include <string>
using namespace std;
 
/* A utility function to check whether a word
  is present in the dictionary. An array of
  strings is used for the dictionary, which is
  not a good idea in general; it is used here
  only to keep the program simple. */
bool dictionaryContains(const string &word)
{
    string dictionary[] = {"mobile","samsung","sam","sung",
                            "man","mango", "icecream","and",
                            "go","i","love","ice","cream"};
    int n = sizeof(dictionary)/sizeof(dictionary[0]);
    for (int i = 0; i < n; i++)
        if (dictionary[i].compare(word) == 0)
            return true;
    return false;
}
 
// Prototype of wordBreakUtil
void wordBreakUtil(string str, int size, string result);
 
// Prints all possible word breaks of given string
void wordBreak(string str)
{
    // Last argument is prefix
    wordBreakUtil(str, str.size(), "");
}
 
// result stores the segmentation built so far,
// with spaces between words
void wordBreakUtil(string str, int n, string result)
{
    //Process all prefixes one by one
    for (int i=1; i<=n; i++)
    {
        // Extract substring from 0 to i in prefix
        string prefix = str.substr(0, i);
 
        // If dictionary contains this prefix, then
        // we check for remaining string. Otherwise
        // we ignore this prefix (there is no else for
        // this if) and try next
        if (dictionaryContains(prefix))
        {
            // If no more elements are there, print it
            if (i == n)
            {
                // Add this element to previous prefix
                result += prefix;
                cout << result << endl;
                return;
            }
            wordBreakUtil(str.substr(i, n-i), n-i,
                                result + prefix + " ");
        }
    }     
}
 
//Driver Code
int main()
{
   
    // Function call
    cout << "First Test:\n";
    wordBreak("iloveicecreamandmango");
 
    cout << "\nSecond Test:\n";
    wordBreak("ilovesamsungmobile");
    return 0;
}


Python3
# A recursive program to print all possible
# partitions of a given string into dictionary
# words
 
# A utility function to check whether a word
# is present in the dictionary. A set of strings
# is used for the dictionary here, so each lookup
# takes constant time on average.
def dictionaryContains(word):
    dictionary = {"mobile", "samsung", "sam", "sung", "man",
                  "mango", "icecream", "and", "go", "i", "love", "ice", "cream"}
    return word in dictionary
 
# Prints all possible word breaks of given string
def wordBreak(string):
   
    # Last argument is prefix
    wordBreakUtil(string, len(string), "")
 
# result stores the segmentation built so far,
# with spaces between words
def wordBreakUtil(string, n, result):
 
    # Process all prefixes one by one
    for i in range(1, n + 1):
       
        # Extract substring from 0 to i in prefix
        prefix = string[:i]
         
        # If dictionary contains this prefix, then
        # we check for remaining string. Otherwise
        # we ignore this prefix (there is no else for
        # this if) and try next
        if dictionaryContains(prefix):
           
            # If no more elements are there, print it
            if i == n:
 
                # Add this element to previous prefix
                result += prefix
                print(result)
                return
            wordBreakUtil(string[i:], n - i, result+prefix+" ")
 
# Driver Code
if __name__ == "__main__":
    print("First Test:")
    wordBreak("iloveicecreamandmango")
 
    print("\nSecond Test:")
    wordBreak("ilovesamsungmobile")
 
# This code is contributed by harshitkap00r


Output
First Test:
i love ice cream and man go
i love ice cream and mango
i love icecream and man go
i love icecream and mango

Second Test:
i love sam sung mobile
i love samsung mobile

Complexity:

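  • Time complexity: O(n^n) as a loose upper bound. More tightly, a string of length n has n-1 gaps between characters and each gap either is a break or is not, so there are at most 2^(n-1) segmentations to print in the worst case.
  • Auxiliary space: O(n^2), because the recursion of wordBreakUtil(…) can be n levels deep in the worst case and each level holds string copies of length up to n.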

where n is the length of the input string.
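To see how quickly the worst case grows, the sketch below counts segmentations with the same recursion shape as wordBreakUtil instead of printing them. It assumes a worst-case input made of n copies of the letter "a" with every run of "a" in the dictionary; count_breaks is a hypothetical helper, not part of the listings above.

```python
def count_breaks(sentence, dictionary):
    """Count the ways to split `sentence` into words from `dictionary`."""
    if not sentence:
        return 1  # one way to split nothing: choose no words
    total = 0
    # Same loop as wordBreakUtil: try every prefix, recurse on the rest.
    for i in range(1, len(sentence) + 1):
        if sentence[:i] in dictionary:
            total += count_breaks(sentence[i:], dictionary)
    return total


for n in range(1, 9):
    text = "a" * n
    dictionary = {"a" * k for k in range(1, n + 1)}
    # Every gap between characters may or may not be a break,
    # so the count printed here is 2 ** (n - 1).
    print(n, count_breaks(text, dictionary))
```

Each additional character doubles the count for this input, which is why the backtracking approach is only practical for short sentences or dictionaries that prune most prefixes.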