Pattern Searching using a Trie of all Suffixes

Last modified: 2021-05-06 08:18:07 | Author: Mango

Problem statement: Given a text txt[0..n-1] and a pattern pat[0..m-1], write a function search(char pat[], char txt[]) that prints all occurrences of pat[] in txt[]. You may assume that n > m.

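As discussed in the previous post, there are two ways to solve this problem efficiently.

1) Preprocess the pattern: KMP algorithm, Rabin-Karp algorithm, finite automata, Boyer-Moore algorithm.

2) Preprocess the text: suffix tree.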

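The best search time obtained by the first approach (preprocessing the pattern) is O(n), and by the second approach (preprocessing the text) is O(m), where m and n are the lengths of the pattern and the text respectively.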

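Note that the second approach searches in only O(m) time, so it is preferred when the text does not change often and there are many search queries. We have already discussed the suffix tree (a compressed trie of all suffixes of the text).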

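For problems that must be coded in a technical interview or a similar programming setting, implementing a suffix tree can be time-consuming. This post discusses a simple implementation of a standard trie of all suffixes. The implementation is close to a suffix tree; the only difference is that it is a simple trie rather than a compressed trie.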

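As discussed in the suffix tree post, the idea is that every pattern present in the text (in other words, every substring of the text) must be a prefix of one of the text's suffixes. So if we build a trie of all suffixes, we can find the pattern in O(m) time, where m is the pattern length.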
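The observation above can be checked directly, without any trie, by testing the pattern against the start of every suffix. The following is a minimal sketch of that brute-force form; the function name `occurrencesViaSuffixes` is hypothetical and not part of the original post. It costs O(n * m) per query, which is exactly the work the trie avoids.

```cpp
#include <string>
#include <vector>

// Returns every start index i such that pat is a prefix of the suffix txt[i..].
// Brute-force illustration only: O(n * m) per query.
std::vector<int> occurrencesViaSuffixes(const std::string& txt, const std::string& pat)
{
    std::vector<int> starts;
    for (std::size_t i = 0; i + pat.size() <= txt.size(); ++i) {
        // compare(pos, len, str) == 0 means txt[i .. i+len) equals pat,
        // i.e. pat is a prefix of the suffix that starts at i.
        if (txt.compare(i, pat.size(), pat) == 0)
            starts.push_back(static_cast<int>(i));
    }
    return starts;
}
```

For the text "banana" and the pattern "ana", this returns the start indexes 1 and 3, because "ana" is a prefix of the suffixes "anana" and "ana".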

Building a Trie of Suffixes
1) Generate all suffixes of the given text.
2) Consider all suffixes as individual words and build a trie.

Let us consider the example text "banana\0", where '\0' is the string termination character. The following are all suffixes of "banana\0":

banana\0
anana\0
nana\0
ana\0
na\0
a\0
\0
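Generating this list is a one-line loop over start positions. The sketch below assumes a `std::string` input; the helper name `allSuffixes` is hypothetical. Since `std::string` does not expose its terminator as a character, the final empty string plays the role of the lone "\0" suffix in the listing.

```cpp
#include <string>
#include <vector>

// Returns the suffixes of txt from longest to shortest.
// The trailing empty string stands in for the lone "\0" suffix.
std::vector<std::string> allSuffixes(const std::string& txt)
{
    std::vector<std::string> suffixes;
    for (std::size_t i = 0; i <= txt.size(); ++i)
        suffixes.push_back(txt.substr(i));   // substr(size()) yields ""
    return suffixes;
}
```

Calling `allSuffixes("banana")` yields seven strings, in the same order as the listing.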

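If we consider all of the above suffixes as individual words and build a trie, we get a trie in which every path from the root to a leaf spells one of these suffixes.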
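To get a feel for the size of such a trie, here is a small sketch that builds it for a given text and counts its nodes. It is an illustration under stated assumptions, not the implementation used later in this post: the names `Node`, `buildSuffixTrie` and `countNodes` are hypothetical, children are kept in a `std::map` rather than a fixed array, the terminator character is left out, and `std::make_unique` requires C++14.

```cpp
#include <map>
#include <memory>
#include <string>

// Hypothetical minimal node: one child per distinct next character.
struct Node {
    std::map<char, std::unique_ptr<Node>> next;
};

// Inserts every suffix of txt into the trie rooted at root.
void buildSuffixTrie(Node& root, const std::string& txt)
{
    for (std::size_t i = 0; i < txt.size(); ++i) {
        Node* cur = &root;
        for (std::size_t j = i; j < txt.size(); ++j) {
            std::unique_ptr<Node>& child = cur->next[txt[j]];
            if (!child)
                child = std::make_unique<Node>();   // new edge for txt[j]
            cur = child.get();
        }
    }
}

// Counts the nodes in the subtree rooted at n, including n itself.
std::size_t countNodes(const Node& n)
{
    std::size_t total = 1;
    for (const auto& edge : n.next)
        total += countNodes(*edge.second);
    return total;
}
```

For "banana" (without the terminator), `countNodes` returns 16: the root plus one node for each of the 15 distinct non-empty substrings. Because the trie is not compressed, the node count can grow quadratically with the text length, which is the price paid for the simpler code.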

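How to search for a pattern in the built trie?
Following are the steps to search for a pattern in the built trie.
1) Starting from the first character of the pattern and the root of the trie, do the following for every character.
    a) For the current character of the pattern, if there is an edge from the current node, follow the edge.
    b) If there is no edge, print "pattern doesn't exist in text" and return.
2) If all characters of the pattern have been processed, i.e., there is a path from the root for the characters of the given pattern, then print all indexes where the pattern is present. To store the indexes, we use a list at every node that records the suffixes passing through that node. In the implementation below, each stored index is the text position just past the characters matched so far, so the start of a match is that index minus the pattern length.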
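The steps above can be sketched iteratively as follows. This is a compact illustration rather than the post's own implementation: `TrieNode`, `buildTrie` and `findAll` are hypothetical names, children live in a `std::map`, each node stores the start index of a suffix directly (so no subtraction is needed when reporting), and `std::make_unique` requires C++14.

```cpp
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical node layout: children keyed by character, plus the start
// index of every suffix whose path passes through this node.
struct TrieNode {
    std::map<char, std::unique_ptr<TrieNode>> children;
    std::vector<int> starts;
};

// Builds the trie by inserting every suffix of txt.
std::unique_ptr<TrieNode> buildTrie(const std::string& txt)
{
    auto root = std::make_unique<TrieNode>();
    for (std::size_t i = 0; i < txt.size(); ++i) {
        TrieNode* cur = root.get();
        for (std::size_t j = i; j < txt.size(); ++j) {
            auto& child = cur->children[txt[j]];
            if (!child)
                child = std::make_unique<TrieNode>();
            cur = child.get();
            cur->starts.push_back(static_cast<int>(i));
        }
    }
    return root;
}

// Follows one edge per pattern character and returns the start positions
// of all matches, or an empty vector if some edge is missing.
std::vector<int> findAll(const TrieNode& root, const std::string& pat)
{
    const TrieNode* cur = &root;
    for (char c : pat) {
        auto it = cur->children.find(c);
        if (it == cur->children.end())
            return {};                 // step 1b: no edge, pattern absent
        cur = it->second.get();        // step 1a: follow the edge
    }
    return cur->starts;                // step 2: all characters consumed
}
```

With `auto root = buildTrie("geeksforgeeks.org");`, the call `findAll(*root, "ee")` returns the positions 1 and 9, and `findAll(*root, "quiz")` returns an empty vector. In this sketch the root stores no indexes, so an empty pattern also yields an empty result.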

Following is the implementation of the above idea.

C++
// A simple C++ implementation of substring search using trie of suffixes
#include <iostream>
#include <list>
#include <string>
#define MAX_CHAR 256
using namespace std;
  
// A Suffix Trie (A Trie of all suffixes) Node
class SuffixTrieNode
{
private:
    SuffixTrieNode *children[MAX_CHAR];
    list<int> *indexes;
public:
    SuffixTrieNode() // Constructor
    {
        // Create an empty linked list for indexes of
        // suffixes starting from this node
        indexes = new list<int>;
  
        // Initialize all child pointers as NULL
        for (int i = 0; i < MAX_CHAR; i++)
          children[i] = NULL;
    }
  
    // A recursive function to insert a suffix of the txt
    // in subtree rooted with this node
    void insertSuffix(string suffix, int index);
  
    // A function to search a pattern in subtree rooted
    // with this node.The function returns pointer to a linked
    // list containing all indexes where pattern is present.
    // The returned indexes are the positions just past the last
    // character of each match.
    list<int>* search(string pat);
};
  
// A Trie of all suffixes
class SuffixTrie
{
private:
    SuffixTrieNode root;
public:
    // Constructor (Builds a trie of suffixes of the given text)
    SuffixTrie(string txt)
    {
        // Consider all suffixes of given string and insert
        // them into the Suffix Trie using recursive function
        // insertSuffix() in SuffixTrieNode class
        for (int i = 0; i < txt.length(); i++)
            root.insertSuffix(txt.substr(i), i);
    }
  
    // Function to search a pattern in this suffix trie.
    void search(string pat);
};
  
// A recursive function to insert a suffix of the txt in
// subtree rooted with this node
void SuffixTrieNode::insertSuffix(string s, int index)
{
    // Store index in linked list
    indexes->push_back(index);
  
    // If string has more characters
    if (s.length() > 0)
    {
        // Find the first character
        // (unsigned, so it is always a valid index into children[])
        unsigned char cIndex = s.at(0);
  
        // If there is no edge for this character, add a new edge
        if (children[cIndex] == NULL)
            children[cIndex] = new SuffixTrieNode();
  
        // Recur for next suffix
        children[cIndex]->insertSuffix(s.substr(1), index+1);
    }
}
  
// A recursive function to search a pattern in subtree rooted with
// this node
list<int>* SuffixTrieNode::search(string s)
{
    // If all characters of pattern have been processed,
    if (s.length() == 0)
        return indexes;
  
    // if there is an edge from the current node of suffix trie,
    // follow the edge.
    if (children[(unsigned char)s.at(0)] != NULL)
        return (children[(unsigned char)s.at(0)])->search(s.substr(1));
  
    // If there is no edge, pattern doesn’t exist in text
    else return NULL;
}
  
/* Prints all occurrences of pat in the Suffix Trie S (built for text)*/
void SuffixTrie::search(string pat)
{
    // Let us call recursive search function for root of Trie.
    // We get a list of all indexes (where pat is present in text) in
    // variable 'result'
    list<int> *result = root.search(pat);
  
    // Check if the list of indexes is empty or not
    if (result == NULL)
        cout << "Pattern not found" << endl;
    else
    {
       list<int>::iterator i;
       int patLen = pat.length();
       for (i = result->begin(); i != result->end(); ++i)
         cout << "Pattern found at position " << *i - patLen<< endl;
    }
}
  
// driver program to test above functions
int main()
{
    // Let us build a suffix trie for text "geeksforgeeks.org"
    string txt = "geeksforgeeks.org";
    SuffixTrie S(txt);
  
    cout << "Search for 'ee'" << endl;
    S.search("ee");
  
    cout << "\nSearch for 'geek'" << endl;
    S.search("geek");
  
    cout << "\nSearch for 'quiz'" << endl;
    S.search("quiz");
  
    cout << "\nSearch for 'forgeeks'" << endl;
    S.search("forgeeks");
  
    return 0;
}


Java
import java.util.LinkedList;
import java.util.List;
class SuffixTrieNode {
  
    final static int MAX_CHAR = 256;
  
    SuffixTrieNode[] children = new SuffixTrieNode[MAX_CHAR];
    List<Integer> indexes;
  
    SuffixTrieNode() // Constructor
    {
        // Create an empty linked list for indexes of
        // suffixes starting from this node
        indexes = new LinkedList<Integer>();
  
        // Initialize all child pointers as NULL
        for (int i = 0; i < MAX_CHAR; i++)
            children[i] = null;
    }
  
    // A recursive function to insert a suffix of 
    // the text in subtree rooted with this node
    void insertSuffix(String s, int index) {
          
        // Store index in linked list
        indexes.add(index);
  
        // If string has more characters
        if (s.length() > 0) {
          
            // Find the first character
            char cIndex = s.charAt(0);
  
            // If there is no edge for this character,
            // add a new edge
            if (children[cIndex] == null)
                children[cIndex] = new SuffixTrieNode();
  
            // Recur for next suffix
            children[cIndex].insertSuffix(s.substring(1),
                                              index + 1);
        }
    }
  
    // A function to search a pattern in subtree rooted
    // with this node.The function returns pointer to a 
    // linked list containing all indexes where pattern  
    // is present. The returned indexes are the positions
    // just past the last character of each match.
    List<Integer> search(String s) {
          
        // If all characters of pattern have been 
        // processed,
        if (s.length() == 0)
            return indexes;
  
        // if there is an edge from the current node of
        // suffix trie, follow the edge.
        if (children[s.charAt(0)] != null)
            return (children[s.charAt(0)]).search(s.substring(1));
  
        // If there is no edge, pattern doesnt exist in 
        // text
        else
            return null;
    }
}
  
// A Trie of all suffixes
class Suffix_tree{
  
    SuffixTrieNode root = new SuffixTrieNode();
  
    // Constructor (Builds a trie of suffixes of the
    // given text)
    Suffix_tree(String txt) {
      
        // Consider all suffixes of given string and
        // insert them into the Suffix Trie using 
        // recursive function insertSuffix() in 
        // SuffixTrieNode class
        for (int i = 0; i < txt.length(); i++)
            root.insertSuffix(txt.substring(i), i);
    }
  
    /* Prints all occurrences of pat in the Suffix Trie S
    (built for text) */
    void search_tree(String pat) {
      
        // Let us call recursive search function for 
        // root of Trie.
        // We get a list of all indexes (where pat is 
        // present in text) in variable 'result'
        List<Integer> result = root.search(pat);
  
        // Check if the list of indexes is empty or not
        if (result == null)
            System.out.println("Pattern not found");
        else {
  
            int patLen = pat.length();
  
            for (Integer i : result)
                System.out.println("Pattern found at position " +
                                                (i - patLen));
        }
    }
  
    // driver program to test above functions
    public static void main(String args[]) {
          
        // Let us build a suffix trie for text 
        // "geeksforgeeks.org"
        String txt = "geeksforgeeks.org";
        Suffix_tree S = new Suffix_tree(txt);
  
        System.out.println("Search for 'ee'");
        S.search_tree("ee");
  
        System.out.println("\nSearch for 'geek'");
        S.search_tree("geek");
  
        System.out.println("\nSearch for 'quiz'");
        S.search_tree("quiz");
  
        System.out.println("\nSearch for 'forgeeks'");
        S.search_tree("forgeeks");
    }
}
// This code is contributed by Sumit Ghosh


C#
// C# implementation of the approach
using System;
using System.Collections.Generic;
class SuffixTrieNode
{
    static int MAX_CHAR = 256;
  
    public SuffixTrieNode[] children = new SuffixTrieNode[MAX_CHAR];
    public List<int> indexes;
  
    public SuffixTrieNode() // Constructor
    {
        // Create an empty linked list for indexes of
        // suffixes starting from this node
        indexes = new List<int>();
  
        // Initialize all child pointers as NULL
        for (int i = 0; i < MAX_CHAR; i++)
            children[i] = null;
    }
  
    // A recursive function to insert a suffix of 
    // the text in subtree rooted with this node
    public void insertSuffix(String s, int index) 
    {
          
        // Store index in linked list
        indexes.Add(index);
  
        // If string has more characters
        if (s.Length > 0)
        {
          
            // Find the first character
            char cIndex = s[0];
  
            // If there is no edge for this character,
            // add a new edge
            if (children[cIndex] == null)
                children[cIndex] = new SuffixTrieNode();
  
            // Recur for next suffix
            children[cIndex].insertSuffix(s.Substring(1),
                                              index + 1);
        }
    }
  
    // A function to search a pattern in subtree rooted
    // with this node.The function returns pointer to a 
    // linked list containing all indexes where pattern 
    // is present. The returned indexes are the positions
    // just past the last character of each match.
    public List<int> search(String s) 
    {
          
        // If all characters of pattern have been 
        // processed,
        if (s.Length == 0)
            return indexes;
  
        // if there is an edge from the current node of
        // suffix trie, follow the edge.
        if (children[s[0]] != null)
            return (children[s[0]]).search(s.Substring(1));
  
        // If there is no edge, pattern doesnt exist in 
        // text
        else
            return null;
    }
}
  
// A Trie of all suffixes
public class Suffix_tree
{
  
    SuffixTrieNode root = new SuffixTrieNode();
  
    // Constructor (Builds a trie of suffixes of the
    // given text)
    Suffix_tree(String txt) 
    {
      
        // Consider all suffixes of given string and
        // insert them into the Suffix Trie using 
        // recursive function insertSuffix() in 
        // SuffixTrieNode class
        for (int i = 0; i < txt.Length; i++)
            root.insertSuffix(txt.Substring(i), i);
    }
  
    /* Prints all occurrences of pat in the 
    Suffix Trie S (built for text) */
    void search_tree(String pat) 
    {
      
        // Let us call recursive search function 
        // for root of Trie.
        // We get a list of all indexes (where pat is 
        // present in text) in variable 'result'
        List<int> result = root.search(pat);
  
        // Check if the list of indexes is empty or not
        if (result == null)
            Console.WriteLine("Pattern not found");
        else 
        {
            int patLen = pat.Length;
  
            foreach (int i in result)
                Console.WriteLine("Pattern found at position " +
                                                  (i - patLen));
        }
    }
  
    // Driver Code
    public static void Main(String []args) 
    {
          
        // Let us build a suffix trie for text 
        // "geeksforgeeks.org"
        String txt = "geeksforgeeks.org";
        Suffix_tree S = new Suffix_tree(txt);
  
        Console.WriteLine("Search for 'ee'");
        S.search_tree("ee");
  
        Console.WriteLine("\nSearch for 'geek'");
        S.search_tree("geek");
  
        Console.WriteLine("\nSearch for 'quiz'");
        S.search_tree("quiz");
  
        Console.WriteLine("\nSearch for 'forgeeks'");
        S.search_tree("forgeeks");
    }
}
  
// This code is contributed by 29AjayKumar


Output:

Search for 'ee'
Pattern found at position 1
Pattern found at position 9

Search for 'geek'
Pattern found at position 0
Pattern found at position 8

Search for 'quiz'
Pattern not found

Search for 'forgeeks'
Pattern found at position 5

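The time complexity of the above search function is O(m + k), where m is the length of the pattern and k is the number of occurrences of the pattern in the text.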