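Problem statement: Given a text txt[0..n-1] and a pattern pat[0..m-1], write a function search(char pat[], char txt[]) that prints all occurrences of pat[] in txt[]. You may assume that n > m.
As discussed in the previous post, there are two ways to solve this problem efficiently.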
1) Preprocess the pattern: KMP algorithm, Rabin-Karp algorithm, finite automata, Boyer-Moore algorithm.
2) Preprocess the text: suffix tree.
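The best time complexity achievable with the first approach (preprocessing the pattern) is O(n), and with the second (preprocessing the text) it is O(m), where m and n are the lengths of the pattern and the text respectively.
Note that the second approach performs each search in only O(m) time, so it is preferred when the text rarely changes and there are many search queries. We have already discussed the suffix tree (a compressed trie of all suffixes of the text).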
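Implementing a suffix tree can be time-consuming when the problem has to be coded in a technical interview or a programming contest. This post discusses a simple implementation of a standard trie of all suffixes. The implementation is close to a suffix tree; the only difference is that it is a plain trie rather than a compressed trie.
As discussed in the suffix tree post, the idea is that every pattern present in the text (in other words, every substring of the text) must be a prefix of one of the suffixes. So if we build a trie of all suffixes, we can find a pattern in O(m) time, where m is the pattern length.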
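The observation can be checked directly, without any trie: a pattern occurs at position i exactly when the suffix starting at i begins with the pattern. The sketch below only illustrates that equivalence. The function name `findByPrefixOfSuffix` is hypothetical, and the loop takes O(n·m) time rather than the O(m) that the trie achieves.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Returns every index i such that the suffix txt[i..] has pat as a prefix.
// Hypothetical helper for illustration; it is not part of the article's code.
std::vector<int> findByPrefixOfSuffix(const std::string& txt,
                                      const std::string& pat) {
    std::vector<int> positions;
    for (std::size_t i = 0; i + pat.size() <= txt.size(); ++i) {
        // compare(pos, len, str) == 0 means txt.substr(i, pat.size()) == pat
        if (txt.compare(i, pat.size(), pat) == 0)
            positions.push_back(static_cast<int>(i));
    }
    return positions;
}
```

For example, `findByPrefixOfSuffix("banana", "ana")` returns the positions 1 and 3, because the suffixes "anana" and "ana" both start with "ana".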
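Building a Trie of Suffixes
1) Generate all suffixes of the given text.
2) Treat every suffix as an individual word and build a trie from them.
Let us consider the example text "banana\0", where '\0' is the string termination character. Following are all suffixes of "banana\0":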
banana\0
anana\0
nana\0
ana\0
na\0
a\0
\0
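Step 1 can be sketched in a few lines. This is a minimal illustration, assuming a `std::string` input without an explicit terminator; the function name `allSuffixes` is hypothetical, and the empty string at the end stands in for the lone "\0" entry in the listing above.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Lists every suffix of txt, longest first, ending with the empty suffix.
// Hypothetical helper for illustration only.
std::vector<std::string> allSuffixes(const std::string& txt) {
    std::vector<std::string> suffixes;
    for (std::size_t i = 0; i <= txt.size(); ++i)
        suffixes.push_back(txt.substr(i)); // substr(size()) yields ""
    return suffixes;
}
```

Calling `allSuffixes("banana")` yields seven strings: "banana", "anana", "nana", "ana", "na", "a" and the empty string.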
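If we treat all of the above suffixes as individual words and build a trie from them, every root-to-leaf path in the resulting trie spells out one of these suffixes.
How to search for a pattern in the built trie?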
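To get a feel for the shape of such a trie, the following sketch inserts every suffix and counts the resulting nodes. It is an illustration under assumptions of its own: the names `MapTrieNode`, `insertAllSuffixes` and `countNodes` are hypothetical, edges are kept in a `std::map` instead of a fixed-size array, and no terminator character is inserted.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>

// Minimal trie node; std::map stores only the edges that actually exist
// (an assumption made for brevity).
struct MapTrieNode {
    std::map<char, std::unique_ptr<MapTrieNode>> next;
};

// Inserts every suffix of txt into the trie rooted at root.
void insertAllSuffixes(MapTrieNode& root, const std::string& txt) {
    for (std::size_t i = 0; i < txt.size(); ++i) {
        MapTrieNode* cur = &root;
        for (std::size_t j = i; j < txt.size(); ++j) {
            std::unique_ptr<MapTrieNode>& child = cur->next[txt[j]];
            if (!child)
                child.reset(new MapTrieNode()); // add a new edge
            cur = child.get();
        }
    }
}

// Counts the nodes in the subtree rooted at node, including node itself.
int countNodes(const MapTrieNode& node) {
    int total = 1;
    for (const auto& edge : node.next)
        total += countNodes(*edge.second);
    return total;
}
```

For "banana" this gives 16 nodes: the root plus one node for each of the 15 distinct non-empty substrings, since every node corresponds to exactly one substring of the text.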
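Following are the steps to search for a pattern in the built trie.
1) Starting from the first character of the pattern and the root of the trie, do the following for every character.
   a) For the current character of the pattern, if there is an edge from the current node, follow that edge.
   b) If there is no edge, print "pattern doesn't exist in text" and return.
2) If all characters of the pattern have been processed, i.e., there is a path from the root that spells the given pattern, print all indexes where the pattern is present. To store the indexes, every node keeps a list with one entry for each suffix that passes through it. In the implementation below, each entry is the suffix's start index plus the depth of the node, so the start of a match is obtained by subtracting the pattern length.
Following is the implementation of the above idea.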
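The search steps above can also be written as a loop. The sketch below is one possible variant, not the article's implementation: the names `IndexedNode`, `buildSuffixTrie` and `findAll` are hypothetical, edges live in a `std::map`, and each node stores the start positions of the suffixes passing through it, so no subtraction is needed when reporting matches. It assumes a non-empty pattern.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

struct IndexedNode {
    std::map<char, std::unique_ptr<IndexedNode>> next;
    std::vector<int> starts; // start positions of suffixes reaching this node
};

// Inserts every suffix of txt, recording its start position along the path.
void buildSuffixTrie(IndexedNode& root, const std::string& txt) {
    for (std::size_t i = 0; i < txt.size(); ++i) {
        IndexedNode* cur = &root;
        for (std::size_t j = i; j < txt.size(); ++j) {
            std::unique_ptr<IndexedNode>& child = cur->next[txt[j]];
            if (!child)
                child.reset(new IndexedNode());
            cur = child.get();
            cur->starts.push_back(static_cast<int>(i));
        }
    }
}

// Returns the start positions of pat in the text, or an empty vector
// when pat does not occur. Assumes pat is non-empty.
std::vector<int> findAll(const IndexedNode& root, const std::string& pat) {
    const IndexedNode* cur = &root;
    for (char c : pat) {
        auto it = cur->next.find(c);
        if (it == cur->next.end())
            return {};            // step 1b: no edge, pattern is absent
        cur = it->second.get();   // step 1a: follow the edge
    }
    return cur->starts;           // step 2: every index stored at this node
}
```

With a trie built for "geeksforgeeks.org", `findAll(root, "ee")` returns 1 and 9, and `findAll(root, "quiz")` returns an empty vector. Storing start positions directly trades the subtraction for a slightly different meaning of the per-node list.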
C++
// A simple C++ implementation of substring search using a trie of suffixes
#include <iostream>
#include <list>
#include <string>
#define MAX_CHAR 256
using namespace std;

// A Suffix Trie (a trie of all suffixes) node
class SuffixTrieNode
{
private:
    SuffixTrieNode *children[MAX_CHAR];
    list<int> *indexes;
public:
    SuffixTrieNode() // Constructor
    {
        // Create an empty linked list for the indexes
        // stored at this node
        indexes = new list<int>;

        // Initialize all child pointers to null
        for (int i = 0; i < MAX_CHAR; i++)
            children[i] = nullptr;
    }

    // A recursive function to insert a suffix of the txt
    // in subtree rooted with this node
    void insertSuffix(string suffix, int index);

    // A function to search a pattern in subtree rooted
    // with this node. The function returns a pointer to a linked
    // list with one entry per occurrence of the pattern.
    // Each entry is the index one past the last character
    // of the matched text.
    list<int>* search(string pat);
};

// A Trie of all suffixes
class SuffixTrie
{
private:
    SuffixTrieNode root;
public:
    // Constructor (builds a trie of suffixes of the given text)
    SuffixTrie(string txt)
    {
        // Consider all suffixes of given string and insert
        // them into the Suffix Trie using recursive function
        // insertSuffix() in SuffixTrieNode class
        for (int i = 0; i < (int)txt.length(); i++)
            root.insertSuffix(txt.substr(i), i);
    }

    // Function to search a pattern in this suffix trie.
    void search(string pat);
};

// A recursive function to insert a suffix of the txt in
// subtree rooted with this node
void SuffixTrieNode::insertSuffix(string s, int index)
{
    // Store index in linked list
    indexes->push_back(index);

    // If string has more characters
    if (s.length() > 0)
    {
        // Find the first character; unsigned char keeps the array
        // index within 0..MAX_CHAR-1 even for non-ASCII bytes
        unsigned char cIndex = s.at(0);

        // If there is no edge for this character, add a new edge
        if (children[cIndex] == nullptr)
            children[cIndex] = new SuffixTrieNode();

        // Recur for the rest of the suffix
        children[cIndex]->insertSuffix(s.substr(1), index + 1);
    }
}

// A recursive function to search a pattern in subtree rooted with
// this node
list<int>* SuffixTrieNode::search(string s)
{
    // If all characters of pattern have been processed,
    // return the indexes stored at this node
    if (s.length() == 0)
        return indexes;

    // If there is an edge from the current node of suffix trie,
    // follow the edge.
    unsigned char cIndex = s.at(0);
    if (children[cIndex] != nullptr)
        return children[cIndex]->search(s.substr(1));

    // If there is no edge, pattern doesn't exist in text
    return nullptr;
}

/* Prints all occurrences of pat in the Suffix Trie S (built for text) */
void SuffixTrie::search(string pat)
{
    // Let us call recursive search function for root of Trie.
    // We get a list of all indexes (where pat is present in text) in
    // variable 'result'
    list<int> *result = root.search(pat);

    // Check whether the pattern was found
    if (result == nullptr)
        cout << "Pattern not found" << endl;
    else
    {
        list<int>::iterator i;
        int patLen = pat.length();
        // Each stored index is one past the end of a match, so
        // subtracting the pattern length gives the start position
        for (i = result->begin(); i != result->end(); ++i)
            cout << "Pattern found at position " << *i - patLen << endl;
    }
}

// Driver program to test above functions
int main()
{
    // Let us build a suffix trie for text "geeksforgeeks.org"
    string txt = "geeksforgeeks.org";
    SuffixTrie S(txt);

    cout << "Search for 'ee'" << endl;
    S.search("ee");

    cout << "\nSearch for 'geek'" << endl;
    S.search("geek");

    cout << "\nSearch for 'quiz'" << endl;
    S.search("quiz");

    cout << "\nSearch for 'forgeeks'" << endl;
    S.search("forgeeks");

    return 0;
}
Java
import java.util.LinkedList;
import java.util.List;

class SuffixTrieNode {
    final static int MAX_CHAR = 256;

    SuffixTrieNode[] children = new SuffixTrieNode[MAX_CHAR];
    List<Integer> indexes;

    SuffixTrieNode() // Constructor
    {
        // Create an empty linked list for the indexes
        // stored at this node
        indexes = new LinkedList<Integer>();

        // Initialize all child references as null
        for (int i = 0; i < MAX_CHAR; i++)
            children[i] = null;
    }

    // A recursive function to insert a suffix of
    // the text in subtree rooted with this node
    void insertSuffix(String s, int index) {
        // Store index in linked list
        indexes.add(index);

        // If string has more characters
        if (s.length() > 0) {
            // Find the first character
            char cIndex = s.charAt(0);

            // If there is no edge for this character,
            // add a new edge
            if (children[cIndex] == null)
                children[cIndex] = new SuffixTrieNode();

            // Recur for the rest of the suffix
            children[cIndex].insertSuffix(s.substring(1),
                                          index + 1);
        }
    }

    // A function to search a pattern in subtree rooted
    // with this node. The function returns a linked list
    // with one entry per occurrence of the pattern.
    // Each entry is the index one past the last
    // character of the matched text.
    List<Integer> search(String s) {
        // If all characters of pattern have been
        // processed, return the indexes stored here
        if (s.length() == 0)
            return indexes;

        // If there is an edge from the current node of
        // suffix trie, follow the edge.
        if (children[s.charAt(0)] != null)
            return children[s.charAt(0)].search(s.substring(1));

        // If there is no edge, pattern doesn't exist in
        // text
        else
            return null;
    }
}

// A Trie of all suffixes
class Suffix_tree {
    SuffixTrieNode root = new SuffixTrieNode();

    // Constructor (builds a trie of suffixes of the
    // given text)
    Suffix_tree(String txt) {
        // Consider all suffixes of given string and
        // insert them into the Suffix Trie using
        // recursive function insertSuffix() in
        // SuffixTrieNode class
        for (int i = 0; i < txt.length(); i++)
            root.insertSuffix(txt.substring(i), i);
    }

    /* Prints all occurrences of pat in the Suffix Trie S
       (built for text) */
    void search_tree(String pat) {
        // Let us call recursive search function for
        // root of Trie.
        // We get a list of all indexes (where pat is
        // present in text) in variable 'result'
        List<Integer> result = root.search(pat);

        // Check whether the pattern was found
        if (result == null)
            System.out.println("Pattern not found");
        else {
            int patLen = pat.length();

            // Each stored index is one past the end of a match
            for (Integer i : result)
                System.out.println("Pattern found at position " +
                                   (i - patLen));
        }
    }

    // Driver program to test above functions
    public static void main(String args[]) {
        // Let us build a suffix trie for text
        // "geeksforgeeks.org"
        String txt = "geeksforgeeks.org";
        Suffix_tree S = new Suffix_tree(txt);

        System.out.println("Search for 'ee'");
        S.search_tree("ee");

        System.out.println("\nSearch for 'geek'");
        S.search_tree("geek");

        System.out.println("\nSearch for 'quiz'");
        S.search_tree("quiz");

        System.out.println("\nSearch for 'forgeeks'");
        S.search_tree("forgeeks");
    }
}
// This code is contributed by Sumit Ghosh
C#
// C# implementation of the approach
using System;
using System.Collections.Generic;

class SuffixTrieNode
{
    static int MAX_CHAR = 256;

    public SuffixTrieNode[] children = new SuffixTrieNode[MAX_CHAR];
    public List<int> indexes;

    public SuffixTrieNode() // Constructor
    {
        // Create an empty list for the indexes
        // stored at this node
        indexes = new List<int>();

        // Initialize all child references as null
        for (int i = 0; i < MAX_CHAR; i++)
            children[i] = null;
    }

    // A recursive function to insert a suffix of
    // the text in subtree rooted with this node
    public void insertSuffix(String s, int index)
    {
        // Store index in list
        indexes.Add(index);

        // If string has more characters
        if (s.Length > 0)
        {
            // Find the first character
            char cIndex = s[0];

            // If there is no edge for this character,
            // add a new edge
            if (children[cIndex] == null)
                children[cIndex] = new SuffixTrieNode();

            // Recur for the rest of the suffix
            children[cIndex].insertSuffix(s.Substring(1),
                                          index + 1);
        }
    }

    // A function to search a pattern in subtree rooted
    // with this node. The function returns a list with
    // one entry per occurrence of the pattern. Each
    // entry is the index one past the last character
    // of the matched text.
    public List<int> search(String s)
    {
        // If all characters of pattern have been
        // processed, return the indexes stored here
        if (s.Length == 0)
            return indexes;

        // If there is an edge from the current node of
        // suffix trie, follow the edge.
        if (children[s[0]] != null)
            return children[s[0]].search(s.Substring(1));

        // If there is no edge, pattern doesn't exist in
        // text
        else
            return null;
    }
}

// A Trie of all suffixes
public class Suffix_tree
{
    SuffixTrieNode root = new SuffixTrieNode();

    // Constructor (builds a trie of suffixes of the
    // given text)
    Suffix_tree(String txt)
    {
        // Consider all suffixes of given string and
        // insert them into the Suffix Trie using
        // recursive function insertSuffix() in
        // SuffixTrieNode class
        for (int i = 0; i < txt.Length; i++)
            root.insertSuffix(txt.Substring(i), i);
    }

    /* Prints all occurrences of pat in the
       Suffix Trie S (built for text) */
    void search_tree(String pat)
    {
        // Let us call recursive search function
        // for root of Trie.
        // We get a list of all indexes (where pat is
        // present in text) in variable 'result'
        List<int> result = root.search(pat);

        // Check whether the pattern was found
        if (result == null)
            Console.WriteLine("Pattern not found");
        else
        {
            int patLen = pat.Length;

            // Each stored index is one past the end of a match
            foreach (int i in result)
                Console.WriteLine("Pattern found at position " +
                                  (i - patLen));
        }
    }

    // Driver Code
    public static void Main(String []args)
    {
        // Let us build a suffix trie for text
        // "geeksforgeeks.org"
        String txt = "geeksforgeeks.org";
        Suffix_tree S = new Suffix_tree(txt);

        Console.WriteLine("Search for 'ee'");
        S.search_tree("ee");

        Console.WriteLine("\nSearch for 'geek'");
        S.search_tree("geek");

        Console.WriteLine("\nSearch for 'quiz'");
        S.search_tree("quiz");

        Console.WriteLine("\nSearch for 'forgeeks'");
        S.search_tree("forgeeks");
    }
}
// This code is contributed by 29AjayKumar
Output:
Search for 'ee'
Pattern found at position 1
Pattern found at position 9

Search for 'geek'
Pattern found at position 0
Pattern found at position 8

Search for 'quiz'
Pattern not found

Search for 'forgeeks'
Pattern found at position 5
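The time complexity of the above search function is O(m + k), where m is the length of the pattern and k is the number of occurrences of the pattern in the text.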
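The search bound says nothing about the size of the trie itself. As a rough worst-case estimate (a back-of-the-envelope count that is not part of the original analysis, and that assumes no two suffixes share a prefix), inserting the suffix that starts at position i adds at most n − i nodes, so the number of nodes besides the root is bounded by:

```latex
\sum_{i=0}^{n-1} (n - i) \;=\; n + (n-1) + \dots + 1 \;=\; \frac{n(n+1)}{2} \;=\; O(n^2)
```

For "banana" (n = 6) the bound is 21 nodes, while the actual count is 15 because suffixes such as "ana" and "anana" share prefixes. This quadratic growth is the price a plain trie pays compared with a compressed suffix tree.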