Java Program to Implement the Bitap Algorithm for String Matching
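Bitap is an approximate string matching algorithm. It determines whether a given text contains a substring that is "approximately equal" to a given pattern, where approximately equal means that the substring and the pattern are within a given distance k of each other. The algorithm first precomputes a set of bitmasks containing one bit for each element of the pattern. After that, most of the work is done with bitwise operations, which are very fast. The program in this article covers the exact-matching case, that is, k = 0.
The Bitap algorithm is also known as the shift-or, shift-and, or Baeza-Yates-Gonnet algorithm.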
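To make "within distance k" concrete, here is a brute-force reference sketch. The article does not say which distance is meant, so this sketch assumes Hamming distance (substitutions only); edit distance is another common choice. The class and method names are hypothetical, and the code is not Bitap itself, only a slow statement of what an approximate matcher is expected to report under that assumption.

```java
final class ApproxMatchReference {
    // Counts the positions where the pattern differs from the window
    // of text that starts at `start` (assumed: Hamming distance).
    static int mismatches(String text, int start, String pattern) {
        int count = 0;
        for (int j = 0; j < pattern.length(); j++) {
            if (text.charAt(start + j) != pattern.charAt(j)) {
                count++;
            }
        }
        return count;
    }

    // Returns the first index whose window is within k mismatches of
    // the pattern, or -1 if no window qualifies.
    static int firstWithin(String text, String pattern, int k) {
        for (int start = 0; start + pattern.length() <= text.length(); start++) {
            if (mismatches(text, start, pattern) <= k) {
                return start;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(firstWithin("geeksforgeeks", "geeks", 0));         // 0
        System.out.println(firstWithin("Youareawesome", "Youareamazing", 0)); // -1
        System.out.println(firstWithin("Youareawesome", "Youareamazing", 6)); // 0
    }
}
```

With k = 0 this reduces to exact matching, which is the case the two examples in this article use.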
Examples:
Input:
Text: geeksforgeeks
Pattern: geeks
Output:
Pattern found at index: 0
Input:
Text: Youareawesome
Pattern: Youareamazing
Output:
No Match.
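Approach:
- Take the text and the pattern as input strings.
- Convert each String to a char array for easy indexing.
- If the pattern length is 0 or greater than 63, return -1, which is reported as "No Match".
- Initialize the state variable R to ~1, so every bit is set except bit 0.
- Create an array "pattern_mask" with one long per possible character and initialize every entry to ~0, so all bits are set.
- For each position i of the pattern, use the pattern character as an index into "pattern_mask" and AND that entry with the complement of 1L (a long) shifted left by i. This clears bit i.
- Run a loop over the text, from the first character to the last.
- In each iteration, OR R with the "pattern_mask" entry of the current text character, then shift R left by one.
- AND R with 1L shifted left by the length of the pattern.
- If the result is zero, a match ends at the current position i, so return i - len + 1. If the loop finishes without a match, return -1.
- The output is the index in the text at which the pattern matches.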
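The mask step in the list above is easier to follow with concrete bits. The sketch below builds the masks for the pattern "geeks" and prints only the lowest five bits of each. The class, the helper names, and the use of a map instead of a full array are assumptions made for illustration, not part of the program in this article.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class PatternMaskDemo {
    // Hypothetical helper: maps each distinct pattern character to its
    // mask. A mask starts with all bits set, and bit i is cleared when
    // the character occurs at pattern position i.
    static Map<Character, Long> buildMasks(String pattern) {
        Map<Character, Long> masks = new LinkedHashMap<>();
        for (int i = 0; i < pattern.length(); i++) {
            char c = pattern.charAt(i);
            long mask = masks.getOrDefault(c, ~0L);
            masks.put(c, mask & ~(1L << i));
        }
        return masks;
    }

    // Renders the lowest `width` bits of value, most significant first.
    static String lowBits(long value, int width) {
        StringBuilder sb = new StringBuilder();
        for (int bit = width - 1; bit >= 0; bit--) {
            sb.append((value >>> bit) & 1L);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String pattern = "geeks";
        // Prints:
        // g -> 11110
        // e -> 11001
        // k -> 10111
        // s -> 01111
        buildMasks(pattern).forEach((c, mask) ->
            System.out.println(c + " -> " + lowBits(mask, pattern.length())));
    }
}
```

Reading each line from right to left, a 0 marks a pattern position that holds the character: 'e' sits at positions 1 and 2 of "geeks", so bits 1 and 2 of its mask are cleared. Characters that never occur in the pattern keep a mask of all ones.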
Code:
Java
// Java program to implement the Bitap algorithm
// (exact matching).
public class GFG {
    public static void main(String[] args)
    {
        System.out.println("Bitap Algorithm!");
        String text = "geeksforgeeks";
        String pattern = "geeks";
        // findPattern is an instance method, so create an
        // object of the class to call it on.
        GFG g = new GFG();
        // Search for the pattern in the text and print the
        // result.
        g.findPattern(text, pattern);
    }
    public void findPattern(String t, String p)
    {
        // Convert the text and the pattern to char arrays
        // for easy indexing.
        char[] text = t.toCharArray();
        char[] pattern = p.toCharArray();
        // bitap_search returns the index of the first
        // match, or -1 if the pattern does not occur in
        // the text.
        int index = bitap_search(text, pattern);
        if (index == -1) {
            System.out.println("\nNo Match\n");
        }
        else {
            // Print the index in the text at which the
            // match starts.
            System.out.println(
                "\nPattern found at index: \n" + index);
        }
    }
    private int bitap_search(char[] text, char[] pattern)
    {
        int len = pattern.length;
        // An empty pattern is reported as no match.
        if (len == 0) {
            return -1;
        }
        // The whole state lives in one 64-bit long and the
        // match test reads bit len, so the pattern may have
        // at most 63 characters.
        if (len > 63) {
            System.out.println("Pattern too long!");
            return -1;
        }
        // One mask per possible char value, initially with
        // all bits set.
        long pattern_mask[]
            = new long[Character.MAX_VALUE + 1];
        for (int i = 0; i <= Character.MAX_VALUE; ++i)
            pattern_mask[i] = ~0L;
        // Clear bit i in the mask of the character found at
        // pattern position i.
        for (int i = 0; i < len; ++i)
            pattern_mask[pattern[i]] &= ~(1L << i);
        // State: every bit set except bit 0. A 0 in bit j
        // means the first j pattern characters match the
        // text ending at the current position.
        long R = ~1L;
        for (int i = 0; i < text.length; ++i) {
            // OR in the mask of the current text character,
            // then shift left to extend surviving matches
            // by one position.
            R |= pattern_mask[text[i]];
            R <<= 1;
            // If bit len is 0, the full pattern ends at i,
            // so the match starts at i - len + 1.
            if ((R & (1L << len)) == 0)
                return i - len + 1;
        }
        // No match was found.
        return -1;
    }
}
Output:
Bitap Algorithm!
Pattern found at index:
0
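The program above only reports exact matches, while the introduction describes matching within a distance k. The following is a minimal sketch of how the same bit-parallel idea is commonly extended, assuming the distance counts substitutions only (Hamming distance); handling insertions and deletions needs further terms in the update. The class name, the method name, and the parameter k are illustrative assumptions.

```java
import java.util.Arrays;

final class BitapFuzzySketch {
    // Returns the start index of the first window that matches the
    // pattern with at most k substituted characters, or -1.
    static int search(String text, String pattern, int k) {
        int len = pattern.length();
        if (len == 0 || len > 63 || k < 0) {
            return -1;
        }

        // Same masks as in exact matching: bit i is 0 where the
        // character occurs at pattern position i.
        long[] mask = new long[Character.MAX_VALUE + 1];
        Arrays.fill(mask, ~0L);
        for (int i = 0; i < len; i++) {
            mask[pattern.charAt(i)] &= ~(1L << i);
        }

        // r[d] is the state that tolerates up to d substitutions.
        long[] r = new long[k + 1];
        Arrays.fill(r, ~1L);

        for (int i = 0; i < text.length(); i++) {
            long m = mask[text.charAt(i)];
            long prev = r[0];          // r[d - 1] before this step
            r[0] = (r[0] | m) << 1;    // exact-matching update
            for (int d = 1; d <= k; d++) {
                long old = r[d];
                // Either the character matches with d errors so far,
                // or it is substituted and d - 1 errors were used.
                r[d] = ((r[d] | m) & prev) << 1;
                prev = old;
            }
            if ((r[k] & (1L << len)) == 0) {
                return i - len + 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(search("geeksforgeeks", "geekz", 1)); // 0
        System.out.println(search("geeksforgeeks", "geekz", 0)); // -1
    }
}
```

With k = 0 the inner loop never runs and the method behaves like the exact search above. Each extra allowed error costs one more long of state and one more update per text character.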