Longest Common Extension / LCE | Set 2 (Reduction to RMQ)



Last modified: 2021-04-17 10:13:20 · Author: Mango

Prerequisites:

Suffix Array | Set 2
Kasai's Algorithm

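The Longest Common Extension (LCE) problem considers a string s and, for each pair (L, R), computes the length of the longest substring of s that starts at both index L and index R. In other words, each LCE query asks for the length of the longest common prefix of the suffixes starting at indexes L and R.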

Example:
String: "abbababba"
Queries: LCE(1, 2), LCE(1, 6) and LCE(0, 5)

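Find the length of the longest common prefix of the suffixes starting at each pair of indexes (1, 2), (1, 6) and (0, 5). The longest common prefixes are "b" for (1, 2), "bba" for (1, 6) and "abba" for (0, 5), so the answers are 1, 3 and 4.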


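In Set 1, we explained the naive method for finding the length of the LCE of a string over many queries. In this set, we show how the LCE problem can be reduced to an RMQ (Range Minimum Query) problem, which makes asymptotically faster query answering possible compared with the naive method.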

Reduction of LCE to RMQ

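Let the input string be S and the queries be of the form LCE(L, R). Let the suffix array of S be Suff[] and its lcp array be lcp[], where lcp[k] is the length of the longest common prefix of the suffixes of rank k and k + 1.

The longest common extension of two suffixes $S_L$ and $S_R$ of S can be obtained from the lcp array as follows.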

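Let low be the rank of $S_L$ among the suffixes of S (that is, $Suff[low] = L$).
Let high be the rank of $S_R$ among the suffixes of S. Without loss of generality, we assume that $low < high$.
Then the longest common extension of $S_L$ and $S_R$ is $lcp(low, high) = \min_{low \le k < high} lcp[k]$.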
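As a worked instance on the example string "abbababba" (indexes 0 to 8), sorting the suffixes gives the suffix array and lcp array below. The lcp array uses the same convention as the implementation in this article: lcp[k] compares the suffixes of rank k and k + 1, and the last entry is 0. The ranks of the queried suffixes are 8 for $S_1$, 5 for $S_2$, 7 for $S_6$, 3 for $S_0$ and 2 for $S_5$, and taking the minimum over $lcp[low \ldots high-1]$ reproduces the expected answers:

```latex
\begin{aligned}
Suff &= [8, 3, 5, 0, 7, 2, 4, 6, 1], \qquad lcp = [1, 2, 4, 0, 2, 3, 1, 3, 0] \\
\mathrm{LCE}(1, 2) &= \min(lcp[5], lcp[6], lcp[7]) = \min(3, 1, 3) = 1 \quad (low = 5,\ high = 8) \\
\mathrm{LCE}(1, 6) &= lcp[7] = 3 \quad (low = 7,\ high = 8) \\
\mathrm{LCE}(0, 5) &= lcp[2] = 4 \quad (low = 2,\ high = 3)
\end{aligned}
```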

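Proof: Let $S_L = s_L \ldots s_{L+c} \ldots s_n$ and $S_R = s_R \ldots s_{R+c} \ldots s_n$, and let $c$ be the longest common extension of $S_L$ and $S_R$ (i.e. $s_L \ldots s_{L+c-1} = s_R \ldots s_{R+c-1}$). We assume that the string S has a sentinel character, so that no suffix of S is a prefix of any other suffix of S.

If $low = high - 1$, then $i = low$ and $lcp[low] = c$ is the longest common extension of $S_L$ and $S_R$, and we are done.
If $low < high - 1$, select $i$ such that $lcp[i]$ is the minimum value in the interval $[low, high - 1]$ of the lcp array. There are two possible cases:
If $c < lcp[i]$, we have a contradiction. Every $lcp[k]$ with $low \le k < high$ is at least $lcp[i]$, so each pair of adjacent suffixes with ranks between low and high shares its first $lcp[i]$ characters (by the definition of the lcp table and the fact that its entries correspond to the sorted suffixes of S). Chaining these equalities gives $s_L \ldots s_{L+lcp[i]-1} = s_R \ldots s_{R+lcp[i]-1}$, so $c \ge lcp[i]$.
If $c > lcp[i]$, let $j = Suff[i]$, so $S_j$ is the suffix of rank $i$. Since the suffixes are sorted and $S_L$ and $S_R$ share their first $c$ characters, every suffix with rank between low and high also begins with those $c$ characters. In particular, $S_j$ and the suffix of rank $i + 1$ share more than $lcp[i]$ characters, so the lcp array would be inconsistent with the sorted order of the suffixes, which is a contradiction.

Therefore $c = lcp[i]$.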

Thus we have reduced the longest common extension query to a range minimum query over the lcp array.

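Algorithm

To find low and high, we first compute the suffix array and then compute the inverse suffix array from it.
We also need the lcp array, which we compute from the suffix array using Kasai's algorithm.
After these steps, for each query we simply find the minimum value in the lcp array from index low to index high - 1 (as shown above).

This minimum is the length of the LCE for that query.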

Implementation

// A C++ program to find the length of the longest common
// extension using the Direct Minimum algorithm
#include <algorithm>
#include <cstdio>
#include <string>
#include <vector>
using namespace std;
  
// Structure to represent a query of form (L,R)
struct Query
{
    int L, R;
};
  
// Structure to store information of a suffix
struct suffix
{
    int index;  // To store original index
    int rank[2]; // To store ranks and next rank pair
};
  
// A utility function to get minimum of two numbers
int minVal(int x, int y) { return (x < y)? x: y; }
  
// A utility function to get maximum of two numbers
int maxVal(int x, int y) { return (x > y)? x: y; }
  
// A comparison function used by sort() to compare
// two suffixes. Compares two rank pairs and returns
// true if the first pair is smaller
bool cmp(const suffix &a, const suffix &b)
{
    return (a.rank[0] == b.rank[0])?
                   (a.rank[1] < b.rank[1]):
                   (a.rank[0] < b.rank[0]);
}
  
// This is the main function that takes a string 'txt'
// of size n as an argument, builds and returns the
// suffix array for the given string. Initial ranks are
// txt[i] - 'a', so the string is assumed to consist of
// lowercase letters.
vector<int> buildSuffixArray(const string &txt, int n)
{
    // A vector to store suffixes and their indexes
    // (std::vector replaces the non-standard
    // variable-length array)
    vector<suffix> suffixes(n);

    // Store suffixes and their indexes in an array
    // of structures.
    // The structure is needed to sort the suffixes
    // alphabetically and maintain their old indexes
    // while sorting
    for (int i = 0; i < n; i++)
    {
        suffixes[i].index = i;
        suffixes[i].rank[0] = txt[i] - 'a';
        suffixes[i].rank[1] =
                 ((i+1) < n)? (txt[i + 1] - 'a'): -1;
    }

    // Sort the suffixes using the comparison function
    // defined above.
    sort(suffixes.begin(), suffixes.end(), cmp);

    // At this point, all suffixes are sorted according
    // to their first 2 characters. Let us sort suffixes
    // according to the first 4 characters, then the first 8
    // and so on

    // This array is needed to get the index in suffixes[]
    // from the original index. This mapping is needed to get
    // the next suffix.
    vector<int> ind(n);

    for (int k = 4; k < 2*n; k = k*2)
    {
        // Assigning rank and index values to first suffix
        int rank = 0;
        int prev_rank = suffixes[0].rank[0];
        suffixes[0].rank[0] = rank;
        ind[suffixes[0].index] = 0;

        // Assigning rank to suffixes
        for (int i = 1; i < n; i++)
        {
            // If first rank and next ranks are same as
            // that of the previous suffix in the array, assign
            // the same new rank to this suffix
            if (suffixes[i].rank[0] == prev_rank &&
                suffixes[i].rank[1] == suffixes[i-1].rank[1])
            {
                prev_rank = suffixes[i].rank[0];
                suffixes[i].rank[0] = rank;
            }
            else // Otherwise increment rank and assign
            {
                prev_rank = suffixes[i].rank[0];
                suffixes[i].rank[0] = ++rank;
            }
            ind[suffixes[i].index] = i;
        }

        // Assign next rank to every suffix
        for (int i = 0; i < n; i++)
        {
            int nextindex = suffixes[i].index + k/2;
            suffixes[i].rank[1] = (nextindex < n)?
                           suffixes[ind[nextindex]].rank[0]: -1;
        }

        // Sort the suffixes according to first k characters
        sort(suffixes.begin(), suffixes.end(), cmp);
    }

    // Store indexes of all sorted suffixes in the suffix array
    vector<int> suffixArr;
    for (int i = 0; i < n; i++)
        suffixArr.push_back(suffixes[i].index);

    // Return the suffix array
    return suffixArr;
}
  
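/* To construct and return LCP (Kasai's algorithm).
   lcp[i] is the length of the longest common prefix of
   the suffixes of rank i and i+1; the last entry is 0. */
vector<int> kasai(const string &txt, const vector<int> &suffixArr,
                  vector<int> &invSuff)
{
    int n = suffixArr.size();

    // To store LCP array
    vector<int> lcp(n, 0);

    // Fill values in invSuff[]
    for (int i=0; i < n; i++)
        invSuff[suffixArr[i]] = i;

    // Initialize length of previous LCP
    int k = 0;

    // Process all suffixes one by one starting from
    // first suffix in txt[]
    for (int i=0; i<n; i++)
    {
        // If the current suffix is the last one in sorted
        // order, it has no next suffix, so its lcp stays 0
        if (invSuff[i] == n-1)
        {
            k = 0;
            continue;
        }

        // j is the starting index of the next suffix
        // in the suffix array
        int j = suffixArr[invSuff[i]+1];

        // Start matching from the k-th character, since at
        // least k characters are already known to match
        while (i+k<n && j+k<n && txt[i+k]==txt[j+k])
            k++;

        lcp[invSuff[i]] = k; // lcp for the present suffix

        // Deleting the starting character from the string
        if (k>0)
            k--;
    }

    // return the constructed lcp array
    return lcp;
}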
  
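// A utility function to find the longest common extension
// of the suffixes starting at index L and index R
int LCE(const vector<int> &lcp, const vector<int> &invSuff, int n,
        int L, int R)
{
    // Handle the corner case: a suffix matches itself entirely
    if (L == R)
        return (n-L);

    int low = minVal(invSuff[L], invSuff[R]);
    int high = maxVal(invSuff[L], invSuff[R]);

    // Direct minimum: scan lcp[low .. high-1]
    int length = lcp[low];

    for (int i=low+1; i<high; i++)
    {
        if (lcp[i] < length)
            length = lcp[i];
    }

    return (length);
}

// A function to answer queries of longest common extension
void LCEQueries(const string &str, int n, Query q[], int m)
{
    // Build a suffix array
    vector<int> suffixArr = buildSuffixArray(str, str.length());

    // An auxiliary array to store inverse of suffix array
    // elements. For example if suffixArr[0] is 5, then
    // invSuff[5] would store 0. This is used to get the next
    // suffix string from the suffix array.
    vector<int> invSuff(n, 0);

    // Build a lcp vector
    vector<int> lcp = kasai(str, suffixArr, invSuff);

    for (int i=0; i<m; i++)
    {
        int L = q[i].L;
        int R = q[i].R;

        printf("LCE (%d, %d) = %d\n", L, R,
               LCE(lcp, invSuff, n, L, R));
    }
}

// Driver program to test above functions
int main()
{
    string str = "abbababba";
    int n = str.length();

    // LCE queries to answer
    Query q[] = {{1, 2}, {1, 6}, {0, 5}};
    int m = sizeof(q)/sizeof(q[0]);

    LCEQueries(str, n, q, m);

    return 0;
}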


Output:

LCE (1, 2) = 1
LCE (1, 6) = 3
LCE (0, 5) = 4


Analysis of the Reduction-to-RMQ Approach
Time Complexity:

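Constructing the suffix array with the prefix-doubling code above takes O(N log² N) time (O(log N) rounds, each sorting N suffixes with std::sort), and Kasai's algorithm then builds the lcp array in O(N) time.
Answering each query takes O(|invSuff[R] - invSuff[L]|) time, which is O(N) in the worst case.
Hence the overall time complexity is O(N log² N + Q·|invSuff[R] - invSuff[L]|),
where
 Q = number of LCE queries,
 N = length of the input string,
 invSuff[] = inverse suffix array of the input string.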
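Although this may look like an inefficient algorithm, in practice it often outperforms asymptotically faster algorithms for answering LCE queries.
In the next set, we will describe the performance of this method in detail.
Auxiliary Space: O(N), to store the lcp, suffix and inverse suffix arrays.
Reference: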

 http://www.sciencedirect.com/science/article/pii/S1570866710000377