📜  Sparse Set

📅  Last modified: 2021-04-17 11:02:16             🧑  Author: Mango

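How can the following operations be performed efficiently when there are a large number of queries?

  1. Insertion
  2. Deletion
  3. Searching
  4. Clearing/removing all elements.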

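One solution is to use a self-balancing binary search tree such as a red-black tree or an AVL tree. With this solution, insertion, deletion and search each take O(Log n) time.

We can also use hashing. With hashing, the first three operations take O(1) time on average, but the fourth operation takes O(n) time.

We can also use a bit vector (or direct access table), but a bit vector also needs O(n) time to clear.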
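
To make the cost of clearing a direct access table concrete, here is a minimal sketch of a bit-vector set. The name `BitVectorSet` and its members are assumptions made for illustration and do not appear in the rest of this article. Insertion, deletion and lookup each touch one slot, while `clear` has to reset every slot in the range.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical direct-access set over the values 0..maxVal.
struct BitVectorSet {
    std::vector<bool> present;  // present[x] is true when x is in the set

    explicit BitVectorSet(int maxVal)
        : present(static_cast<std::size_t>(maxVal) + 1, false) {}

    bool inRange(int x) const {
        return x >= 0 && static_cast<std::size_t>(x) < present.size();
    }

    void insert(int x) { if (inRange(x)) present[x] = true; }        // O(1)
    void erase(int x)  { if (inRange(x)) present[x] = false; }       // O(1)
    bool contains(int x) const { return inRange(x) && present[x]; }  // O(1)

    // Every slot must be reset, so the cost grows with the size of the range.
    void clear() { present.assign(present.size(), false); }
};
```

The sparse set described next keeps the constant-time single-element operations and replaces this linear clear with a single assignment.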

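For these four operations, a sparse set outperforms BST, hashing and bit vector. We assume that the range of the data (or the maximum value an element can have) and the maximum number of elements that can be stored in the set are given. The idea is to maintain two arrays: sparse[] and dense[].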

dense[]   ==> Stores the actual elements
sparse[]  ==> This is like bit-vector where 
              we use elements as index. Here 
              values are not binary, but
              indexes of dense array.
maxVal    ==> Maximum value this set can 
              store. Size of sparse[] is
              equal to maxVal + 1.
capacity  ==> Capacity of Set. Size of dense[]
              is equal to capacity.  
n         ==> Current number of elements in
              Set.
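
The fields above can be sketched as a small struct together with the invariant that links the two arrays. The names `SparseSetLayout` and `invariantHolds` are hypothetical, and the sketch uses `std::vector` (which zero-initializes its contents) instead of raw arrays, which is a simplification chosen here for safety.

```cpp
#include <cassert>
#include <vector>

// Hypothetical layout of the fields described above.
struct SparseSetLayout {
    std::vector<int> dense;   // capacity slots; dense[0..n-1] are the members
    std::vector<int> sparse;  // maxVal + 1 slots; sparse[x] is x's index in dense
    int n;                    // current number of elements

    SparseSetLayout(int maxVal, int capacity)
        : dense(capacity), sparse(maxVal + 1), n(0) {}
};

// Checks the invariant that ties the two arrays together:
// for every i < n, sparse[dense[i]] == i.
bool invariantHolds(const SparseSetLayout &s) {
    for (int i = 0; i < s.n; i++)
        if (s.sparse[s.dense[i]] != i)
            return false;
    return true;
}
```

Every operation described below is expected to preserve this invariant; entries of sparse[] that belong to non-members are never consulted by it.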

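insert(x): Let x be the element to insert. If x is greater than maxVal, or n (the current number of elements) is greater than or equal to capacity, return without inserting.
If neither condition holds, store x in dense[] at index n (the position just after the last element in the 0-based array), store n (the index of x in dense[]) in sparse[x], and then increment n by 1.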
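
As a rough sketch of the insertion steps, the hypothetical `InsertSketch` below returns a flag saying whether the element was added. It also rejects negative values and duplicates, and it uses zero-initialized `std::vector` storage; those choices are assumptions of this sketch.

```cpp
#include <cassert>
#include <vector>

// Hypothetical minimal set, just enough state to show insertion.
struct InsertSketch {
    std::vector<int> dense, sparse;
    int n = 0;

    InsertSketch(int maxVal, int capacity) : dense(capacity), sparse(maxVal + 1) {}

    bool contains(int x) const {
        if (x < 0 || x >= static_cast<int>(sparse.size())) return false;
        int i = sparse[x];
        return i >= 0 && i < n && dense[i] == x;
    }

    // Returns true if x was added, false if it was rejected.
    bool insert(int x) {
        if (x < 0 || x >= static_cast<int>(sparse.size())) return false;  // out of range
        if (n >= static_cast<int>(dense.size())) return false;            // set is full
        if (contains(x)) return false;                                    // already present
        dense[n] = x;   // append after the last element
        sparse[x] = n;  // remember where x lives in dense
        n++;
        return true;
    }
};
```

For example, with `InsertSketch s(10, 2)`, inserting 3 and then 5 succeeds, while a second insert of 3, an insert of 11, or a third distinct element is rejected.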

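search(x): To search for an element x, use x as an index into sparse[], and use the value sparse[x] as an index into dense[]. If sparse[x] is less than n and dense[sparse[x]] equals x, then x is present and we return sparse[x], its index in dense[]. Otherwise we return -1.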
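
The two-way check is what lets sparse[] contain leftovers for values that are not in the set. The free function `findIndex` below is a hypothetical sketch of that check; the arrays are passed in explicitly so the example never reads an uninitialized value.

```cpp
#include <cassert>
#include <vector>

// Returns the index of x in dense[0..n-1], or -1 if x is not a member.
// sparse may hold arbitrary leftovers for values that are not in the set.
int findIndex(const std::vector<int> &dense, const std::vector<int> &sparse,
              int n, int x) {
    if (x < 0 || x >= static_cast<int>(sparse.size()))
        return -1;                      // outside the universe
    int i = sparse[x];
    if (i < 0 || i >= n)
        return -1;                      // points past the live part of dense
    return dense[i] == x ? i : -1;      // dense must point back at x
}
```

With dense = {3, 5, 99, 99}, n = 2 and a stale sparse[4] = 1, looking up 4 fails because dense[1] is 5, not 4. A leftover sparse[8] = 7 fails earlier because 7 is not less than n.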

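delete(x): To delete an element x, overwrite it in dense[] with the last element of dense[] and update that last element's entry in sparse[] to its new index. Finally, decrement n by 1.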
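
The move-last-into-the-hole step can be sketched as follows. `EraseSketch` and `eraseValue` are hypothetical names, and the return flag is an addition of this sketch.

```cpp
#include <cassert>
#include <vector>

// Hypothetical plain aggregate holding the three pieces of state.
struct EraseSketch {
    std::vector<int> dense, sparse;
    int n;
};

// Removes x if present; returns true when something was removed.
bool eraseValue(EraseSketch &s, int x) {
    if (x < 0 || x >= static_cast<int>(s.sparse.size())) return false;
    int i = s.sparse[x];
    if (i < 0 || i >= s.n || s.dense[i] != x) return false;  // x is not a member

    int last = s.dense[s.n - 1];  // element currently at the end
    s.dense[i] = last;            // move it into the hole left by x
    s.sparse[last] = i;           // and record its new position
    s.n--;                        // the old last slot is now unused
    return true;
}
```

Deleting the element that is already last also works: it is copied onto itself and n shrinks by one. Note that deletion does not preserve the order of the remaining elements in dense[].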

clear(): Set n = 0. Neither array has to be touched, so this takes O(1) time.

print(): All elements can be printed by simply traversing dense[] from index 0 to n - 1.
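
The following sketch, built around a hypothetical `ClearSketch` class, illustrates that clearing leaves stale entries behind without affecting later lookups, and that the live elements are exactly the first n slots of dense[]. Returning the elements as a vector instead of printing them is a choice made here so the result can be checked.

```cpp
#include <cassert>
#include <vector>

// Hypothetical minimal set used to show clear() and iteration.
class ClearSketch {
    std::vector<int> dense, sparse;
    int n = 0;

public:
    ClearSketch(int maxVal, int capacity) : dense(capacity), sparse(maxVal + 1) {}

    bool contains(int x) const {
        if (x < 0 || x >= static_cast<int>(sparse.size())) return false;
        int i = sparse[x];
        return i < n && dense[i] == x;
    }

    void insert(int x) {
        if (x < 0 || x >= static_cast<int>(sparse.size())) return;
        if (n >= static_cast<int>(dense.size()) || contains(x)) return;
        dense[n] = x;
        sparse[x] = n++;
    }

    // O(1): neither array is touched, old entries simply become stale.
    void clear() { n = 0; }

    // O(n): the members are exactly dense[0..n-1], in insertion order
    // as long as nothing has been deleted.
    std::vector<int> elements() const {
        return std::vector<int>(dense.begin(), dense.begin() + n);
    }
};
```

After inserting 7, 2 and 9, calling `clear()` and inserting 9 again, the set contains only 9, even though the old entries for 7 and 2 are still physically present in both arrays.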

Illustration:

Let there be a set with two elements {3, 5}, maximum
value as 10 and capacity as 4. The set would be 
represented as below.

Initially:
maxVal   = 10  // sparse[] has maxVal + 1 = 11 entries
capacity = 4   // Size of dense[]
n = 2          // Current number of elements in set

// dense[] Stores actual elements
dense[]  = {3, 5, _, _}

// Uses actual elements as index and stores
// indexes of dense[]
sparse[] = {_, _, _, 0, _, 1, _, _, _, _, _}

'_' means it can be any value and not used in 
sparse set


Insert 7:
n        = 3
dense[]  = {3, 5, 7, _}
sparse[] = {_, _, _, 0, _, 1, _, 2, _, _, _}

Insert 4:
n        = 4
dense[]  = {3, 5, 7, 4}
sparse[] = {_, _, _, 0, 3, 1, _, 2, _, _, _}

Delete 3:
n        = 3
dense[]  = {4, 5, 7, _}
sparse[] = {_, _, _, _, 0, 1, _, 2, _, _, _}

Clear (Remove All):
n        = 0
dense[]  = {_, _, _, _}
sparse[] = {_, _, _, _, _, _, _, _, _, _, _}
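
The trace above can be replayed in code. The sketch below uses a hypothetical `TraceSet` with public fields so that each intermediate state can be compared with the illustration; `replayIllustration` returns true only if every step matches.

```cpp
#include <cassert>
#include <vector>

// Hypothetical compact set with public fields so the arrays can be inspected.
struct TraceSet {
    std::vector<int> dense, sparse;
    int n = 0;

    TraceSet(int maxVal, int capacity) : dense(capacity), sparse(maxVal + 1) {}

    int search(int x) const {
        if (x < 0 || x >= static_cast<int>(sparse.size())) return -1;
        int i = sparse[x];
        return (i < n && dense[i] == x) ? i : -1;
    }
    void insert(int x) {
        if (x < 0 || x >= static_cast<int>(sparse.size())) return;
        if (n >= static_cast<int>(dense.size()) || search(x) != -1) return;
        dense[n] = x;
        sparse[x] = n++;
    }
    void erase(int x) {
        int i = search(x);
        if (i == -1) return;
        int last = dense[n - 1];
        dense[i] = last;
        sparse[last] = i;
        n--;
    }
    void clear() { n = 0; }
};

// Replays the steps of the illustration and checks each state.
bool replayIllustration() {
    TraceSet s(10, 4);
    s.insert(3);
    s.insert(5);                                   // dense = {3, 5, _, _}
    bool ok = s.n == 2 && s.sparse[3] == 0 && s.sparse[5] == 1;
    s.insert(7);                                   // dense = {3, 5, 7, _}
    ok = ok && s.n == 3 && s.sparse[7] == 2;
    s.insert(4);                                   // dense = {3, 5, 7, 4}
    ok = ok && s.n == 4 && s.sparse[4] == 3;
    s.erase(3);                                    // dense = {4, 5, 7, _}
    ok = ok && s.n == 3 && s.dense[0] == 4 && s.sparse[4] == 0;
    ok = ok && s.search(3) == -1;
    s.clear();                                     // n = 0
    ok = ok && s.n == 0 && s.search(5) == -1;
    return ok;
}
```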

Below is the C++ implementation of the above operations.

/* A C++ program to implement Sparse Set and its operations */
#include <algorithm>  // min, max
#include <cstdio>     // printf
using namespace std;
  
// A class that holds the two arrays and the counters
// required to represent a sparse set.
class SSet
{
    int *sparse;   // To store indexes of actual elements
    int *dense;    // To store actual set elements
    int n;         // Current number of elements
    int capacity;  // Capacity of set or size of dense[]
    int maxValue;  /* Maximum value in set; sparse[] has
                      maxValue + 1 entries */
  
public:
    // Constructor
    SSet(int maxV, int cap)
    {
        sparse = new int[maxV+1];
        dense  = new int[cap];
        capacity = cap;
        maxValue = maxV;
        n = 0;  // No elements initially
    }
  
    // Destructor
    ~SSet()
    {
        delete[] sparse;
        delete[] dense;
    }
  
    // If element is present, returns index of
    // element in dense[]. Else returns -1.
    int search(int x);
  
    // Inserts a new element into set
    void insert(int x);
  
    // Deletes an element
    void deletion(int x);
  
    // Prints contents of set
    void print();
  
    // Removes all elements from set
    void clear() { n = 0; }
  
    // Finds intersection of this set with s
    // and returns pointer to result.
    SSet* intersection(SSet &s);
  
    // A function to find union of two sets
    // Time Complexity-O(n1+n2)
    SSet *setUnion(SSet &s);
};
  
// If x is present in set, then returns index
// of it in dense[], else returns -1.
int SSet::search(int x)
{
    // Searched element must be in range [0, maxValue]
    if (x < 0 || x > maxValue)
        return -1;
  
    // The first condition verifies that 'x' is
    // within 'n' in this set and the second
    // condition tells us that it is present in
    // the data structure.
    if (sparse[x] < n && dense[sparse[x]] == x)
        return (sparse[x]);
  
    // Not found
    return -1;
}
  
// Inserts a new element into set
void SSet::insert(int x)
{
    //  Corner cases, x must not be out of
    // range, dense[] should not be full and
    // x should not already be present
    if (x < 0 || x > maxValue)
        return;
    if (n >= capacity)
        return;
    if (search(x) != -1)
        return;
  
    // Inserting into array-dense[] at index 'n'.
    dense[n] = x;
  
    // Mapping it to sparse[] array.
    sparse[x] = n;
  
    // Increment count of elements in set
    n++;
}
  
// A function that deletes 'x' if present in this data
// structure, else it does nothing (just returns).
// By deleting 'x', we unset 'x' from this set.
void SSet::deletion(int x)
{
    // If x is not present
    if (search(x) == -1)
        return;
  
    int temp = dense[n-1];  // Take an element from end
    dense[sparse[x]] = temp;  // Overwrite.
    sparse[temp] = sparse[x]; // Overwrite.
  
    // Since one element has been deleted, we
    // decrement 'n' by 1.
    n--;
}
  
// prints contents of set which are also content
// of dense[]
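void SSet::print()
{
    for (int i=0; i<n; i++)
        printf("%d ", dense[i]);
    printf("\n");
}
  
// A function to find intersection of two sets
SSet* SSet::intersection(SSet &s)
{
    // Capacity and max value of result set
    int iCap    = min(n, s.n);
    int iMaxVal = max(s.maxValue, maxValue);
  
    // Create result set
    SSet *result = new SSet(iMaxVal, iCap);
  
    // Find the smaller of two sets
    // If this set is smaller
    if (n < s.n)
    {
        // Search every element of this set in 's'.
        // If found, add it to result
        for (int i = 0; i < n; i++)
            if (s.search(dense[i]) != -1)
                result->insert(dense[i]);
    }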
    else
    {
        // Search every element of 's' in this set.
        // If found, add it to result
        for (int i = 0; i < s.n; i++)
            if (search(s.dense[i]) != -1)
                result->insert(s.dense[i]);
    }
  
    return result;
}
  
// A function to find union of two sets
// Time Complexity-O(n1+n2)
SSet* SSet::setUnion(SSet &s)
{
    // Find capacity and maximum value for result
    // set.
    int uCap    = s.n + n;
    int uMaxVal = max(s.maxValue, maxValue);
  
    // Create result set
    SSet *result =  new SSet(uMaxVal, uCap);
  
    // Traverse the first set and insert all
    // elements of it in result.
    for (int i = 0; i < n; i++)
        result->insert(dense[i]);
  
    // Traverse the second set and insert all
    // elements of it in result (Note that sparse
    // set doesn't insert an entry if it is already
    // present)
    for (int i = 0; i < s.n; i++)
        result->insert(s.dense[i]);
  
    return result;
}
  
  
// Driver program
int main()
{
    // Create a set set1 with capacity 5 and max
    // value 100
    SSet s1(100, 5);
  
    // Insert elements into the set set1
    s1.insert(5);
    s1.insert(3);
    s1.insert(9);
    s1.insert(10);
  
    // Printing the elements in the data structure.
    printf("The elements in set1 are\n");
    s1.print();
  
    int index = s1.search(3);
  
    //  'index' variable stores the index of the number to
    //  be searched.
    if (index != -1)  // 3 exists
        printf("\n3 is found at index %d in set1\n",index);
    else            // 3 doesn't exist
        printf("\n3 doesn't exist in set1\n");
  
    // Delete 9 and print set1
    s1.deletion(9);
    s1.print();
  
    // Create a set with capacity 6 and max value
    // 1000
    SSet s2(1000, 6);
  
    // Insert elements into the set
    s2.insert(4);
    s2.insert(3);
    s2.insert(7);
    s2.insert(200);
  
    // Printing set 2.
    printf("\nThe elements in set2 are\n");
    s2.print();
  
    // Printing the intersection of the two sets
    SSet *intersect = s2.intersection(s1);
    printf("\nIntersection of set1 and set2\n");
    intersect->print();
  
    // Printing the union of the two sets
    SSet *unionset = s1.setUnion(s2);
    printf("\nUnion of set1 and set2\n");
    unionset->print();
  
    // Release the sets allocated by intersection() and setUnion()
    delete intersect;
    delete unionset;
  
    return 0;
}

Output:

The elements in set1 are
5 3 9 10 

3 is found at index 1 in set1
5 3 10 

The elements in set2 are
4 3 7 200 

Intersection of set1 and set2
3 

Union of set1 and set2
5 3 10 4 7 200 

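Additional operations:
The following operations can also be implemented efficiently with a sparse set. Under the assumption that the range and the maximum number of elements are known, it outperforms all the solutions discussed here, including the bit-vector-based one.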

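union():
1) Create an empty sparse set, say result.
2) Traverse the first set and insert all of its elements into result.
3) Traverse the second set and insert all of its elements into result (note that a sparse set does not insert an entry that is already present).
4) Return result.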

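intersection():
1) Create an empty sparse set, say result.
2) Let the smaller of the two given sets be the first set and the larger be the second.
3) Take the smaller set and search for each of its elements in the second set. If an element is found, add it to result.
4) Return result.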

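A common use of this data structure is in register allocation algorithms in compilers, which work over a fixed universe (the number of registers in the machine) and update and clear the set frequently (just like the Q queries above) during a single processing run.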
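
The pattern of many short rounds over a fixed universe can be sketched as below. This is not a register allocator; `distinctPerRound` is a hypothetical helper that counts the distinct in-range values seen in each round while reusing the same two arrays, resetting the set with a single assignment between rounds.

```cpp
#include <cassert>
#include <vector>

// Hypothetical scenario: for each round, count how many distinct values
// in [0, maxVal] appear, reusing a single sparse set across rounds.
std::vector<int> distinctPerRound(const std::vector<std::vector<int>> &rounds,
                                  int maxVal) {
    std::vector<int> dense(maxVal + 1), sparse(maxVal + 1);
    int n = 0;
    std::vector<int> counts;

    for (const std::vector<int> &round : rounds) {
        n = 0;  // O(1) clear, independent of maxVal
        for (int x : round) {
            if (x < 0 || x > maxVal) continue;     // outside the universe
            int i = sparse[x];
            if (i < n && dense[i] == x) continue;  // already seen this round
            dense[n] = x;
            sparse[x] = n++;
        }
        counts.push_back(n);
    }
    return counts;
}
```

With a bit vector, each round would pay a reset proportional to maxVal; here the per-round cost depends only on the number of values actually processed.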

References:
http://research.swtch.com/sparse
http://codingplayground.blogspot.in/2009/03/sparse-sets-with-o1-insert-delete.html