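How can the following operations be performed efficiently when there is a large number of queries for them?
- Insertion
- Deletion
- Searching
- Clearing/removing all the elements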
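One solution is to use a self-balancing binary search tree such as a Red-Black tree or an AVL tree. With this solution, insertion, deletion and search each take O(Log n) time.
We can also use hashing. With hashing, the first three operations take O(1) time on average, but the fourth operation takes O(n) time.
We can also use a bit vector (or direct access table), but a bit vector also needs O(n) time to clear.
A sparse set does better than BSTs, hashing and bit vectors here: all four operations take O(1) time. We assume that the range of the data (or the maximum value an element can have) and the maximum number of elements that can be stored in the set are given. The idea is to maintain two arrays: sparse[] and dense[].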
dense[]   ==> Stores the actual elements
sparse[]  ==> This is like a bit vector where
              we use elements as index. Here
              values are not binary, but
              indexes of dense array.
maxVal    ==> Maximum value this set can
              store. Size of sparse[] is
              equal to maxVal + 1.
capacity  ==> Capacity of set. Size of dense[]
              is equal to capacity.
n         ==> Current number of elements in
              set.
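insert(x): Let x be the element to be inserted. If x is greater than maxVal, if n (the current number of elements) is greater than or equal to capacity, or if x is already present, we return.
If none of these conditions holds, we store x in dense[] at index n (the position just after the last element in a 0-based array), store n (the index of x in dense[]) in sparse[x], and then increment n by one.
search(x): To search for an element x, we use x as an index into sparse[]. The value sparse[x] is used as an index into dense[]. If sparse[x] is less than n and dense[sparse[x]] equals x, we return sparse[x] (the index of x in dense[]). Otherwise, we return -1.
delete(x): To delete an element x, we overwrite it in dense[] with the last element of dense[] and update that last element's index in sparse[]. Finally, we decrement n by 1.
clear(): Set n = 0.
print(): We can print all the elements by simply traversing dense[] from index 0 to n - 1.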
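The four operations described above can be sketched compactly as follows. This is a minimal sketch, not the implementation given later in this article: the class name `SparseSetSketch` and the method name `erase` are made up for illustration, and it assumes zero-initialized `std::vector` storage, which costs O(maxVal) once at construction but means no uninitialized memory is ever read.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch class (not the SSet class from this article).
class SparseSetSketch {
    std::vector<int> dense_;   // stored elements, packed in positions [0, n_)
    std::vector<int> sparse_;  // sparse_[x] = position of x inside dense_
    int n_ = 0;                // current number of elements

public:
    // Assumption: both vectors are zero-filled up front.
    SparseSetSketch(int maxVal, int capacity)
        : dense_(static_cast<std::size_t>(capacity), 0),
          sparse_(static_cast<std::size_t>(maxVal) + 1, 0) {}

    // Returns the index of x in dense_, or -1 if x is absent.
    int search(int x) const {
        if (x < 0 || x >= static_cast<int>(sparse_.size())) return -1;
        int i = sparse_[x];
        return (i < n_ && dense_[i] == x) ? i : -1;
    }

    // Returns false if x is out of range, the set is full, or x is present.
    bool insert(int x) {
        if (x < 0 || x >= static_cast<int>(sparse_.size())) return false;
        if (n_ >= static_cast<int>(dense_.size()) || search(x) != -1) return false;
        dense_[n_] = x;   // place x just after the last element
        sparse_[x] = n_;  // remember where x lives
        ++n_;
        return true;
    }

    // Overwrites x with the last element, then shrinks the live range.
    bool erase(int x) {
        int i = search(x);
        if (i == -1) return false;
        assert(n_ > 0);
        int last = dense_[n_ - 1];
        dense_[i] = last;
        sparse_[last] = i;
        --n_;
        return true;
    }

    void clear() { n_ = 0; }  // O(1): old entries simply become stale
    int size() const { return n_; }
};
```

Note that `clear()` never touches either array. Stale values are harmless because `search` accepts `sparse_[x]` only when it points into the live range and the slot there holds `x`.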
Illustration:
Let there be a set with two elements {3, 5}, maximum
value as 10 and capacity as 4. The set would be
represented as below.
Initially:
   maxVal   = 10  // sparse[] has maxVal + 1 = 11 slots
   capacity = 4   // Size of dense[]
   n        = 2   // Current number of elements in set
   // dense[] stores the actual elements
   dense[]  = {3, 5, _, _}
   // Uses actual elements as index and stores
   // indexes of dense[]
   sparse[] = {_, _, _, 0, _, 1, _, _, _, _, _}
'_' means the slot can hold any value and is not
used by the sparse set
Insert 7:
   n        = 3
   dense[]  = {3, 5, 7, _}
   sparse[] = {_, _, _, 0, _, 1, _, 2, _, _, _}
Insert 4:
   n        = 4
   dense[]  = {3, 5, 7, 4}
   sparse[] = {_, _, _, 0, 3, 1, _, 2, _, _, _}
Delete 3:
   n        = 3
   dense[]  = {4, 5, 7, _}
   sparse[] = {_, _, _, _, 0, 1, _, 2, _, _, _}
Clear (Remove All):
   n        = 0
   dense[]  = {_, _, _, _}
   sparse[] = {_, _, _, _, _, _, _, _, _, _, _}
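The trace above can be replayed in code to check each step. The following is a small sketch under stated assumptions: `TraceSet`, `live` and `replayIllustration` are hypothetical names introduced only for this example, the sizes 4 and 11 are hard-coded from the illustration (capacity 4, maxVal 10), and the arrays are zero-filled rather than left unset.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper for replaying the illustration; not the article's SSet.
struct TraceSet {
    std::vector<int> dense = std::vector<int>(4, 0);    // capacity = 4
    std::vector<int> sparse = std::vector<int>(11, 0);  // maxVal + 1 = 11
    int n = 0;

    bool contains(int x) const {
        return x >= 0 && x <= 10 && sparse[x] < n && dense[sparse[x]] == x;
    }
    void insert(int x) {
        if (x < 0 || x > 10 || n >= 4 || contains(x)) return;
        dense[n] = x;
        sparse[x] = n;
        ++n;
    }
    void erase(int x) {
        if (!contains(x)) return;
        int last = dense[n - 1];   // take the element at the end
        dense[sparse[x]] = last;   // move it into the hole left by x
        sparse[last] = sparse[x];  // and record its new position
        --n;
    }
    // Live part of dense[] as text, for example "3 5".
    std::string live() const {
        std::string out;
        for (int i = 0; i < n; ++i) {
            if (i > 0) out += ' ';
            out += std::to_string(dense[i]);
        }
        return out;
    }
};

// Returns the live contents of dense[] after each step of the illustration.
std::vector<std::string> replayIllustration() {
    TraceSet s;
    std::vector<std::string> snapshots;
    s.insert(3);
    s.insert(5);
    assert(s.n == 2);
    snapshots.push_back(s.live());  // initial state {3, 5}
    s.insert(7);
    snapshots.push_back(s.live());  // after Insert 7
    s.insert(4);
    snapshots.push_back(s.live());  // after Insert 4
    s.erase(3);
    snapshots.push_back(s.live());  // after Delete 3: 4 takes the place of 3
    s.n = 0;                        // Clear (Remove All)
    snapshots.push_back(s.live());  // empty
    return snapshots;
}
```

The snapshot after the delete step shows why deletion is O(1): only the last element moves, so the order of elements in dense[] is not preserved.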
Below is the C++ implementation of the above functions.
/* A C++ program to implement Sparse Set and its operations */
#include <algorithm> // max, min
#include <cstdio>    // printf
using namespace std;

// A class that holds the data required to
// represent a sparse set.
class SSet
{
    int *sparse;   // To store indexes of actual elements
    int *dense;    // To store actual set elements
    int n;         // Current number of elements
    int capacity;  // Capacity of set or size of dense[]
    int maxValue;  /* Maximum value in set; sparse[]
                      has maxValue + 1 entries */

public:
    // Constructor. The arrays are left uninitialized;
    // search() validates every value read from sparse[].
    SSet(int maxV, int cap)
    {
        sparse = new int[maxV+1];
        dense = new int[cap];
        capacity = cap;
        maxValue = maxV;
        n = 0;  // No elements initially
    }

    // Destructor
    ~SSet()
    {
        delete[] sparse;
        delete[] dense;
    }

    // If element is present, returns index of
    // element in dense[]. Else returns -1.
    int search(int x);

    // Inserts a new element into set
    void insert(int x);

    // Deletes an element
    void deletion(int x);

    // Prints contents of set
    void print();

    // Removes all elements from set
    void clear() { n = 0; }

    // Finds intersection of this set with s
    // and returns pointer to result.
    SSet* intersection(SSet &s);

    // A function to find union of two sets
    // Time Complexity-O(n1+n2)
    SSet *setUnion(SSet &s);
};

// If x is present in set, then returns index
// of it in dense[], else returns -1.
int SSet::search(int x)
{
    // Searched element must be in range
    if (x < 0 || x > maxValue)
        return -1;

    // sparse[x] may hold a stale or never-written
    // value, so first check that it is a valid index
    // into the live part of dense[] (0 .. n-1), then
    // check that the slot it points at really holds 'x'.
    if (sparse[x] >= 0 && sparse[x] < n && dense[sparse[x]] == x)
        return (sparse[x]);

    // Not found
    return -1;
}

// Inserts a new element into set
void SSet::insert(int x)
{
    // Corner cases, x must not be out of
    // range, dense[] should not be full and
    // x should not already be present
    if (x < 0 || x > maxValue)
        return;
    if (n >= capacity)
        return;
    if (search(x) != -1)
        return;

    // Inserting into array-dense[] at index 'n'.
    dense[n] = x;

    // Mapping it to sparse[] array.
    sparse[x] = n;

    // Increment count of elements in set
    n++;
}

// A function that deletes 'x' if present in this data
// structure, else it does nothing (just returns).
// By deleting 'x', we unset 'x' from this set.
void SSet::deletion(int x)
{
    // If x is not present
    if (search(x) == -1)
        return;

    int temp = dense[n-1];     // Take an element from end
    dense[sparse[x]] = temp;   // Overwrite.
    sparse[temp] = sparse[x];  // Overwrite.

    // Since one element has been deleted, we
    // decrement 'n' by 1.
    n--;
}
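
// prints contents of set which are also content
// of dense[]
void SSet::print()
{
    for (int i=0; i<n; i++)
        printf("%d ", dense[i]);
    printf("\n");
}

// A function to find intersection of two sets
SSet* SSet::intersection(SSet &s)
{
    // Capacity and max value of result set
    int iCap = min(n, s.n);
    int iMaxVal = max(s.maxValue, maxValue);

    // Create result set
    SSet *result = new SSet(iMaxVal, iCap);

    // Find the smaller of the two sets.
    // If this set is smaller
    if (n < s.n)
    {
        // Search every element of this set in 's'.
        // If found, add it to result
        for (int i = 0; i < n; i++)
            if (s.search(dense[i]) != -1)
                result->insert(dense[i]);
    }
    else
    {
        // Search every element of 's' in this set.
        // If found, add it to result
        for (int i = 0; i < s.n; i++)
            if (search(s.dense[i]) != -1)
                result->insert(s.dense[i]);
    }
    return result;
}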

// A function to find union of two sets
// Time Complexity-O(n1+n2)
SSet* SSet::setUnion(SSet &s)
{
    // Find capacity and maximum value for result
    // set.
    int uCap = s.n + n;
    int uMaxVal = max(s.maxValue, maxValue);

    // Create result set
    SSet *result = new SSet(uMaxVal, uCap);

    // Traverse the first set and insert all
    // elements of it in result.
    for (int i = 0; i < n; i++)
        result->insert(dense[i]);

    // Traverse the second set and insert all
    // elements of it in result (Note that sparse
    // set doesn't insert an entry if it is already
    // present)
    for (int i = 0; i < s.n; i++)
        result->insert(s.dense[i]);

    return result;
}

// Driver program
int main()
{
    // Create a set set1 with capacity 5 and max
    // value 100
    SSet s1(100, 5);

    // Insert elements into the set set1
    s1.insert(5);
    s1.insert(3);
    s1.insert(9);
    s1.insert(10);

    // Printing the elements in the data structure.
    printf("The elements in set1 are\n");
    s1.print();

    // 'index' stores the position in dense[] of the
    // number being searched for, or -1 if it is absent.
    int index = s1.search(3);
    if (index != -1)  // 3 exists
        printf("\n3 is found at index %d in set1\n", index);
    else              // 3 doesn't exist
        printf("\n3 doesn't exist in set1\n");

    // Delete 9 and print set1
    s1.deletion(9);
    s1.print();

    // Create a set with capacity 6 and max value
    // 1000
    SSet s2(1000, 6);

    // Insert elements into the set
    s2.insert(4);
    s2.insert(3);
    s2.insert(7);
    s2.insert(200);

    // Printing set 2.
    printf("\nThe elements in set2 are\n");
    s2.print();

    // Printing the intersection of the two sets
    SSet *intersect = s2.intersection(s1);
    printf("\nIntersection of set1 and set2\n");
    intersect->print();

    // Printing the union of the two sets
    SSet *unionset = s1.setUnion(s2);
    printf("\nUnion of set1 and set2\n");
    unionset->print();

    // The result sets were allocated with new
    delete intersect;
    delete unionset;
    return 0;
}
Output:
The elements in set1 are
5 3 9 10
3 is found at index 1 in set1
5 3 10
The elements in set2 are
4 3 7 200
Intersection of set1 and set2
3
Union of set1 and set2
5 3 10 4 7 200
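Additional operations:
The following operations can also be implemented efficiently using a sparse set. Under the assumption that the range and the maximum number of elements are known, it outperforms all the solutions discussed here, including the bit vector based solution.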
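union():
1) Create an empty sparse set, say result.
2) Traverse the first set and insert all of its elements into result.
3) Traverse the second set and insert all of its elements into result (note that a sparse set does not insert an entry that is already present).
4) Return result.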
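intersection():
1) Create an empty sparse set, say result.
2) Let the smaller of the two given sets be the first set and the larger be the second set.
3) Take the smaller set and search for each of its elements in the second set. If an element is found, add it to result.
4) Return result.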
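A common use of this data structure is in register allocation algorithms in compilers, which have a fixed universe (the number of registers in the machine) and update and clear the set frequently (just like Q queries) during a single processing run.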
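To illustrate why the O(1) clear matters in such a workload, here is a hedged sketch that reuses one pair of arrays across many passes. The function name `distinctPerBlock` and the "blocks of register ids" input are assumptions made up for this example, not part of any real compiler. It counts the distinct ids in each block and resets the set between blocks by setting n back to 0.

```cpp
#include <cassert>
#include <vector>

// Hypothetical sketch: counts distinct register ids in each block,
// reusing one sparse set over a fixed universe of ids 0 .. numRegs-1.
std::vector<int> distinctPerBlock(const std::vector<std::vector<int>>& blocks,
                                  int numRegs) {
    assert(numRegs >= 0);
    // Allocated once; never re-zeroed between blocks.
    std::vector<int> dense(numRegs, 0);
    std::vector<int> sparse(numRegs, 0);
    std::vector<int> counts;

    for (const auto& block : blocks) {
        int n = 0;  // clear(): O(1), whatever the previous block left behind
        for (int r : block) {
            if (r < 0 || r >= numRegs) continue;  // outside the universe
            bool present = sparse[r] < n && dense[sparse[r]] == r;
            if (!present) {
                dense[n] = r;
                sparse[r] = n;
                ++n;
            }
        }
        counts.push_back(n);
    }
    return counts;
}
```

With a bit vector, each block would instead pay a reset cost proportional to the size of the universe, even when the block touches only a few ids.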
References:
http://research.swtch.com/sparse
http://codingplayground.blogspot.in/2009/03/sparse-sets-with-o1-insert-delete.html