Wavelet Trees

Tags: data-structures, wavelet-trees

Today, I will talk about a new data structure:

I read about them around a week back, and can't find the exact link where this was in discussion.

Consider the following problems, and how I would have solved them normally:

1. Number of elements in subarray $A[L...R]$ that are less than or equal to $y$.

(Persistence Segment Tree?)

2. Number of occurrences of element $x$ in subarray $A[L...R]$.

(Subpart of 1st problem)

3. The $k^{th}$ smallest element in subarray $A[L...R]$.

(Ordered multiset + BIT would work for subarrays beginning from index 1)

I know you might have many other solutions, and you might think what I am trying to prove.

What if I told you, all of the above can be easily done in O(logn) using Wavelet Trees :o. Plus, its very easy to code :D Awesome, isn't it?

I will be covering only range queries today that are mentioned above. There are some update queries too, that I might post in the coming posts.

Okay, so I will be explaining what a wavelet tree is, and how we use it for finding the $k^{th}$ smallest number in any given subarray $A[L...R]$.

I will strongly recommend you to read section 2(around 4 pages) from the above paper. If really lazy, read just 2.5 pages beginning from section 2, and you will get a fair idea of what I am talking below.

So wavelet trees are binary trees, where root node is represented by original array $A$ and the range $[lo, hi]$ in which the elements of array falls.

Let $mid=(lo+hi)/2$

Now, we partition(stable i.e the order of elements do not change) the array $A$ into two parts such that the left part contains all numbers that are $\le mid$, and right part contains all numbers that are $>mid$.

This is something similar to what we do in quicksort. Isn't it?

The range for left part becomes $[lo, mid]$ and for right part becomes $[mid+1, hi]$.

We keep on doing this recursively for left and right parts, so the height of tree will be $O(log(max(A)))$.

You can open the code and refer it as you read all of this.

So for every node $U$ of tree, we have the range $[lo, hi]$ in which its element lies and its array $S[U]$ represents a subsequence of original array.

For every node $U$, we will have a vector $b$ such that $b[i]$ tells us the number of elements from first $i$ elements of $S[U]$ that are $\le mid$. i.e it represents how many elements from first $i$ elements will go to the left subtree of $U$.

We can safely say that $(i-b[i])$ from first $i$ elements will go to right subtree.

Now, the most awesome thing is that elements in subarray $[l, r]$ of $S[U]$ are exactly divided into the left and right subtrees of $U$ depending on whether the element is less than or greater than mid value.

Okay, I will explain a bit more in detail.

Let's consider the subarray of type $[1...R]$.

Now $b[R]$ represents the number of elements that go to left subtree of $U$.

The rest $(R-b[R])$ nos go to right subtree of $U$.

So, we can query the present subarray $[1...R]$ by querying the starting $b[R]$ elements of left subtree and first $(R-b[R])$ elements of right subtree.

When we want to do same thing on general subarrays $[L...R]$, we can see that $b[L-1]$ nos go to left subtree from first $(L-1)$ elements, and $b[R]$ nos go to left from first $R$ elements. This means that those elements in range $[L...R]$ that go to left are exactly represented by nos from position $(b[L-1]+1)$ to $b[R]$ in the left subtree.

Tags: data-structures, wavelet-trees

Today, I will talk about a new data structure:

**Wavelet Trees.**I read about them around a week back, and can't find the exact link where this was in discussion.

**Anyway, you can read about it from this paper. It's something really powerful, easy to code, and perhaps very new to competitive programming(I haven't heard about it in my experience of 2 years).**

Consider the following problems, and how I would have solved them normally:

1. Number of elements in subarray $A[L...R]$ that are less than or equal to $y$.

(Persistence Segment Tree?)

2. Number of occurrences of element $x$ in subarray $A[L...R]$.

(Subpart of 1st problem)

3. The $k^{th}$ smallest element in subarray $A[L...R]$.

(Ordered multiset + BIT would work for subarrays beginning from index 1)

I know you might have many other solutions, and you might think what I am trying to prove.

What if I told you, all of the above can be easily done in O(logn) using Wavelet Trees :o. Plus, its very easy to code :D Awesome, isn't it?

I will be covering only range queries today that are mentioned above. There are some update queries too, that I might post in the coming posts.

Okay, so I will be explaining what a wavelet tree is, and how we use it for finding the $k^{th}$ smallest number in any given subarray $A[L...R]$.

**TUTORIAL**I will strongly recommend you to read section 2(around 4 pages) from the above paper. If really lazy, read just 2.5 pages beginning from section 2, and you will get a fair idea of what I am talking below.

So wavelet trees are binary trees, where root node is represented by original array $A$ and the range $[lo, hi]$ in which the elements of array falls.

Let $mid=(lo+hi)/2$

Now, we partition(stable i.e the order of elements do not change) the array $A$ into two parts such that the left part contains all numbers that are $\le mid$, and right part contains all numbers that are $>mid$.

This is something similar to what we do in quicksort. Isn't it?

The range for left part becomes $[lo, mid]$ and for right part becomes $[mid+1, hi]$.

We keep on doing this recursively for left and right parts, so the height of tree will be $O(log(max(A)))$.

You can open the code and refer it as you read all of this.

So for every node $U$ of tree, we have the range $[lo, hi]$ in which its element lies and its array $S[U]$ represents a subsequence of original array.

For every node $U$, we will have a vector $b$ such that $b[i]$ tells us the number of elements from first $i$ elements of $S[U]$ that are $\le mid$. i.e it represents how many elements from first $i$ elements will go to the left subtree of $U$.

We can safely say that $(i-b[i])$ from first $i$ elements will go to right subtree.

Now, the most awesome thing is that elements in subarray $[l, r]$ of $S[U]$ are exactly divided into the left and right subtrees of $U$ depending on whether the element is less than or greater than mid value.

Okay, I will explain a bit more in detail.

Let's consider the subarray of type $[1...R]$.

Now $b[R]$ represents the number of elements that go to left subtree of $U$.

The rest $(R-b[R])$ nos go to right subtree of $U$.

So, we can query the present subarray $[1...R]$ by querying the starting $b[R]$ elements of left subtree and first $(R-b[R])$ elements of right subtree.

When we want to do same thing on general subarrays $[L...R]$, we can see that $b[L-1]$ nos go to left subtree from first $(L-1)$ elements, and $b[R]$ nos go to left from first $R$ elements. This means that those elements in range $[L...R]$ that go to left are exactly represented by nos from position $(b[L-1]+1)$ to $b[R]$ in the left subtree.

**Kth Smallest Element in subarray [L...R]**
So, if the query is find Kth smallest element in subarray $[L...R]$, lets begin from the root.

Now, $inLeft = (b[R] - b[L-1])$ gives the number of elements from subarray $[L...R]$ that go to left subtree. If this number is more than $K$, then we recursively go to left subtree.

Now, $inLeft = (b[R] - b[L-1])$ gives the number of elements from subarray $[L...R]$ that go to left subtree. If this number is more than $K$, then we recursively go to left subtree.

Otherwise, we go recursively to right subtree and find the $(K-inLeft)$ smallest element in that.

Clearly, the complexity is the height of tree i.e $O(log(maxValue))$.

Refer this code for understanding the implementation and explanation mentioned above.

Finally, this is the complete code that solves all the 3 queries mentioned above.

If you have any doubts, read the paper as mentioned (Section 2), and read the code for better understanding. If doubts still persist, write down below.

Note that there can be space issues with Wavelet Trees if array elements value are around $10^9$. You can avoid them by using coordinate compression. Another approach is to use Wavelet Matrix(Section 4.2) in above paper.

PS: I am really sleepy, and not sure if there are any bugs. I don't think there are but still its better if you test them by submitting on some related problem.

Made slight modification to your code to get the sum of elements less than or equal to k in [l..r] range. https://ideone.com/aYGK7v

ReplyDeleteYes that can also be done :) Nice observation though!

DeleteWhat purpose do the sum function serve?

DeleteTo get the sum of elements less than or equal to k in a given range l, r.

DeleteOK, that's nice.

DeleteThis comment has been removed by the author.

ReplyDeleteI am unable to get correct answer on this Problem www.spoj.com/problems/MKTHNUM/ which is direct implementation of above Data structure. Not able to find bugs in your code.

ReplyDeleteDid you account for negative values? you would have to modify the code to build the tree from negative max value till positive max value, unfortunately that takes too much time and you will get a TLE.

DeleteOne more addition that can be done here to reduce the complexity from O(log(max(A))) to O(log(n)), where n is the number of elements in the array. For this, we can sort the array, and assign unique index positions(mapping) to each possible value in the array, then replace the array with those mappings, then build the array on top of the modified array. For MKTHNUM, this can be done as follows: https://ideone.com/e84JLY

ReplyDelete*then build the wavelet tree on top of the modified array.

Delete