Friday, June 30, 2017

An Interesting Problem On Array Conversion [AtCoder]

rachit jain blog, Rachit Jain, rachit jain iit roorkee, rachit jain iitr, rachit iitr,rachit codechef, rachit hackerrank, rachit jain microsoft
Tags: directed-graphs, hamiltonian paths..

Today, I will talk about a problem from Atcoder.  
You are given 2 arrays A,B. You need to tell the minimum no of moves to convert A into B. In each move, you replace any A[i] by the xor of all elements in A. If its not possible to convert A to B, print -1.

Let us use 1-based indexing.
Now, place xor(A) at position 0. Then the move is basically swapping numbers at position 0 and some arbitrary position i.
So answer will only exist if both multisets of A, B (after including the xor of whole array) are same.

You can refer to editorial given at AtCoder, but I find my approach a bit easier to understand, though they are more or less the same thing.

So, in each move we swap a number $x$ at position $0$ with a number $y$ at position $i$.
Let this be represented by a directed edge from $x$ to $y$. Then, in next move, we will swap $y$ with some $z$, then $z$ with blabla... and finally swap some blabla with $b[0]$.

Isn't this a Hamiltonian Path from $a[0]$ to $b[0]$?

Note that a directed edge from $x$ to $y$ means that the correct position of $x$ is actually the position of $y$ in $A$.
$x→y$ means $x$ is at 0 position, and we place it at position of $y$, and bring $y$ to position $0$.
So an edge represents putting $x$ into its correct position, and then bringing $y$ to position $0$ so that we can place it in its correct position, and so on.

Consider an example,
A: [1, 3, 2]
B: [2, 1, 3]

Now using the available move option of swapping any element with first element, we need to transform array $A$ into $B$.

We perform 1→3 , $A$: [3,1,2]
We perform 3→2, $A$: [2,1,3]

You can see that the set of collection of edges denoting our operations are same as the set of collection of edges {b[i]→a[i] , 0 $\le i \le n$}
This is because for given $a[0]$, and let its position in $B$ be $i$, then we basically swap the numbers at position $0$ and $i$. Thus, the edge $a[0]→a[i]$ is exactly represented by $b[i]→a[i]$.

Now, $B$ is nothing but a permutation of $A$, and our edges are $b[i]→a[i]$. This is a standard thing that we often see in the world of competitive programming. The resultant graph is basically a forest of cycles.

Now since we want a hamiltonian path from $a[0]$ to $b[0]$, we obviously need connectivity. For all cycles that aren't a part of $a[0]$ we will have to break down and re-edit their edges a little bit so that those cycles will also contain $a[0]$ as an element in it.

If you didn't understand what I said above, try converting [1,3,2,7,8,9] to [2,1,3,9,7,8] by using move operation of swapping elements at position 0 and some arbitrary position $i$.

In terms of math, let $C$ be a cycle that doesn't yet have $a[0]$ as a member in it. (For eg: in above example, $7→8→9→7$ is a cycle that doesn't include a[0] i.e 1). Pick two consecutive edges from $C$, $x→y$ and $y→z$.
This means that when $x$ comes at pos $0$, we put it at position of $y$,
and then put $y$ at position of $z$.
Now see what happens when we swap $a[0]$ with $y$. Now $y$ comes at position $0$, and its destination position is still the position of $z$.
But destination position of $x$ is now the position of $a[0]$ after swap as it took the place of $y$.
And finally $a[0]$ has to come back to position $0$ , so destination position of $a[0]$ now becomes position of $y$.
So the new edges are $x→a[0]$, $a[0]→y$ and $y→z$. So $a[0]$ is successfully inserted into the cycle $C$ at the cost of one additional edge.

NOTE:
Bringing $a[0]$ into the cycle was necessary because our move operation allows us to swap any number with the number at position $0$. So once we get $a[0]$, we can apply the operation again and again i.e traverse the edges until the cycle is completed and all numbers are placed at their correct position.  

I hope this opens up your mind a bit. You now have the tools needed. I will recommend you to play around with it, consider a few examples, make the graphs, see how things are functioning.

Finally here is the AC code.

Saturday, June 24, 2017

Introduction to a New Data Structure: WAVELET TREES

Wavelet Trees, Wavelet trees editorial, wavelet trees tutorial, wavelet tree rachit jain, Rachit Jain, rachit jain iit roorkee, rachit iitr, rachit jain iitr,rachit codechef, rachit hackerrank, rachit jain microsoft
Wavelet Trees
Tags: data-structures, wavelet-trees

Today, I will talk about a new data structure: Wavelet Trees.  
I read about them around a week back, and can't find the exact link where this was in discussion.

Anyway, you can read about it from this paper. It's something really powerful, easy to code, and perhaps very new to competitive programming(I haven't heard about it in my experience of 2 years).

Consider the following problems, and how I would have solved them normally:
1. Number of elements in subarray $A[L...R]$ that are less than or equal to $y$.
(Persistence Segment Tree?)
2. Number of occurrences of element $x$ in subarray $A[L...R]$.
(Subpart of 1st problem)
3. The $k^{th}$ smallest element in subarray $A[L...R]$.
(Ordered multiset + BIT would work for subarrays beginning from index 1)

I know you might have many other solutions, and you might think what I am trying to prove.

What if I told you, all of the above can be easily done in O(logn) using Wavelet Trees :o.  Plus, its very easy to code :D Awesome, isn't it?

I will be covering only range queries today that are mentioned above. There are some update queries too, that I might post in the coming posts.

Okay, so I will be explaining what a wavelet tree is, and how we use it for finding the $k^{th}$ smallest number in any given subarray $A[L...R]$.

TUTORIAL
I will strongly recommend you to read section 2(around 4 pages) from the above paper. If really lazy, read just 2.5 pages beginning from section 2, and you will get a fair idea of what I am talking below.

So wavelet trees are binary trees, where root node is represented by original array $A$ and the range $[lo, hi]$ in which the elements of array falls.
Let $mid=(lo+hi)/2$
Now, we partition(stable i.e the order of elements do not change) the array $A$ into two parts such that the left part contains all numbers that are $\le mid$, and right part contains all numbers that are $>mid$.
This is something similar to what we do in quicksort. Isn't it?
The range for left part becomes $[lo, mid]$ and for right part becomes $[mid+1, hi]$.

We keep on doing this recursively for left and right parts, so the height of tree will be $O(log(max(A)))$.

You can open the code and refer it as you read all of this.

So for every node $U$ of tree, we have the range $[lo, hi]$ in which its element lies and its array $S[U]$ represents a subsequence of original array.
For every node $U$, we will have a vector $b$ such that $b[i]$ tells us the number of elements from first $i$ elements of $S[U]$ that are $\le mid$. i.e it represents how many elements from first $i$ elements will go to the left subtree of $U$.
We can safely say that $(i-b[i])$ from first $i$ elements will go to right subtree.

Now, the most awesome thing is that elements in subarray $[l, r]$ of $S[U]$ are exactly divided into the left and right subtrees of $U$ depending on whether the element is less than or greater than mid value.

Okay, I will explain a bit more in detail.
Let's consider the subarray of type $[1...R]$.
Now $b[R]$ represents the number of elements that go to left subtree of $U$.
The rest $(R-b[R])$ nos go to right subtree of $U$.
So, we can query the present subarray $[1...R]$ by querying the starting $b[R]$ elements of left subtree and first $(R-b[R])$ elements of right subtree.

When we want to do same thing on general subarrays $[L...R]$, we can see that $b[L-1]$ nos go to left subtree from first $(L-1)$ elements, and $b[R]$ nos go to left from first $R$ elements. This means that those elements in range $[L...R]$ that go to left are exactly represented by nos from position $(b[L-1]+1)$ to $b[R]$  in the left subtree.

Kth Smallest Element in subarray [L...R]
So, if the query is find Kth smallest element in subarray $[L...R]$, lets begin from the root.
Now, $inLeft = (b[R] - b[L-1])$ gives the number of elements from subarray $[L...R]$ that go to left subtree. If this number is more than $K$, then we recursively go to left subtree. 
Otherwise, we go recursively to right subtree and find the $(K-inLeft)$ smallest element in that.  
Clearly, the complexity is the height of tree i.e $O(log(maxValue))$.  

Refer this code for understanding the implementation and explanation mentioned above.

Finally, this is the complete code that solves all the 3 queries mentioned above. 

If you have any doubts, read the paper as mentioned (Section 2), and read the code for better understanding. If doubts still persist, write down below.  

Note that there can be space issues with Wavelet Trees if array elements value are around $10^9$. You can avoid them by using coordinate compression. Another approach is to use Wavelet Matrix(Section 4.2) in above paper.

PS: I am really sleepy, and not sure if there are any bugs. I don't think there are but still its better if you test them by submitting on some related problem.

Wednesday, June 14, 2017

A Superb Problem on Hashing + Queries on Array [CodeChef]

Rachit Jain, rachit jain blog, rachit jain iitr, rachit iit roorkee, rachit iitr, rachit codechef, rachit hackerrank, rachit jain microsoft
Today, I will talk about this June Challenge Problem from Codechef.

The beauty about the problem is the solution. I broke down the problem into subproblems, solved them, combined them. Though there exist multiple solutions, I think my solution is worth sharing. You can surely learn a lot of things from this ;)

The problem is: you are given an array, and in each query you receive two subarrays. You need to tell whether these subarrays are almost similar.

Two subarrays are almost similar if after you sort them both, there's atmost one mismatch.

So how do I do this in O(logn) per query? Or maybe O(1) ?
I was extremely puzzled how shall I carry this forward and build a solution.
Using hashing, I can compute hash value of any subarray in O(1) using partial sums.
But this only allows to compare whether two subarrays are IDENTICAL, and what we need is to allow a mismatch.

If you do not understand any of the following, take time and think. It will be clear.

So, here is what I did:
1. Compute the hash values of both subarrays: $h1,h2$
2. If they are equal, we are done.
3. Otherwise, let $val = |h1-h2|$  and let $x,y$ be the mismatched elements in two subarrays.
4. Then I computed xor of both subarrays. This gives us xor of $x$ and $y$.
5. This can never be zero, so there's a bit set in this. Otherwise, print NO. Let the bit set be $k^{th}$ bit.
6. I found the xor of both subarrays such that I only took those numbers into consideration whose $k^{th}$ bit is set. Let that be $v1,v2$ respectively.

Now xor of $v1, v2$ will exactly give us that value of one of $x,y$.  Think why this is true.
For now, let's assume we get the value $x$.
Then I check whether the resultant value ($x$)is present in a subarray or not. If it isn't present in any subarray, print NO.

We can find the other variable i.e $y$. How?
1. Let $hash[x]$ be the hashed value of numebr $x$.
2. Then the hash sum of other elements that are common to both, $com=h1-hash[x]$.
3. Then what is $tem=(h2-com)$?

$tem$ is the hashed value of element $y$. So we check in reverse map to see the corresponding number, and finally get the value of $y$. If there is no number present in reverse map, print NO.


Now we have successfully found the mismatch pair.  But we must keep in mind that if we have come this far, this doesn't mean the answer exist. The subarrays still might NOT be almost similar.
See this
$A=\{1,3,5\}$
$B=\{1,5,6\}$
The mismatch elements are $x=3,y=6$ as other elements are same. But once we sort them, their are more than 1 indices where a mismatch occurs. This is because the rank of mismatch pairs wasn't same.
Now only thing left is to check if the rank of $x$ in first subarray and rank of $y$ in second subarray is same. For this I used a bit with nodes being an ordered multiset.

I really felt so good after doing this.

Finally the AC code for this problem is here.

Art of Time Management | Achieving Multiple Things in Life

rachit jain blog, Rachit Jain, rachit jain iit roorkee, rachit jain iitr, rachit iitr,rachit codechef, rachit hackerrank, rachit jain micr...