this June Challenge Problem from Codechef.

The beauty about the problem is the solution. I broke down the problem into subproblems, solved them, combined them. Though there exist multiple solutions, I think my solution is worth sharing. You can surely learn a lot of things from this ;)

The problem is: you are given an array, and in each query you receive two subarrays. You need to tell whether these subarrays are almost similar.

Two subarrays are almost similar if after you sort them both, there's atmost one mismatch.

So how do I do this in O(logn) per query? Or maybe O(1) ?

I was extremely puzzled how shall I carry this forward and build a solution.

Using hashing, I can compute hash value of any subarray in O(1) using partial sums.

But this only allows to compare whether two subarrays are IDENTICAL, and what we need is to allow a mismatch.

If you do not understand any of the following, take time and think. It will be clear.

So, here is what I did:

1. Compute the hash values of both subarrays: $h1,h2$

2. If they are equal, we are done.

3. Otherwise, let $val = |h1-h2|$ and let $x,y$ be the mismatched elements in two subarrays.

4. Then I computed xor of both subarrays. This gives us xor of $x$ and $y$.

5. This can never be zero, so there's a bit set in this. Otherwise, print NO. Let the bit set be $k^{th}$ bit.

6. I found the xor of both subarrays such that I only took those numbers into consideration whose $k^{th}$ bit is set. Let that be $v1,v2$ respectively.

Now xor of $v1, v2$ will exactly give us that value of one of $x,y$. Think why this is true.

For now, let's assume we get the value $x$.

Then I check whether the resultant value ($x$)is present in a subarray or not. If it isn't present in any subarray, print NO.

We can find the other variable i.e $y$. How?

1. Let $hash[x]$ be the hashed value of numebr $x$.

2. Then the hash sum of other elements that are common to both, $com=h1-hash[x]$.

3. Then what is $tem=(h2-com)$?

$tem$ is the hashed value of element $y$. So we check in reverse map to see the corresponding number, and finally get the value of $y$. If there is no number present in reverse map, print NO.

Now we have successfully found the mismatch pair. But we must keep in mind that if we have come this far, this doesn't mean the answer exist. The subarrays still might NOT be almost similar.

See this

$A=\{1,3,5\}$

$B=\{1,5,6\}$

The mismatch elements are $x=3,y=6$ as other elements are same. But once we sort them, their are more than 1 indices where a mismatch occurs. This is because the rank of mismatch pairs wasn't same.

Now only thing left is to check if the rank of $x$ in first subarray and rank of $y$ in second subarray is same. For this I used a bit with nodes being an ordered multiset.

I really felt so good after doing this.

Finally the AC code for this problem is here.

Today, I will talk about The beauty about the problem is the solution. I broke down the problem into subproblems, solved them, combined them. Though there exist multiple solutions, I think my solution is worth sharing. You can surely learn a lot of things from this ;)

The problem is: you are given an array, and in each query you receive two subarrays. You need to tell whether these subarrays are almost similar.

Two subarrays are almost similar if after you sort them both, there's atmost one mismatch.

So how do I do this in O(logn) per query? Or maybe O(1) ?

I was extremely puzzled how shall I carry this forward and build a solution.

Using hashing, I can compute hash value of any subarray in O(1) using partial sums.

But this only allows to compare whether two subarrays are IDENTICAL, and what we need is to allow a mismatch.

If you do not understand any of the following, take time and think. It will be clear.

So, here is what I did:

1. Compute the hash values of both subarrays: $h1,h2$

2. If they are equal, we are done.

3. Otherwise, let $val = |h1-h2|$ and let $x,y$ be the mismatched elements in two subarrays.

4. Then I computed xor of both subarrays. This gives us xor of $x$ and $y$.

5. This can never be zero, so there's a bit set in this. Otherwise, print NO. Let the bit set be $k^{th}$ bit.

6. I found the xor of both subarrays such that I only took those numbers into consideration whose $k^{th}$ bit is set. Let that be $v1,v2$ respectively.

Now xor of $v1, v2$ will exactly give us that value of one of $x,y$. Think why this is true.

For now, let's assume we get the value $x$.

Then I check whether the resultant value ($x$)is present in a subarray or not. If it isn't present in any subarray, print NO.

We can find the other variable i.e $y$. How?

1. Let $hash[x]$ be the hashed value of numebr $x$.

2. Then the hash sum of other elements that are common to both, $com=h1-hash[x]$.

3. Then what is $tem=(h2-com)$?

$tem$ is the hashed value of element $y$. So we check in reverse map to see the corresponding number, and finally get the value of $y$. If there is no number present in reverse map, print NO.

Now we have successfully found the mismatch pair. But we must keep in mind that if we have come this far, this doesn't mean the answer exist. The subarrays still might NOT be almost similar.

See this

$A=\{1,3,5\}$

$B=\{1,5,6\}$

The mismatch elements are $x=3,y=6$ as other elements are same. But once we sort them, their are more than 1 indices where a mismatch occurs. This is because the rank of mismatch pairs wasn't same.

Now only thing left is to check if the rank of $x$ in first subarray and rank of $y$ in second subarray is same. For this I used a bit with nodes being an ordered multiset.

I really felt so good after doing this.

Finally the AC code for this problem is here.

Another approach for one mismatch could be to find longest prefix and longest suffix matches for the given segments using persistent segtree. For one mismatch, the sum of those two should be equal to (length of segment - 1).

ReplyDeleteAh nice!

DeleteThis is really a nice approach.

ReplyDeleteEven I solved using hashing but I tried to detect similarity using the frequency distribution of the 2 sub-arrays . If fA and fB denotes the frequency distribution of the 2 sub-arrays then the necessary and sufficient conditions for exactly one dissimilarity is :

There exists exactly 2 numbers i and j such that

1). fA[i] = fB[i] - 1 and fA[j] = fB[j] + 1

2). and there are no numbers in the range [i ,j] in both the sub-arrays.

When I was solving this problem , I was thinking of much general problem namely finding the hamming distance between two sub-strings. Is there any known algorithm or a widely known way to solve it.

Please if you can tell me how you will hash any subarray in O(1) using partial sum.

ReplyDeleteIf you will share some link or just explain it will be very helpful:)

let dp[i] = sum of hash values of a[1],a[2],...,a[i].

DeleteThen dp[0] = 0, dp[i] = h[a[i]] + dp[i-1].

hash value for subarray A[L...R] = dp[R] - dp[L-1].

This comment has been removed by the author.

DeleteNice approach :D I've learned a lot from your blog to this problem! I was unable to solve it :P

ReplyDeleteIf you have x and x^y, you immediately get y

ReplyDeleteNice solution, btw!

DeleteYes, you are right.

DeleteThanks.

DeleteAlso you can solve last subproblem offline for all queries with classic Bit.

ReplyDeleteHmm I was thinking about your solution when I was running today. And I'm afraid that it may not be correct. Consider this case:

ReplyDelete{a1, a2, ..., x, b1, b2, ...}

{c1, c2, ..., y, d1, d2, ...}

It may happen that two sets are not similar even though rank of x and y is the same. It's enough to find such number a_i, b_i, c_i and d_i that a1 ^ a2 ^ ... ^ b1 ^ b2 ^ ... c1 ^ c2 ^ ... ^ d1 ^ d2 ^ ... = 0 and some a_i != c_i or b_i != d_i.

Example:

{0, 0, 6, 15}

{1, 2, 7, 12}

x = 6, y = 7

It's not easy, so probably testcases do not contain such example.

Am I right?

You have not understood how my algorithm works or missing the fact that I validate my solution by checking the hash sums.

DeletePlease point me where I'm wrong.

Delete