Sunday 15 June 2014

The 3SUM Problem

Another very frequently-visited problem is "The 3SUM Problem", sometimes also known as the Subset Sum Problem This problem lays path to many concepts in Algorithmic Science such as sub-quadratic time problems, 3SUM hard problems, etc.

INPUT :

A sequence (taken in as an array A) of n distinct integers, preferably whose absolute value (modulus) can be given by a polynomial p(n).

OUTPUT :
The combinations of 3 integers from the sequence whose sum equals 0 i.e. the numbers i, j and k such that i+j+k = 0. Note that the integers i, j, k have to be distinct and the combinations themselves have to be distinct. For example, i,j,k and k,j,i represent the same combination and therefore, are not distinct.


THE "BRUTE FORCE" METHOD :
The obvious and perhaps the most trivial method would be to scan the array 3 times in one direction (to ensure distinctness of the combinations) and take every possible combination of 3 integers and check whether their sum is 0. For an n-input array, the number of combinations will be nC3 = O(n3). The following algorithm is an illustration of the above method.

Algorithm 3SUM (A, n) 
  1. for i = 0 upto (n-2)
  2.        for j = i+1 upto (n-1)
  3.              for k = j+1 upto n
  4.                     if (A[i] + A[j] + A[k] == 0) then print A[i], A[j], A[k] 
Thus, we see that the algorithm is nothing but 3 nested for loops giving a performance of O(n3). Can we do any better?

THE BINARY-SEARCH METHOD :
This method uses the idea that instead of traversing the array for 3 elements requiring 3 nested for loops, we can traverse the array only 2 times using the following simple manipulation:
                            if x+y+z = 0, then z = - (x+y)
Thus, if we choose a particular "x" and a "y", we just need to search the array for the "z". This searching can be done by an efficient search technique, The Binary Search Technique. This technique, however requires us to first sort the array, hence that will be our first step. 

We assume a routine binarySearch(Array, low, high, key) which searches for "key" within the bounds "low" and "high" and returns the index of the key, if found or if not found; returns -1.

Algorithm 3SUM PLUS (A, n)
  1. sort(A)
  2. for i = 0 upto n
  3.       for j = i+1 upto (n-1)
  4.            k = binarySearch(A, j, n, -(A[i] + A[j]) )
  5.            if(k != -1) then print A[i], A[j], k
The if statement is executed in O(1). We know that binarySearch has a performance of O(log2n). However, it is called within 2 nested for loops i.e. nC2 = O(n2) times, thus giving the routine 3SUM PLUS  a performance of  O(n2*log2n). Note that the method used for sorting can be anything from insertion (O(n2)) to merge sort (O(n*log2n)) as they are both asymptotically smaller than O(n2*log2n). However, can we do still better? (obviously we can, or else this question wouldnt be here!)

THE HASH MAP METHOD :
We continue with the above idea of traversing the array only twice instead of 3 times. We will prove later that the sorting technique used doesn't play a role in the final performance, so the only scope of improvement is in the search technique. We need a search technique faster than the Binary Search. Hash-based searching is an example. Searches performed using a hash table take O(1) if we know the hash index or the function that gives the hash index. Therefore, we will copy the entire array into a hash table and then inside the nested for loop, we search the hash table for the integer z = - (x+y). 

Algorithm 3SUM PLUS PLUS (A, n)
  1. sort(A)
  2. for i = 0 upto n
  3.       hash.put(<unique key determined by a hash function>, A[i])
  4. for ends
  5. for i = 0 upto n
  6.       for j = i+1 upto (n-1)
  7.           k = hash.search(- (A[i] + A[j]) )
  8.           if (k != null) then print A[i], A[j], k        
  9.   nested for ends
Here again, we have 2 for loops nested giving O(n2). Thus again, the method used in sorting doesn't affect the overall performance. 3SUM PLUS PLUS routine takes O(n2) which is asymptotically better than O(n2*log2n) and O(n3). 
Wikipedia offers another interesting method which also gives O(n2) but does not use hash tables. Go, have a look here

FUN FACT : Although it hasn't been successfully proved that the 3SUM problem can't be solved faster than O(n2), no better algorithm has yet been found. Do you have it in you to find it? Leave us a comment!

No comments:

Post a Comment