Friday 30 May 2014

The Maximum Subarray Problem

Let us take another common problem in the world of computing and see various algorithms to solve it. Once again, we will start will a trivial or a "naive" method and then analyze and try to improve the efficiency of our algorithms.

INPUT :
A sequence of n real numbers (can be positive or negative)  which may indicate real-world numbers like stock prices over a period, or the business profit sheet of an individual over a period, where negative numbers indicate loss.
OUTPUT :
A smaller sequence from the original sequence wherein the numbers' sum is the maximum. This sequence may indicate the period wherein its most profitable to hold a stock as its value is the maximum or the period wherein the individual made the maximum profit. Since, practically, the sequence of numbers will be inputted as an array by the computer, we need to find the maximum non-empty contiguous subarray of that array.

1) THE TRIVIAL METHOD :
The trivial method is to traverse the entire array once and look for the difference in every combination-pair of the numbers. For an n-element array, the number of possible combinations will be nC2= n(n-1)/2. This takes 
ϴ(n2) asymptotically. (Recall that ϴ notation denotes asymptotically bound function between two functions c1f(n) and c2f(n), thus giving the "average case".) Also, the comparison of the various differences will take a constant amount of time, thus giving the trivial method a Ω(n2).

2) THE DIVIDE AND CONQUER METHOD :
The Divide and Conquer approach is a very common method in Computer Science problems. It works on 3 basic stages:
  1. Divide
  2. Solve/Conquer
  3. Merge
Recursion is very widely used in divide-and-conquer approaches. Our divide-and-conquer algorithm works on the principle that an array can be divided into m (preferably equal) parts and then we can search each part-array for the maximum subarray. The combination of the maximum number of such contiguous part-subarrays will eventually merge into the required subarray. For our convenience, let us take m to be 2. Our algorithm has the following stages.
  1. Recursively divide the array into 2 equal parts
  2. Find the maximum subarray on the left half of the array
  3. Find the maximum subarray on the right half of the array
  4. Find the maximum subarray that "crosses over" the division i.e. that contains the point of division
Our pseudocode of the above algorithm accepts 3 parameters: The array A, a starting index "low", an ending index "high" and returns 3 parameters: a starting index lower, an ending index "higher" and the maximum sum "maxSum".

Algorithm findMaxSubarray(A, low, high)
  1. if low == high then return (low, high, A[low])
  2. mid = ⌊(low+high)/2⌋
  3. (left-low, left-sum, left-high) = findMaxSubarray(A, low, mid)
  4. (cross-low, cross-sum, cross-high) = findMaxCrossOverSubarray(A, low, mid, high)
  5. (right-low, right-sum, right-high) = findMaxSubarray(A, mid, high)
  6. if left-sum > cross-sum and left-sum > right-sum then return (left-low, left-sum, left-high)
  7. else if right-sum > cross-sum and right-sum > left-sum then return (right-low, right-sum, right-high)
  8. else return (cross-low, cross-sum, cross-high)
The subroutine "findMaxCrossOverSubarray" returns the maximum subarray that "crosses-over" the midpoint "mid". The pseudocode of this subroutine is:

Algorithm findMaxCrossOverSubarray(A, low, mid, high)
  1. leftSum = rightSum = -
  2. leftIndex = rightIndex = -
  3. sum = 0
  4. for i = mid downto low
  5. sum += A[i]
  6. if leftSum < sum then leftSum = sum, leftIndex = i
  7. for j = mid upto high
  8. sum += A[j]
  9. if rightSum < sum then rightSum = sum, rightIndex = j
  10. return (leftIndex, leftSum+rightSum, rightIndex)
It can be proved that the above pseudocodes take ϴ(n*log2n), which is way better than the trivial method. The java version of the above pseudocodes is as follows:

NOTE :
As we are attempting to return more than one parameter from the function, we cannot do it in Java. Thus, we encapsulate all the 3 different entities into a user-defined data type (class). This introduces the overhead of instantiating the class objects and their handling. Thus, we see, that although the time asymptotically remains ϴ(n*log2n), the overhead memory allocation increases in Java. In languages like Python where functions can return more than one variable, this overhead is avoided. This is a classic example of the constraints introduced during actual application of algorithms into languages.

KADANE'S ALGORITHM:
We end our discussion with the best possible algorithm for the problem stated. Kadane introduced an algorithm to solve the maximum subarray problem in O(n) since it is nothing but a for loop which is traversed n times. Kadane's algforithm is an implementation of "the two finger rule" which states that one finger is kept at the first index of the array and another finger is allowed to scan through the array finding the sum of the elements as it goes". The java program which implements Kadane's Algorithm is:


No comments:

Post a Comment