Programming Concepts And Algorithms: Arrays

Sunday, 8 May 2016

Translate Numbers to Their English Phrases

Problem Statement:
Write a program that takes an integer from 0 to 99,999 as input and outputs its English phrase.
For example:

Input: 77890
Output: Seventy Seven Thousand, Eight Hundred, and Ninety

Breaking down the problem:
Such programs are commonly used in NLP to communicate with a user. They rarely need complex data structures and are rarely complicated in themselves; mostly they are a chain of if-else conditions that handle particular cases. What they test is the thoroughness of the programmer, who has to handle every possible input.

If one observes the above output, the commas and the word "and" are not associated with any digit in the number. However, according to the problem statement, they are very important. Such elements, which are added to make the output "look better" or read "closer to English", are often the main hurdles.

Hence, there are two main sub-problems here. The first is to interpret the digits and convert them into their English words (taking care of their place in the number, such as the units place, tens place, etc.). The second is to place the commas and the word "and" correctly.

Approach:
The main hint here is the range of inputs: from 0 to 99,999. If one observes carefully, the last two digits and the first two digits each make one phrase in English. In the above input, the last two digits "90" form "Ninety" and the first two digits "77" form "Seventy Seven" (followed by "Thousand", their place value, which is easy to predict). Hence, these will be considered in pairs. The third digit, i.e. the digit in the hundreds place, is always alone. There is always a "Seven Hundred" or a "Three Hundred"; there is never a "Seventy Three Hundred". Hence, logically, it is to be considered alone.
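
The grouping described above can be sketched with plain integer division and remainder. This is only an illustration: the class name DigitGroups and the method split are hypothetical names chosen for this sketch.

```java
final class DigitGroups {
    // Splits n (0..99999) into the three groups discussed above:
    // {first two digits, hundreds digit, last two digits}.
    static int[] split(int n) {
        if (n < 0 || n > 99_999) {
            throw new IllegalArgumentException("expected 0..99999, got " + n);
        }
        int firstPair = n / 1000;        // thousands group, 0..99
        int hundreds = (n / 100) % 10;   // the digit that always stands alone
        int lastPair = n % 100;          // tens and units, 0..99
        return new int[] {firstPair, hundreds, lastPair};
    }
}
```

For the sample input 77890, split returns the groups 77, 8 and 90.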

Our first step should be to teach our program the English words, starting with the basics. The program needs to know that 0 becomes "Zero", 1 becomes "One" and so on. It also needs to know that 2 in the tens place (or in the higher position whenever pairs are considered by the above logic) becomes "Twenty", 3 in the same position becomes "Thirty" and so on.

English is a funny language. It is often considered a tough language for a program to learn, because of its sudden and seemingly illogical variations. For example, 11 has to become "Eleven" and not "Onety One". There is no reason why it cannot be "Onety One" for the sake of symmetry. But, such is life, and hence our program now has to learn the phrases for the numbers from 11 to 19 (and, by the same argument, "Ten" for 10).

Trivia: The oldest Indian language, Sanskrit, is widely regarded as the easiest language for a program to learn owing to its symmetry.

The next step is to implement the above approach. This is easy enough to do with a nested if-else tree. However, identifying all the boundary conditions and cases and accommodating them was a pain for the author himself. For example, it is easy to overlook a small flaw in the above approach. If we take digits in pairs and look up their words in an array, numbers with 0 as the lower digit become a problem: 77 becomes "Seventy Seven", but 70 is just "Seventy" and not "Seventy Zero". Hence, we need to make the 0th index of our array "unitsPlace" an empty string. However, another issue now arises: how do we handle an input that is itself just 0? This can be handled by adding yet another condition that checks whether the input is 0; if it is, the program simply prints "Zero" and exits. Any other 0s that come up will be translated to the empty string.
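
A minimal sketch of the lookup for one two-digit group, assuming three word tables. The array name unitsPlace comes from the text (written here as the constant UNITS_PLACE); TEENS, TENS_PLACE, pairToWords and smallNumberToWords are hypothetical names used only for illustration.

```java
final class PairWords {
    // Index 0 is deliberately empty so that 70 reads "Seventy", not "Seventy Zero".
    static final String[] UNITS_PLACE = {
        "", "One", "Two", "Three", "Four", "Five", "Six", "Seven", "Eight", "Nine"
    };
    // 10-19 do not follow the tens-plus-units pattern, so they get their own table.
    static final String[] TEENS = {
        "Ten", "Eleven", "Twelve", "Thirteen", "Fourteen",
        "Fifteen", "Sixteen", "Seventeen", "Eighteen", "Nineteen"
    };
    // Indices 0 and 1 are never read: values below 20 are handled before this table is used.
    static final String[] TENS_PLACE = {
        "", "", "Twenty", "Thirty", "Forty", "Fifty", "Sixty", "Seventy", "Eighty", "Ninety"
    };

    // Translates a two-digit group (0..99). A group of 0 yields the empty string.
    static String pairToWords(int pair) {
        if (pair < 0 || pair > 99) {
            throw new IllegalArgumentException("expected 0..99, got " + pair);
        }
        if (pair < 10) return UNITS_PLACE[pair];
        if (pair < 20) return TEENS[pair - 10];
        String tens = TENS_PLACE[pair / 10];
        String units = UNITS_PLACE[pair % 10];
        return units.isEmpty() ? tens : tens + " " + units;
    }

    // The whole-input check: only an input of exactly 0 is spelled "Zero".
    static String smallNumberToWords(int n) {
        return n == 0 ? "Zero" : pairToWords(n);
    }
}
```

The special case for 0 lives outside pairToWords on purpose, so that a zero group inside a larger number still contributes nothing to the phrase.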

The final step handles ", and": a check is added so that a phrase is never left hanging with those words as its last. Some word must follow "and"; otherwise the program has an error.
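
One way to keep ", and" from dangling is to add a separator only when a non-empty phrase is about to follow it. The sketch below assumes the three groups have already been translated to phrases (an empty string for a zero group), and it assumes that ", and" introduces only the last two-digit group; the source fixes the format for 77890 but does not spell out every other case, so that rule is a guess. PhraseJoiner and join are hypothetical names.

```java
final class PhraseJoiner {
    // Each argument is the finished phrase for one group, e.g. "Seventy Seven Thousand",
    // "Eight Hundred", "Ninety", or "" when that group is zero.
    static String join(String thousands, String hundreds, String lastPair) {
        StringBuilder out = new StringBuilder(thousands);
        if (!hundreds.isEmpty()) {
            // A comma is written only if something already precedes the hundreds.
            if (out.length() > 0) out.append(", ");
            out.append(hundreds);
        }
        if (!lastPair.isEmpty()) {
            // ", and" is written only here, immediately before a non-empty phrase,
            // so it can never end up as the last words of the output.
            if (out.length() > 0) out.append(", and ");
            out.append(lastPair);
        }
        return out.toString();
    }
}
```

With the phrases for 77890 this produces "Seventy Seven Thousand, Eight Hundred, and Ninety", and with only a thousands phrase it produces that phrase unchanged.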

Friday, 11 December 2015

The N Transmitters Problem

The Problem:
   There is a one-dimensional road of infinite length, which extends from -∞ to +∞. There are "n" one-dimensional transmitters placed on the road at various points. Each transmitter emits a signal. For every transmitter, the signal can be transmitted across a radius "r", which may differ from transmitter to transmitter. Since the road is one-dimensional, by radius "r" we mean that the signal can be received up to "r" distance to its left and "r" distance to its right.
   There is a group of one-dimensional beings that wish to stay on this one-dimensional road. A region on this road is said to be habitable if and only if it can get at least "k" signals from "k" different transmitters. Given the road and the transmitters and their signal radius and the value of "k", we need to find out all the regions that can be considered habitable.

Assumptions:
   1) There is no concept of signal strength in our world. A signal will have full strength up to its radius "r" and beyond that, it will have zero strength.
   2) We have considered, without loss of generality in the algorithm, the coordinates and the ranges of the transmitters to be integers. There will be no change in the algorithm in case we take them as floats or doubles.
   3) There will not be more than one transmitter on a point. If the input does place several on one point, only the transmitter whose signal can travel farthest will be considered.

Brute Force Approach:
The brute force approach would be to keep, for each of the n transmitters, the set of regions to which it can provide signals. The required answer is then made up of the intersections of every choice of k sets out of the n sets just formed. The time complexity of this approach is O(ⁿC_k).
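
For a single choice of k transmitters, the intersection is easy to compute, because the common stretch of several ranges runs from the largest left limit to the smallest right limit. A minimal sketch of that one step follows; the row layout {coordinate, radius} and the names RangeIntersection and commonRegion are assumptions made for illustration. The brute force would repeat this for every choice of k transmitters.

```java
final class RangeIntersection {
    // chosen: one row per selected transmitter, laid out as {coordinate, radius}.
    // Returns {left, right} of the stretch covered by all of them,
    // or null when the selected ranges share no point.
    static int[] commonRegion(int[][] chosen) {
        int left = Integer.MIN_VALUE;
        int right = Integer.MAX_VALUE;
        for (int[] t : chosen) {
            left = Math.max(left, t[0] - t[1]);    // largest left limit so far
            right = Math.min(right, t[0] + t[1]);  // smallest right limit so far
        }
        return left <= right ? new int[] {left, right} : null;
    }
}
```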

O(n log n) Approach:

   Algorithm Axiom: Solutions become much more efficient when the data is arranged in some particular order.
   Imagine us being a part of the one-dimensional world mentioned above. We have an infinite road in front of us. Beginning our search from somewhere in the middle is difficult because we do not know about the transmitters to our left and right, so we can make no sound decision about which direction to start looking for habitable regions in. If, however, someone tells us that the point we are currently at is the left-most end of the range of the left-most transmitter, we know for a fact that there can be no habitable regions to our left. This makes our next step clear: go to the right in search of habitable regions.
   Coming back to our algorithm, suppose we sort the input transmitters in non-decreasing order of their coordinates. Starting from the first transmitter, we traverse the list of transmitters in one direction while keeping track of whether the region we are currently in is habitable or not. The only question that now remains is how to keep track of habitable regions. It is impossible to keep a count of the signals available at every point on the road, as there are infinitely many points.
   Point to be noted: All habitable regions will begin at a transmitter's left-most signal limit and end at (possibly another) transmitter's right-most signal limit.
Our task is simplified if we take the above point into consideration. To identify a region, we only require its start and end points, and these points will only be the left and right limits of some transmitters. Hence, instead of analyzing all points, we analyze only those points where a transmitter's range begins or ends. There are at most "2n" such points for "n" transmitters.
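
Collecting those 2n points can be sketched as follows. This is an illustration rather than the author's listing: the input layout {coordinate, radius}, the output layout {position, delta} and the names SignalLimits and limits are all assumptions. The sort at the end is where the O(n log n) cost comes from. At equal positions a "begin" is placed before an "end", which treats the ranges as closed; that tie rule is also an assumption.

```java
import java.util.Arrays;
import java.util.Comparator;

final class SignalLimits {
    // transmitters: one row per transmitter, laid out as {coordinate, radius}.
    // Returns 2n rows of {position, delta}: delta is +1 where a range begins
    // and -1 where a range ends, sorted by position from left to right.
    static int[][] limits(int[][] transmitters) {
        int[][] events = new int[2 * transmitters.length][];
        int i = 0;
        for (int[] t : transmitters) {
            events[i++] = new int[] {t[0] - t[1], +1}; // left limit: signal begins
            events[i++] = new int[] {t[0] + t[1], -1}; // right limit: signal ends
        }
        // Sort by position; on ties, +1 rows come before -1 rows.
        Arrays.sort(events,
                Comparator.comparingInt((int[] e) -> e[0]).thenComparingInt(e -> -e[1]));
        return events;
    }
}
```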

Algorithm NTransmitters (transmittersList, Integer k):
1) sortAscending(transmittersList) on their coordinates;

2) beginsEnds := list of all the left-limits and right-limits of all the transmitter signals, in ascending order of position;
3) count := 0;
   Traverse beginsEnds from beginsEnds[0]; //beginsEnds[0] will hold the left-most point of the road where a signal is being received. The traversal will not include all the points, just the "begin" and "end" points of the list.
   For each beginsEnds[i], perform steps 4 and 5:
4) If beginsEnds[i] = "begin":
         count := count+1
    Else:
         count := count-1
5) If count >= k:
         output the region from beginsEnds[i] to beginsEnds[i+1] as habitable;
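
The traversal in steps 3 to 5 can be sketched as a single pass over the limit points. This is a minimal illustration, not the author's code. It assumes the limits arrive as rows of {position, delta}, with delta +1 for a "begin" and -1 for an "end", already sorted by position with begins first on ties. Instead of printing each stretch separately, it merges adjacent habitable stretches into one region. HabitableSweep and sweep are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

final class HabitableSweep {
    // events: sorted rows of {position, delta} as described above.
    // Returns the habitable regions as {start, end} rows, left to right.
    static List<int[]> sweep(int[][] events, int k) {
        if (k < 1) throw new IllegalArgumentException("k must be at least 1");
        List<int[]> regions = new ArrayList<>();
        int count = 0; // number of signals received just after the current point
        int start = 0; // left edge of the open region; meaningful only while count >= k
        for (int[] e : events) {
            int before = count;
            count += e[1];
            if (before < k && count >= k) {
                start = e[0];                          // the k-th signal has just begun
            } else if (before >= k && count < k) {
                regions.add(new int[] {start, e[0]});  // coverage has just dropped below k
            }
        }
        return regions;
    }
}
```

For transmitters covering [-3, 3], [-1, 5] and [9, 11], a call with k = 2 returns the single region [-1, 3], and a call with k = 1 returns [-3, 5] and [9, 11].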

Following is well-commented Java code for the above algorithm. The only other noteworthy thing is that we have maintained a TreeMap instead of the usual HashMap. Like HashMap, TreeMap implements the Map interface, but it keeps its entries sorted by key. This is needed so that we start from the left-most begin point (the one with the least coordinate value). Also, we have maintained two separate lists of begin and end points to optimize memory requirements. As always, anyone is free to ask any doubts or share any comments in the comments section.