Sunday 8 May 2016

Translate Numbers to Their English Phrases


Problem Statement:
                    Write a program that takes an integer between 0 and 99,999 as input and outputs its English phrase.
For example:

             Input: 77890
             Output: Seventy Seven Thousand, Eight Hundred, and Ninety

Breaking down the problem:
              Such programs are commonly used in NLP to communicate with a user. They rarely need complex data structures and are rarely complicated in themselves. Mostly they are a chain of if-else conditions that handle particular cases. The sharpness of the programmer is tested, however, in making sure that every possible input is handled.

              If one looks at the output above, the commas and the word "and" are not associated with any digit in the number. However, according to the problem statement, they are very important. Such elements, which are added only to make the output "look better" or "closer to English", are often the main hurdles.
             
              Hence, there are two main sub-problems here. The first is to interpret the digits and convert them into their English words, taking care of their place in the number (the units place, the tens place, etc.). The second is to account for the commas and the word "and", along with their placement.

Approach:
              The main hint here is the range of inputs: from 0 to 99,999. If one observes carefully, the last two digits always form one phrase in English, and so do the first two digits of a five-digit number. In the above input, the last two digits "90" form "Ninety" and the first two digits "77" form "Seventy Seven" (followed by "Thousand", their place value, but that is easy to predict). Hence, these digits will be considered in pairs. The third digit, i.e. the digit in the hundreds place, always stands alone. There is always a "Seven Hundred" or a "Three Hundred". There is never a "Seventy Three Hundred". Hence, logically, it is to be considered on its own.
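The grouping described above can be sketched in a few lines of Python. This is only an illustration: the function name split_number and the use of divmod are assumptions, not part of the original program.

```python
def split_number(n):
    """Split an integer in 0..99,999 into (thousands pair, hundreds digit, last pair)."""
    if not 0 <= n <= 99_999:
        raise ValueError("expected an integer between 0 and 99,999")
    # The first two digits (0..99) are read together as one phrase.
    thousands_pair, rest = divmod(n, 1000)
    # The hundreds digit stands alone; the last two digits form the other pair.
    hundreds_digit, last_pair = divmod(rest, 100)
    return thousands_pair, hundreds_digit, last_pair


print(split_number(77890))  # (77, 8, 90)
```

For the example input, the three pieces correspond to "Seventy Seven", "Eight" and "Ninety".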

               Our first step should be to teach our program the English words, starting with the basics. The program needs to know that 0 becomes "Zero", 1 becomes "One", and so on. Additionally, it needs to know that 2 in the tens place (or in the higher position of any pair, by the above logic) becomes "Twenty", 3 in the same position becomes "Thirty", and so on.

                English is a funny language. It is often regarded as a tough language for a program to learn because of its sudden and seemingly illogical variations. For example, 11 has to become "Eleven" and not "Onety One". There is no reason why it could not be "Onety One" for the sake of symmetry. But such is life, and hence our program now has to learn the phrases for the numbers from 11 to 19 as well.
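A minimal sketch of these word tables, assuming Python lists indexed by digit. The names UNITS, TEENS, TENS and pair_to_words are chosen for illustration only. Here 10 is read from the tens table, while 11 to 19 get their own table.

```python
UNITS = ["Zero", "One", "Two", "Three", "Four",
         "Five", "Six", "Seven", "Eight", "Nine"]
# The irregular words for 11..19, looked up as TEENS[n - 11].
TEENS = ["Eleven", "Twelve", "Thirteen", "Fourteen", "Fifteen",
         "Sixteen", "Seventeen", "Eighteen", "Nineteen"]
# Indexed by the tens digit; index 0 is unused, hence the empty string.
TENS = ["", "Ten", "Twenty", "Thirty", "Forty",
        "Fifty", "Sixty", "Seventy", "Eighty", "Ninety"]


def pair_to_words(pair):
    """Return the words for a number in 0..99."""
    if pair < 10:
        return UNITS[pair]
    if 11 <= pair <= 19:
        return TEENS[pair - 11]  # "Eleven", never "Onety One"
    tens, units = divmod(pair, 10)
    if units == 0:
        return TENS[tens]  # multiples of ten have no units word
    return TENS[tens] + " " + UNITS[units]


print(pair_to_words(11))  # Eleven
print(pair_to_words(77))  # Seventy Seven
```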

Trivia: The oldest Indian language, Sanskrit, is widely regarded as the easiest language for a program to learn owing to its symmetry.

                The next step is to implement the above approach. This is easy enough to do with a nested if-else tree. However, identifying all the cases and boundary conditions and accommodating them was a pain for the author himself. For example, it is easy to overlook a small flaw in the above approach. If we take digits in pairs and look up their words in an array, numbers with 0 as the lower digit become a problem. For example, 77 becomes "Seventy Seven", but 70 is just "Seventy" and not "Seventy Zero". Hence, we need to make the 0th index of our array "unitsPlace" an empty string. However, another issue now arises: what if the input itself is just 0? This can be handled by adding yet another condition that checks whether the input is 0. If it is, the program simply prints "Zero" and exits. Any other 0s that come up will be translated to the empty string.
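The empty-string trick and the zero check described above might look like this for numbers below 100. It is a sketch, not the author's code: UNITS_PLACE stands in for the "unitsPlace" array, and the other names, along with the use of strip(), are assumptions.

```python
# Index 0 is deliberately empty so that 70 reads "Seventy", not "Seventy Zero".
UNITS_PLACE = ["", "One", "Two", "Three", "Four",
               "Five", "Six", "Seven", "Eight", "Nine"]
TEENS = ["Eleven", "Twelve", "Thirteen", "Fourteen", "Fifteen",
         "Sixteen", "Seventeen", "Eighteen", "Nineteen"]
TENS_PLACE = ["", "Ten", "Twenty", "Thirty", "Forty",
              "Fifty", "Sixty", "Seventy", "Eighty", "Ninety"]


def pair_to_words(pair):
    """Return the words for 0..99; a pair of 0 yields the empty string."""
    if 11 <= pair <= 19:
        return TEENS[pair - 11]
    tens, units = divmod(pair, 10)
    # strip() removes the stray space left when either lookup is empty.
    return (TENS_PLACE[tens] + " " + UNITS_PLACE[units]).strip()


def translate_small(n):
    """Translate 0..99, handling a lone zero before any table lookup."""
    if n == 0:
        return "Zero"
    return pair_to_words(n)


print(translate_small(70))  # Seventy
print(translate_small(0))   # Zero
```

Because every other zero maps to the empty string, no further zero checks are needed inside pair_to_words.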

                  The final step handles the ", and". The program must check that a phrase is never left hanging with those words at its end. Some word has to follow "and", or else the output is wrong.
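Putting the pieces together, one possible way to place the commas and the "and" is shown below. The rule used here is inferred from the single example in the problem statement and is an assumption: the parts are joined with ", ", and "and" is added before the last pair only when that pair is non-empty and something precedes it. All names are illustrative.

```python
UNITS = ["", "One", "Two", "Three", "Four",
         "Five", "Six", "Seven", "Eight", "Nine"]
TEENS = ["Eleven", "Twelve", "Thirteen", "Fourteen", "Fifteen",
         "Sixteen", "Seventeen", "Eighteen", "Nineteen"]
TENS = ["", "Ten", "Twenty", "Thirty", "Forty",
        "Fifty", "Sixty", "Seventy", "Eighty", "Ninety"]


def pair_to_words(pair):
    """Return the words for 0..99; a pair of 0 yields the empty string."""
    if 11 <= pair <= 19:
        return TEENS[pair - 11]
    tens, units = divmod(pair, 10)
    return (TENS[tens] + " " + UNITS[units]).strip()


def number_to_words(n):
    """Translate an integer in 0..99,999 into its English phrase."""
    if not 0 <= n <= 99_999:
        raise ValueError("expected an integer between 0 and 99,999")
    if n == 0:
        return "Zero"
    thousands, rest = divmod(n, 1000)
    hundreds, last_pair = divmod(rest, 100)
    parts = []
    if thousands:
        parts.append(pair_to_words(thousands) + " Thousand")
    if hundreds:
        parts.append(UNITS[hundreds] + " Hundred")
    if last_pair:
        words = pair_to_words(last_pair)
        # "and" appears only when a word follows it and a part precedes it,
        # so the phrase can never end in a dangling ", and".
        parts.append("and " + words if parts else words)
    return ", ".join(parts)


print(number_to_words(77890))  # Seventy Seven Thousand, Eight Hundred, and Ninety
print(number_to_words(77000))  # Seventy Seven Thousand
```

Building a list of parts and joining it at the end avoids having to trim a trailing comma or "and" afterwards.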
