Contents Introduction Types of Speech Recognition Human Ears Working Process of speech recognition Feature Extraction Acoustic and Language model Applications Advantages Disadvantages Conclusion References
Introduction Speech recognition, also known as automatic speech recognition (ASR) or computer speech recognition, means that a computer understands a human voice and performs the required task. At its core, it is the translation of spoken words into text.
Types of Speech Recognition Speaker dependent Speaker-dependent software is commonly used for dictation. It works by learning the unique characteristics of a single person’s voice, in a way similar to voice recognition. New users must first “train” the software by speaking to it, so the computer can analyze how the person talks. This often means users have to read a few utterances aloud before they can use the speech recognition software.
Types of Speech Recognition Speaker independent Speaker-independent software is designed to recognize anyone’s voice, so no training is involved. This makes it the only real option for applications such as interactive voice response systems, where businesses can’t ask callers to read pages of text before using the system. These systems are the most difficult and most expensive to develop, and their accuracy is lower than that of speaker-dependent systems.
Listening Anatomy Articulation produces sound waves, which the ear conveys to the brain for processing.
Process of Speech Recognition
Feature Extraction Feature extraction means capturing the important qualities of the speech signal while discarding unimportant and distracting ones. Sources of Variability in Speech (a) Speaker (b) Microphone (c) Pitch (d) Environment
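As a rough illustration of feature extraction, the sketch below cuts a signal into short overlapping frames and keeps only two numbers per frame: log energy and zero-crossing rate. These two features, the 25 ms frame length, the 10 ms hop and the 16 kHz sample rate are assumptions chosen for the example. The slides do not say which features the system uses, and practical recognizers typically use richer spectral features.

```python
import math

def frame_signal(samples, frame_len, hop):
    """Split a list of samples into overlapping frames of frame_len samples."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def frame_features(frame):
    """Return (log energy, zero-crossing rate) for one frame."""
    energy = sum(s * s for s in frame) / len(frame)
    log_energy = math.log(energy + 1e-10)  # small offset avoids log(0) on silence
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    zcr = crossings / (len(frame) - 1)
    return log_energy, zcr

def extract_features(samples, frame_len=400, hop=160):
    """Reduce raw samples to one small feature tuple per frame."""
    return [frame_features(f) for f in frame_signal(samples, frame_len, hop)]

# Hypothetical input: one second of a 440 Hz tone sampled at 16 kHz.
sample_rate = 16000
tone = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]
features = extract_features(tone)
```

Each frame of 400 samples is reduced to two values, which is the "keep what matters, discard the rest" idea in miniature.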
Acoustic Model An acoustic model is created by taking audio recordings of speech, and their text transcriptions, and using software to create statistical representations of the sounds that make up each word. It is used by a speech recognition engine to recognize speech.
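A toy sketch of the "statistical representation" idea: each sound is summarised by the mean and variance of a single feature value measured on labelled recordings, and a new value is assigned to the sound whose Gaussian explains it best. The labels, the feature values and the one-dimensional Gaussian are all assumptions made for illustration. Real acoustic models work on many features at once and also model how sounds follow each other in time.

```python
import math
from collections import defaultdict

def train_acoustic_model(examples):
    """examples: list of (sound_label, feature_value) pairs.
    Returns {sound_label: (mean, variance)}."""
    grouped = defaultdict(list)
    for label, value in examples:
        grouped[label].append(value)
    model = {}
    for label, values in grouped.items():
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        model[label] = (mean, max(var, 1e-6))  # floor keeps the variance positive
    return model

def log_likelihood(value, mean, var):
    """Log density of a one-dimensional Gaussian at value."""
    return -0.5 * (math.log(2 * math.pi * var) + (value - mean) ** 2 / var)

def most_likely_sound(model, value):
    """Pick the sound whose Gaussian gives the observed value the highest likelihood."""
    return max(model, key=lambda label: log_likelihood(value, *model[label]))

# Hypothetical training data: a made-up feature value for recordings of "s" and "a".
training = [("s", 0.42), ("s", 0.47), ("s", 0.45),
            ("a", 0.08), ("a", 0.11), ("a", 0.10)]
model = train_acoustic_model(training)
guess = most_likely_sound(model, 0.40)
```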
Language Model A language model tries to capture the properties of a language and to predict the next word in a speech sequence. In American English, the phrases "recognize speech" and "wreck a nice beach" are pronounced almost the same but mean very different things, so the recognizer relies on the language model to choose between them. A statistical language model is a probability distribution over sequences of words. Language modeling is used in many natural language processing applications, including speech recognition, machine translation, part-of-speech tagging, parsing, handwriting recognition and information retrieval.
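One simple way to build a probability distribution over word sequences is a bigram model, which scores each word given the word before it. The sketch below is an assumption-laden example rather than the model used in any particular recognizer: the four-sentence corpus is invented, and add-one smoothing is used so that unseen word pairs still get a small probability.

```python
import math
from collections import Counter

def train_bigram_model(sentences):
    """Count context words and word pairs, with <s> and </s> marking sentence ends."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        words = ["<s>"] + sentence.lower().split() + ["</s>"]
        vocab.update(words)
        contexts.update(words[:-1])          # every word that precedes another
        bigrams.update(zip(words, words[1:]))
    return contexts, bigrams, len(vocab)

def sentence_log_prob(model, sentence):
    """Log probability of a sentence under the bigram model with add-one smoothing."""
    contexts, bigrams, vocab_size = model
    words = ["<s>"] + sentence.lower().split() + ["</s>"]
    total = 0.0
    for prev, word in zip(words, words[1:]):
        total += math.log((bigrams[(prev, word)] + 1) / (contexts[prev] + vocab_size))
    return total

# Hypothetical training corpus.
corpus = [
    "we recognize speech",
    "computers recognize speech",
    "systems recognize speech with models",
    "a nice beach",
]
model = train_bigram_model(corpus)
score_a = sentence_log_prob(model, "recognize speech")
score_b = sentence_log_prob(model, "wreck a nice beach")
```

With this corpus `score_a` is higher than `score_b`, so a recognizer that hears an ambiguous sound sequence would prefer "recognize speech".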
Accuracy Error rates increase as the vocabulary size grows. A vocabulary is hard to recognize if it contains confusable words. Accuracy also depends on whether the speech is isolated, discontinuous or continuous. Word error rate (WER) is a common metric of the performance of a speech recognition system (source: Wikipedia).
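WER is usually computed as (S + D + I) / N, where S, D and I are the numbers of substituted, deleted and inserted words needed to turn the recognizer's output into the reference transcript, and N is the number of words in the reference. A minimal sketch using word-level edit distance follows. The function name and the example sentences are made up for illustration.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / words in the reference."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)

# One substitution (sat -> sit) and one deletion (the) against six reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.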
Tool Used For ASR SPRAAK (Speech Processing, Recognition and Automatic Annotation Kit) is an open source speech recognition package. It provides an efficient decoder in a proven hidden Markov model (HMM) architecture. SPRAAK uses 'scons' (a Python-based build tool).
Advantages Documents can be generated up to three times as fast with speech recognition as they can be typed. It helps people with disabilities. It reduces errors, i.e. it eliminates spelling problems. It reduces businesses' labour costs in call centres by letting speech recognition software replace some of the staff on duty. Speech recognition technology can also replace touch-tone dialing. It is not necessary to sit at a keyboard or work with a remote control.
Disadvantages It is difficult to build a perfect system. Filtering background noise is a task that can be difficult even for humans. Every human being differs in voice, mouth and speaking style. Words must be spoken clearly and loudly. If a microphone is used, it should be close to the user.
Applications 1. Dictation 2. In-car systems 3. Robotics 4. High-performance fighter aircraft 5. Voice dialing 6. Ok Google 7. Voice Security System
Conclusion The world of speech recognition is rapidly changing and evolving. Early applications of the technology have achieved varying degrees of success. The promise for the future is significantly higher performance for almost every speech recognition technology area, with more robustness to speakers, background noises etc. This will ultimately lead to reliable, robust voice interfaces to every telecommunications service that is offered, thereby making them universally available.
Future Scope We can design and implement a speech recognition and rectification system for people with articulation impairments, which would be of great value to society and would reduce the speech communication problems they face in day-to-day life. The ASR tools available today are strongly affected by noise in the surroundings and therefore produce poorer results.
References
1. Parwinder Pal Singh and Er. Bhupinder Singh, "Speech Recognition as Emerging Revolutionary Technology", Volume 2, Issue 10, October 2012.
2. Sanjivani S. Bhabad and Gajanan K. Kharate, "An Overview of Technical Progress in Speech Recognition", Proc. IEEE, Vol. 3, Issue 3, Mar. 2013.
3. L. R. Rabiner, "Applications of Speech Recognition in the Area of Telecommunication", Proc. IEEE, Vol. 82, No. 4, pp. 199-228, Feb. 1994.
4. http://www.hyoka.koho.titech.ac.jp/eprd/recently/research/research.php?id=386&page_lang=en