Write-up on the FDP held on 27th December, 2014
Presenter: Ms. Shweta Bansal
(Asst. Prof. of Engineering Department)
The presenter gave her presentation on the topic “Speech Processing”. She defined speech as follows: “When we as humans speak, air passes from our lungs through our mouth and nasal cavity, and this air stream is restricted and changed with our tongue and lips. This produces contractions and expansions of the air, an acoustic wave, a sound. The sounds we form, the vowels and consonants, are usually called phones. The phones are combined together into words. Speech signal refers to the analog electrical representation of the contractions and expansions of air.”
She divided speech processing into the following areas:
Speech recognition (speech to text)
Speech synthesis (text to speech)
Speaker identification (identify the person who is speaking)
Speech recognition is hard for the following reasons:
Digitization: Converting analogue signal into digital representation
Signal processing: Separating speech from background noise
Phonetics: Variability in human speech
Phonology: Recognizing individual sound distinctions (similar phonemes)
Lexicology and syntax: Disambiguating homophones
Features of continuous speech
Variation among speakers due to
Vocal range (f0, and pitch range)
Voice quality (growl, whisper, physiological elements such as nasality, adenoidality, etc)
Accent (vowel systems, consonants, allophones, etc.)
Variation within speakers due to
Health, emotional state
Ambient conditions
Speech style: formal read vs. spontaneous
Identifying phonemes:
Differences between some phonemes are sometimes very small
Mismatched Phonemes
Parameters of ASR (automatic speech recognition)
Different types of tasks with different difficulties
Speaking mode (isolated words/continuous speech)
Speaking style (read/spontaneous)
Enrollment (speaker-independent/dependent)
Vocabulary (small: < 20 words / large: > 20,000 words)
Speaking Mode
Isolated speech - the speaker has to speak word-by-word into the system.
Connected speech - the speaker can speak a number of words without stopping.
Continuous speech - the speaker talks naturally, as in ordinary human conversation.
Audible Range (Hearing Range)
Humans can generally hear sounds with frequencies between 20 Hz and 20,000 Hz (the audio range or hearing range), although this range varies significantly with age, occupational hearing damage, and gender. The majority of people can no longer hear 20,000 Hz by the time they are teenagers, and progressively lose the ability to hear higher frequencies as they get older.
Most human speech communication takes place between 200 and 8,000 Hz and the human ear is most sensitive to frequencies around 1,000-3,500 Hz.
Sound above the hearing range is known as ultrasound and that below the hearing range as infrasound.
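The frequency ranges above can be illustrated with a short Python sketch. It is not part of the presentation: the function and constant names are hypothetical, and the band limits are simply the figures quoted in this write-up. The last function applies the Nyquist criterion (sampling at no less than twice the highest frequency of interest), which is an added assumption about how the digitization step mentioned earlier would typically be sized.

```python
# Band limits as quoted in the write-up; all names here are illustrative.
AUDIBLE_LOW_HZ = 20.0
AUDIBLE_HIGH_HZ = 20_000.0
SPEECH_LOW_HZ = 200.0
SPEECH_HIGH_HZ = 8_000.0


def classify_frequency(freq_hz: float) -> str:
    """Label a frequency as infrasound, audible, or ultrasound."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    if freq_hz < AUDIBLE_LOW_HZ:
        return "infrasound"
    if freq_hz > AUDIBLE_HIGH_HZ:
        return "ultrasound"
    return "audible"


def in_speech_band(freq_hz: float) -> bool:
    """True if the frequency lies in the 200-8,000 Hz speech band."""
    return SPEECH_LOW_HZ <= freq_hz <= SPEECH_HIGH_HZ


def min_sampling_rate(max_freq_hz: float) -> float:
    """Nyquist criterion: sample at least twice the highest frequency."""
    return 2.0 * max_freq_hz


if __name__ == "__main__":
    for f in (10, 440, 3_000, 25_000):
        print(f, "Hz:", classify_frequency(f), "| speech band:", in_speech_band(f))
    print("Minimum sampling rate for the speech band:",
          min_sampling_rate(SPEECH_HIGH_HZ), "Hz")
```

Under these assumptions, capturing the full speech band up to 8,000 Hz would need a sampling rate of at least 16,000 Hz.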
Some speech recognition software
Dragon NaturallySpeaking
SpeakQ
Microsoft Accessibility
Dictate (Mac product)