SHBT Quals 2007 Question # Student #5

Question I

Part A.

1. The woman’s symptoms could be due to otitis media (ear infection), noise-induced hearing loss, blockage from wax buildup in the ear canal, blockage of the Eustachian tube from a sinus or respiratory infection, or exposure to an ototoxic medication, which could have caused temporary or permanent hair cell damage.

2. The audiology department has many tests at its disposal to distinguish among the possible etiologies. An audiogram is useful for determining the extent of the hearing loss and which frequencies are affected, especially if a past audiogram is available for comparison with the patient’s earlier performance. Additionally, since bone-conducted thresholds can be compared with air-conducted thresholds, it is possible to rule out sensorineural causes such as noise-induced hearing loss (if there is an air-bone gap) or conductive causes such as wax blockage (if there is no air-bone gap).

Weber and Rinne tests would also reveal information about conductive vs. sensorineural hearing loss, as well as the laterality of the hearing loss. The results are less informative than those of an audiogram, but they have the advantage of being quick screening tests that can be performed much faster than a formal audiogram.

Acoustic immittance measures such as tympanometry and middle ear reflex testing help to resolve where along the acoustic pathway the problem lies. Tympanometric tests evaluate the movement of the eardrum, detecting stiffness. Measuring acoustic reflex thresholds allows the audiologist to look for abnormalities in the contraction of the middle ear muscles.

Testing for otoacoustic emissions (OAEs), both spontaneous and evoked, allows the audiologist to determine the status of the patient’s cochlear amplification system, that is, the “cochlear amplifier” regulated by the functioning of the outer hair cells. Irregular or absent SOAEs would imply a sensorineural problem rather than a simple blockage of the ear.

Tests of the auditory brainstem response (ABR) and the cochlear microphonic (CM) focus on sensorineural etiologies. These tests can confirm a normal or abnormal hair cell response.

3. Otitis media, wax blockage, or Eustachian tube blockage: The audiogram and Weber/Rinne tests would display an air-bone gap in the left ear, with a magnitude of approximately 30 dB of hearing loss. Tympanometry would indicate abnormal middle-ear function: a flattened (low-compliance) tympanogram with fluid behind the eardrum or a wax-occluded canal, or negative middle-ear pressure with a blocked Eustachian tube.

Noise-induced hearing loss or ototoxic medication: The audiogram and Weber/Rinne tests would indicate no air-bone gap, with bone-conducted sound thresholds just as bad as air-conducted sound thresholds in the left ear. Tympanometry would be normal. Depending on the degree of noise/drug exposure, SOAEs would be diminished or abolished, a result of OHC damage. If IHCs were damaged, the ABR/CM measurements would be abnormal; ABR thresholds would be elevated, and the CM would not reverse polarity when responding to clicks with alternating condensation and rarefaction.

4. This patient could receive some benefit from a hearing aid if there are enough working hair cells left in the cochlea; that is, if the permanent hearing loss in the left ear is due to a conduction problem and not due to cochlear cell death. However, a permanent loss suggests a sensorineural deficit, in which case the hearing aid would be amplifying sounds for a cochlea that could not detect them, rendering it less useful for the patient. In this case, a cochlear implant, which stimulates the auditory nerve directly and bypasses the damaged cochlea, might have a better expected outcome.

Part B.

1. The right ear exhibits increased thresholds for air-conducted sound. Measuring the bone-conducted thresholds allows us to rule out conductive hearing loss caused by imperfect conduction of sound through the middle ear: if bone-conducted hearing is normal, then the sensory cells of the inner ear are intact, and the loss arises in the middle ear. If, however (as is shown to be the case), the bone-conducted thresholds are just as elevated as the air-conducted thresholds, it means that the vibrations reaching the cochlea do not excite it, and the problem lies with the sensory cells of the inner ear.
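As a minimal sketch of this decision logic (the threshold values and the 10 dB gap criterion are illustrative assumptions, not data from this patient’s audiogram):

```python
# Compare air- and bone-conduction thresholds (dB HL) at one audiometric
# frequency. A large air-bone gap points to a conductive loss; elevated but
# matching air and bone thresholds point to a sensorineural loss.

AIR_BONE_GAP_DB = 10       # assumed cutoff for a clinically meaningful gap
NORMAL_LIMIT_DB = 20       # assumed upper limit of "normal" thresholds

def classify(air_db, bone_db):
    gap = air_db - bone_db
    if air_db <= NORMAL_LIMIT_DB:
        return "normal"
    if gap >= AIR_BONE_GAP_DB and bone_db <= NORMAL_LIMIT_DB:
        return "conductive"
    if gap >= AIR_BONE_GAP_DB:
        return "mixed"
    return "sensorineural"

# Hypothetical right-ear thresholds at 1 kHz:
print(classify(air_db=50, bone_db=45))   # -> "sensorineural"
print(classify(air_db=45, bone_db=10))   # -> "conductive"
```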

2. Masking was used to prevent spuriously good thresholds caused by the test sound reaching the contralateral ear. An intense air-conducted sound can vibrate the skull, allowing the signal to reach both ears through bone conduction (when we really only want to measure the response of one ear). Masking noise presented to the contralateral ear raises its threshold so that the crossed-over signal is not detected there.

Bone-conducted thresholds need to be masked at all frequencies because the sound is conducted across the bones of the skull and thus automatically reaches both cochleae through the bony shell. Air-conducted thresholds only need to be masked at the high frequencies, where the target tone must be made so loud (65-90 dB) that it could vibrate the skull and reach the contralateral cochlea.
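A minimal sketch of the standard cross-hearing check (the interaural attenuation values are typical textbook assumptions for supra-aural earphones and a bone vibrator, not values taken from this exam):

```python
# Contralateral masking is needed whenever the test signal, reduced by the
# interaural attenuation of the transducer, could still exceed the
# bone-conduction threshold of the non-test cochlea.

IA_AIR_DB = 40    # assumed minimum interaural attenuation, supra-aural phones
IA_BONE_DB = 0    # a bone vibrator excites both cochleae almost equally

def needs_masking(presentation_db, nontest_bone_threshold_db, interaural_attenuation_db):
    """True if the crossed-over signal could be audible in the non-test ear."""
    return presentation_db - interaural_attenuation_db >= nontest_bone_threshold_db

# Examples with a non-test-ear bone-conduction threshold of 5 dB HL:
print(needs_masking(70, 5, IA_AIR_DB))    # True  -> mask for this loud air-conducted tone
print(needs_masking(30, 5, IA_AIR_DB))    # False -> soft air-conducted tone, no masking
print(needs_masking(30, 5, IA_BONE_DB))   # True  -> bone conduction nearly always needs masking
```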

3. (iii) This person’s speech understanding is significantly impaired. This person has normal thresholds for sounds up to 500 Hz, but then a steep dropoff. Sounds from 1-4 kHz are very important for speech understanding, so a steep loss across this range leaves speech understanding significantly impaired.

4. Yes, my answer would change to (ii): impaired for speech in a noisy background. Lower thresholds at 1500 and 2000 Hz would make a large difference in speech understanding, enabling this person to hear out vowel formant transitions. Only higher-frequency sounds such as fricatives would be difficult to distinguish from each other.

5. This person’s audiogram indicates complete loss of high-frequency input to auditory cortex. Since no sound-driven responses are reaching the left auditory cortex from 2-16 kHz, I would expect the tonotopy of A1 (primary auditory cortex) to shift downwards, devoting these areas to the processing of lower-frequency sounds. The new cortical map would reflect larger representations of lower-frequency sounds; that is, the neurons that had previously responded best to high-frequency sounds would decrease their best frequencies. The final result is an expanded cortical representation of low frequencies (below 1000 Hz) across cortical centers, with a shrunken neural representation for the high frequencies that are no longer receiving any input.

6. The brain is plastic, able to adapt to altered inputs, but to do so it requires Hebbian learning: persistent and repeated stimulation over time. Consistently modified inputs over a long time scale are essential for cortical reorganization; if the hearing loss were temporary, and if normal (high-frequency) inputs were made available to the cortex again, no reorganization would occur.

Sources:

1. HST.724 clinical notes, March 11, 2006

2. Immittance Audiometry (Audiology services at the Washington University in St. Louis School of Medicine): http://oto.wustl.edu/audiology/immit.htm


Question II

(a) The name is Mary (or, at least, this name is consistent with the given spectrogram!). The name is uttered from approximately 700-1200 ms. It begins with a weak nasal murmur (low frequency energy) and exhibits rising formants out of that nasal, particularly F2, implying an [m] as the initial segment.

The first vowel (from 850-950 ms) has an F1 of about 500 Hz and a rapidly-moving F2 that sweeps from 1500 Hz to 2000 Hz and back again. A likely candidate is the vowel [e] pronounced by a male speaker (consistent with the F0 in this utterance, which is around 125 Hz).

There is a falling formant transition from the vowel to the next consonant. At about 925 ms, there is evidence that F3 is beginning to plummet, evidencing a possible [ɹ]. However, F3 is not visible between 950-1050 ms, making it difficult to discern if it has dropped down for the [ɹ] or if it is simply weak. Semivowels are particularly difficult to tell apart, so I am least confident about this segment, but the only other likely possibilities ([l], [w]) do not form a recognizable name.

Finally, the name ends in a high front vowel, most likely [i], as evidenced by the extremely low first formant and high second formant.

Broadly, there is no evidence of frication noise (no high frequency non-periodic energy) throughout the signal, ruling out the presence of obstruents. Voicing seems to be maintained throughout the name, only ceasing at the closure of the [t] in “today.” The evidence points towards the name “Mary” [meɹi].

(b) The length of the speaker’s vocal tract can be estimated from F3, the frequency of the third formant. Modeling the vocal tract as a tube closed at one end, F3 corresponds to the third resonance, 5c/(4l). Since F3 hovers around 2750 Hz throughout the vowels of this utterance, the length of the vocal tract is approximately:

l = 5c/(4 · F3) = (5 × 35,400)/(4 × 2750) ≈ 16.1 cm   [for c = 35,400 cm/s]
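As a quick numerical check, here is a minimal sketch of this closed-tube estimate (the formant value and speed of sound are the ones assumed above):

```python
# Quarter-wavelength resonator model: the n-th formant of a uniform tube
# closed at one end sits at Fn = (2n - 1) * c / (4 * l).
# Solving for l from an observed F3 (n = 3) gives l = 5 * c / (4 * F3).

C = 35400.0  # speed of sound, cm/s (value assumed above)

def tract_length_cm(formant_hz: float, n: int) -> float:
    """Length of a uniform closed tube whose n-th resonance is formant_hz."""
    return (2 * n - 1) * C / (4.0 * formant_hz)

f3 = 2750.0  # Hz, read from the vowels of this utterance
print(f"Estimated vocal tract length: {tract_length_cm(f3, 3):.1f} cm")  # ~16.1 cm
```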

(c) When a speaker produces an [s], the tongue is raised to the alveolar ridge, creating a constriction which divides the oral cavity into front and back regions. The frication noise source at the constriction is filtered by the resonant characteristics of the front cavity. There is a peak in the noise at approximately 3.7 kHz, corresponding to the natural frequency of the front cavity. Using c/(4l) as the first resonance,

l = c/(4 × 3700) = 35,400/14,800 ≈ 2.4 cm
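The same quarter-wavelength relation gives the front-cavity estimate; a minimal sketch (the 3.7 kHz peak is the value read from the spectrum above):

```python
# First resonance of a uniform tube closed at one end: F1 = c / (4 * l),
# so the cavity length is l = c / (4 * F1).

C = 35400.0             # speed of sound, cm/s
noise_peak_hz = 3700.0  # spectral peak of the [s] frication noise

front_cavity_cm = C / (4.0 * noise_peak_hz)
print(f"Estimated front-cavity length: {front_cavity_cm:.1f} cm")  # ~2.4 cm
```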

(d) This peak and valley are probably due to subglottal resonances: that is, the effect of the coupling between the vocal tract and the subglottal cavities (trachea, bronchi, and lungs). This coupling introduces pole-zero pairs that can be computed by finding the natural resonance of the fixed subglottal configuration of branches and sub-branches alone. The subglottal system has a prominent pole around 1550 Hz for an adult male speaker[CITE], corresponding closely to the small peak at 1600 Hz evident in this spectrum. In addition, the zeros of the transfer function will be close to the poles, explaining the valley nearby at 1400 Hz.

(e)

i. The frequency of this EL source can be measured by counting the striations on the spectrogram, which correspond to pulses in the source. There are 11 pulses during a 100-ms interval, placing the fundamental frequency at approximately 110 Hz.
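A minimal sketch of this striation-count estimate (the pulse count and window length are the values read from the spectrogram above):

```python
# Fundamental frequency from counting source pulses (spectrogram striations)
# within a known time window: F0 is approximately pulses / duration.

n_pulses = 11      # striations counted on the spectrogram
window_s = 0.100   # duration of the counting window, seconds

f0_hz = n_pulses / window_s
print(f"Estimated F0: {f0_hz:.0f} Hz")  # ~110 Hz
```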

ii. The source-filter model states that the source function, filtered through the vocal tract transfer function and the radiation characteristic, yields the output spectrum. Assuming similar vocal tract filtering parameters for both versions of the utterance “saw,” differences in the output spectrum will be largely due to the differing frequency components in the source spectrum. (There will be a few expected filtering differences due to the shorter vocal tract for the supraglottal EL source, as well as those due to pronunciation variations.)
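A minimal source-filter sketch of this idea (the formant frequencies, bandwidths, and sample rate are hypothetical illustration values, and an idealized impulse train stands in for both the glottal and EL sources): a periodic source is passed through the same cascade of second-order formant resonators and a simple radiation characteristic, so that any difference between the two outputs comes from the source alone.

```python
import numpy as np
from scipy.signal import lfilter

fs = 16000     # sample rate, Hz (assumed)
dur = 0.3      # seconds of synthetic vowel

def impulse_train(f0, fs, dur):
    """Unit impulses every 1/f0 seconds: an idealized voicing/EL source."""
    src = np.zeros(int(fs * dur))
    src[::int(round(fs / f0))] = 1.0
    return src

def resonator(x, freq, bw, fs):
    """One second-order formant resonator with unity gain at DC."""
    r = np.exp(-np.pi * bw / fs)
    theta = 2 * np.pi * freq / fs
    a = [1.0, -2 * r * np.cos(theta), r ** 2]
    b = [1.0 - 2 * r * np.cos(theta) + r ** 2]
    return lfilter(b, a, x)

def synthesize(f0, formants, bandwidths):
    """Source -> cascaded formant filters -> radiation (first difference)."""
    out = impulse_train(f0, fs, dur)
    for f, bw in zip(formants, bandwidths):
        out = resonator(out, f, bw, fs)
    return np.diff(out, prepend=0.0)

# Same (hypothetical) vocal tract filter, two different source F0s:
glottal = synthesize(125, formants=[600, 900, 2750], bandwidths=[90, 110, 170])
el      = synthesize(110, formants=[600, 900, 2750], bandwidths=[90, 110, 170])
```

Comparing the spectra of glottal and el would then isolate source differences such as F0 and spectral tilt, which is the comparison drawn below.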

In this case, there is more high-frequency periodic energy in the EL speech waveform (~20 dB as compared to ~5 dB in the range from 4-5 kHz). The harmonics of the voiced source thus appear to drop off more rapidly than those of the EL source. This could be due to the shape of the source pulses: I do not know exactly how the buzz source is generated, but sharper discontinuities in its waveform would account for the extra high-frequency periodic energy. The electrolarynx is also a more regular source than the vibrating glottis, and it does not suffer the losses associated with the opening and closing of the glottis.


Normal (glottal) source: the glottal opening is relatively slow, and the moment of closure provides the main excitation of the vocal tract; F0 ≈ 125 Hz.

EL source: a mechanical transducer generates the buzz waveform; F0 ≈ 110 Hz and is fixed. The EL source has harmonics, but its pulses (perhaps closer to sinusoidal) lack the slow rise and quick fall that the glottal source shows during the open phase of the cycle. Because the glottis is not open and the buzz replaces the glottal source, there is no coupling to the subglottal system: the subglottal resonances of the lung cavity and trachea are absent, while the EL-to-vocal-tract transfer function can introduce nulls of its own into the spectrum. With no energy losses through the glottis or subglottal structures, the formant bandwidths are narrower. Together these properties make the EL speech sound harsh and unnatural.

iii. I think it would be difficult for a listener to understand the sentence. Although there are very obvious formant movements at appropriate places (transitions between vowels and consonants), the consonant onsets are not well-defined. Additionally, there seems to be voicing (buzzing) going on throughout the entire utterance, even during “voiceless” stops such as the [t] in “today.” The formant values for the vowels in “today” look very similar to each other, even though one should be a [u] and the other a [ei].

(f) Accelerometers are often used in laboratory settings, attached to the neck to capture the subglottal pressure waveform. The spectrum of such an accelerometer signal would reveal the subglottal resonances directly, which would confirm the interpretation of the spectral peak and valley discussed in part (d).


Sources:

1. Stevens, K. N. Acoustic Phonetics. MIT Press, 1998.

2. de


Question III

(A)

When auditory-nerve fibers enter the cochlear nucleus, they bifurcate into an ascending branch that terminates in the anteroventral cochlear nucleus (AVCN) and a descending branch that projects to the posteroventral cochlear nucleus (PVCN) and ends by terminating in the layered dorsal cochlear nucleus (DCN). Cells innervated by the ascending branch include the globular and spherical bushy cells, both of which inhabit the AVCN. Cells innervated by the descending branch include the octopus cells of the PVCN and the fusiform (pyramidal) cells of the DCN. [1]