A Simplified System for Analyzing Stop Consonant Acoustics

By Zach Polen

ECE 499, Capstone Design

Professor Hanson

3/11/10

Report Summary: Analysis of acoustic waveforms is a complex process that is necessary to understand how different subjects produce speech. This project studies the characteristics of medial stop consonants in speech, which helps researchers understand why and how such stop consonants are produced. The current analysis process is primarily manual, and its steps must be repeated for every waveform analyzed. Speech analysis can require looking at thousands of waveforms, so this analysis process needs to be expedited. We will accomplish this task by creating a software-based system to relieve the researcher of some of the analysis burden.

We will accomplish this goal by creating a system that:

  1. Semi-automates the labeling of speech events using Praat software.
  2. Converts the Praat data into a MATLAB database.
  3. Performs statistical analysis: duration computations, averages, and standard deviations.
  4. Is flexible enough to adapt easily to studies other than the current one.

This system will simplify the data analysis of acoustic waveforms for speech studies. The researcher will input a directory of labeled waveforms, run a script in MATLAB, and the data desired by the researcher will be output in table form.

Table of Contents

Title Page

Report Summary

Introduction

Background

Design Requirements

Design Alternatives

Final Design and Implementation

Performance Estimates and Results

Production Schedule

Cost Analysis

User’s Manual

Discussion, Conclusions, and Recommendations

References

Appendices

Table of Figures and Tables

Figure 1: A sample labeled acoustic waveform

Figure 2: Labeled waveform with durations annotated

Figure 3: Block diagram of old labeling system

Figure 4: Modified data processing system

Table 1: List of Speech Tokens

Table 2: Sample structure development code

Figure 5: Block diagram of data extraction process

Table 3: Segment of Output Table

Introduction: Current methods used to analyze acoustic waveforms are complex and require running several algorithms simultaneously to obtain data. Data is transferred from one computer application to another, which leaves open the possibility that data will be transferred incorrectly. There is also the issue of label uniformity, because different researchers label their waveforms differently. The current system requires many different computer programs to obtain the necessary results. The system is complex, which allows for a large amount of human error. The researcher also must have learned knowledge of the system, because it is non-intuitive: one must know when to run each script and for what purpose. One needs to be educated on how to use this system, and even with that knowledge the room for error is large.

The objective of our project was to simplify the system. By implementing an algorithm in MATLAB, a researcher is able to use an acoustic recording as an input and obtain an output table of the desired data. The data from each step of the process is stored in MATLAB, so there is no room for human error in transferring the data from step to step. There will be no jumbling of data, and the system will be intuitive. This replaces a non-intuitive five- or six-step process with a simple one-step process.

Another important aspect of this project is that the current method is specific to only one type of analysis. By creating an algorithm that is not case specific, one can use this program to analyze any set of acoustic data. This is important for developing the concept of uniformity: it takes a specific process and makes it more general, meaning it can be used in a broader range of applications.

The rest of this report breaks down the process we went through to develop this system. First, a background discussion describes the current methods and the benefits the new system will provide on both a small and large scale. The next section discusses the design requirements, which summarize the specifications needed for this project. After this, the design alternatives are discussed. The report then explains which alternative was chosen, and that process is detailed in the Final Design and Implementation section. The paper then covers the performance estimates and results, goes over the production schedule in detail, and gives a brief cost analysis. A user’s manual explains how a user can operate the system. Finally, the conclusion section explains what was learned and how well the project accomplished our original goals.

Background:

This study is part of a larger study that examines the contrasting development of speech among children. The component of the larger study that this project deals with is being performed by Helen Hanson (Union College), Stephanie Shattuck-Hufnagel (MIT), and Kristen Demuth (Brown University), and it examines medial stop production by children. The medial stop consonants being investigated are classified as voiced or voiceless. Voiced stops are produced with a simultaneous vibration of the vocal cords, while voiceless stops are not [1]. The medial stop consonants being analyzed are p, t, and k (voiceless) and b, d, and g (voiced). The study’s purpose is to produce models of speech development in children. There are two theories behind the development of speech. The first theory states that a child initially learns speech sounds and then learns to sequence them into words (increasing in length/type over time). The second theory states that children develop gestural patterns for syllables or words, which become more precise as the child’s speech develops [2]. This study investigates these theories in a few different ways. First, it provides insight into how language is stored and processed in the brain. Second, it can be used to explain how speech utterances are planned and produced. A practical application of such knowledge is to diagnose and treat speech disorders.

The research component performed prior to the medial stop consonant analysis was stop consonant analysis. Researchers were looking to obtain certain durations across children and their primary caregivers (typically mothers). These durations are measured by finding differences between speech events labeled on an acoustic waveform. Figure 1 shows an example of a labeled waveform:

Figure 1: A sample labeled acoustic waveform

Waveforms are labeled manually based upon where certain speech events occur. This is one of the areas of this study where the opportunity for human error is evident, because each label is placed based on the judgment of the researcher. The following figure shows an example of the durations that are measured on the waveforms:

Figure 2: Labeled waveform with durations annotated

From these durations Hanson, Hufnagel, and Capotosto were able to find data that was used to determine certain origins of speech [3]. The question arose whether the data was valid and could be used to draw research conclusions. In order to determine this, the data needs to be averaged and statistically analyzed. The current process for doing so is very complex and requires intimate knowledge of the system used. Figure 3 shows a block diagram of this process:

Figure 3: Block diagram of old labeling system

The process defined by Figure 3 is described below:

  1. Label acoustic recordings using TextGrids in Praat. This must be repeated for each acoustic recording.
  2. The Computation/Measurement script has to be run for each measurement desired. To do so, one must know which acoustic labels correspond to specific durations. These labels must be entered manually. (Done by an undergraduate researcher at Brown.)
  3. The averages and standard deviations for each duration measurement have to be computed using Excel. (Done by a Summer Research Practicum student at Union College.)
  4. The data then needs to be converted to a form that can be read into MATLAB, i.e., the Excel file is exported to a text-format file. (Done by a Summer Research student at Union College.)
  5. Once read into MATLAB, the data needs to be put into a form on which MATLAB can run a statistical analysis, specifically an ANOVA. (Done by a Summer Research student at Union College; a sketch of steps 4 and 5 follows this list.)
  6. Data is not stored or saved in MATLAB. The data needs to be output in tables that can be accessed using either Word or Excel.

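As a concrete illustration of steps 4 and 5 only (this is a minimal sketch, not the scripts actually used in the old process), suppose the Excel sheet has been exported to a tab-delimited text file with hypothetical columns named Duration and Voicing. The data could then be read into MATLAB and analyzed with a one-way ANOVA as follows:

    % Hedged sketch of old-process steps 4-5. File name and column names are hypothetical.
    T = readtable('durations.txt', 'Delimiter', '\t');    % step 4: exported text file into MATLAB
    p = anova1(T.Duration, T.Voicing, 'off');             % step 5: one-way ANOVA across voicing groups
    fprintf('ANOVA p-value: %.4f\n', p);

The anova1 function is part of the MATLAB Statistics Toolbox; readtable requires a relatively recent MATLAB release, so an older installation would use a lower-level reader such as textscan instead.
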
Although this system works, the complicated steps leave a lot of room for human error. This is because there is no uniform labeling system enforced across labelers (researchers), the process requires numerous conversions between data formats, and the Excel analysis is not automated and must be repeated. The system is also piecemeal, derived not from one source but from a number of sources: a collection of different students and researchers was needed to come up with this method. The goal of this project is to move on from this complex system and replace it with a uniform, simple system. This is based on the logic that something is easier to understand if it follows a standard formula and is learned from a single, specific source.

This project also has large-scale effects on society. Since the project is virtual and research based, it has no sustainability issues; it can be adapted to whatever future research is done and can be used to serve those research goals. Manufacturability is not an issue, because all a user needs to run this project are the programs Praat and MATLAB: Praat is a free online program, and MATLAB is standard in most research institutions. Ethically, this project is also acceptable. The only ethical question is whether it is acceptable to run these tests on children (the recordings are from children). In this case the testing is acceptable because it is done using play sessions, in which a child sees a picture or a toy and is asked to identify the object. The system also has no negative effects on health and safety. No one is harmed in doing this research, and from a health standpoint it is a very positive experience: speech impediments can be seen as a medical issue, and this system can support the diagnosis and rehabilitation process for speech impediments. The social aspects of this project are also very positive. It is necessary for humans to communicate in social situations, and people with speech impediments are hesitant to do so. The results of using this system can be used to educate children in speech classes and facilitate the elimination of speech disorders. Overall, this is a positive project that helps society, with seemingly no ill effects.

Design Requirements:

The research that uses the data obtained from this project has certain specifications. The current system is the intricate process that was seen in Figure 3. The first requirement of this project is simplification of the system. This means reducing the amount of time it takes to perform the numerical analyses of the research. With the current method this takes on the order of hours; the process developed by this system will be able to perform the analysis in minutes. The current process, as stated in the background section, is piecemeal and complex. The process developed by this system will not be complex or hard to figure out: the user simply inputs a directory of labeled acoustic waveforms, which runs through one program and outputs a table of the desired data. This also limits the number of programs needed to perform the analysis. The old process uses four applications and three different algorithms; this system will use two programs, Praat and MATLAB, and one algorithm. The algorithm is run in MATLAB and provides the output in two steps versus the five steps necessary in the old process. Figure 4 shows a sample block diagram of the modified system:

Figure 4: Modified data processing system
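
To make the two-step flow of Figure 4 concrete, the fragment below is one possible MATLAB driver, given a few assumptions: the labeled TextGrid files sit in a single directory, a hypothetical helper function read_textgrid_intervals (a parsing sketch for it follows the design-goals list below) returns the labeled intervals of one file, and the result is written out with writetable. All file, function, and column names are illustrative, not the project’s actual code.

    % Hedged sketch of the modified pipeline: a directory of labeled TextGrids in,
    % one table of labeled intervals and their durations out. Names are hypothetical.
    tgDir = 'labeled_waveforms';                  % directory of Praat TextGrids
    files = dir(fullfile(tgDir, '*.TextGrid'));
    rows  = {};                                   % one row per labeled interval
    for k = 1:numel(files)
        tg = read_textgrid_intervals(fullfile(tgDir, files(k).name));  % parser sketched below
        for j = 1:numel(tg)
            dur = tg(j).xmax - tg(j).xmin;        % duration of the labeled event, in seconds
            rows(end+1, :) = {files(k).name, tg(j).tier, tg(j).label, dur}; %#ok<AGROW>
        end
    end
    out = cell2table(rows, 'VariableNames', {'File', 'Tier', 'Label', 'Duration'});
    writetable(out, 'durations_out.txt', 'Delimiter', '\t');   % table the researcher can open in Excel or Word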

One of the other requirements for the proposed system is that it should reduce the risk of human error. By simplifying the process and providing an intuitive interface, the project will be able to do this. There is also the issue of label uniformity that needs to be enforced. This project should provide a template that does not automate the labeling process itself but does automate the names of the labels. This should be done through a Praat script that labels each tier of the TextGrid and, within each tier, provides a set number of labels for each speech event. This will reduce human error because there is then no chance of typos when manually labeling each speech event; the same label inventory can also be checked on the MATLAB side, as sketched below. Another requirement of the system is the ability to store data dynamically. This means that the system will be able to store data at each step of the process, not only at the final point. This way, troubleshooting will be easier in the event of an error: the location of the error can be pinpointed and adjustments made directly.
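
Even without an automated Praat template, label uniformity can be checked once the labels reach MATLAB. The fragment below is a minimal sketch assuming a fixed inventory of label names for the Noise tier (only the two labels quoted later in this report are listed; the full inventory would come from Table 1) and a hypothetical cell array, labels, of the labels a labeler actually typed:

    % Hedged sketch: flag label typos against a fixed inventory.
    % The inventory is incomplete; only labels quoted in this report appear.
    allowedNoiseLabels = {'wd-ons_beg', 'wd-ons-end'};

    labels = {'wd-ons_beg', 'wd_ons-end'};   % hypothetical labels typed by a labeler
    for k = 1:numel(labels)
        if ~ismember(labels{k}, allowedNoiseLabels)
            fprintf('Possible typo in Noise tier label: "%s"\n', labels{k});
        end
    end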

Based on these general requirements, we would like our design to achieve the following results:

  1. Semi-automate the labeling of speech events using Praat software.
  2. Import the Praat data into a MATLAB database (a parsing sketch follows this list).
  3. Perform the statistical analysis of the data in MATLAB.
  4. Make the system flexible enough that it can be used for other studies.
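
The import step in goal 2 amounts to reading Praat’s TextGrid text format into MATLAB. The function below is a minimal parsing sketch, assuming the “long” TextGrid text format and interval tiers only; it is the hypothetical read_textgrid_intervals helper referenced in the pipeline sketch above, not the project’s actual extraction code.

    function tg = read_textgrid_intervals(fname)
    % Minimal sketch of a parser for Praat "long" TextGrid files (interval
    % tiers only). Returns a struct array with fields tier, xmin, xmax, and
    % label for every non-empty interval in the file.
        lines = splitlines(fileread(fname));
        tg = struct('tier', {}, 'xmin', {}, 'xmax', {}, 'label', {});
        tier = '';
        inInterval = false;
        xmin = NaN; xmax = NaN;
        for k = 1:numel(lines)
            ln = strtrim(lines{k});
            tok = regexp(ln, '^name = "(.*)"', 'tokens', 'once');
            if ~isempty(tok), tier = tok{1}; continue; end      % start of a new tier
            if startsWith(ln, 'intervals ['), inInterval = true; continue; end
            if inInterval
                tok = regexp(ln, '^xmin = ([\d.eE+-]+)', 'tokens', 'once');
                if ~isempty(tok), xmin = str2double(tok{1}); continue; end
                tok = regexp(ln, '^xmax = ([\d.eE+-]+)', 'tokens', 'once');
                if ~isempty(tok), xmax = str2double(tok{1}); continue; end
                tok = regexp(ln, '^text = "(.*)"', 'tokens', 'once');
                if ~isempty(tok)
                    if ~isempty(tok{1})                          % keep only labeled intervals
                        tg(end+1) = struct('tier', tier, 'xmin', xmin, ...
                                           'xmax', xmax, 'label', tok{1}); %#ok<AGROW>
                    end
                    inInterval = false;
                end
            end
        end
    end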

Design Alternatives:

Through our research we came up with three design alternatives:

  1. The current method: a piecemeal, step-by-step method that requires running multiple algorithms and converting data from one form to another for each computer application used.
  2. A completely automated system. This system would label waveforms, read the data into a usable form, compute durations based on waveform data, store data, average durations, statistically analyze data, and output data in table form.
  3. A hybrid system. This system would use Praat for labeling waveforms and MATLAB for extracting, storing, analyzing, and outputting data.

We decided to use alternative 3. Although the first system is effective, it is time consuming and intricate. It leaves too much room for human error and is non-intuitive: it is a process one must be taught and cannot easily pick up without proper instruction. It also requires multiple computer applications and complex scripts. Option two is ideal in principle. However, we found that the existing labeling process in Praat is effective; there is room for errors in labeling, but labeling errors can be corrected using MATLAB. Full automation would also be too time consuming: it would require two student researchers, one to program the code to automate labeling and one to program the data-analysis software. The third option was decided on as the most practical approach, because both the student and professor had experience with the labeling process, which implies the ability to label waveforms accurately. The student and professor also had experience with MATLAB, which is able to perform all the functions necessary for the data analysis. Data could also be saved dynamically into MATLAB databases using structures (a small sketch follows this paragraph). Finally, this was decided to be a good method because users on the Union College network have access to MATLAB.
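
As a small illustration of what is meant here by dynamic storage (this is only a sketch with hypothetical field and file names; the project’s own structure code appears later as Table 2), intermediate results can be kept in a MATLAB structure and saved to disk at each stage so that earlier steps need not be rerun:

    % Hedged sketch: store intermediate results dynamically in a structure.
    % Here tg is a struct array of labeled intervals, e.g., from the parser sketch above.
    study = struct();
    study.labels    = tg;                      % raw labeled intervals
    study.durations = [tg.xmax] - [tg.xmin];   % per-interval durations, in seconds
    study.meanDur   = mean(study.durations);   % summary statistics
    study.stdDur    = std(study.durations);
    save('study_data.mat', 'study');           % snapshot of this stage of the analysis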

Final Design and Implementation:

The final design chosen for the project was alternative 3, the hybrid system that uses Praat for labeling acoustic waveforms and MATLAB to process the data from the waveforms. The first thing done was to test automating the labeling system in Praat. Since neither the student nor the instructor had knowledge of coding in Praat, we were unable to accomplish this task. After this was determined, the next step was to manually label the medial stops in a small database of acoustic recordings. In this database we made sure to have every possible voicing and place condition for an adult subject and a child subject. This meant that there was a voiced labial, velar, and alveolar medial stop for each subject, as well as a voiceless labial, velar, and alveolar medial stop. Each acoustic recording’s waveform had its specific speech events labeled. Table 1 shows a list of the speech events and their corresponding labels:

Tier 1) Word: This denotes the interval corresponding to our target word.

Tier 2) Noise: This tier is broken down into separate noise tokens:

wd-ons_beg: beginning of the noise related to the word onset coda, either aspiration or breathy voice

wd-ons-end: end of the noise related to the word onset coda