slide 2--- Voice Activation Introduction

Voice activation programming is becoming more widespread.

There are many potential applications of this technology.

Some examples are assistance for the handicapped, the medical profession, data retrieval services, and device manipulation, such as issuing voice commands to a GPS (global positioning system) unit in a car.

In November 2004, Microsoft announced that its voice command software would let users of gadgets running the Pocket PC operating system issue spoken commands to get calendar and contact information, make phone calls, access applications, and perform other basic functions. The software could also be used with the handheld version of Microsoft's Media Player to allow voice-activated control of digital music playback.

New applications of this technology are found every day in our society, from voice-activated ATMs to automated telemarketing voice-mail systems.

slide 3--- Project Goals

The goal of this project was to write a voice activation interface for the Microbot TeachMover robotic arm.

This was accomplished using Visual Studio C++ and the Microsoft Speech SDK (software development kit).

slide 4--- PC Hardware

This was a software design project, which required a standard PC with Visual Studio installed.

A standard headset with microphone was used to interface the soundcard with the software.

The digital signal processing (DSP) used in this project relied on existing standards built into most sound cards and on the macro functions in the Speech SDK.

slide 5--- Robot Hardware

The robotic arm used is the Microbot TeachMover II developed by Questech, Inc.

It has four joints (base, shoulder, elbow, and wrist) plus a gripper.

Each joint has a corresponding electric stepper motor that moves it.

The robot interfaces with the PC through a DB9 serial port (COM1) using the default settings of 9600 baud, no parity, 8 data bits, and 1 stop bit (9600,N,8,1).
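
As a hedged illustration (not the project's serial communications package, which is described later under Project Includes), opening COM1 with these settings through the Win32 API might look like the following sketch:

    #include <windows.h>

    // Illustrative only: open COM1 at 9600 baud, no parity, 8 data bits, 1 stop bit.
    HANDLE OpenMicrobotPort()
    {
        HANDLE hCom = CreateFileA("COM1", GENERIC_READ | GENERIC_WRITE,
                                  0, NULL, OPEN_EXISTING, 0, NULL);
        if (hCom == INVALID_HANDLE_VALUE)
            return INVALID_HANDLE_VALUE;       // port could not be opened

        DCB dcb = {0};
        dcb.DCBlength = sizeof(dcb);
        GetCommState(hCom, &dcb);              // start from the current settings
        dcb.BaudRate = CBR_9600;               // 9600 baud
        dcb.ByteSize = 8;                      // 8 data bits
        dcb.Parity   = NOPARITY;               // no parity
        dcb.StopBits = ONESTOPBIT;             // 1 stop bit
        SetCommState(hCom, &dcb);
        return hCom;
    }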

slide 6--- Software

The software packages used in this project were Microsoft's Speech SDK 5.1 and Microsoft's Visual Studio 2003 and 2005. C++ was the primary language used.

slide 7--- SDK 5.1

The Microsoft Speech SDK (software development kit) contains the Microsoft Speech application programming interface (SAPI), the Microsoft continuous speech recognition engine, and the Microsoft concatenated speech synthesis (TTS, or text-to-speech) engine.

It also includes a collection of speech-oriented development tools for compiling source code and executing commands, along with tutorials and documentation on the most important SDK features.

slide 8--- SAPI

I used the SAPI and the Speech Recognition engine from the Speech SDK.

All of the speech recognition logic used by my implementation was taken from different examples in the SDK.

The first step was to initialize COM (the Component Object Model), which lets the application create instances of the speech objects. COM is to C++ roughly as DLLs are to C.

COM needs to remain active throughout the application's session.
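
A minimal sketch of this COM setup, modeled on the SDK samples (error handling trimmed):

    #include <windows.h>
    #include <sphelper.h>    // SAPI helper header; pulls in the SAPI interfaces and macros

    int main()
    {
        if (FAILED(::CoInitialize(NULL)))    // initialize COM for this thread
            return -1;

        // ... create the recognizer, load a grammar, run the recognition loop ...

        ::CoUninitialize();                  // release COM when the session ends
        return 0;
    }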

The next step was to create the Recognizer object, which detects that speech is being used.

The Recognizer object is the interface that sets a recognition notification sequence.

This object provides access to the Speech Recognition engine.

When an event is recognized, the speech event interest is set to recognition status, the default audio input is created, and that input is handed to the speech engine.

Once the recognizer alerts the speech engine that it’s active, a grammar is specified and loaded.
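
The corresponding SAPI calls, again sketched from the SDK samples (an in-process recognizer with the default audio input and the standard dictation grammar used in this project; error checks omitted):

    #include <windows.h>
    #include <atlbase.h>     // CComPtr smart pointers
    #include <sphelper.h>    // SAPI interfaces and helper functions

    CComPtr<ISpRecognizer>  cpRecognizer;   // the Recognizer object
    CComPtr<ISpRecoContext> cpContext;      // the recognition context
    CComPtr<ISpRecoGrammar> cpGrammar;      // the loaded grammar
    CComPtr<ISpAudio>       cpAudio;        // default audio input

    bool InitRecognizer()
    {
        // Create the Recognizer object (the speech recognition engine).
        if (FAILED(cpRecognizer.CoCreateInstance(CLSID_SpInprocRecognizer)))
            return false;

        // Create the default audio input and set it as the engine's input.
        SpCreateDefaultObjectFromCategoryId(SPCAT_AUDIOIN, &cpAudio);
        cpRecognizer->SetInput(cpAudio, TRUE);

        // Create a recognition context, then specify and load the grammar.
        cpRecognizer->CreateRecoContext(&cpContext);
        cpContext->CreateGrammar(0, &cpGrammar);
        cpGrammar->LoadDictation(NULL, SPLO_STATIC);
        cpGrammar->SetDictationState(SPRS_ACTIVE);
        return true;
    }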

The next step is event recognition. A recognition context gets set for the speech engine.

A context is any single area of the application that needs to process speech.

Once the engine is active and processing speech, three things come into play: events, notifications, and interests.

An example of an event is when a sound is first detected on the microphone, or when the engine successfully completes word recognition.

Notifications occur concurrently with events; they alert the application when a SAPI event occurs. Notifications can be used in error checking to make sure events occur properly. By default SAPI sends about thirty different events and notifications back to the application.

Interests are used as filters to select specific events and discard the rest.

In this project only one event was of great interest: the recognition event.

An event is considered successful when there is a positive word match between the recognized speech and the loaded grammar. The next step is to output the recognized text to the 32-bit console.
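
Continuing the previous sketch (it uses the cpContext declared there), the interest filter and the recognition loop might look roughly like this, with the console output step shown via GetText:

    #include <stdio.h>

    void RunRecognitionLoop()
    {
        // Notification: signal a Win32 event; interest: queue only recognition events.
        cpContext->SetNotifyWin32Event();
        cpContext->SetInterest(SPFEI(SPEI_RECOGNITION), SPFEI(SPEI_RECOGNITION));

        while (cpContext->WaitForNotifyEvent(INFINITE) == S_OK)
        {
            CSpEvent event;                          // helper class from sphelper.h
            while (event.GetFrom(cpContext) == S_OK)
            {
                if (event.eEventId == SPEI_RECOGNITION)
                {
                    ISpRecoResult *pResult = event.RecoResult();
                    WCHAR *pwszText = NULL;
                    pResult->GetText(SP_GETWHOLEPHRASE, SP_GETWHOLEPHRASE,
                                     TRUE, &pwszText, NULL);
                    wprintf(L"Recognized: %s\n", pwszText);   // output to the console
                    ::CoTaskMemFree(pwszText);
                }
            }
        }
    }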

slide 9--- Console

The next step was to output the recognition event to the 32-bit Windows console.

I used a dialog box to receive the recognized text and added two radio buttons for Windows control: Send and Exit.

The program loops the speech engine looking for additional recognition events until either the Send or the Exit button is pressed.

The user must confirm that the correct two-word command phrase appears in the dialog box before pressing Send to execute the robot code.

When the Send radio button is pressed, the last correct recognition event is written to a text file, which is then interpreted by the robot logic block.

If the Exit button is selected, the loop ends and the program shuts down.
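
A hypothetical sketch of the Send step; the function name and the file name are assumptions (slide 15 refers to this file as the start file):

    #include <fstream>
    #include <string>

    // Called when the Send radio button is pressed: write the last correct
    // recognition event to the start file for the robot logic block to read.
    void OnSend(const std::string &lastPhrase)
    {
        std::ofstream start("start.txt");   // "start.txt" is an assumed file name
        start << lastPhrase << std::endl;   // e.g. "queen four"
    }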

slide 10--- Robot Code

The Robot code was written in two distinct logic blocks, the pattern recognition block and the pick and place operation block.

slide 11--- Pattern Matching

The pattern recognition block interprets the text file output by the voice recognition sequence and breaks it down into individual characters stored in an array.

Pattern matching then becomes a simple character search over the elements of the array.

Two commands are used, find and final position. Each consists of one of four piece words (queen, bishop, horse, or castle) and an accompanying number descriptor from one to eight.

I gathered positional co-ordinate data from the chessboard and assigned these values to the appropriate pattern matching instances.
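
A hedged sketch of this character search: the piece and number word lists come from the slide above, while the file name and the printed result are illustrative assumptions rather than the project's actual code.

    #include <cstdio>
    #include <cstring>

    int main()
    {
        char command[64] = "";
        FILE *fp = fopen("start.txt", "r");          // text file from the speech step (assumed name)
        if (fp) { fgets(command, sizeof(command), fp); fclose(fp); }

        const char *pieces[]  = { "queen", "bishop", "horse", "castle" };
        const char *numbers[] = { "one", "two", "three", "four",
                                  "five", "six", "seven", "eight" };

        int pieceIdx = -1, numberIdx = -1;
        for (int i = 0; i < 4; i++)                  // character search for the piece word
            if (strstr(command, pieces[i])) pieceIdx = i;
        for (int i = 0; i < 8; i++)                  // and for the number descriptor
            if (strstr(command, numbers[i])) numberIdx = i;

        if (pieceIdx >= 0 && numberIdx >= 0)
            printf("Matched command: %s %s\n", pieces[pieceIdx], numbers[numberIdx]);
        else
            printf("No valid two-word command found.\n");
        return 0;
    }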

slide 12--- Pick and Place Operations

The next logic block was for the pick and place operations for the Microbot.

The Microbot’s home position was calibrated using a glass block.

The end-effector's position and orientation were described with three-dimensional position co-ordinates (x, y, and z) and three orientation descriptors: pitch, roll, and gripper (P, R, and G).

This means that the Microbot has 5 degrees of freedom and a gripper value.

The system also needs to keep track of the chess piece; its co-ordinates are read from and stored in another text file.
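
One way to picture the record kept for the end-effector and the chess piece; the struct layout and the whitespace-separated file format are assumptions, not the project's actual files.

    #include <fstream>

    struct Pose
    {
        double x, y, z;   // three-dimensional position co-ordinates
        double p, r;      // pitch and roll orientation descriptors
        double g;         // gripper value
    };

    // Read a pose record from one of the temporary text files (e.g. the status file).
    bool ReadPose(const char *fileName, Pose &pose)
    {
        std::ifstream in(fileName);
        if (!(in >> pose.x >> pose.y >> pose.z >> pose.p >> pose.r >> pose.g))
            return false;
        return true;
    }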

Once the appropriate command was realized in the pattern matching sequence, a series of steps was sent to the Microbot.

slide 13--- Specific Pick and Place Logic

The logic employed to send a specific change to the Microbot was as follows:

load the home file (end-effector location); calculate the Microbot's joint angles from that positional data; load the piece-location co-ordinates and calculate their joint angles; compute the angle differences; calculate the steps needed to effect the angular displacement; store and write all new position and orientation calculations for further action; and send the steps to the joint motors.

The Microbot received steps for individual motors and then moved to the appropriate co-ordinates. The command logic put together a series of these moves to effect pick and place operations.

For instance, the find-piece logic hovers the end-effector over the piece, opens the gripper, drops the gripper down on top of the piece, closes the gripper, and then picks up the piece, totaling five distinct series of steps sent to the Microbot.
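
A sketch of that find-piece sequence; MoveTo and SetGripper are hypothetical stand-ins for the joint-angle and step calculations described above, and the constants are assumed values.

    #include <cstdio>

    struct Pose { double x, y, z, p, r, g; };

    const double HOVER_HEIGHT = 2.0;   // assumed clearance above the piece
    const double GRIP_OPEN    = 1.0;   // assumed gripper openings
    const double GRIP_CLOSED  = 0.0;

    // Stubs standing in for the joint-angle / step logic on this slide.
    void MoveTo(const Pose &t)      { printf("move to (%.2f, %.2f, %.2f)\n", t.x, t.y, t.z); }
    void SetGripper(double opening) { printf("gripper -> %.2f\n", opening); }

    // The five distinct series of steps that make up the find-piece operation.
    void FindPiece(const Pose &piece)
    {
        Pose hover = piece;
        hover.z += HOVER_HEIGHT;

        MoveTo(hover);              // 1. hover the end-effector over the piece
        SetGripper(GRIP_OPEN);      // 2. open the gripper
        MoveTo(piece);              // 3. drop the gripper down on top of the piece
        SetGripper(GRIP_CLOSED);    // 4. close the gripper
        MoveTo(hover);              // 5. pick the piece up
    }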

slide 14--- Kinematics

When the command sequence recognizes a positional command, it reads the status file for its current position and needs to convert this positional data into angular data.

A series of mathematical calculations called kinematics is used.

To convert positional data into angular data, inverse kinematics is employed.

The goal is for the Microbot to realize changes in co-ordinate data, but it can only change its joint angles, so inverse kinematics converts the positional data into angular data.

This is accomplished by a series of mathematical steps given in the Microbot manual, which I transcribed into the C++ code in Dr. Lucks' robot class 418.

Once the angular displacement has been calculated, it is multiplied by a step value specific to that joint of the Microbot.

The result is a floating-point value and must be converted into an integer number of steps for the joint's stepper motor.

From time to time truncation occurs, causing some error; because of this, the Microbot requires frequent calibration.
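
A small worked example of the conversion and the truncation error it can introduce; the steps-per-degree constant is an assumed placeholder for the joint-specific value from the TeachMover manual.

    #include <cstdio>

    int main()
    {
        const double STEPS_PER_DEGREE = 19.64;   // assumed joint-specific step value
        double angleDelta = 12.5;                // angular displacement in degrees

        double exactSteps = angleDelta * STEPS_PER_DEGREE;   // 245.5 steps
        int    sentSteps  = (int)exactSteps;                 // truncates to 245

        // The lost fraction of a step accumulates over many moves, which is why
        // the Microbot needs frequent re-calibration.
        printf("exact %.2f steps, sent %d, truncation error %.2f\n",
               exactSteps, sentSteps, exactSteps - sentSteps);
        return 0;
    }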

slide 15--- Temporary Text Files

Throughout the entire program temporary text files are used.

The start file holds the stored text command from the speech recognition sequence.

The home file holds the home position of the end-effector.

The status file holds the current position of the chess piece.

The counter file keeps track of the number of steps sent to the Microbot.

I wrote a batch file to reset all of the files to their default starting values.

slide 16--- Project Includes

A pre-existing serial communications package was used to send steps to the Microbot. This package was created in Visual C++ and consisted of files that handle communications between the computer's serial port (COM1) and the Microbot.

They are serial.cpp, interface.cpp, and their header files.

These files define the structure for the Microbot class and constructor prototypes for potential kinematic functions.

The code that I created did not use any of the existing function prototypes.

The SAPI include file <sphelper.h> was all that was needed to use the SR engine. It links to about 15 other files whose internal macros handle all of the DSP and SR.

slide 17--- Analysis

The development cycle was split into three distinct phases:

the voice recognition phase, the pattern recognition phase, and the Microbot logic phase.

Writing the voice activation portion of the code proved to be a lengthy and difficult process.

The tutorials in the SDK were extremely helpful yet complex.

To improve this project, a specific grammar could have been written for the command sequence; instead, I used a standard dictation grammar.

I consider this project to be version 1.0, because I designed it from its inception. There are many potential add-on projects.

One could write a heuristic algorithm that would let the program know if a specific chess move was valid or not.

slide 18--- Conclusion

The application that was developed for this project accomplished the goals that were set out in my proposal. A voice-activation interface was created to move the Microbot.

This project integrates kinematic code with a speech recognition engine in a 32-bit interface.

There are many potential applications of using speech recognition with robotics.

Robot arms are used in many industrial settings, and specific voice command sequences could be used as fail-safe shutdowns.

The applications for the disabled and handicapped population are endless. A paralyzed person without control of their arms could activate a robotic arm with speech recognition. This could provide a more self-sufficient existence.

In the future robots will become more prevalent within general society. Their value has already been proven in industry.