Speech-Enabled Websites

SPEECH-VOTING SYSTEM

Project Work, IP Lahti, March 2010

Abstract

This project work aims to develop a speech-enabled voting system. The students will use the IBM WebSphere Voice Toolkit and the IBM WebSphere Voice Server SDK as speech recognition and speech synthesis engines. They will need some knowledge of VoiceXML, PHP and SQL, and of creating JSGF grammar files. Students may also apply any other web programming knowledge they have.

Task Description

The main goal of the project is to develop a speech voting system in which citizens can use their own voice to access the system and either cast a vote or retrieve the results of a vote.

The project is therefore structured into three subtasks. The students are expected to complete as many of them as possible, depending on the difficulties they encounter along the way. The design is, of course, completely open to the creativity of the team:

Development of a speech-based authentication stage that grants access to the system only to authorized users, in this case citizens who have the right to vote.
Development of a speech-based dialog with which the user can vote for the corresponding party.
Development of a speech-enabled application with which the user can retrieve the results of the voting after the counting process.

Currently, the UZ Java Applet only works in Spanish, so the Spanish-speaking students will have to turn the group's ideas into Spanish synthesis prompts and grammars.

Common Elements

The three proposed tasks share a set of common elements that the students will have to deal with: creating the database, the PHP scripts, the VXML-based dialogs and the JSGF grammars.
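As an illustration of the database element, the following is a minimal sketch of a schema that could back the three subtasks. It is written in Python with the standard sqlite3 module only to stay self-contained; the project itself would use PHP and SQL. All table, column and function names (voters, parties, votes, authenticate, cast_vote, results) are assumptions made for this example, and the PIN is stored in plain text purely for brevity.

```python
import sqlite3

# Hypothetical schema: every table and column name here is an assumption.
SCHEMA = """
CREATE TABLE voters (
    voter_id  TEXT PRIMARY KEY,
    pin       TEXT NOT NULL,               -- plain text only for this sketch
    has_voted INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE parties (
    party_id INTEGER PRIMARY KEY,
    name     TEXT NOT NULL UNIQUE
);
CREATE TABLE votes (
    vote_id  INTEGER PRIMARY KEY AUTOINCREMENT,
    party_id INTEGER NOT NULL REFERENCES parties(party_id)
);
"""


def create_database(path=":memory:"):
    """Open a database and create the empty tables."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def authenticate(conn, voter_id, pin):
    """Return True if the voter exists, the PIN matches and no vote was cast yet."""
    row = conn.execute(
        "SELECT has_voted FROM voters WHERE voter_id = ? AND pin = ?",
        (voter_id, pin),
    ).fetchone()
    return row is not None and row[0] == 0


def cast_vote(conn, voter_id, pin, party_name):
    """Record one vote for an authorized voter; return True on success."""
    if not authenticate(conn, voter_id, pin):
        return False
    party = conn.execute(
        "SELECT party_id FROM parties WHERE name = ?", (party_name,)
    ).fetchone()
    if party is None:
        return False
    with conn:  # both statements are committed together or rolled back together
        conn.execute("INSERT INTO votes (party_id) VALUES (?)", (party[0],))
        conn.execute(
            "UPDATE voters SET has_voted = 1 WHERE voter_id = ?", (voter_id,)
        )
    return True


def results(conn):
    """Return a dict mapping each party name to its number of votes."""
    rows = conn.execute(
        "SELECT p.name, COUNT(v.vote_id) FROM parties p "
        "LEFT JOIN votes v ON v.party_id = p.party_id "
        "GROUP BY p.party_id ORDER BY p.name"
    ).fetchall()
    return dict(rows)
```

In this sketch the votes table deliberately holds no reference to the voter, so that a ballot cannot be traced back to the person who cast it; the has_voted flag alone prevents double voting.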

A grammar can contain the following elements:

<element>: Defines an element that can be substituted into a later definition
(word1): Defines a word or sequence of words as the speaker has to pronounce them
{tag1}: The tag that the system returns as output

The syntax also uses the following operators:

[]: Whatever is enclosed in square brackets is optional; the speaker may or may not say it
*: Whatever is followed by an asterisk may appear 0 or more times
+: Whatever is followed by a plus sign may appear 1 or more times
|: Vertical bar works as an OR operator.
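The notation above can be tried out without a speech engine. The following is a minimal sketch, assuming a hypothetical voting rule with an optional "please", two alternative carrier phrases and a party name. The JSGF rule appears only as a comment; it is not parsed, but hand-translated into a regular expression, and the party names and tags are invented for the example.

```python
import re

# Hypothetical JSGF rules that this sketch imitates (shown for reference only):
#   <party> = red party {RED} | blue party {BLUE};
#   public <vote> = [please] (vote for | I choose) <party>;

# Spoken alternatives of <party>, mapped to the tag each one returns.
PARTY_TAGS = {"red party": "RED", "blue party": "BLUE"}

# [please]     -> optional group
# (a | b)      -> alternation
# <party>      -> the alternatives of the referenced rule
VOTE_PATTERN = re.compile(
    r"^(?:please )?(?:vote for|i choose) (?P<party>"
    + "|".join(re.escape(name) for name in PARTY_TAGS)
    + r")$"
)


def recognize(utterance):
    """Return the tag for an utterance covered by the rule, or None otherwise."""
    # Lower-case and collapse whitespace so matching ignores case and spacing.
    normalized = " ".join(utterance.lower().split())
    match = VOTE_PATTERN.match(normalized)
    return PARTY_TAGS[match.group("party")] if match else None
```

Calling recognize("Please vote for red party") yields the tag RED, while an utterance outside the rule, such as "red party" on its own, yields None. A real recognizer works on audio and returns confidence scores, which this text-only approximation does not model.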

A prompt has to be presented to the user (synthesized audio and an image, optionally with text), and the recognition system has to capture the user’s utterance. If the recognizer indicates that the utterance is correct (a very simple evaluation), positive feedback has to be given and the next prompt has to appear.
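The prompt and feedback cycle described above can be sketched as a simple loop. This is an illustration only: run_dialog, listen, is_correct and say are hypothetical names standing in for the synthesis and recognition engines, and the retry limit and the wording of the feedback are assumptions, since the text does not specify what happens after an incorrect utterance.

```python
def run_dialog(prompts, listen, is_correct, say=print, max_attempts=3):
    """Present each prompt, take the utterance, give feedback, then move on.

    prompts     -- list of prompt texts, presented in order
    listen      -- callable(prompt) returning the recognized utterance
    is_correct  -- callable(prompt, utterance) returning True or False
    say         -- callable used to present prompts and feedback
    Returns a list of (prompt, last_utterance, accepted) tuples.
    """
    log = []
    for prompt in prompts:
        accepted = False
        utterance = None
        for _ in range(max_attempts):  # assumed retry policy
            say(prompt)
            utterance = listen(prompt)
            if is_correct(prompt, utterance):
                say("Thank you, that was correct.")  # positive feedback
                accepted = True
                break
            say("Sorry, please try again.")
        log.append((prompt, utterance, accepted))
    return log
```

In a test setup, listen can be replaced by a function that returns scripted answers, which allows the dialog flow to be checked before any audio component is connected.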