Status Report 1

May-June 2006

Dated Monday, July 3rd, 2006

Authored by all EASAIER partners

Edited by Josh Reiss

WP1

The Consortium Agreement was signed by all partners.

The contract has been received. QMUL are awaiting accession forms from partners before sending the signed contract back to the Commission. QMUL have so far received three forms, but one was from another UK partner and one was incorrectly completed. Other than that, all administrative documents are in order.

EASAIER held its kick-off meeting on May 18th-19th, 2006. It was attended by all partners, plus the project officer Claude Poliart and Almut Boehme from the Expert User Advisory Board.

QMUL have been in touch with partners on a regular basis regarding staffing. RSAMD are currently recruiting personnel, and QMUL have agreed to assist them, e.g., by joining interview panels. DIT have re-oriented staff within their Audio Research Group around the EASAIER project. Since NICE’s research plans are closely related to those of EASAIER, NICE have increased the number of researchers internally allocated to EASAIER, but without any change in the budget. This means that they may exceed the requirements for 50% co-financing.

WP2

LFUI are in the process of helping domain experts to build up a music and audio ontology based on standard ontology engineering. QMUL have supplied the partners with their ontology, as described in Abdallah, Raimond, Sandler, “An Ontology-based Approach to Information Management for Music Analysis Systems”, AES120, 2006.

ALL have collected and reviewed ontologies relevant to speech and music. ALL evaluated DOME, LFUI’s ontology editor from DERI. ALL have also collected and assessed the user requirements that influence the expected ontology.

WP3

ALL have collected the draft specifications of all tasks in this WP. Work has already begun on the music retrieval and speech retrieval systems.

Music retrieval- A requirement specification, an example interface for music queries, a technical description of the Music Retrieval task and an overview of the current system have all been provided and disseminated to the partners.

Speech retrieval- Performance of the speech retrieval system is currently being enhanced. ALL have organized a set of trials and expect moderate performance enhancement from these trials. They have begun to embed a morphological dictionary, which may significantly improve the performance of the phoneme level recognition.

Vocal query interface- The increased performance of the phoneme level recognition, as described above, has an additional effect of boosting the performance of retrieval based on vocal queries.

WP4

Segmentation- NICE have implemented a simulation in Matlab for speech detection. The main descriptors are based on spectral characteristics of the audio signal. ALL have also tested and enhanced their speech separation algorithm.

Word spotting- NICE have begun new research on optimal Auto Selection of the Working Point in Word Spotting Systems. In most modern word spotting systems, each spotted word is reported together with a certainty level for that result. NICE have begun research on improving this technique by using a different certainty level for every word in the searched lexicon. In addition, they are trying to improve the indexing speed and the search speed through code and I/O optimization.
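
The idea of a separate certainty level for each lexicon word can be illustrated with a minimal sketch. This is not NICE's implementation: the Hit record, the filter_hits function, the threshold values and the default threshold are all hypothetical names and numbers chosen for illustration, and the sketch assumes certainty is reported on a 0 to 1 scale.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One spotted word (hypothetical record layout)."""
    word: str
    time_s: float     # position of the hit in the recording, in seconds
    certainty: float  # assumed to lie in the range 0..1


def filter_hits(hits, per_word_threshold, default_threshold=0.5):
    """Keep only hits whose certainty reaches the threshold of their own word.

    Words without an entry in per_word_threshold fall back to default_threshold,
    which corresponds to the usual single, global working point.
    """
    kept = []
    for hit in hits:
        threshold = per_word_threshold.get(hit.word, default_threshold)
        if hit.certainty >= threshold:
            kept.append(hit)
    return kept


# Illustrative data only: the same certainty (0.62) is rejected for one word
# and accepted for another because each word has its own working point.
hits = [
    Hit("archive", 1.2, 0.62),
    Hit("music", 3.4, 0.62),
    Hit("archive", 7.9, 0.81),
]
thresholds = {"archive": 0.7, "music": 0.6}
kept = filter_hits(hits, thresholds)
```

How the per-word values are selected automatically is the subject of the research itself and is not covered by this sketch.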

Speaker object representation - Speaker retrieval is a distinct task within the field of speaker recognition (detection) applications. An efficient way to represent a speaker is needed to enable fast, reliable speaker identification and retrieval. In order to obtain a highly concise speaker object representation, we discard non-speech segments and retain only single-speaker utterances. The research will include developing methods for source segmentation (speech/non-speech), robust feature extraction, discarding outlier segments, pre-processing and speaker segmentation. High level characteristics of the speaker (gender, speech rate, pitch, emotion) are also important for fast speaker retrieval, and NICE will pursue research in this area.

The speaker object representation and speaker retrieval will be based on open-set speaker identification (OSI) technology. The most promising state-of-the-art speaker adaptation methods for speaker model training will be utilized. The research will include developing a robust open-set speaker identification engine and handling the performance degradation caused by a large target set.

NICE have completed implementation of two applications: speaker verification and speaker identification. These are based on a speaker recognition engine, which generates a speaker voice model and calculates a conditional likelihood score for each voice sample. Speaker retrieval will be deployed on large sound archives. Speaker identification is highly CPU-intensive and, in its current form, is not applicable to speaker search on large sound archives. The research should deliver an efficient engine with low CPU usage and only minor performance degradation.
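
The difference between verification, identification and the open-set identification mentioned above can be summarised in a short sketch. It is a generic illustration rather than NICE's engine: the function names, the score dictionary and the threshold are assumptions, and the scores stand in for whatever likelihood score the engine produces for a voice sample against each enrolled speaker model.

```python
def verify(score, threshold):
    """Verification: accept or reject a single claimed identity."""
    return score >= threshold


def identify_closed_set(scores):
    """Closed-set identification: the best-scoring enrolled speaker always wins.

    scores maps a speaker name to the score of the sample against that
    speaker's model (hypothetical data layout).
    """
    return max(scores, key=scores.get)


def identify_open_set(scores, threshold):
    """Open-set identification: the best match may still be rejected.

    Returns None when even the best-scoring model falls below the threshold,
    meaning the sample is treated as coming from an unknown speaker.
    """
    if not scores:
        return None
    best = identify_closed_set(scores)
    return best if verify(scores[best], threshold) else None


# Illustrative scores only.
sample_scores = {"speaker_a": 1.5, "speaker_b": 0.2, "speaker_c": -0.4}
known = identify_open_set(sample_scores, threshold=1.0)
unknown = identify_open_set(sample_scores, threshold=2.0)
```

In this simple form every sample is scored against every enrolled model, so the cost of identification grows with the size of the target set, which is one reason a search over a large archive is expensive.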

Music Transcription - DIT have developed a prototype system for polyphonic chord recognition based on comb filters. More recently, DIT have begun testing an algorithm capable of detecting ‘novelty’ in audio signals. Like existing approaches, the algorithm uses a self-similarity matrix and attempts to find parts of the signal which are uncommon within a given data set. Initial testing has indicated that it may be useful for segmentation when the audio content contains interruptions such as applause in the case of live performances.
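
As a rough illustration of how a self-similarity matrix can expose uncommon parts of a signal, the sketch below scores each frame by how dissimilar it is, on average, to all other frames. This is a simplified stand-in and not DIT's algorithm: the cosine similarity measure, the rarity score and the three-element feature vectors are assumptions made for the example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def self_similarity(frames):
    """Matrix whose entry [i][j] is the similarity between frames i and j."""
    return [[cosine(a, b) for b in frames] for a in frames]


def novelty_scores(frames):
    """Score each frame as 1 minus its mean similarity to every other frame.

    A frame that resembles little else in the data set gets a high score.
    """
    ssm = self_similarity(frames)
    n = len(frames)
    scores = []
    for i, row in enumerate(ssm):
        others = [row[j] for j in range(n) if j != i]
        mean_similarity = sum(others) / len(others) if others else 1.0
        scores.append(1.0 - mean_similarity)
    return scores


# Hypothetical per-frame features: four similar frames and one outlier (index 3),
# standing in for an interruption such as applause.
frames = [
    [1.0, 0.1, 0.0],
    [0.9, 0.2, 0.0],
    [1.0, 0.0, 0.1],
    [0.0, 0.1, 1.0],
    [0.95, 0.1, 0.05],
]
scores = novelty_scores(frames)
most_novel = max(range(len(scores)), key=scores.__getitem__)
```

In practice the frames would be spectral features computed from the audio, and the scores would be smoothed over time before being used for segmentation.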

WP5

Source separation - Work has begun on optimising and porting some of DIT’s existing algorithms for real-time use within the EASAIER framework. Upon the advice of our partners, NICE Systems, we have decided to use the Intel IPP library (Integrated Performance Primitives) as the foundation for our audio processing routines. The IPP library is a highly optimized library of DSP functions for multimedia and data processing applications. This should save significant time when moving from our prototype designs to functional user applications. In addition to optimising some of our two-channel routines, we have initiated some research in the field of single-channel source separation.

Since there are multiple audio processing partners, DIT have been in regular contact with our WP partners in order to make sure that there is no unnecessary duplication of work.

Sound enhancement- ALL have specified a set of pre-processing steps which can clean the speech before phoneme level recognition.

WP6

Issues were identified concerning system architecture, API and specifications. A working group was thus established to focus on system architecture and related issues. This is a highly selective group involving only core researchers heavily involved in programming and software engineering. The working group has begun communication, with SILOGIC issuing an initial document on the issues.

The coming months will also see more interaction between SILOGIC, who will be responsible for the user interface design, and the other partners. The interfaces should be simple yet give the user access to all the functionality of these complex processing algorithms. Possible mechanisms to achieve this will be discussed over the coming weeks.

WP7

There was a meeting of QMUL and RSAMD in mid-June to clarify issues related to this work package.

Evaluation- NICE are analyzing the results of speaker retrieval on their customer’s database. They are also considering purchasing a standard database (NIST 2004) from LDC for speaker verification evaluation. NICE are also analyzing the results of speech detection on their own database. For speech retrieval, ALL are working with a Hungarian corpus, which has been enlarged.

Expert User Advisory Board- One early priority is to convene the Expert User Advisory Board, comprising managers of sound archives, and this is underway with the British Library, National Library of Scotland, Irish Traditional Music Archives and Observer Budapest already contacted. INA (France) have also agreed to join. Other target members will be the African Museum (Belgium) and RTE (Ireland).

WP8

Deliverable- QMUL has delivered D8.1, Public website and promotional brochure. The website includes the brochure. Further promotional material includes a PowerPoint presentation, also available on the website. The website is also being used to share technical documents within the consortium.

Promotion- The brochure has been disseminated to all partners and was also disseminated at the cultural heritage project meeting, 29-30 June, Luxembourg.

LFUI has prepared a simple EASAIER poster and will present it at an event of the University.

LFUI are also exploring the potential to organize an EASAIER-related workshop at key events in the Semantic Web area. Potential large events include the yearly International Semantic Web Conference (ISWC) and the yearly European Semantic Web Conference (ESWC). LFUI plan to submit a workshop proposal next year for one of these events.

QMUL also plan to organize an EASAIER-related workshop at the 31st International Audio Engineering Society Conference, entitled New Directions in High Resolution Audio. The workshop will be concerned with audio formats and issues for preservation and digitization of audio content.

Collaboration- Internal collaborations have been numerous and are mentioned above. One internal collaboration outside the scope of the project is the agreement that a key researcher at NICE will pursue his PhD at DIT. External collaborations have been arranged through the cultural heritage project meeting. These include planned meetings with MEMORIES, providing expertise on speech transcription to MultiMATCH, and discussions with CONTRAPUNCTUS on interfaces for the visually impaired.

Submissions- QMUL have a paper on Sonic Visualiser, a prototype music analysis interface, accepted for publication at the 7th International Conference on Music Information Retrieval, ISMIR 2006. NICE are presenting a poster describing their advances in Speaker Recognition at Odyssey 2006 on June 29th in Puerto Rico. A paper on DIT’s prototype system for polyphonic chord recognition will be submitted shortly. When camera-ready versions are available, these will be delivered to the project officer.