Proposal #033902, 27/3/2006, Page 1 of 62

SIXTH FRAMEWORK PROGRAMME
PRIORITY IST-2005-2.5.10
ACCESS TO AND PRESERVATION OF CULTURAL AND SCIENTIFIC RESOURCES

Contract for: SPECIFIC TARGETED RESEARCH OR INNOVATION PROJECT

Annex I - “Description of Work”

Project acronym: EASAIER
Project full title: Enabling Access to Sound Archives through Integration, Enrichment and Retrieval
Proposal/Contract no.: 033902
Related to other Contract no.: None
Date of preparation of Annex I: 27/03/2006
Operative commencement date of contract: 01/05/2006


Table of Contents

1 Project Summary
2 Project Objectives
3 Participant List
4 Relevance to the Objectives of the IST Priority
   State of the Art
   Relevance to Strategic Objective 2.5.10
   Relevance to the Objectives of the IST Work Programme 2005-06
5 Potential Impact
   Objective-Related Strategic Impact
   Scientific Innovation Strategic Impact
   Synergies with Other Research Activities
   Contributions to Standards
   Contribution to Policy Developments
   Risk Assessment
6 Project Management and Exploitation/Dissemination Plans
   6.1 Project Management
      Management Structure
      Communication Flow and Documentation
      Conflict Resolution
      IPR and Knowledge Management
      Management and Protection of Digital Assets
      Quality Assurance Plan
   6.2 Plan for Using and Disseminating Knowledge
      Exploitation Strategy Team and Business Plan
   6.3 Raising Public Participation and Awareness
      Public Website
      Education Issues
      EASAIER Multimedia Promotion Kit
7 Detailed Implementation Plan
   7.1 Introduction - General Description and Milestones
      Major Milestones
      WP2 Media Semantics and Ontologies
      WP3 Retrieval Systems
      WP4 Sound Object Representation
      WP5 Enriched Access Tools
      WP6 Intelligent Interfaces
      WP7 Evaluation and Benchmarking
      WP8 Dissemination and Exploitation
   7.2 Planning and Timetable
   7.3 Graphical Presentation of Work Packages
   7.4 Work Package List
   7.5 Deliverables List
   7.6 Work Package Descriptions
8 Project Resources and Budget Overview
   8.1 Efforts for the Full Duration of the Project
   8.2 Overall Budget for the Full Duration of the Project
      Overall Budget and Overall Request for Funding
   8.3 Management Level Description of Resources and Budget
9 Ethical Issues
   Gender Issues
Appendix A: Consortium Description
   Roles of the Partners
   Queen Mary University of London - QMUL
   Dublin Institute of Technology - DIT
   Royal Scottish Academy of Music and Drama - RSAMD
   Applied Logic Laboratory - ALL
   Leopold-Franzens University of Innsbruck - LFUI
   NICE Systems
   SILOGIC
Appendix B: Publications Cited in the Text


Enabling Access to Sound Archives through Integration, Enrichment and Retrieval (EASAIER)

1 Project Summary

Many digital sound archives suffer from tremendous problems concerning access. Sound materials are often held separately from other materials and media, they cannot easily be listened to or browsed, and there is no way to search their content. Existing systems which attempt to deal with these issues are often library- or content-specific, of limited functionality, or difficult to use.

Extending over two and a half years, the EASAIER project will enable enhanced access to sound archives by providing multiple methods of retrieval, integration with other media archives, content enrichment and enhanced access tools. It offers methods of searching content based on audio features, musical features or speech content. EASAIER also supports cross-media retrieval, enabling access to other media in addition to audio. It implements recent advances in machine learning, music and speech processing, and information retrieval, and it addresses a growing demand for interactive electronic materials.

EASAIER allows archived materials to be accessed in different ways and at different levels. The system will be designed with libraries, museums, broadcast archives, and music schools and archives in mind. However, the tools may be used by anyone interested in accessing archived material, amateur or professional, regardless of the material involved. The system also enriches the access experience, since it enables the user to experiment with the materials in exciting new ways.

EASAIER will provide a unique, friendly and interactive experience, utilizing state-of-the-art technologies to increase the effectiveness of sound archive access. The project benefits from the wealth of user needs studies in this area, which allows the work to be responsive: the tools developed through EASAIER are a direct response to user demands.


2 Project Objectives

We have identified several key areas that still lack a deep, systematic and focused approach: multi- and cross-media retrieval, interactivity tools, integration of speech and music processing methods, and systemic archive analysis. In order to cope with these kinds of problems, innovative audio processing, data mining and visualization techniques, alongside proper user needs and evaluation studies, will be developed and integrated into prototypes. These will be deployed in several sound archives in order to demonstrate a qualitative jump in usability, effectiveness and accessibility. Over a two and a half year period, EASAIER will achieve the following objectives:

1. Improve and implement the separation and representation of sound objects from audio signals. The following goals can be identified:

• Establish a common set of metadata and provide a mapping for various existing archive ontologies. Previous projects have paved the way for creating high-level descriptors such as rhythm complexity or tonality. Machine learning will be used for computing descriptors. Software will consist of an extraction module for semantic descriptors related to speech, music and other audio.

• Develop and improve segmentation and source separation techniques. This capability is valuable in itself, allowing the user to access, analyse and listen to the finer details of a recording. Both temporal and spectral methods will be applied, specifically adapted for music or speech content. These tools also provide a necessary front end to the transcription and sound object recognition tools, and their outputs can be used as refined queries for the retrieval system.

• Develop sound object recognition tools which aim to identify the sources of the sounds in archived recordings. This is useful for metadata creation, for example by allowing media searches by instrument type. Such a search can then return historical, geographical and visual information, as well as audio content, pertaining to the instrument.

Assessment: This objective will be assessed and verified by the publication of technical papers, measurement of accuracy in international evaluation conferences, and demonstration of working prototypes. Progress will be measured by means of evaluations against ground-truth databases, user evaluation studies, and computational load and effectiveness figures, taking care to ensure relevance and portability. Existing algorithms will serve as baselines against which we will assess the achieved improvements, and annotated test databases will be instrumental in numerically assessing the reliability and generality of those improvements.

2. Allow processing of content in order to provide online interactive tools for end-users and enrich the browsing experience. This objective will enhance the interaction capabilities of an end-user with sound archives through the exploitation of the content within. Using our expertise, the following technical tasks will be achieved:

• Time-stretching – allows a user to slow down (or speed up) recordings without modifying the pitch. This would enable a music student or musicologist, for example, to more readily learn or analyze a piece of music, while the ability to speed up the audio content lets the user browse long segments of audio rapidly. The same technology also allows pitch-shifting of the audio without affecting the time scale.

• Transcription – allows a piece of music to be converted into standard score notation or other popular notation systems. The output of such a tool is also useful for generating metadata, e.g. allowing searches by melodic similarity. Existing music transcription technology is limited to single-instrument transcription; our source separation tool allows multiple instruments to be transcribed individually.

• Synchronisation and alignment of multimedia – the HOTBED project identified a strong user need for other media in addition to audio. For instance, if time scaling is applied to audio, similar scaling must be applied to the video in order to maintain synchronisation.

Assessment: Completion of this objective may be determined through rigorous evaluation studies which demonstrate that the new tools satisfy established user needs. The user needs studies used as input for evaluation include the HOTBED project [1] and extensive research by the JISC [2-4].
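The pitch-preserving time-scaling described above is typically built on overlap-add or phase-vocoder processing. The following is a minimal overlap-add (OLA) sketch, not the project's implementation; the frame size, hop and Hann window are illustrative choices:

```python
import math

def time_stretch_ola(samples, rate, frame=1024, hop_out=256):
    """Change duration by a factor of 1/rate without changing pitch:
    windowed frames are read every hop_out*rate samples but written
    every hop_out samples, so rate < 1 slows playback and rate > 1
    speeds it up."""
    hop_in = max(1, int(round(hop_out * rate)))   # analysis hop
    win = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame) for n in range(frame)]
    n_frames = max(1, (len(samples) - frame) // hop_in + 1)
    out = [0.0] * ((n_frames - 1) * hop_out + frame)
    wsum = [1e-9] * len(out)                      # for window normalisation
    for f in range(n_frames):
        a, s = f * hop_in, f * hop_out            # read and write positions
        for n in range(frame):
            out[s + n] += win[n] * samples[a + n]
            wsum[s + n] += win[n]
    return [o / w for o, w in zip(out, wsum)]
```

Plain OLA introduces phase artefacts on real material; production tools use WSOLA or a phase vocoder, which is also what makes the complementary pitch-shifting mentioned above possible.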


3. Provide multiple online retrieval systems, allowing for searching of content and metadata using multiple techniques and modalities.

• Speech retrieval – The development of speech/music separation and segmentation technology will reduce the overhead of indexing and will supply metadata on the audio content. The application of multilingual speech indexing technology to mixed audio (speech, music and other sound objects) sound archives will enable the retrieval of spoken audio content by text-based queries. The deployment of speaker identification technology will supply metadata with information on the speaker, enabling speaker identification and retrieval. The development of a vocal query interface will enable voice-initiated speech retrieval in the system.

• Music retrieval – Music retrieval involves searching and organising audio collections according to their relevance to music-related queries. This process consists of generating compact representations of both the query and the collection, and searching for similarities between these representations. Most music retrieval systems use low-level features which allow fingerprinting of audio files, but are limited to exact-match retrieval. By using appropriate higher-level features, we will obtain ranked lists of audio files related to the query through melodic and harmonic similarity.

• Cross-media retrieval – This allows the user to search media in various formats (audio recordings, video recordings, notated scores, images, etc.) and find related material across different media. For instance, a search for media similar to a piece of music could return musically similar pieces as well as relevant text and video linked to the song or performer. The establishment of linked metadata will enrich the content by allowing separate media to be associated.
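As an illustration of retrieval by higher-level features rather than exact fingerprints, the toy sketch below ranks a collection by melodic similarity: each melody is reduced to a transposition-invariant sequence of pitch intervals, and sequences are compared with edit distance. The function names and the MIDI-pitch input format are illustrative assumptions, not the EASAIER retrieval engine:

```python
def intervals(midi_pitches):
    """Pitch-interval representation: a transposition-invariant melody feature."""
    return [b - a for a, b in zip(midi_pitches, midi_pitches[1:])]

def edit_distance(a, b):
    """Classic Levenshtein distance between two feature sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]

def rank(query, collection):
    """Return collection keys ranked by melodic similarity to the query."""
    q = intervals(query)
    return sorted(collection, key=lambda k: edit_distance(q, intervals(collection[k])))
```

With this representation, a query melody matches a transposed copy of itself before an unrelated one, which is exactly the ranked-list behaviour that exact-match fingerprinting cannot provide.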

Assessment: This objective will be assessed by deployment and publication of new retrieval technologies with high precision-recall characteristics. Music retrieval methods will be assessed by MIREX and related international benchmarks, and speech retrieval methods will be assessed using internal evaluation archives provided by NICE, as well as freely available archives such as the On-Line Speech Corpora provided by the Linguistic Data Consortium.

4. Provide organizational tools to gather archive materials into tailored collections, and provide an interactive multimedia means of demonstrating the relationships between the contents. We have identified a strong interest in sharing or unifying the content of different archives, and in unifying the content of different types of material within an archive. This can be achieved via semantic mediation based on semantic web technologies. A suite of tools for ontology editing and library management will be set up to ease this task. Based on this tool suite, different multimedia archives can be mapped, aligned and integrated, and users will be able to more easily locate sound materials and link them with related objects. This entails the following goals:

• Provide tools for cross-archive access, especially the unified API for managing the storage and retrieval of multimedia objects. Project ontologies and description schemes will be reported.

• Implement functions that help the user to learn about the collection, the items within it, and their relationships. In order to achieve this goal, we will combine description extraction from the signal with manual annotations of sound archives. Manual annotations are a prerequisite for the exploitation of data-mining algorithms that alleviate and simplify further annotation.

• Create tools for visualizing and navigating sound archives and groups of elements inside collections. We will develop models of different categorizations that can be used for listening, exploration and retrieval of sound archives. These techniques will be used to provide a systematic or global study of an archive’s contents. Such analysis may be used to characterise an archive in terms of its structure, or to compare several archives in order to identify common features.

• Develop tools that learn categories for audio description from examples. In order to progress, problems need to be defined in terms of class discriminations. Examples include determining whether a piece is speech or music, identifying the spoken language in speech content, and identifying the time signature or key of musical pieces.

Assessment: This will be verified by the successful creation and dissemination of new interactive visualizations and navigation systems. An outcome of this objective is that it will enable collaboration between disparate communities with a shared interest in the varied materials of the sound archives. Achievement of this outcome will be verified by the merging of archives from members of the Expert User Advisory Board,


e.g., the National Library of Scotland and the Irish Traditional Music Archive. The ontologies and description schemes will be assessed by demonstrating the usefulness of the proposed structures.

5. Measure the effectiveness of the developed access system and explore its potential impact. We will develop and deploy prototypes intended to be elements of a larger architecture for sound archive access. In development and testing we will also deal with issues and components of a digital access system, including IPR management, data security, and digital assets management. Evaluation of prototypes will be carried out with user groups representing content providers, managers, and end users. Data will be gathered by means of panel sessions, focus groups, surveys, and close analysis of user interaction with the prototypes. The EASAIER project will be ensured a strong impact through promotion and dissemination, as described in Section 6, as well as through collaboration with ongoing related projects. The following goals have been identified as components of this objective:

• Evaluate the system from a variety of perspectives such as uptake, usage, presentation, value and effectiveness. We are concerned with the role of users and their interaction with sound archives, instead of considering them as passive consumers. We will integrate relevance feedback and other user-centred methodologies into the audio description and retrieval processes. Thus, we will validate usability and efficacy of new description schemes and ontologies by active involvement of user communities.

• Benchmark the effectiveness using both subjective and objective measures. Validation scenarios and protocols will be defined and developed, and feedback will be gathered and reported. Where possible, retrieval methods can be validated against ground truth sets. Otherwise, validation will require specific evaluation protocols that are closely connected to perceptual and cognitive experiments, or to survey research. We will develop operational, well-specified, testable, and replicable procedures for evaluating audio processing techniques.

• Establish synergies with existing and related research projects. Numerous projects related to access to sound archives exist at both the national and international level. It is important that the EASAIER project collaborates with the wider community and ensures that project achievements are placed in the broader context of audio-visual archiving and cultural heritage.
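Where ground-truth sets exist, the benchmarking described above can rely on standard retrieval measures. The sketch below gives the usual definitions of precision, recall and average precision; the function names and signatures are illustrative, not a project API:

```python
def precision_recall(retrieved, relevant):
    """Set-based precision and recall of a result list against a ground-truth set."""
    relevant = set(relevant)
    hits = sum(1 for doc in retrieved if doc in relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

def average_precision(ranked, relevant):
    """Average precision over a ranked list: rewards systems that place
    relevant items near the top, which matters for ranked audio retrieval."""
    relevant = set(relevant)
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, 1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0
```

Averaging the latter measure over a query set gives mean average precision, a common summary figure in evaluation campaigns such as MIREX.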

Assessment: This objective will be measured through the use of rigorous formal evaluations and by the successful deployment of novel access tools in sound archives outside the consortium. Further assessment includes detailed feedback from the Expert User Advisory Board: achieving this objective will involve the inclusion of major archival institutions on the Board, gathering detailed assessments from the experts, and successfully addressing their comments and criticisms. The level of synergy with existing projects will be measured through the collaborations established and reported (co-authored publications, joint conferences or workshops, visits and exchanges, etc.). Value and effectiveness will also be evaluated through consultation and comparison of the EASAIER project with related projects such as PrestoSpace (FP6-507336), MultiMatch and EthnoArc. The number of yearly accepted submissions and the percentage of submitted papers that are accepted will be used, among other metrics, to measure the success of the scientific side of the project. Results of assessment will be incorporated into the Business Plan developed by the Exploitation Strategy Team. The following table lists the general assessment methods and indicators that may be used to evaluate each objective and the overall performance of the EASAIER project.


Table 1. Assessment of Project Objectives.

Objective 1 – Separation & Representation of Sound Objects
  Assessment:
    1. Demonstration of working prototypes.
    2. Publication of technical papers which are peer reviewed.
    3. Measurement of accuracy in international evaluation conferences.
    4. Evaluations against ground-truth databases.
    5. Direct comparison with baseline algorithms.
    6. User evaluation studies.
  Relevant Deliverables: D2.1, D4.1-D4.3

Objective 2 – Content Processing for Interactive Tools
  Assessment:
    1. Demonstration of prototype content processing algorithms.
    2. Publication of technical papers.
    3. Verification of user requirements as defined in established user needs studies [1-4].
    4. User evaluation studies.
    5. Expert Advisory Board evaluation.
    6. Direct comparison with baseline algorithms.
  Relevant Deliverables: D5.1-D5.3

Objective 3 – Online Retrieval Systems
  Assessment:
    1. Publication of new retrieval technologies.
    2. Demonstration of prototype retrieval systems.
    3. Test bed user group deployment.
    4. Music retrieval system assessment by MIREX.
    5. Speech retrieval system assessed using internal archives and the On-Line Speech Corpora.
  Relevant Deliverables: D3.1-D3.3, D7.2

Objective 4 – Organizational Tools for Tailored Collections
  Assessment:
    1. Creation and dissemination of visualisation and navigation systems.
    2. Merging of archives from members of the Expert User Advisory Board.
    3. Ontologies and description schemes assessed by demonstrating the usefulness of proposed structures, as well as user group evaluation.
    4. Publication of technical papers which are peer reviewed.
  Relevant Deliverables: D2.1, D6.1

Objective 5 – Measure Effectiveness and Impact
  Assessment:
    1. Formal evaluations and deployment to external sound archives.
    2. Feedback from the Expert User Advisory Board.
    3. Collaborations established and reported (co-authored publications, joint conferences or workshops, visits and exchanges).
    4. Inclusion of major archival institutions on the Expert Advisory Board.
    5. Development of Business Plan.
  Relevant Deliverables: D7.1-D7.2, D8.1-D8.3, D1.4

Objectives 1-5 – Overall Project Assessment
  Assessment:
    1. Successful management and project administration.
    2. Detailed and timely reporting to the Commission.
    3. Deliverables and milestones achieved as stated.
    4. Quality assurance maintained and risks avoided.
  Relevant Deliverables: D1.1-D1.3, D1.5


3 Participant List

List of Participants

Role*  No.  Participant name                           Short name  Country  Month enter  Month exit
CO     1    Queen Mary, University of London           QMUL        UK       1            30
CR     2    Dublin Institute of Technology             DIT         Ireland  1            30
CR     3    Royal Scottish Academy of Music and Drama  RSAMD       UK       1            30
CR     4    Applied Logic Laboratory                   ALL         Hungary  1            30
CR     5    University of Innsbruck                    LFUI        Austria  1            30
CR     6    NICE Systems                               NICE        Israel   1            30
CR     7    Silogic                                    SILOGIC     France   1            30

*CO = Coordinator, CR = Contractor

4 Relevance to the objectives of the IST Priority

State of the Art Many digital sound archives still suffer from tremendous problems concerning access. Materials are often in different formats, with related media in separate collections, and with non-standard, specialist, incomplete or even erroneous metadata. Thus, the end user is unable to discover the full value of the archived material. To expose the inherent value of the archived material, powerful multimedia mining techniques are needed, in combination with content extractors, meaningful descriptors, and visualization tools. There is also a need to improve retrieval effectiveness. Existing retrieval systems often do not take into account the specific nature of the media content. Search by speech or musical feature functionality is rare. To address this, multiple retrieval techniques need to be merged and deployed, and similarity and structure must be conceptualised in order to provide a usable service. Another issue is that of providing appropriate interaction with and presentation of material for the end-users. An archive used by musicians and music students, for instance, requires that the material can be manipulated or modified appropriately at playback; archives of recorded broadcasts need to emphasise appropriate segmentation and interactive speech recognition features. In addition, the creation of tailored collections with customized material has been identified as a strong user need in access systems. These scenarios necessitate the development of enhanced and appropriate retrieval systems, as well as organizational structures and the means to interact with the presentation of materials. This demands appropriate metadata that can be automatically created in order to deliver, share or organize the archives. The state of the art in access to sound archives is typified by the Variations2 project[5] at Indiana University. This project aims to establish a digital music library testbed. 
The music is in multiple media and formats: audio, video, musical scores, and computerized score notation. Users listen to sound recordings, display printed scores, have notation translated into MIDI format for audio playback, and search the collection in all formats from a single search engine. However, retrieval is based entirely on metadata searching; no content-based search capabilities are available, and technologies from the field of Music Information Retrieval (MIR) are not exploited.

MIR involves searching and organising music collections according to their relevance to specific queries. Systems based on low-level acoustic similarity measures are intended to recognise a recording under noisy conditions and despite high levels of signal degradation. These audio fingerprints represent audio as a collection of low-level feature sets, mapped to a more compact representation by a classification algorithm[6]. The features can be extracted from the signal on a frame-by-frame basis and require little musical knowledge or theory. This technology has shown great success in detecting an exact one-to-one correspondence between the audio query and a recording in the database[7], even when the query has been distorted by compression and background noise. However, acoustic similarity measures disregard any correlation with the musical characteristics of the sound.

The OMRAS project had some success at identifying musical similarities using high-level representations[8]. The approach relied on the conversion of audio data into a symbolic representation. Though the conversion is often inaccurate, OMRAS was the first reported system to successfully retrieve polyphonic scores from polyphonic audio queries.

Mid-level representations of music are obtained by transforming the audio into a highly sub-sampled function that characterizes the attributes of musical constructs in the original signal. This is the key process in a wide class of music processing algorithms (e.g. onset and pitch detection, tempo and chord estimation). These methods exploit musical knowledge and a deep understanding of human perception in order to attain higher levels of semantic complexity than low-level features (e.g. to successfully characterise the rhythmic structure of a piece), but without imposing the rules of music notation.

A significant problem with research into audio processing, sound archive access and semantics is that it has often ignored the wealth of user needs studies in this area. Recent work[9-11] has provided a systematic study of what end users want from MIR systems.
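To illustrate the frame-by-frame fingerprinting approach described above, the following sketch is loosely modelled on published binary-fingerprint schemes: band energies are computed per frame, and the sign of the energy difference across time and frequency yields bits that are robust to compression and noise. All parameter values and function names here are illustrative assumptions, not the settings of the cited systems.

```python
import numpy as np

def binary_fingerprint(signal, frame_size=1024, hop=512, n_bands=17):
    """Toy binary audio fingerprint: per-frame band energies,
    then bit(t, b) = 1 if the band-energy delta increases over time."""
    frames = [signal[i:i + frame_size]
              for i in range(0, len(signal) - frame_size + 1, hop)]
    energies = []
    for f in frames:
        # magnitude spectrum of the windowed frame, grouped into coarse bands
        spec = np.abs(np.fft.rfft(f * np.hanning(frame_size)))
        bands = np.array_split(spec, n_bands)
        energies.append([np.sum(b ** 2) for b in bands])
    e = np.array(energies)
    # difference across adjacent bands, differentiated again across time
    diff = (e[1:, 1:] - e[1:, :-1]) - (e[:-1, 1:] - e[:-1, :-1])
    return (diff > 0).astype(np.uint8)

# usage: fingerprint a one-second 440 Hz test tone at 8 kHz
sr = 8000
t = np.arange(sr) / sr
fp = binary_fingerprint(np.sin(2 * np.pi * 440 * t))
```

Matching a query against the database then reduces to counting differing bits (Hamming distance) between fingerprints, which is what makes this representation fast to search at scale.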
User needs studies[1] and extensive research by the JISC[2-4] have identified a number of key features required to enrich sound and music archives. These findings stress the need for web-based access, integration of other media, time-stretching functionality, and alignment of scores with audio, among many others. Studies within the consortium[1] have also contributed significant findings (the need for real-time, user-defined mark-up and looping functions). They confirm and build on previous work on user needs for digitized audio collections, such as the groundbreaking exploratory studies carried out more than ten years ago by the Library of Congress[12] and the Jukebox and Harmonica projects[13, 14]. The development of Indiana University's world-leading Variations project[5] was founded on close analysis of users' needs – particularly music students' need for annotation and visualisation tools to help them learn with digital music content. However, a recent Scottish study of training needs analysis in e-learning[15] reported that audio is still an under-used technology. The UK Arts and Humanities Research Council's ICT Programme has recently funded work surveying the needs of the research community for searching and analysis tools for audio streams[16]. The EASAIER project will create access tools specifically designed to satisfy these user requirements.

Relevance to Strategic Objective 2.5.10

EASAIER addresses Strategic Objective 2.5.10, Access to and preservation of cultural and scientific resources. Specifically, it addresses Focus 1, Access. It does this through an integrated approach to providing multiple means of access to multimedia libraries focussed around audio resources. We will develop an electronic system for integrating and enriching sound archives and their related multimedia content. Audio has been chosen on the basis that national music, dance and literature are often among the most tangible cultural exports of a country. By providing new ways of access, we will expose the inherent value of the archived material; by providing new tools for interaction at access time, we will add further value. In so doing, we will greatly increase the accessibility of sound archives, enhance their value, improve their organizational structure and preserve a significant part of the European cultural heritage. This will be done in direct response to already identified user needs. In this section, we describe how EASAIER addresses each aspect of this objective.

Conceptualisation and representation of digital cultural and scientific objects – Though individual audio objects are usually well-defined, their extraction and identification from audio streams or recordings is a complex task. EASAIER proposes to accomplish this through the use of state of the art audio processing techniques. Initially, sources are separated using a variety of basis decomposition[17] and aural localisation techniques[18]. Using instrument and speaker templates, machine learning techniques may then be applied to identify the sources. Thus, an audio stream will be conceptualised as a sophisticated mix of distinct auditory events. EASAIER will be concerned with the representation of these events as sound objects, which may be isolated, identified, retrieved and manipulated.
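In its simplest form, the template-based identification step mentioned above can be sketched as a nearest-template classifier over average spectra. This is a toy illustration only; the function names, the cosine-similarity choice, and the spectral profile are assumptions for the sketch, not the project's actual design.

```python
import numpy as np

def identify_source(segment, templates, frame_size=1024, hop=512):
    """Classify an audio segment by comparing its average magnitude
    spectrum against labelled instrument/speaker templates."""
    frames = [segment[i:i + frame_size]
              for i in range(0, len(segment) - frame_size + 1, hop)]
    spectra = [np.abs(np.fft.rfft(f * np.hanning(frame_size))) for f in frames]
    profile = np.mean(spectra, axis=0)
    def cosine(a, b):
        return float(np.dot(a, b) /
                     (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
    # return the label of the most similar template
    return max(templates, key=lambda name: cosine(profile, templates[name]))

# usage with two synthetic "templates": a low tone vs. a high tone
sr = 8000
t = np.arange(sr) / sr
def avg_spectrum(x):
    return np.abs(np.fft.rfft(x[:1024] * np.hanning(1024)))
templates = {"low": avg_spectrum(np.sin(2 * np.pi * 200 * t)),
             "high": avg_spectrum(np.sin(2 * np.pi * 2000 * t))}
label = identify_source(np.sin(2 * np.pi * 210 * t), templates)  # → "low"
```

A production system would of course use learned models rather than fixed spectral templates, but the structure (feature extraction, then comparison against per-source models) is the same.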


Representation of sound objects is a separate task from the challenge of structured semantic representations, which are concerned with proper identification of intellectual creations. The distinction between the different semantic entities, such as the work, the expression, the manifestation and the item, is often unclear. Even when these are well-defined through metadata, it is often difficult to bridge the gap between an end-user's intended topic statement and the queries that may be passed to an information retrieval system. EASAIER proposes to tackle these issues through a combined navigation of metadata and content-based features. We will enable searching of audio through speech processing, music processing, metadata creation, and cross-media retrieval. Through novel means of presenting results, EASAIER allows a simple query concerning a general topic to be answered with a browsable presentation of related works, accessible by multiple means.

Developing new forms of interactive or creative experiences – Most audio archives are very limited in terms of access and interaction[4, 19]. In essence, they give the user the option of discovering and requesting a given resource via metadata. If online listening access is available, it is usually restricted not by the material in the corpus, but by the imposed functionality of the interface. EASAIER will remove these barriers, allowing users to choose their means of access and presentation. Notably, the enriched access tools will allow the end-user to segment and mark up the media, both using automated segmentation techniques and using manual mark-up tools for direct user interaction. Different media may be retrieved and linked, and alignment and synchronisation tools will enable text to be time-matched to audio, notated scores to be aligned with music, and video to be synchronised with time-scaled audio.
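The combined navigation of metadata and content-based features can be sketched minimally as follows: candidate records are ranked first by whether their metadata match the textual query, then by feature-vector similarity to the content query. All record fields and the ranking scheme here are illustrative assumptions, not the EASAIER schema.

```python
import numpy as np

def hybrid_search(query_text, query_features, records, top_k=3):
    """Toy hybrid retrieval: rank by metadata text match first,
    then by cosine similarity of content feature vectors."""
    def feature_score(rec):
        a, b = np.asarray(query_features), np.asarray(rec["features"])
        return float(np.dot(a, b) /
                     (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
    def text_score(rec):
        meta = " ".join(str(v) for v in rec["metadata"].values()).lower()
        return 1.0 if query_text.lower() in meta else 0.0
    scored = sorted(records,
                    key=lambda r: (text_score(r), feature_score(r)),
                    reverse=True)
    return [r["id"] for r in scored[:top_k]]

# usage: two metadata matches, ordered by content similarity
records = [
    {"id": "a", "metadata": {"title": "Jig in D"}, "features": [1.0, 0.0, 0.0]},
    {"id": "b", "metadata": {"title": "Reel"}, "features": [0.9, 0.1, 0.0]},
    {"id": "c", "metadata": {"title": "Jig fragment"}, "features": [0.0, 1.0, 0.0]},
]
hits = hybrid_search("jig", [1.0, 0.0, 0.0], records, top_k=2)  # → ["a", "c"]
```

The point of the sketch is the fallback behaviour: records with missing or inconsistent metadata can still be reached through content similarity, and vice versa.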
Using and creatively exploiting non-textual and complex objects – At its heart, EASAIER is concerned with audio materials. These may represent a very long stream, such as a day's broadcast, or a collection of small audio samples and accompanying metadata. But the objects are far more complex than that. Audio objects usually connect to a rich network of associated materials (transcripts, scores, video, images) that are necessary for users to interact with them fully. These associated resources are often not linked and have to be searched separately. They typically represent some combination of speech, music, and other sounds, and how best to index and search such objects poses interesting questions. Metadata searching is limited, since it requires that the metadata be present and consistent; yet content-based searching is context dependent, i.e., the relevant features for processing speech differ greatly from those for processing music. Furthermore, neither alternative inherently offers full cross-media retrieval: the ability to relate the content of an image to the content of a music file, for instance, is omitted. EASAIER is designed to deal with multiple media and multiple formats. It allows one to listen to any audio resource, to segment the sources appropriately, to retrieve sources based on their content through various means, and to combine various objects into tailored collections.

Integration into sustainable digital library services – The primary outcome of EASAIER is the creation and deployment of software for access to sound archives. This is sustainable for several reasons:

• Sustained and increasing demand for such tools as digitized audio collections become more available. Though techniques and tools for digitization are well established, many archives are only partly digitized and lag behind technology. As they complete the transition to digital collections, they will move into the next stage when the EASAIER tools may be deployed and exploited.

• Exploitation of open source licensing models. Open source software has been widely adopted by the information science community. This model encourages cooperation, reuse and sustainability. The EASAIER project will release open source software under license models described in the Consortium Agreement.

• Deployment of the access system and tools in real world archives outside the consortium. This is key to both dissemination and evaluation, since it provides independent evaluation by real world users.

• Continuing R&D in the consortium, beyond the lifespan of this project. The consortium partners are engaged in long term research on semantics, speech and music processing, retrieval systems and visualization. The outcomes of the EASAIER project will impact on further research which will continue to improve digital library access systems.

Relevance to the Objectives of the IST Work Programme 2005-06

The Work Programme (WP) 2005-06 aims at aligning research in a way that responds to the emerging policy and market contexts and puts Europe in a position to exploit future opportunities. EASAIER will put European sound archives at the forefront, and thus promote their increased use and exploitation opportunities. These sound archives include those used commercially, such as radio and television broadcast archives, as well as archives of recorded conversations for use in fraud prevention and emergency call monitoring. Thus the tools created in the EASAIER project may be used not just to enhance access to cultural resources, but also to promote European business and infrastructure. In what follows, we describe how EASAIER addresses various aspects and objectives of the IST Work Programme for 2005-2006.

Mastering complexity by pioneering new approaches to cope with the very large – EASAIER is intended for use with large audio collections. This is one of the aspects which differentiates EASAIER from other European projects such as SIMAC or Semantic Hi-Fi. SIMAC is intended for use with personal music collections, and Semantic Hi-Fi is intended to improve user interaction with tomorrow's digital music distribution systems. In those projects, the user is the individual, and the content is either material on a personal device (SIMAC) or material that has been delivered or accessed via a distribution chain. In the EASAIER project, the primary user is the content provider or manager, and the end user is the individual who wishes to access the content. Thus EASAIER operates on a much larger scale, dealing with very large collections which present issues of computation, indexability and infrastructure requirements. EASAIER addresses these issues by developing appropriate semantics and ontologies to cope with large audio databases, by developing and evaluating retrieval systems capable of handling very large collections, and by creating intelligent interfaces to the archives.
Exploring multidisciplinary fields combining ICT with other science and technology fields – Development of software systems for accessing a digital collection is principally an ICT field, but enriched access to audio also requires digital signal processing, musicology, library science and user interface design. Notably, in audio processing, the EASAIER project brings together researchers in the speech processing and music processing fields. These two communities have worked in parallel for many years on complementary problems (speaker and musical instrument recognition, speech and music transcription, emotion detection and expressivity) but with little interaction or knowledge of each other's approaches. The EASAIER project represents a unique opportunity to foster interdisciplinary approaches, exchange ideas and exploit recent advances in both subdisciplines of audio processing.

Promoting innovation from ICT use by bringing services and technology developments closer together – Sound archives represent important cultural heritage. The content managers who provide access to these archives perform an essential service, in that they enable the community to access their traditions, promote their culture, and interact with and learn about their aural history. As mentioned in Section B.1, innovative music processing technology is rarely applied to content in traditional sound archives (libraries, museums), and speech technology is exploited even less. By integrating library services with state of the art technologies for content retrieval, semantics, interfaces and access tools, we will stimulate innovative uses of the archives. For instance, end users will be able to mine the archive, transcribe the audio, and uncover similarities between archive items that would otherwise only have been revealed through exhaustive listening.

Reinforcing European strengths whilst seizing new opportunities – Europe is incredibly rich both in its cultural heritage and in the strength of its music and audio industry. Furthermore, the media and information industry has been identified by the Foresight Programme as one of the fastest growing sectors in world markets[20]. However, the digital audio technologies field is highly competitive and international, and the music recording and distribution industry is at a critical juncture: its traditional revenue streams are diminishing, and many recording companies are seeking new methods of disseminating their products and increasing the market size. Access to audio archives represents an important potential revenue stream, and may also be used to promote a recording company's intellectual assets. Thus, EASAIER represents an excellent and timely opportunity for the EU to exploit its strengths and maintain its leading edge in a key aspect of new digital audio technologies.

5 Potential impact

The goal of EASAIER is to enable access to sound archives through enrichment of materials, integration of multimedia sources, and creation of advanced retrieval systems. It implements recent advances in machine learning, music and speech processing, and information retrieval. Furthermore, it addresses a growing demand for interactive electronic materials. EASAIER allows archived materials to be accessed in different ways and at different levels. The system will be designed with libraries, museums, broadcast archives, and music schools and archives in mind. However, the tools may be used by anyone interested in accessing archived material, amateur or professional, regardless of the material involved. EASAIER also enriches the access experience itself, since it enables the user to experiment with the materials in exciting new ways.

Outcomes will be valuable to both sound archive content providers and end-users. By enriching and organizing the materials in the archive, we will create new and innovative methods of access. This enhanced access stimulates use, because multimedia resources will be connected, and archived material will be exploited and made visible to the right users. This will be deployed in connection with strong evaluation studies using communities of users and content managers. EASAIER will provide a unique, friendly interactive experience, utilising state-of-the-art technologies to increase the effectiveness of sound archive access. Most importantly, all of this will be done in a reactive manner, in direct response to user needs which have been identified in [1, 2, 5, 19]. These inter-connectable components will be devised, developed, and tested in connection with user communities and content distributors. Previous work within the digital library community has identified strong demand for specific tools. Rather than 'guessing' at demands for new technology, these demands have already been specified, and this project is focused on addressing them. This minimizes risk and greatly increases the impact of this work.

Objective-related strategic impact

Information Society issues have been identified as a priority of the European Research Area (Council Decision 1513/2002/EC, 30 September 2002). Inclusive access is a strong component of the EASAIER project, in that it will promote access to inclusive institutions (libraries, museums, etc.). EASAIER will greatly improve the accessibility of sound archives throughout Europe, and hence encourage the use of the rich cultural heritage maintained in these archives. The Management and Protection of Digital Assets is also carefully considered, in terms of collection of content, evaluation procedures and project outcomes. The procedures to be used for the collection and supervision of content are described in Section B.5. Remarkable impact on the following aspects of access to and preservation of digital cultural heritage is foreseen as an outcome of the EASAIER project:

• Improving the retrieval operations in digital sound archives
• Adding value to collections of audio and related resources
• Increasing the involvement of communities
• Exploring and testing novel environments for retrieval of digital archived resources

Scientific innovation strategic impact

We can identify the following scientific and technological innovative aspects in EASAIER (they will be further developed in the description of workpackages):

• It approaches the problem of music similarity in a rigorous and scientific way.
• It devises a means of combining metadata and feature extraction for cross-media retrieval.
• It integrates user needs studies, evaluation, and other user-centred methodologies into the access system creation process.
• It develops novel methods for structuring large digital collections, and for navigating through them.
• It puts forward business models involving the exploitation of software under an open source model.
• It plans coordinating actions with ongoing initiatives (see below in this section).

Synergies with other research activities

The partners of the EASAIER Consortium have considerable experience in national and international projects, including many that are directly applicable to this project. QMUL coordinated the recent European IST project BUSMAN (Bringing User Satisfaction to Media Access Networks) and participated in the IST Network of Excellence SCHEMA (Content-Based Semantic Scene Analysis and Information Retrieval). QMUL is currently a member of the SIMAC project (Semantic Interaction with Music Audio Contents), whose main task is the development of prototypes for the automatic generation of semantic descriptors and for exploration, recommendation, and retrieval. QMUL is also a member of aceMedia[21], which is concerned with the integration of knowledge, semantics and content for user-centred intelligent media services. Finally, QMUL is supporting the IMIRSEL (International Music Information Retrieval System Evaluation Laboratory) project[22], funded by the National Center for Supercomputing Applications, The Andrew W. Mellon Foundation, and the National Science Foundation (2004 – 2006).


IMIRSEL has made internationally available a multi-terabyte collection of music materials for use in rigorous scientific research on music information processing algorithms. From BUSMAN, we bring skills in the delivery of content, and from SIMAC we will utilise the low-level audio processing tools that have been developed, as well as the evaluation testbeds and content that were established. We will work closely with aceMedia in order to exploit the automatic annotation techniques being developed there; the network will be used to disseminate the results and encourage further research and development. From IMIRSEL, we have a proven evaluation testbed available for use with all music processing and retrieval tasks.

RSAMD is a member of the steering groups of a number of UK projects researching aspects of digital sound archives. These include the Archival Sound Recordings project, a collaboration between the British Library National Sound Archive and the JISC to digitize and make available over 4,000 hours of the British Library's collections, and the UK Arts and Humanities Research Council-funded Centre for the History and Analysis of Recorded Music (CHARM), which concentrates on research applications of digital music collections. The AHRC has recently funded audio-related user needs research[16] under its ICT programme, of whose management group Celia Duffy is a member. RSAMD has representation on the Steering Group of the Spoken Word project[23], jointly funded by the National Science Foundation and the JISC. RSAMD also has strong links with the Variations project[5], which has had success in attracting grant funding amounting to over $3.1 million from the National Science Foundation and the National Endowment for the Humanities through the US Digital Libraries Initiative. That project is currently exploring ways of transferring the knowledge it has gained to a wider range of institutional settings.
The partners include several internationally recognized research centers that have been involved in numerous national projects whose output will directly contribute to this project. These include the HOTBED (Handing On Tradition By Electronic Dissemination) project on audio archive evaluation and enrichment, the audio content processing project DITME (Digital Tools for Music Education), and the music retrieval and organisation projects OMRAS (Online Music Recognition and Searching) and SIMAC (Semantic Interaction with Music Audio Contents). Mark-up tools were developed in the HOTBED project, music retrieval tools in OMRAS, and online audio tools in the DITME project. All of these tools will be enhanced, modified, and deployed in the EASAIER project.

The academic partners will join forces with industrial partners dealing with speech recognition, visualization and interfaces, and metadata exploitation. Applied Logic Laboratory has made significant contributions in projects such as SoundStore and MEO; NICE has led Digital Recording and Quality Management Solutions research in the KITE (Knowledge Inference Technologies) industrial alliance; and SILOGIC has been active in interface design and systems integration in the EU IST projects DSE (Distributed Systems Engineering), DUNES (Dialogic & Argumentative Negotiation Educational Software), and the multimedia annotation project DIANE (Design, Implementation and Operation of a Distributed Annotation Environment). EASAIER has been developed with exploitation of these projects as a principal goal.

Contributions to standards

A key outcome of the EASAIER project will be the creation and deployment of standardized tools for organizing and indexing sound archives. This necessitates the use of scalable and convertible metadata standards that can be integrated with different archives in different settings. Furthermore, standardized representations must be used in order to evaluate our results, as well as to ensure effective dissemination. EASAIER will contribute to standards in the following ways:

• Defining standardized methodologies and test collections for the evaluation of retrieval systems. Coordinating actions are planned with ongoing initiatives, such as the music retrieval evaluation platform IMIRSEL and the RWC (Real World Computing) Intellectual Resources Working Group. RWC is a copyright-cleared music database that is available to researchers as a common foundation for research.

• Developing description schemes, semantics, taxonomies and ontologies to organize a rich and powerful audio knowledge representation. Descriptors require a flexible and open structure that places them in different relationships, and this structure has to accommodate new elements as they are needed. MPEG-7, which includes an Audio Content Description Interface, is a clear candidate for such a generic framework, even though it has important shortcomings for dealing with content description. QMUL has existing synergies with the MPEG-7 group (aceMedia FP6-001765 and EPSRC SeMMA GR/S84750/01), and these will be exploited for the development and delivery of an MPEG-7-compatible Description Scheme. This will be the basis for planning an active role in the standardization processes (we will also be watchful of other relevant standards such as MPEG-21, SMDL, MusicXML and AAF).

• Contributing to the Semantic Web and W3C initiatives. Standards in the area of semantically enabled knowledge technologies are chiefly determined by the World Wide Web Consortium (W3C). In particular, RDF has been defined by the RDF Core Working Group of the W3C, whilst the ontology language OWL (Web Ontology Language) is being defined by the Web Ontology Working Group. The Digital Enterprise Research Institute (DERI) at LFUI is a member of the W3C and is heavily involved in the OWL standard. Furthermore, DERI participates in the W3C interest group on "Semantic Web Best Practices" and in activities on semantic web rule languages. DERI has also positioned itself in the Semantic Web Services area: it is very proactive in the W3C Semantic Web Services interest group, and has submitted the Web Service Modeling Ontology (WSMO) proposal to the W3C as a potential semantic web services standard.

• Promoting professional associations and standards for metadata and archive creation and indexing. This involves pushing proposals in standardization initiatives, joining existing efforts promoting standardization, working towards a shareable sound archive, and organizing specific workshops and symposia at generic conferences. Notably, we will exploit existing synergies, through SILOGIC, with the Open Archival Information System (ISO 14721: OAIS). We will contribute to the latest recommended standards, which will allow existing and future archives to be more meaningfully compared and contrasted. SILOGIC has contributed to this standard and will make further contributions through the EASAIER project.

Contribution to policy developments

EASAIER will greatly enhance access to important European music and sound archives, as well as encourage the adoption of new technologies throughout the EU. This will stimulate the transition to a knowledge-based economy, as defined in the Lisbon Agenda. Among EASAIER's added values to the community are:

• Adding value to collections of audio and related resources
• Increasing the involvement of communities
• Exploring and testing novel environments and interfaces for music visualisation
• Integration of multinational user needs studies, evaluation, and other user-centred methodologies
• Development of business models involving the exploitation of software under an open source model
• European audiovisual policy and preserving European music content production
• Social policy and the creation of employment
• European open source software policy
• Digital content policy

Risk Assessment

The EASAIER project poses no risk to citizens or society. The risk management procedure for the EASAIER project is part of the Quality Assurance Plan. In this section, we identify the specific risks associated with each Work Package.

In WP1, management risks such as partner defection or misuse of resources will be reduced by proper elaboration of the Consortium Agreement and by proper monitoring of the project execution. Temporal and functional dependencies between modules and between partners' responsibilities might cause delays in deliverables; these will be foreseen and avoided by proper and timely project specification. WP1 also involves the collection of content, which will be gathered by all the partners and by content providers who have expressed an interest in the project. The content used for testing and development will have its rights properly cleared or granted. Here we have the advantage that several partners already hold public domain material which may be exploited without copyright and ownership issues.

Work packages 2 to 6 deal with scientific research oriented towards practical applications. Here we face an unavoidable risk of focusing on basic research problems that may not have a clear solution within the time-span of the project. Such dead-ends will be properly identified during the first year of the project, and the workplan re-specification process will consider them (and propose re-orientations or changes in focus) before time and resources are wasted. Moreover, we will focus on those technologies which have been successful or shown promise in testing and research settings. Additionally, the technological approach and previous expertise that all partners bring to the project ensure that an acceptable approximation, in terms of functional features for the planned components, will be implemented and tested.

A risk in WP7 is the potential misuse of content by the communities of testing users. The partners must ensure that the content provided for research and demonstration activities is not subject to piracy. Another inherent risk is that user groups will not be large enough, or representative enough of typical archive users. This risk is minimized by using established large-scale evaluation procedures, which were deployed with notable success in prior projects (HOTBED). In this case, user feedback will provide a valuable source for determining the "acceptability" of proposed approximations to hard scientific problems. Additionally, dealing with retrieval of audio content requires the adoption of imperfect techniques. Although there is room for improvement, and this will be a goal for some scientific teams, care will be taken to keep users' expectations close to what can be achieved given the current state of the art.

A further risk lies in the ability to accurately address user needs. This will be done in a reactive manner, in direct response to already identified user needs. Previous work within the digital library community has identified strong demand for specific tools; rather than 'guessing' at demands for new technology, these demands have been specified in prior art and in previous work by the partners. We will be assisted in this by the establishment of an Expert User Advisory Board, who will help ensure that we remain focussed on addressing those needs. This minimises risk and greatly increases the impact of this work.

A final risk occurs in WP8 with the establishment of an Exploitation Strategy Team.
The team must be well-suited to the task, and have a clear understanding of the available exploitation routes. Here we are helped by the involvement of three industrial partners in the consortium. Also, this team will be established a year into the project, after initial assessments of the work progress, but well before any exploitation opportunities could be missed.

6 Project management and Exploitation/Dissemination plans

6.1 Project Management

Queen Mary University of London (QMUL) will be responsible for both scientific and administrative project management. QMUL has extensive experience in project management and an established team with the knowledge and skills to oversee various aspects of the work. In particular, the Centre for Digital Music at QMUL has a strong international profile and multidisciplinary connections with many communities that will enhance the dissemination of results. Technical project co-ordination is the key element to keeping the work packages on track. This is particularly important in view of the tight integration required between the work packages. The first level of technical review will be performed by the Work Package leaders. The leaders have been chosen for their technical competence to manage the work packages in their respective groups. A second level of review will be performed by QMUL for coherence and guidance on work package control. Contractual management and interfacing with the European Commission will be performed by the QMUL Project Management team.

Management Structure

To provide quality assurance and effective communication across the project, QMUL will appoint a Coordination Team, led by an overall Project Manager, and a Steering Committee. In turn, the Project Manager will assign an Administration Coordinator and a Research Coordinator. These two will provide strong support to the Project Manager in any matters related to the administrative and research tasks in the management of the project. Other key roles are also depicted in Figure 1.

[Figure 1 diagram: the Coordination Team at QMUL (Project Manager, Research Coordinator, Administration Coordinator) communicates by email, phone and videoconference with the European Commission Project Officer and with the Team Manager and Administrator Responsible at each partner (QMUL, NICE, ALL, RSAMD, DIT, LFUI, SILOGIC), supported by a data repository (project reports, deliverables, cost statements) held on the EASAIER web site, WIKI and CVS.]
Figure 1. The management structure of EASAIER.

The Project Manager will be Dr. Reiss from QMUL. He will be responsible for the overall project and will act as its representative to the EC. He will assess the achievement of the project’s objectives and the risks faced by the project throughout its duration. He will:

• Ensure efficient management of tasks within the consortium;
• Supervise the project progress according to the work plan, time schedule and budget established in the contract;
• Communicate with the European Commission and transmit relevant information between the Commission Officers and the consortium partners;
• Mediate any conflicts between partners;
• Establish efficient communication flows within the consortium;
• Report to the project Steering Committee;
• Ensure any necessary deviations from the contract and work plan are presented to and approved by the Steering Committee.

The Research Coordinator will coordinate research between all consortium partners and will align the overall research directives to the project objectives. The Research Coordinator will:

• Monitor time and resource deviations from the original plan, and promote appropriate corrections;
• Facilitate information flow and collaboration between partners;
• Manage project documentation;
• Prepare, coordinate and document Technical Progress Meetings;
• Ensure that the project achieves its technical objectives and maintains its conformity to the work plan.

The Administration Coordinator will provide overall project support and will be the interface with the European Commission regarding documentation and cost statements. The Administration Coordinator will manage all financial aspects of the project. In particular, the Administration Coordinator will:

• Ensure that appropriate resources are being invested;
• Keep an accurate and up-to-date record of costs, resources and time scales;
• Prepare, coordinate and document Management Meetings.

Each project partner will identify a Team Manager and an Administrator Responsible. This information will be sent to the Project Coordination Team and updated if there is any change during the project. In some cases, both roles can be taken up by the same person. These two persons will be responsible for sending all required information to the Coordination Team: periodic project reports, other informal reports, deliverables, cost statements, invoices, etc. The Steering Committee (SC) consists of one representative per partner (normally the Team Manager), under the governorship of the Project Manager. SC members will have the authority to take corrective actions within their organizations. This management body will meet at least twice per year, though extraordinary meetings may also be convened. The SC’s objectives are as follows:

• ensure that the consortium fulfils its contractual obligations;
• ensure that there is effective communication between partners in the consortium and between the consortium and the Commission;
• approve any proposed changes to the workplan and/or resource allocation;
• request actions from Work Package leaders to rectify any deviations from the workplan and resource allocation.

Each Work Package will have a WP Coordinator who will be responsible for the following:

• Plan the scientific and technical work of the WP, in coordination with all the involved partners;
• Ensure that project timetables are maintained, and flag any discrepancies immediately to the Research Coordinator;
• Initiate corrective action for project deviations, if any;
• Consolidate partner information and prepare reports for submission to the Research Coordinator;
• Ensure that the objectives and milestones of the WP are achieved;
• Ensure that deliverables are available on time.

Communication flow and documentation

The Coordination Team will meet all WP leaders and Team Managers by teleconference at least every two months, where they will report on the progress of the project and discuss any important issues. Communication will be aided by face-to-face meetings in accordance with the travel plans. These reports and the content of project meetings will be the basis for the bi-monthly reports from the project coordinators to the EC. Researchers will be encouraged to work under short-term arrangements at the headquarters of different partners. The mobility of researchers will facilitate integration and communication, and has already been shown to be effective in prior projects (SIMAC). A schedule will be devised, and these internships, typically of 1 to 2 months, will be arranged based on practicality and need. A restricted mailing list has been operative since the inception of the project, and will be used for generic day-to-day communication. More focused lists will be organized as soon as the main flows of specific and technical communication (for example: integration issues, user management issues, etc.) have been identified. The project will be managed by means of a web-based collaborative space with the appropriate organizational structure and access restrictions. This will make it possible to share calendars and issue reminders for periodic events (reports, conference deadlines, etc.). Office software will be used to track the project status, including maintenance of Gantt charts and accounting spreadsheets. Document drafts may be edited by relevant partners with the use of version tracking facilities. A secure code repository will be set up by QMUL at the project kick-off. Code versions will be managed with the help of CVS, which is a robust, well-known, and widely used tool. Similarly, WIKI tools will be used for collaborative document development and discussion.
The repository will not be publicly accessible during the project, though repositories will be made accessible to the public as an outcome. A Consortium Agreement will be signed by all the partners before the outset of the project. This will cover in detail issues such as IPR, exploitation, and responsibilities.

Conflict Resolution

The project has been carefully prepared as a collaborative effort by all partners in order to prevent conflicts between them. Forthcoming documents will be created with the same attitude and purposes. Nevertheless, if problems arise, they will be reported to the Project Manager as early as possible. If necessary, the Steering Committee will be called. Conflict resolution will be approached by consensus or, if necessary, by voting. In the case of irreconcilable disputes, the conflict procedures that will be described in the Consortium Agreement will be adopted.

IPR and Knowledge Management

It is expected that new technologies and algorithms will be developed for the EASAIER project. To protect and exploit the newly developed Intellectual Property (IP), the partners agree on the following statement of intent to establish ground rules for collaboration and protection of IPR among Parties. It defines simple guidelines for the protection and patenting of IPR, where this is considered necessary. It is expected that agreements between Parties will be negotiated in the spirit of scientific collaboration, ensuring that the challenging technical requirements to be met by the project will be achieved. At this stage of the project, a Statement of Intent is deemed sufficient. When the project is funded, it will be appropriate to revise and expand this document.

No restrictions on use of IPR. Where there are no restrictions on a participating Party to divulge essential aspects of technology, including but not limited to experimental results, each Party shall be allowed to use


such IPR for its own purposes for continuing developments, unless restricted from doing so during a patenting or commercialisation process, or because of conflict with the terms of other agreements.

Procedures where protection of IPR is deemed essential. Suitable arrangements for protection and exploitation (as applicable) of the Project IP developed by the Parties should be negotiated as early as possible. Where only one Party to a technology development wishes to protect its own Background IP and any subsequent IP developed at its own expense for the project (Project IP), that Party may exploit the technology for purposes other than the project, but only if no other Party has contributed know-how to the Project IP. Where another Party has contributed, and if the features of such joint invention, design or work are such that it is not possible to separate them for the purpose of applying for, obtaining and/or maintaining the relevant patent protection or any other IPR, the Parties concerned agree that they may jointly apply to obtain and/or maintain the relevant right together with any other Parties in the project. Any other Party involved but not applying for such jointly owned intellectual property right shall discuss with the joint owners the possibility of obtaining a non-exclusive, non-transferable, irrevocable, paid-up license on a case-by-case basis. Whatever license rights are obtained should be available to all Parties in the project on the same terms.

Confidentiality. Background and Project IP related to the project will be treated as Confidential Information by all Parties if so indicated by the originating Party at the time of disclosure.

Patentable IPR. Material arising from the project which is patentable should be considered as a separate issue, given the need to restrict publication prior to the issue of a patent.
It may be necessary during this period, for Parties participating in the development of the relevant technology, or Parties requiring access to the technology for the development of the project, to be signatories to a Mutual Confidentiality Agreement.

Management and Protection of Digital Assets

Protecting the IP rights of content providers is an important issue faced by EASAIER. The primary security goal is to prevent the transmission of copyrighted material to any third party, who might then knowingly or unwittingly distribute it. The secondary goal is to provide seamless access to the copyrighted material for the purposes of research. Achieving these two conflicting goals is accomplished via three approaches.

• Fully rights-cleared material will be used and promoted for research purposes. This includes the use of Creative Commons material, where licenses have been constructed to explicitly provide the right for the material to be shared or even altered and redistributed. Other rights-cleared material will include the RWC (Real World Computing) Music Database of the Real World Computing Partnership (RWCP) of Japan. All the necessary copyrights and associated legal interests related to the RWC database belong to Japan's National Institute of Advanced Industrial Science and Technology (AIST), who have provided for free sharing of this musical data within the research community. For speech processing, retrieval and detection methods, testing can easily be performed using freely available, license-cleared evaluation archives such as the On-Line Speech Corpora provided by the Linguistic Data Consortium.

• The EASAIER project will harness the content and the structure of the International Music Information Retrieval Systems Evaluation Laboratory (IMIRSEL) for the purpose of evaluating music processing and retrieval methods. A set of legal agreements has been established between IMIRSEL and content providers, allowing for the transfer of copyrighted material to IMIRSEL and the use of copyrighted material by members of the research community, including the EASAIER consortium. These agreements are designed to allow only these uses, retain all other rights, and constrain any other agreements or policies that may be made between IMIRSEL and research efforts such as EASAIER. Furthermore, the IMIRSEL security architecture uses firewalls and application proxies, disables non-essential services, and disallows external access to data. In effect, it allows music processing to be applied to content while guaranteeing that the content is not transmitted in any form.

• Digital Rights Management will be incorporated into the access tools that are created. The tools will allow content managers to explicitly specify the accessibility of any content. The rights to access materials will be constrained both by the IP concerning the material and by the policies of the sound archive. Thus, whether it is possible to listen to, download or modify material is specific to each individual item, as well as to the location and security level of the end-user.
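As a rough illustration of this per-item access model, the sketch below resolves a user's permitted actions from an item-level policy. The clearance-level scheme and all names are hypothetical assumptions introduced for illustration only; they are not part of the EASAIER DRM design.

```python
# Hypothetical sketch of per-item access resolution: whether a user may
# listen to, download or modify an item depends on the item's policy and
# the user's security level. All names and levels here are illustrative
# assumptions, not EASAIER's actual rights-management scheme.
from dataclasses import dataclass
from typing import Optional

ACTIONS = ("listen", "download", "modify")

@dataclass(frozen=True)
class ItemPolicy:
    # Minimum user clearance required per action; None means never allowed.
    listen: Optional[int] = 0
    download: Optional[int] = None
    modify: Optional[int] = None

@dataclass(frozen=True)
class User:
    clearance: int  # e.g. 0 = anonymous, 1 = registered, 2 = on-site researcher

def allowed(user: User, policy: ItemPolicy) -> set:
    """Return the set of actions this user may perform on the item."""
    granted = set()
    for action in ACTIONS:
        required = getattr(policy, action)
        if required is not None and user.clearance >= required:
            granted.add(action)
    return granted

# A fully rights-cleared item vs. a restricted archive recording.
cc_item = ItemPolicy(listen=0, download=0, modify=0)
restricted = ItemPolicy(listen=1, download=2, modify=None)

print(allowed(User(clearance=1), cc_item))      # all three actions
print(allowed(User(clearance=1), restricted))   # listening only
```

The key design point this sketch captures is that the policy travels with each item, so the same user may receive different rights for different materials in the same archive.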


Quality Assurance Plan

A Quality Assurance (QA) Plan will be applied to all internal and external services and deliverables. Quality assurance is the joint responsibility of all partners during the project lifetime. The goal of the QA plan is to ensure detection of errors as early as possible, applying planned and systematic activities to determine and ensure achievement of quality objectives. Non-conformance to the QA Plan will be documented, and corrective actions applied. The Project Manager has the authority for implementing and verifying compliance with all quality evaluation policies and procedures related to the project. The following is a non-exhaustive list of QA tasks that will be performed:

1. Develop the QA Plan, including QA project instructions, procedures, checklists and reports.
2. Monitor project development to ensure that initial project plan goals are met within time and budget constraints.
3. Review internal and external deliverables for consistency, clarity, technical content, and adherence to the QA plan. An internal review process will be defined and applied to each deliverable prior to its delivery to the EC.
4. Conduct audits of processes prior to each development phase of the project, and assure the existence of specifications for each phase.

QA documentation will be maintained electronically and be accessible by partners through the collaborative software tool that will be adopted, and also through the restricted web site pages.

6.2 Plan for Using and Disseminating Knowledge

Dissemination of results and promotion of the achievements will be accomplished mainly through a public website, brochures, press releases, and journal and conference publications in the areas of information retrieval, computer music, audio and speech technology, and multimedia and signal processing. Results will also be disseminated through the AES Technical Committees on Semantic Audio, chaired by Prof. Sandler of QMUL, and on High Resolution Audio, vice-chaired by Dr. Reiss, also of QMUL. The website will carry a calendar containing, among other things, an up-to-date list of relevant conferences, workshops, meetings, and calls for papers. This list will be maintained by the Research Coordinator with the collaboration of all the partners. Alerts for deadlines will help the researchers to prepare their papers or presentations on time. Once a given partner has sent a submission or received an acceptance, they will notify the Research Coordinator.

Dissemination will be carried out by all participants in shared participation and according to their areas of expertise. It will focus on the research community, content managers and end-users. Different public visibility strategies will be pursued depending on the targets we wish to address. Public scientific visibility will be achieved via presentations at conferences and workshops, and publications in prestigious journals. The Consortium has identified a list of the most relevant conferences in the different fields of the partners’ expertise. We will target the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Audio Engineering Society (AES) conventions, the International Conference on Spoken Language Processing (Interspeech/ICSLP), the International Conference on Music Information Retrieval (ISMIR), the Digital Audio Effects (DAFx) conference, and the International Computer Music Conference (ICMC). Major target journals include the IEEE Transactions on Signal Processing, on Speech and Audio Processing, on Pattern Analysis and Machine Intelligence, and on Multimedia (plus the related letters); the Computer Music Journal; and the Journal of New Music Research. Wherever possible, software will be demonstrated to the scientific community. Notably, new algorithms for music processing will be benchmarked against competing state-of-the-art algorithms via the Music Information Retrieval Evaluation eXchange (MIREX) [24].

Public industrial visibility will be achieved through professional fairs and exhibitions of audio-related and digital-library-related products and services. These include exhibitions at the AES Conventions and demonstrations at ACM Multimedia. The project and its outcomes will also be presented to company representatives, venture capitalists, and other key people who may be instrumental in moving the project towards the business arena. Dissemination and deployment to a variety of user communities will be done via the networks of project partners. These include research communities, the general public and regional communities, and professional users such as musicians and librarians. A thorough plan for dissemination and use will be


written in the early stages of the project, and will be updated every six months. The document will include past activities, short-term plans (for the next 6 months), and updates to long-term strategic actions. Applied Logic Laboratory (ALL) will market its products containing speech technologies related to those to be developed in the context of EASAIER. ALL is in active dialogue with several possible industrial users about the deployment of its speech-related technologies. ALL is also in dialogue with the Hungarian Association of Content Industry (MATISZ). This connection will also be used to propagate the results of the project to the community of content owners.

Exploitation Strategy Team and Business Plan

The EASAIER partners believe that a Business Plan shapes our efforts, indicates a suitable exit strategy and exploitation route for the project, and highlights the strategic impact. This involves business activities for the industrial partners and an evaluation of the formation of a new company to license core intellectual property and to provide services around the idea of new methods of managing and accessing audio archives. The project Coordination Team, together with the Team Managers from each Partner, will create an Exploitation Strategy Team to explore ways of exploiting the project outcomes and intellectual property. An initial dissemination and use plan will be devised by month 8 of the project, and will be revised throughout the project based on inputs from the Exploitation Strategy Team, feedback from the Expert User Advisory Board, and market watch and state-of-the-art analysis. This will form the basis of the business plan, for which an initial draft will be available by month 24 and a final plan will be in place at the end of the project (see Work Package 8). The Exploitation Strategy Team will be in charge of writing a Business Plan which will include:

• Executive summary describing the basic exploitation ideas
• Description of the products and services to be exploited
• Description of the team in charge of the exploitation
• Report on the market: structure, segmentation, competition, etc.
• Marketing plan, including business model and distribution strategy (open source models will deserve particular attention because many audio archives belong to nonprofit or governmental institutions)
• Risk assessment, including a critical review of assumptions and outstanding issues
• Financial plan and scenario analysis

6.3 Raising Public Participation and Awareness

Public Website

At kick-off the Consortium will acquire a specific and catchy domain that clearly identifies the project (e.g. www.easieraccess.com). This website will be the focal point for intensive dissemination of knowledge, content, and activities. The public website may contain, among other things:

• A project description
• A description of the project consortium (with links and other relevant information)
• All public project documentation
• A discussion forum for the whole EASAIER community (including project partners, interested members of the research community and potential trial users of the project deliverables)
• A news repository that will give access not only to project news but also to news from around the world concerned with digital sound archives; the inclusion of a news alert service will also be studied
• Detailed descriptions of technical results and scientific achievements
• A special section that will periodically offer non-technical explanations of the goals and achievements of the project, for the purpose of popularising scientific results
• Links to related projects, such as PRESTOSPACE, MULTIMATCH and ETHNOARC, with appropriate explanation in order to put EASAIER in the broader context of work in this field

Education issues

The partners who are related to public education institutions will offer graduate courses (for PhD and Master students) about subjects that are tightly connected with the problems and technologies aimed at by the project. Offering highly focused short courses (industry- or academic-oriented) will be considered as the


project matures. Enrolled students will be part of the workforce, making possible further project exploitation initiatives (spin-offs, continuing research projects, industry-financed narrow-focus projects, etc.).

EASAIER Multimedia promotion kit

We plan to create a promotional CD-ROM that will be distributed at professional, scientific and popular meetings (AES conventions, MIDEM, MusikMesse, Sonar festival, JCDL, etc.). It will contain an interactive presentation of the project, prototypes, and other elements also downloadable from the project’s website. The kit is scheduled for month 24, when the project is expected to be well-specified and the first significant outcomes are available.

7 Detailed implementation plan

7.1 Introduction – General Description and Milestones

QMUL (UK) will coordinate WP1: Management. They have the technical competence to ensure the different work package activities are properly coordinated. QMUL will also be responsible for the collection of content. This will be gathered from all partners, as well as from content providers who have expressed an interest in EASAIER, i.e., the Expert User Advisory Board. At the core of the EASAIER project is the creation of common archives through integration of existing digital media with consistent ontologies. For this we will use the model provided by the Open Source project Greenstone [25],1 a suite of software for building and distributing digital library collections. However, though extensible, this framework does not provide methods for the creation of descriptors or access methods beyond text-based retrieval.

LFUI (Austria) will lead WP2: Media Semantics and Ontologies. They are the ideal partner for this, given that their Digital Enterprise Research Institute is a world leader in Semantic Web research and technology. The semantic structures will form the basis of the descriptive information used for sharing and management of the multimedia objects within the archives. The archives will be enriched through the use of multiple, versatile retrieval systems (WP3), new methods of sound object representation (WP4), and improved access tools (WP5). Here, we will build on previous work by the partners, adapting the tools for integration, robustness and portability.

WP3: Retrieval Systems, coordinated by ALL (Hungary), involves the construction and integration of three retrieval systems: one emphasizing musical similarity, one based on speech recognition, and a final cross-media retrieval system. WP4: Sound Object Representation will be coordinated by NICE (Israel). This WP is concerned with ways to extract and identify individual sound objects, using separation and segmentation of speech and music, as well as transcription of segments to text or musical notation respectively.
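To make the idea of consistent descriptive structures concrete, the toy sketch below stores archive descriptions as subject-predicate-object triples and answers simple pattern queries. The vocabulary (dc:*, easaier:*) and the item identifiers are purely illustrative assumptions, not the ontology that WP2 will actually define.

```python
# Toy triple store illustrating semantic descriptors for archive items.
# Item URIs and the dc:/easaier: vocabulary are invented for illustration.
triples = {
    ("archive:item42", "dc:title", "Hebridean waulking song"),
    ("archive:item42", "dc:type", "sound-recording"),
    ("archive:item42", "easaier:containsSpeech", "false"),
    ("archive:item43", "dc:type", "interview-transcript"),
    ("archive:item43", "easaier:relatedTo", "archive:item42"),
}

def match(s=None, p=None, o=None):
    """Return all triples matching the given pattern (None = wildcard)."""
    return {t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)}

# Cross-media link: find everything related to the sound recording.
print(match(p="easaier:relatedTo", o="archive:item42"))
```

Even this minimal pattern-matching view shows how cross-media links between, say, a recording and its transcript can be followed uniformly once all media share one descriptive vocabulary.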
DIT (Ireland) will be primarily responsible for providing online audio-processing tools and will coordinate WP5: Enriched Access Tools. These tools will allow for modifications to retrieved materials, such as time-stretching or source separation, and looping and mark-up of audio excerpts. They will allow the end-user to creatively interact with multimedia resources in new and innovative ways. Each of the tools and systems developed in WP3-5 will be given intelligent interfaces (WP6) such that they may be evaluated in end-user studies (WP7) and deployed in real-world sound archives (WP8).

SILOGIC (France) will provide visualisations and user interfaces, as well as integration of individual components into full systems. They are the leaders for WP6: Intelligent Interfaces. Interface design will be built around inputs from Work Packages 2 to 5, with recommendations from WP7. An example interface is depicted in Figure 2. It demonstrates how multiple media can be retrieved, with metadata annotated, sound objects recognised, and interactive access tools provided. It is expected that one of the outcomes of the project will be a deployed access system with an interface similar to that given in this figure. The technical work throughout the project may be seen as reactive: user needs have already been identified, and the component work packages have been designed to address those needs, e.g. content-based retrieval, identification and manipulation of sounds, and integration of other media.

RSAMD (UK) will lead WP7: Evaluation and Benchmarking. WP7 involves benchmarking, evaluation and user needs studies. For this, we can exploit the groundwork already performed on user needs, and use the subjective evaluation models that have been successfully applied in recent projects.
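As a rough illustration of the time-scale modification mentioned among the WP5 access tools, the sketch below slows a signal using naive windowed overlap-add. Production systems would use far more refined methods (e.g. phase vocoders); the frame and hop sizes here are arbitrary illustrative choices, not EASAIER parameters.

```python
# Minimal overlap-add (OLA) time-stretching sketch: read input frames at
# one hop size, write them at another, and normalize by the summed windows.
# This is a naive illustration only; it ignores phase coherence.
import numpy as np

def time_stretch_ola(x, rate, frame=1024, hop_out=256):
    """Stretch signal x by a factor of roughly 1/rate (rate < 1 slows it down)."""
    hop_in = int(round(hop_out * rate))   # read faster/slower than we write
    window = np.hanning(frame)
    n_frames = max(1, (len(x) - frame) // hop_in + 1)
    out = np.zeros(hop_out * (n_frames - 1) + frame)
    norm = np.zeros_like(out)
    for i in range(n_frames):
        seg = x[i * hop_in : i * hop_in + frame] * window
        out[i * hop_out : i * hop_out + frame] += seg
        norm[i * hop_out : i * hop_out + frame] += window
    norm[norm == 0] = 1.0                 # avoid division by zero at the edges
    return out / norm

# Stretch a 1 s, 440 Hz sine at 8 kHz to roughly twice its length.
sr = 8000
x = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
y = time_stretch_ola(x, rate=0.5)
print(len(x), len(y))
```

The design choice worth noting is the separation of analysis hop (hop_in) from synthesis hop (hop_out): their ratio sets the stretch factor, which is exactly the control an end-user tool would expose as a tempo slider.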

1 www.greenstone.org


Finally, QMUL (UK) will lead WP8: Dissemination and Exploitation. This WP involves dissemination, promotion and extension. The extension stage is especially important since it will guarantee that the tools we have created may be more generally applied to other collections. As project coordinators, QMUL are well poised to ensure successful dissemination and awareness, and have the strongest incentive to guarantee further exploitation. The final stages also involve, in conjunction with the user needs studies of WP7, deployment of the tools in real-world archives, such as those maintained by members of the Expert User Advisory Board.

Figure 2. An example intelligent interface for accessing sound archives.

Major Milestones

In WP1, key milestones will be the deployment of a private collaborative and administrative website in month 1 (M1.1), the revision of the project workplan at month 12 (M1.2), the internal assessments and collection of components at month 18 (M1.3), the creation of an initial business plan in month 24 (M1.4), and the issuing of public and private reports at the end of the project, month 30 (M1.5).

WP2 has semi-annual milestones, aligned with the project status reports. The requirements will be established by month 6 (M2.1), the ontology language will be devised by month 12 (M2.2), and the infrastructure constructed by month 18 (M2.3). This WP ends in month 24, when a working ontology environment will have been implemented (M2.4).

In WP3, the speech and music retrieval systems, with vocal query support, will be established by month 20 (M3.3); the previous milestones at months 8 and 14 are draft versions of the retrieval systems. By month 26 the cross-media retrieval system will also be fully functional (M3.4).

The speech/music discrimination component of WP4 will be implemented by month 6 (M4.1). This is developed at an early stage because it is a necessary precursor to all sound object representation. Initial audio object segmentation and separation will be developed by month 10 (M4.2), and work will continue on this throughout the WP. This allows us to work on the transcription of the individual audio objects. Transcription of speech will be presented in month 18 (M4.3), and of music in month 24 (M4.4). Testing and evaluation results on transcription will be available by month 28 (M4.5).


By month 12, a minimal version (looping and marking) of the access system to be developed in WP5 will be operational (M5.1). This will be enhanced throughout the work package to incorporate pitch shifting/time scaling, multimedia alignment, and video synchronization. A prototype with interfaces will be available by month 18 (M5.2), which will then be refined by month 24 for input to WP6 (M5.3). By month 30, the tools will be fully integrated with those from WP3 to provide enriched access to multiple retrieved media. Mock Graphical User Interfaces will be devised in WP6 by month 7 (M6.1) and distributed to all partners for comment. These will then be used to develop an Audio Intelligent Interface by month 21 (M6.2). An interface for alignment of audio with multimedia resources will be constructed by month 26 (M6.3) and, using output from WP3, support for the cross-media retrieval system will be built into the interface by month 29 (M6.4). In WP7, the Expert User Advisory Board will be established by month 3 (M7.1). Test users will be recruited and annotated sound archives established by month 4 (M7.2), and the initial user requirements will have been gathered from other work packages by month 8 (M7.3). By month 19, the evaluation tools and protocols will be specified, designed and piloted (M7.4). Finally, by month 24 the evaluations with user groups will have concluded and final feedback assessment begins (M7.5). The final evaluation report will be delivered at the end of the project, month 30 (M7.6). For WP8, the initial dissemination and use plan will be developed by month 8 (M8.1), and the Exploitation Strategy Team will be formed in month 12 (M8.2). The dissemination and use plan will be revised, based on the IPR Management Strategy and feedback from the Expert User Advisory Board, by month 18 (M8.3). A market watch and state-of-the-art analysis will be performed in month 21 (M8.4).
By month 24, deployment of tools in external sound archives will have begun, a promotional kit will be available, and an initial business plan will have been developed (M8.5). In what follows, we describe in full the tasks associated with work packages 2 through 8 (WP1 having been described in detail in Section 6).

WP2 Media Semantics and Ontologies
User needs studies have identified web-based access systems as an important requirement for archives [1]. This implies that the archives will need proper representation within the semantic web. The explicit representation of the semantics underlying the archived resources, as well as structured support within the archive for the more general semantics of the web, will enable a knowledge-based web that provides a qualitatively new level of service. Automated services will improve in their capacity to assist humans in achieving their goals by “understanding” more of the content on the web, and thus providing more accurate filtering, categorization, and searches of information sources. This process will ultimately lead to an extremely knowledgeable system that features various specialized reasoning services. The backbone technology for this semantic web is ontologies. Ontologies provide a shared understanding of certain domains that can be communicated between people and application systems. They are formal structures supporting knowledge sharing and reuse, and can be used to represent explicitly the semantics of structured and semi-structured information, enabling sophisticated automatic support for acquiring, maintaining, and accessing information. As this is at the centre of recent problems in knowledge management, enterprise application integration, and e-commerce, increasing interest in ontologies is not surprising. The research tasks in this workpackage will focus on finding suitable representation techniques for the efficient and expressive representation of semantic aspects of audio components. In particular, we will apply research in the design of ontology languages that overcome limitations of the OWL family of languages for the rich representation of media semantics.
A special focus will be on the alignment of the ontology language with the computational costs of full reasoning support, because popular ontology languages require complex Description Logic reasoners that do not scale well. Furthermore, tools should be provided for efficient metadata and data storage and retrieval. WSMO is the current major European effort on ontology modelling (WSMF), ontology language design (WSML) and execution environments (WSMX), driven mainly by large EU-funded Integrated Projects. The main features of WSMO – simplicity (a solution to the integration problem that is as simple as possible), completeness (solving all aspects of the integration problem), and executability (a set of execution semantics exists as well as a reference implementation) – should provide a world-wide standard to smooth the process of information integration. We will adopt WSMO in this project in order to obtain a suitable ontological language for describing media semantics.


Task 2.1. Ontology and semantics for media object representation
To fulfill the aims of the EASAIER consortium, the content of the sound archives must be enhanced with explicit semantic descriptions. However, semantic metadata is useful only if its nature is clearly understood, and only when its structure and usage are well defined. For this purpose, ontologies are needed to capture the essential characteristics of the content domain in a limited set of meaningful concepts. During Task 2.1, the existing semantic description methods and technologies, such as the Extensible MPEG-4 Textual Format (XMT) or the Description Definition Language (DDL) of MPEG-7, will be analyzed, and the exact requirements for the representation of the semantics will be captured. During the requirement analysis, the skills of consortium partners with experience in semantic representation (ALL, LFUI) and of those with experience in media objects and their representation (QMUL) are both needed in order to precisely identify the requirements. Knowledge about multimedia content domains has to be included in techniques that capture objects through automatic parsing of multimedia content. In this task, ontology-based semantic descriptions of music and speech will be generated based on appropriately defined rules that associate low-level features of standards, such as MPEG-7, with the concepts included in the ontologies. The result will consist of an audio-visual ontology in compliance with the MPEG-7 specifications and corresponding domain ontologies. We will provide a knowledge infrastructure design focusing both on multimedia-related ontologies and on domain-specific structures. A general ontology infrastructure will be designed, including a description of a tool to assist the annotation process needed for initializing the knowledge base with descriptor instances of domain concepts.
The hybrid nature of multimedia data will be reflected in the ontology architecture to be developed. We will enrich the knowledge base with instances of domain concepts that serve as prototypes for the necessary concepts. During the work, an Audio Annotation Ontology will be created, with emphasis on the handling of these instances of domain concepts. Each of these instances will then be linked to the appropriate audio descriptor instances. The approach that we will use is thus pragmatic, easily extensible and conceptually clean.
Task 2.2 Ontology management environment
A common scheme for annotating temporal objects is to establish a list of people, places, events, topics or objects that occur in the music or speech, and to indicate the temporal intervals in which each is present. This scheme will be adapted here through the introduction of stratified annotations, as a list of concepts and their intervals forms independent strata or layers of meaning. The complete description of the object at any instant is then found by taking the cross-section of the strata at that point in time. By organising the concepts and their strata in ontologies, we will further improve the annotation of temporal objects. The proposed ontology language will utilise the hierarchical organisation to create a visual representation of the annotations in which the strata belonging to the subtree of a concept can be aggregated, thus hiding the subtree without losing much information. This makes it easier to handle large sets of concepts, and gives the user control over the level of annotation detail. Similarly, this language will take advantage of the ontology semantics when performing queries. In this task, a set of coordinated technologies and procedures for media management will be adopted that allow the efficient storage, retrieval, reuse and re-purposing of digital data such as text, music and speech files.
It aims to maximize the value of these assets by facilitating easy storage and retrieval while protecting and, at times, enhancing their utility. The set of considered technologies is critical infrastructure software that allows organizations to manage large collections of diverse content centrally as well as prepare and stage that content for delivery to any distribution channel. Media management

• creates reusable content that can support both short- and long-term use;
• ensures effective management of assets to maximize efficiency, productivity and profitability;
• protects the integrity of data (storage and transmission requirements);
• ensures the persistence of data (archiving);
• enables ownership control (rights management) and security; and
• ensures the authenticity and integrity of documents.
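As a hypothetical illustration of the stratified annotation scheme described in Task 2.2 (the concept names, intervals and function below are invented for the example, not part of the proposal), the description of a temporal object at an instant is the cross-section of all strata at that time:

```python
# Hypothetical sketch of stratified temporal annotation (Task 2.2).
# Each concept forms its own stratum: a list of (start, end) intervals in seconds.
strata = {
    "speaker:interviewer": [(0.0, 12.5), (40.0, 55.0)],
    "speaker:narrator":    [(12.5, 40.0)],
    "topic:harvest":       [(5.0, 30.0)],
    "instrument:fiddle":   [(20.0, 55.0)],
}

def cross_section(strata, t):
    """Concepts active at time t: the cross-section of all strata at t."""
    return sorted(concept for concept, spans in strata.items()
                  if any(start <= t < end for start, end in spans))

print(cross_section(strata, 25.0))
# → ['instrument:fiddle', 'speaker:narrator', 'topic:harvest']
```

Aggregating strata under a common ontology concept (e.g. collapsing all `speaker:` strata into one) is then a matter of grouping keys by their prefix in the concept hierarchy.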

The ontology management support will use the Ontology Representation and Data Integration (ORDI) Framework for storing, querying, mapping and versioning the ontological media data (see Figure 3). The ontologies will add explicit and machine processable semantics to the media data, bridging the gap between


the different terminology used by end users and information retrieval systems. This will support a scalable infrastructure for ontology editing, browsing, merging, aligning, and versioning.

Figure 3. General architecture of the ontology management system

The media management will support the association of knowledge with data, from “media specific” to “domain specific” metadata. The technology will support the mapping of concepts in domain ontologies to schema metadata elements, and of concepts in domain ontologies to music and/or speech patterns. It will be augmented by a knowledge repository, together with a set of processes and services to support the creation, refinement, indexing, dissemination and evolution of media object related data. Media management will also support the integration and aggregation of data. Media ontology data repositories will be built by merging objects from one or more sources, including other repositories and data sources.

WP3 Retrieval Systems

Task 3.1: Music Retrieval
Music retrieval involves searching and organising music collections according to their relevance to specific queries. It poses numerous challenges, including the choice of an adequate representation for the audio in the query and music collection. Systems based on low-level acoustic similarity measures have shown great success in detecting exact one-to-one correspondence between the audio query and a recording in the database, but disregard any correlation to musical characteristics. High-level representations, such as those based on transcription, emphasise the musical similarities between recordings but are constrained by the type of instrumentation and music that can be analysed. Mid-level representations of music are measures obtained by transforming the audio signal into a highly sub-sampled function that characterizes the attributes of musical constructs in the original signal. This is the key process in a wide class of musical signal processing algorithms (e.g. onset and pitch detection, tempo and chord estimation). Mid-level representations, which we will use, characterise the rhythmic structure of a piece, but without the constraints imposed by the rules of music notation.
Task 3.2: Speech Retrieval
Technologies in development at Applied Logic Laboratory will allow efficient search in large multimedia databases containing spoken and vocal data. The technologies are based on an effective speech indexing system which allows written or spoken queries, without the resource-intensive manual transcription of the speech content. The speech retrieval technology works in two phases: during the offline indexing phase it builds an index of the speech content. This is done using a phoneme-based speech recognition system. The speech content is transformed into phoneme lattices, which are compact probabilistic representations of the possibly uttered


sentences. Special lookup trees are then built for these lattices in order to reduce look-up time during queries. This indexing makes very large databases manageable. During the online query phase, the system receives written or spoken queries. It then serves the parts of the indexed multimedia data it deems most relevant to the text query. To achieve this, it first normalises incoming text, turns it into a phoneme string, looks up the index trees to narrow the search, and then uses a novel dynamic programming algorithm to find and score the most relevant documents or excerpts. Scoring is based on phoneme string similarity, so misspelled words or alternative spellings can still yield relevant hits. The technology being developed is intended to be language neutral: it can quickly and easily be applied to any language.
Task 3.3: Cross-Media Information Retrieval
The JISC User Requirements Study for a Moving Pictures and Sound Portal [2] identified the need to have “cross-searching between still and time-based collections.” Furthermore, the HOTBED study identified the use of video as having a strong impact on aural learning. Thus, it is clear that this collection must incorporate video and other media as well as audio, and provide significant interaction between the media. For this reason, we will implement a cross-media information retrieval system. Such a system would allow the user to enter a piece of media as a query and retrieve a different type of media as a related document. This combines feature extraction techniques, metadata, and optimised multidimensional search methods. This task represents a worthwhile endeavour both as a useful application and as an advance in frontier research. The key technological challenges and solutions are discussed below.

Feature extraction involves the analysis of a file, or portion of a file, to extract a small set of quantifiable features which represent the most relevant properties of the media. The benefit of extracting these features is that a set of features is much easier to compare, analyse and manipulate than the huge amount of information in the media file. Since we wish for the retrieval system to be able to associate media files in different ways, a variety of different feature sets will be created. By providing several feature sets that can be combined in different ways, we can search the database using different similarity measures.
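As a hedged sketch of what such a low-level feature set might contain (the actual EASAIER feature sets are not specified here; RMS energy, zero-crossing rate and spectral centroid are common illustrative choices):

```python
import numpy as np

def feature_vector(signal, sr=44100):
    """Illustrative low-level feature set: RMS energy, zero-crossing
    rate, and spectral centroid. A real retrieval system would extract
    several richer, task-specific feature sets."""
    signal = np.asarray(signal, dtype=float)
    rms = np.sqrt(np.mean(signal ** 2))                      # loudness proxy
    zcr = np.mean(np.abs(np.diff(np.sign(signal)))) / 2      # noisiness proxy
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    centroid = (freqs * spectrum).sum() / (spectrum.sum() + 1e-12)  # brightness
    return np.array([rms, zcr, centroid])

# A pure 440 Hz tone: RMS near 1/sqrt(2), spectral centroid near 440 Hz.
t = np.arange(44100) / 44100.0
print(feature_vector(np.sin(2 * np.pi * 440 * t)).round(2))
```

Comparing two documents then reduces to comparing two short vectors rather than two large media files, which is the point the paragraph above makes.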

To support content-based cross-media queries, it is necessary to provide a means whereby one can say that documents of different media can be considered similar. As an example, an audio recording of a person speaking and a photo of that person are related, yet this information is not revealed through the use of feature extraction. To relate such documents, the metadata must be linked and ranked, so that an obscure relation will be ranked lower than an obvious one. We thus define a similarity measure that utilises the metadata. This measure can be used in series or in parallel with the feature based measure, as given in Figure 4.
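A minimal sketch of the parallel combination of the two measures (the weights, document fields and similarity choices below are illustrative assumptions, not part of the proposal; Figure 4 also allows a serial arrangement):

```python
import math

def feat_sim(q, d):
    """Cosine similarity between feature vectors (illustrative choice)."""
    num = sum(a * b for a, b in zip(q["features"], d["features"]))
    den = (math.sqrt(sum(a * a for a in q["features"]))
           * math.sqrt(sum(b * b for b in d["features"])))
    return num / den if den else 0.0

def meta_sim(q, d):
    """Jaccard overlap between metadata tag sets (illustrative choice)."""
    qa, da = set(q["tags"]), set(d["tags"])
    return len(qa & da) / len(qa | da) if qa | da else 0.0

def combined_rank(query, docs, w_feat=0.6, w_meta=0.4):
    """Rank documents by a weighted sum of the two similarity measures."""
    scored = [(w_feat * feat_sim(query, d) + w_meta * meta_sim(query, d), d["id"])
              for d in docs]
    return [doc_id for _, doc_id in sorted(scored, key=lambda s: -s[0])]

query = {"features": [1.0, 0.0], "tags": {"interview", "gaelic"}}
docs = [
    {"id": "A", "features": [0.9, 0.1], "tags": {"song"}},
    {"id": "B", "features": [0.0, 1.0], "tags": {"interview", "gaelic"}},
]
print(combined_rank(query, docs))  # → ['A', 'B']
```

Here document A wins on acoustic similarity despite sharing no metadata, illustrating how the weighting trades off obvious metadata links against content resemblance.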

For combined use of metadata and features, the archive must support several radically different types of internal search methods. Relationships based on features require multidimensional similarity searching and indexing. Despite extensive literature, it remains to be seen which index is most suitable, and what modifications it would require. Furthermore, metadata gives rise to complex relationships between documents. Ranked linkages of metadata connections may give rise to nonmetric relationships. Thus the appropriate way to search the metadata remains a challenging task. Graph theory and small world networks are applicable to this problem.
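The baseline that any multidimensional index must beat is a brute-force scan over the feature vectors; the sketch below shows that baseline (which index structure would actually suit EASAIER is, as noted above, an open question):

```python
import numpy as np

def nearest(features, query, k=3):
    """Brute-force k-nearest-neighbour search over feature vectors.
    A multidimensional index (KD-tree, R-tree, ...) would replace this
    linear scan on a large archive."""
    d = np.linalg.norm(features - query, axis=1)
    idx = np.argpartition(d, k)[:k]          # k smallest, unordered
    return idx[np.argsort(d[idx])]           # ordered by distance

rng = np.random.default_rng(0)
features = rng.random((1000, 8))             # 1000 documents, 8-D features
query = features[42] + 0.001                 # slightly perturbed known document
print(nearest(features, query, k=3)[0])      # → 42
```

The cost of this scan grows linearly with the collection, which motivates the indexing and optimisation work described in the next paragraph.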

Computational costs are incurred in several different places. Since this system performs feature extraction on the query and a multidimensional search on the data, large query documents and large databases can both result in excessive retrieval times. Thus, optimisation is necessary. We will investigate several schemes with the goal of minimising the time it takes to construct the database (feature extraction, metadata creation, index construction) and the time it takes to retrieve documents (query feature extraction, metadata and feature search, ordering and presentation of results).

Design, interface and presentation are important concerns when one considers that a goal of providing a cross-media retrieval system is to uncover previously unknown relationships between query documents and documents in the database. Since metadata information is already incorporated into the database, the results should be presented in a structured manner. A relevant presentation should reveal, for instance, that an audio clip is related to the audio stream from a certain video, with these related images, and to another audio clip that relates to a certain subject. The choices for how to present the retrieved results are numerous, and user testing is necessary to determine the most effective approach.

Figure 4. How a cross-media query can be performed using a combination of a feature similarity measure and a metadata similarity measure.

Task 3.4: Vocal query interfaces
The development of a vocal query interface to the retrieval systems will enable voice-initiated media retrieval. The Vocal Query Subsystem is responsible for producing the phonematization of the spoken query. Once the phoneme representation of a query is given, the phonematized query is used in the retrieval procedures (described in Task 3.2) to find the best matches. To produce the phonematization of the query, the Vocal Query Subsystem will align the vocal query and use the most relevant phonematizations of the result as the phoneme string to look up in the index trees. The alignment algorithm will find the most probable sequences of phoneme HMMs given the low-level feature representation of the query. The alignment of the vocal query will use the phoneme HMMs trained for the cluster of the questioner. The result will then be scored by combining the score of the phonematization and the answer. Once the phonematization is given, the processing is the same as for text queries. The phonematization of the vocal query involves the speaker adaptation module. The sentence boundaries in the query are marked by the same sentence boundary detector described in Task 3.2.
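The phoneme-string matching that underlies both text and vocal queries can be illustrated with a minimal sketch; a plain edit distance stands in here for the novel dynamic programming algorithm, which this document does not specify, and the phoneme symbols are invented:

```python
# Minimal sketch: score a query phoneme string against an indexed one.
# An ordinary edit distance stands in for the actual scoring algorithm.
def edit_distance(a, b):
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, 1):
        curr = [i]
        for j, pb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (pa != pb)))   # substitution
        prev = curr
    return prev[-1]

def similarity(query, indexed):
    """Normalised similarity in [0, 1]; tolerant of misspellings and
    alternative spellings, since nearby phoneme strings score highly."""
    d = edit_distance(query, indexed)
    return 1.0 - d / max(len(query), len(indexed), 1)

# Two plausible phonematizations of the same word still match well:
print(similarity(["k", "ah", "l", "er"], ["k", "ah", "l", "ah", "r"]))  # → 0.6
```

In the real system this score would be computed only over the candidates pre-filtered by the index trees, not over the whole archive.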

WP4 Sound Object Representation
This work package deals with how the raw audio data is represented at different levels of abstraction for the purposes of efficient querying. Raw data in itself is often too large and too difficult to index and search in an efficient manner. For this reason, an efficient way to represent sound objects must be devised. For example, if we convert speech-based audio into rich text, we have a highly compressed representation of the sound object which can be retrieved using standard string searching techniques. The audio content dealt with in EASAIER will fall into two main categories: speech (telephone conversations, broadcast recordings, etc.) and music. NICE and ALL will deal with the speech content, and DIT in association with QMUL will deal with the music content. The first step will be to automatically segment the speech and music material, since archival recordings will generally contain both. Music is highly complex by nature but simple in essence. Every piece of music is effectively a collection of notes played in a particular sequence by a collection of musicians. This simple statement already provides us


with several possible sound objects. An Ensemble Object would simply be a list of the instruments which appear in the piece of music, and a Melody Object would be an ordered list of the notes and chords which appear in the lead melody line. These objects will greatly aid the indexing and searching stages. A possible query might require that only specific instrument combinations are returned. This is not a trivial task: in order to obtain this valuable data, a machine learning technique for extracting each musical instrument from the audio mixture signal is required.
Task 4.1: Audio Stream Segmentation
Task 1 is concerned with segmenting audio into its macroscopic elements, such as silence, speech, or music, and further into classifications such as speaker for speech, or verse and chorus for music. This is an extremely compact description of the global content of sound objects. Representing the audio at these various levels of abstraction is of use both at the indexing stage and at the querying stage. This information, combined with the metadata generated in Task 3 of WP3, will provide a robust set of high-level sound objects which concisely represent the underlying audio data. Algorithms developed in this work package will also form the basis of some of the Enriched Access Tools which will be developed in WP5. A simple yet powerful algorithm for speech/non-speech segmentation would work by aligning sequences of Hidden Markov Models trained for speech and non-speech audio data respectively. Since sentences are the basic structural elements of language, further segmentation into sentences and phrases will be performed. Reliable and robust sentence segmentation of speech also greatly improves summarization, indexing and retrieval of speech data. Algorithms for sentence segmentation without continuous speech recognition will be developed by ALL.
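As a crude single-feature stand-in for the HMM-based segmenter described above (the zero-crossing-rate heuristic and threshold are illustrative only, not the proposed method), frame-wise classification followed by run merging yields labelled segments:

```python
import numpy as np

def segment(signal, sr=16000, frame_s=0.02, zcr_threshold=0.1):
    """Toy speech/non-speech segmentation: classify 20 ms frames by
    zero-crossing rate, then merge consecutive labels into
    (label, start_seconds, end_seconds) segments."""
    n = int(sr * frame_s)
    labels = []
    for i in range(0, len(signal) - n + 1, n):
        f = signal[i:i + n]
        zcr = np.mean(np.abs(np.diff(np.sign(f)))) / 2
        labels.append("speech" if zcr > zcr_threshold else "non-speech")
    segments, start = [], 0
    for i in range(1, len(labels) + 1):          # merge runs of equal labels
        if i == len(labels) or labels[i] != labels[start]:
            segments.append((labels[start], start * frame_s, i * frame_s))
            start = i
    return segments

rng = np.random.default_rng(1)
tone = np.sin(2 * np.pi * 100 * np.arange(8000) / 16000)   # tonal, low ZCR
noise = rng.standard_normal(8000)                          # noisy, high ZCR
print(segment(np.concatenate([tone, noise])))
```

The actual system replaces the per-frame threshold with HMM alignment, which naturally smooths over short misclassified frames.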
The segmentation algorithms can be used for the classification of different kinds of audio segments. A universal 'none' HMM, along with HMMs for each defined sound object (a phrase, an instrument or note sound, a crash, etc.), can be trained and then used for the segmentation.
Task 4.2: Sound Source Separation
Sound source separation refers to the task of taking one or more signals containing a mixture of many different sources and extracting each of these individual sources from the mixture. DIT has developed various state-of-the-art algorithms capable of varying degrees of sound source separation. These algorithms currently depend greatly on the audio format and on user-defined parameters, as well as on the content itself. Task 2 is to adapt and extend current techniques such that the separation can be carried out in an automated and robust fashion, independent of the format, musical content and user control. The successful completion of this task is a prerequisite for Task 3. DIT will also work closely with NICE to adapt a particular separation algorithm for the purposes of separating speech from non-speech signals such as noise or music. This will greatly improve the ability of the speech transcription algorithms employed.
Task 4.3: Sound Object Identification
Task 3 is concerned with the automatic recognition and labeling of sound sources (speakers, musical instruments, artefacts). It will take the independent sources generated by Task 2 and use prior knowledge, by way of a sound source database, to identify the types of sources present within an audio stream. For this task, a robust method of creating a condensed sound object fingerprint is necessary. Unique ways of characterising specific timbres will be assessed, resulting in an audio source code book. Source identification is one of the key components enabling the enriched access tools and intelligent interfaces of WPs 5 and 6, as depicted in Figure 5.
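A hypothetical sketch of such a condensed fingerprint and code-book match (the mean log-spectrum used here is an illustrative stand-in; the actual timbre characterisation is the subject of Task 4.3 itself):

```python
import numpy as np

def fingerprint(signal, n_fft=1024, hop=512):
    """Toy sound-object fingerprint: unit-normalised mean log-magnitude
    spectrum over windowed frames."""
    win = np.hanning(n_fft)
    frames = range(0, len(signal) - n_fft + 1, hop)
    spec = np.mean([np.log1p(np.abs(np.fft.rfft(signal[i:i + n_fft] * win)))
                    for i in frames], axis=0)
    return spec / (np.linalg.norm(spec) + 1e-12)

def identify(signal, code_book):
    """Nearest code-book entry by Euclidean distance between fingerprints."""
    fp = fingerprint(signal)
    return min(code_book, key=lambda name: np.linalg.norm(fp - code_book[name]))

sr = 8000
t = np.arange(sr) / sr
rng = np.random.default_rng(2)
code_book = {
    "tone_440": fingerprint(np.sin(2 * np.pi * 440 * t)),
    "noise":    fingerprint(rng.standard_normal(sr)),
}
# A phase-shifted 440 Hz tone still matches its code-book entry, since
# the magnitude spectrum is phase-invariant.
print(identify(np.sin(2 * np.pi * 440 * t + 1.0), code_book))  # → tone_440
```

A real code book would hold many entries per source class and use a perceptually motivated distance, but the lookup structure is the same.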


Figure 5. Overview of Sound Object Representation and its relationship to other components.

With the help of speaker identification technology, the spoken content of the media can be enriched with metadata on the speaker. The task of speaker identification will be performed by dynamically adapting the indexing algorithms to the speaker. The most promising state-of-the-art speaker adaptation methods of speaker selection training will be utilized. Given a set of registered speakers (the target-set) and a sample utterance (the test), open-set speaker identification (OSI) is defined as a twofold problem, as depicted in Figure 6. Firstly, it is required to identify the speaker model in the set which best matches the test utterance. Secondly, it must be determined whether the test utterance has actually been produced by the speaker associated with the best-matched model, or by an unknown speaker outside the target-set.
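The two-stage decision just described can be sketched in a few lines (the scores are placeholders for the model likelihoods a real system would produce; the speaker ids and threshold are invented):

```python
def osi_decision(scores, threshold):
    """Open-set speaker identification decision: pick the best-matched
    target model, then accept it or reject the utterance as coming from
    an unknown speaker outside the target-set.

    scores: dict mapping registered-speaker id -> match score."""
    best = max(scores, key=scores.get)
    if scores[best] >= threshold:
        return best        # stage 1 + 2: identified registered speaker
    return None            # stage 2: rejected, unknown speaker

print(osi_decision({"spk1": 0.42, "spk2": 0.87, "spk3": 0.55}, 0.6))  # → spk2
print(osi_decision({"spk1": 0.42, "spk2": 0.51}, 0.6))                # → None
```

Choosing the threshold trades false acceptances of unknown speakers against false rejections of registered ones, which is the central tuning problem in OSI.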

Figure 6. Speaker identification

Task 4.4: Transcription
Task 4 is concerned with the transcription of spoken or musical content. For music, it involves extracting pitch and rhythm information from the musical content. A typical music transcription is that of a musical score or a standard MIDI file, but other useful forms of musical transcription also exist. For speech, the transcription is usually annotated text. However, there are numerous commercial tools which accomplish this task. Hence the speech transcription will focus on the extraction of meaningful data about the type and importance of the spoken text. Together, the speech and music components of this task will focus on producing representations which are maximally informative and contain as little redundancy as possible. Music transcription achieves very high accuracy with monophonic music but is particularly difficult for polyphonic music, where more than one note is played at a time. This is akin to speech transcription where multiple speakers overlap in time. However, the past few years have seen advances in chord recognition, onset detection, key recognition, and extraction of harmonic and melodic contours, to name just a few (see


[26] and references therein for a synopsis of the state of the art). Application of these advances in music processing will aid in the production of more accurate transcription systems. Both DIT and QMUL have extensive experience with music transcription systems and will work to produce a robust system which is capable of monophonic and polyphonic transcription, independent of musical instrument or style. This will also be facilitated by the sound source separation techniques developed in Task 2 of this work package. Though speech transcription is well established, issues remain in the transcription of audio in large archives where many non-speech sources also exist. NICE will address these issues through the appropriate identification of speech characteristics. The application of data mining and classification techniques allows for a sophisticated annotation of audio. In effect, speech may be converted into a rich text format where the text has been annotated with metadata describing the subject matter, the level of importance, the emotive content, and so on. This allows the end-user to interact with spoken content in exciting new ways, for instance, by focusing on the most stressed speech. Classification of speech interactions demands a considerable investment of human resources. Automatic methods of speech classification exist but are very limited. Most of these methods are based on word spotting engines, where a word list is assigned a priori according to prior experience. However, a topic is not always characterized by a well defined word list, and a dynamic environment needs a continuous word-finding refinement procedure. Figure 7 shows an innovative text analytics solution based on Large Vocabulary Conversational Speech Recognition (LVCSR). In this example we want to characterise the emotional intensity in each spoken segment accessed from a sound archive.
The goal of the proposed apparatus is to classify each segment as low, medium or high intensity. The proposed classification algorithm is supervised, so we need to assign training samples for each topic. This should be done using human validation, since the classification decision depends strongly on the match between the training set and the corresponding topics. In this example, the significance of the training data can be determined by implementing a user feedback score. Once we have assigned sufficient audio samples per topic, a speech-to-text engine is applied. The output results are noisy text files, which are used as input to the classification training block in the classification engine module. In this method the classifier builds a specific model for each topic. In the testing (operational) phase, a new spoken sample is forwarded as input to the LVCSR engine. The LVCSR engine’s noisy text output is then taken as input to the classifier engine. The output of the classifier module is a vector of scores, with each component assigned a likelihood of matching one of the defined topics. The topic decision is made according to some scoring technique, e.g. maximum score.
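A toy stand-in for the final classification stage described above (the per-class word sets replace the trained statistical pattern classifiers, and all vocabularies and labels are invented for the example):

```python
def classify(tokens, models):
    """Score a noisy transcript against each intensity model and decide
    by maximum score (one of the possible scoring techniques mentioned
    above). models: dict label -> set of characteristic tokens."""
    scores = {label: len(set(tokens) & vocab) / len(vocab)
              for label, vocab in models.items()}
    best = max(scores, key=scores.get)
    return best, scores

models = {                        # illustrative per-class vocabularies
    "high":   {"furious", "shout", "outrage"},
    "medium": {"annoyed", "upset", "bother"},
    "low":    {"calm", "fine", "pleasant"},
}
label, _ = classify("i was furious and began to shout".split(), models)
print(label)  # → high
```

The returned score vector corresponds to the classifier module's output in Figure 7; human-validated training data would continually refine the per-class models.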

Figure 7. Text analytics system based on LVSCR.


WP5 Enriched Access Tools

WP5 is concerned with the construction of a set of tools enabling a more enriched experience when accessing the media in the archive. The tools we provide will allow the user to manipulate the audio content in a number of useful ways, enhancing both the educational and the entertainment value of the media content. User group studies undertaken during the HOTBED project indicated the tools that users believed would be most desirable for such an application. The four tasks of this workpackage, together with the sound object representations of WP4 and the interfaces of WP6, have been designed to provide tools answering these user demands. Some of the tools provided for enriched access will be direct descendants of those developed in WP4, such as the sound source separation tool, which will be employed here to allow the user to listen selectively to any source within the recording. This source may then be used to generate a new query.

Task 5.1 Time-scale Modification / Pitch-scale Modification

Task 1 involves the development of efficient and robust algorithms to perform both time- and pitch-scale modification. Time-scale modification (TSM) of audio allows the playback rate of the content to be slowed down or sped up to any desired rate without affecting the local pitch content. This is of great importance where the intelligibility of the audio is less than adequate. Time-domain approaches produce the best quality for speech applications, whereas frequency-domain approaches are favoured for music applications, and the output quality of each is heavily dependent on the user parameters. Robustness is clearly an issue here. We propose a metadata-driven time-scaling system which incorporates both approaches. The parameters and algorithm used will be defined automatically by the nature of the audio signal, as described by the metadata and further analysis.
In this way, the system is adaptive to and independent of the audio content. The task will involve optimising parameter sets for various signals, ranging from speech to music ensembles. Pitch-scale modification will then allow the user to change the key of a piece of music at access time. Similar technology is employed for this task, so it will be developed in tandem with the TSM algorithms.
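The metadata-driven selection described above could be sketched as a simple dispatch from content class to algorithm family and parameters. The content classes, algorithm labels and window/FFT sizes below are illustrative assumptions, not the project's actual parameter sets.

```python
# Hypothetical sketch of metadata-driven TSM configuration (Task 5.1):
# time-domain processing is chosen for speech, frequency-domain
# processing for music, with parameters picked per content class.
def select_tsm_config(metadata):
    content = metadata.get("content_class", "unknown")
    if content == "speech":
        # Time-domain (overlap-add style) processing: best quality for speech.
        return {"domain": "time", "window_ms": 30, "overlap": 0.5}
    if content in ("music", "music_ensemble"):
        # Frequency-domain (phase-vocoder style) processing for music.
        return {"domain": "frequency", "fft_size": 2048, "hop": 512}
    # Unknown content: conservative frequency-domain fallback.
    return {"domain": "frequency", "fft_size": 4096, "hop": 1024}
```

In the full system this lookup would be driven by the WP4 metadata and refined by further signal analysis rather than a fixed table.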

Task 5.2 Looping and Marking

Task 2 involves developing methods to automatically mark up a piece of music. This is intimately related to Task 4 in WP4, Segmentation. The audio must be time-indexed so that a user can skip to the beginning of any section of a recording. Furthermore, provision will be made so that any section may be looped or repeated seamlessly. Any other sound objects or media, such as video, will also be synchronised to further enrich the experience. The methods for identifying the audio sections will be similar to those used in WP4, applied here as an ease-of-access tool. The diagram below depicts the rough functionality of the looping and mark-up tool as applied to a musical piece.

Figure 8. Looping and mark-up tool mock up
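The time indexing behind such a looping and mark-up tool can be sketched as a small section index: section boundaries (in seconds) let the user skip to the start of any section, and a (start, end) pair supplies seamless loop bounds. The API and the boundary times used below are hypothetical.

```python
import bisect

# Sketch of the section index behind the looping/marking tool (Task 5.2).
class SectionIndex:
    def __init__(self):
        self._starts = []   # sorted section start times, in seconds
        self._labels = []   # parallel list of section labels

    def mark(self, start_time, label):
        # Insert a section boundary, keeping the index sorted by time.
        i = bisect.bisect_left(self._starts, start_time)
        self._starts.insert(i, start_time)
        self._labels.insert(i, label)

    def section_at(self, t):
        # Section whose start is the latest boundary not after t.
        i = bisect.bisect_right(self._starts, t) - 1
        return self._labels[i] if i >= 0 else None

    def loop_bounds(self, label):
        # (start, end) of the named section; end is the next boundary,
        # or None for the final section (loop to end of recording).
        i = self._labels.index(label)
        end = self._starts[i + 1] if i + 1 < len(self._starts) else None
        return (self._starts[i], end)
```

In EASAIER the boundaries would come from the WP4 segmentation rather than manual marking, and other media (e.g. video) would be synchronised to the same index.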

Task 5.3 Multimedia Alignment and Enrichment

Audio archives contain many different representations of musical content, which may be broadly divided into three classes. Musical scores contain elements which primarily represent the sound, such as the order of the notes, their pitch, duration, etc. Within this project, MusicXML is the most interesting format for score-oriented content: it is an open, standardised format for which conversion tools are available for commercial editing environments such as Finale and Sibelius. However, score information is too abstract to serve as the generator of a high-quality performance, because a musician interprets the score and adds additional information to it. Event-oriented content such as MIDI encodes the consecutive musical events of an instrument. MIDI events are defined on a time grid determined by the number of frames per second and the number of ticks per frame; by making this grid very fine, the timing of the events can be considered continuous. The mapping between the score timescale and the event timescale captures an important part of the musician's interpretation. The dynamics of each individual note are encoded in the MIDI file, while this information is not available from a score representation. Neither format represents performances accurately for instruments on which continuous control is exerted, such as brass and woodwind instruments: continuously controlled instruments exhibit important variations in loudness, pitch and timbre within a note which cannot be captured by a score or an event-based format. The goal here is to combine scores and their performances to enable alignment of all representations of music in an archive, as visualised in Figure 9.
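The mapping between score time and performance time described above can be sketched as piecewise-linear interpolation over a few anchor points, e.g. note onsets matched between a MusicXML score and a MIDI performance. The anchor values below are hypothetical, and a real alignment would derive them automatically.

```python
import bisect

# Sketch of score-to-performance time mapping (Task 5.3).
def map_score_time(anchors, beat):
    """anchors: sorted list of (score_time_in_beats, performance_time_in_seconds).
    Returns the interpolated performance time for a score position."""
    beats = [b for b, _ in anchors]
    i = bisect.bisect_right(beats, beat) - 1
    if i < 0:
        return anchors[0][1]            # before the first anchor
    if i >= len(anchors) - 1:
        return anchors[-1][1]           # at or beyond the last anchor
    (b0, t0), (b1, t1) = anchors[i], anchors[i + 1]
    # Linear interpolation between the two surrounding anchors.
    return t0 + (beat - b0) * (t1 - t0) / (b1 - b0)
```

The deviation of this mapping from a straight line is exactly the tempo-related part of the musician's interpretation that a bare score does not capture.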

[Figure omitted: diagram showing musical content divided into score-oriented (.xml), event-oriented (.mid) and audio-oriented (.wav) files, linked by alignment files (.xml).]

Figure 9. An enriched audio archive.

Task 5.4 Sound Enhancement

Archived material often suffers from artefacts which considerably degrade the quality and intelligibility of the audio. Audio restoration is a specialised task beyond the ability of the average user, so much archived material will continue to suffer from these artefacts. We propose a multipurpose noise reduction and sound enhancement system which will allow the user to improve playback quality at access time. In this way the integrity of the original recording is never breached, as the access-time processing is user specific. The parameter sets used in such algorithms are often outside the scope of common knowledge. Hence, audio quality assessment measures will be developed, after which the most suitable forms of sound enhancement may be chosen automatically. The tools will be adapted for both beginner and expert use. Because of the specialised nature of the tools, it is unrealistic to expect the average user to have sufficient knowledge of the optimal parameter settings for different audio content. For this reason, a significant part of each task will involve automating the processes. This will be achieved by explicitly referencing the associated metadata generated in WP4 to ascertain the nature of the audio to be processed. In "real life" audio recordings there are many additive and convolutional distortions that degrade signal quality, and the performance of audio analysis algorithms degrades as a result. The robustness of each audio analysis algorithm is affected differently by the different aspects of signal quality. This task will provide relevant scalar measures of audio signal quality. The main aim is to analyse the various quality aspects of the signal and assign a scalar measure to each. The quality measures will be used by the different audio analysis algorithms in order to identify possible errors, set parameters and enhance performance.
Scalar measures for quality parameters such as signal-to-noise ratio, echo level, and Mean Opinion Score (MOS) will be produced. These measures will form the basis for estimating the performance of the audio analysis algorithms. One of the main technological challenges in this project is to conduct experiments covering the various audio quality aspects in order to predict algorithm performance. The ability to predict performance leads directly to the ability to improve it, which is the ultimate goal of this project.


A set of rules (a rules engine) will be developed for each algorithm. The rules will determine the action to be taken based on the quality measures. For instance, an algorithm with high sensitivity to echo might decide not to process signals with high echo levels. The following are examples of rules based on the quality measures:

• The algorithm can decide whether or not to perform the analysis – for instance, the analysis may be skipped when quality is low in the relevant parameters.

• The accuracy estimate of the algorithm's results can be adjusted on the basis of the quality parameters.

• A decision to disqualify a specific result can be made.

• A decision to re-analyse the signal can be made.

• A decision to enhance the signal can be made.
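A rules engine of this kind could be sketched as follows. The algorithm names match those in WP4, but the quality thresholds, measure names and chosen actions are purely illustrative assumptions.

```python
# Hypothetical per-algorithm rules engine over scalar quality measures
# (SNR in dB, echo level in [0, 1], MOS in [1, 5]); thresholds are
# illustrative, not project results.
def decide(algorithm, quality):
    snr, echo, mos = quality["snr_db"], quality["echo"], quality["mos"]
    if algorithm == "transcription":
        if snr < 5:
            return "skip"       # quality too low: do not perform the analysis
        if snr < 15:
            return "enhance"    # denoise first, then analyse
        return "analyse"
    if algorithm == "source_separation":
        # Example of an algorithm highly sensitive to echo.
        if echo > 0.7:
            return "skip"
        return "analyse"
    # Default: run the analysis, but flag low-confidence results.
    return "analyse" if mos >= 3.0 else "analyse_low_confidence"
```

Each branch corresponds to one of the rule types listed above: refusing the analysis, enhancing the signal first, or adjusting the confidence attached to the result.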

The Sound Object Recognition algorithms proposed in WP4 often struggle to identify deteriorated speech signals accurately; the noise reduction algorithms will therefore also serve as a pre-processing stage for the speech recognition tasks. Figure 10 below illustrates the flow of the signal quality measurement (quality evaluator) through the decision stage to the main analysis algorithm.

[Figure omitted: flow diagram in which the audio signal passes through a quality evaluator and audio classifier into pre-analysis rules engines, which decide per algorithm whether to enhance the signal and whether to enable or disable analysis; an audio analysis preprocessor then feeds the main audio analysis algorithms (source separation, segmentation, transcription, ...), whose results pass to the post-analysis stage.]

Figure 10. Signal quality measures in Sound Object Representation and Enriched Access Tools.

WP6 Intelligent Interfaces

The guiding vision of this work package, led by SILOGIC, is to develop novel types of computer interfaces that provide new ways of presenting, understanding, experiencing, and shaping audio and related media resources in creative ways. These interfaces will provide intuitive access through novel visualisation paradigms and new methods for interactive manipulation and control of audio streams and recordings – for instance, by visualising aspects of the expressive performances of great artists via computer animations, or by permitting a user to interactively play with and modify a given recording according to his or her taste. The technical goal of this workpackage is to perform the basic research needed to develop the methodological basis. That includes research on intelligent musical structure recognition algorithms, new visualisation paradigms, and methods for (real-time) computer-based interaction with sound sources. This will be achieved by interdisciplinary research combining knowledge from fields such as computer science, artificial intelligence, pattern recognition, visualisation, and musicology.

Task 6.1: Operational Interface Design

Based on previous end-user needs studies and exploiting the partners' experience with existing tools, the specification for the operational interface will be produced. The interface will be built step by step throughout the project: based on the previously validated open architecture, it will evolve as each specialist team's modules, corresponding to the EASAIER workpackages, are integrated and evaluated together with the presentation and visualisation paradigms. The objective is to capitalise on the added value of each result in a complete and efficient interface. An open architecture adequate for the future implementation of all the necessary tools, offering format compatibility and future extension, will be designed; interoperability and multimedia processing will provide the basis for the design. Each version will be distributed frequently to all partners. The intelligent interface will serve the EASAIER dissemination plan. During the project, feedback from partners and end-users concerning the interface will be collected in order to reach the final objective, thus strongly improving the interaction between people and audio resources. Milestones and deliverables will allow reviews of the progress of the operational interface. SILOGIC will collaborate with the research centres on the specification and design of the EASAIER intelligent interface. This first task will produce a GUI design report. Detailed attention will also be paid to standards. The EASAIER interface will offer a large number of functions in an intuitive manner, enabling access and understanding. These include resource retrieval, navigation, material visualisation and monitoring, segmentation, and annotation.

Task 6.2: Cognitive Interface

The guiding vision of this task is to develop novel types of computer interfaces that provide new ways of presenting, understanding, experiencing, and shaping audio objects in creative ways. These interfaces will provide intuitive access to music and speech through novel visualisation paradigms and new methods for interactive manipulation and control of sound objects and audio recordings. This task will develop audio-visual signal representation, manipulation and management tools allowing users to create, exchange and consume multimedia data easily. Navigation and visualisation techniques, following user needs and advanced studies, will be developed and integrated. This will strongly improve the usability, effectiveness and accessibility of real-world sound archives. We will explore new concepts for innovative visual representations, considering different music representations: symbolic, audio, visual and metadata. Using a cognitive approach, EASAIER will offer transparent access to music resources even though users differ widely in their musical knowledge. An appropriate interface will be proposed depending on the user's profile, allowing rapid access to the items of interest, audio features and descriptors. This cognitive approach will be applied throughout the project, from the GUI design to each step of the development of the operational intelligent interface (Task 6.1) and the media retrieval module (Task 6.3).

Task 6.3: Cross- and Multi-Media Interface

This task focuses on the development of a suitable interface for WP3. The integration of the retrieval systems will allow users to retrieve multimedia objects through various means from the same interface. As described, audio resources often need to be linked to associated materials such as video, images, text, transcripts or scores. This work will offer interactive and intuitive representations and tools for users to easily access and manipulate audio and linked material through the same interface. Cross-media interaction will be developed, including the creation of appropriate linked metadata and the synchronisation of time-scaled audio and video. All specific tools will be produced following the cognitive approach (Task 6.2) and incorporated into the operational intelligent interface (Task 6.1).

WP7 Evaluation and Benchmarking

This workpackage will deal with the evaluation and benchmarking of the software tools and other outcomes of this project. It will necessitate close interaction with all partners and co-ordination with the other workpackages. Different stages of the project will call for different evaluation techniques: from surveys and personal interviews (when eliciting the first user requirements) to closed-format problem-solving scenarios for evaluating interaction with the prototypes, as well as perceptual-cognitive laboratory experiments under controlled conditions. Operational procedures for validating and promoting the tools will be devised and tested. For the other workpackages, the partners will need access to large collections of audio resources that can be annotated (or are partially annotated), analysed and mined. Archives have already been discussed in Section B.5, and further content is available from the National Library of Scotland, the British Library, the Irish Traditional Music Archive, and other providers who have established links with the consortium. Trial sound archives, with related media, will be established and annotated for appropriate exploitation during the project. Using several different databases enhances the robustness and generalisability of our results, and enables us to satisfy a strong user need: the integration of disparate but related archives. The identified tasks are:

Task 7.1 Evaluation Methodology and Metrics

This task involves defining a methodology for users to evaluate the prototypes according to usability criteria, and defining criteria and procedures for the recruitment of sample users from a number of potential end-user communities (general public, education and life-long learning, music- or speech-specific). The first step is to establish and organise the user requirements that will drive the project specification and the software functionality. The initial user requirements specification will be based on previous studies and is intended for use in all workpackages. From these and successive refinements of the user requirements, evaluation tools and protocols will be specified, designed and piloted. Working protocols will be established for the evaluation goals, metrics and procedures, including test and information-gathering protocols, and scheduling and feedback mechanisms. A deliverable on evaluation goals, metrics and procedures, in which specific usability criteria are discussed and defined, will be issued at the beginning of the project. Test-user communities will be established, primarily from potential user groups reached through the networks of all partners and from content providers and managers who have already expressed interest in the EASAIER project. User communities will be defined in terms of likely usage characteristics, interests and goals. Appropriate user communities will be set up to act as testers of the prototypes and developments of the project. To this end, an Expert User Advisory Board will be established. The board will consist primarily of content managers external to the consortium, such as the content providers noted above, and users with a strong interest in the outcomes of this project. This board, in addition to providing initial user needs specifications, will serve as expert evaluators.
It should be noted that the partners have twice met with content managers (London, May 23rd, 2005 and Dublin, June 29th, 2005) in order to establish the needs which would be addressed in this project.

Task 7.2 Demonstrator Prototypes and Testbeds

Fast prototyping is necessary to obtain early and meaningful user feedback. We will therefore develop mock-ups of the deliverables in order to properly develop the operational modules, refine the required functionality of the prototypes, and expose the usage and interconnection problems they can pose. The demonstrators will present graphical user interfaces and functionalities devised according to the user requirements. The task will begin after initial input from content providers and users, and the demonstrators will be used to generate on-site, run-time information for evaluation. Appropriate learning, annotation and test content will be supplied and shared by all partners, to be used under controlled conditions by users when interacting with the demonstration deliverables.

Task 7.3 Testing, Validation and Feedback Assessment

This task is concerned with carrying out the various feedback collection activities and reporting on the feedback obtained. It involves implementing and testing the evaluation and benchmarking procedures, and the user-feedback tests and studies. The feedback will be instrumental in writing and reviewing the user requirements documentation that feeds into the technical and functional specification. Assessment will be made both by user feedback testing and by reports collated from information provided by the Expert User Advisory Board.

WP8 Dissemination and Exploitation

Task 8.1 Deployment

The goal of this task is to administer and monitor the trial deployment of the developed software tools in sound archives, and to measure the performance of the access tools in real-world environments. We will develop and deploy prototypes intended to be elements of a larger architecture for sound archive access. The tools will be used in several sound archives outside the consortium in order to demonstrate improved usability, effectiveness and accessibility. This is key to both dissemination and evaluation, since it provides independent evaluation, demonstrates proof of concept, and encourages dissemination to the digital archive communities. Through the selection of key content providers with a shared


interest, we will also be able to demonstrate an important objective: the creation of common archives from disparate sources. Finally, we note that this will encourage the adoption of the standards employed in EASAIER. Deployment of proper ontology environments will connect sound archives to the semantic web, which in turn will enable widespread, inclusive access.

Task 8.2 Dissemination

The dissemination and awareness plan has been described in Section B.3. In this task, dissemination will be pursued and monitored. The aim is to maximise the social, scientific and industrial visibility of the developments, findings and outcomes of the project. Key deliverables for this task are the creation of an EASAIER brochure and a public website in the first month of the project. Further measures of successful dissemination will be the number of academic publications related to the project and the interest generated among the wider public, as seen in articles and websites mentioning the project.

Task 8.3 Exploitation

The principal goal of this task is to exploit the outcomes of this project through open-source software models, IP licensing and the involvement (or creation) of SMEs. A secondary goal is to define a strategy for the continuation of the research. To this end, an Exploitation Strategy Team will be set up a year into the project. The team will monitor advances in the state of the art and devise a business plan. The business plan will consider the use of knowledge beyond sound archives, including the deployment of audio processing and retrieval tools in other multimedia cultural heritage archives and in archives of call centre recordings.

A market watch analysis will also be produced, reporting on commercial and publicly available systems dealing with access to sound archives and related cultural heritage. Key indicators of success in this task will be the development of wide-ranging exploitation routes, including both restricted IP licences and open-source software licences, and the generation of interest among SMEs, possibly from within the consortium.


7.2 Planning and Timetable (months 1–30)

WP1 Management: T1.1 Consortium administration; T1.2 Financial management; T1.3 Quality assurance (D1.1); T1.4 Activity planning and reporting (D1.2, D1.3, D1.4, D1.5)
WP2 Media Semantics & Ontologies: T2.2 Ontology and semantics for media object representation; T2.3 Ontology management environment (D2.1)
WP3 Retrieval Systems: T3.1 Music retrieval; T3.2 Speech retrieval; T3.3 Cross-media retrieval (D3.1, D3.2, D3.3); T3.4 Vocal query interface
WP4 Sound Object Representation: T4.1 Audio stream segmentation; T4.2 Sound source separation; T4.3 Sound object identification (D4.1, D4.3); T4.4 Transcription (D4.2)
WP5 Enriched Access Tools: T5.1 Looping and marking (D5.1); T5.2 Time-scale / pitch-scale modification; T5.3 Multimedia alignment and enrichment (D5.2); T5.4 Sound enhancement and noise reduction (D5.3)
WP6 Intelligent Interfaces: T6.1 Operational interface design; T6.2 Cognitive interface; T6.3 Cross & multi-media interface (D6.1)
WP7 Evaluation & Benchmarking: T7.1 Evaluation methodology and metrics (D7.1); T7.2 Demonstrator prototypes and testbeds; T7.3 Testing, validation and feedback assessment (D7.3)
WP8 Dissemination & Exploitation: T8.1 Deployment (D8.2); T8.2 Dissemination (D8.1); T8.3 Exploitation (D8.3)


7.3 Graphical Presentation of Work Packages

[Figure omitted: block diagram of the main project components. User needs studies feed WP1 Project Management, WP2 Media Semantics & Ontologies, the retrieval, sound object representation and enriched access workpackages, WP6 Intelligent Interfaces and WP7 Evaluation & Benchmarking; deployment, the Exploitation Strategy Team and the Expert User Advisory Board support WP8 Dissemination & Exploitation.]

Figure 11. Main components of the project and their basic interaction.


7.4 Work Package List

Workpackage no.  Workpackage title                Lead contractor no.  Person-months  Start month  End month  Deliverable no.
WP1              Management                       1 (QMUL)             24             0            30         D1.1-1.5
WP2              Media Semantics and Ontologies   5 (LFUI)             34             0            24         D2.1
WP3              Retrieval Systems                4 (ALL)              66             0            30         D3.1-3.3
WP4              Sound Object Representation      6 (NICE)             95             0            30         D4.1-4.3
WP5              Enriched Access Tools            2 (DIT)              80             0            30         D5.1-5.3
WP6              Intelligent Interfaces           7 (SILOGIC)          40             3            30         D6.1
WP7              Evaluation & Benchmarking        3 (RSAMD)            26             0            30         D7.1-7.2
WP8              Dissemination & Exploitation     1 (QMUL)             23             0            30         D8.1-8.3
TOTAL                                                                  388

Notes: the workpackage number runs WP1-WPn; the lead contractor number identifies the contractor leading the work in the workpackage; person-months give the total allocated to each workpackage; start and end months are relative to the project start (month 0); deliverable numbers refer to the deliverables of the workpackage (D1-Dn).


7.5 Deliverables List

Deliverable no.  Deliverable title                                                                Delivery date  Nature  Dissemination level
D8.1             Public website and promotional brochure                                          1              D       PU
D1.1             Quality assurance protocols and policies                                         3              R       CO
D3.1             Report outlining retrieval system functionality and specifications               6              R       RE
D5.1             Prototype of looping and marking modules                                         9              R       PU
D7.1             Report on initial user requirements and evaluation procedures                    9              R       PU
D1.2             Mandatory management, financial and activity reports                             12             R       CO
D4.1             Prototype segmentation, separation and speaker/instrument identification system  14             P       PU
D3.2             Prototype speech and music retrieval systems with vocal query interface          20             P       PU
D1.3             Mandatory management, financial and activity reports                             24             R       CO
D2.1             Report on metadata management infrastructure and ontology language for media objects  24        R       PU
D3.3             Prototype cross-media retrieval system                                           26             P       PU
D4.2             Prototype transcription system                                                   27             P       PU
D5.2             Time-stretching modules with synchronised multimedia prototype                   27             P       PU
D8.2             Demonstrator of deployed access tools in sound archives outside the consortium   27             D       RE
D4.3             Final report on sound object representations                                     30             R       PU
D5.3             Final Enriched Access Tools report, including sound enhancement methods          30             R       PU
D6.1             Aligned cross- and multi-media interface report                                  30             R       PU
D7.2             Report on formal evaluations and deployment                                      30             R       PU
D8.3             Final Plan for Dissemination and Use of Knowledge                                30             R       CO
D1.4             Final Public Report                                                              30             R       PU
D1.5             Mandatory final management, financial and activity reports                       30             R       CO

Nature of the deliverable: R = Report; P = Prototype; D = Demonstrator; O = Other.
Dissemination level: PU = Public; PP = Restricted to other programme participants (including the Commission Services); RE = Restricted to a group specified by the consortium (including the Commission Services); CO = Confidential, only for members of the consortium (including the Commission Services).


7.6 Work Package Descriptions

Workpackage description

Workpackage number: 1. Start date or starting event: Month 0. Workpackage title: Project Management.
Participant id:                QMUL  DIT  RSAMD  ALL  LFUI  NICE  SILOGIC
Person-months per participant: 24

Objectives

The objective of this activity is to provide overall co-ordination and management for the entire contract. This includes technical and administrative co-ordination, quality assurance, the handling of partner dynamics and conflicts, and ensuring that the project is delivered according to the planned timetable and budget. It also includes the writing and reviewing of all management and project-monitoring documents.

Description of work

Technical project co-ordination; contractual management; organisation and co-ordination of the internal communication flow; documentation management; tracking of project status; establishment and maintenance of travel plans; review and verification of deliverables; organisation of progress meetings (including notices, agendas, chairing and minutes) and of reviews; co-ordination between the different activities as necessary; conflict resolution; and risk prevention and minimisation. Communication between partners will be maintained using teleconferences, collaborative software tools, mailing lists, and a CVS-controlled repository of developed software. Mobility of researchers and developers will be favoured and encouraged. The responsibilities of the coordinator comprise the following tasks:

• Setting up the web-based collaborative tools and promoting their usage by all partners. Basic functionality of these tools will be in operation immediately after the official starting date of the project.

• Self-assessment and project workplan specification. Functional and temporal dependencies between workpackages will be carefully considered so that they do not become a source of conflicts or of delays in deliverables.

• Ensuring scientific and technical quality of the activities, deliverables and dissemination elements.

• Reporting on activities, progress and resource management of the project, and acting as an interface for the distribution of information and reports between the partners and the EC. Issuing management reports, progress reports and cost statements.

The following formal tasks have been identified:
T 1.1 Consortium Administration
T 1.2 Financial Management
T 1.3 Quality Assurance
T 1.4 Activity Planning and Reporting


Proposal #033902, 27/3/2006 Page 42 of 62

Deliverables
D1.1 Quality assurance protocols and policies (Month 3)
D1.2 Mandatory management, financial and activity reports (Month 12)
D1.3 Mandatory management, financial and activity reports (Month 24)
D1.4 Final Public Report (Month 30)
D1.5 Mandatory final management, financial and activity reports (Month 30)
Deliverables 1.2, 1.3, and 1.5 will be in accordance with Annex II of the contract, in particular Article 7 of Annex II.

Milestones and expected result
Status reports every 6 months; co-ordination meetings every 3 months.
M1.1 Private collaborative and administrative website deployed (Month 1)
M1.2 Basic development tools are available; user groups have been recruited; content databases are usable; first scientific achievements; tentative dissemination and use plan has been advanced. Status Report 1 - Project self-assessment and workplan specification (Month 6)
M1.3 First disseminated scientific developments; new elements available for revision of user-feedback methodology. Status Report 2 - Revised project workplan specification, software progress update (Month 12)
M1.4 Internal assessments; first versions of prototype components available. Status Report 3 - Evaluation procedure specification and integration report (Month 18)
M1.5 Final prototype components available; initial business plan. Status Report 4 - Initial evaluation results, revision requests and deployment procedure (Month 24)
M1.6 Final public and private reports issued; project ends (Month 30)


Workpackage description

Workpackage number: 2
Workpackage title: Media Semantics and Ontologies
Start date or starting event: Month 0
Person-months per participant: RSAMD 6, ALL 10, LFUI 15, SILOGIC 3

Objectives
- Analyze the requirements for the representation of semantic aspects of audio objects
- Identify suitable representation techniques for the efficient and expressive representation of semantic aspects of media components, and align them with the current important European effort on WSMO

- Provide support for a distributed environment for ontology management and reasoning. This includes ontology editing, mediating and versioning, as well as analysis of the scalable infrastructure for an efficient repository to store and manage large-scale media ontology data.

Description of work
T 2.1 Ontology and semantics for media object representation - Analysis of the requirements for the representation of semantic aspects of sound objects and related media. A principal component of this task will be the alignment of the ontology language recommendations with the European WSMO effort.
T 2.2 Ontology management environment - This will support a scalable infrastructure for ontology editing, browsing, merging, aligning, and versioning.
Roles of partners
LFUI will lead this workpackage. They will provide the main ontological framework and integrate the ontologies with the semantic web. ALL will align efforts with MPEG standards and test and develop appropriate descriptors. SILOGIC will align efforts with the OAIS and oversee integration of ontologies with system functionality. RSAMD will ensure that the metadata appropriately address user needs and are consistent with musicological standards (MusicML, MEI, XScore, etc.).
Links to other workpackages
This workpackage will utilize outputs from WP4, where the requirements of the ontological language can be collected. The ontological representation will be used in WP3, WP5 and WP6 for storing and querying media data intelligently.

Deliverables D2.1 Report on metadata management infrastructure and ontology language for media objects – (Month 24)

Milestones and expected result
M2.1 - Month 6: Semantic requirements established. Appropriate metadata requirements for representation of semantic aspects of multiple media types determined.
M2.2 - Month 12: Ontology language devised. Integration of the language with other standards developed.
M2.3 - Month 18: Infrastructure constructed. Initial testing complete.
M2.4 - Month 24: Working ontology environment, integrated with the semantic web.


Workpackage description

Workpackage number: 3
Workpackage title: Retrieval Systems
Start date or starting event: Month 0
Person-months per participant: QMUL 22, DIT 5, ALL 24, LFUI 3, NICE 12

Objectives

The objective of this work package is to provide retrieval systems offering the ability to search by various musical similarity measures, to search for spoken words or phrases, and to search across different media for associated content. Queries may be text-based, spoken, or audio examples.

Description of work
The objectives of this WP fall neatly into the creation of three retrieval systems:
T 3.1 Music Retrieval - Searching and organising audio collections according to their relevance to music-related queries. By using appropriate high-level features, we will obtain ranked lists of audio files related to an audio query through melodic and harmonic similarity.
T 3.2 Speech Retrieval - The application of multilingual speech indexing technology to mixed-audio (speech, music and other sound objects) sound archives will enable the retrieval of spoken audio content by text-based queries.
T 3.3 Cross-media Retrieval - Using both metadata and feature extraction with combined similarity measures, this allows the user to search media in various formats (audio recordings, video recordings, notated scores, images, etc.) and find related material across different media.
In addition to the construction of the three retrieval systems, a further objective is:
T 3.4 Development of a vocal query interface, which will enable voice-initiated media retrieval.
Roles of partners
ALL will lead this WP. They will work mainly on speech retrieval and spoken queries. NICE will also work on speech retrieval, and these two partners will work together on integration of their technologies. QMUL will work on cross-media and music retrieval. Many of the components of this work package have already been developed in some form by the partners. Thus a large portion of the work concerns modification, improvement, optimisation and additional functionality.
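The ranked-list retrieval described in T 3.1 can be illustrated with a minimal sketch: archive recordings and the query are reduced to fixed-length feature vectors (here, hypothetical 12-bin chroma-like profiles; the item names and values are invented for illustration) and ranked by cosine similarity. The project's actual melodic and harmonic similarity measures are more sophisticated; this only shows the ranking principle.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def rank_by_similarity(query, archive):
    """Return archive item ids ranked by decreasing similarity to the query."""
    scored = [(cosine_similarity(query, vec), item_id)
              for item_id, vec in archive.items()]
    return [item_id for score, item_id in sorted(scored, reverse=True)]

# Invented 12-bin chroma-like vectors for three hypothetical recordings.
archive = {
    "reel_in_D":  [0.9, 0.0, 0.7, 0.0, 0.5, 0.0, 0.1, 0.8, 0.0, 0.3, 0.0, 0.2],
    "air_in_G":   [0.2, 0.0, 0.4, 0.0, 0.6, 0.1, 0.0, 0.9, 0.0, 0.1, 0.0, 0.7],
    "march_in_D": [0.8, 0.1, 0.6, 0.0, 0.4, 0.0, 0.2, 0.7, 0.0, 0.4, 0.0, 0.1],
}
query = [0.9, 0.0, 0.6, 0.0, 0.5, 0.0, 0.1, 0.8, 0.0, 0.3, 0.0, 0.2]
print(rank_by_similarity(query, archive))
# → ['reel_in_D', 'march_in_D', 'air_in_G']
```

The same ranking skeleton applies whether the vectors come from melodic, harmonic or metadata-derived features; only the feature extraction and the similarity function change.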

Deliverables
D3.1 Report outlining retrieval system functionality and specifications (Month 6)
D3.2 Prototype of speech and music retrieval systems with vocal query interface (Month 20)
D3.3 Prototype of cross-media retrieval system (Month 26)

Milestones and expected result
M3.1 - Month 8: Initial vocal query system tested; initial speech and music retrieval algorithms developed.
M3.2 - Month 14: Vocal query fully functional; speech and music retrieval implemented; cross-media retrieval method finalized.
M3.3 - Month 20: Vocal query finished; speech and music retrieval systems established; basic cross-media retrieval implemented.
M3.4 - Month 26: Cross-media retrieval fully functional; further work is only refinement and optimization.


Workpackage description

Workpackage number: 4
Workpackage title: Sound Object Representation
Start date or starting event: Month 0
Person-months per participant: QMUL 18, DIT 40, RSAMD 6, ALL 8, LFUI 3, NICE 20

Objectives

Sound Object Representation is concerned with the identification of key features in audio, as well as with the appropriate segmentation of audio streams. A key initial step is the separation of multiple overlapping sources. Spoken and musical audio excerpts will then be identified, thus allowing the appropriate tools to be applied to each segment or object. Distinct objects may then be recognised using musical instrument identification and speaker identification techniques. Higher level features, such as the notes played, words said, or emotion conveyed will then be identified. The appropriate interfaces for accessing such features and interacting with the audio will be developed in WP 6. Furthermore, some of the tools developed may also be applied in WP5. Notably, this concerns source separation, which may be used both at access time and for representations of audio.

Description of work
The following tasks have been identified:
T 4.1 Audio stream segmentation
T 4.2 Source separation
T 4.3 Sound object identification
T 4.4 Transcription
Roles of partners
NICE will contribute the primary speech-related modules (speaker and emotion detection, segmentation by sentence, etc.) and will lead the workpackage. ALL will contribute speech/music separation routines. DIT will develop music segmentation routines, instrument identification, and music transcription systems. QMUL will work on harmonic and melodic representations of music, as well as aligning source separation and transcription research with that of DIT. LFUI will collate the metadata formats for representing these objects, and RSAMD will validate the methods, align the research with user-needs studies and develop enhanced real-time performance.
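As an illustration of the stream segmentation in T 4.1, one very simple strategy marks a segment boundary wherever consecutive feature frames change abruptly. The feature values below are invented toy data; EASAIER's actual segmenters would use richer features and trained classifiers rather than a fixed distance threshold.

```python
def segment_boundaries(frames, threshold):
    """Mark a boundary wherever successive feature frames differ strongly.

    frames: list of per-frame feature vectors (e.g. spectral statistics);
    returns the frame indices at which a new segment is judged to begin.
    """
    def distance(a, b):
        # Euclidean distance between two feature vectors.
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    return [i for i in range(1, len(frames))
            if distance(frames[i - 1], frames[i]) > threshold]

# Toy stream: low-energy "speech-like" frames followed by "music-like" frames.
speech = [[0.20, 0.10], [0.25, 0.12], [0.22, 0.11]]
music = [[0.90, 0.80], [0.88, 0.82], [0.91, 0.79]]
print(segment_boundaries(speech + music, threshold=0.5))
# → [3]  (a single boundary where the speech-like frames end)
```

Once such boundaries are found, each segment can be routed to the appropriate tools (speech processing or music processing), as described in the objectives above.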

Deliverables
D4.1 Prototype segmentation, separation and speaker/instrument identification system (Month 14)
D4.2 Prototype transcription system (Month 27)
D4.3 Final report on sound object representations (Month 30)

Milestones and expected result
M4.1 - Month 6: Speech/music separation methods implemented and tested
M4.2 - Month 10: Initial results on identification of sound objects; prototype segmenter and separator
M4.3 - Month 18: Identification of speech characteristics from segmented, separated audio streams
M4.4 - Month 24: Transcription of monophonic music from segmented, separated audio streams
M4.5 - Month 28: Testing and evaluation of complete system


Workpackage description

Workpackage number: 5
Workpackage title: Enriched Access Tools
Start date or starting event: Month 0
Person-months per participant: QMUL 20, DIT 46, NICE 14

Objectives
The principal objective of this work package is the construction of tools enabling a richer experience when accessing the media in the sound archive. The tools will be integrated into a system which allows the user to interact with audio resources through separating and identifying sources, processing and modifying the audio, and aligning various sources at playback (e.g., a musical piece and its score, or speech and its associated text). A secondary objective is the synthesis of speech and music processing approaches. Both technologies are necessary since different aspects of the signal are emphasized depending on the source.

Description of work

The tasks are as follows:
T 5.1 Looping and marking
T 5.2 Time-scale / pitch-scale modification
T 5.3 Multimedia alignment and enrichment
T 5.4 Sound enhancement and noise reduction
Roles of partners
Work Package 5 (WP5) is co-ordinated by DIT. Technologies developed by one partner with experience in speech processing (NICE) and two with experience in music processing (DIT and QMUL) will be applied in this workpackage. DIT will develop the source separation and time-scaling tools. NICE will develop the speech-to-text alignment and sound enhancement tools. Filtering and noise reduction tools will be developed jointly by NICE (speech-specific) and QMUL (music-specific). Looping and marking tools will be developed by QMUL.
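The time-scale modification of T 5.2 can be sketched with a plain windowed overlap-add (OLA) routine: frames are read from the input at one hop size and written to the output at another, changing the duration without resampling. This is only a minimal illustration under assumed frame and hop sizes; practical tools of the kind envisaged here (e.g. SOLA or phase-vocoder methods) add waveform alignment or phase handling to avoid artifacts on pitched material.

```python
import numpy as np

def ola_time_stretch(x, rate, frame_len=256, synth_hop=64):
    """Stretch x by a factor 1/rate using windowed overlap-add.

    rate < 1 slows playback down (longer output); rate > 1 speeds it up.
    Assumes len(x) >= frame_len.
    """
    ana_hop = int(round(synth_hop * rate))        # hop on the input signal
    window = np.hanning(frame_len)
    n_frames = (len(x) - frame_len) // ana_hop + 1
    out_len = (n_frames - 1) * synth_hop + frame_len
    out = np.zeros(out_len)
    norm = np.zeros(out_len)                      # summed windows, for normalisation
    for i in range(n_frames):
        a, s = i * ana_hop, i * synth_hop
        out[s:s + frame_len] += x[a:a + frame_len] * window
        norm[s:s + frame_len] += window
    norm[norm < 1e-8] = 1.0                       # avoid division by zero at the edges
    return out / norm

# A 0.1 s, 440 Hz tone at 8 kHz, stretched to roughly twice its length.
sr = 8000
t = np.arange(int(0.1 * sr)) / sr
tone = np.sin(2 * np.pi * 440 * t)
slow = ola_time_stretch(tone, rate=0.5)
print(len(tone), len(slow))                       # 800 1344
```

Because the analysis and synthesis hops differ, duration changes while the local spectral content (and hence pitch) is approximately preserved, which is the behaviour required for slowed-down practice playback.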

Deliverables
D5.1 Prototype of looping and marking modules (Month 9)
D5.2 Time-stretching modules with synchronized multimedia prototype (Month 27)
D5.3 Final report, including sound enhancement methods (Month 30)

Milestones and expected result
M5.1 Basic access system is running (Month 12)
M5.2 Initial prototype integrated with interfaces (Month 18)
M5.3 Prototype prepared for evaluation (Month 24)
M5.4 Integration of the tools with the retrieval system (Month 30)


Workpackage description

Workpackage number: 6
Workpackage title: Intelligent Interfaces
Start date or starting event: Month 3
Person-months per participant: QMUL 4, RSAMD 4, SILOGIC 32

Objectives
This work package has two fundamental objectives. The first is advanced research on intuitive human-computer interfaces dedicated to interaction with audio and media resources. This aims to improve the state of the art in the manipulation and management of, and interaction with, audio resources. The second is the production of an operational and efficient intelligent interface, set up within the EASAIER project for demonstration. This will lead to a concrete implementation of the cognitive innovations and other tools produced by EASAIER research. The work package is built around users' needs and the innovative research. All the tasks are designed to achieve the objectives within the two-and-a-half-year time frame. All partners concerned with exploitation of the results of this task will participate in the specification and evaluation of its deliverable.

Description of work
To reach these ambitious objectives, several tasks running in parallel will be realised during the project:
T 6.1: Operational interface design
T 6.2: Cognitive interface
T 6.3: Cross- and multi-media interface
Roles of partners
SILOGIC will lead this workpackage and will integrate the user interfaces with the lower-level processing tools of WPs 3, 4, and 5. RSAMD and QMUL will develop mock-up prototypes which exhibit GUI functionality. RSAMD will also report on GUI design requirements, and QMUL will develop real-time optimized routines for access by the interfaces.

Deliverables
D6.1 Aligned cross- and multi-media interface report (Month 30)
Links with other WPs: Input from WP2, WP3, WP4 & WP5; output to WP2, WP3, WP4, WP5 & WP7.

Milestones and expected result Milestones:

• M6.1 Mock GUI designs devised (Month 7)
• M6.2 Audio intelligent interface (Month 21)
• M6.3 Multimedia alignment interface (Month 26)
• M6.4 Cross- and multi-media interface (Month 29)

Expected results:
• Operational EASAIER interface
• Navigation and efficient retrieval access
• Innovative representation of and interaction with audio resources


Workpackage description

Workpackage number: 7
Workpackage title: Evaluation and Benchmarking
Start date or starting event: Month 0
Person-months per participant: QMUL 3, DIT 2, RSAMD 12, ALL 3, LFUI 1, NICE 2, SILOGIC 3

Objectives
• Set up a trial sound archive, with related media, and annotate it for appropriate exploitation throughout the project
• Set up appropriate user communities, including an Expert User Advisory Board, to test prototypes and steer aspects of project development
• Establish and organise user requirements that will drive the project specification and the software functionality
• Define plans for user evaluation and perform user-feedback tests and studies

Description of work
The identified tasks are:
T 7.1 Evaluation methodology and metrics - Defining a user evaluation methodology, and criteria and procedures for the recruitment of sample users from a number of potential user communities.
T 7.2 Demonstrator prototypes and testbeds - Present a 'mock-up' of the GUI and functionalities, which will be used to generate on-site, run-time information to evaluate.
T 7.3 Testing, validation and feedback assessment - Testing and implementing the evaluation and benchmarking procedures, carrying out the different feedback collection activities, and reporting on this feedback.
Roles of partners
All partners will benchmark and evaluate their own work from other workpackages. RSAMD, as leaders of this workpackage, will assess the usability of all tools in practical settings. They will oversee user studies and training, as well as coordinate activities with the Expert User Advisory Board.

Deliverables
Deliverables for this workpackage will be reports as follows:
D7.1 Report on initial user requirements and evaluation procedures (Month 9)
D7.2 Report on formal evaluations and deployment (Month 30)

Milestones and expected result
M7.1 Establishment of an Expert User Advisory Board (Month 3)
M7.2 Test users recruited and annotated sound archives established (Month 4)
M7.3 Initial user requirements specification for other workpackages (Month 8)
M7.4 Evaluation tools and protocols specified, designed and piloted (Month 19)
M7.5 Evaluations with user groups concluded; precision-recall benchmarking of retrieval systems; final feedback assessment begins (Month 24)
M7.6 Final evaluation report (Month 30)


Workpackage description

Workpackage number: 8
Workpackage title: Dissemination and Exploitation
Start date or starting event: Month 0
Person-months per participant: QMUL 7, DIT 3, RSAMD 5, ALL 3, LFUI 1, NICE 2, SILOGIC 2

Objectives
• Monitor the deployment of software tools in sound archives outside the consortium.
• Attend industrial and scientific meetings.
• Maximize social, scientific, and industrial awareness of the developments and outcomes of the project.
• Generate public documentation that can bring more attention to the project from end users, and that can help them understand the project goals, status, and achievements.
• Show demonstration prototypes at public events.
• Establish an Exploitation Strategy Team to monitor external developments and devise a business plan for the continuation of activities and exploitation of knowledge and developments.

Description of work
The identified tasks are:
T 8.1 Deployment - To demonstrate and market the use of enhanced access tools in 'real-world' sound archives.
T 8.2 Dissemination - Covers a wide range of activities concerned with spreading and promoting the project and its outcomes.
T 8.3 Exploitation - This will cover the development of the IPR management strategy and the definition of the business plan and road map for future development. The Exploitation Strategy Team will be established in order to monitor developments outside of the project, and to prepare the roadmap for future research and development, use and exploitation of the project and its outcomes.
Roles of partners
All partners will participate in the dissemination and exploitation activities. RSAMD will oversee the deployment task, as this meshes well with their evaluation activities. QMUL, as WP leaders, will lead the creation and management of the Exploitation Strategy Team.

Deliverables
The deliverables for this workpackage will be the following:
D8.1 Public website and promotional brochure (Month 1)
D8.2 Demonstrator of deployed access tools in sound archives outside the consortium (Month 27)
D8.3 Final Plan for Dissemination and Use of Knowledge, i.e., business plan (Month 30)

Milestones and expected result
M8.1 Initial dissemination and use plan developed (Month 8)
M8.2 First scientific publications; creation of the Exploitation Strategy Team (Month 12)
M8.3 Revised dissemination and use plan, incorporating the IPR Management Strategy and feedback from the Expert User Advisory Board; deployment plan established (Month 18)
M8.4 Market watch and state-of-the-art analysis (Month 21)
M8.5 Prototypes deployed in external sound archives; promotional kit available; initial business plan developed (Month 24)


8 Project resources and budget overview

8.1 Efforts for the full duration of the project
STREP Project Effort Form

Project acronym - EASAIER

                                     QMUL   DIT  RSAMD   ALL  LFUI  NICE  SILOGIC  TOTAL
Research/innovation activities
  WP2 Media Semantics & Ontologies      -     -      6    10    15     -        3     34
  WP3 Retrieval Systems                22     5      -    24     3    12        -     66
  WP4 Sound Object Representation      18    40      6     8     3    20        -     95
  WP5 Enriched Access Tools            20    46      -     -     -    14        -     80
  WP6 Intelligent Interfaces            4     -      4     -     -     -       32     40
  WP7 Evaluation & Benchmarking         3     2     12     3     1     2        3     26
  WP8 Dissemination & Exploitation      4     3      3     3     1     2        2     18
  Total research/innovation            71    96     31    48    23    50       40    359
Demonstration activities
  WP8 Dissemination & Exploitation      3     -      2     -     -     -        -      5
  Total demonstration                   3     -      2     -     -     -        -      5
Consortium management activities
  WP1 Management                       24     -      -     -     -     -        -     24
  Total consortium management          24     -      -     -     -     -        -     24
TOTAL ACTIVITIES                       98    96     33    48    23    50       40    388


8.2 Overall budget for the full duration of the project

Table 2. Staff months effort per Work Package.

WP                                        QMUL     DIT   RSAMD     ALL    LFUI    NICE  SILOGIC    Total
WP1 Management             staff months     24       0       0       0       0       0        0       24
                           costs (€)    119746       0       0       0       0       0        0   119746
WP2 Media Semantics &      staff months      0       0       6      10      15       0        3       34
  Ontologies               costs (€)         0       0   31073   38092  113550       0    20201   202916
WP3 Retrieval Systems      staff months     22       5       0      24       3      12        0       66
                           costs (€)    100503   18317       0   91420   22710   72376        0   305326
WP4 Sound Object           staff months     18      40       6       8       3      20        0       95
  Representation           costs (€)     82230  146535   31073   30473   22710  120627        0   433648
WP5 Enriched Access Tools  staff months     20      46       0       0       0      14        0       80
                           costs (€)     91366  168516       0       0       0   84439        0   344321
WP6 Intelligent Interfaces staff months      4       0       4       0       0       0       32       40
                           costs (€)     18273       0   20714       0       0       0   215472   254459
WP7 Evaluation &           staff months      3       2      12       3       1       2        3       26
  Benchmarking             costs (€)     13706    7327   62145   11428    7570   12063    20201   134440
WP8 Dissemination &        staff months      7       3       5       3       1       2        2       23
  Exploitation             costs (€)     31978   10990   25894   11428    7570   12063    13467   113390
All WP1 effort is consortium management activity. Of the WP8 effort, 5 staff months are demonstration activities (QMUL 3, RSAMD 2) and the remaining 18 are RTD & innovation activities; all other work package effort is RTD & innovation activity.


Totals                                    QMUL     DIT   RSAMD     ALL    LFUI    NICE  SILOGIC    Total
Staff months                                98      96      33      48      23      50       40      388
Personnel costs (€)                     457802  351685  170899  182841  174110  301568   269341  1908246
RTD & innovation staff months               71      96      31      48      23      50       40      359
Demonstration staff months                   3       0       2       0       0       0        0        5
Management staff months                     24       0       0       0       0       0        0       24

Overall Budget and Overall Request for Funding
Table 3. Project finances by partner.

Table 4. Estimated breakdown of the EC contribution per reporting period.

8.3 Management level description of resources and budget The Project Effort form shows the allocation of activity effort by each partner to each work package and type of Activity (Research and Innovation, Demonstration and Management). This is broken down into allocations of each partner into each work package in Table 2.


The total billable effort equates to 388 staff months, for which the consortium seeks a contribution from the EC of €2,100,000. QMUL, LFUI, DIT and RSAMD use the AC cost model, and we estimate that they will together contribute an additional 25 staff months (10% of their allocation) from permanent (non-chargeable) staff engaged in supervision of contracted researchers. This gives an effective labour total of 413 person-months, a substantial resource mobilisation equivalent to 34.42 person-years.

If we analyse the project budget as a whole, as depicted in Table 3, we see that approximately 91.08% of the requested EC contribution is allocated to Research and Innovation activities, 1.92% to Demonstration activities and 7.00% to Consortium Management (including audit certificates from all partners). Excepting audit certificates, management costs are attributed to the coordinator, QMUL, who leads both WP1: Management and WP8: Dissemination and Exploitation. Coordinating the technical work packages, WP2-WP7, is primarily research-related.

The average cost per billable staff month is €5412, including all overheads, travel and subsistence, materials and equipment. This is a below-average figure, made possible by the unusually low overhead rates charged by most of the partners and by the use of existing testbeds and content. The project as a whole benefits from the partners' large investment in advanced information technology infrastructure and their work on previous projects. This allows us to concentrate effort where it is most needed: equipment costs are minimised and substantial resources are dedicated to staff labour on research and development. The additional cash investment committed to the project by the FC and FCF partners is approximately €816,347. Analysis of the detailed figures provided by the partners shows that this is a very cost-effective project.
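The headline figures quoted above can be cross-checked directly from the budget tables; the short sketch below (figures taken from the text) reproduces the cost per staff month and the person-year total.

```python
# Cross-check of the headline budget figures quoted in Section 8.3.
ec_contribution = 2_100_000       # requested EC contribution, EUR
billable_months = 388             # total billable staff months
supervision_months = 25           # non-chargeable AC-model supervision effort

cost_per_month = ec_contribution / billable_months
print(round(cost_per_month))      # ~5412 EUR per billable staff month

effective_months = billable_months + supervision_months
print(effective_months, round(effective_months / 12, 2))  # 413 months = 34.42 person-years

# Activity breakdown in staff months must sum to the billable total.
rtd, demo, mgmt = 359, 5, 24
assert rtd + demo + mgmt == billable_months
```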
The requested EC contribution per reporting period is given in Table 4, and is divided evenly across the 30-month duration. Each partner's distribution of requested financial resources is given in Table 5, including estimated costs of audit certificates.

Table 5. Resource allocation by partner.

Partner   Personnel (€)  Equipment (€)  Travel (€)  Consumables (€)  Other (€)  Audit certificates (€)  Total (€)
QMUL             457802          15154       20209             1800       5850                    4320     505135
DIT              351685          11628       15837              950       4500                    3000     387600
RSAMD            170899           5606        7687              576          0                    2100     186868
ALL              182841           6080        8328              590          0                    4800     202639
LFUI             174110           5790        7977              589       2750                    2784     194000
NICE             301568           9862       13448              880          0                    3000     328758
SILOGIC          269341           8850       11977              790       1792                    2250     295000
Total           1908246          62970       85463             6175      14892                   22254    2100000

As can be seen from the summarised resource allocation as a percentage of the total budget in Table 6, additional expenses are fairly light. Travel and subsistence costs include project review meetings, partner meetings and travel to conferences, conventions and exhibitions, as estimated by each partner. Durable equipment includes the purchase of high-end PCs and servers, audio equipment, and professional soundcards and speakers. Consumables include software licenses, back-up tapes, storage devices and various small items. Additional charges include journal page charges, production of promotional brochures, website maintenance and IP management costs. Large charges include maintenance of grid infrastructure (LFUI, to be used for research in WP2), maintenance of an audio production studio (QMUL, to be used for the demonstrator in WP8), and multi-projector display systems (SILOGIC, to be used for prototypes in WP6), estimated at €2750, €3600 and €1792, respectively.


Table 6. Resource allocation as percentage of total budget.

Category                  Requested contribution (€)  Percentage  Notes
Personnel and overheads                      1908246      90.87%  Based on the staff-month cost of each partner.
Durable equipment                              62970       3.00%  Primarily computing and audio equipment; large costs include display systems and servers.
Travel and subsistence                         85463       4.07%  Includes conferences/exhibitions, project meetings, meetings with the European Commission, and meetings with sound archive managers for deployment and evaluation.
Consumables                                     6175       0.29%  Includes promotional brochures.
Other                                          14892       0.71%  Primarily maintenance of websites, grid infrastructure and audio demonstration facilities.
Audit certificates                             22254       1.06%  Estimated by each partner.
Total                                        2100000     100.00%

9 Ethical Issues
The ethical codes applied in EASAIER will be consistent with the content and spirit of the European Charter of Fundamental Rights. Policies regarding any ethical aspects of the research undertaken will be established in the Consortium Agreement, and the application of ethical policies will be assessed by the Project Coordinator. The ethics programme of EASAIER will be concerned with the ethical practices of the consortium, the uses of the tools developed, and the evaluation techniques. Using students as testers or for experimentation is the usual practice in the human sciences, so we will use the ethical protocols that are common practice for managing their participation in scientific research studies. As described in Section D.6 of the ethical rules in FP6, consent must be given, no personal data will be gathered, and no children will be involved as a user group at any stage in EASAIER user trials or in the social science research programme. Table 7 confirms that the proposed research does not raise sensitive ethical questions in a number of important areas.

Table 7. Confirmation that sensitive ethical questions are not raised.

Does the proposed research raise sensitive ethical questions related to: YES NO

Human beings X

Human biological samples X

Personal data (whether identified by name or not) X

Genetic information X

Animals X

Does the proposed research involve: YES NO

Research activity aimed at human cloning for reproductive purposes X

Research activity intended to modify the genetic heritage of human beings which could make such changes heritable (8) X

Research activity intended to create human embryos solely for the purpose of research or for the purpose of stem cell procurement, including by means of somatic cell nuclear transfer X

(8) Research relating to cancer treatment of the gonads can be financed.


Gender issues

There are no specific gender issues related to this project. The EASAIER project aims to promote the participation of women and men alike. Procedures to ensure that the activities of EASAIER do not disadvantage either gender will be incorporated into the administrative procedures implemented by the Administration Coordinator. Examples of good practice in gender mainstreaming will be drawn upon and adapted to the particular needs of this project and the consortium. We will consider the roles of women and men as equal within the project work, and the impact of the project on each.

Appendix A: Consortium Description Roles of the Partners

The EASAIER Consortium consists of 3 companies and 4 academic institutions, with 5 EU countries represented. The partners have considerable experience in national and international projects, including many that are directly applicable to this proposal. The Consortium merges complementary partners who cover different facets of the scientific, technical, dissemination and exploitation skills required. They have strength in both speech and music processing and retrieval, and the relevant skills in evaluation, deployment and integration.

The following aspects highlight the quality of the consortium: 1. Research/Industry balance: There are four academic research entities (QMUL, DIT, LFUI, RSAMD), of which RSAMD is also an internationally esteemed conservatoire. The three industrial partners (NICE, SILOGIC, ALL) provide a strong balance to academic approaches, as well as offering expanded routes for dissemination and exploitation.

2. Scientific competence balance: This consortium has the appropriate strength in semantics, software integration and evaluation, principally through LFUI, SILOGIC and RSAMD respectively. The partners also benefit from prior work establishing content testbeds (NICE, ALL, QMUL) and user needs (RSAMD). Furthermore, this consortium is exceptional in its approach to audio in that it has the capabilities to tackle audio issues using a synthesised approach which exploits both speech and music processing. The Centre for Digital Music at QMUL and the Digital Media Centre at DIT are both internationally renowned for their musical signal processing research, and NICE and ALL both have established state-of-the-art speech technologies. The ability to represent and manipulate sound objects with a unified approach represents a unique opportunity.

3. Size balance: The consortium amalgamates four research institutions, two large-to-medium-sized companies (NICE and SILOGIC), and one SME (ALL), which leads the Work Package for Retrieval Systems. ALL is fully involved in all aspects of the project: it will work closely with NICE on the application of speech technologies, as well as with QMUL on the integration of retrieval systems.

4. Geographic balance: The Consortium combines partners from 5 EU countries including one new member state (Hungary). One associated member state is also represented (Israel). Thus a significant and well-distributed proportion of the EU is represented in the consortium.

For an overview of how each partner fits into the project, see the General Description in Section 7.1.

Queen Mary University of London - QMUL Leaders, WP1 and WP8

Queen Mary’s Centre for Digital Music (UK) does research into technologies for audio and music. Our research covers the field of Music & Audio Technology from record/replay equipment in the home or studio, to the simulation and synthesis of instruments and voices, acoustic space simulation, music understanding, delivery and retrieval. The interface between audio and music, and representations for them both form a particular current focus for our work.

The signal processing techniques at the heart of the Lab’s work include: Time-frequency and Time-Scale Analysis, Neural Networks, Hidden Markov Models, Matching Pursuits, Transient Analysis and Independent Component Analysis.

Our projects often involve developing signal processing techniques to extract meaningful feature sets from music. Notably, we have been involved in the COST action on Digital Audio Effects, the OMRAS project, which created a music retrieval system based on transcription, and are currently involved in the EU SIMAC


project, which develops prototypes for the automatic generation of semantic music descriptors and prototypes for the exploration, recommendation, and retrieval of music, and the EPSRC-funded SeMMA project: Hierarchical Segmentation and Semantic Markup of Musical Audio.

Dr. Joshua D. Reiss has bachelor's degrees in both physics and mathematics, and a PhD in physics from the Georgia Institute of Technology. In June 2000, he accepted a research position in the Audio Signal Processing research lab at King's College London. He has since become a Lecturer with the Centre for Digital Music in the Electronic Engineering department at Queen Mary, University of London. His research interests include nonlinear dynamics, audio and music processing, and music retrieval systems. He is vice-chair of the Audio Engineering Society Technical Committee on High-Resolution Audio and was program chair of the 6th International Conference on Music Information Retrieval (ISMIR 2005). He is a frequent reviewer for IEEE journals, has submitted patents, and has published numerous scientific papers, most notably on audio formats, evaluation of music information retrieval systems, and multidimensional multimedia indexing.

Prof. Ebroul Izquierdo is Chair of Multimedia and Computer Vision at Queen Mary. Since receiving his PhD from Humboldt University, Germany, in 1993, he has been involved in the research and management of projects in Germany, the UK and the EU. In the European project ACTS-PANORAMA, he developed techniques for disparity estimation and intermediate view synthesis. Prof. Izquierdo was the UK representative of the EU Action COST 211 and currently coordinates the EU Action COST 292. He coordinated the EU IST project BUSMAN, represents QMUL in the European IST Network of Excellence SCHEMA, and is a main contributor to the IST IP aceMedia.

Prof. Izquierdo is associate editor of the IEEE Transactions on Circuits and Systems for Video Technology and has served as guest editor of two special issues. He is a Chartered Engineer, and a senior member of the IEEE, the IEE and the British Machine Vision Association. He is on the management committee of the Information Visualization Society, the programme committee of the IEEE conference on Information Visualization, the international program committee of the EURASIP & IEEE conference on Video Processing and Multimedia Communication, and the COST-sponsored European Workshop on Image Analysis for Multimedia Interactive Services. He was chair of the European Workshop on Image Analysis for Multimedia Interactive Services in 2003 and of the European Workshop for the Integration of Knowledge, Semantics and Content in 2004. He has published over 150 technical papers.

Professor Mark Sandler is leader of the DSP and Multimedia Group, as well as the Centre for Digital Music. Mark (born 1955) has worked in digital signal processing for audio since 1978. He became Professor of Signal Processing at Queen Mary in September 2001, following 19 years at King's College, where he was Professor and Head of Department. He was general chair of the recent ISMIR 2005 conference, an active participant in EU COST G6: Digital Audio Effects (DAFX), and chair of DAFX 2003. He is chair of the new AES TC on Semantic Audio Analysis. Mark was awarded two consecutive A.H. Reeves Premium prizes by the Institution of Electrical Engineers in 1996 and 1997 for research on SDM. He is a Fellow of the IEE, a Fellow of the AES and a Senior Member of the IEEE. He founded Insonify, a spin-off company which has commercialised scalable audio streaming technologies. He regularly reviews papers for the IEEE, IEE and others, and reviews grants for EPSRC, having been a member of its prestigious College since its inception. Prof. Sandler is principal investigator on the SeMMA project and QMUL's principal investigator on the EU IST project SIMAC.

Dublin Institute of Technology - DIT Leaders, WP5

The Dublin Institute of Technology's Digital Media Centre (Ireland) provides a multi-disciplinary environment for research, education and commercial development focused on the application of interactive digital media. Current projects focus on: Toolsets for Virtual Environments; Spatio-temporal Interfaces for Cultural Data; Toolsets and Methodologies for the Creation of an Irish Cultural Portal. Of particular significance to this project is the Digital Audio Research Group, which emerged from the successful DiTME (Digital Tools for Music Education) project. That project focused on developing an integrated software-based toolset to aid trainee musicians in learning their instruments. The toolset has three main functionalities:


1. To allow a user to isolate and audition any single instrument within an ensemble recording.
2. To allow the user to speed up or slow down the separated audio without affecting pitch.
3. To automatically generate a notated standard musical score from the audio signal.

Research carried out by the group pertains to all aspects of digital audio processing, including: sound source separation, polyphonic music transcription, efficient musical descriptors, time- and pitch-scale modification, mobile device audio applications, audio restoration processing, audio special effects, speech recognition and synthesis, and noise reduction. To date the group has generated 3 patents. The group is also involved in the recently funded FP6 project SALERO (Semantic Audiovisual Entertainment Reusable Objects, 2006-2009). The Digital Media Centre at DIT is also currently the lead partner in a consortium responding to a government call for the establishment of a national digital research centre.

Dr. Eugene Coyle (BE, MSc, PhD, Fellow of the Institution of Engineers of Ireland, Chartered Engineer) is currently Head of the School of Control Systems and Electrical Engineering at DIT. He was principal investigator on the Digital Interactive Tools for Computer Assisted Language Learning (DITCALL) project and the DiTME project. His research areas include applied digital signal processing, audio technologies, biomedical engineering and electrical engineering.

Dr. Richard Hayes (BSc, MSc, PhD, MIEI) graduated from DIT in 1973 with a BSc in Electrical Engineering. He subsequently attained an MSc in 1974 and a PhD in 1980, both in Electronic Engineering at TCD. In 1978, he was appointed to the academic staff of the School of Control Systems and Electrical Engineering at DIT. He has contributed to numerous engineering informatics projects, including Openlabs, Synapses, SynEx, and Cryocell. He also leads a European Medical Informatics Standardisation project team developing a standard interface between analytical instruments and laboratory information systems. His research interests include process modeling and control, digital signal processing, informatics and microprocessor applications.

Royal Scottish Academy of Music and Drama - RSAMD Leaders, WP7

The Royal Scottish Academy of Music and Drama (UK) is Scotland's international conservatoire. The RSAMD's National Centre for Research into the Performing Arts (NCRPA), set up in January 1999, has developed a distinctive creative research ethos. Its staff and consultants work on a portfolio of externally funded projects, and it runs a thriving research degree programme. The NCRPA undertakes projects and consultancy services across a number of core artistic and related areas, including educational arts policy and strategic development, training and continuing professional development of performing arts professionals, and e-learning and user-centred digital resource provision in the performing arts. RSAMD also has representation on a number of national steering and consultative groups. HOTBED (Handing On the Tradition By Electronic Dissemination), a three-year JISC-funded project at the RSAMD, evaluated the use of networked digital sound materials in the conservatoire curriculum. The project built a networked collection of sound and video resources and tools to manipulate them, investigating through close user needs analysis how best to exploit these to enhance learning and teaching. RSAMD's recent SHEFC-funded feasibility study and future strategy for digital archiving at the RSAMD also reflects research interests in the utilisation of new technology in learning and teaching in the performing arts.
Celia Duffy - As Head of Research at the RSAMD’s NCRPA, Celia leads the team responsible for development and management of research, consultancy and knowledge transfer activities at the RSAMD and has overall responsibility for research degree programmes. A recent panel member of the Arts and Humanities Research Council, Celia Duffy currently serves on its newly formed ICT Programme Steering Group and the AHRC Centre for the History and Analysis of Recorded Music Management Committee. She is a long-standing member of the JISC's Moving Picture and Sound Working Group, the JISC/NSF Spoken Word project’s Steering group and chairs the British Library’s Archive Sound Recordings Project User Panel. Recent consultancy work includes monitoring and expert advice to the Heritage Lottery Fund, advice


to the Associated Board of the Royal Schools of Music on distance learning approaches, and an evaluation and e-learning strategy for the Royal College of Music.

Stephen Broad is Research Lecturer at the RSAMD. He undertook interdisciplinary studies at the Universities of Glasgow (MA, Music and Physics) and Oxford (DPhil, [pending]). In addition to research interests in musicology and music history, he was Research Officer for the National Audit of Youth Music, commissioned jointly by Youth Music, the Scottish Arts Council and the Musicians' Union. He has research expertise in quantitative data collection, analysis and evaluation.

Applied Logic Laboratory - ALL Leaders, WP3

ALL (Hungary) is an SME established in 1986 by a group of mathematicians and experts from different fields of computer science and software engineering, with the initial aim of devoting themselves to scientific research and the development of intelligent systems in different fields of application. ALL's fundamental idea has been a wide range of research based on a new, unified, logic-oriented approach to advanced information technology. ALL has since become an organisation of international reputation with a wide network of scientific co-operation, including joint R+D groups in Moscow and Kiev. ALL's activities include the development of various methods of plausible reasoning, including statistical, logical and fuzzy logic methods, case-based reasoning and methods based on analogy; these methods are used to realise abduction, deduction and induction for reasoning. ALL participates in R+D projects, develops special IT systems and markets its services and products. During the SoundStore project, ALL developed a technology capable of effectively indexing speech data while allowing fast data retrieval. The technology was successfully demonstrated in a prototype application which indexed a database of spoken Hungarian. The results of the project stimulated the interest of several multimedia archives, and the technology is being further developed within the SoundStore 2 project at ALL. ALL is a member of the Hungarian Upper Ontology (MEO) project team, responsible for the development of the logical layer of the ontology and for the improvement and application of the OntoClean technology.

Dr. Tamás Gergely (Mathematician, Ph.D., D.Sc., Fellow of the Russian Academy of Natural Sciences) is the Director of the Applied Logic Laboratory. His research areas include computer science, artificial intelligence, cognitive systems, modeling of systems of high complexity, and medical and biological informatics.
Miklós Szőts, MSc in Civil Engineering, PhD in Computer Science, is one of the leading researchers of ALL. His experience lies in knowledge-based systems, knowledge management and ontology technologies. He specializes in meta-ontologies and medicine-related knowledge base systems. He is a lecturer at the Software Engineering Faculty of the Budapest University of Technology and Economics.

Balázs László, MSc in Mathematics, is the leader of the SoundStore and SoundStore 2 projects and has expertise in speech-related algorithms, text classification, knowledge management and natural language understanding.

Leopold-Franzens University of Innsbruck - LFUI Leaders, WP2

LFUI (Austria) is one of the major research institutions worldwide in the area of Semantic Web and Semantic Web services technology, especially through its Digital Enterprise Research Institute (DERI), managed by Prof. Dieter Fensel, one of the leading experts in this field, who has a strong record of successfully undertaken IST projects. The institute is currently involved in a number of FP5 and FP6 EU projects related to the Semantic Web and Semantic Web services, such as DIP (http://dip.semanticweb.org/), SWWS (http://swws.semanticweb.org/), Esperonto (http://esperonto.semanticweb.org/), SEKT (http://sekt.semanticweb.org/) and Knowledge Web (http://knowledgeweb.semanticweb.org/). LFUI collaborates with international research institutions in Europe, the United States, Japan, South Korea, Australia, and Singapore, and maintains project partnerships with leading industrial partners such as SAP, BT and HP. Within the EASAIER project, the University of Innsbruck will contribute as a research partner and will mainly be responsible for developing an appropriate ontology language, for providing methodologies and tools for


ontology design, storage, and maintenance, and for adapting and extending Semantic Web technology to the sound retrieval application domain.

Prof. Dr. Dieter Fensel (1960) obtained a Diploma in Social Science at the Free University of Berlin and a Diploma in Computer Science at the Technical University of Berlin in 1989. In 1993 he was awarded a Doctoral degree in economic science (Dr. rer. pol.) at the University of Karlsruhe, and in 1998 he received his Habilitation in Applied Computer Science. He has worked at the University of Karlsruhe (AIFB), the University of Amsterdam (UvA), and the Vrije Universiteit Amsterdam (VU). In 2002, he accepted a full professor position and a chair for Computer Science at the University of Innsbruck, Austria. He has been involved in several national and international research projects, for example the IST projects DIP, IBROW, Knowledge Web, Ontoknowledge, Ontoweb, SWWS, and Wonderweb, and has been the project coordinator of DIP, Knowledge Web, Ontoknowledge, Ontoweb, and SWWS. He has published around 150 papers as books and as journal, book, conference, and workshop contributions. He has co-organized around 150 scientific workshops and conferences and has edited several special issues of scientific journals. He is associate editor of Knowledge and Information Systems: An International Journal (KAIS), IEEE Intelligent Systems, the Electronic Transactions on Artificial Intelligence (ETAI), Web Intelligence and Agent Systems (WIAS), Elsevier's Journal on Web Semantics: Science, Services and Agents on the World Wide Web, and the Lecture Notes in Computer Science (LNCS) subline entitled "Semantics in Data Management". His current research interests include ontologies, the Semantic Web, Semantic Web services, knowledge management, enterprise application integration, and electronic commerce. Dr.
Martin Hepp (1971) obtained a Master's degree in Business Management and Business Information Systems at the University of Würzburg in 1999 and a PhD in Business Information Systems (summa cum laude) at the same institution. In his PhD thesis, he evaluated the conceptual dynamics of business ontologies and the semi-automated maintenance and evolution of industrial ontologies. The thesis was awarded the 2004 Dissertation Award of the Alcatel SEL Foundation for Communication Research and the Dissertation Award 2004 of the Unterfränkische Gedenkjahrstiftung. Starting in the academic year 2003, he became an Assistant Professor of Computer Information Systems at Florida Gulf Coast University in Fort Myers, Florida. He was a Visiting Scholar at Boston University in 2002 and a Visiting Scientist with the e-Business Solutions Group at IBM Research, Zurich Research Laboratory, in 2004. Since 2004 he has been a Senior Researcher at the Leopold-Franzens University of Innsbruck, where he leads the research cluster "Semantics in Business Information Systems" (SEBIS). His current research interests include electronic markets, business ontologies, Semantic Web services, and data and process modeling techniques.

Dr. Ying Ding is Senior Researcher at the University of Innsbruck. Previously, she worked as a senior researcher at the Division of Mathematics & Computer Science at the Free University of Amsterdam. She completed her Ph.D. at the School of Applied Science, Nanyang Technological University, Singapore. Her doctoral research concerned information retrieval, information visualization and knowledge management using data mining techniques. Since then, she has been involved in various European-Union-funded projects: research-oriented EU projects (On-To-Knowledge, IBROW, SWWS, COG, h-Techsight, Esperonto), a thematic network (Ontoweb), and Accompanied Measurements (Multiple).
In the EU Sixth Framework Programme, she is involved in Knowledge Web (Network of Excellence), DIP (Integrated Project on Semantic Web Services) and SEKT (Integrated Project on Semantic Knowledge Management). She is very active in consultancy projects between the university and industry. She has published around 40 papers in top-level journals and conferences, is co-author of the book "Intelligent Information Integration in B2B Electronic Commerce", and is also co-author of chapters in the book "Spinning the Semantic Web". Her current interests include the Semantic Web, Semantic Web services, semantic information retrieval, knowledge and content management, e-Commerce and commercial applications of Semantic Web technology.

NICE Systems Leaders, WP4

NICE Systems (Israel) is a worldwide leader in multimedia recording solutions, applications, and related professional services for business interaction management. NICE solutions, which support traditional, hybrid and VoIP environments, are used throughout the enterprise, public and security sectors for gaining insight from customer interactions and for enhancing individual and public security.


In the enterprise sector, NICE is the market leader in providing the enterprise with solutions for the capture, quality monitoring, and analysis of customer interactions for a wide range of markets, from healthcare to retail and telecoms, as well as outsourcing contact centers and financial services:

• Software, equipment, and professional services for contact centers of all types of organizations, and for financial trading floors
• Multimedia surveillance and control for corporate facilities, offices, factories, warehouses and other centers which require comprehensive video tracking and analysis of potential risk factors

For the public sector, NICE provides solutions that ensure freedom, mobility and security:

• Multimedia command and control, and video content analytics for government agencies, hospitals, transportation, and airport security
• Communications recording, storage and content analytics for government agencies, First Responders, defense organizations, air traffic control and similar traffic sites
• End-to-end solutions for the interception, monitoring and content analysis of various telephony and internet network links for internal security and law enforcement

Moshe Wasserblat is Audio Analysis Research Group Manager at NICE. Mr. Wasserblat holds an M.Sc. in Electrical and Computer Engineering from Ben Gurion University. He has 8 years of experience managing DSP code, algorithms and embedded R&D teams at NICE Systems. Previously, he developed DSP code and algorithms for modems at Orckit Communication. In 2002, he initiated the Magneton program for the Israeli Ministry of Industry and Trade, with research focused on "Audio Classification for Content Analysis System in Telephony Channels." From 2000 to 2004, he led work on audio algorithms in the KITE consortium. Mr. Wasserblat is a member of the IEEE and the ISCA (International Speech Communication Association).

Dr. Yaniv Zigel is Senior DSP Algorithm Engineer at NICE. Dr. Zigel holds a Ph.D. in Electrical and Computer Engineering (research: feature selection for speaker recognition), an M.Sc. in Electrical and Computer Engineering (thesis: ECG signal compression), and a B.Sc. in Electrical and Computer Engineering, all from Ben Gurion University. He is also a member of the IEEE and the ISCA, and a consultant and DSP algorithm developer for the MS-Tech company (recognition of materials) and for Configate (speaker recognition).

Moshe Benjo has been with NICE Systems for 6 years, with 3 years of experience in project management. His management experience spans both R&D and product management, including major version and new product releases, as well as the management of large-scale projects. For the last 2 years he has managed the KITE consortium.

SILOGIC Leaders, WP6

Silogic (France) is one of the leading independent IT services companies in France. Its ambition is to offer companies and administrations a wide range of services by combining a command of information technologies with knowledge of their business. SILOGIC offers global solutions to implement scientific processing, processing of operational systems, and ground systems, by combining IT know-how with strong thematic skills in digital analysis and signal processing. Its domains of competence are real-time embedded solutions, new information technologies and advanced signal processing. Silogic is distinguished by its commitment to high-level standards in software quality and methodology. SILOGIC's team of Methods and Quality engineers contributes to process analysis and assists in quality assurance and control of software and the setting of standards. Its most experienced project leaders assist in works supervision or contract management, providing supervision of large projects, drawing up guidelines for deployment of the information system, designing new technical and functional architectures and offering methodological expertise. The development of industrial applications requires in-depth skills combining knowledge of the hardware targets and operating systems with knowledge of validation and certification methods; in this context, command of the product software is a determining factor. SILOGIC offers solutions using the latest software engineering methodologies and the newest information technologies.

David Cher graduated from Paul Sabatier University, Toulouse, in 1990 with a "Diplôme d'études supérieures spécialisées" (DESS) in teledetection and satellite image processing, at the "Centre Etudes Spatiales et des Rayonnements" (C.E.S.R.). He is coordinator of the AVITRACK project (FP6-2002-Aero-1-502818, contract no. AST3-CT-2003-502818) and project coordinator of ETISEO, "Evaluation for video understanding".


ETISEO is a two-year project, starting in January 2005, sponsored by the French research ministry, to evaluate vision techniques for video surveillance. David Cher is also in charge of the collaboration with the ORION-INRIA team on "automatic image segmentation guided by semantic information" and of the collaboration between Silogic and the ARIANA-INRIA team on the use of Spot images. Since 2002, he has been a project manager at SILOGIC for image processing equipment.

Luc Barthélémy graduated from E.S.I.E.E in 1991 in computer science, and in 1992 obtained a Master's degree in computer vision ("DEA Théorie et Application de la vision artificielle," Nice Sophia-Antipolis University). His image processing experience began on a VIEWS project, VOILA, in stereovision at EADS, and continued at Thales Communication with the study of algorithms for tracking targets in monocular vision. He then worked at Thales ISR, in the "Tools and Methods" department, on the development of imagery and cartography tools. He subsequently joined LivePicture Développement as an R&D engineer working on photocomposition software within an international framework with American and Canadian teams, and next entered EGG Solution Optronics as software project manager for photo and video panoramic solutions. Today, he is technical project manager within Silogic on the European project AVITRACK.

Benoit Baurens graduated from the Paul Sabatier University in Toulouse in 1990 in Computer Science. He worked for two years on advanced database systems and knowledge-based expert systems at the ECRC (European Computer-industry Research Centre) in Munich before joining Alcatel in Vienna, where he worked as a software engineer and project manager.
He addressed specification, design and development of system and software platforms for the management of telecommunication networks (TMN domain), mostly in the frame of multi-partner projects. His expertise involves architectural design and specification, particularly in the field of distributed systems. He joined SILOGIC in1999 where he works as a project manager for R&D projects. He notably has led the DSE and DUNES projects.

Appendix B: Publications cited in the text

[1] S. Barrett, C. Duffy, and K. Marshalsay, "HOTBED (Handing On Tradition By Electronic Dissemination)," Royal Scottish Academy of Music and Drama, Glasgow, Report, March 2004. www.hotbed.ac.uk

[2] M. Asensio, "JISC User Requirement Study for a Moving Pictures and Sound Portal," The Joint Information Systems Committee, Final Report, November 2003. www.jisc.ac.uk/index.cfm?name=project_study_picsounds

[3] "British Library/JISC Online Audio Usability Evaluation Workshop," Joint Information Systems Committee (JISC), London, UK, 11 October 2004. www.jisc.ac.uk/index.cfm?name=workshop_html

[4] S. Dempster, "Report on the British Library and Joint Information Systems Committee Usability Evaluation Workshop, 20th October 2004," JISC Moving Pictures and Sound Working Group, London, UK, 20 October 2004.

[5] J. W. Dunn, M. W. Davidson, J. R. Holloway, and G. Bernbom, "The Variations and Variations2 Digital Music Library Projects at Indiana University," in Digital Libraries: Policy, Planning and Practice, J. Andrews and D. Law, Eds. Ashgate Publishing, 2004, pp. 189-211.

[6] E. Allamanche, J. Herre, O. Hellmuth, B. Fröba, T. Kasten, and M. Cremer, "Content-Based Identification of Audio Material Using MPEG-7 Low Level Description," Proceedings of the International Symposium on Music Information Retrieval, 2001.

[7] A. Wang, "An Industrial-Strength Audio Search Algorithm," Proceedings of the 4th International Conference on Music Information Retrieval (ISMIR 2003), Baltimore, MD, 2003.

[8] "Om tiddly om pom: Music-recognition software," The Economist, vol. 365, 2002, p. 77.

[9] J. S. Downie and S. J. Cunningham, "Toward a theory of music information retrieval queries: System design implications," Proceedings of the Third International Conference on Music Information Retrieval (ISMIR), Paris, France, 2002.

[10] D. Bainbridge, S. J. Cunningham, and J. S. Downie, "How people describe their music information needs: A grounded theory analysis of music queries," Proceedings of the Fourth International Conference on Music Information Retrieval (ISMIR), Baltimore, Maryland, 2003.

[11] S. J. Cunningham, N. Reeves, and M. Britland, "An ethnographic study of music information seeking: implications for the design of a music digital library," Proceedings of the 3rd ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL), Houston, Texas, pp. 5-16, 2003.

[12] "Final Report of the American Memory User Evaluation, 1991-1993," American Memory Evaluation Team, Library of Congress, Washington, DC, 1993. memory.loc.gov/ammem/usereval.html

[13] E. Fønss-Jørgensen, "Applying Telematics Technology to Improve Public Access to Audio Archives (JUKEBOX)," Århus, Denmark, 1997. www.statsbiblioteket.dk/Jukebox/finalrep.html

[14] R. Tucker, "Harmonised Access & Retrieval for Music-Oriented Networked Information (HARMONICA)," 1997-2000. projects.fnb.nl/harmonica/

[15] "Higher Education Training Needs Analysis (HETNA)," Scottish Higher Education Funding Council (SHEFC), Sheffield, UK, November 2004. www.shefc.ac.uk/about_us/departments/learning_teaching/hetna/hetna.html

[16] A. Marsden, "ICT Tools for Searching, Annotation and Analysis of Audio-Visual Media," UK Arts and Humanities Research Council, Lancaster, UK, September 2005. www.ahrbict.rdg.ac.uk/activities/marsden.htm

[17] C. Févotte and C. Doncarli, "A unified presentation of blind source separation methods for convolutive mixtures using block-diagonalization," Proceedings of the 4th Symposium on Independent Component Analysis and Blind Source Separation (ICA 2003), Nara, Japan, 2003.

[18] O. Yilmaz and S. Rickard, "Blind Separation of Speech Mixtures via Time-Frequency Masking," IEEE Transactions on Signal Processing, vol. 52, pp. 1830-1847, 2004.

[19] A. Smith, D. R. Allen, and K. Allen, Survey of the State of Audio Collections in Academic Libraries. Washington, DC: Council on Library and Information Resources, 2004.

[20] "Foresight Futures 2020: Revised scenarios and guidance," SPRU - Science and Technology Policy Research, University of Sussex, 2002.

[21] Y. Kompatsiaris, Y. Avrithis, P. Hobson, and M. Strintzis, "Integrating Knowledge, Semantics And Content For User-Centred Intelligent Media Services: The aceMedia Project," Proceedings of the 5th European Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS), Lisbon, Portugal, 2004.

[22] J. S. Downie, J. Futrelle, and D. Tcheng, "The International Music Information Retrieval Systems Evaluation Laboratory: Governance, Access and Security," Proceedings of the 5th International Conference on Music Information Retrieval (ISMIR 2004), Barcelona, Spain, 2004.

[23] D. Donald, R. Wright, and A. Longmuir, "JISC/NSF Digital Libraries in the Classroom: Spoken Word project," JISC/CNI Meeting, Brighton, UK, July 2004. www.spokenword.ac.uk/

[24] J. S. Downie, K. West, A. Ehmann, and E. Vincent, "The 2005 Music Information Retrieval Evaluation eXchange (MIREX 2005): Preliminary overview," Proceedings of the 6th International Conference on Music Information Retrieval (ISMIR 2005), London, UK, 2005.

[25] I. H. Witten, D. Bainbridge, and S. J. Boddie, "Greenstone: Open-Source Digital Library Software," D-Lib Magazine, vol. 7, 2001.

[26] R. Typke, F. Wiering, and R. Veltkamp, "A Survey Of Music Information Retrieval Systems," Proceedings of the 6th International Conference on Music Information Retrieval (ISMIR), London, UK, pp. 153-159, 2005.