
MediaEval 2016 - COSMIR and the OpenMIC Challenge: A Plan for Sustainable Music Information Retrieval (MIR) Evaluation


Page 1

COSMIR and the OpenMIC Challenge
the talk previously known as

A Plan for Sustainable Music IR Evaluation

Brian McFee* Eric Humphrey Julián Urbano**

Page 2

solving problems with science

Page 3

Hypothesis (model)

Experiment (evaluation)

Progress depends on reproducibility

Pages 4–5

We’ve known this for a while

● Many years of MIREX!

● Lots of participation

● It’s been great for the community

Pages 6–8

MIREX (cartoon form)

Scientists → Code → MIREX machines (and task captains), which hold the Data (private) → Results

Page 9

Evaluating the evaluation model

We would not be where we are today without MIREX. But this paradigm faces an uphill battle :’o(

Pages 10–12 (progressive build)

Costs of doing business

● Computer time
● Human labor
● Data collection

Annual sunk costs (proportional to participants)
Best ! for $
*arrows are probably not to scale

The worst thing that could happen is growth!

Page 13

Limited feedback in the lifecycle

Hypothesis (model)

Experiment (evaluation)

Performance metrics (always)
System outputs (sometimes)
Input data (almost never)

Page 14

Stale data implies bias

https://frinkiac.com/caption/S07E24/252468
https://frinkiac.com/caption/S07E24/288671

Page 15

The current model is unsustainable

● Inefficient distribution of labor

● Limited feedback

● Inherent and unchecked bias

Pages 16–17 (progressive build)

What is a sustainable model?

● Kaggle is a data science evaluation community (sound familiar?)
● How it works:
○ Download data
○ Upload predictions
○ Observe results
● The user base is huge:
○ 536,000 registered users
○ 4,000 forum posts per month
○ 3,500 competition submissions per day (!!!)

Distributed computation.

Page 18

Academic Instances

Related / Prior work in Multimedia

• DCASE – Detection and Classification of Acoustic Scenes and Events

• NIST – Iris Challenge Evaluation, SRE i-Vector Challenge, Face Recognition Grand Challenge, TRECVid, ...

• MediaEval – Emotion in Music, Querying Musical Scores, Soundtrack Selection

Page 19

Open content

● Participants need unfettered access to audio content

● Without input data, error analysis is impossible

● Creative Commons-licensed music is plentiful on the internet!
○ Jamendo: 500K tracks
○ FMA: 90K tracks

Page 20

The distributed model is sustainable

● Distributed computation

● Open data means clear feedback

● Efficient allocation of human effort

Page 21

But what about annotation?

Pages 22–23 (progressive build)

Incremental evaluation

● Performance estimates ± a degree of uncertainty
● Annotate the most informative examples first:
○ Beats: [Holzapfel et al., TASLP 2012]
○ Similarity: [Urbano and Schedl, IJMIR 2013]
○ Chords: [Humphrey & Bello, ISMIR 2015]
○ Structure: [Nieto, PhD thesis 2015]
● Which tracks do we annotate for evaluation?
○ None, at first!

This is already common practice in MIR.

Let’s standardize it!
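To make this concrete, here is an illustrative sketch of uncertainty-aware incremental annotation. It is not the method of any paper cited above: it simply scores a system on the annotated subset with a bootstrap confidence interval, and selects the next track to annotate where the submitted systems disagree most (all names and the disagreement measure are hypothetical stand-ins).

```python
# Illustrative sketch only: bootstrap uncertainty on the annotated subset,
# plus a simple disagreement-based pick for the next track to annotate.
import numpy as np

def bootstrap_ci(per_track_scores, n_boot=1000, alpha=0.05, seed=0):
    """Mean score with a (1 - alpha) bootstrap confidence interval."""
    rng = np.random.default_rng(seed)
    scores = np.asarray(per_track_scores)
    means = [rng.choice(scores, size=len(scores), replace=True).mean()
             for _ in range(n_boot)]
    lo, hi = np.percentile(means, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return scores.mean(), (lo, hi)

def most_informative_track(system_outputs, unannotated_ids):
    """Pick the unannotated track where systems disagree most.

    system_outputs: dict of system name -> dict of
                    track_id -> binary label vector (e.g. instruments).
    """
    def disagreement(tid):
        preds = np.stack([out[tid] for out in system_outputs.values()])
        p = preds.mean(axis=0)  # per-label vote fraction across systems
        # Per-label vote entropy is one simple disagreement measure.
        ent = -(p * np.log2(p + 1e-12) + (1 - p) * np.log2(1 - p + 1e-12))
        return ent.sum()
    return max(unannotated_ids, key=disagreement)
```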

Page 24

The evaluation loop

1. Collect CC-licensed music

2. Define tasks

3. ($) Release annotated development set

4. Collect system outputs

5. ($) Annotate informative examples

6. Report scores

7. Retire and release old data

Human costs ($) directly produce data
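One way to read the loop as code: a minimal, self-contained sketch of a single round, where every name is a hypothetical stand-in for real infrastructure and plain accuracy against the incrementally collected truth stands in for task-specific measures.

```python
# Minimal sketch of one round of the evaluation loop above.
def run_round(tracks, systems, annotate, budget):
    """tracks: track ids; systems: {name: {track_id: output}};
    annotate: callable track_id -> ground truth (the human ($) step)."""
    truth = {}
    for _ in range(budget):                               # step 5
        remaining = [t for t in tracks if t not in truth]
        if not remaining:
            break
        # Most informative first: prefer tracks where systems disagree.
        t = max(remaining,
                key=lambda t: len({repr(s[t]) for s in systems.values()}))
        truth[t] = annotate(t)                            # each $ produces data
    scores = {name: sum(out[t] == truth[t] for t in truth) / max(len(truth), 1)
              for name, out in systems.items()}           # step 6
    return scores, truth                                  # truth gets released (7)
```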


Pages 26–27 (progressive build)

What are the drawbacks here?

● Loss of algorithmic transparency
→ Linking to source makes results verifiable and replicable!

● Potential for cheating?
→ What’s the incentive for cheating? Can we make it impractical? Even if people do cheat, we still get the annotations.

● CC/PD music isn’t “real” enough
→ For which tasks?

Page 28

Okay ... so now what?

Pages 29–31

The Open-MIC Challenge 2017

Open Music Instrument Classification Challenge

● Task 1 (classification): what instruments are played in this audio file?

● Task 2 (retrieval): in which audio files (from this corpus) is this instrument played?
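The official measures were still being iterated on at the time of this talk (see the Data and Task Definition slide below), so the sketch here only illustrates the shape of the two tasks on toy data, with plausible stand-in metrics: per-instrument F1 for Task 1 and per-instrument average precision for Task 2.

```python
# Toy illustration of the two task shapes; metrics are assumptions,
# not the official OpenMIC measures.
import numpy as np
from sklearn.metrics import f1_score, average_precision_score

instruments = ["guitar", "piano", "voice"]               # toy 3-class taxonomy
y_true = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0]])     # 3 files x 3 labels
y_score = np.array([[0.9, 0.2, 0.7], [0.1, 0.8, 0.3], [0.6, 0.7, 0.2]])

# Task 1 (classification): which instruments are played in each file?
y_pred = (y_score >= 0.5).astype(int)
print(f1_score(y_true, y_pred, average=None))            # one F1 per instrument

# Task 2 (retrieval): rank files in the corpus by confidence per instrument.
for i, inst in enumerate(instruments):
    ap = average_precision_score(y_true[:, i], y_score[:, i])
    print(inst, round(ap, 3))
```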

Pages 32–33

The Open-MIC Challenge 2017
https://cosmir.github.io/open-mic/

● Complements what is currently covered in MIREX
● Objective and simple (?) task for annotators
● A large, well-annotated data set would be valuable for the community
● 20+ collaborators from various groups across academia and industry

Page 34

Content Management System Backend

● Manages audio content and user annotations

● Flask web application will hide endpoints behind user authentication

● Working with industry to secure support for Google Cloud Platform (App Engine, Datastore)

● Designed to scale with participation / users
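A minimal sketch of the "endpoints behind user authentication" idea, assuming Flask with a bearer-token check; the routes, payloads, and token store are hypothetical, not the real CMS API.

```python
# Hypothetical sketch: authenticated CMS endpoints in Flask.
from functools import wraps
from flask import Flask, abort, jsonify, request

app = Flask(__name__)
VALID_TOKENS = {"demo-token"}  # stand-in for a real user/auth store

def require_auth(view):
    @wraps(view)
    def wrapped(*args, **kwargs):
        token = request.headers.get("Authorization", "")
        if token.removeprefix("Bearer ") not in VALID_TOKENS:
            abort(401)
        return view(*args, **kwargs)
    return wrapped

@app.route("/api/audio/<track_id>")
@require_auth
def get_audio(track_id):
    # In the real system this would serve audio from the datastore.
    return jsonify({"track_id": track_id, "url": "..."})

@app.route("/api/annotation", methods=["POST"])
@require_auth
def post_annotation():
    # Persist the submitted instrument tags for the authenticated user.
    return jsonify({"status": "ok", "received": request.get_json()})
```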

Page 35

Web Annotator

● Requests audio and instrument taxonomy from CMS backend

● Provides audio playback, instrument tag selection, annotation submission

● Authenticates users for annotator attribution & data quality

● Provides statistics and feedback on individual / overall annotation effort
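And a sketch of how the annotator client could talk to such a backend, assuming the hypothetical endpoints from the previous sketch and the `requests` library.

```python
# Hypothetical annotator-client flow against the sketched CMS endpoints.
import requests

BASE = "http://localhost:5000/api"
HEADERS = {"Authorization": "Bearer demo-token"}

audio = requests.get(f"{BASE}/audio/track_0001", headers=HEADERS).json()

# After the user listens and selects instrument tags, submit the annotation:
payload = {"track_id": audio["track_id"],
           "tags": ["guitar", "voice"],       # the user's selections
           "annotator": "user_42"}            # attribution for data quality
resp = requests.post(f"{BASE}/annotation", json=payload, headers=HEADERS)
print(resp.json())
```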

Page 36

Web Annotator – SONYC

Page 37

Web Annotator – Gracenote

Page 38

Data and Task Definition

● Audio content: green light from Jamendo!
● Instrument taxonomy: converging to ≈25 classes for now
● Currently iterating on task definition, measures, annotation task design, etc.

Page 39

Roadmap (Tentative)

[Gantt-style timeline from Sept 2016 to Oct 2017, with workstreams: Task Definition, Instrument Taxonomy, Backend, Web Annotator, Ingest Audio, Annotation, Evaluation Machinery, Build / release development set, and the Challenge Open at ISMIR 2017]

Page 40

Want to get involved?

● https://cosmir.github.io … https://cosmir.github.io/open-mic/contribute.html
● Watch the GitHub repo, read / comment on issues, submit PRs: https://github.com/cosmir/open-mic
● Ask questions on [email protected]
● Participate in the Open-MIC challenge (coming Summer 2017)
● Talk to us / me! { [email protected] / [email protected] / [email protected] }

Page 41

Thanks!