COSMIR and the OpenMIC Challenge
(the talk previously known as "A Plan for Sustainable Music IR Evaluation")
Brian McFee* Eric Humphrey Julián Urbano**
solving problems with science
Hypothesis (model) ⇄ Experiment (evaluation)
Progress depends on reproducibility
We’ve known this for a while
● Many years of MIREX!
● Lots of participation
● It’s been great for the community
MIREX (cartoon form)
Scientists → Code → MIREX machines (and task captains) + Data (private) → Results
Evaluating the evaluation model
We would not be where we are today without MIREX. But this paradigm faces an uphill battle :’o(
Costs of doing business
● Computer time
● Human labor
● Data collection
Annual sunk costs (proportional to participants)
Best ! for $
*arrows are probably not to scale
The worst thing that could happen is growth!
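The scaling problem can be seen in a toy cost model, a hedged sketch: `annual_cost` and all of its numbers are invented for illustration, not actual MIREX figures.

```python
# Toy cost model (illustrative numbers, not actual MIREX figures):
# every extra submission consumes compute time and task-captain labor,
# so the annual sunk cost grows linearly with participation.

def annual_cost(n_submissions, fixed=10_000.0,
                compute_per_run=50.0, labor_per_run=200.0):
    """Yearly cost given per-submission compute and human-labor costs."""
    return fixed + n_submissions * (compute_per_run + labor_per_run)

# Doubling participation (40 -> 80 -> 160 submissions) keeps adding cost:
for n in (40, 80, 160):
    print(n, annual_cost(n))
```

Under this model the only way to hold cost flat while participation grows is to change the per-submission terms, which is exactly what the distributed proposal below does.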
Limited feedback in the lifecycle
Hypothesis (model) ⇄ Experiment (evaluation)
● Performance metrics (always)
● System outputs (sometimes)
● Input data (almost never)
Stale data implies bias
https://frinkiac.com/caption/S07E24/252468
https://frinkiac.com/caption/S07E24/288671
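The bias from stale data can be simulated directly. This is a sketch with synthetic data: when a test set is never refreshed, the best reported score drifts upward as more systems are submitted, even if every system is guessing at random.

```python
import random

# Sketch of why stale evaluation data biases results: with a fixed,
# never-refreshed test set, even skill-free systems drift upward in
# reported score as the community submits more of them.
# (All data here is synthetic; the effect, not the numbers, is the point.)

random.seed(0)
truth = [random.randint(0, 1) for _ in range(200)]  # frozen "test set"

def accuracy(pred):
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)

def best_of(n_systems, rng):
    """Best reported score among n random-guessing systems."""
    return max(
        accuracy([rng.randint(0, 1) for _ in truth])
        for _ in range(n_systems)
    )

# The leaderboard's top score climbs with submissions, with zero real skill:
for n in (1, 10, 100, 1000):
    print(n, best_of(n, random.Random(42)))
```

The community collectively overfits the unchanging data; retiring and refreshing test data breaks this loop.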
The current model is unsustainable
● Inefficient distribution of labor
● Limited feedback
● Inherent and unchecked bias
What is a sustainable model?
● Kaggle is a data science evaluation community (sound familiar?)
● How it works:
○ Download data
○ Upload predictions
○ Observe results
● The user base is huge:
○ 536,000 registered users
○ 4,000 forum posts per month
○ 3,500 competition submissions per day (!!!)
Distributed computation.
Academic Instances
Related / Prior work in Multimedia
• DCASE – Detection and Classification of Acoustic Scenes and Events
• NIST – Iris Challenge Evaluation, SRE i-Vector Challenge, Face Recognition Grand Challenge, TRECVid, ...
• MediaEval – Emotion in Music, Querying Musical Scores, Soundtrack Selection
Open content
● Participants need unfettered access to audio content
● Without input data, error analysis is impossible
● Creative Commons-licensed music is plentiful on the internet!
○ Jamendo: 500K tracks
○ FMA: 90K tracks
The distributed model is sustainable
● Distributed computation
● Open data means clear feedback
● Efficient allocation of human effort
But what about annotation?
Incremental evaluation
● Performance estimates +/- degree of uncertainty
● Annotate the most informative examples first
○ Beats: [Holzapfel et al., TASLP 2012]
○ Similarity: [Urbano and Schedl, IJMIR 2013]
○ Chords: [Humphrey & Bello, ISMIR 2015]
○ Structure: [Nieto, PhD thesis 2015]
● Which tracks do we annotate for evaluation?
○ None, at first!
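One common way to pick "the most informative examples" is inter-system disagreement. A minimal sketch, assuming single-label toy outputs per system (track names, labels, and values are all invented stand-ins for real submissions):

```python
from collections import Counter

# Sketch of "annotate the most informative examples first" via
# inter-system disagreement. Tracks, labels, and the single-label
# outputs are all toy stand-ins for real system submissions.
outputs = {
    "track_a": ["guitar", "guitar", "guitar", "guitar"],
    "track_b": ["piano", "violin", "piano", "cello"],
    "track_c": ["drums", "drums", "voice", "drums"],
}

def disagreement(labels):
    """1 - (majority share): 0.0 when every system agrees."""
    majority = Counter(labels).most_common(1)[0][1]
    return 1 - majority / len(labels)

# Spend the annotation budget where systems disagree most:
ranked = sorted(outputs, key=lambda t: disagreement(outputs[t]), reverse=True)
print(ranked)  # ['track_b', 'track_c', 'track_a']
```

Tracks where every system agrees contribute little to separating the systems, so they can wait; the cited methods above refine this idea with proper statistical estimates of uncertainty.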
This is already common practice in MIR.
Let’s standardize it!
The evaluation loop
1. Collect CC-licensed music
2. Define tasks
3. ($) Release annotated development set
4. Collect system outputs
5. ($) Annotate informative examples
6. Report scores
7. Retire and release old data
Human costs ($) directly produce data
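The seven steps can be condensed into pseudocode (all names are placeholders, not a real API):

```
loop every evaluation cycle:
    corpus   ← collect CC-licensed music              # 1
    tasks    ← define tasks over corpus               # 2
    dev_set  ← annotate a small seed sample      ($)  # 3
    outputs  ← collect system outputs on corpus       # 4
    eval_set ← annotate most informative tracks  ($)  # 5
    scores   ← score outputs against eval_set         # 6
    release dev_set ∪ eval_set openly; retire it      # 7
```

Note that both paid steps (3 and 5) leave behind reusable open annotations, which is what makes the loop sustainable.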
What are the drawbacks here?
● Loss of algorithmic transparency
● Potential for cheating?
● CC/PD music isn’t “real” enough
● Linking to source makes results verifiable and replicable!
● What’s the incentive for cheating?
● Can we make it impractical?
● Even if people do cheat, we still get the annotations.
● For which tasks?
Okay ... so now what?
The Open-MIC Challenge 2017
Open Music Instrument Classification Challenge
● Task 1 (classification): what instruments are played in this audio file?
● Task 2 (retrieval): in which audio files (from this corpus) is this instrument played?
● Complements what is currently covered in MIREX
● Objective and simple (?) task for annotators
● A large, well-annotated data set would be valuable for the community
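The two tasks are complementary views of one multi-label matrix. A toy sketch (file names, instruments, and values are illustrative, not the actual OpenMIC taxonomy or data):

```python
# Toy multi-label matrix for the two OpenMIC task views: rows are audio
# files, columns are instruments (all names here are illustrative).
files = ["clip1", "clip2", "clip3"]
instruments = ["guitar", "piano", "drums"]
Y = [
    [1, 0, 1],  # clip1: guitar + drums
    [0, 1, 0],  # clip2: piano
    [1, 1, 1],  # clip3: all three
]

# Task 1 (classification) reads a ROW: which instruments are in this file?
clip1_instruments = [i for i, v in zip(instruments, Y[0]) if v]

# Task 2 (retrieval) reads a COLUMN: which files contain this instrument?
piano_files = [f for f, row in zip(files, Y) if row[1]]

print(clip1_instruments)  # ['guitar', 'drums']
print(piano_files)        # ['clip2', 'clip3']
```

A single well-annotated matrix therefore supports both tasks at once, which is part of why the data set would be valuable beyond the challenge itself.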
The Open-MIC Challenge 2017
https://cosmir.github.io/open-mic/
● 20+ collaborators from various groups across academia and industry
Content Management System Backend
● Manages audio content and user annotations
● Flask web application will hide endpoints behind user authentication
● Working with industry to secure support for Google Cloud Platform (App Engine, Datastore)
● Designed to scale with participation / users
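"Hide endpoints behind user authentication" boils down to a gatekeeping pattern. A framework-agnostic sketch (the real backend uses Flask; the token store, `AuthError`, and `get_audio` below are invented purely for illustration):

```python
import functools

# Framework-agnostic sketch of "endpoints hidden behind user
# authentication". The token store, AuthError, and get_audio are
# invented for illustration; the real backend does this in Flask.
VALID_TOKENS = {"secret-token-123"}

class AuthError(Exception):
    """Raised when a request carries no valid credential."""

def require_auth(handler):
    @functools.wraps(handler)
    def wrapper(token, *args, **kwargs):
        if token not in VALID_TOKENS:
            raise AuthError("unauthorized")
        return handler(*args, **kwargs)
    return wrapper

@require_auth
def get_audio(track_id):
    # Stand-in for an endpoint that serves protected audio content.
    return f"audio bytes for {track_id}"

print(get_audio("secret-token-123", "track_42"))
```

In Flask the same idea is typically expressed as a decorator on route handlers, so every content and annotation endpoint is checked before it runs.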
Web Annotator
● Requests audio and instrument taxonomy from CMS backend
● Provides audio playback, instrument tag selection, annotation submission
● Authenticates users for annotator attribution & data quality
● Provides statistics and feedback on individual / overall annotation effort
Web Annotator – SONYC
Web Annotator – Gracenote
Data and Task Definition
● Audio content: green light from Jamendo!
● Instrument taxonomy: converging to ≈25 classes for now
● Currently iterating on task definition, measures, annotation task design, etc.
Roadmap (Tentative)
[Gantt chart, Sept 2016 – Oct 2017: workstreams for Backend, Web Annotator, Audio Ingest, Annotation, Task Definition, Instrument Taxonomy, Evaluation Machinery, and build/release of the development set; milestones include ISMIR and Challenge Open]
● https://cosmir.github.io … https://cosmir.github.io/open-mic/contribute.html
● Watch the GitHub repo, read / comment on issues, submit PRs: https://github.com/cosmir/open-mic
● Ask questions on [email protected]
● Participate in the Open-MIC challenge (coming Summer 2017)
● Talk to us / me! { [email protected] / [email protected] / [email protected] }
Want to get involved?
Thanks!