Supporting the Digital Humanities

Vienna, 19–20 October 2010

Findings and Outcomes of the musicSpace Project

Speaker: David Bretherton (D.Bretherton@soton.ac.uk)

Co-Authors: Daniel A. Smith, mc schraefel, Joe Lambert


Presentation overview

I am going to focus on one particular outcome of musicSpace: a successor project called ‘MusicNet’.

I will concentrate on how musicSpace provided the motivation for MusicNet


musicSpace

3-year project that concluded September 2010

http://musicspace.mspace.fm

musicSpace’s goals

Integrate access to leading online music resources using the mSpace faceted browser.

Demonstrate that integration could support rapid exploration & knowledge building.

Enable complex, multipart queries.


MusicNet

July 2010 – June 2011

http://musicnet.mspace.fm

MusicNet’s goals

Mint URIs for composers so that content providers can unambiguously identify them.

– Hope to expand to include all music-related entities.

Publish alignment data to back-link into our data partners’ catalogues, and to other resources.

Build a suite of tools to support the alignment and integration of new linked data resources.

Build a demonstration service to illustrate the uses and benefits of the URIs and alignment data.


Contents

1. Brief overview of musicSpace

2. How musicSpace provided the motivation for ‘MusicNet’

3. MusicNet’s alignment tool


1. Brief overview of musicSpace


Problem


Centuries of material ...


... is now increasingly digitised


Yet data is often ‘siloed’.

Geographical dispersal has been replaced by virtual dispersal on the web. Data is now segregated into countless online repositories by:
– Media type (text, image, audio, video)
– Date of creation/publication
– Subject
– Language
– Copyright holder
– Ad hoc/insecure nature of project funding


Interoperability has generally not been given a high enough priority.


Using current online music data resources presents barriers at all stages of the research process:

It is hard to speculatively browse around a subject area.

‘Real-world’ multipart queries are effectively intractable.


The barriers to tractability and their solutions

Need to consult several sources … and metadata from one source cannot guide searches of another source.

Insufficient granularity of data and/or search options.

Multipart queries have to be broken down and results collated manually.

Solutions:

Integration

Increase granularity

Optimally interactive UI (‘mSpace’)


Solution


‘musicSpace’ is a faceted browser


Demonstration

‘What recordings of works by Cage exist, which performers have recorded a particular work by Cage, and what else by Cage have they recorded?’

Screencast 1:

http://www.youtube.com/watch?v=keTN12OWies&hd=1


2. How musicSpace provided the motivation for MusicNet


Data is not ‘clean’...


Schubert
Schubert, Franz
Schubert, Franz Peter
Shu-po-tʻe, ‡d 1797-1828
Schubert ‡d 1797-1828
F. P. Schubert
Schubert, ... ‡d 1797-1828
Schubert, F.
Schubert, F. ‡d 1797-1828
Schubert, Fr.
Schubert, Fr. ‡d 1797-1828
Schubert, Franciszek.
Schubert, Franc. ‡d 1797-1828
Schubert, Francois ‡d 1797-1828
Schubert, Franz P. ‡d 1797-1828
Schubert, Franz Peter
Schubert, Franz Peter, ‡d 1797-1828
Schubert, Franz Peter ‡d 1797-1828
Schubert, Francois, ‡d 1797-1828
Schubert.
Schubert ‡d 1797-1828
Shu-po-tʿe ‡d 1797-1828
Shubert, F. (Frant $s% ) ‡d 1797-1828
Shubert, F. ‡q (Frant $s% ), ‡d 1797-1828
Shubert, Frant $s% , ‡d 1797-1828
Shubert, Frant $s% ‡d 1797-1828
Shūberuto, F.
Shūberuto, Furantsu ‡d 1797-1828
Subert, Franc ‡d 1797-1828
Subertas, F. (Francas), ‡d 1797-1828
Subertas, Francas Peteris, ‡d 1797-1828
Subert, F.
Subertas, F.
שוברט, פרנץ ‡d 1797-1828
シューベルト, F., 1797-1828
シューベルト, フランツ ‡d 1797-1828
舒柏特, 弗朗茨
Schubert, Francois ‡d 1797-1828
Schubert, Franz Peter ‡d 1797-1828


Causes of dirty data

Different naming conventions;
– e.g. ‘Bach, Johann Sebastian’ or ‘J. S. Bach’

Inclusion of non-name data in name field;
– e.g. ‘Schubert, Franz, 1797-1828. Songs’, or ‘Allen, Betty (Teresa)’

Different languages (and alphabets);

User input errors.
– e.g. ‘Bach, Johan Sebastien’


Dirty data degrades the user experience


Searching for compositions by the composer Franz Schubert (1797–1828)...

Screencast 2:

http://www.youtube.com/watch?v=pFsYfz1vlAg&hd=1


3. MusicNet’s alignment tool


Prototype 1 (musicSpace era)


Used Alignment API & Google Docs

We used Alignment API to compare the names as strings, using WordNet to enable word stemming, synonym support, etc.

Alignment API produces a similarity measure for each possible match.

We planned to set a threshold for automatic approval.

Matches below that threshold would be sent to a Google Docs spreadsheet for expert review.
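The planned workflow can be sketched roughly as follows. This is only an illustration of the idea, not the project's code: the `Candidate` type, the `triage` function, the threshold of 0.9 and the scores are all hypothetical, and the similarity values are assumed to come from some external matcher such as Alignment API.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    """A possible match between two name strings (hypothetical structure)."""
    left: str
    right: str
    similarity: float  # score supplied by an external matcher


def triage(candidates, threshold):
    """Split candidates into auto-approved matches and matches for expert review."""
    approved = [c for c in candidates if c.similarity >= threshold]
    for_review = [c for c in candidates if c.similarity < threshold]
    return approved, for_review


# Made-up scores, purely to show the routing.
candidates = [
    Candidate("Schubert, Franz", "Schubert, Franz Peter", 0.95),
    Candidate("Schubert, Franz", "Schubert, F.", 0.60),
]
approved, for_review = triage(candidates, threshold=0.9)
print(len(approved), "approved automatically;", len(for_review), "sent for review")
```

In this sketch the review list is what would have been exported to the spreadsheet for expert checking.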


Shortcoming 1: no threshold

It was not possible to identify a threshold for automatic approval.

Terms are judged to be similar if they have just, say, one different character, but a difference of one character is usually significant in a name.

Names are proper nouns, and so are unsuited to WordNet’s assumptions about misspelling.
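The threshold problem can be illustrated with a small sketch. It uses Python's standard `difflib.SequenceMatcher` as a stand-in string similarity measure; it is not Alignment API and involves no WordNet, so the actual scores the project saw will have differed. The Bach pairing is an illustrative example chosen for this sketch, while the Schubert pairing uses name forms of the kind listed earlier.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; a stand-in, not Alignment API."""
    return SequenceMatcher(None, a, b).ratio()


# Two different composers (father and son) whose names share most characters.
false_match = similarity("Bach, Johann Sebastian", "Bach, Johann Christian")

# One composer, written in two different scripts.
true_match = similarity("Schubert, Franz", "シューベルト, フランツ")

print(f"different people: {false_match:.2f}")
print(f"same person:      {true_match:.2f}")
```

Under this measure the two different people score far higher than the two spellings of the same person, so any threshold that approves the first pair also rejects the second.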


Shortcoming 1: no threshold

False matches with high similarity measures:

True matches with low similarity measures:


Shortcoming 2: no context

Alignment API compares names as strings, and the system strips the names of their context (i.e. additional metadata).
– Lack of context meant the musicologist had no way to verify the match.

Significant flaw; automation had failed so we were relying on manual review.


Prototype 2 (building a custom tool for MusicNet)


Lessons learned

From Prototype 1:
– A completely automated solution is out of the question (for the moment...).
– We needed a custom tool with a human-friendly UI (we also wanted keyboard shortcuts for speed).
– Access to additional metadata (i.e. context), so matches can be researched by the reviewer.

From experience with faceted browsers:
– Alphabetically sorted columns enable one to spot synonymous names at a glance.
  · Normally sources give names surname first; duplication arises from the different representation of given names.


Alignment process

Data*
– Algorithm compares hash of alpha-only lower-case (l.c.) version of name
– Suggested groups: user verified* or rejected*
– No groups suggested: manual grouping (research*)
– Synonym groups
– URIs, alternative names, back links*
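The hashing step in the alignment process can be sketched as follows. This is a minimal illustration rather than MusicNet's actual code: the choice of SHA-1, the use of `str.isalpha` to define 'alpha-only' (which keeps non-ASCII letters), and the names `name_key` and `suggest_groups` are all assumptions.

```python
import hashlib
from collections import defaultdict


def name_key(name: str) -> str:
    """Hash of the lower-cased, letters-only form of a name (assumed scheme)."""
    letters = "".join(ch for ch in name.lower() if ch.isalpha())
    return hashlib.sha1(letters.encode("utf-8")).hexdigest()


def suggest_groups(names):
    """Bucket names by key; buckets with more than one name become suggestions."""
    buckets = defaultdict(list)
    for name in names:
        buckets[name_key(name)].append(name)
    suggested = [group for group in buckets.values() if len(group) > 1]
    ungrouped = [group[0] for group in buckets.values() if len(group) == 1]
    return suggested, ungrouped


names = [
    "Schubert, Franz Peter",
    "Schubert, Franz Peter, 1797-1828",
    "Schubert Franz Peter",
    "Schubert, F.",
]
suggested, ungrouped = suggest_groups(names)
print("suggested groups:", suggested)
print("no group suggested:", ungrouped)
```

Names that differ only in punctuation, spacing, case or appended dates fall into the same suggested group for the reviewer to verify or reject; a form such as 'Schubert, F.' gets no suggestion and would need manual grouping.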


UI of Prototype 2


Prototype 2 demo


Screencast 3:

http://www.youtube.com/watch?v=5f8iaryZMk0&hd=1


Indicative use cases

Composer URIs:
– Music(ological) content providers
– Basis of a (re)search portal

Alignment tool:
– Aligning databases with no authorities;
– Or where authorities are inconsistent.


Thank you for listening!