HELDIG Summit, Helsinki, 7 November 2019. Leo Lahti (University of Turku). Bibliographic Data Harmonization in Research: open ecosystems for scalable collaboration. leo.lahti@iki.fi | @openreslabs

Leo Lahti (University of Turku) 7 November 2019 ... · Actors: - 558,243 original - 92,044 (16%) harmonized Variants of Shakespeare in ESTC ghost of shakespeare kenrick, william shakespeare

Page 1

HELDIG Summit, Helsinki

7 November 2019

Leo Lahti (University of Turku)

Bibliographic Data Harmonization in Research: open ecosystems for scalable collaboration

leo.lahti@iki.fi | @openreslabs

Page 2

Shakespeare was made big by small books!

Data: ESTC | Figure: DH2019 (best!) poster, Utrecht.

A drastic shift from large (2fo/4to) to small (8vo/12mo) books is observed around the 1700s.

… but how reliable and representative is this data set?

Page 3

One (non-standard) XML file

~480 000 entries (1470-1800)

Designed for information retrieval rather than quantitative analysis

Not openly available

Browsable online: http://estc.bl.ac.uk

Figure: Subject catalogue of the University Library of Graz. Source: Wikimedia Commons.

Page 4

Research potential of library catalogues has been debated for decades

Bibliography and Science, by G. Thomas Tanselle

Page 5

Actors:

- 558,243 original
- 92,044 (16%) harmonized

Variants of Shakespeare in ESTC:

- ghost of shakespeare
- kenrick, william shakespeare
- shakespeare, john
- shakespeare room (birmingham, england)
- shakespeare, thomas, active 1598
- shakespeare, william
- shakespeare, william, 1564-1616
- shakespeare, william, 1564-1616., (adaptations)
- shakespeare, william, 1564-1616, (adaptations)
- shakespeare, william, 1564-1616., (adaptions)
- shakespeare, william, 1564-1616., (selections)

Original data not ready for analysis

Actor harmonization: Mark Hill, Ville Vaara

Page 6

From library catalogues to research reports?

- Research potential
- Open bibliographic data science ecosystem
- Research cases

Page 7

Open data science ecosystem?

Authors, Publishers, Editions, Publication place, Gatherings, Page count, Language, Genre, ...

R for Data Science / H. Wickham

- Dedicated data science infrastructure
- Reproducible & automated workflows
- Open source (use/contribute/develop)
- Semi-automated curation
- Highly collaborative effort

Page 8

“Standard” doc sizes vary across time and space

Data availability (HPB):

- Gatherings: 22.5%
- Height: 11.6%
- Width: 1.1%
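
Availability figures of this kind can be read as the share of catalogue records in which a field is filled in. The sketch below shows one way such a share might be computed. The record layout, the field names, and the function `field_availability` are assumptions for illustration, and the toy records are invented rather than taken from HPB.

```python
from typing import Dict, Iterable, List, Mapping, Optional


def field_availability(
    records: Iterable[Mapping[str, Optional[str]]],
    fields: List[str],
) -> Dict[str, float]:
    """Return, per field, the fraction of records with a non-empty value."""
    rows = list(records)
    if not rows:
        return {field: 0.0 for field in fields}
    return {
        # A value counts as available unless it is missing, None, or "".
        field: sum(1 for row in rows if row.get(field) not in (None, "")) / len(rows)
        for field in fields
    }


# Invented toy records; the field names are hypothetical.
toy_records = [
    {"gatherings": "8vo", "height": "20 cm", "width": None},
    {"gatherings": "4to", "height": None, "width": None},
    {"gatherings": None, "height": None, "width": None},
    {"gatherings": "", "height": "", "width": ""},
]

shares = field_availability(toy_records, ["gatherings", "height", "width"])
for field, share in shares.items():
    print(f"{field}: {share:.1%}")
```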

Page 9

Counting editions by publishers in London 1637-1662

- Manual curation (David Gants)
- Automated analysis (Iiro Tiihonen)

Good correspondence supports our automated approach.

Boost curation & scalability by automation

Manually curated data from: David Gants. A Quantitative Analysis of the London Book Trade. Studies in Bibliography 55:185-213, 2002
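
One simple way to quantify the correspondence between manually curated and automatically derived counts is a correlation over the units both sources share. The sketch below is an assumption about how such a check could be set up, not the analysis behind the slide. The helper names are hypothetical and the counts are invented placeholders, not values from Gants (2002) or from the automated pipeline.

```python
import math
from typing import Mapping, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def compare_counts(manual: Mapping[str, int], automated: Mapping[str, int]) -> float:
    """Correlate edition counts over the publishers present in both sources."""
    shared = sorted(set(manual) & set(automated))
    return pearson([manual[k] for k in shared], [automated[k] for k in shared])


# Invented placeholder counts keyed by hypothetical publisher labels.
manual_counts = {"publisher A": 10, "publisher B": 12, "publisher C": 9}
automated_counts = {"publisher A": 11, "publisher B": 13, "publisher C": 9, "publisher D": 4}

r = compare_counts(manual_counts, automated_counts)
print(f"correlation over shared publishers: {r:.3f}")
```

Publishers that appear in only one source are dropped before the comparison here. A fuller check would probably also report them, since mismatched coverage is itself a sign of harmonization problems.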

Page 10

Thanks!

Material for the slides contributed by: Mikko Tolonen, Leo Lahti, Jani Marjanen, Mark Hill, Ali Ijaz, Ville Vaara, Hege Roivainen, Iiro Tiihonen

Helsinki Computational History Group: https://www.helsinki.fi/en/researchgroups/computational-history