The opportunistic librarian: A Leuven confession Demmy Verbeke

Libraries and DH

acrl.ala.org/dh

(Posner 2013)

Why the Digital Humanities?

(Spiro 2011; Vandegrift – Varner 2013)

Provide wide access to cultural information

Enhance teaching and learning

Transform scholarly communication

Make a public impact

Enable manipulation of data

Leiden University Library (1610)

Joe & Rika Mansueto Library, University of Chicago

Saltire Centre, Glasgow Caledonian University

The collaboration triangle

Humanities research

Information technology

Information management

DH

R&D in libraries

(Nowviskie 2013, Nowviskie 2014)

“When a library can both support basic digital scholarship needs through distributed services and create a critical mass of staffing and intellectual energy in something like a center (however conceived), it has set the conditions for the advancement of knowledge itself, through the fulfillment of research desires yet unknown, un-expressed.”

(Nowviskie 2014)

Centre of expertise for:

Plagiarism

Copyright

Open Access

Digital Humanities

Academic Bibliography & Institutional Repository

Reference librarians

A library supporting research

Institutional context

since November 2011: DH Task Force, Arts Faculty (vice dean of research, faculty librarian, head of the faculty’s computer department, research support officer of the faculty, and all interested researchers)

2014: 3 new academic positions: Tenure track professor in DH (Arts Faculty), Computer Science for DH (Department of Computer Science), Human-Media Interaction (Institute for Media Studies)

2015: Advanced Master in Digital Humanities

A library supporting DH

White paper “Digital Humanities en/in KU Leuven bibliotheken” of the Library Council of the Humanities and Social Sciences Group (February 2013)

intention to focus on:

digitisation projects

supporting relevant grant applications

partnering in DH projects, from inception to completion (and beyond)

providing training in DH tools

playing an expert role in the field of scholarly communication

Project example

Portable Light Dome (“Mini-dome”)

www.arts.kuleuven.be/info/ONO/Meso/digitalisatie

Project example

RICH - Reflectance Imaging for Cultural Heritage

www.illuminare.be/rich_project

Project example

Europeana Photography

www.europeana-photography.eu

Project example

OCR/NER for 17th-, 18th- and 19th-century Dutch books

Funding:

SUpport action Centre for CompEtEnce in Digitisation (www.succeed-project.eu)

Team:

Digitisation services of University Library (Diewer van der Meijden, Mark Verbrugge, Bruno Vandermeulen)

LIBIS (Sam Alloing)

Arts Faculty Library (Demmy Verbeke)

Student workers (Jolien Berckmans, Els Meskens)

Support:

INL (Instituut voor Nederlandse Lexicologie)

KU Leuven & Succeed: goals

End goal:

integration of OCR in digitisation workflow at KU Leuven

integration of NER in digitisation workflow at KU Leuven

Specifically:

learn from digitising textual material with a view to OCR (rather than as a representation of the book as physical object)

understand OCR possibilities

learn how to enrich textual material with NER

develop workflows, identify infrastructure problems, etc.

KU Leuven & Succeed: corpus

13 books from the pretiosa collection of the Gulden Librije:

translations from Latin

monolingual Dutch (so without Latin original)

books with comparable, simple typefaces (no Gothic)

books that have not been digitised yet

Augustinus, Stad Gods (1876-8); Augustinus, Belydenis (1741); Boëthius, Vertroostinge der wysgeerte (1703); Horatius, Over de dichtkunst (1866); Horatius, Hekeldichten en brieven (1728); Nepos, Leevens van doorlugtige mannen (1796); Nepos, Leeven der doorluchtige veld-ooversten (1726); Ovidius, Treur-digten (1814-5); Ovidius, Treur-gesangen (1692); Seneca, Christelycke Seneca (1705); Tacitus, Vande ghedenkwaerdige geschiedenissen der Romeinen (1645); Vergilius, Wercken (1737); Vergilius, Aeneis (1662)

KU Leuven & Succeed: tools

ABBYY FineReader Engine SDK 11 – OCR

User Pattern Trainer of ABBYY FineReader – train OCR

IMPACT historical lexicon for Dutch, integrated as a FineReader external dictionary – improve OCR

Aletheia – build ground truth

ocrevalUAtion – compare OCR results

NER tool for Europeana Newspapers – NER

NE Attestation Tool – manually correct NER

NERT – build training & test set
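
Comparing OCR output against ground truth usually comes down to an edit-distance measure such as the character error rate (CER). The sketch below illustrates that metric only; it is not the implementation used by ocrevalUAtion or any other tool listed above, and the sample strings are hypothetical, loosely modelled on spellings from the corpus titles.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    # Keep the shorter string in the inner loop to limit memory use.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def character_error_rate(ground_truth: str, ocr_output: str) -> float:
    """CER = edit distance divided by the length of the ground truth."""
    if not ground_truth:
        raise ValueError("ground truth must not be empty")
    return levenshtein(ground_truth, ocr_output) / len(ground_truth)


# Hypothetical sample: a ground-truth word and an imagined OCR misreading.
truth = "doorluchtige"
ocr = "doorlugtige"
print(f"CER: {character_error_rate(truth, ocr):.3f}")
```

A lower CER after adding pattern training or a historical lexicon would indicate that the extra step improved recognition on that sample.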

Conclusion

“This is one of the great opportunity spaces that the Digital Humanities opens up, giving archivists, librarians, and curators a chance to not simply enlarge but completely re-envision their communities, publics, and missions.”

(Burdick et al. 2012, 48-49)

References

Anne Burdick and others, Digital_Humanities (Cambridge: MIT Press, 2012)

Christian Clausner, Stefan Pletschacher and Apostolos Antonacopoulos, “Efficient OCR Training Data Generation with Aletheia,” in Proceedings of the 11th International Association for Pattern Recognition (IAPR) Workshop on Document Analysis Systems (DAS2014)

William A. Kretzschmar and William Gray Potter, “Library Collaboration with Large Digital Humanities Projects,” Literary and Linguistic Computing 25, no. 4 (2010): 439–445
