Research data spring
Enabling Complex Analysis of Large Scale Digital Collections
27/2/2015
Lots of money has been spent digitising heritage collections. Digitised heritage collections are data. But non-computationally trained scholars don't know what to ask of large quantities of data. Often they do not have access to high performance computing facilities.
We aim to address this fundamental problem by extending research data management processes in order to enable novel research and a deeper understanding of emerging research needs.
Team
18/02/2015 Enabling Complex Analysis of Large Scale Digital Collections 2
James Baker, Curator, Digital Research
Melissa Terras, Prof of Digital Humanities
David Beavan, Senior Research Associate
Martin Zaltz Austwick, Lecturer in Data Visualisation
Scope and Gap
Non-computationally trained scholars don't know what to ask of large quantities of digitised data
Large scale digitised collections are delivered in ad hoc forms. Exemplar workflows for analysis of large scale digitised collections are hard to find
Deploy and index large scale British Library (BL) digitised collections at UCL Research IT Services (UCL RITS). Work with researchers to turn their research questions into computational analysis. Create and release derived data, queries, and visualisations (that demonstrate potential use) as citeable, CC-BY workflow packages
“I want to know all the sentences that mention European cities circa 1850 to 1900 in BL digitised texts and take away those results as a data set”
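A request like this could be turned into a small query. The following is a minimal sketch, not the project's actual workflow: the book records (`id`, `year`, `text`), the five-name city list, and the function names are all hypothetical, and the sentence splitter is deliberately crude. A real run over the BL collection would need a fuller gazetteer and handling for OCR errors.

```python
import csv
import io
import re

# Hypothetical mini-gazetteer; a real run would use a much fuller list of place names.
EUROPEAN_CITIES = {"Paris", "Vienna", "Berlin", "Madrid", "Rome"}

# Crude sentence splitter: break on whitespace that follows ., ! or ?
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")

# Match any gazetteer name as a whole word.
CITY_PATTERN = re.compile(
    r"\b(" + "|".join(sorted(re.escape(c) for c in EUROPEAN_CITIES)) + r")\b"
)


def city_sentences(books, start=1850, end=1900):
    """Yield (book_id, year, city, sentence) for each sentence naming a city."""
    for book in books:
        # Keep only books published within the requested date range.
        if not (start <= book["year"] <= end):
            continue
        for sentence in SENTENCE_END.split(book["text"]):
            for city in sorted(set(CITY_PATTERN.findall(sentence))):
                yield (book["id"], book["year"], city, sentence.strip())


def to_csv(rows):
    """Serialise result rows as CSV text, so they can be taken away as a data set."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["book_id", "year", "city", "sentence"])
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample records standing in for digitised books.
sample_books = [
    {"id": "b1", "year": 1862,
     "text": "We left Paris at dawn. The road was long. Vienna lay ahead!"},
    {"id": "b2", "year": 1820, "text": "Rome was quiet."},
]

rows = list(city_sentences(sample_books))
print(to_csv(rows))
```

The CSV output is the "take away" part of the request: derived data that could be released alongside the query itself as part of a workflow package.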
Impact and Benefits
Outputs from phase one of the project would be used as case studies and exemplars to engage a wider community and reduce research inefficiency
The project will generate engagement with new scholarly communities around rich data resources
Narratives and workflows would be used in interdisciplinary teaching at host institutions (Melissa: MA/MSc Digital Humanities, Martin: BASc Arts and Science, MRes Advanced Spatial Analysis and Visualisation; James: BL Doctoral Training, MA History, University of Kent)
Sustainability
Derived data, queries, documentation, and visualisations released as citeable, CC-BY workflow packages with DOIs (DataCite or Figshare)
Workflow packages embedded in teaching and research training
Research computing communities beyond UCL deepen understanding of complex, poorly structured, and heterogeneous humanities data to enable process improvement
Through BL Labs, university teaching, and BAU outreach activities, narratives and lessons learned will have substantial life beyond the project
Outputs, milestones and indicators of success
To month 3:
● Deploy 68k digitised books (circa 4bn words!) at UCL
● Identify 3+ early career researchers (2 in hand)
● Run multi-day pilot workshop in partnership with all parties, to work iteratively on data, workflow and research questions
● Output: workflow packages, derived data, visualisations to enable research insights
Social & technical barriers to analysis of large scale digitised collections are reduced
To month 7:
● Lead workshops and hackdays for the wider research community
● Deploy new BL datasets (based on researcher needs)
● Consolidate workflow packages and recipes
● Gather requirements for future infrastructure development (beyond scope of the project)
To month 13:
● Recruit data champions to drive wider adoption of methods
● Support community led workshops focussed on specific domain needs and challenges
● Create cookbook from recipes