Research data spring

Enabling Complex Analysis of Large Scale Digital Collections

27/2/2015

Lots of money has been spent digitising heritage collections. Digitised heritage collections are data. But non-computationally trained scholars don't know what to ask of large quantities of data. Often they do not have access to high performance computing facilities.

We aim to address this fundamental problem by extending research data management processes in order to enable novel research and a deeper understanding of emerging research needs.

Team

18/02/2015 Enabling Complex Analysis of Large Scale Digital Collections 2

James Baker, Curator, Digital Research

Melissa Terras, Prof of Digital Humanities

David Beavan, Senior Research Associate

Martin Zaltz Austwick, Lecturer in Data Visualisation

Scope and Gap

Non-computationally trained scholars don't know what to ask of large quantities of digitised data

Large scale digitised collections are delivered in ad hoc forms. Exemplar workflows for analysis of large scale digitised collections are hard to find.

Deploy and index large scale British Library (BL) digitised collections at UCL Research IT Services (UCL RITS). Work with researchers to turn their research questions into computational analysis. Create and release derived data, queries, and visualisations (that demonstrate potential use) as citeable, CC-BY workflow packages.

“I want to know all the sentences that mention European cities circa 1850 to 1900 in BL digitised texts and take away those results as a data set”
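A research question like this one could be turned into a small query along the following lines. This is only a minimal sketch, not the project's workflow: the `Book` record, the `CITIES` gazetteer, and the `city_sentences` function are all hypothetical names introduced for illustration, and the sentence splitting is a naive regular expression that would need replacing for real OCR text.

```python
import re
from dataclasses import dataclass


@dataclass
class Book:
    # Hypothetical record for one digitised text; real metadata will differ.
    title: str
    year: int
    text: str


# Hypothetical, hand-picked gazetteer; a real study would use a fuller list.
CITIES = {"Paris", "Berlin", "Vienna", "Madrid"}

# Naive splitter: break on whitespace that follows ., ! or ?
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def city_sentences(books, cities=CITIES, start=1850, end=1900):
    """Return one row per sentence that mentions at least one listed city."""
    names = "|".join(sorted(re.escape(c) for c in cities))
    pattern = re.compile(r"\b(" + names + r")\b")
    rows = []
    for book in books:
        # Keep only texts dated within the requested period (inclusive).
        if not start <= book.year <= end:
            continue
        for sentence in SENTENCE_END.split(book.text):
            found = sorted(set(pattern.findall(sentence)))
            if found:
                rows.append({
                    "title": book.title,
                    "year": book.year,
                    "cities": found,
                    "sentence": sentence.strip(),
                })
    return rows
```

The returned list of rows can be written out as CSV or JSON, which is one way a researcher could "take away those results as a data set".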

Impact and Benefits

Outputs from phase one of the project would be used as case studies and exemplars to engage a wider community and reduce research inefficiency

The project will generate engagement with new scholarly communities around rich data resources

Narratives and workflows would be used in interdisciplinary teaching at host institutions (Melissa: MA/MSc Digital Humanities, Martin: BASc Arts and Science, MRes Advanced Spatial Analysis and Visualisation; James: BL Doctoral Training, MA History, University of Kent)

Sustainability

Derived data, queries, documentation, and visualisations released as citeable, CC-BY workflow packages with DOIs (DataCite or Figshare)

Workflow packages embedded in teaching and research training

Research computing communities beyond UCL deepen understanding of complex, poorly structured, and heterogeneous humanities data to enable process improvement

Through BL Labs, university teaching, and BAU outreach activities, narratives and lessons learned will have substantial life beyond the project

Outputs, milestones and indicators of success

To month 3:
● Deploy 68k digitised books (circa 4bn words!) at UCL
● Identify 3+ early career researchers (2 in hand)
● Run multi-day pilot workshop in partnership with all parties, to work iteratively on data, workflow and research questions
● Output: workflow packages, derived data, visualisations to enable research insights

Social & technical barriers to analysis of large scale digitised collections are reduced

To month 7:
● Lead workshops and hackdays for the wider research community
● Deploy new BL datasets (based on researcher needs)
● Consolidate workflow packages and recipes
● Gather requirements for future infrastructure development (beyond scope of the project)

To month 13:
● Recruit data champions to drive wider adoption of methods
● Support community led workshops focussed on specific domain needs and challenges
● Create cookbook from recipes

Funding

To month 3:

● UCL RITS Development: £5,500
● Materials Development, Management and Administration: £10,025
● Delivery of pilot workshops: £4,100

Total, full economic cost: £19,625