Transcript
Page 1: Big Data in the Arts and Humanities

• It has been calculated that the books in the Library of Congress represent 235 terabytes of data

• In 1994, at a conference on data management at ULCC, I heard the word ‘petabyte’ for first time: ‘the problems which confront the meteorologist today will worry the arts and humanities in ten years time’

• A petabyte equals approximately four Library of Congresses• The Large Hadron Collider generates one petabyte of data

every second (which would cost $1 trillion a year to store)• The Square Kilometre Array will generate 4 petabytes of

data a second• A new word for me in 2013: zettabyte (equivalent of 250

billion DVDs)• By 2020, digital universe will be 35 zettabytes (44 times as

big as in 2009)

Page 2: Big Data in the Arts and Humanities

Is there big data in the arts and humanities?

Letter of Gladstone to Disraeli, 1878: British Library, Add. MS. 44457, f. 166

The political and literary papers of Gladstone preserved in the British Library comprise 762 volumes containing approx. 160,000 documents.

Page 3: Big Data in the Arts and Humanities

George W. Bush Presidential Library:200 million e-mails

4 million photographs

Page 4: Big Data in the Arts and Humanities
Page 5: Big Data in the Arts and Humanities
Page 6: Big Data in the Arts and Humanities

• FamilySearch Genealogy Information from around the world 2.8 billion names; 300 million added every year 560 million digital images 10 million hits per day; 7,000 transactions / minute Digitising microfilm to JPEG2000 Digitised in Utah, preserved in Maryland Initial storage 7Pb Ingest rate : 20Tb per day, rising to 50Tb per day

Page 7: Big Data in the Arts and Humanities

Anglo-American Legal Tradition (http://aalt.law.uh.edu) contains over 8,500,000 images of the major series of legal records for medieval and early modern England, but how do we navigate, annotate and share these materials?

Page 8: Big Data in the Arts and Humanities

Credit: Mark Basham, Diamond,

Raw images

(2,600 X 4,000 px each image/projection)(Total: 6,000 images/projections)

Sinogramsfrom the projections

(Total: 4,000 sinogramsEach sinogram: 2,600 x 4,000 x 1 px)

A sinogram

Reconstructed slicesin a 3d volume

(Total: 4,000 slices)

2,600 px

4,000 px

6, 000 projections

A slice

Tomographic Reconstruction

~100Gb per 3D image - ~40 mins on 16 GPU cluster ~10 TB per experiment” - ~3 days on site~ 1PB per year (per beamline)

Working on using the Emerald (376 GPUs)

Page 9: Big Data in the Arts and Humanities

Use of Diamond Light Source to image fragile manuscripts

Use of Diamond Light Source to investigate silver degradation in altarpieces

Page 10: Big Data in the Arts and Humanities

Laser Scans of St Kilda and Rosslyn Chapel, produced by Digital Design Studio, Glasgow School of Art, as part of the ‘Scottish Ten’ project. Scans such as these produce data sets

containing billions of points, amounting to many terabytes. Such data can have great cultural and economic value.

Page 11: Big Data in the Arts and Humanities

http://worldservice.prototyping.bbc.co.uk/

Page 12: Big Data in the Arts and Humanities
Page 13: Big Data in the Arts and Humanities
Page 14: Big Data in the Arts and Humanities

Big Data in the Humanities• Frequently (but not always) heterogeneous, comprising different

layers of information: comparable to climate and environmental data• Extremely varied in format, from linguistic corpora to images and

video. Navigation and processing major issue, even when in digital format

• Nevertheless strong common characteristics with big data issues in other disciplines and scope for shared dialogue

• One characteristic of ‘big data’ hype is use of predictive analytics. Can predictive analytics work with some arts and humanities data (eg sound?)

• But data driven research (as opposed to hypothesis driven research) is potentially important in arts and humanities

• Arts and humanities can potentially help in dealing with big data by new interpretative approaches

Page 15: Big Data in the Arts and Humanities
Page 16: Big Data in the Arts and Humanities

Lise Autogena and Joshua Portway, Most Blue Skies, 2010: http://www.autogena.org/mbs.html

Page 17: Big Data in the Arts and Humanities

Fabio Lattanzi Antinori,The Obelisk (2012): Open Data InstituteScanning online news websites in real-time, the artwork changes from

opaque to transparent, according to the quantity of references to the four main crimes against peace (genocide, crimes against humanity, crimes of

aggression and crimes of war)

Page 18: Big Data in the Arts and Humanities

Available at: http://mith.umd.edu/sharedhorizons/resources/

Page 19: Big Data in the Arts and Humanities
Page 20: Big Data in the Arts and Humanities
Page 21: Big Data in the Arts and Humanities

Recommended