Data cleansing for Dummies: Google to the rescue!!
Dave Smith
Petrology Collections Manager
The Natural History Museum, London
Architectural wonders
• Waterhouse building opened in 1881
• Steel frame and terracotta
• Purpose built for natural history collections
• 1000 staff
• 350 science staff
• 72 million specimens (estimated)
• Life Sciences
– Plants, animals, birds, insects
• Earth Sciences
– Minerals & gems, rocks, fossils, meteorites
The Museum
My role
• Geologist by training
• Collections Manager for rock collections
– 125,000 rocks
– 10,000 decorative stones
– 37,000 ocean sediments
– 16,000 ore specimens
• Departmental EMu administrator
– Registry management
– Report writing
– Training & documentation
– EMu support & upgrade testing
– Communication
‘Fingers in lots of pies’
• Have been involved in cross-museum initiatives involving EMu.
The problem
Core Information
• 89,000 Records (73%)
– Identification = 52,100
– Provenance = 64,215
– Acquisition = 38,700
– Storage = 14,300
Numbers
Register volume        Acquisition records   Specimen records
1-5                    634                   19,283
1-5 (supplementary)    501 (490)             1,965 (1,927)
1-5 (merged)           1,124                 21,210
6-11                   1,832                 30,080
Geological Society     510                   9,852
TOTAL                  3,466                 63,107
Acquisition Lot entry
The Problem
• Data sits outside EMu – how to get it in?
• Not as easy as it sounds – many barriers…
• Notes field used for data whose correct field was uncertain.
• Sites data atomised to varying levels, depending on the experience of the digitiser.
• Approx. 95% of specimens already have an EMu record containing at least a registration number. Once the data is cleaned, how do we update those records without overwriting enhanced data?
• Unfamiliarity with Access
• Short time periods for data cleansing.
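The "update without overwriting" barrier can be illustrated with a minimal Python sketch. It is not the EMu import mechanism: the field names (`reg_number`, `locality`, `identification`) and the sample values are hypothetical, and it assumes that every record carries a registration number to match on. Cleaned values are only copied into fields that are blank in the existing record, so enhanced data is left alone.

```python
def merge_cleaned(existing, cleaned, key="reg_number"):
    """Return copies of existing records with blank fields filled from cleaned rows.

    A field that already holds a value in the existing record is never overwritten.
    """
    cleaned_by_key = {row[key]: row for row in cleaned}
    merged = []
    for record in existing:
        updated = dict(record)  # copy, so the input records stay untouched
        source = cleaned_by_key.get(record[key], {})
        for field, value in source.items():
            current = str(updated.get(field) or "").strip()
            # Fill only when the cleaned value exists and the current one is blank.
            if value and not current:
                updated[field] = value
        merged.append(updated)
    return merged


# Hypothetical sample data: R-0001 has an enhanced identification but no locality.
existing = [
    {"reg_number": "R-0001", "locality": "", "identification": "Granite (enhanced)"},
    {"reg_number": "R-0002", "locality": "Cornwall", "identification": ""},
]
cleaned = [
    {"reg_number": "R-0001", "locality": "Dartmoor, Devon", "identification": "granite"},
    {"reg_number": "R-0002", "locality": "cornwall", "identification": "Basalt"},
]

result = merge_cleaned(existing, cleaned)
for row in result:
    print(row)
```

Matching on the registration number is a design choice that follows from it being the one field nearly every existing record is said to have; any other shared identifier would work the same way.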
The Solution
• Google Refine
• OpenRefine (GitHub)
• Personal web service
• Runs in your browser
The demo
Benefits
• Intuitive User Interface
• Powerful editing / data manipulation functions
• Can’t make mistakes! Endless undo…!
• Pick up where you left off: maintains history
• Link to open-data sources to validate your data
• Augment your data with free open data sources.