Martin Graham & Jessie Kennedy Edinburgh Napier University VESPER Visual Exploration of Species-Referenced Repositories

TDWG 2013 Vesper

Page 1: TDWG 2013 Vesper

Martin Graham & Jessie Kennedy
Edinburgh Napier University

VESPER
Visual Exploration of Species-Referenced Repositories

Page 2: TDWG 2013 Vesper

• VESPER – an exploration into data quality issues for Darwin Core Archives (DWCA)

• DWCAs are files for storing detailed species-based data sets

• How does a user know which data sets are useful and complete?

Introduction

Page 3: TDWG 2013 Vesper

• GBIF has tools to test DWCA validity

• This work is about visualising data that we assume is “valid” but whose “usefulness” is uncertain
– Taxonomy is broken
– Dates are wrong
– Lions in the sea

• In many cases the usefulness of such data is only seen when visualised in context

Valid vs. Useful

Page 4: TDWG 2013 Vesper

• Web-based visualisation of DWCAs
– Uses HTML5
  • SVG, CSS3, FileWriters, ArrayBuffers
– D3 toolkit
– Client side only

• Visualise basic dimensions of data
– Taxonomy
– Geography
– Time
– & Miscellaneous Stats

Approach

Page 5: TDWG 2013 Vesper

Darwin Core Archives

[Diagram: structure of a Darwin Core Archive]

• Meta Files (XML): describe the Data Files
– Meta.xml
– Eml.xml

• Data Files (CSV)
– Core (Taxa/Occurrence Data): exactly one
– Extension: zero or more

• Extension ID == Core ID
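
A Darwin Core Archive's meta.xml lists, for the core and for each extension, the data file location, the ID column and the indexed fields. The following is a minimal sketch of reading that structure, written in Python for brevity (VESPER itself is client-side JavaScript). The sample descriptor, the file names and the helper names `parse_meta` and `describe_table` are illustrative assumptions, not taken from the talk.

```python
import xml.etree.ElementTree as ET

# XML namespace used by Darwin Core Archive descriptors.
NS = {"dwc": "http://rs.tdwg.org/dwc/text/"}

# Hypothetical, minimal meta.xml used only for illustration.
SAMPLE_META = """<archive xmlns="http://rs.tdwg.org/dwc/text/" metadata="eml.xml">
  <core rowType="http://rs.tdwg.org/dwc/terms/Taxon" ignoreHeaderLines="1">
    <files><location>taxa.csv</location></files>
    <id index="0"/>
    <field index="1" term="http://rs.tdwg.org/dwc/terms/scientificName"/>
    <field index="2" term="http://rs.tdwg.org/dwc/terms/parentNameUsageID"/>
  </core>
  <extension rowType="http://rs.gbif.org/terms/1.0/VernacularName" ignoreHeaderLines="1">
    <files><location>vernacular.csv</location></files>
    <coreid index="0"/>
    <field index="1" term="http://rs.tdwg.org/dwc/terms/vernacularName"/>
  </extension>
</archive>
"""


def short_name(term_uri):
    """Reduce a term URI to its last path segment, e.g. 'scientificName'."""
    return term_uri.rsplit("/", 1)[-1]


def describe_table(elem, id_tag):
    """Summarise one <core> or <extension> element."""
    location = elem.find("dwc:files/dwc:location", NS).text
    id_index = int(elem.find(f"dwc:{id_tag}", NS).get("index"))
    fields = {
        short_name(f.get("term")): int(f.get("index"))
        for f in elem.findall("dwc:field", NS)
        if f.get("index") is not None  # skip fields that only carry a default
    }
    return {"file": location, "id_index": id_index, "fields": fields}


def parse_meta(xml_text):
    """Return the single core table and the zero-or-more extension tables."""
    root = ET.fromstring(xml_text)
    core = describe_table(root.find("dwc:core", NS), "id")
    extensions = [
        describe_table(ext, "coreid") for ext in root.findall("dwc:extension", NS)
    ]
    return {"core": core, "extensions": extensions}


meta = parse_meta(SAMPLE_META)
print(meta["core"]["file"], sorted(meta["core"]["fields"]))
```

Each extension row points back to a core row through its `coreid` column, which is the "Extension ID == Core ID" relationship in the diagram.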

Page 6: TDWG 2013 Vesper

• Zip files make things smaller
– Good for network transport
– But analysing the data means we have to make things big again

Zapped by Zip

Expand a lot

Expand even more (String copying, UTF-16 etc.)

Page 7: TDWG 2013 Vesper

• Partial Unzip

• Analyse fields listed in meta file

– Disregard verbose fields

• Find combinations of fields that can be used to generate a visualisation

• List the visualisations available for a given meta.xml and extract only the chosen fields

Zip Zapped
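
The partial-unzip idea can be sketched as follows: decompress only the data file that is needed, stream it row by row, and keep only the chosen columns so verbose fields never accumulate in memory. This is an illustration in Python using the standard `zipfile` and `csv` modules, not VESPER's JavaScript implementation. The archive contents, the helper names and the assumption of a single header line are all hypothetical.

```python
import csv
import io
import zipfile


def read_columns(archive, member, wanted, id_index=0):
    """Stream one CSV member of a zip and keep only the wanted columns.

    `wanted` maps field name -> column index, as listed in meta.xml.
    """
    rows = {}
    with archive.open(member) as raw:
        text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
        reader = csv.reader(text)
        next(reader, None)  # assumption: exactly one header line
        for record in reader:
            rows[record[id_index]] = {
                name: record[index] for name, index in wanted.items()
            }
    return rows


def build_demo_archive():
    """Create a tiny in-memory archive (made-up data) to run the sketch."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(
            "occurrence.csv",
            "id,eventDate,remarks,decimalLatitude,decimalLongitude\n"
            "1,2012-06-01,a very long free-text note,55.95,-3.19\n"
            "2,2012-12-25,,56.0,-3.2\n",
        )
    buffer.seek(0)
    return buffer


with zipfile.ZipFile(build_demo_archive()) as zf:
    kept = read_columns(zf, "occurrence.csv", {"eventDate": 1, "decimalLatitude": 3})
print(kept)
```

The `remarks` column is read past but never stored, which is the point of disregarding verbose fields.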

Visualisation: required fields

Implicit Taxonomy: acceptedNameUsageID, parentNameUsageID

Explicit Taxonomy: any of kingdom, order, family, genus etc.

Map: decimalLongitude, decimalLatitude

Timeline: eventDate
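
The mapping from fields to visualisations above can be expressed as a small rule table. The sketch below is an assumption about how such a check might look, in Python rather than VESPER's JavaScript. The rank list stands in for the slide's "etc." and the function name `available_visualisations` is made up for illustration.

```python
# Assumed rank fields; the slide only says "kingdom, order, family, genus etc."
RANK_FIELDS = {"kingdom", "phylum", "class", "order", "family", "genus"}

# (visualisation name, "all" or "any", fields it draws on)
VISUALISATIONS = [
    ("Implicit Taxonomy", "all", {"acceptedNameUsageID", "parentNameUsageID"}),
    ("Explicit Taxonomy", "any", RANK_FIELDS),
    ("Map", "all", {"decimalLongitude", "decimalLatitude"}),
    ("Timeline", "all", {"eventDate"}),
]


def available_visualisations(fields):
    """Return {visualisation: fields to extract} for the fields in a meta file."""
    present = set(fields)
    result = {}
    for name, mode, needed in VISUALISATIONS:
        usable = needed & present
        satisfied = usable == needed if mode == "all" else bool(usable)
        if satisfied:
            result[name] = sorted(usable)
    return result


print(available_visualisations(
    ["scientificName", "eventDate", "decimalLatitude", "family"]
))
```

In this example the map is not offered, because `decimalLongitude` is missing, while a single rank field is enough for the explicit taxonomy.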

Page 8: TDWG 2013 Vesper

• Sunburst / Icicle plot
– Some difficulties with high fan-out taxa
– Though a lot of these are data quality issues

Taxonomy

Page 10: TDWG 2013 Vesper

• Based on the popular Leaflet.js library
– And Markercluster plugin
– Some adaptations to show selected items

Geography

Page 11: TDWG 2013 Vesper

• Simple bar chart
– With range slider
– Zoom in and see yearly patterns (e.g. not much at Christmas)

Temporal
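
The counts behind such a bar chart can be sketched as a simple binning of `eventDate` values by year and by month. This is an illustrative Python sketch, not the talk's code. It assumes ISO-style dates beginning with `YYYY` or `YYYY-MM`, and anything else is counted as unparsed, which is itself a data quality signal.

```python
from collections import Counter


def bin_event_dates(dates):
    """Count records per year and per month; also count unparseable values."""
    by_year, by_month = Counter(), Counter()
    unparsed = 0
    for value in dates:
        # Drop any time part, then split the date into its components.
        parts = value.strip().split("T")[0].split("-")
        if len(parts[0]) == 4 and parts[0].isdigit():
            by_year[int(parts[0])] += 1
            if len(parts) > 1 and parts[1].isdigit() and 1 <= int(parts[1]) <= 12:
                by_month[int(parts[1])] += 1
        else:
            unparsed += 1
    return by_year, by_month, unparsed


# Made-up values for illustration.
years, months, bad = bin_event_dates(
    ["2012-06-01", "2012-12-25", "2011-06", "", "1 June 2012"]
)
print(dict(years), dict(months), bad)
```

Summing months across years is what would expose a seasonal dip such as the Christmas one mentioned above.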

Page 12: TDWG 2013 Vesper

• Sanity check - Empty data count

Miscellaneous
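
An empty data count is simple to compute: for each field, count the records where the value is missing or blank. A minimal Python sketch, assuming records are held as dictionaries keyed by field name (the record layout and the function name are assumptions):

```python
def empty_counts(rows, fields):
    """Return {field: number of rows where the field is missing or blank}."""
    counts = {name: 0 for name in fields}
    for row in rows:
        for name in fields:
            value = row.get(name)
            if value is None or not value.strip():
                counts[name] += 1
    return counts


# Made-up records for illustration.
records = [
    {"eventDate": "2012-06-01", "decimalLatitude": "55.95"},
    {"eventDate": "", "decimalLatitude": "56.0"},
    {"eventDate": "   "},
]
print(empty_counts(records, ["eventDate", "decimalLatitude"]))
```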

Page 13: TDWG 2013 Vesper

• Taxonomic fan-out for hollow curve anomalies

• Export selected IDs
– These can be saved or sent somewhere else

Miscellaneous
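
Taxonomic fan-out is the number of direct children per taxon. In a typical taxonomy most taxa have few children and a handful have many (the hollow curve), so taxa far out in the tail are candidates for inspection. A hedged Python sketch, assuming the taxonomy is available as a child-to-parent mapping built from `parentNameUsageID`; the function names and the threshold are illustrative only.

```python
from collections import Counter


def fan_out(parent_of):
    """parent_of maps taxon ID -> parent ID ('' or None for roots)."""
    return Counter(parent for parent in parent_of.values() if parent)


def fan_out_histogram(parent_of):
    """Return {number of children: how many taxa have that many children}."""
    return Counter(fan_out(parent_of).values())


def high_fan_out(parent_of, threshold):
    """List taxa with at least `threshold` direct children."""
    return sorted(t for t, n in fan_out(parent_of).items() if n >= threshold)


# Made-up taxonomy for illustration.
tree = {"a": "", "b": "a", "c": "a", "d": "a", "e": "b"}
print(dict(fan_out_histogram(tree)), high_fan_out(tree, 3))
```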

Page 14: TDWG 2013 Vesper

• Selections in one view are reflected in the other views for the same data
– Multiple views, linking

Selection
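
Linked views are commonly built around a shared selection model that every view subscribes to. The sketch below shows that general pattern in Python. It is not a description of how VESPER wires its views, and the class and method names are hypothetical.

```python
class SelectionModel:
    """Holds the selected record IDs and notifies every subscribed view."""

    def __init__(self):
        self._selected = set()
        self._listeners = []

    def subscribe(self, callback):
        """Register a callable that receives the new selection on each change."""
        self._listeners.append(callback)

    def select(self, ids):
        """Replace the selection and push it to all views."""
        self._selected = set(ids)
        snapshot = frozenset(self._selected)
        for callback in self._listeners:
            callback(snapshot)

    def export(self):
        """Return the selected IDs in a stable order, e.g. for saving."""
        return sorted(self._selected)


# Two stand-in "views" that simply record what they were asked to highlight.
map_highlight, timeline_highlight = [], []
model = SelectionModel()
model.subscribe(lambda ids: map_highlight.append(sorted(ids)))
model.subscribe(lambda ids: timeline_highlight.append(sorted(ids)))

model.select(["t7", "t2"])  # e.g. the user picks two taxa in the taxonomy view
print(map_highlight, timeline_highlight, model.export())
```

Because views only talk to the model, a new view can be added without changing the existing ones.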

Page 15: TDWG 2013 Vesper

• JavaScript visualisations for DWCAs

• Quickly shows areas with quality issues

• Can handle large archives if only key fields are analysed

Conclusion

Page 16: TDWG 2013 Vesper

• http://www.soc.napier.ac.uk/~cs22/vesperDemo/vesper/demoNew.html
– Feedback welcome

• Thanks to GBIF, Canadensys, EMBL for data

• Funded by BBSRC

• Ask for a demo

Fin