TDWG 2013 Vesper

Martin Graham & Jessie Kennedy
Edinburgh Napier University

VESPER: Visual Exploration of Species-Referenced Repositories

• VESPER – an exploration into data quality issues for Darwin Core Archives (DWCA)

• DWCAs are files for storing detailed species-based data sets

• How does a user know which data sets are useful and complete?

Introduction

• GBIF has tools to test DWCA validity

• This work is about visualising data we assume is “valid” but whose “usefulness” we are unsure of
– Taxonomy is broken
– Dates are wrong
– Lions in the sea

• In many cases the usefulness of such data is only seen when visualised in context

Valid vs. Useful

• Web-based visualisation of DWCAs
– Uses HTML5 (SVG, CSS3, FileWriters, ArrayBuffers)
– D3 toolkit
– Client side only

• Visualise basic dimensions of data
– Taxonomy
– Geography
– Time
– Miscellaneous stats

Approach

Darwin Core Archives

• Meta files (XML): Meta.xml, Eml.xml
– Describe the data files

• Data files (CSV)
– Core (taxa or occurrence data): exactly one
– Extensions: zero or more
– Extension ID == Core ID
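The "Extension ID == Core ID" link amounts to a join between the core file and each extension file. A minimal sketch follows, not VESPER's own code: the function name `attachExtensions` and the row shape (plain objects carrying the shared identifier in an `id` property) are assumptions made for illustration.

```javascript
// Hypothetical helper: group extension rows under the core row they refer to.
// Assumes rows are plain objects and that both core and extension rows hold
// the shared identifier in an `id` property.
function attachExtensions(coreRows, extensionRows) {
  const byId = new Map();
  for (const row of coreRows) {
    byId.set(row.id, { core: row, extensions: [] });
  }
  const orphans = [];
  for (const ext of extensionRows) {
    const entry = byId.get(ext.id);
    if (entry) {
      entry.extensions.push(ext);
    } else {
      // An extension row pointing at no core row is a data quality signal.
      orphans.push(ext);
    }
  }
  return { records: [...byId.values()], orphans };
}
```

Returning the orphaned extension rows separately, rather than dropping them, keeps a possible quality problem visible.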

• Zip files make things smaller
– Good for network transport
– But analysing the data means we have to make things big again

Zapped by Zip

[Diagram: zip file → expand a lot → expand even more (string copying, UTF-16, etc.)]

• Partial unzip

• Analyse fields listed in meta file
– Disregard verbose fields

• Find combinations of fields that can be used to generate a visualisation

• List the visualisations available for a given meta.xml and extract only the fields they need

Zip Zapped
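Extracting only the chosen fields can be pictured as a column projection applied to each row as it is read. The sketch below is an illustration under stated assumptions, not the tool's implementation: `projectRow` and `fieldIndex` are hypothetical names, and rows are assumed to be tab-delimited with no quoted fields.

```javascript
// Hypothetical sketch: keep only the columns needed by the chosen views.
// `fieldIndex` maps a field name to its column index, as would be read
// from meta.xml. Assumes tab-delimited rows without quoted fields.
function projectRow(line, fieldIndex, wantedFields) {
  const cells = line.split("\t");
  const out = {};
  for (const name of wantedFields) {
    const idx = fieldIndex[name];
    // A missing column or a short row yields an empty string.
    out[name] = idx !== undefined && idx < cells.length ? cells[idx] : "";
  }
  return out;
}

// Usage: only the two coordinate columns are retained; the verbose
// remarks column is never copied into the result.
const exampleIndex = {
  id: 0, scientificName: 1, occurrenceRemarks: 2,
  decimalLatitude: 3, decimalLongitude: 4,
};
const exampleRow = projectRow(
  "42\tPanthera leo\ta long free-text remark\t-1.5\t36.8",
  exampleIndex,
  ["decimalLatitude", "decimalLongitude"]
);
```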

• Implicit taxonomy: acceptedNameUsageID, parentNameUsageID

• Explicit taxonomy: any of kingdom, order, family, genus, etc.

• Map: decimalLongitude, decimalLatitude

• Timeline: eventDate
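The field requirements above can be checked against the field names found in a meta.xml to decide which views to offer. This is a minimal sketch under assumptions: `VIEW_RULES` and `availableViews` are illustrative names rather than VESPER's API, requiring both listed terms for the implicit taxonomy is an assumption, and the rank list fills in the slide's "etc" with the standard Darwin Core rank terms phylum and class.

```javascript
// Illustrative rules derived from the slide; names are hypothetical.
const RANKS = ["kingdom", "phylum", "class", "order", "family", "genus"];

const VIEW_RULES = {
  // Assumption: both terms must be present for an implicit taxonomy.
  implicitTaxonomy: (f) => f.has("acceptedNameUsageID") && f.has("parentNameUsageID"),
  // Any single rank column is enough for an explicit taxonomy.
  explicitTaxonomy: (f) => RANKS.some((rank) => f.has(rank)),
  map: (f) => f.has("decimalLongitude") && f.has("decimalLatitude"),
  timeline: (f) => f.has("eventDate"),
};

// Return the names of the views whose required fields are all present.
function availableViews(fieldNames) {
  const fields = new Set(fieldNames);
  return Object.keys(VIEW_RULES).filter((view) => VIEW_RULES[view](fields));
}
```

For example, an archive listing only `eventDate`, `decimalLatitude`, `decimalLongitude` and `genus` would be offered the explicit taxonomy, map and timeline views, but not the implicit taxonomy.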

• Sunburst / Icicle plot
– Some difficulties with high fan-out taxa
– Though a lot of these are data quality issues

Taxonomy
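A sunburst or icicle plot needs the flat taxon rows turned into a nested tree first. The following is a rough sketch of that step, assuming each row carries `taxonID`, `parentNameUsageID` and `scientificName`; the function name `buildTaxonTree` is hypothetical and this is not taken from the VESPER source.

```javascript
// Hypothetical sketch: turn flat taxon rows into nested { name, children } nodes.
// Assumes each row has `taxonID`, `parentNameUsageID` and `scientificName`.
function buildTaxonTree(rows) {
  const nodes = new Map();
  for (const row of rows) {
    nodes.set(row.taxonID, { id: row.taxonID, name: row.scientificName, children: [] });
  }
  const roots = [];
  for (const row of rows) {
    const node = nodes.get(row.taxonID);
    const parent = nodes.get(row.parentNameUsageID);
    // Rows whose parent is missing, unknown or self-referential become roots.
    // Cycles longer than one node are not detected in this sketch.
    if (parent && parent !== node) {
      parent.children.push(node);
    } else {
      roots.push(node);
    }
  }
  return roots;
}
```

The `children` arrays match the nested shape that D3's hierarchical layouts read by default. An unexpectedly large number of roots is itself a hint that parent references in the archive are broken.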

• Based on the popular Leaflet.js library
– And the MarkerCluster plugin
– Some adaptations to show selected items

Geography

• Simple bar chart
– With range slider
– Zoom in and see yearly patterns (e.g. not much at Christmas)

Temporal
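The bar chart needs records binned by date. A minimal sketch of yearly binning is given below; it assumes `eventDate` values are ISO 8601-style strings that begin with a four-digit year, and `countByYear` is a hypothetical name, not the tool's function. Values that do not fit are counted rather than guessed at, since wrong dates are one of the quality issues of interest.

```javascript
// Hypothetical sketch: count records per year from eventDate strings.
// Assumes values start with a four-digit year (e.g. "2012-12-25", "1999").
function countByYear(eventDates) {
  const counts = new Map();
  let unparseable = 0;
  for (const value of eventDates) {
    // Four leading digits that are not followed by a further digit.
    const match = /^(\d{4})(?!\d)/.exec(String(value || "").trim());
    if (!match) {
      unparseable += 1;
      continue;
    }
    const year = Number(match[1]);
    counts.set(year, (counts.get(year) || 0) + 1);
  }
  return { counts, unparseable };
}
```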

• Sanity check: empty data count

Miscellaneous
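An empty data count can be computed in one pass over the rows. The sketch below assumes rows are plain objects keyed by field name; `emptyCounts` is an illustrative name, and treating whitespace-only values as empty is an assumption.

```javascript
// Hypothetical sketch: count, per field, how many rows have no value.
// Missing properties, null, and whitespace-only strings all count as empty.
function emptyCounts(rows, fieldNames) {
  const counts = {};
  for (const name of fieldNames) {
    counts[name] = 0;
  }
  for (const row of rows) {
    for (const name of fieldNames) {
      const value = row[name];
      if (value === undefined || value === null || String(value).trim() === "") {
        counts[name] += 1;
      }
    }
  }
  return counts;
}
```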

• Taxonomic fan-out for hollow curve anomalies

• Export selected IDs
– These can be saved or sent somewhere else

Miscellaneous
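The taxonomic fan-out statistic can be approximated by counting direct children per parent and sorting the result. This is a sketch under the assumption that each row has a `parentNameUsageID`; the name `fanOut` is hypothetical. The sorted counts give the distribution to compare against the expected hollow curve, where a few taxa have many children and most have few.

```javascript
// Hypothetical sketch: count direct children per parent taxon and
// list the parents with the highest fan-out first.
function fanOut(rows) {
  const counts = new Map();
  for (const row of rows) {
    const parent = row.parentNameUsageID;
    if (!parent) continue; // rows without a parent reference are skipped
    counts.set(parent, (counts.get(parent) || 0) + 1);
  }
  return [...counts.entries()]
    .map(([parentID, children]) => ({ parentID, children }))
    .sort((a, b) => b.children - a.children);
}
```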

• Selections in one view are reflected in the other views for the same data
– Multiple views, linking

Selection
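Linked selection across multiple views is commonly built around one shared selection model that every view listens to. The sketch below shows that general pattern; it is an assumption about how such linking could be done, not a description of VESPER's internals, and `createSelectionModel` and its methods are hypothetical names.

```javascript
// Hypothetical shared selection model: views subscribe, and any change
// made through one view is broadcast to every subscribed view.
function createSelectionModel() {
  const selected = new Set();
  const listeners = [];
  const notify = () => listeners.forEach((fn) => fn([...selected]));
  return {
    onChange(fn) {
      listeners.push(fn);
    },
    set(ids) {
      selected.clear();
      ids.forEach((id) => selected.add(id));
      notify();
    },
    clear() {
      selected.clear();
      notify();
    },
    ids() {
      return [...selected];
    },
  };
}
```

In this arrangement the map, timeline and taxonomy views would each call `onChange` once and redraw their highlights when notified, and `ids()` would supply the list for exporting selected IDs.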

• JavaScript visualisations for DWCAs

• Quickly shows areas with data quality issues

• Can handle large archives if only key fields are analysed

Conclusion

• http://www.soc.napier.ac.uk/~cs22/vesperDemo/vesper/demoNew.html
– Feedback welcome

• Thanks to GBIF, Canadensys, EMBL for data

• Funded by BBSRC

• Ask for a demo

Fin
