Marrying Models and Data: Adventures in Modeling, Data Wrangling and Software Design

Anne E. Thessen, Elizabeth North, Sean McGinnis and Ian Mitchell

LTRANS

• Lagrangian Transport Model

• Open Source

• http://northweb.hpl.umces.edu/LTRANS.htm

• Used to predict transport of particles, subsurface hydrocarbons, and surface oil slicks (in development)

GISR Deepwater Horizon Database

• Over 7 million georeferenced data points

• Over 9 GB

• Over 2000 analytes and parameters

Number of Data Points

Database Contents

• Oceanographic Data

– Salinity

– Temperature

– Oxygen

– More

• Chemistry Data

– Hydrocarbons

– Heavy metals

– Nutrients

– More

• Air

• Water

• Tissue

• Sediment/Soil

Example Plots for One Analyte

Naphthalene, August 1-15, 2010

mg L⁻¹

Heterogeneity

• Heterogeneity

– Terms

– Units

– Format

– Structure

Example: many names for one compound

– Carboxybenzene

– E210

– C7H6O2

– Dracylic Acid

– Benzoic Acid

2,016 1,848
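
Term heterogeneity of the kind shown above, where several names refer to one compound, is often handled with a synonym lookup table. The following is a minimal sketch only, not the method used to build the GISR database; the table `SYNONYMS` and the function `canonical_name` are hypothetical names introduced for illustration. Mapping the formula C7H6O2 to a single compound follows the slide, but it is an assumption, since a molecular formula can be shared by isomers.

```python
# Hypothetical synonym table (an assumption for illustration).
# Keys are lowercase names as they might appear in source data;
# values are the single name chosen for storage.
SYNONYMS = {
    "carboxybenzene": "Benzoic Acid",
    "e210": "Benzoic Acid",
    "c7h6o2": "Benzoic Acid",  # assumption: isomers can share a formula
    "dracylic acid": "Benzoic Acid",
    "benzoic acid": "Benzoic Acid",
}


def canonical_name(raw):
    """Return the canonical analyte name, or None if the term is unknown."""
    key = " ".join(raw.split()).lower()  # collapse whitespace, ignore case
    return SYNONYMS.get(key)


names = ["Carboxybenzene", "E210", " dracylic  acid ", "Benzoic Acid"]
resolved = {canonical_name(n) for n in names}
```

Returning `None` for unknown terms, rather than guessing, leaves unmatched names visible so they can be reviewed by hand.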

Example: n-Decane

– ppb

– ppbv

– ng/g

– μg/g

– mg/kg

– ppt

– parts per trillion

– μg/kg

103 66
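
Unit heterogeneity like the list above can be reduced by converting every value to one target unit. This is a sketch under stated assumptions, not the conversion used for the GISR database: the table `TO_UG_PER_KG` and the function `to_ug_per_kg` are hypothetical, all units are treated as mass-based, and "ppb" and "ppt" are read as parts per billion and parts per trillion by mass. Volume-based units such as ppbv are deliberately rejected, because converting them to a mass fraction needs information that the unit string alone does not carry.

```python
# Hypothetical conversion factors to micrograms per kilogram (ug/kg).
# Assumption: every unit here is a mass fraction.
TO_UG_PER_KG = {
    "ug/kg": 1.0,
    "ng/g": 1.0,                  # 1 ng/g equals 1 ug/kg
    "ppb": 1.0,                   # assumed parts per billion by mass
    "ug/g": 1000.0,
    "mg/kg": 1000.0,
    "ppt": 0.001,                 # assumed parts per trillion by mass
    "parts per trillion": 0.001,
}


def to_ug_per_kg(value, unit):
    """Convert a value to ug/kg, raising ValueError for unsupported units."""
    # Normalize case and both Unicode forms of the micro prefix.
    key = unit.strip().lower().replace("μ", "u").replace("µ", "u")
    if key not in TO_UG_PER_KG:
        raise ValueError(f"cannot convert unit {unit!r} to ug/kg")
    return value * TO_UG_PER_KG[key]
```

For example, `to_ug_per_kg(2.0, "μg/g")` and `to_ug_per_kg(2.0, "mg/kg")` give the same result, while `to_ug_per_kg(1.0, "ppbv")` raises `ValueError` instead of silently mixing volume and mass units.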

Metadata

• Metadata

– Missing

– Not computable

0.23

Unit

Name

Location

Time

Attribution

Uncertainty

Method
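
The point that a bare value such as 0.23 is unusable without its metadata can be expressed as a simple completeness check. The sketch below is illustrative only: `REQUIRED_FIELDS` and `missing_metadata` are hypothetical names, the field list is an assumed reading of the labels on the slide, and the example record values are made up rather than taken from the database.

```python
# Assumed list of metadata fields, based on the labels on the slide.
REQUIRED_FIELDS = (
    "unit", "name", "location", "time",
    "attribution", "uncertainty", "method",
)


def missing_metadata(record):
    """Return the required fields that are absent or empty in a record."""
    return [f for f in REQUIRED_FIELDS if record.get(f) in (None, "")]


# A bare value with no context: every required field is missing.
bare = {"value": 0.23}

# Made-up illustrative record, not real data.
documented = {
    "value": 0.23,
    "unit": "mg/L",
    "name": "Naphthalene",
    "location": "example station",
    "time": "2010-08-01",
    "attribution": "example provider",
    "uncertainty": "",          # present but empty, so still reported
    "method": "example method",
}

bare_gaps = missing_metadata(bare)
documented_gaps = missing_metadata(documented)
```

Treating empty strings as missing is a design choice in this sketch: a column that exists but is blank is no more usable than one that is absent.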

The Great Data Hunt

• Discovery

– Project directory

– Funding agency records

– Literature

– Internet search

Total data sets discovered: n = 140

We identified 90 relevant data sets

The Great Data Hunt

• Discovery

• Access

– Online

– Ask directly

– Literature

We received responses to 59% of our inquiries and obtained 34% of the identified data sets

41% of those responses were received within 24 hours and 29% were received within the first week

Figure: Frequency vs. Days to Response

0-20 email exchanges per data set

Figure: Frequency vs. Number of Emails

The Great Data Hunt

• Discovery

• Access

• Citation

– Literature

– Existing requirements

– Generate new

Why didn’t people share?

• Paper not published yet – 35%

• Passed the buck – 20%

• Too busy – 10%

• Medical problems – 10%

• Poor quality – 10%

Why should anyone share?

• Mandated

• Increased citation and visibility

• Early access to GISR database

• New insights

Future Work

• Incorporate data as available

• Incorporate user feedback

• Web Access

• Users’ Guide

• Manuscripts

Thank You to Data Providers

• NOAA/NOS Office of Response and Restoration
• Commonwealth Scientific and Industrial Research Organization
• Environmental Protection Commission of Hillsborough County
• National Estuarine Research Reserves
• Sarah Allan
• Kim Anderson
• Jamie Pierson
• Nan Walker
• Ed Overton
• Richard Aronson
• Ryan Moody
• Charlotte Brunner
• William Patterson
• Kyeong Park
• Kendra Daly
• Liz Kujawinski
• Jana Goldman
• Jay Lunden
• Samuel Georgian
• Leslie Wade
• Joe Montoya
• Terry Hazen
• Mandy Joye
• Richard Camilli
• Chris Reddy
• John Kessler
• David Valentine
• Tom Soniat
• Matt Tarr
• Tom Bianchi
• Tom Miller
• Elise Gornish
• Terry Wade
• Steven Lohrenz
• Dick Snyder
• Paul Montagna
• Patrick Bieber
• Wei Wu
• Mitchell Roffer
• Dongjoo Joung
• Mark Williams
• Don Blake
• Jordan Pino
• John Valentine
• Jeffrey Baguely
• Gary Ervin
• Erik Cordes
• Michaeol Perdue
• Bill Stickle
• Andrew Zimmerman
• Andrew Whitehead
• Alice Ortmann
• Alan Shiller
• Laodong Guo
• A. Ravishankara
• Ken Aikin
• Tom Ryerson
• Prabhakar Clement
• Christine Ennis
• Eric Williams
• Ed Sherwood
• Julie Bosch
• Wade Jeffrey
• Chet Pilley
• Just Cebrian
• Ambrose Bordelon

Questions?