EricKansa_PublishingandPushing_ParallelC4


  • 8/12/2019 EricKansa_PublishingandPushing_ParallelC4

    1/49

    Publishing and Pushing: Mixing Models for Communicating Research Data in Archaeology

    Sarah Whitcher Kansa, The Alexandria Archive Institute & Open Context

    Benjamin Arbuckle, University of North Carolina, Chapel Hill

    Eric C. Kansa (@ekansa), UC Berkeley D-Lab & Open Context

    Unless otherwise indicated, this work is licensed under a Creative Commons Attribution 3.0 License


    Introduction

    Challenges in Reusing Data

    1. Background

    2. Data publishing workflow

    3. Data curation and dynamism


    Need more carrots!

    1. Citation, credit, intellectually valued

    2. Research outcomes (new insights from data reuse!)


    EOL Computable Data Challenge

    (Ben Arbuckle, Sarah W. Kansa, Eric Kansa)


    Large-scale data sharing & integration for exploring the origins of farming. Funded by EOL / NEH


    1. 300,000 bone specimens

    2. Complex: dozens, up to 110 descriptive fields

    3. 34 contributors from 15 archaeological sites

    4. More than 4 person-years of effort to create the data!


    Relatively collaborative bunch; Ben Arbuckle cultivated relationships & built trust over years prior to EOL funding.


    204: Dynamics of Data Reuse when Aggregating Data through Time and Space: The Case of Archaeology and Zoology

    Elizabeth Yakel; Ixchel Faniel; Rebecca Frank


    Introduction

    Challenges in Reusing Data

    1. Background

    2. Data publishing workflow

    3. Data curation and dynamism


    1. Referenced by the US National Science Foundation and National Endowment for the Humanities for Data Management

    2. Data sharing as publishing metaphor


    Raw Data: Idiosyncratic, sometimes highly coded, often inconsistent


    Raw Data Can Be Unappetizing


    Publishing Workflow

    Improve / Enhance

    1. Consistency

    2. Context (intelligibility)


    Sometimes data is better served cooked


    - Documentation

    - Review, editing

    - Annotation


    [Figure: inconsistent recording of wild sheep across datasets. Labels shown: "Ovis orientalis", "O. orientalis", "Wild sheep", "Sheep, wild", "Sheep (wild)", "Code: 14", "Code: 15", "Code: 16", "Code: 70"]
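The variant labels above (several spellings of wild sheep plus bare numeric codes) are the kind of inconsistency an editorial pass has to reconcile. A minimal sketch of that reconciliation follows. It rests on assumptions that are not in the slides: the dataset names are invented, and the pairing of each numeric code with a dataset and with wild sheep is hypothetical, since the slide does not say which contributor used which code.

```python
import re

# Free-text variants mapped to one canonical name.
# The variants come from the slide; treating them all as one taxon is an assumption.
LABEL_TO_TAXON = {
    "ovis orientalis": "Ovis orientalis",
    "o. orientalis": "Ovis orientalis",
    "wild sheep": "Ovis orientalis",
    "sheep, wild": "Ovis orientalis",
    "sheep (wild)": "Ovis orientalis",
}

# Numeric codes only mean something inside one dataset's code book,
# so they are keyed by (dataset, code). Dataset names are hypothetical.
CODE_TO_TAXON = {
    ("dataset_a", "14"): "Ovis orientalis",
    ("dataset_b", "70"): "Ovis orientalis",
    ("dataset_c", "16"): "Ovis orientalis",
    ("dataset_d", "15"): "Ovis orientalis",
}


def normalize_label(raw: str) -> str:
    """Collapse whitespace and case so trivially different spellings match."""
    return re.sub(r"\s+", " ", raw).strip().lower()


def resolve_taxon(dataset: str, raw: str):
    """Return the canonical taxon for a raw value, or None if it needs an editor."""
    key = normalize_label(raw)
    code = re.fullmatch(r"code:\s*(\d+)", key)
    if code:
        return CODE_TO_TAXON.get((dataset, code.group(1)))
    return LABEL_TO_TAXON.get(key)


print(resolve_taxon("dataset_a", "Code: 14"))
print(resolve_taxon("dataset_b", "Sheep,  wild"))
print(resolve_taxon("dataset_a", "Code: 99"))
```

Returning `None` for anything unrecognized, rather than guessing, leaves unresolved values visible for the back-and-forth with data authors.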


    Controlled vocabulary

    Linked Data applications


    Sheep/goat: http://eol.org/pages/32609438/

    1. Needed to mint new concepts like sheep/goat

    2. Vocabularies need to be responsive for multidisciplinary applications

    http://eol.org/pages/311906/
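To illustrate why a shared concept URI matters, here is a small sketch of annotating records from two datasets with the sheep/goat URI shown above, then counting specimens by concept rather than by local wording. Only the URI comes from the slide; the dataset names, local terms, field names, and specimen ids are hypothetical.

```python
from collections import Counter

# URI taken from the slide.
SHEEP_GOAT_URI = "http://eol.org/pages/32609438/"

# Hypothetical mapping from each dataset's local wording to the shared concept.
LOCAL_TERM_TO_URI = {
    ("dataset_a", "Ovis/Capra"): SHEEP_GOAT_URI,
    ("dataset_b", "sheep or goat"): SHEEP_GOAT_URI,
}


def annotate(dataset: str, record: dict) -> dict:
    """Return a copy of the record with a 'taxon_uri' field (None if unmapped)."""
    uri = LOCAL_TERM_TO_URI.get((dataset, record["taxon"]))
    return {**record, "taxon_uri": uri}


records = [
    ("dataset_a", {"specimen_id": "a-1", "taxon": "Ovis/Capra"}),
    ("dataset_b", {"specimen_id": "b-7", "taxon": "sheep or goat"}),
    ("dataset_b", {"specimen_id": "b-8", "taxon": "unidentified"}),
]

annotated = [annotate(dataset, record) for dataset, record in records]

# Counting by URI aggregates across datasets despite different local terms.
counts = Counter(record["taxon_uri"] for record in annotated)
print(counts[SHEEP_GOAT_URI])
```

The original `taxon` value is kept alongside the URI, so the annotation adds context without overwriting what the contributor recorded.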

    Linking to UBERON

    1. Needed a controlled vocabulary for bone anatomy

    2. Better data modeling than common in zooarchaeology, adds quality.


    Linking to UBERON

    1. Models links between anatomy, developmental biology, and genetics

    2. Unexpected links between the Humanities and Bioinformatics!


    7000 BC (many pigs, cattle)

    7500 BC (sheep + goat dominate, few pigs, few cattle)

    6500 BC (few pigs, mixing with wild animals?)

    8000 BC (cattle, pigs, sheep + goats)

    Not a neat model of progress to adopt a more productive economy. Very different, sometimes piecemeal adoption in different regions.

    Separate coastal and inland routes for the spread of domestic animals, over a 1000-year time period.


    Easy to Align

    1. Animal taxonomy

    2. Bone anatomy

    3. Sex determinations

    4. Side of the animal

    5. Fusion (bone growth, up to a point)


    Hard to Align (poor modeling, recording)

    1. Tooth wear (age)

    2. Fusion data

    3. Measurements

    Despite common research methods!!


    Under-the-hood exposure will lead to better data documentation practices?


    Nobody expected their data to see wider scrutiny either...


    Professional expectations for data reuse

    1. Need better data modeling (than feasible with, cough, Excel)

    2. Data validation, normalization

    3. Requires training & incentives for researchers to care more about the quality of their data!
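As one illustration of the validation point above, the sketch below checks a specimen record against a small set of allowed values and flags measurements that do not parse as positive numbers. The field names, the allowed values, and the millimetre unit are all assumptions made for the example; they are not the project's actual schema.

```python
# Hypothetical controlled values for two of the fields the talk calls easy to align.
ALLOWED_VALUES = {
    "side": {"left", "right", "unknown"},
    "sex": {"female", "male", "unknown"},
}


def validate_record(record: dict) -> list:
    """Return a list of human-readable problems; an empty list means the record passed."""
    problems = []

    # Controlled fields: normalize case and whitespace, then check membership.
    for field, allowed in ALLOWED_VALUES.items():
        value = str(record.get(field, "")).strip().lower()
        if value not in allowed:
            problems.append(f"{field}: unexpected value {record.get(field)!r}")

    # Measurement (assumed to be in millimetres): must parse as a positive number.
    raw = record.get("measurement_mm")
    if raw is not None:
        try:
            if float(raw) <= 0:
                problems.append(f"measurement_mm: not positive ({raw!r})")
        except (TypeError, ValueError):
            problems.append(f"measurement_mm: not a number ({raw!r})")

    return problems


clean = {"side": "right", "sex": "male", "measurement_mm": "12.5"}
messy = {"side": "Left ", "sex": "F?", "measurement_mm": "23,4"}

print(validate_record(clean))
for problem in validate_record(messy):
    print(problem)
```

Reporting problems instead of silently coercing values keeps the decision with an editor or the data author, which fits the review-and-editing workflow described earlier.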


    Data are challenging!

    1. Decoding takes 10x longer

    2. Data management plans should also cover data modeling, quality control (esp. validation)

    3. More work needed modeling research methods (esp. sampling)

    4. Editing, annotation requires lots of back-and-forth with data authors

    5. Data needs investment to be useful!


    Introduction

    Challenges in Reusing Data

    1. Background

    2. Data publishing workflow

    3. Data curation and dynamism


    Investing in Data is a Continual Need

    1. Data and code co-evolve. New visualizations, analysis may reveal unseen problems in data.

    2. Data and metadata change routinely (revised stratigraphy requires ongoing updates to data in this analysis)

    3. Problems, interpretive issues in data (and annotations) keep cropping up.

    4. Is "publishing" a bad metaphor implying a static product?


    Data sharing as publication

    Data sharing as open source release cycles?


    Data sharing as publication

    AND

    Data sharing as open source release cycles


    One does not simply walk into Mordor

    Academia and share usable data

    Image Credit: Copyright New Line Cinema


    Final Thoughts

    Data require intellectual investment, methodological and theoretical innovation.

    Institutional structures poorly configured to support data-powered research

    New professional roles needed, but who will pay for it?


    Thank you!

    IDCC reviewers (excellent, very helpful comments!)