8/12/2019 EricKansa_PublishingandPushing_ParallelC4
Publishing and Pushing: Mixing Models for Communicating Research Data in Archaeology
Sarah Whitcher Kansa, The Alexandria Archive Institute & Open Context
Benjamin Arbuckle, University of North Carolina, Chapel Hill
Eric C. Kansa (@ekansa), UC Berkeley D-Lab & Open Context
Unless otherwise indicated, this work is licensed under a Creative Commons Attribution 3.0 License
Introduction
Challenges in Reusing Data
1. Background
2. Data publishing workflow
3. Data curation and dynamism
Need more carrots!
1. Citation, credit, intellectually valued
2. Research outcomes (new insights from data reuse!)
EOL Computable Data Challenge
(Ben Arbuckle, Sarah W. Kansa, Eric Kansa)
Large-scale data sharing & integration for exploring the origins of farming.
Funded by EOL / NEH
1. 300,000 bone specimens
2. Complex: dozens, up to 110 descriptive fields
3. 34 contributors from 15 archaeological sites
4. More than 4 person-years of effort to create the data!
Relatively collaborative bunch: Ben Arbuckle cultivated relationships & built trust over years prior to EOL funding.
204: Dynamics of Data Reuse when Aggregating Data through Time and
Space: The Case of Archaeology and Zoology
Elizabeth Yakel; Ixchel Faniel; Rebecca Frank
Introduction
Challenges in Reusing Data
1. Background
2. Data publishing workflow
3. Data curation and dynamism
1. Referenced by US National Science Foundation and National Endowment for the Humanities for Data Management
2. Data sharing as publishing metaphor
Raw Data: Idiosyncratic, sometimes highly coded, often inconsistent
Raw Data Can Be Unappetizing
Publishing Workflow
Improve / Enhance
1. Consistency
2. Context (intelligibility)
Sometimes data is better served cooked
- Documentation
- Review, editing
- Annotation
[Figure: one taxon recorded many ways across datasets. Labels: "Ovis orientalis", "O. orientalis", "Wild sheep", "Sheep, wild", "Sheep (wild)". Codes: 14, 15, 16, 70.]
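The variation above (one taxon entered under several labels and several numeric codes) is the kind of problem a lookup table can address. The following is a minimal Python sketch, not the project's actual tooling. The dataset names are hypothetical, and the pairing of codes 14, 15, 16, and 70 with particular datasets is assumed purely for illustration, since the slide does not say which contributor used which code.

```python
import re

# Canonical concept every variant should resolve to.
WILD_SHEEP = "Ovis orientalis"

# Free-text variants, stored in normalized form (lowercase, no punctuation).
LABEL_VARIANTS = {
    "ovis orientalis": WILD_SHEEP,
    "o orientalis": WILD_SHEEP,
    "wild sheep": WILD_SHEEP,
    "sheep wild": WILD_SHEEP,
}

# Numeric codes only have meaning within one dataset's coding sheet.
# Dataset names and code assignments here are hypothetical.
CODE_SHEETS = {
    "site_a": {"14": WILD_SHEEP},
    "site_b": {"70": WILD_SHEEP},
    "site_c": {"16": WILD_SHEEP},
    "site_d": {"15": WILD_SHEEP},
}


def normalize_label(raw):
    """Lowercase, replace punctuation with spaces, and trim the result."""
    cleaned = re.sub(r"[^a-z0-9]+", " ", raw.lower())
    return cleaned.strip()


def resolve_taxon(dataset, raw):
    """Return the canonical taxon for a raw entry, or None if unrecognized."""
    key = normalize_label(raw)
    by_code = CODE_SHEETS.get(dataset, {})
    if key in by_code:
        return by_code[key]
    return LABEL_VARIANTS.get(key)
```

Keeping codes in per-dataset sheets matters: `resolve_taxon("site_b", "14")` returns `None` rather than guessing, because code 14 is only defined for the hypothetical `site_a`.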
Controlled vocabulary
Linked Data applications
Sheep/goat: http://eol.org/pages/32609438/
1. Needed to mint new concepts like sheep/goat
2. Vocabularies need to be responsive for multidisciplinary applications
http://eol.org/pages/311906/
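One way to picture the annotation step: each record keeps its original label and gains a stable concept URI. This is a hedged sketch, not Open Context's actual implementation. The field names, the `annotate` function, and the sample record are assumptions; the only URI taken from the slides is the sheep/goat page.

```python
# URI for the minted "sheep/goat" concept, as given on the slide.
SHEEP_GOAT_URI = "http://eol.org/pages/32609438/"

# Hypothetical lookup from a normalized contributor label to a concept URI.
CONCEPT_URIS = {
    "sheep/goat": SHEEP_GOAT_URI,
}


def annotate(record, label_field="taxon"):
    """Return a copy of the record with a concept URI added when one is known.

    The original label is kept so the contributor's wording is never lost.
    """
    annotated = dict(record)
    label = str(record.get(label_field, "")).strip().lower()
    annotated["taxon_uri"] = CONCEPT_URIS.get(label)  # None means "not yet linked"
    return annotated


# Illustrative record; the identifier and values are made up.
specimen = {"id": "example-1", "taxon": "Sheep/Goat", "element": "humerus"}
linked = annotate(specimen)
```

Returning a copy rather than editing the record in place keeps the contributor's raw data separate from the editorial annotation layer.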
Linking to UBERON
1. Needed a controlled vocabulary for bone anatomy
2. Better data modeling than common in zooarchaeology, adds quality.
Linking to UBERON
1. Models links between anatomy, developmental biology, and genetics
2. Unexpected links between the Humanities and Bioinformatics!
7000 BC (many pigs, cattle)
7500 BC (sheep + goat dominate, few pigs, few cattle)
6500 BC (few pigs, mixing with wild animals?)
8000 BC (cattle, pigs, sheep + goats)
Not a neat model of progress to adopt a more productive economy. Very different, sometimes piecemeal adoption in different regions.
Separate coastal and inland routes for the spread of domestic animals, over a 1000-year time period.
Easy to Align
1. Animal taxonomy
2. Bone anatomy
3. Sex determinations
4. Side of the animal
5. Fusion (bone growth, up to a point)
Hard to Align (poor modeling, recording)
1. Tooth wear (age)
2. Fusion data
3. Measurements
Despite common research methods!!
Will "under the hood" exposure lead to better data documentation practices?
Nobody expected their data to see wider scrutiny either...
Professional expectations for data reuse
1. Need better data modeling (than feasible with, cough, Excel)
2. Data validation, normalization
3. Requires training & incentives for researchers to care more about quality of their data!
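Validation of the kind item 2 calls for can be as simple as checking each field against an agreed list of values before data are shared. A minimal sketch follows; the field names and allowed values are illustrative assumptions, not a standard used by the project.

```python
# Allowed values per field. These lists are assumptions for illustration.
ALLOWED = {
    "side": {"left", "right", "unknown"},
    "sex": {"female", "male", "unknown"},
    "fusion": {"fused", "fusing", "unfused", "unknown"},
}


def validate_row(row):
    """Return a list of human-readable problems found in one record."""
    problems = []
    for field, allowed in ALLOWED.items():
        value = row.get(field)
        if value is None or str(value).strip() == "":
            problems.append(f"{field}: missing value")
            continue
        normalized = str(value).strip().lower()
        if normalized not in allowed:
            problems.append(f"{field}: unexpected value {value!r}")
    return problems


def validate_table(rows):
    """Map row index to its list of problems, skipping rows that pass."""
    report = {}
    for index, row in enumerate(rows):
        problems = validate_row(row)
        if problems:
            report[index] = problems
    return report
```

A report keyed by row index gives data authors something concrete to respond to, which suits the back-and-forth that editing requires.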
Data are challenging!
1. Decoding takes 10x longer
2. Data management plans should also cover data modeling, quality control (esp. validation)
3. More work needed modeling research methods (esp. sampling)
4. Editing, annotation requires lots of back-and-forth with data authors
5. Data needs investment to be useful!
Introduction
Challenges in Reusing Data
1. Background
2. Data publishing workflow
3. Data curation and dynamism
Investing in Data is a Continual Need
1. Data and code co-evolve. New visualizations, analysis may reveal unseen problems in data.
2. Data and metadata change routinely (revised stratigraphy requires ongoing updates to data in this analysis)
3. Problems, interpretive issues in data (and annotations) keep cropping up.
4. Is publishing a bad metaphor implying a static product?
Data sharing as publication
Data sharing as open source release cycles?
Data sharing as publication
AND
Data sharing as open source release cycles
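Treating a dataset like software with release cycles implies recording what changed between versions. The following is a rough sketch of that idea, assuming each release is a dictionary of records keyed by a stable identifier. The `diff_releases` name and the sample values are hypothetical.

```python
def diff_releases(old, new):
    """Compare two releases, each a dict of record id -> record dict.

    Returns the ids added, removed, and changed, suitable for a changelog.
    """
    old_ids, new_ids = set(old), set(new)
    changed = sorted(
        record_id
        for record_id in old_ids & new_ids
        if old[record_id] != new[record_id]
    )
    return {
        "added": sorted(new_ids - old_ids),
        "removed": sorted(old_ids - new_ids),
        "changed": changed,
    }


# Made-up example: a revised stratigraphy moves one specimen to another phase.
release_1 = {
    "b1": {"taxon": "sheep/goat", "phase": "II"},
    "b2": {"taxon": "cattle", "phase": "II"},
}
release_2 = {
    "b1": {"taxon": "sheep/goat", "phase": "III"},
    "b2": {"taxon": "cattle", "phase": "II"},
    "b3": {"taxon": "pig", "phase": "III"},
}
changes = diff_releases(release_1, release_2)
```

A changelog like this lets each release be cited as a fixed publication while the dataset itself keeps evolving.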
One does not simply walk into Mordor
Academia and share usable data
Image Credit: Copyright New Line Cinema
Final Thoughts
Data require intellectual investment, methodological and theoretical innovation.
Institutional structures poorly configured to support data-powered research
New professional roles needed, but who will pay for it?
Thank you!
IDCC reviewers (excellent, very helpful comments!)