
Benjamin Perry, Venkata Kambhampaty, Kyle Brumsted, Lars Vilhuber, William Block

Crowdsourcing DDI Development: New Features from the CED2AR Project

What is CED2AR?

• Part of the NSF Census Research Network (NCRN) (Grant #1131848)
• Lightweight, DDI-driven web application
• Enables search, browsing, and editing across codebooks
• Provides an open API for developers
• Live example at demo.ncrn.cornell.edu
EDDI 2014 “Collaborative editing…”

• Emphasis on collaborative editing (small set of users)
  – Online editor
  – Versioned and tracked metadata through Git
  – Tied into external authentication frameworks

Now

• Support crowdsourced DDI curation through CED2AR
  – Accommodate more users
  – Allow for application-specific customization
  – Create incentives and guidance for users
  – Abstract away technical barriers

Starting point here

• Initial metadata (DDI) has been created and ingested into a CED²AR instance
• Metadata may be
  – Incomplete (valid DDI but empty or non-informative fields)
  – Lacking user feedback (on value or constraints of variables)
• Assumption:
  – Archivist is not the only specialist on a particular dataset
  – Users collectively have information that is not initially included in metadata
User Workflow

1. User searches through CED2AR or an external search engine
2. User discovers data relevant to their query
3. User can choose to contribute structured or unstructured documentation for datasets
  – No DDI knowledge required: the user documents individual fields without needing to know how they fit into a particular metadata structure
  – May involve creating links (provenance) to other datasets

Attracting Users

1. Search engine optimization enhancements to DDI
2. Exposing community contributions

Retaining Users

3. Flexible authentication
4. Easy-to-use editor
5. Metadata scoring
6. Tracking and identifying community contributions

Search Engine Optimization

• Expanding the interoperability of DDI
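The slide does not say how DDI is exposed to search engines. One widely used option is to embed schema.org `Dataset` markup (JSON-LD) in each codebook page, so the sketch below assumes that approach. The DDI Codebook 2.5 namespace, the sample codebook, the page URL, and the function name `ddi_to_jsonld` are illustrative assumptions, not CED2AR's actual implementation.

```python
import json
import xml.etree.ElementTree as ET

# Assumption: the codebook uses the DDI Codebook 2.5 namespace.
DDI_NS = {"ddi": "ddi:codebook:2_5"}

# Hypothetical, minimal codebook used only for illustration.
SAMPLE_DDI = """<codeBook xmlns="ddi:codebook:2_5">
  <stdyDscr>
    <citation><titlStmt><titl>Example Survey 2014</titl></titlStmt></citation>
    <stdyInfo><abstract>Hypothetical abstract text.</abstract></stdyInfo>
  </stdyDscr>
</codeBook>"""


def ddi_to_jsonld(ddi_xml: str, page_url: str) -> str:
    """Build a schema.org Dataset description from a DDI codebook."""
    root = ET.fromstring(ddi_xml)
    title = root.findtext(
        "ddi:stdyDscr/ddi:citation/ddi:titlStmt/ddi:titl",
        default="", namespaces=DDI_NS)
    abstract = root.findtext(
        "ddi:stdyDscr/ddi:stdyInfo/ddi:abstract",
        default="", namespaces=DDI_NS)
    doc = {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": title.strip(),
        "description": abstract.strip(),
        "url": page_url,
    }
    return json.dumps(doc, indent=2)
```

The resulting string could be placed in a `<script type="application/ld+json">` element of the codebook page.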

Authentication

• Support OpenID and OAuth2
  – Currently using Google with OAuth2
  – Developing connectors to work with additional providers
• CED2AR handles identity management
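As a rough illustration of the first step of an OAuth2 authorization-code flow with Google, the sketch below builds the URL to which a user would be redirected. The client ID, redirect URI, and the function name `build_authorization_url` are placeholders; how CED2AR actually wires up its providers is not shown in the slides.

```python
import secrets
from urllib.parse import urlencode, urlparse, parse_qs

# Google's OAuth2 authorization endpoint.
GOOGLE_AUTH_ENDPOINT = "https://accounts.google.com/o/oauth2/v2/auth"


def build_authorization_url(client_id: str, redirect_uri: str, state: str = ""):
    """Return (url, state) for redirecting a user to Google's consent page.

    `state` is an opaque anti-CSRF token that the application must compare
    against the value returned on the redirect back.
    """
    state = state or secrets.token_urlsafe(16)
    params = {
        "client_id": client_id,        # placeholder: issued by the provider
        "redirect_uri": redirect_uri,  # placeholder: application callback URL
        "response_type": "code",       # authorization-code flow
        "scope": "openid email",       # enough to identify the user by email
        "state": state,
    }
    return f"{GOOGLE_AUTH_ENDPOINT}?{urlencode(params)}", state
```

Supporting an additional provider would mostly mean swapping the endpoint and scope values, which is one reason to keep them as data rather than hard-coding them into the flow.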

Editing

• Automatic validation, and editor for rich content


Editing

• Allows for ASCII Math

Editing

• Growing support for additional DDI fields, exposed or not

Metadata Scoring

• Exposing sparse documentation
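The slides do not give the scoring formula. A minimal sketch of one possible score, assuming DDI Codebook 2.5 and counting only whether each variable has a non-empty label (`labl`) and description (`txt`), might look like this. The field choice, the equal weighting, and the sample codebook are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: the codebook uses the DDI Codebook 2.5 namespace.
NS = {"ddi": "ddi:codebook:2_5"}

# Assumption: these two fields are what "documented" means here.
SCORED_FIELDS = ("labl", "txt")

# Hypothetical codebook with well, partly, and undocumented variables.
SAMPLE = """<codeBook xmlns="ddi:codebook:2_5"><dataDscr>
  <var name="age"><labl>Age in years</labl><txt>Respondent age at interview.</txt></var>
  <var name="inc"><labl>Income</labl></var>
  <var name="x17"/>
</dataDscr></codeBook>"""


def score_variable(var: ET.Element) -> float:
    """Fraction of scored fields that are present and non-blank."""
    filled = sum(
        1 for name in SCORED_FIELDS
        if var.findtext(f"ddi:{name}", default="", namespaces=NS).strip()
    )
    return filled / len(SCORED_FIELDS)


def score_codebook(ddi_xml: str) -> dict:
    """Map each variable name to its documentation score in [0, 1]."""
    root = ET.fromstring(ddi_xml)
    return {
        var.get("name"): score_variable(var)
        for var in root.iterfind("ddi:dataDscr/ddi:var", NS)
    }
```

Sorting variables by ascending score gives contributors a list of the sparsest entries to work on first.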


User Contributions

Versioning

• Uses Git, a distributed version control system
• Every aspect of the system is configurable
  – Scheduled tasks check for changes
  – Once changes exceed a threshold, they are pushed
  – Pending changes are pushed after a time limit or on demand
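The push rules above can be sketched as a small decision function that a scheduled task would call. The names `PushPolicy` and `should_push` and the default values are hypothetical; the slides state only that the threshold and time limit are configurable.

```python
from dataclasses import dataclass


@dataclass
class PushPolicy:
    """Configurable limits (default values are illustrative only)."""
    change_threshold: int = 10        # push once pending changes exceed this
    max_wait_seconds: float = 3600.0  # push pending changes after this long


def should_push(policy: PushPolicy,
                pending_changes: int,
                seconds_since_oldest_change: float,
                on_demand: bool = False) -> bool:
    """Decide whether pending local commits should be pushed to the remote."""
    if pending_changes == 0:
        return False  # nothing to push, even on demand
    if on_demand:
        return True   # explicit request
    if pending_changes > policy.change_threshold:
        return True   # changes exceed the threshold
    # Otherwise push only when the time limit has elapsed.
    return seconds_since_oldest_change >= policy.max_wait_seconds
```

Keeping the decision separate from the Git calls makes the policy easy to test and to reconfigure per instance.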

Architecture

[Diagram: Master Branch (Official version) with Codebook 1.0; User Contributed Branch with Codebook 1.0]

1. User gets copy of DDI to edit

Architecture

[Diagram: Master Branch (Official version) with Codebook 1.0; User Contributed Branch with Codebook 1.0, followed by Codebook 1.0 rev 1, rev 2, through rev N]

1. User gets copy of DDI to edit
2. Each edit is versioned

Architecture

[Diagram: Master Branch (Official version) with Codebook 1.0 and Codebook 1.1; User Contributed Branch with Codebook 1.0, followed by Codebook 1.0 rev 1, rev 2, through rev N]

1. User gets copy of DDI to edit
2. Each edit is versioned
3. Data provider merges user’s edits back into official DDI
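The curated workflow on this slide can be modeled in a few lines. This is a toy in-memory model, not Git: `Branch`, `fork`, `edit`, and `merge` are hypothetical names, and "merge" here simply adopts the user's latest revision, whereas a real Git merge would reconcile histories.

```python
from dataclasses import dataclass, field


@dataclass
class Branch:
    """A named sequence of (label, content) codebook revisions."""
    name: str
    revisions: list = field(default_factory=list)

    @property
    def head(self):
        return self.revisions[-1]


def fork(master: Branch, name: str) -> Branch:
    """Step 1: the user gets a copy of the official DDI to edit."""
    return Branch(name, [master.head])


def edit(branch: Branch, new_content: str) -> None:
    """Step 2: each edit becomes a new, labeled revision."""
    base_label = branch.revisions[0][0]
    rev_no = len(branch.revisions)
    branch.revisions.append((f"{base_label} rev {rev_no}", new_content))


def merge(master: Branch, user: Branch, new_label: str) -> None:
    """Step 3: the data provider accepts the user's latest revision."""
    master.revisions.append((new_label, user.head[1]))
```

For example, forking from "Codebook 1.0", editing twice, and merging as "Codebook 1.1" leaves the user branch at "Codebook 1.0 rev 2" and the master branch at "Codebook 1.1".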

Architecture

Architecture

[Diagram components: Web Application, Server, Local Repository, Database, Remote Repository]

Architecture

[Diagram components: Server, Remote Repository, CED2AR Instance]

Architecture

[Diagram components: Remote Repository, four CED2AR Instances]

Architecture

[Diagram components: Remote Repository, four CED2AR Instances, one CED2AR Instance (Official)]

Remote Location

• Our implementation uses Bitbucket
• Commit messages describe changes
• Users linked by email address
• Commit hashes are stored on CED2AR
• Remote synchronization is optional
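A minimal sketch of how stored commit hashes and email addresses could tie edits to contributors is shown below. The record layout and the names `CommitRecord` and `ContributionLog` are assumptions for illustration; the slides do not describe the storage schema, and the hashes in any usage would be whatever Git reports for the commit.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CommitRecord:
    commit_hash: str   # hash reported by Git for the commit
    author_email: str  # links the commit to a user
    message: str       # describes the change


class ContributionLog:
    """Index of commit records keyed by contributor email."""

    def __init__(self) -> None:
        self._by_email = defaultdict(list)

    def record(self, rec: CommitRecord) -> None:
        # Emails are compared case-insensitively.
        self._by_email[rec.author_email.lower()].append(rec)

    def contributions_by(self, email: str) -> list:
        """All recorded commits for one contributor, oldest first."""
        return list(self._by_email.get(email.lower(), []))
```

Because only hashes and emails are stored locally, this kind of index still works when remote synchronization is switched off.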


Remote Location


Tracking Changes

Continued Work: Improving merge control

[Diagram: Master Branch (Official version) with Codebook 1.0 and Codebook 1.1; User Contributed Branch with Codebook 1.0, followed by Codebook 1.0 rev 1, rev 2, through rev N]

1. User gets copy of DDI to edit
2. Each edit is versioned
3. Data provider merges user’s edits back into official DDI

Continued Work: The uncontrolled merge

• Workflow as described assumes a metadata curator merges information
• Within the limits of a 24-hour day: what’s the likelihood that this process scales?
• Alternative: “wiki” methodology

Architecture (alternate)

[Diagram: Master Branch (Official version) with Codebook 1.0; Wiki Branch (Community version) with Codebook 1.0]

Architecture (alternate)

[Diagram: Master Branch (Official version) with Codebook 1.0; Wiki Branch (Community version) with Codebook 1.0; User Branches; step markers 1 and 2]

Users pull from wiki branch into any instance of CED2AR

Architecture (alternate)

[Diagram: Master Branch (Official version) with Codebook 1.0; Wiki Branch (Community version) with Codebook 1.0 and Codebook rev 1; User Branches; step markers 1 and 2]

Users push back to branch manually

Architecture (alternate)

[Diagram: Master Branch (Official version) with Codebook 1.0; Wiki Branch (Community version) with Codebook 1.0 and Codebook rev 1; User Branches; step markers 1, 2, and 3]

New users work off most recent revision by default

Architecture (alternate)

[Diagram: Master Branch (Official version) with Codebook 1.0; Wiki Branch (Community version) with Codebook 1.0, Codebook rev 1, and Codebook rev X; User Branches; step markers 1, 2, and 3]

Architecture (alternate)

[Diagram: Master Branch (Official version) with Codebook 1.0; Wiki Branch (Community version) with Codebook 1.0, Codebook rev 1, and Codebook rev X; User Branches; step markers 1, 2, and 3]

User is responsible for merging
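The wiki-style alternative can be sketched as an in-memory model in which a push is accepted only if it was based on the current head; otherwise the user must pull the latest revision and merge before pushing again. `WikiBranch`, `StaleBaseError`, and `merge_onto_latest` are hypothetical names, the codebook is reduced to a dictionary of variable descriptions, and the merge rule (the user's changes overwrite the same keys) is a simplifying assumption.

```python
class StaleBaseError(Exception):
    """Raised when a push is based on an outdated revision."""


class WikiBranch:
    """Community branch holding successive codebook revisions."""

    def __init__(self, initial: dict) -> None:
        self.revisions = [dict(initial)]

    @property
    def head_index(self) -> int:
        return len(self.revisions) - 1

    def pull(self):
        """Return (revision index, copy of the most recent revision)."""
        return self.head_index, dict(self.revisions[-1])

    def push(self, base_index: int, content: dict) -> int:
        """Append a revision if it was based on the current head."""
        if base_index != self.head_index:
            raise StaleBaseError("pull and merge before pushing")
        self.revisions.append(dict(content))
        return self.head_index


def merge_onto_latest(wiki: WikiBranch, my_changes: dict) -> int:
    """User-side merge: re-apply one's own changes on the latest revision."""
    base, latest = wiki.pull()
    latest.update(my_changes)
    return wiki.push(base, latest)
```

In this model no curator is involved: a rejected push is the signal that the contributor has merging to do.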

Architecture (alternate)

[Diagram: Master Branch (Official version) with Codebook 1.0; Wiki Branch (Crowdsource version) with Codebook rev X]

CED²AR User Interface exposes both versions (with attribution)

Continued Work: Improving merge control

• Merging crowd-sourced content back into official documentation


Thank you!

Questions?

[email protected]

ncrn.cornell.edu

github.com/ncrncornell


Extra slides

Continued Work: Facilitating Editing

• Tagging variables with a controlled vocabulary and a folksonomy
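One way to combine the two kinds of tags is to normalize whatever a user types and then check it against the controlled list, keeping everything else as free-form folksonomy tags. The vocabulary terms and the function names below are invented for illustration; the slides do not specify either.

```python
# Hypothetical controlled vocabulary; a real one would come from a curator.
CONTROLLED_VOCABULARY = frozenset({"income", "employment", "demographics"})


def normalize(tag: str) -> str:
    """Lower-case a tag and collapse surrounding and repeated whitespace."""
    return " ".join(tag.strip().lower().split())


def classify_tags(tags, vocabulary=CONTROLLED_VOCABULARY) -> dict:
    """Split user-supplied tags into controlled terms and folksonomy terms."""
    controlled, folksonomy = [], []
    for raw in tags:
        tag = normalize(raw)
        if not tag:
            continue  # ignore blank input
        target = controlled if tag in vocabulary else folksonomy
        if tag not in target:
            target.append(tag)  # keep first occurrence, drop duplicates
    return {"controlled": controlled, "folksonomy": folksonomy}
```

Frequently used folksonomy tags are natural candidates for later promotion into the controlled vocabulary.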


Ingest Workflow