Data Life Cycle: Introduction, Definitions and Considerations EUDAT, Sept. 25, 2014 Prof. Peter Fox ([email protected], @taswegian, #twcrpi) Tetherless World Constellation Chair, Earth and Environmental Science/ Computer Science/ Cognitive Science/ IT and Web Science) Rensselaer Polytechnic Institute, Troy, NY USA

Life Cycle


Definitions

• Data (management) life-cycle broad elements:
– Acquisition: Process of recording or generating a concrete artefact from the concept (see transduction)
– Curation: The activity of managing the use of data from its point of creation to ensure it is available for discovery and re-use in the future
– Preservation: Process of retaining usability of data in some source form for intended and unintended use
• Stewardship: Process of maintaining integrity for acquisition, curation, preservation
• BUT…

Definitions ctd.

• (Data) Management: Process of arranging for discovery, access and use of data, information and all related elements. Also oversees or effects control of processes for acquisition, curation, preservation and stewardship. Involves fiscal and intellectual responsibility.

Digital Curation Centre

MIT DDI Alliance Life Cycle

Data-Information-Knowledge Ecosystem

[Figure: data-information-knowledge ecosystem between Producers and Consumers. Labels: Data, Information, Knowledge, Creation, Gathering, Presentation, Organization, Integration, Conversation, Context, Experience.]

Data Life Cycle embedded in Research Life Cycle

• Information Life Cycle

• Knowledge Life Cycle

Type of knowledge created
• Tacit (created and stored informally):
– Human memory
– Localized, e.g. the hard drive of a computer
– Movement of tacit information into a formalized structure
• Explicit (created and stored formally):
– Network shared
– Network Web site/intranet
– Informal knowledge-management system
– Document-management system
– Formal KM system

Curation…

[Figure: Producers and Consumers. Labels: Quality Control, Quality Assessment, Fitness for Purpose, Fitness for Use, Trustee, Trustor.]

Acquisition meets science

Enough with the unlabelled ARROWS and INTERFACES!

Workflows and Life Cycles

• Yann covered much of this, but the question remains:
– Why is the life cycle not “just” a workflow?
– Well, it is, sort of…
• Workflows give internal “provenance”
• To capture embedding of data life cycle in research life cycle + external provenance

20080602 Fox VSTO et al.

• Provenance in this data pipeline/life cycle?
• Provenance is metadata in context
• What context?
– Who you are?
– What you are asking?
– What you will use the answer for?

Modeling a Provenance Use Case – data workflow
• What calibrations have been applied to this image?

[Figure: concept diagram in three groups: Provenance concepts, Solar Science concepts, Data Processing concepts. Node labels: Data Product, Raw Image, Data Filtering Process, Junk Data Filter, Optics Calibration Process, Data Calibration Process, Flat-field Calibration, Angle of Incidence Calibration.]

What calibrations have been applied to this image?
• We construct a query that returns any individuals with type Calibration used as the InferenceRule in the justification of any artifact the current artifact was derived from.
• We assume that any calibration applied to an artifact the current artifact was derived from can also be considered as 'applied' to the current artifact, and that the wasDerivedFrom property is transitive.
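The query described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the PML query used in the talk: the triples are hand-written, the node names are adapted from the slide's figure, and each rule is recorded directly on the artifact it was applied to (in PML it would hang off a justification node). Which artifact carries which calibration is an assumption. The function names are hypothetical.

```python
# Hand-written (subject, predicate, object) triples. Node names follow the
# slide's figure; the placement of each calibration is an ASSUMPTION.
TRIPLES = {
    ("#Image", "wasDerivedFrom", "#Intermediate2"),
    ("#Intermediate2", "wasDerivedFrom", "#Intermediate1"),
    ("#Intermediate1", "wasDerivedFrom", "#RawImage"),
    # Simplification: rule attached to the artifact it was applied to.
    ("#RawImage", "hasInferenceRule", "#FlatFieldCalibration"),
    ("#Intermediate1", "hasInferenceRule", "#AngleOfIncidenceCalibration"),
    ("#FlatFieldCalibration", "rdf:type", "Calibration"),
    ("#AngleOfIncidenceCalibration", "rdf:type", "Calibration"),
}


def objects(subject, predicate):
    """Return every object o such that (subject, predicate, o) is a triple."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def derived_from(artifact):
    """Transitive closure of wasDerivedFrom, excluding the artifact itself."""
    seen, stack = set(), [artifact]
    while stack:
        for parent in objects(stack.pop(), "wasDerivedFrom"):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def calibrations_applied(artifact):
    """Rules of type Calibration used on any artifact this one derives from."""
    found = set()
    for ancestor in derived_from(artifact):
        for rule in objects(ancestor, "hasInferenceRule"):
            if "Calibration" in objects(rule, "rdf:type"):
                found.add(rule)
    return found


print(sorted(calibrations_applied("#Image")))
```

Because the closure follows wasDerivedFrom through every intermediate artifact, a calibration applied to the raw image is still reported for the final image, which is exactly the transitivity assumption stated in the second bullet.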

[Figure: derivation chain. Artifact nodes: #Image, #Intermediate2, #Intermediate1, #RawImage, linked by three wasDerivedFrom edges; justification nodes: #_A2, #_A1, #_A0. Two hasInferenceRule edges point to #Angle of Incidence Calibration and #Flat Field Calibration, each with rdf:type Calibration.]

Concept Alignment (PML)

[Figure: alignment of domain concepts with PML concepts. Labels: Data Capture, Instrument, Data Product, Data Calibration, Raw Data, Observation Period, Calibration, NodeSet, Justification, Conclusion, hasAntecedentList, hasSourceUsage, SourceUsage, Source, DateTime, Rule, Engine.]

Plug for provenance in life cycle
• Provenance concepts describe how domain concepts are related
• Domain and provenance models should be independent, but aligned
• Aligning with a well-supported provenance model can enhance interoperability and tool support
• Aligned knowledge base supports complex multi-domain query and search

Life cycle is a complex issue but no longer intractable
• Must be
– Managed
– Modelled
– Documented
– Contextualized

Information models

Provenance
• As part of the use case, but also often outside it (pre-condition, trigger, …)