Data Quality at the Scale of Aggregation


DATA QUALITY AT THE SCALE OF AGGREGATION

IF WE ALL USE STANDARDS, WHY IS THE DATA SO CRAP IN THE END?

QUALITY IS CONTEXTUAL

What is the “context” of aggregation? Specifically, DPLA’s aggregation…

• Heterogeneous
• Basic metadata
• Reliance on metadata vs. text
• Reliance on item-level metadata

DATA ISSUES IN DPLA

Content Issues
• Meaningless values
• Missing values
• Confusing values
• Incomplete values

Technical Issues
• Granularity
• Inappropriate values
• Lack of normalization
• Noisy data
• Lack of standards
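
As a rough illustration of the content issues listed above, a check for missing and meaningless values might look like the following sketch. The field names, the placeholder list, and the function name `find_content_issues` are all assumptions made for illustration; they are not DPLA's actual schema or tooling.

```python
# Hypothetical placeholder strings that carry no information; a real
# aggregator would curate this list from its own data.
PLACEHOLDER_VALUES = {"", "unknown", "n/a", "none", "untitled", "[s.n.]"}

# Hypothetical set of fields to inspect.
FIELDS_TO_CHECK = ("title", "creator", "date", "rights")


def find_content_issues(record):
    """Return a dict mapping field name to 'missing' or 'meaningless'."""
    issues = {}
    for field in FIELDS_TO_CHECK:
        value = record.get(field)
        if value is None:
            # The field is absent from the record entirely.
            issues[field] = "missing"
        elif str(value).strip().lower() in PLACEHOLDER_VALUES:
            # The field is present but holds only a placeholder.
            issues[field] = "meaningless"
    return issues


record = {"title": "Untitled", "creator": "Unknown", "date": "1923"}
issues = find_content_issues(record)
```

Confusing and incomplete values are harder to detect mechanically and would likely still need human review.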

SHARING METADATA

• Content
• Consistency
• Coherence
• Context
• Communication
• Conformance to standards

…but which “standard”?

DPLA & DATA QUALITY

• Data is robust
• Descriptive fields are present and have meaningful values
• Required properties have meaningful values
• Data adheres to standards
• All data is normalized in terms of punctuation, presence of noise, etc.
• Required properties are present and semantically correct

• Technical problems
• Content problems
• Content quality
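
To make "normalized in terms of punctuation, presence of noise" a little more concrete, here is a minimal sketch of value normalization. The specific rules (which characters count as stray separators, case-insensitive de-duplication) and the function names are assumptions chosen for illustration, not a description of DPLA's ingestion pipeline.

```python
import re


def normalize_value(value):
    """Collapse whitespace and strip stray leading/trailing separators."""
    # Collapse runs of whitespace (including newlines and tabs) into one space.
    cleaned = re.sub(r"\s+", " ", value).strip()
    # Drop separators left at either end of the value, e.g. "Maps ;".
    # Trailing periods are kept because they may belong to abbreviations.
    cleaned = cleaned.strip(" ;,:/")
    return cleaned


def normalize_subjects(values):
    """Normalize each value and drop empties and case-insensitive duplicates."""
    seen = set()
    result = []
    for value in values:
        cleaned = normalize_value(value)
        key = cleaned.lower()
        if cleaned and key not in seen:
            seen.add(key)
            result.append(cleaned)
    return result


subjects = ["  Maps ;", "maps", "Railroads --\n Texas.", " ; "]
normalized = normalize_subjects(subjects)
```

Rules like these address only the technical tier; whether a normalized value is meaningful remains a separate content question.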

DPLA DATA QUALITY WORKFLOW

• Initial Analysis
• QA in Blacklight
• Visual review in test portal site

WE NEED MORE.

WE NEED BETTER.

EUROPEANA DQC

Data Quality Committee (DQC) formed within Europeana
• Reviewing mandatory elements
• Data checking and normalization
• Evaluation of meaningful metadata values
• Quality of content
• Coordination with other quality-related initiatives

DPLA QUALITY INITIATIVES

WE NEED MORE.

WE NEED BETTER.

LET’S TALK.
