gretchen-gueguen
DATA QUALITY AT THE SCALE OF AGGREGATION
IF WE ALL USE STANDARDS, WHY IS THE DATA SO CRAP IN THE END?
QUALITY IS CONTEXTUAL
What is the “context” of aggregation? Specifically, DPLA’s aggregation…
• Heterogeneous
• Basic metadata
• Reliance on metadata vs. text
• Reliance on item-level metadata
DATA ISSUES IN DPLA
Content Issues
• Meaningless values
• Missing values
• Confusing values
• Incomplete values
Technical Issues
• Granularity
• Inappropriate values
• Lack of normalization
• Noisy data
• Lack of standards
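As a rough illustration of two of the content issues above (missing values and meaningless values), here is a minimal sketch of a per-record check. The function name `find_content_issues`, the field names, and the `PLACEHOLDERS` list are all hypothetical; they are not taken from DPLA's tooling, and a real placeholder list would come from reviewing the actual data.

```python
# Hypothetical placeholder strings that carry no real information.
# This list is an assumption for illustration only.
PLACEHOLDERS = {"", "unknown", "n/a", "none", "untitled", "[s.n.]"}


def find_content_issues(record, required_fields):
    """Return a dict mapping field name -> issue label for one metadata record."""
    issues = {}
    for field in required_fields:
        value = record.get(field)
        if value is None:
            # The field is absent from the record altogether.
            issues[field] = "missing"
        elif str(value).strip().lower() in PLACEHOLDERS:
            # The field is present but holds an empty or placeholder value.
            issues[field] = "meaningless"
    return issues


record = {"title": "Untitled", "creator": "Jane Doe", "date": " "}
print(find_content_issues(record, ["title", "creator", "date", "rights"]))
```

Confusing or incomplete values are harder to detect mechanically and would likely still need human review.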
SHARING METADATA
• Content
• Consistency
• Coherence
• Context
• Communication
• Conformance to standards
…but which “standard”
DPLA & DATA QUALITY
• Data is robust
• Descriptive fields are present and have meaningful values
• Required properties have meaningful values
• Data adheres to standards
• All data is normalized in terms of punctuation, presence of noise, etc.
• Required properties are present and semantically correct
Technical problems
Content problems
Content quality
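To make "normalized in terms of punctuation, presence of noise" a little more concrete, here is a minimal sketch of one possible string clean-up step. The function `normalize_value` and the particular set of trailing characters it strips are assumptions chosen for illustration, not DPLA's actual normalization rules.

```python
import re


def normalize_value(value):
    """Collapse whitespace and trim stray trailing separators from one string value."""
    # Replace any run of whitespace (spaces, tabs, newlines) with a single space.
    text = re.sub(r"\s+", " ", value).strip()
    # Drop trailing separators that cataloging punctuation often leaves behind.
    # Periods are kept on purpose, since they may end an abbreviation.
    return text.rstrip(" ,;:/")


print(normalize_value("  Boston   (Mass.) ;  "))
```

Which characters count as noise depends on the source metadata, so any such rule would need checking against each provider's data before being applied broadly.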
DPLA DATA QUALITY WORKFLOW
• Initial analysis
• QA in Blacklight
• Visual review in test portal site
WE NEED MORE.
WE NEED BETTER.
EUROPEANA DQC
Data Quality Committee (DQC) formed within Europeana
• Reviewing mandatory elements
• Data checking and normalization
• Evaluation of meaningful metadata values
• Quality of content
• Coordination with other quality-related initiatives
DPLA QUALITY INITIATIVES
WE NEED MORE.
WE NEED BETTER.
LET’S TALK.