Developing Reliable Automatic Metadata Generation: Feedback from MatDL Pathway
NSDL Annual Meeting, Washington, DC, November 6-8, 2007
Advancing NSDL Networks
Cathy S. Lowe, Laura M. Bartolo, Kent State University


Page 1: Developing Reliable Automatic Metadata Generation:   Feedback from MatDL Pathway

Developing Reliable Automatic Metadata Generation: Feedback from MatDL Pathway

NSDL Annual Meeting, Washington, DC, November 6-8, 2007

Advancing NSDL Networks

Cathy S. Lowe, Laura M. Bartolo, Kent State University

Page 2: Developing Reliable Automatic Metadata Generation:   Feedback from MatDL Pathway

NSDL Annual Meeting 2007 Washington, DC

Outline

  • MatDL Pathway
  • iVia metadata generation for PDFs & test set
  • Evolution of result set
      • description
      • title
      • keywords
      • author
  • Next Steps

Page 3: Developing Reliable Automatic Metadata Generation:   Feedback from MatDL Pathway


NSDL Materials Digital Library Pathway

  • NSF MS Initiatives (NIRTs, MRSECs, IMIs)
      • Soft Matter Wiki
  • Teaching Resource Development
      • MS Teaching Archive
  • Code Development
      • Matforge
      • NIST FiPy
      • CMU
      • DOE CMSN
  • Virtual Labs
      • Intro to Solid State Chem
      • Intro to Bio Physics
      • Modern Chemistry
  • Stewardship
      • MatDL Repository

http://matdl.org
http://teaching.matdl.org
http://matdlforge.org
http://matdl.org/virtuallabs
http://matdl.org/matdlwiki

Page 4: Developing Reliable Automatic Metadata Generation:   Feedback from MatDL Pathway


iVia metadata generation & original test set

  • Worked with iVia metadata generation only
  • Test set
      • PDF format
      • 83 undergraduate research papers from the Cornell Center for Materials Research (CCMR) REU program

Page 7: Developing Reliable Automatic Metadata Generation:   Feedback from MatDL Pathway


Evolution of result set

  • Metadata generation for PDFs not available (2005)
  • Metadata generation for PDFs available (2006), improving over time
      • description
      • title
      • keyword
      • author (** recently available)

Page 8: Developing Reliable Automatic Metadata Generation:   Feedback from MatDL Pathway


Description generation

  • Good accuracy for explicit “Abstract”
      • Correct: ~38%
      • Partially correct: ~33%
      • Incorrect/not generated: ~29%

Page 9: Developing Reliable Automatic Metadata Generation:   Feedback from MatDL Pathway


Title generation

  • Very good accuracy
      • Precision: 91.09%
      • Recall: 89.30%
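The slide reports precision and recall for generated titles but does not say how they were computed. As a rough illustration only, the sketch below assumes a word-level comparison between a generated title and a hand-checked reference title. The function name `token_precision_recall` and the example titles are hypothetical and are not taken from iVia or the CCMR test set.

```python
from collections import Counter


def token_precision_recall(generated: str, reference: str) -> tuple[float, float]:
    """Word-level precision and recall of a generated title against a reference.

    Assumption: titles are compared as lowercased, whitespace-split word
    multisets. The slides do not state the actual evaluation method.
    """
    gen = Counter(generated.lower().split())
    ref = Counter(reference.lower().split())
    # Multiset intersection counts each shared word at most as often
    # as it appears in both titles.
    overlap = sum((gen & ref).values())
    precision = overlap / sum(gen.values()) if gen else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    return precision, recall


# Hypothetical titles, for illustration only.
p, r = token_precision_recall(
    "Growth of Thin Films REU Report",   # generated title with two extra words
    "Growth of Oxide Thin Films",        # reference title
)
print(f"precision {p:.2%} recall {r:.2%}")
```

Under this assumption, extra words in the generated title lower precision, and words missing from it lower recall.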

Page 10: Developing Reliable Automatic Metadata Generation:   Feedback from MatDL Pathway


Keyword generation

  • Manually rated 5 keyphrases per document: good accuracy
      • Highly descriptive: 39%
      • Acceptable: 41%
      • Unacceptable: 20%
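The percentages above summarize manual ratings of individual keyphrases. A minimal sketch of how such a tally might be produced follows. The helper `rating_distribution` and the sample ratings are hypothetical, and the slides do not describe the actual tallying procedure.

```python
from collections import Counter


def rating_distribution(ratings: list[str]) -> dict[str, int]:
    """Return the share of each rating label as a whole-number percentage."""
    counts = Counter(ratings)
    total = len(ratings)
    return {label: round(100 * n / total) for label, n in counts.items()}


# Hypothetical ratings for two documents with 5 keyphrases each;
# these are not the MatDL results.
sample_ratings = (
    ["highly descriptive"] * 4
    + ["acceptable"] * 4
    + ["unacceptable"] * 2
)
distribution = rating_distribution(sample_ratings)
print(distribution)
```

The same tally would apply to the correct / partially correct / incorrect ratings used for descriptions and authors.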

Page 11: Developing Reliable Automatic Metadata Generation:   Feedback from MatDL Pathway


Author generation: new functionality

  • Applied to original sample:
      • Correct: 45%
      • Partially correct: 27%
      • Incorrect/not generated: 28%

Page 12: Developing Reliable Automatic Metadata Generation:   Feedback from MatDL Pathway


Next Steps

  • Collaboration mutually beneficial for tool developers & NSDL community-based repositories
  • Continue to work with tool as it improves
  • Continue/expand working with MRSECs REU resources

Page 13: Developing Reliable Automatic Metadata Generation:   Feedback from MatDL Pathway


Thank you & Questions?

[email protected]

NSDL Materials Digital Library Pathway is supported by the National Science Foundation, DUE-0532831. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of NSF.