Large Scale Data Clean-ups & Challenges for the Library Ksenija Mincic-Obradovic Asia Pacific Metadata Advisory Board Meeting 3-4 August 2014 Pattaya, Thailand


Large Scale Data Clean-ups & Challenges for the Library

Ksenija Mincic-Obradovic

Asia Pacific Metadata Advisory Board Meeting, 3-4 August 2014, Pattaya, Thailand


“Data cleaning is considered as a main challenge in the era of big data, due to the increasing volume, velocity and variety of data.”

(Tang, 2014)


Data cleaning process:

• Identifying data errors

• Repairing data errors

• Preventing data errors


In libraries, we might have to clean up data to:

• Remove ceased e-titles

• Update changed URLs

• Enable DDA/PDA purchasing

• Perform gap analysis

• Enable system migrations

• Enable system integrations

• Improve display in the ILS
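Of the tasks above, updating changed URLs is the easiest to illustrate in a few lines. The following is a minimal Python sketch, assuming a platform has moved to a new host name while the rest of each link stays the same. The function name `update_url` and the example hosts are hypothetical and do not come from any particular library system.

```python
from urllib.parse import urlsplit, urlunsplit


def update_url(url, host_map):
    """Return url with its host swapped if the host appears in host_map.

    host_map is a hypothetical dict of old host -> new host.
    URLs on other hosts are returned unchanged.
    """
    parts = urlsplit(url)
    new_host = host_map.get(parts.netloc.lower())
    if new_host is None:
        return url
    # Keep scheme, path, query and fragment; replace only the host.
    return urlunsplit(parts._replace(netloc=new_host))


moved = {"old.example.com": "new.example.org"}
print(update_url("http://old.example.com/book/1?id=2", moved))
```

Rewriting only the host keeps the change easy to review and to reverse; links whose paths also changed would need a per-title mapping from the vendor instead.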


Main types of mistakes in e-book records

• MARC21 errors
  – E.g. coding, wrong indicators, wrong characters…
  – Consequence: wrong indexing, records rejected…

• Wrong identifiers
  – 001, 010, 020/022, 035, 856 $z
  – Consequence: wrong matching, duplicates…

• Mistakes in description fields
  – E.g. wrong title, wrong author
  – Consequence: bad display, faceting doesn’t work…

• Lack of URLs
  – Consequence: e-book cannot be accessed
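Some of these mistakes can be flagged automatically before a file is loaded. The sketch below is illustrative only: it assumes each record has been reduced to a plain Python dict mapping a MARC tag to a list of string values, which ignores indicators and subfields, and the function name `find_problems` is hypothetical.

```python
import unicodedata

# Tags treated as identifiers, following the list on this slide.
IDENTIFIER_TAGS = ("001", "010", "020", "022", "035")


def find_problems(record):
    """Return a list of problems found in one simplified record."""
    problems = []
    if not any(record.get(tag) for tag in IDENTIFIER_TAGS):
        problems.append("no identifier")
    if not any(v.strip() for v in record.get("245", [])):
        problems.append("missing title (245)")
    if not any(v.strip() for v in record.get("856", [])):
        problems.append("missing URL (856)")
    # Control characters are one common kind of "wrong character".
    for tag, values in record.items():
        for value in values:
            if any(unicodedata.category(ch) == "Cc" for ch in value):
                problems.append(f"control character in {tag}")
                break
    return problems


sample = {"245": ["  "], "020": ["978\x00"]}
print(find_problems(sample))
```

A check like this only catches structural problems; a wrong title or wrong author still needs comparison against a trusted source.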


Example 1: Fixing MARC21 errors in vendor/publisher files with e-book records

• Use programmes such as MARC Report and MarcEdit to identify errors

• Use MARC Global and MarcEdit to fix data

• Load the file into the local catalogue
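The identify, fix, load sequence above can be sketched in Python as a small batch step. This is not how MARC Report, MARC Global or MarcEdit work internally; it is a simplified illustration that assumes records are plain dicts of MARC tag to list of string values, and the function names and the choice of required tags (245 and 856) are assumptions made for the example.

```python
import unicodedata


def clean_value(value):
    """Collapse whitespace and drop remaining control characters."""
    spaced = " ".join(value.split())
    kept = "".join(ch for ch in spaced if unicodedata.category(ch) != "Cc")
    return " ".join(kept.split())


def repair_record(record):
    """Return a cleaned copy of a record (tag -> list of values)."""
    repaired = {}
    for tag, values in record.items():
        cleaned = [clean_value(v) for v in values]
        cleaned = [v for v in cleaned if v]  # drop fields left empty
        if cleaned:
            repaired[tag] = cleaned
    return repaired


def prepare_batch(records, required_tags=("245", "856")):
    """Repair every record, then split into (loadable, rejected)."""
    loadable, rejected = [], []
    for record in records:
        fixed = repair_record(record)
        if all(tag in fixed for tag in required_tags):
            loadable.append(fixed)
        else:
            rejected.append(fixed)
    return loadable, rejected


batch = [
    {"245": ["  Big\tData \x00Cleaning "], "856": ["http://example.org/1"]},
    {"245": ["No link"], "856": ["  "]},
]
loadable, rejected = prepare_batch(batch)
print(len(loadable), "loadable;", len(rejected), "rejected")
```

Keeping the rejected records in a separate list, rather than discarding them, leaves something concrete to send back to the vendor or publisher.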


Example 2: Updating the NUC (National Union Catalogue)

• New Zealand national level project

• Started in 2008

• Automated way of reporting changes to the library holdings (additions and deletions) to the NUC

• Using OSMOSIS, a software tool developed by TMQ (Fla)
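The idea of reporting additions and deletions can be shown with a simple comparison of two snapshots of holdings. The sketch below is not a description of how OSMOSIS works; it assumes each snapshot is just a collection of record identifiers, and the function name `holdings_changes` is hypothetical.

```python
def holdings_changes(previous_ids, current_ids):
    """Compare two snapshots of record identifiers.

    Returns a dict with the identifiers added since the previous
    snapshot and those no longer held.
    """
    previous, current = set(previous_ids), set(current_ids)
    return {
        "additions": sorted(current - previous),
        "deletions": sorted(previous - current),
    }


last_report = ["b1001", "b1002", "b1003"]
this_report = ["b1002", "b1003", "b1004"]
print(holdings_changes(last_report, this_report))
```

The comparison is only as reliable as the identifiers themselves, which is why identifier quality matters so much for this kind of project.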


Identifiers for Matching Bibliographic Data

• 001 - Control Number

• 010 - Library of Congress Control Number

• 020/022 - ISBN/ISSN

• 035 - System Control Number

• 856 $z - Public note, e.g. "e-book SpringerLink"
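Matching on these identifiers can be sketched as a lookup that tries one field after another. The Python below is a simplified illustration, assuming records are dicts of MARC tag to list of string values; the priority order in `MATCH_TAGS`, the normalisation rule and the function names are assumptions for the example, and 856 $z is left out because it is free text.

```python
# Order in which identifiers are tried; an assumption for this sketch.
MATCH_TAGS = ("001", "010", "020", "022", "035")


def normalise(value):
    """Remove spaces and hyphens and ignore case."""
    return "".join(value.split()).replace("-", "").upper()


def build_index(catalogue):
    """Map (tag, normalised value) -> positions of catalogue records."""
    index = {}
    for position, record in enumerate(catalogue):
        for tag in MATCH_TAGS:
            for value in record.get(tag, []):
                index.setdefault((tag, normalise(value)), []).append(position)
    return index


def match(record, index):
    """Return (tag, positions) for the first identifier that matches."""
    for tag in MATCH_TAGS:
        for value in record.get(tag, []):
            hits = index.get((tag, normalise(value)))
            if hits:
                return tag, hits
    return None, []


catalogue = [
    {"001": ["a1"], "020": ["978-0-306-40615-7"]},
    {"001": ["a2"], "020": ["9780306406157"]},
]
index = build_index(catalogue)
print(match({"020": ["9780306406157"]}, index))
```

More than one hit for the same identifier, as in the usage example, suggests duplicates in the catalogue. Control numbers in 001 are usually meaningful only within the system that assigned them, so matching on 001 across systems should be treated with caution.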


OSMOSIS Report (11/2014)

020 (ISBN)
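Because the 020 field is used for matching, it helps to validate ISBNs and bring them to one form. The sketch below applies the standard ISBN-10 (modulus 11) and ISBN-13 (modulus 10) check-digit rules; the function names and the rule for stripping qualifiers such as "(ebook)" are assumptions made for this example.

```python
import re

DIGITS = "0123456789"
# Leading run of digits, hyphens and spaces, ending in a digit or X.
_ISBN_HEAD = re.compile(r"[0-9\- ]*[0-9Xx]")


def clean_isbn(raw):
    """Keep the leading ISBN and drop hyphens, spaces and qualifiers."""
    found = _ISBN_HEAD.match(raw.strip())
    if not found:
        return ""
    return re.sub(r"[^0-9X]", "", found.group().upper())


def is_valid_isbn10(isbn):
    if len(isbn) != 10 or any(c not in DIGITS for c in isbn[:9]):
        return False
    if isbn[9] not in DIGITS + "X":
        return False
    values = [int(c) for c in isbn[:9]] + [10 if isbn[9] == "X" else int(isbn[9])]
    # Weights run from 10 down to 1; the total must divide by 11.
    return sum(v * w for v, w in zip(values, range(10, 0, -1))) % 11 == 0


def is_valid_isbn13(isbn):
    if len(isbn) != 13 or any(c not in DIGITS for c in isbn):
        return False
    # Weights alternate 1, 3, 1, 3, ...; the total must divide by 10.
    return sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(isbn)) % 10 == 0


def to_isbn13(isbn10):
    """Convert a valid ISBN-10 to ISBN-13 using the 978 prefix."""
    body = "978" + isbn10[:9]
    total = sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(body))
    return body + str((10 - total % 10) % 10)


print(to_isbn13(clean_isbn("0-306-40615-2")))
```

Converting every valid ISBN-10 to ISBN-13 before comparison means that two records for the same e-book can match even when one vendor supplies the older form.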


Recommendations

• Check and clean data in vendor files before loading to your catalogue.

• Follow national and international standards in all aspects.

• Perform regular database maintenance.

• Encourage cooperation between libraries and vendors/publishers.


References

Beall, J. (2005). 10 ways to improve data quality: with a coordinated effort, your library can make significant progress in cleaning up its online catalog. American Libraries, 36(3), 36+. Retrieved from http://go.galegroup.com.ezproxy.auckland.ac.nz/ps/i.do?id=GALE%7CA139719467&v=2.1&u=learn&it=r&p=AONE&sw=w&asid=8bc9b1a0d979542543f18fc581b25da2

Rahm, E. (2004). Data Cleaning: Problems and Current Approaches. In Galindo, F., Takizawa, Makoto, & Traunmuller, R. (2004). Database and expert systems applications: 15th International Conference, DEXA 2004, Zaragoza, Spain, August 30-September 3, 2004. Proceedings (Lecture notes in computer science ; 3180). Berlin ; New York: Springer.

Tang, N. (2014). Big Data Cleaning. In Chen, L. (2014). Web technologies and applications : 16th Asia-Pacific Web Conference, APWeb 2014, Changsha, China, September 5-7, 2014. Proceedings (Lecture notes in computer science ; 8709).


Thank you

ขอบคุณ

Ksenija Mincic-Obradovic
