Columbia’s Born-Digital Preservation Infrastructure & Ford International Fellowships Program



Page 1: Columbias Born-Digital Preservation Infrastructure  Ford International Fellowships Program

Columbia’s Born-Digital Preservation Infrastructure

& Ford International Fellowships Program


Columbia University


Columbia Libraries Digital Program

1. Collection-based digitization
2. Long-term digital preservation
3. Website development
4. Digital Library infrastructure development
5. Born-digital collection archiving and access


Ford International Fellowships Program

“The IFP has since 2001 offered fellowships for post-graduate study to leaders from underserved communities in Asia, Africa, Latin America, and Russia, and will complete its work in 2014. Their archives include documentation and videos of the more than 3,300 IFP fellows who passed through the program as well as comprehensive planning and administrative files.”


Ford IFP Grant
• Received in October 2011
• Become permanent archive for IFP’s paper and digital archives
• $1 million
• 3 years (technology portion)
• Archive and provide access to IFP’s archives
• And …

“ … to build out a full set of repository-based systems and services so that it can more easily acquire, ingest, process, preserve and make accessible both paper and born-digital organizational records.”


High-Level View


Countries Harvested From:
• Brazil
• Chile and Peru
• China
• Egypt
• Ghana
• Guatemala
• India
• Indonesia
• Kenya
• Mexico
• Mozambique
• Nigeria
• Palestine (Gaza and West Bank)
• Philippines
• Russia
• Senegal
• South Africa
• Tanzania
• Thailand
• Uganda
• United States - NYC Secretariat
• Vietnam


Languages Encountered:
• English
• Russian
• Portuguese
• Spanish
• Chinese
• Arabic
• Indonesian
• French
• Thai
• Vietnamese


Files Harvested:

334,000 and counting …


File Formats Encountered:

32, 3gp, accdb, adb, adp, adx, ai, aif, amr, asf, avi, axd, back, bat, bin, bk, blb, bmp, BridgeSort, btr, bup, cab, cat, cda, cdr, cfg, chm, cnf, cnm, con, css, cst, csv, cxt, d, dat, db, dbf, ddb, ddx, dfont, dir, dll, dmi, doc, doc-MRB, docm, docx, dot, ds_store, dtd, dwz, dxr, edb, edx, eml, emz, eps, exe, F&A, fcp, fff, fh9, fil, flp, flv, fol, frm, gdb, gdx, gif, hdb, hdx, hk4, hlp, hta, htm, html, ico, idx, ifo, inc, indd, inf, info, ini, itc2, itdb, itl, jar, jp2, jpe, jpeg, jpg, js, l, lck, ldb, lnk, log, m4a, m4v, mbx, mdb, mde, mdi, mdx, mht, mid, mls, mno, mov, mp3, mp4, mpeg, mpg, mpp, msf, msg, msi, mso, msv, mswmm, nri, ocx, odc, odt, ofa, oft, opd, opf, otf, p65, pab, pages, pcx, pdf, php, pif, plist, pm, pm!, pm0, pm5, pmd, pmh, pmi, pmj, pml, pmm, pmo, pmr, pms, pmx, pnc, pnd, png, pns, pnx, pot, pps, ppsx, ppt, pptx, prod, prod1, properties, psd, psp, pst, pub, qpw, qxd, r, ra, ra-att, rar, rdp, rel, rels, rem, rex, rpt, rsc, rtf, sav, sc4, sdb, sdx, sh, shs, snm, spi, spss, spv, spx, sql, svn-base, swa, swf, sys, tdb, tdx, thm, thmx, tif, tiff, tlb, tmp, toc, tpl, ttf, txt, txz, up, url, usr, utf8, vcf, vdproj, vob, vsd, wav, wbk, webarchive, wma, wmf, wmv, wmz, wpd, wpl, wps, xla, xlk, xls, xlsb, xlsm, xlsx, xlw, xml, xps, zip

(243 different file formats)
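An inventory like the one above can be produced by walking the harvested directory tree and counting files by extension. The following is a minimal sketch only, not the project's actual tooling: the function name `tally_extensions` and the idea of a single root directory of harvested files are assumptions made for illustration.

```python
from collections import Counter
from pathlib import Path


def tally_extensions(root):
    """Count files under `root` by lowercased extension.

    Hypothetical helper: files with no extension are counted under ''.
    """
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            # ".DOC" and ".doc" are treated as the same format.
            counts[path.suffix.lower().lstrip(".")] += 1
    return counts


# Example use (the path is a placeholder, not a real project location):
# counts = tally_extensions("/path/to/harvested/files")
# print(len(counts), "different extensions")
# print(counts.most_common(10))
```

Extensions are only a rough proxy for format; a real workflow would usually confirm them with a format-identification tool.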


High-Level Workflow:


Technology Tools:

• FRED (Forensic Recovery of Evidence Device) – hardware / OS
• Forensic Toolkit (FTK) – suite of tools
• Archivematica – preservation analysis and packaging
• Fedora – enterprise-level repository solution
• Archivists’ Toolkit – archival processing tool
• SOLR – powerful Lucene-based search server
• Blacklight – open source discovery interface


Preservation ‘Curation’:
• From original file, generate format versions that are more preservable and more accessible than the original file form, e.g.,
  • From MS .doc and .docx files generate .rtf and/or PDF/A
  • From MS .xls and .xlsx files generate tab-delimited format
  • From HD video files generate Motion JPEG 2000
• Database files? SPSS files? Pro Tools audio files?
• ‘Legacy’ file formats?
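The normalization rules on this slide can be pictured as a lookup from source extension to preservation targets. The sketch below is illustrative only: the table `PRESERVATION_TARGETS` and the function `preservation_targets` are hypothetical names, and the table contains just the document and spreadsheet examples listed above. Formats with no rule return `None`, which mirrors the open questions about database, SPSS, Pro Tools and legacy files.

```python
# Hypothetical lookup table built only from the examples on this slide.
PRESERVATION_TARGETS = {
    "doc": ["RTF", "PDF/A"],
    "docx": ["RTF", "PDF/A"],
    "xls": ["tab-delimited text"],
    "xlsx": ["tab-delimited text"],
}


def preservation_targets(filename):
    """Return the target formats for `filename`, or None if no rule exists."""
    if "." not in filename:
        return None
    ext = filename.rsplit(".", 1)[-1].lower()
    return PRESERVATION_TARGETS.get(ext)


# Files without a rule need a curatorial decision rather than an
# automatic conversion.
for name in ["report.DOCX", "budget.xls", "survey.sav"]:
    print(name, "->", preservation_targets(name))
```

Keeping the rules in a plain table makes it easy to see which of the encountered formats still lack a decision.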



Special IFP Project Challenges …

Determining and encoding intellectual property rights

Determining and encoding information relating to privacy and access

Metadata creation / extraction

Working with 23 separate entities in advance of their data deliveries and office closings

Building scalable workflows

Building scalable storage infrastructure
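To make "encoding" rights and access information more concrete, one possible shape for a per-file record is sketched below. Everything here is an assumption for illustration: the class `AccessRecord`, its field names, and the simple rule in `is_open` are hypothetical and do not describe the schema the project adopted.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class AccessRecord:
    """Hypothetical per-file record of rights and access decisions."""
    file_id: str
    rights_holder: str
    contains_personal_data: bool
    embargo_until: Optional[date] = None

    def is_open(self, today):
        """Return True if the file may be served to the public on `today`."""
        # Illustrative rule: personal data always blocks public access.
        if self.contains_personal_data:
            return False
        return self.embargo_until is None or today >= self.embargo_until


record = AccessRecord("f1", "IFP", False, embargo_until=date(2020, 1, 1))
print(record.is_open(date(2019, 12, 31)))
print(record.is_open(date(2020, 1, 1)))
```

Recording decisions as structured fields rather than free text is what lets a discovery interface enforce them automatically.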