University Archives University Archives & Archive-It WebCom 2011-03-29


Page 1

University Archives

University Archives & Archive-It

WebCom 2011-03-29

Page 2

The Duke University Archives is responsible for the collection and management of records of enduring value created by the University's administrative offices and academic units.

The Archives also acquires records of student, faculty and staff organizations, selected personal papers, and books, images, audio, and other documentation about Duke University.

Page 3

Archive-It Service Agreement

• $6,000 Subscription Fee
• 8,000,000 URLs
• 0.75 TB storage
• 1-2 Active Collections
• Maximum 200 Active Seeds
• Collection & Crawl Interface
• Search Portal
• All data will be copied to Internet Archive’s Wayback Machine on contract termination

Page 4

Front Page

Page 5

Collection Page

Page 6

Page Capture Index

http://wayback.archive-it.org/1858/*/http://news.duke.edu/
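The capture index address above appears to follow a fixed pattern: the Archive-It Wayback host, a collection number, a `*` wildcard, and then the original URL. A minimal Python sketch of that pattern follows. It is inferred from this single example, so reading `1858` as the collection ID and `*` as "all captures" is an assumption, and `capture_index_url` is a hypothetical helper name, not part of any Archive-It tool.

```python
def capture_index_url(collection_id: int, target_url: str) -> str:
    """Build a capture-index URL in the shape shown on the slide.

    Assumption: the number after the host is the collection ID and
    the "*" segment asks for the list of all captures of target_url.
    """
    return f"http://wayback.archive-it.org/{collection_id}/*/{target_url}"


# Rebuilds the example address from the slide.
print(capture_index_url(1858, "http://news.duke.edu/"))
```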

Page 7

Page View

Page 8

Priorities this past year…

• Institutes & Student Groups

– Have a relatively short life

– The Archives rarely receives records transfers from these groups

• Units with existing relationships

• Opportunities as they arise

Page 9

Crawl of duke.edu

• Started March 4, 2011 4:34:20 PM
• Completed March 7, 2011 5:46:52 PM
• Average Doc Rate 13.66 URLs/sec
• Average KB Rate 1,646 KB/s
• Total Documents 3,594,845
• Total Data 413.2 GB
• Duke Domains Found 1,698
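As a rough cross-check of the crawl figures, the short Python sketch below recomputes the average rates from the start time, end time, and totals, and compares the totals with the 8,000,000 URL and 0.75 TB limits from the service agreement. It assumes binary units (1 GB = 1024 × 1024 KB, 1 TB = 1024 GB) and that this one crawl counts in full against the subscription limits; the slides state neither, so treat the percentages as illustrative.

```python
from datetime import datetime

# Figures copied from the slides.
started = datetime(2011, 3, 4, 16, 34, 20)
completed = datetime(2011, 3, 7, 17, 46, 52)
total_documents = 3_594_845
total_data_gb = 413.2
url_budget = 8_000_000
storage_budget_tb = 0.75

elapsed_seconds = (completed - started).total_seconds()

# Average rates over wall-clock time.
docs_per_second = total_documents / elapsed_seconds
# Assumption: binary units (1 GB = 1024 * 1024 KB).
kb_per_second = total_data_gb * 1024 * 1024 / elapsed_seconds

# Share of the subscription limits, assuming 1 TB = 1024 GB.
url_share = total_documents / url_budget
storage_share = total_data_gb / (storage_budget_tb * 1024)

print(f"elapsed: {elapsed_seconds:.0f} s")
print(f"docs/sec: {docs_per_second:.2f}")
print(f"KB/sec: {kb_per_second:.0f}")
print(f"URL budget used: {url_share:.1%}")
print(f"storage used: {storage_share:.1%}")
```

Under these assumptions the recomputed rates come out at about 13.64 URLs/sec and 1,644 KB/s, close to the reported 13.66 and 1,646; the small gap may simply reflect how the crawler measures its active time. On the same assumptions, the crawl corresponds to roughly 45% of the URL limit and 54% of the storage limit.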

Page 10

Issues

• Capturing the “Look & Feel” of a site

• Crawler Traps (e.g. calendars)

• Junk URLs (e.g. bad CMS link generation)

Page 11

Robots Exclusions

We do want:
• Look & Feel
– JavaScript
– CSS
– Images
• Policy
• Publications
• Events
• RSS

We (usually) don’t want:
• Every day of every year
• Your taxonomies
• Administrative pages
• Maps/GIS
• “Personal” pages

User-agent: archive.org_bot
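To make the want/don't-want split concrete, here is a minimal Python sketch that checks a hypothetical robots.txt with the standard library's `urllib.robotparser`. Only the `User-agent: archive.org_bot` line comes from the slide. The paths (`/css/`, `/js/`, `/images/`, `/calendar/`, `/admin/`) and the host `www.example.edu` are invented for illustration, and a real site would use its own layout. Python's parser applies the first matching rule and does simple prefix matching, which may differ from how the crawler itself interprets the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every path below is an illustrative assumption.
ROBOTS_TXT = """\
User-agent: archive.org_bot
Allow: /css/
Allow: /js/
Allow: /images/
Disallow: /calendar/
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(path: str) -> bool:
    """Return True if archive.org_bot may fetch the path on the example host."""
    return parser.can_fetch("archive.org_bot", "http://www.example.edu" + path)


for path in ["/css/site.css", "/news/", "/calendar/2011/03/29", "/admin/login"]:
    print(path, allowed(path))
```

In this sketch the look-and-feel assets and ordinary content stay crawlable, while the calendar (a typical crawler trap) and the administrative pages are excluded for the archive crawler only.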

Page 12

Contact me:

Seth Shaw

Electronic Records Archivist

Duke University Archives

684.6181

[email protected]