TEMPORAL SPREAD IN ARCHIVED COMPOSITE RESOURCES (WORK IN PROGRESS) SCOTT G. AINSWORTH MICHAEL L. NELSON OLD DOMINION UNIVERSITY COMPUTER SCIENCE WADL 2013 JULY 25–26, 2013 INDIANAPOLIS, INDIANA USA


TEMPORAL SPREAD IN ARCHIVED COMPOSITE RESOURCES (WORK IN PROGRESS)

SCOTT G. AINSWORTH

MICHAEL L. NELSON

OLD DOMINION UNIVERSITY

COMPUTER SCIENCE

WADL 2013

JULY 25–26, 2013

INDIANAPOLIS, INDIANA USA


Joint Conference on Digital Libraries (JCDL) 2013

Scott G. Ainsworth • Michael L. Nelson


CONTENTS

Motivation

Related work

Preliminary work

Temporal Spread

Future work

Conclusion

7/26/13


A FABLE FROM WAYBACK



TEMPORAL SPREAD


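[Figure: composite memento of the ODU CS home page; root captured 2005-05-14 01:36:08, embedded resources annotated +9 days, +18 days, +18 days, +7 months, and +2.1 years]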


QUESTIONS

• How much temporal spread exists in composite mementos?

• How can temporal spread be minimized?

• What factors contribute, positively or negatively, to spread?

• Does combining multiple archives produce better results?

• Would users with differing goals benefit from different minimization policies and heuristics?

• How can temporal coherence be displayed to users simply?



CONTENTS

Motivation

Related work

Preliminary work

Temporal Spread

Future work

Conclusion



RELATED WORK

Control Crawl Data Quality, Future Collections

• Spaniol et al. – crawling strategy

• Denev et al. – change rates by MIME type and depth

• Ben Saad et al. – metadata from crawl used to select best results from archive

Our Focus: Existing Data Quality

• Existing collections

• Datetime selection policies



RELATED WORK

Use Patterns

• AlNoamany et al. – archive access patterns
  • Humans vs. robots
  • Dip, dive, slide, & skim

Identifying Duplicates

• Simple identity – images, other binary formats
  • Direct comparison
  • Hash comparison

• HTML, CSS (text)
  • Shingling, Jaccard distances, etc.
  • SimHash shows the most promise
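SimHash is named above as the most promising near-duplicate test for text resources. The following is a minimal sketch of Charikar-style SimHash, assuming lowercase word tokens with equal weights and a 64-bit token hash from BLAKE2b; both choices are illustrative and are not taken from the slides.

```python
import hashlib
import re

def _hash64(token: str) -> int:
    """64-bit token hash from BLAKE2b (an illustrative choice of hash)."""
    return int.from_bytes(hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest(), "big")

def simhash(text: str) -> int:
    """64-bit SimHash of the lowercase word tokens in text, all weighted equally."""
    votes = [0] * 64
    for token in re.findall(r"\w+", text.lower()):
        h = _hash64(token)
        for bit in range(64):
            # Each token votes +1 if its hash has this bit set, otherwise -1.
            votes[bit] += 1 if (h >> bit) & 1 else -1
    # A fingerprint bit is set where positive votes outnumber negative ones.
    return sum(1 << bit for bit, v in enumerate(votes) if v > 0)

def hamming_distance(a: int, b: int) -> int:
    """Count differing bits; a small distance suggests near-duplicate text."""
    return bin(a ^ b).count("1")

page_a = "Old Dominion University Department of Computer Science home page"
page_b = "Old Dominion University Department of Computer Science homepage"
print(hamming_distance(simhash(page_a), simhash(page_b)))
```

Two text mementos could then be treated as equivalent when their fingerprints differ in only a few bits. The threshold would need tuning for archived HTML, which archives typically rewrite.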



RELATED WORK – MEMENTO*

• HTTP extension for datetime negotiation

Request:
GET <timegate>/http://www.cs.odu.edu/ HTTP/1.1
…
Accept-Datetime: Tue, 10 May 2005 11:21:00 GMT
…

Response:
HTTP/1.1 200 OK
…
Memento-Datetime: Sat, 14 May 2005 01:36:08 GMT
…

*https://datatracker.ietf.org/doc/draft-vandesompel-memento/
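On the client side, the exchange above comes down to two steps: format the target datetime as an HTTP-date in GMT, then measure how far the returned Memento-Datetime lands from that target. The sketch below uses only Python's standard `email.utils` date helpers. The function names are hypothetical, and the code sends no request.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime

def accept_datetime_header(target: datetime) -> dict:
    """Request header for a TimeGate; target must be timezone-aware."""
    # Memento datetimes are RFC 1123 HTTP-dates, always expressed in GMT.
    return {"Accept-Datetime": format_datetime(target.astimezone(timezone.utc), usegmt=True)}

def memento_offset(target: datetime, memento_datetime: str) -> timedelta:
    """Signed distance from the requested datetime to the memento actually returned."""
    return parsedate_to_datetime(memento_datetime) - target

target = datetime(2005, 5, 10, 11, 21, tzinfo=timezone.utc)
print(accept_datetime_header(target))
print(memento_offset(target, "Sat, 14 May 2005 01:36:08 GMT"))
```

For the request shown above, the header value is `Tue, 10 May 2005 11:21:00 GMT`. The returned memento is 3 days, 14:15:08 later than requested, or about 3.6 days.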


CONTENTS

Motivation

Related work

Preliminary work
  • How much of the Web is archived
  • Temporal Drift

Temporal Spread

Future work

Conclusion



HOW MUCH IS ARCHIVED?


35 – 90%: at least one archived copy

17 – 49%: 2 – 5 copies

1 – 8%: 6 – 10 copies

8 – 63%: more than 10 copies

(JCDL’11; chart legend: Internet Archive, Search Engine, Other)


CONTENTS

Motivation

Related work

Preliminary work
  • How much of the Web is archived
  • Temporal Drift

Temporal Spread

Future work

Conclusion



TEMPORAL DRIFT

Comparing two policies

• Sliding – target datetime changes

• Sticky – target datetime held steady



SLIDING TARGET


2005-05-14 01:36:08


SLIDING TARGET


2005-04-22 00:17:52


SLIDING TARGET


2005-03-31 09:16:10


TEMPORAL DRIFT

WHAT WE EXPECTED: 2005-05-14 @ 01:36:08

WHAT WE GOT: 2005-03-31 @ 09:16:10



STICKY TARGET

What if the target is held steady?

(Enabled by Memento API)



STICKY TARGET

[Screenshot: MementoFox Extension set to 2005-05-14; memento 2005-05-14 01:36:08]


STICKY TARGET


2005-04-22 00:17:52


STICKY TARGET


2005-05-14 01:36:08


DRIFT COMPARISON

Page | Sliding Datetime | Sliding Drift | Sticky Datetime | Sticky Drift
CS Home | 2005-05-14 01:36:08 | – | 2005-05-14 01:36:08 | –
Science Home | 2005-04-22 00:17:52 | 22.1 days | 2005-04-22 00:17:52 | 22.1 days
CS Home | 2005-03-31 09:16:10 | 43.7 days (+21.6 days) | 2005-05-14 01:36:08 | –
Mean | | 32.9 days | | 11.0 days
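The table can be reproduced with a small simulation. The sketch below rests on several assumptions. Each step retrieves the memento nearest the current target datetime. The toy timemaps contain only the mementos named in the table, whereas real timemaps are larger. Drift is measured from the original target, and the mean excludes the first step; with these choices the means come out to the reported 32.9 and 11.0 days.

```python
from datetime import datetime
from statistics import mean

def nearest(mementos, target):
    """Memento datetime closest to target (ties go to the first in list order)."""
    return min(mementos, key=lambda m: abs(m - target))

def browse(timemaps, path, start, policy):
    """Follow a navigation path; return the drift in days at each step."""
    target, drifts = start, []
    for page in path:
        chosen = nearest(timemaps[page], target)
        drifts.append(abs(chosen - start).total_seconds() / 86400)
        if policy == "sliding":
            target = chosen  # the next request aims at what was just displayed
        # "sticky": the target stays at the original datetime
    return drifts

dt = datetime.fromisoformat
timemaps = {  # hypothetical timemaps, trimmed to the mementos in the table
    "CS Home": [dt("2005-03-31 09:16:10"), dt("2005-05-14 01:36:08")],
    "Science Home": [dt("2005-04-22 00:17:52")],
}
path = ["CS Home", "Science Home", "CS Home"]
start = dt("2005-05-14 01:36:08")

for policy in ("sliding", "sticky"):
    drifts = browse(timemaps, path, start, policy)
    print(policy, [round(d, 1) for d in drifts], round(mean(drifts[1:]), 1))
# sliding [0.0, 22.1, 43.7] 32.9
# sticky [0.0, 22.1, 0.0] 11.0
```

The sliding policy drifts further at step 3 because it starts from the Science Home datetime. From 2005-04-22, the 2005-03-31 CS Home memento (21.6 days away) is nearer than the 2005-05-14 one.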



MEDIAN DRIFT BY STEP

[Chart: median drift (months) vs. step number for the sliding and sticky policies (JCDL’13)]


CONTENTS

Motivation

Related work

Preliminary work
  • How much of the Web is archived
  • Temporal Drift

Temporal Spread

Future work

Conclusion



TEMPORAL SPREAD



COMPOSITE MEMENTO

PRESENTATION STRUCTURE



TEMPORAL SPREAD


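[Figure: composite memento of the ODU CS home page; root captured 2005-05-14 01:36:08, embedded resources annotated +9 days, +18 days, +18 days, +7 months, and +2.1 years]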


EMBEDDED RESOURCES

Resource | Memento-Datetime | Delta
http://www.cs.odu.edu (root) | 2005-05-14 01:36:08 | –
mm_menu.js | 2005-05-23 02:39:12 | 9.0 d
style.css | 2005-05-23 02:39:39 | 9.0 d
gfx-logo-odu-crown.gif | 2005-05-23 02:39:39 | 9.0 d
ddmenu_ddown.js | 2005-05-23 02:39:43 | 9.0 d
university.js | 2005-05-23 02:39:56 | 9.0 d
rmenu_1st_about.png | 2005-06-01 13:40:25 | 18.5 d
rmenu_bottom_229.gif | 2005-06-01 14:07:29 | 18.5 d
shadow-bl.gif | 2005-06-01 14:55:53 | 18.6 d
ecsbdg.jpg | 2005-06-01 14:56:17 | 18.6 d
shadow-br.gif | 2005-06-01 15:18:18 | 18.6 d
gfx-btn-go-dblue.gif | 2005-06-01 15:34:19 | 18.6 d
shadow-tr.gif | 2005-06-01 15:55:57 | 18.6 d
header-right1.gif | 2005-06-01 16:06:16 | 18.6 d
spacer.gif | 2005-06-01 16:23:10 | 18.6 d
jimcheng.gif | 2005-06-01 16:37:39 | 18.6 d
jsmith.gif | 2005-06-01 16:58:50 | 18.6 d
rmenu_1st_featured_alumni.png | 2005-06-01 21:21:45 | 18.8 d
hmenu_college_...-new.png | 2005-12-21 20:14:25 | 7.3 mo
rmenu_1st_upcoming_news.png | 2005-12-21 20:15:14 | 7.3 mo
rmenu_1st_upcoming_events.png | 2005-12-21 21:01:12 | 7.3 mo
lmenu_1st_resources.png | 2005-12-28 17:47:41 | 7.5 mo
bullet_blue_triangle.gif | 2005-12-28 19:43:48 | 7.5 mo
logo-cs.gif | 2005-12-28 19:54:29 | 7.5 mo
rmenu_1st_featured_student.png | 2007-06-12 02:36:07 | 2.1 years
shadow-b.gif | 2007-06-21 02:35:17 | 2.1 years
shadow-r.gif | 404 Not Found | –

Embedded Resources: 26

Mean Delta: 125.9 days

Standard Deviation: 207.7 days

Spread: 2.1 years
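The summary rows can be recomputed from the table. The sketch below makes these assumptions:

- Delta is |root – embedded| in days.
- The 404 resource is excluded from the statistics.
- Standard deviation is the population form.
- Spread is the latest minus the earliest Memento-Datetime in the composite, with a year taken as 365.25 days.

```python
from datetime import datetime
from statistics import mean, pstdev

ROOT = datetime.fromisoformat("2005-05-14 01:36:08")  # http://www.cs.odu.edu
EMBEDDED = {  # Memento-Datetime of each embedded resource; None = 404 Not Found
    "mm_menu.js": "2005-05-23 02:39:12",
    "style.css": "2005-05-23 02:39:39",
    "gfx-logo-odu-crown.gif": "2005-05-23 02:39:39",
    "ddmenu_ddown.js": "2005-05-23 02:39:43",
    "university.js": "2005-05-23 02:39:56",
    "rmenu_1st_about.png": "2005-06-01 13:40:25",
    "rmenu_bottom_229.gif": "2005-06-01 14:07:29",
    "shadow-bl.gif": "2005-06-01 14:55:53",
    "ecsbdg.jpg": "2005-06-01 14:56:17",
    "shadow-br.gif": "2005-06-01 15:18:18",
    "gfx-btn-go-dblue.gif": "2005-06-01 15:34:19",
    "shadow-tr.gif": "2005-06-01 15:55:57",
    "header-right1.gif": "2005-06-01 16:06:16",
    "spacer.gif": "2005-06-01 16:23:10",
    "jimcheng.gif": "2005-06-01 16:37:39",
    "jsmith.gif": "2005-06-01 16:58:50",
    "rmenu_1st_featured_alumni.png": "2005-06-01 21:21:45",
    "hmenu_college_...-new.png": "2005-12-21 20:14:25",
    "rmenu_1st_upcoming_news.png": "2005-12-21 20:15:14",
    "rmenu_1st_upcoming_events.png": "2005-12-21 21:01:12",
    "lmenu_1st_resources.png": "2005-12-28 17:47:41",
    "bullet_blue_triangle.gif": "2005-12-28 19:43:48",
    "logo-cs.gif": "2005-12-28 19:54:29",
    "rmenu_1st_featured_student.png": "2007-06-12 02:36:07",
    "shadow-b.gif": "2007-06-21 02:35:17",
    "shadow-r.gif": None,
}

found = [datetime.fromisoformat(v) for v in EMBEDDED.values() if v is not None]
deltas = [abs(m - ROOT).total_seconds() / 86400 for m in found]  # days from root
everything = [ROOT] + found
spread_days = (max(everything) - min(everything)).total_seconds() / 86400

print(f"Embedded resources: {len(EMBEDDED)} ({len(found)} archived)")
print(f"Mean delta: {mean(deltas):.1f} days")
print(f"Standard deviation: {pstdev(deltas):.1f} days")
print(f"Spread: {spread_days / 365.25:.1f} years")
```

Under these assumptions the script prints 26 resources (25 archived), 125.9 days, 207.7 days and 2.1 years, matching the slide. Using the sample standard deviation (`stdev`) instead would give about 212 days.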


REPRESENTING SPREAD

COMPOSITE MEMENTO

TEMPORAL SPREAD CHART


Chart legend: Root, Embedded, Diff. Domain, Reused


TEMPORAL SPREAD – ODU CS



FIRST EXPERIMENT

• 1,000 URIs from DMOZ (Open Directory)

• Download all timemaps

• Download all composite mementos

• Download all embedded resources

• Single and multiple archives

• Four heuristics
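The first download step needs a timemap parser. Memento timemaps are typically served as `application/link-format`, with one link per memento carrying a `rel` containing `memento` and a `datetime` attribute. The simplified sketch below extracts (URI-M, datetime) pairs from such a timemap. It assumes well-formed input with no `>` inside URIs, and parameter values that are either quoted or plain tokens. The `archive.example` URI-Ms are hypothetical.

```python
import re
from email.utils import parsedate_to_datetime

# One link: <URI> followed by ;name="value" or ;name=token parameters.
LINK = re.compile(r'<([^>]*)>((?:\s*;\s*[\w-]+\s*=\s*(?:"[^"]*"|[^;,\s"]+))*)')
PARAM = re.compile(r';\s*([\w-]+)\s*=\s*(?:"([^"]*)"|([^;,\s"]+))')

def parse_timemap(body: str) -> list:
    """Return (URI-M, Memento-Datetime) pairs, oldest first."""
    mementos = []
    for uri, params in LINK.findall(body):
        attrs = {name: quoted or bare for name, quoted, bare in PARAM.findall(params)}
        rels = attrs.get("rel", "").split()  # e.g. "first memento" -> ["first", "memento"]
        if "memento" in rels and "datetime" in attrs:
            mementos.append((uri, parsedate_to_datetime(attrs["datetime"])))
    return sorted(mementos, key=lambda pair: pair[1])

SAMPLE = """<http://www.cs.odu.edu/>; rel="original",
<http://archive.example/timemap/link/http://www.cs.odu.edu/>; rel="self"; type="application/link-format",
<http://archive.example/20050331091610/http://www.cs.odu.edu/>; rel="first memento"; datetime="Thu, 31 Mar 2005 09:16:10 GMT",
<http://archive.example/20050514013608/http://www.cs.odu.edu/>; rel="last memento"; datetime="Sat, 14 May 2005 01:36:08 GMT"
"""

for uri_m, when in parse_timemap(SAMPLE):
    print(when.isoformat(), uri_m)
```

Quoted values are matched as a whole because HTTP-dates contain commas, which would otherwise be mistaken for link separators.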



PRELIMINARY RESULTS

Count | Description | Percent
1,000 | Root URI-Rs | –
910 | Root timemaps | 91%
87,847 | Root URI-Ms in timemaps | –
96.5 | URI-Ms per root URI-R | –
85,570 | Root mementos downloaded | 97%
1,488,420 | Embedded URI-Rs | –
17.4 | Embedded URI-Rs per root memento | –
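The derived rows are plain ratios, but the denominators matter. The per-URI-R average divides by the 910 root URI-Rs that returned timemaps, not by all 1,000. The per-memento average divides by the 85,570 downloaded root mementos. A quick check, with illustrative variable names:

```python
root_uri_rs = 1_000
root_timemaps = 910
root_uri_ms = 87_847
root_downloaded = 85_570
embedded_uri_rs = 1_488_420

print(f"Root timemaps: {root_timemaps / root_uri_rs:.0%}")                           # 91%
print(f"URI-Ms per root URI-R: {root_uri_ms / root_timemaps:.1f}")                   # 96.5
print(f"Root mementos downloaded: {root_downloaded / root_uri_ms:.0%}")              # 97%
print(f"Embedded URI-Rs per root memento: {embedded_uri_rs / root_downloaded:.1f}")  # 17.4
```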



SINGLE/MULTI & HEURISTICS

Description | Minimize Distance, Single Archive | Minimize Distance, Multi-Archive | 3-Month Window, Multi-Archive
Embedded URI-Rs | 1,488,440 | 1,488,420 | 1,447,351
Embedded URI-Ms in timemaps | 1,169,787 | 1,186,456 | 500,541
URI-Ms per embedded URI-R | 0.79 | 0.80 | 0.35
% Complete | 73.8% | 75.4% | 33.8%
Mean spread | 200.2 | 200.1 | 15.1
Standard deviation | 219.2 | 219.9 | 14.3
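The heuristics in the table can be sketched under the following assumptions:

- "Minimize distance" picks the embedded memento nearest the root's Memento-Datetime.
- "Multi-archive" pools the timemaps of several archives before choosing.
- "3-month window" discards candidates more than 90 days from the root. The exact window definition is an assumption.

The archive names and datetimes are hypothetical.

```python
from datetime import datetime, timedelta

def choose(root, timemaps, archives, window=None):
    """Pick the candidate memento closest to root from the named archives.

    Returns None when nothing qualifies, i.e. the resource is left missing.
    """
    candidates = [m for a in archives for m in timemaps.get(a, [])]
    if window is not None:
        candidates = [m for m in candidates if abs(m - root) <= window]
    return min(candidates, key=lambda m: abs(m - root), default=None)

dt = datetime.fromisoformat
root = dt("2005-05-14 01:36:08")
timemaps = {  # hypothetical timemaps for one embedded resource in two archives
    "archive-a": [dt("2005-12-21 20:14:25")],
    "archive-b": [dt("2005-06-01 16:23:10"), dt("2007-06-12 02:36:07")],
}

single = choose(root, timemaps, ["archive-a"])                         # 2005-12-21, about 7.3 months out
multi = choose(root, timemaps, ["archive-a", "archive-b"])             # 2005-06-01, about 18.6 days out
windowed = choose(root, timemaps, ["archive-a"], window=timedelta(days=90))  # None: left missing
```

Pooling archives can never increase the distance to the nearest candidate, since the minimum over a larger set is at most the minimum over a subset. A window trades completeness for a smaller spread, a trade-off that matches the shape of the third column.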



TEMPORAL COHERENCE


1 Memento, Bracketed Root


TEMPORAL COHERENCE


1 Memento, Bracketed Root


TEMPORAL COHERENCE


1 Memento, Bracketed Root


TEMPORAL COHERENCE


1 Memento, Root Not Bracketed


TEMPORAL COHERENCE


1 Memento, Root Not Bracketed


TEMPORAL COHERENCE


1 Memento, No Last-Modified


TEMPORAL COHERENCE


1 Memento, Before Root


TEMPORAL COHERENCE


2 Mementos, Root Not Bracketed


TEMPORAL COHERENCE


2 Mementos, Root Not Bracketed


TEMPORAL COHERENCE


2 Mementos, Use Content – Similarity


TEMPORAL COHERENCE


2 Mementos, Contents Equal or Equivalent


TEMPORAL COHERENCE


2 Mementos, Contents Not Equal or Equivalent
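The single-memento cases above could be coded as in the hypothetical sketch below. It assumes that an embedded memento "brackets" the root when its Last-Modified datetime is at or before the root's Memento-Datetime and its own Memento-Datetime is at or after it. It also assumes that a memento captured after the root without Last-Modified cannot be judged. These rules are an illustrative reading of the slide titles, not the authors' definitions.

```python
from datetime import datetime
from typing import Optional

def classify(root: datetime, memento: datetime, last_modified: Optional[datetime]) -> str:
    """Hypothetical single-memento coherence label for one embedded resource."""
    if memento < root:
        return "before root"         # captured before the root; later changes are unknown
    if last_modified is None:
        return "no Last-Modified"    # cannot tell whether it changed after the root
    if last_modified <= root:
        return "bracketed root"      # unchanged from before the root until capture
    return "root not bracketed"      # modified after the root was captured

root = datetime(2005, 5, 14, 1, 36, 8)
print(classify(root, datetime(2005, 6, 1), datetime(2005, 1, 1)))
```

The two-memento cases would additionally compare the contents of the mementos captured immediately before and after the root.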


CURRENT EXPERIMENT

• 4,000 URIs from JCDL’11 “How Much…” paper

• 1 URI/month instead of all

• Temporal coherence patterns

• Target WSDM 2013
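This sketch reads "1 URI/month" as sampling one root memento per calendar month rather than processing every memento. Keeping the earliest memento in each month is an assumption; the slide does not say which memento is chosen.

```python
from datetime import datetime

def one_per_month(mementos):
    """Keep the earliest memento datetime in each (year, month), oldest month first."""
    chosen = {}
    for m in sorted(mementos):
        chosen.setdefault((m.year, m.month), m)  # first (earliest) one per month wins
    return list(chosen.values())

sample = [datetime(2005, 5, 23), datetime(2005, 5, 14), datetime(2005, 6, 1),
          datetime(2005, 12, 21), datetime(2005, 12, 28)]
print(one_per_month(sample))
```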



CURRENT EXPERIMENT



CONTENTS

Motivation

Related work

Preliminary work

Temporal Spread

Future work

Conclusion



FUTURE WORK

Timemaps, Redirection, Missing Mementos

• Timemaps only tell part of the story

• URI-R redirection (302 from source)

• URI-M redirection (archive action)

• Mementos in timemaps but not accessible

• Policies must consider user needs
  • Leave it missing
  • Show “best” substitute



FUTURE WORK

Similarity & Duplication

• Deltas are currently |root – embedded|

• If the bracketing mementos are identical, should the delta be zero?

• HTML is usually modified by the archive
  • Can’t check for equality
  • Shingling? SimHash?


[Figure: timeline around the root at 0, marked –30d and +30d]
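One way to explore the question above is a hypothetical delta function. It computes |root – embedded| in days, but returns zero when the embedded mementos immediately before and after the root have byte-identical content. Exact hashing only suits resources the archive does not rewrite, such as images and other binaries. Archived HTML would need a similarity measure such as shingling or SimHash instead.

```python
import hashlib
from datetime import datetime

def delta_days(root, chosen, before=None, after=None):
    """|root - chosen| in days, or 0.0 if identical mementos bracket the root.

    before/after are optional (datetime, bytes) pairs for the embedded
    mementos captured immediately before and after the root.
    """
    if before and after and before[0] <= root <= after[0]:
        # Comparing digests lets stored hashes stand in for the full content.
        if hashlib.sha256(before[1]).digest() == hashlib.sha256(after[1]).digest():
            return 0.0  # content unchanged across the root's datetime
    return abs(chosen - root).total_seconds() / 86400

root = datetime(2005, 5, 14, 1, 36, 8)
shot = datetime(2005, 6, 1, 16, 23, 10)
print(round(delta_days(root, shot), 1))
```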


FUTURE WORK

Communicating Status



FUTURE WORK

Policies & Heuristics

• Current spread heuristics
  • Minimize distance
  • Past only
  • Past preferred
  • Near or within distance
  • Single vs. multi-archive

• Refine to meet user expectations
  • Speed (minimize time)
  • Accuracy (minimize temporal error)
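Past only and past preferred differ from minimize distance only in how they treat mementos captured after the target. The minimal sketch below rests on our reading of the names. "Past only" never looks forward. "Past preferred" looks forward only when no earlier memento exists. The function names are hypothetical.

```python
from datetime import datetime

def minimize_distance(mementos, target):
    """Nearest memento in either direction."""
    return min(mementos, key=lambda m: abs(m - target), default=None)

def past_only(mementos, target):
    """Latest memento not after the target, or None if there is none."""
    return max((m for m in mementos if m <= target), default=None)

def past_preferred(mementos, target):
    """Like past_only, but fall back to the earliest (future) memento."""
    chosen = past_only(mementos, target)
    return chosen if chosen is not None else min(mementos, default=None)

ms = [datetime(2005, 3, 31), datetime(2005, 5, 23)]
target = datetime(2005, 5, 14)
print(minimize_distance(ms, target), past_only(ms, target), past_preferred(ms, target))
```

Here minimize distance picks 2005-05-23 (9 days ahead), while both past heuristics accept the 44-day-old 2005-03-31 memento. This is the speed-versus-accuracy tension the slide lists.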



CONTENTS

Motivation

Related work

Preliminary work

Future work

Conclusion



CONCLUSION

Extensive research on improving acquisition exists

Best use of existing collections needs study

We are looking at

• Characterizing existing holdings

• Characterizing temporal coherence

• Policies that minimize the impact of temporal incoherence

• Visualizations of temporal coherence



MY QUESTIONS


Coherent


MY QUESTIONS


Violation


MY QUESTIONS


What do these mean to users?

[Figure with callouts (1)–(4)]


MY QUESTIONS


What does this mean to users?