Unsupervised Creation of Small World Networks for the
Preservation of Digital Objects
Charles L. Cartledge
Michael L. Nelson
Old Dominion University
Department of Computer Science
Norfolk, Virginia
SP145 JCDL Short Paper Presentation 2
Order of Presentation
• Technology enablers
• Constraints
• Simple rules for Complex Behavior
• Simulation approach
• Simulation results
• Future work
Technology Enablers
Cost data: http://www.archivebuilders.com/whitepapers/22011p.pdf
Constraints
“… Tomorrow we could see the National Library of Medicine abolished by Congress, Elsevier dismantled by a corporate raider, the Royal Society declared bankrupt, or the University of Michigan Press destroyed by a meteor. All are highly unlikely, but over a long period of time unlikely events will happen. …”
(emphasis CLC)
W. Y. Arms, “Preservation of Scientific Serials: Three Current Examples,” Journal of Electronic Publishing, Dec., 1999
Expectancy data: http://www.cdc.gov/nchs/data/nvsr/nvsr57/nvsr57_14.pdf
8075
12 – 101 yrs
Picture: Patricia W. and J Douglas Perry Library, Old Dominion University
http://www2.westminster-mo.edu/wc_users/homepages/staff/brownr/ClosedCollegeIndex.htm
Those that die, do so in avg. 23 yrs.
http://www.lbl.gov/Science-Articles/Archive/ssc-and-future.html
http://www.dod.mil/brac/
http://www.hq.nasa.gov/office/pao/97budget/zbr.txt
5 – 60 yrs
Reynolds’s Rules for Flocking
• Collision Avoidance: avoid collisions with nearby flock mates
• Velocity Matching: attempt to match velocity with nearby flock mates
• Flock Centering: attempt to stay close to nearby flock mates
Images and rules: http://www.red3d.com/cwr/boids/
My interpretation
• Namespace collision avoidance
• Following others to available storage locations
• Deleting copies of one’s self to provide room for late arrivers
Types of Graphs
                        Regular   Small World   Random
Path length             Long      Shorter       Short
Clustering coefficient  High      Still high    Low
(Each graph has 20 vertices and 40 edges.)
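The two measures in the table can be computed directly. The following is a minimal sketch, not the authors' simulation code: it assumes the regular graph is a ring lattice in which each of the 20 vertices is linked to its 4 nearest neighbors (which gives 40 edges), and it builds the random graph from 40 uniformly chosen edges. The function names are illustrative only.

```python
from collections import deque
import random


def ring_lattice(n, k):
    """Regular graph: each vertex is linked to its k nearest ring neighbors."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for step in range(1, k // 2 + 1):
            w = (v + step) % n
            adj[v].add(w)
            adj[w].add(v)
    return adj


def random_graph(n, m, seed=0):
    """Random graph: m distinct edges chosen uniformly (may be disconnected)."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    edges = 0
    while edges < m:
        a, b = rng.sample(range(n), 2)
        if b not in adj[a]:
            adj[a].add(b)
            adj[b].add(a)
            edges += 1
    return adj


def clustering_coefficient(adj):
    """Mean over vertices of (links among neighbors) / (possible such links)."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue  # a vertex with fewer than two neighbors contributes 0
        links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
        total += links / (k * (k - 1) / 2)
    return total / len(adj)


def average_path_length(adj):
    """Mean shortest-path length over all reachable ordered pairs (BFS)."""
    total, pairs = 0, 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


regular = ring_lattice(20, 4)
rand = random_graph(20, 40)
for name, g in (("Regular", regular), ("Random", rand)):
    print(name, clustering_coefficient(g), average_path_length(g))
```

Under these assumptions the regular graph should show the higher clustering coefficient and the longer path length, matching the first and last columns of the table.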
Unsupervised Small World Graph Creation
• gamma = 0.0
• alpha = 0.99
• gamma = 0.7
• alpha = 0.99
• 0.2 <= beta <= 0.66
• gamma < 0.6
CC is shown as dark lines
L is shown as light lines
Phases/Activities
• Creation (Human or archivist activities)
• Wandering (Autonomous activities)
• Connecting (Autonomous activities)
• Flocking (Autonomous activities)
Wandering
[Figure: nodes A, B, C, and D. A wandering DO asks each node "Who are you connected to?" and receives replies such as "Connected to: <Nil>", "Connected to: A", "Connected to: B", and "Connected to: B, C".]
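The question-and-reply exchange in the wandering phase can be sketched as a traversal driven only by local replies. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function `wander`, the `connected_to` table, and the example topology are all hypothetical, since the exact edges in the figure cannot be recovered from the slide text.

```python
def wander(first_contact, connected_to):
    """Visit nodes starting at first_contact, asking each for its connections.

    connected_to maps a node name to the list it would give in reply to
    "Who are you connected to?" (an empty list stands for <Nil>).
    Returns the nodes in the order they were visited.
    """
    known = [first_contact]  # nodes learned so far, in discovery order
    visited = []
    while len(visited) < len(known):
        node = known[len(visited)]          # next known but unvisited node
        reply = connected_to.get(node, [])  # "Who are you connected to?"
        visited.append(node)
        for peer in reply:
            if peer not in known:
                known.append(peer)          # learned only from the reply
    return visited


# Hypothetical replies, chosen to resemble those visible on the slide.
replies = {"A": [], "B": ["A"], "C": ["B"], "D": ["B", "C"]}
print(wander("D", replies))
```

The wanderer never consults a global view of the graph; everything it learns comes from the replies of nodes it has already reached, which is the locally gleaned data the approach relies on.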
Connecting
[Figure: nodes A, B, C, and D, with edges labeled "Possible connection", "Connection established", and "Connection NOT established".]
Typical Simulation Parameters
• alpha = 0.5
• beta = 0.6
• gamma = 0.1
• Number of DOs = 1000
• Number of hosts = 1000
• Min number desired replicas = 3
• Max number desired replicas = 10
• Max number of replicas per host = 20
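As a quick sanity check on these values, the parameters can be collected in one place and the storage they imply compared with the replica demand. This is a sketch only: the class and field names are hypothetical, and alpha, beta, and gamma are carried as opaque numbers because the slides do not define them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimulationParameters:
    # Values taken from the slide; names are illustrative assumptions.
    alpha: float = 0.5
    beta: float = 0.6
    gamma: float = 0.1
    num_dos: int = 1000
    num_hosts: int = 1000
    min_replicas: int = 3
    max_replicas: int = 10
    max_replicas_per_host: int = 20

    def __post_init__(self):
        if self.min_replicas > self.max_replicas:
            raise ValueError("min_replicas must not exceed max_replicas")

    @property
    def total_capacity(self):
        """Replica slots available across all hosts."""
        return self.num_hosts * self.max_replicas_per_host

    @property
    def min_demand(self):
        """Slots needed if every DO holds only its minimum replica count."""
        return self.num_dos * self.min_replicas

    @property
    def max_demand(self):
        """Slots needed if every DO reaches its maximum replica count."""
        return self.num_dos * self.max_replicas


params = SimulationParameters()
print(params.total_capacity, params.min_demand, params.max_demand)
```

With the typical values, the hosts offer 20,000 replica slots against a demand of between 3,000 and 10,000, so total capacity is not the binding constraint in this configuration.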
Future work
• Test the autonomous graphs for resilience to error and attack
• Test what happens when a graph becomes disconnected
• Test what happens when a disconnected graph becomes re-connected
Conclusions
• We have shown that Digital Objects can autonomously create small world graphs based on locally gleaned data
• These graphs can be used for long-term preservation
• We intend to study these graphs focusing on their tolerance to isolated and widespread failures