Small-world link structures across an academic web space
- a Library and Information Science approach
Lennart Björneborn, PhD
Royal School of Library and Information Science Copenhagen, Denmark
www.db.dk/lb
November 19, 2004
School of Information and Library Science
University of North Carolina at Chapel Hill
M.C. Escher: House of Stairs, 1951
WWW
= distributed knowledge organization
= collaborative ’weaving’
= self-organized macro-level aggregations of micro-level interactions
= reflects social/cultural formations
Wood et al. (1995)
small-world networks
small-world = highly clustered + short paths
– short distances through shortcuts between nodes in network
– small-world = short local + short global distances
– efficient diffusion of signals, contacts, ideas, viruses, etc. in networks
social network analysis in 1960s: ’six degrees of separation’
– today: ‘small worlds’ in biological, chemical, technical, social networks
– brains, ecological food webs, scientific collaboration networks, etc.
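The "highly clustered + short paths" definition can be made concrete with a small sketch. The ring lattice, the two shortcut links and all function names below are illustrative assumptions, not data or code from the study; the sketch only shows how a few shortcuts shorten average distances in a clustered network.

```python
from collections import deque
from itertools import combinations

def ring_lattice(n, k):
    """Undirected ring: each node is linked to its k nearest neighbours on each side."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k + 1):
            j = (i + step) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj

def average_clustering(adj):
    """Mean share of a node's neighbour pairs that are themselves linked."""
    total = 0.0
    for nbrs in adj.values():
        pairs = list(combinations(nbrs, 2))
        if pairs:
            total += sum(1 for a, b in pairs if b in adj[a]) / len(pairs)
    return total / len(adj)

def average_path_length(adj):
    """Mean shortest-path distance over all reachable ordered node pairs (BFS)."""
    total, count = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        count += len(dist) - 1
    return total / count

# Hypothetical toy network: 12 nodes, 2 neighbours on each side.
lattice = ring_lattice(12, 2)

# Same network plus two assumed long-range shortcuts.
with_shortcuts = {node: set(nbrs) for node, nbrs in lattice.items()}
for a, b in [(0, 6), (3, 9)]:
    with_shortcuts[a].add(b)
    with_shortcuts[b].add(a)

print(average_clustering(lattice), average_path_length(lattice))
print(average_clustering(with_shortcuts), average_path_length(with_shortcuts))
```

In this toy case the shortcuts lower the average path length, while most nodes keep their locally clustered neighbourhoods.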
scale-free link distribution
power law = # in-neighbors / subsite
[Figure: number of in-neighbors (y-axis, 0 to 450) plotted against subsites (x-axis, 0 to 8000)]
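A minimal sketch of how the quantity plotted here, the number of in-neighbors per subsite, could be counted from a list of subsite-to-subsite links. The site names and links are made up for illustration, and the function name is an assumption rather than anything from the original study.

```python
from collections import Counter

# Hypothetical (source subsite, target subsite) links; not the UK data.
site_links = [
    ("a.example", "hub.example"),
    ("a.example", "hub.example"),   # repeated link: still one in-neighbor
    ("b.example", "hub.example"),
    ("c.example", "hub.example"),
    ("d.example", "hub.example"),
    ("a.example", "mid.example"),
    ("b.example", "mid.example"),
    ("hub.example", "leaf.example"),
    ("hub.example", "hub.example"), # selflink: ignored
]

def in_neighbor_counts(links):
    """Count distinct other subsites linking to each target subsite."""
    sources = {}
    for source, target in links:
        if source != target:
            sources.setdefault(target, set()).add(source)
    return {site: len(srcs) for site, srcs in sources.items()}

counts = in_neighbor_counts(site_links)

# Rank subsites by in-neighbor count, highest first, as in a rank plot.
ranked = sorted(counts.values(), reverse=True)

# How many subsites have each in-neighbor count.
histogram = Counter(counts.values())

print(ranked)
print(histogram)
```

Subsites with no in-neighbors do not appear in `counts`; in a skewed (power-law-like) distribution, a few subsites would collect most in-neighbors while many have only a handful.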
research motivation
distributed knowledge organization > small-world structures > exploratory capabilities (accessibility + navigability)
– core issues in LIS (library and information science)
– short link paths > human web surfers + digital web crawlers can reach and retrieve web pages
establish an understanding of which micro-level web activities
contribute to small-world properties on the Web
– especially: how do academic link creators actually connect
documents, topics, genres, and sites across the Web?
main research question
what types of web links,
web pages and web sites
function as cross-topic connectors
in small-world link structures
across an academic web space?
webometrics
the study of quantitative aspects of
the construction and use of
info. resources, structures and technologies on the Web,
drawing on bibliometric and informetric approaches
– graph theory + social network analysis
– Almind & Ingwersen (1997): ‘webometrics’
– web contents, links, user behaviour, search engine performance
informetrics
bibliometrics
scientometrics
webometrics
cybermetrics
© Björneborn 2004
basic link terminology
B has an inlink from A : ~ citation
B has an outlink to C : ~ reference
B has a selflink : ~ self-citation
E and F are reciprocally linked
H is reachable from A by a link path
A has a transversal link to G : shortcut
C and D have co-inlinks from B : ~ co-citation
B and E have co-outlinks to D : ~ bibliographic coupling
co-links
[Figure: example link graph with nodes A to H]
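The link terminology above can be sketched on a small directed graph. The links A>B, B>C, B>D, B>B, E>D, E<>F and A>G follow the slide text; the link G>H is an assumption added only so that H is reachable from A, since the slide does not state which path is meant. Function names are illustrative.

```python
# Directed links as (source, target) pairs.
links = {
    ("A", "B"), ("B", "C"), ("B", "D"), ("B", "B"),
    ("E", "D"), ("E", "F"), ("F", "E"), ("A", "G"),
    ("G", "H"),  # assumed link, not stated on the slide
}

def inlinks(node):
    """Sources of links pointing to node (selflinks excluded)."""
    return {s for s, t in links if t == node and s != node}

def outlinks(node):
    """Targets of links leaving node (selflinks excluded)."""
    return {t for s, t in links if s == node and t != node}

def has_selflink(node):
    return (node, node) in links

def reciprocally_linked(a, b):
    return (a, b) in links and (b, a) in links

def co_inlinked(a, b):
    """True if a and b share at least one inlinking source (~ co-citation)."""
    return bool(inlinks(a) & inlinks(b))

def co_outlinking(a, b):
    """True if a and b share at least one outlink target (~ bibliographic coupling)."""
    return bool(outlinks(a) & outlinks(b))

def reachable(source, target):
    """True if a directed link path leads from source to target."""
    seen, stack = {source}, [source]
    while stack:
        node = stack.pop()
        if node == target:
            return True
        for nxt in outlinks(node):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False

print(inlinks("B"), outlinks("B"), has_selflink("B"))
print(co_inlinked("C", "D"), co_outlinking("B", "E"), reachable("A", "H"))
```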
UK link data 2001
109 UK universities
– main sites excluded
7669 subsites
– www.hum.port.ac.uk
– www.atm.ox.ac.uk
– ...
3.4 million web pages
39.3 million page outlinks
– 34.4 million site selflinks
– 4.9 million site outlinks
delimited data set
– 105 817 web pages
– 207 865 links between 7669 subsites
5-step methodology
A. Graph model of 7669 UK academic subsites;
B. 189 random subsites in SCC (Strongly Connected Component);
C. 10 path nets with all shortest paths between five pairs of topically dissimilar SCC subsites;
D. Source and target pages along shortest link paths in 10 path nets;
E. Links, pages and subsites providing transversal (cross-topic) connections in 10 path nets.
[Figure: diagram of the five steps A to E]
corona model: reachability structures
SCC: Strongly Connected Component
IN-Tendrils: connected from IN
OUT: reachable from SCC
IN: traversable to SCC
OUT-Tendrils: connected to OUT
Tube: connecting IN to OUT
Disconnected
bow-tie model
(Broder et al. 2000)
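A minimal sketch of how the main reachability components could be derived from a directed link graph: the largest strongly connected component, IN (nodes that can reach it) and OUT (nodes reachable from it). The toy links and function names are assumptions; tendrils, tubes and disconnected nodes are lumped together as "other" to keep the sketch short, and the simple per-node search is chosen for readability rather than speed.

```python
from collections import defaultdict, deque

def reach(start_nodes, adj):
    """All nodes reachable from start_nodes by following adj (BFS)."""
    seen = set(start_nodes)
    queue = deque(start_nodes)
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

def bow_tie(edges):
    """Split nodes into (SCC, IN, OUT, other) relative to the largest SCC."""
    forward, backward = defaultdict(set), defaultdict(set)
    nodes = set()
    for source, target in edges:
        forward[source].add(target)
        backward[target].add(source)
        nodes.update((source, target))

    # A node's strongly connected component is the set of nodes it can
    # both reach and be reached from; keep the largest one found.
    scc, unassigned = set(), set(nodes)
    while unassigned:
        node = next(iter(unassigned))
        component = reach([node], forward) & reach([node], backward)
        unassigned -= component
        if len(component) > len(scc):
            scc = component

    in_part = reach(scc, backward) - scc    # can traverse to the SCC
    out_part = reach(scc, forward) - scc    # reachable from the SCC
    other = nodes - scc - in_part - out_part
    return scc, in_part, out_part, other

# Hypothetical toy graph, not the UK subsite data.
edges = [
    ("in1", "a"), ("a", "b"), ("b", "c"), ("c", "a"),  # a, b, c form the SCC
    ("c", "out1"),
    ("in1", "t1"),    # IN-tendril
    ("t2", "out1"),   # OUT-tendril
    ("x", "y"),       # disconnected pair
]
scc, in_part, out_part, other = bow_tie(edges)
print(scc, in_part, out_part, other)
```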
.ac.uk
.uk
cfd.me.umist.ac.uk
ercoftac.mech.surrey.ac.uk
cajun.cs.nott.ac.uk
ukoln.bath.ac.uk
cs.man.ac.uk
ashmol.ox.ac.uk
collections.ucl.ac.uk
vlmp.museophile.sbu.ac.uk
shortest link path
path net = ‘mini’ small world
transversal link
path net = all shortest link paths between two given nodes (subsites)
network analysis tool = Pajek > adjacency matrix
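The path net definition above (all shortest link paths between two given nodes) can be sketched with two breadth-first searches. This is an illustrative plain-Python sketch under assumed names and a made-up graph; it is not how the study itself extracted path nets, which used Pajek.

```python
from collections import deque

def bfs_distances(start, adj):
    """Shortest link distance from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def path_net(adj, source, target):
    """Links lying on at least one shortest path from source to target."""
    # Reverse graph, used to measure distances towards the target.
    reverse = {}
    for u, targets in adj.items():
        for v in targets:
            reverse.setdefault(v, set()).add(u)

    from_source = bfs_distances(source, adj)
    to_target = bfs_distances(target, reverse)
    if target not in from_source:
        return set()  # no link path at all

    shortest = from_source[target]
    # A link u>v is on a shortest path if the detour-free length matches.
    return {
        (u, v)
        for u, targets in adj.items()
        for v in targets
        if u in from_source and v in to_target
        and from_source[u] + 1 + to_target[v] == shortest
    }

# Hypothetical directed graph: two shortest paths s>a>t and s>b>t,
# plus a longer path s>c>d>t that must be left out of the path net.
graph = {
    "s": {"a", "b", "c"},
    "a": {"t"},
    "b": {"t"},
    "c": {"d"},
    "d": {"t"},
}
net = path_net(graph, "s", "t")
print(sorted(net))
```

Because links are directed, the path net from s to t generally differs from the one from t to s, which is why five pairs of subsites give ten path nets.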
10 path nets
hum.port.ac.uk (Faculty of Humanities and Social Sciences, Portsmouth) ↔ atm.ox.ac.uk (Atmospheric, Oceanic and Planetary Physics, Oxford)
economics.soton.ac.uk (Economics Dept, Southampton) ↔ chem.gla.ac.uk (Chemistry Dept, Glasgow)
psy.man.ac.uk (Psychology Dept, Manchester) ↔ maths.gcal.ac.uk (Mathematics Dept, Glasgow Caledonian)
speech.essex.ac.uk (Speech Research Group, Linguistics Dept, Essex) ↔ palaeo.gly.bris.ac.uk (Palaeontology Research Group, Earth Sciences Dept, Bristol)
geog.plym.ac.uk (Geography Dept, Plymouth) ↔ eye.ox.ac.uk (Ophthalmology Dept [eye research], Oxford)
5 pairs of topically dissimilar subsites
+ both directions
= 10 path nets with all shortest paths
indicative findings
no generalizable findings – indicative only
– national + sectoral + institutional delimitation = UK academic subsites
– temporal delimitation = 2001 snapshot : do not cover dynamic changes
– small stratified sample of 10 path nets
may however be fruitful for future large-scale research
– computer-science sites = academic cross-topic connectors
– personal link creators > web cohesion ‘glue’ > especially link lists
– over 80% of transversal links may be academic (research, teaching)
– close relation: hubs / authorities and betweenness centrality
– web of genres > genre drift + topic drift > small world
web of genres & genre drift
possible small-world implications/applications
library and information science
– also focus on distributed knowledge organization (www)
– also focus on exploratory capabilities in distributed info.systems
– convergent (goal-directed) and divergent (serendipitous) info.behavior
web sociology / science studies
– small-world links > cross-social / cross-domain weak ties
– counteract balkanization into disconnected / unreachable insularities
– small-world ‘gate-keepers’ with betweenness centrality in networks
– tracking interdisciplinary boundary crossings
– web mining of fertile areas for cross-disciplinary cross-pollination
search engines
– better coverage in web traversal + harvesting
– zoomable maps of web clusters + small-world shortcuts
important question for LIS
how can information systems
– especially libraries and www –
be used to stimulate
curiosity,
serendipity (unexpected but useful info),
creativity and
knowledge diffusion ?
information systems as possibility spaces
shouldn’t just meet explicit information needs of users
– but also enable users to develop their needs
users’ information behavior
convergent behavior: goal-directed, rational
– e.g., Boolean searching
– present, explicit info.needs: problems, work tasks
– ‘information recovery’
– top-down, topic-focused, ’convergent’ info.system
divergent behavior: non-goal-directed, intuitive
– e.g., browsing, serendipity
– latent, implicit info.needs: triggered interests, curiosity
– ‘information discovery’
– bottom-up, topic-scattered, ’divergent’ info.system
proposed broadened aim of LIS research
”help users explore and exploit options embedded in information environments”
traditional LIS research areas
Björneborn
complementary LIS research areas
so wwwhat …?
librarians and other info.specialists can create and locate clusters
and navigational aids on the Web
– making it easier to find requested info.
but also allow possibilities for diversity and serendipity
– making it easier to encounter unexpected info.
just like in physical libraries = ingenious possibility spaces :-)
Five ’laws’ of web connectivity
– Links are for use – the very essence of hypertext;
– Every surfer his or her link – the rich diversity of links across topics and genres;
– Every link its surfer – ditto;
– Save the time of the surfer – by visualizing web clusters and small-world shortcuts;
– The Web is a growing organism – we are still in the Web’s infancy
Inspired by S.R. Ranganathan (1931). The five laws of library science:
“Books are for use.
Every reader his or her book.
Every book its reader.
Save the time of the reader.
The Library is a growing organism.”
PhD thesis 2004
Small-World Link Structures across an Academic Web Space
- a Library and Information Science Approach
Lennart Björneborn
www.db.dk/lb
longest path net
path net with the longest shortest link paths between two given subsites