
Small-world link structures

across an academic web space

- a Library and Information Science approach

Lennart Björneborn, PhD

Royal School of Library and Information Science Copenhagen, Denmark

www.db.dk/lb

November 19, 2004
School of Information and Library Science
University of North Carolina at Chapel Hill

M.C. Escher: House of Stairs, 1951


WWW = distributed knowledge organization
= collaborative ‘weaving’
= self-organized macro-level aggregations of micro-level interactions
= reflects social/cultural formations

Wood et al. (1995)


small-world networks

small-world = highly clustered + short paths
– short distances through shortcuts between nodes in the network

– small-world = short local + short global distances

– efficient diffusion of signals, contacts, ideas, viruses, etc. in networks

social network analysis in the 1960s: ‘six degrees of separation’
– today: ‘small worlds’ in biological, chemical, technical, social networks
– brains, ecological food webs, scientific collaboration networks, etc.
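The two small-world properties above (high clustering, short paths via shortcuts) can be illustrated with a toy model in the spirit of the Watts-Strogatz ring lattice. This is a minimal sketch, not the method used in the thesis: the ring size, the neighborhood width and the four hand-picked shortcuts are arbitrary assumptions, and all function names are hypothetical.

```python
from collections import deque


def ring_lattice(n, k):
    """Undirected ring of n nodes, each linked to its k nearest neighbors on each side."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for step in range(1, k + 1):
            w = (v + step) % n
            adj[v].add(w)
            adj[w].add(v)
    return adj


def add_shortcut(adj, u, v):
    """Add an undirected long-range link (a 'shortcut') between u and v."""
    adj[u].add(v)
    adj[v].add(u)


def average_clustering(adj):
    """Mean local clustering: share of each node's neighbor pairs that are themselves linked."""
    total = 0.0
    for v, nbrs in adj.items():
        d = len(nbrs)
        if d < 2:
            continue  # nodes with fewer than 2 neighbors contribute 0
        links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
        total += links / (d * (d - 1) / 2)
    return total / len(adj)


def average_path_length(adj):
    """Mean BFS distance over all ordered pairs of distinct, mutually reachable nodes."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


lattice = ring_lattice(100, 2)
before = (average_clustering(lattice), average_path_length(lattice))
for u, v in [(0, 50), (25, 75), (10, 60), (35, 85)]:  # hypothetical shortcuts
    add_shortcut(lattice, u, v)
after = (average_clustering(lattice), average_path_length(lattice))
print(before, after)
```

Adding only four shortcuts lowers the average path length while the clustering coefficient barely changes, which is the combination the slide calls a small world.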


scale-free link distribution

[Figure: number of in-neighbors per subsite; x-axis: subsites (0 to 8000), y-axis: in-neighbors (0 to 450)]

power law = distribution of the number of in-neighbors per subsite
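A power law here means that a few subsites have very many in-neighbors while most have only a few. As a hedged sketch, assuming the link data is available as hypothetical (source, target) subsite pairs, the in-neighbor distribution can be tallied and its exponent roughly estimated by a least-squares fit on log-log axes. This fit is only a quick diagnostic; maximum-likelihood estimators are generally more reliable on real data.

```python
import math
from collections import Counter


def in_neighbor_counts(links):
    """Number of distinct in-neighbors per target subsite, ignoring selflinks."""
    sources = {}
    for src, dst in links:
        if src != dst:
            sources.setdefault(dst, set()).add(src)
    return {node: len(s) for node, s in sources.items()}


def degree_histogram(counts):
    """Map in-neighbor count k -> number of subsites with exactly k in-neighbors."""
    return Counter(counts.values())


def loglog_slope_exponent(histogram):
    """Estimate gamma in P(k) ~ k**-gamma by least squares on log k vs log frequency."""
    xs = [math.log(k) for k in histogram]
    ys = [math.log(histogram[k]) for k in histogram]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    if var == 0:
        raise ValueError("need at least two distinct in-neighbor counts")
    return -cov / var


# Hypothetical links: 16 subsites with 1 in-neighbor, 4 with 2, and 1 with 4.
links = [(f"s{i}", f"a{i}") for i in range(16)]
links += [(src, f"b{i}") for i in range(4) for src in ("x", "y")]
links += [(f"s{i}", "hub") for i in range(4)]
hist = degree_histogram(in_neighbor_counts(links))
gamma = loglog_slope_exponent(hist)
```

Subsites with zero in-neighbors drop out of the fit, since log 0 is undefined.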


research motivation

distributed knowledge organization + small-world structures

exploratory capabilities (accessibility + navigability)

– core issues in LIS (library and information science)

– via short link paths, human web surfers and digital web crawlers can reach and retrieve web pages

establish an understanding: what micro-level web activities contribute to small-world properties on the Web?

– especially: how do academic link creators actually connect documents, topics, genres, and sites across the Web?


main research question

what types of web links, web pages and web sites function as cross-topic connectors in small-world link structures across an academic web space?


webometrics

the study of the quantitative aspects of the construction and use of info. resources, structures and technologies on the Web, drawing on bibliometric and informetric approaches

– graph theory + social network analysis

– Almind & Ingwersen (1997): ‘webometrics’

– web contents, links, user behaviour, search engine performance

[Figure: informetrics, bibliometrics, scientometrics, webometrics, cybermetrics]

© Björneborn 2004


basic link terminology

B has an inlink from A : ~ citation
B has an outlink to C : ~ reference
B has a selflink : ~ self-citation
E and F are reciprocally linked
H is reachable from A by a link path
A has a transversal link to G : shortcut

co-links
C and D have co-inlinks from B : ~ co-citation
B and E have co-outlinks to D : ~ bibliographic coupling

[Figure: example link graph with nodes A to H]
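The link relations above reduce to simple set operations on a directed edge list. The edge set below is hypothetical: it was chosen only so that each statement on the slide holds, because the figure's actual edges cannot be recovered from the text. Transversal (cross-topic) links depend on page topics rather than on graph structure alone, so they are not modeled here.

```python
from collections import deque

# Hypothetical directed links consistent with the statements above.
LINKS = {
    ("A", "B"), ("B", "C"), ("B", "D"), ("B", "B"),
    ("E", "F"), ("F", "E"), ("E", "D"),
    ("A", "G"), ("D", "H"),
}


def outlinks(node, links=LINKS):
    """Targets that node links to (~ references), excluding selflinks."""
    return {dst for src, dst in links if src == node and dst != node}


def inlinks(node, links=LINKS):
    """Sources that link to node (~ citations), excluding selflinks."""
    return {src for src, dst in links if dst == node and src != node}


def has_selflink(node, links=LINKS):
    return (node, node) in links


def reciprocal(a, b, links=LINKS):
    return (a, b) in links and (b, a) in links


def co_inlinked(a, b, links=LINKS):
    """Shared sources linking to both a and b (~ co-citation)."""
    return inlinks(a, links) & inlinks(b, links)


def co_outlinked(a, b, links=LINKS):
    """Shared targets that both a and b link to (~ bibliographic coupling)."""
    return outlinks(a, links) & outlinks(b, links)


def reachable(source, target, links=LINKS):
    """True if a directed link path leads from source to target (breadth-first search)."""
    seen, queue = {source}, deque([source])
    while queue:
        v = queue.popleft()
        if v == target:
            return True
        for w in outlinks(v, links):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return False
```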


UK link data 2001

109 UK universities

– main sites excluded

7669 subsites
– www.hum.port.ac.uk

– www.atm.ox.ac.uk

– ...

3.4 million web pages

39.3 million page outlinks

– 34.4 million site selflinks

– 4.9 million site outlinks

delimited data set – 105 817 web pages

– 207 865 links between 7669 subsites


5-step methodology

A. Graph model of 7669 UK academic subsites;

B. 189 random subsites in SCC (strongly connected component);

C. 10 path nets with all shortest paths between five pairs of topically dissimilar SCC subsites;

D. Source and target pages along shortest link paths in 10 path nets;

E. Links, pages and subsites providing transversal (cross-topic) connections in 10 path nets.



corona model: reachability structures

SCC: strongly connected component
IN: traversable to SCC
OUT: reachable from SCC
IN-tendrils: connected from IN
OUT-tendrils: connected to OUT
tube: connecting IN to OUT
disconnected

bow-tie model (Broder et al. 2000)
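The corona (bow-tie) components can be computed from forward and backward reachability. A minimal sketch, assuming the caller supplies one node known to lie in the giant SCC (finding the largest SCC itself, e.g. with Tarjan's algorithm, is left out). The toy graph and all names are hypothetical.

```python
from collections import deque


def reach(start_nodes, adj):
    """All nodes reachable from any start node (start nodes included)."""
    seen = set(start_nodes)
    queue = deque(start_nodes)
    while queue:
        v = queue.popleft()
        for w in adj.get(v, ()):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen


def reverse(adj):
    """Graph with every link direction flipped."""
    rev = {}
    for v, targets in adj.items():
        for w in targets:
            rev.setdefault(w, set()).add(v)
    return rev


def corona_components(adj, core_node):
    """Classify nodes relative to the SCC that contains core_node."""
    nodes = set(adj) | {w for ts in adj.values() for w in ts}
    rev = reverse(adj)
    forward = reach([core_node], adj)       # reachable from the SCC
    backward = reach([core_node], rev)      # able to reach the SCC
    scc = forward & backward
    out, in_ = forward - scc, backward - scc
    core = scc | in_ | out
    from_in = reach(list(in_), adj) - core  # hanging off IN
    to_out = reach(list(out), rev) - core   # leading into OUT
    tube = from_in & to_out                 # IN to OUT, bypassing the SCC
    return {
        "SCC": scc, "IN": in_, "OUT": out,
        "IN-tendrils": from_in - tube,
        "OUT-tendrils": to_out - tube,
        "tube": tube,
        "disconnected": nodes - core - from_in - to_out,
    }


toy = {
    "a": {"b"}, "b": {"c"}, "c": {"a", "o1"},  # SCC {a, b, c}, OUT {o1}
    "i1": {"a", "t1", "it"},                   # IN {i1}
    "t1": {"o1"},                              # tube from IN to OUT
    "ot": {"o1"},                              # OUT-tendril
    "x": {"y"},                                # disconnected pair
}
parts = corona_components(toy, "a")
```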


shortest link path

[Figure: a shortest link path across the .ac.uk / .uk web space; subsites shown: cfd.me.umist.ac.uk, ercoftac.mech.surrey.ac.uk, cajun.cs.nott.ac.uk, ukoln.bath.ac.uk, cs.man.ac.uk, ashmol.ox.ac.uk, collections.ucl.ac.uk, vlmp.museophile.sbu.ac.uk]


path net = ‘mini’ small world

transversal link

path net = all shortest link paths between two given nodes (subsites)

network analysis tool = Pajek > adjacency matrix
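A path net can be extracted with two breadth-first searches: a link (v, w) lies on some shortest path from source s to target t exactly when dist(s, v) + 1 + dist(w, t) equals dist(s, t). The sketch below is a pure-Python stand-in for the Pajek step, not the thesis's procedure; the function names and the toy graph are hypothetical. Because links are directed, path_net(adj, s, t) and path_net(adj, t, s) generally differ, which is why each subsite pair can yield two path nets.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest link-path length from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj.get(v, ()):
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def reverse(adj):
    """Graph with every link direction flipped."""
    rev = {}
    for v, targets in adj.items():
        for w in targets:
            rev.setdefault(w, set()).add(v)
    return rev


def path_net(adj, source, target):
    """All links lying on at least one shortest directed path from source to target."""
    d_from = bfs_distances(adj, source)
    if target not in d_from:
        return set()  # target unreachable: empty path net
    d_to = bfs_distances(reverse(adj), target)
    length = d_from[target]
    return {
        (v, w)
        for v, targets in adj.items()
        for w in targets
        if v in d_from and w in d_to and d_from[v] + 1 + d_to[w] == length
    }


toy = {"s": {"a", "b", "c"}, "a": {"t", "b"}, "b": {"t"}, "c": {"d"}, "d": {"t"}}
net = path_net(toy, "s", "t")
```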


10 path nets

hum.port.ac.uk: Faculty of Humanities and Social Sciences, Portsmouth
atm.ox.ac.uk: Atmospheric, Oceanic and Planetary Physics, Oxford
economics.soton.ac.uk: Economics Dept, Southampton
chem.gla.ac.uk: Chemistry Dept, Glasgow
psy.man.ac.uk: Psychology Dept, Manchester
maths.gcal.ac.uk: Mathematics Dept, Glasgow Caledonian
speech.essex.ac.uk: Speech Research Group, Linguistics Dept, Essex
palaeo.gly.bris.ac.uk: Palaeontology Research Group, Earth Sciences Dept, Bristol
geog.plym.ac.uk: Geography Dept, Plymouth
eye.ox.ac.uk: Ophthalmology Dept [eye research], Oxford

5 pairs of topically dissimilar subsites

+ both directions

= 10 path nets with all shortest paths


indicative findings

no generalizable findings: indicative only

– national + sectoral + institutional delimitation = UK academic subsites

– temporal delimitation = 2001 snapshot: does not cover dynamic changes

– small stratified sample of 10 path nets

the findings may, however, be fruitful for future large-scale research:

– computer-science sites = academic cross-topic connectors

– personal link creators > web cohesion ‘glue’ > especially link lists

– over 80% of transversal links may be academic (research, teaching)

– close relation: hubs / authorities and betweenness centrality

– web of genres > genre drift + topic drift > small world
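Hub and authority scores can be computed with Kleinberg's HITS iteration: a node's authority is the sum of the hub scores of nodes linking to it, and its hub score is the sum of the authority scores of nodes it links to, with both vectors normalized each round. This is a minimal sketch on a hypothetical toy graph; it computes HITS only and does not reproduce the relation to betweenness centrality noted above.

```python
import math


def hits(adj, iterations=50):
    """HITS hub and authority scores for a directed graph given as {node: set_of_targets}."""
    nodes = sorted(set(adj) | {w for ts in adj.values() for w in ts})
    hub = {v: 1.0 for v in nodes}
    auth = {v: 1.0 for v in nodes}
    for _ in range(iterations):
        # Authority update: sum of hub scores over incoming links.
        auth = {v: 0.0 for v in nodes}
        for v, targets in adj.items():
            for w in targets:
                auth[w] += hub[v]
        norm = math.sqrt(sum(x * x for x in auth.values())) or 1.0
        auth = {v: x / norm for v, x in auth.items()}
        # Hub update: sum of authority scores over outgoing links.
        hub = {v: sum(auth[w] for w in adj.get(v, ())) for v in nodes}
        norm = math.sqrt(sum(x * x for x in hub.values())) or 1.0
        hub = {v: x / norm for v, x in hub.items()}
    return hub, auth


toy = {"h1": {"a1", "a2"}, "h2": {"a1", "a2"}, "h3": {"a1"}}
hub, auth = hits(toy)
```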


web of genres & genre drift



possible small-world implications/applications

library and information science

– also focus on distributed knowledge organization (www)

– also focus on exploratory capabilities in distributed info. systems: convergent (goal-directed) and divergent (serendipitous) info. behavior

web sociology / science studies
– small-world links > cross-social / cross-domain weak ties

– counteract balkanization into disconnected / unreachable insularities

– small-world ‘gate-keepers’ with betweenness centrality in networks

– tracking interdisciplinary boundary crossings

– web mining of fertile areas for cross-disciplinary cross-pollination

search engines
– better coverage in web traversal + harvesting

– zoomable maps of web clusters + small-world shortcuts
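Betweenness centrality counts how often a node lies on shortest paths between other nodes, which is what makes it a small-world ‘gate-keeper’. Below is a minimal sketch of Brandes' (2001) algorithm for unweighted directed graphs, without normalization; the toy graph, in which two nodes reach two others only through g, is hypothetical.

```python
from collections import deque


def betweenness(adj):
    """Unnormalized betweenness centrality (Brandes) for an unweighted directed graph."""
    nodes = set(adj) | {w for ts in adj.values() for w in ts}
    score = {v: 0.0 for v in nodes}
    for s in nodes:
        stack = []
        preds = {v: [] for v in nodes}
        sigma = {v: 0 for v in nodes}  # number of shortest paths from s
        dist = {v: -1 for v in nodes}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:  # BFS phase: count shortest paths
            v = queue.popleft()
            stack.append(v)
            for w in adj.get(v, ()):
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in nodes}
        while stack:  # accumulation phase, farthest nodes first
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score


toy = {"a": {"g"}, "c": {"g"}, "g": {"b", "d"}}
scores = betweenness(toy)
```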


important question for LIS

how can information systems

- especially libraries and www –

be used to stimulate

curiosity,

serendipity (unexpected but useful info),

creativity and

knowledge diffusion ?

information systems as possibility spaces

shouldn’t just meet explicit information needs of users

– but also enable users to develop their needs


users’ information behavior

convergent behavior: goal-directed, rational
– e.g., Boolean searching
– present, explicit info. needs: problems, work tasks
– ‘information recovery’
– top-down, topic-focused, ‘convergent’ info. system
– traditional LIS research areas

divergent behavior: non-goal-directed, intuitive
– e.g., browsing, serendipity
– latent, implicit info. needs: triggered interests, curiosity
– ‘information discovery’
– bottom-up, topic-scattered, ‘divergent’ info. system
– complementary LIS research areas (Björneborn)

proposed broadened aim of LIS research:
“help users explore and exploit options embedded in information environments”


so wwwhat …?

librarians and other info. specialists can create and locate clusters and navigational aids on the Web

– making it easier to find requested info.

but also allow possibilities for diversity and serendipity

– making it easier to encounter unexpected info.

just like in physical libraries = ingenious possibility spaces :-)


Five ‘laws’ of web connectivity

– Links are for use: the very essence of hypertext;

– Every surfer his or her link: the rich diversity of links across topics and genres;

– Every link its surfer: ditto;

– Save the time of the surfer: by visualizing web clusters and small-world shortcuts;

– The Web is a growing organism: we are still in the Web’s infancy

Inspired by S.R. Ranganathan (1931). The five laws of library science:

“Books are for use.

Every reader his or her book.

Every book its reader.

Save the time of the reader.

The Library is a growing organism.”


PhD thesis 2004

Small-World Link Structures across an Academic Web Space

- a Library and Information Science Approach

Lennart Björneborn

www.db.dk/lb


longest path net

path net with the longest shortest link paths between two given subsites
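The pair of subsites with the longest shortest link path can be found by running a breadth-first search from every node and keeping the maximum distance (the directed diameter, counting only reachable pairs); the path net between that pair is then the longest path net. A minimal sketch with a hypothetical four-node link cycle; the function names are illustrative only.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest link-path length from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj.get(v, ()):
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def longest_shortest_paths(adj):
    """Return (length, pairs) for the ordered node pairs whose shortest link path is longest."""
    nodes = set(adj) | {w for ts in adj.values() for w in ts}
    best, pairs = 0, []
    for s in sorted(nodes):
        for t, d in bfs_distances(adj, s).items():
            if d > best:
                best, pairs = d, [(s, t)]
            elif d == best and d > 0:
                pairs.append((s, t))
    return best, pairs


cycle = {"a": {"b"}, "b": {"c"}, "c": {"d"}, "d": {"a"}}
length, pairs = longest_shortest_paths(cycle)
```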