© December 1999 George Paliouras, All Rights Reserved 1
Learning Communities of Users on the Internet
George Paliouras
Christos Papatheodorou
Vangelis Karkaletsis
Constantine D. Spyropoulos
NCSR “Demokritos”
Email: [email protected]
WWW: http://www.iit.demokritos.gr/skel
Structure of the talk
• Services on the Internet
• Personalization of Internet services
• Learning communities from usage data
• Three case studies
  – Information broker (filtering)
  – Digital library (retrieval)
  – Web-site (navigation)
• Conclusions
WWW: the new face of the Net
Once upon a time, the Internet was a forum for exchanging information. Then… …came the Web.
The Web introduced new capabilities…
…and attracted many more people…
…increasing commercial interest…
…and turning the Net into a real forum…
Services on the Internet
Information providers are still the majority…
  – Commercial: CNN, Reuters, Times, Yahoo
  – Non-commercial: CORDIS, NCSTRL, MLNET, Library
Information access
• The Web has introduced new ways to access information.
• We have looked at three different types, ranging from passive to active access:
  – Information filtering (profiles)
  – Information retrieval (queries)
  – Navigation
• … covering the majority of today’s information services.
Personalized information access
• Adaptation of the system to the user.
• Social motivation:
  – Better service for the citizen (reduction of the information overload).
• Commercial motivation:
  – Customer relationship management (targeted advertisement, customer retention, increased sales, etc.)
Personalized information access
“The Quantity of People Visiting Your Site Is Less Important Than the Quality of Their Experience”
Evan I. Schwartz, Webonomics, Broadway Books, 1997
Personalized information access
[Diagram: information sources, the server, and the receivers]
User modeling
• The process of constructing models that can be used to adapt the system to the user’s requirements.
• Types of user requirements:
  – Interests (e.g. sports and finance articles)
  – Knowledge level (e.g. novice to expert)
  – Preferences (e.g. appearance of the GUI)
  – etc.
User Models
• User model (type A): [PERSONAL]
User x -> sports, stock market
• User model (type B): [PERSONAL]
User x, Age 26, Male -> sports, stock market
• User community: [GENERIC]
Users {x,y,z} -> sports, stock market
• User stereotype: [GENERIC]
Users {x,y,z}, Age [20..30], Male -> sports, stock market
Machine Learning / Data Mining
• Acquisition of models from usage data.
• Types of learning:
  – Supervised learning: requires manually tagged examples.
  – Unsupervised learning: clusters untagged examples according to similarity.
Learning user models
Observation of the users interacting with the system.
[Diagram: users 1 to 5 interact with the system; their observed behavior yields individual user models, which are grouped into user communities (Community 1, Community 2).]
Collaborative filtering
• Memory-based “learning” (e.g. k-NN):
  – Given a group of users…
  – …and a new user…
  – …find similar users.
• Already in commercial use (e.g. Firefly, amazon.com)
• Problem: it gives no insight into how the system is used.
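A minimal sketch of the memory-based approach, for illustration only (it does not reproduce Firefly or amazon.com): each user model is assumed to be a 0/1 interest vector, similarity is assumed to be cosine similarity, and the function names `cosine`, `k_nearest` and `recommend` are hypothetical. Nothing is learned beyond the stored vectors, which is why the approach yields no description of how the system is used.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length 0/1 interest vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def k_nearest(users, new_user, k):
    """Ids of the k stored users most similar to new_user."""
    ranked = sorted(users, key=lambda uid: cosine(users[uid], new_user), reverse=True)
    return ranked[:k]

def recommend(users, new_user, k):
    """Feature indices the neighbours have but new_user lacks, most votes first."""
    votes = {}
    for uid in k_nearest(users, new_user, k):
        for i, (theirs, mine) in enumerate(zip(users[uid], new_user)):
            if theirs and not mine:
                votes[i] = votes.get(i, 0) + 1
    return sorted(votes, key=lambda i: (-votes[i], i))

# Hypothetical categories: 0 = sports, 1 = stock market, 2 = politics, 3 = internet
users = {
    "x": [1, 1, 0, 0],
    "y": [1, 1, 1, 0],
    "z": [0, 0, 1, 1],
}
new_user = [1, 0, 0, 0]
print(k_nearest(users, new_user, 2))  # ['x', 'y']
print(recommend(users, new_user, 2))  # [1, 2]
```

Every prediction rescans the stored users; no explicit community or pattern is ever produced.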
Clustering users into communities
• Clustering methods:
  – Conceptual clustering (COBWEB, ITERATE)
  – Graph-based clustering (Cluster mining)
  – Statistical clustering (Autoclass)
  – Neural clustering (Self-organising Maps)
Conceptual clustering
• COBWEB generates a hierarchy of concepts.
• Each concept is a cluster of objects.
• Our concepts are the communities.
• Our objects are “user models”.
• Similarity metric: category utility.
• Each user belongs to only one community.
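Category utility, as usually defined for COBWEB, scores a partition into clusters C_1…C_K by how much knowing the cluster improves the expected number of correctly guessed attribute values: CU = (1/K) Σ_k P(C_k) [Σ_i Σ_j P(A_i = V_ij | C_k)² − Σ_i Σ_j P(A_i = V_ij)²]. The sketch below only scores a given partition; COBWEB itself applies this score incrementally when deciding where to place each new object. User models are assumed to be tuples of nominal (here 0/1) values, and the function names are hypothetical.

```python
from collections import Counter

def _sum_sq_probs(objects):
    """Sum over attributes i and values j of P(A_i = V_ij)^2 within objects."""
    n = len(objects)
    total = 0.0
    for i in range(len(objects[0])):
        counts = Counter(obj[i] for obj in objects)
        total += sum((count / n) ** 2 for count in counts.values())
    return total

def category_utility(partition):
    """Category utility of a partition given as a list of clusters,
    each a non-empty list of equal-length attribute tuples."""
    everyone = [obj for cluster in partition for obj in cluster]
    n = len(everyone)
    baseline = _sum_sq_probs(everyone)
    gain = sum(len(cluster) / n * (_sum_sq_probs(cluster) - baseline)
               for cluster in partition)
    return gain / len(partition)

# Hypothetical user models: (reads sports, reads finance)
a, b, c, d = (1, 0), (1, 0), (0, 1), (0, 1)
print(category_utility([[a, b], [c, d]]))  # 0.5: clusters separate the interests
print(category_utility([[a, c], [b, d]]))  # 0.0: clusters add no information
```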
Meaningful communities
• Question: What are the characteristics of a community?
• Answer: Community characterization, by measuring the increase in the frequency of each feature within the community.
• Example: How frequently do users of the community read sports news, compared to the whole set of users?
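The slides do not give the exact formula, so this sketch assumes that the frequency increase of a feature is its relative frequency inside the community minus its relative frequency over all users, and that a feature characterizes the community when the increase exceeds a threshold. The data, threshold and function names are hypothetical.

```python
def feature_frequencies(models):
    """Relative frequency of each feature over a list of 0/1 user models."""
    n = len(models)
    return [sum(m[i] for m in models) / n for i in range(len(models[0]))]

def characterize(community, all_users, names, threshold):
    """Features whose frequency inside the community exceeds their frequency
    over all users by more than threshold (ASSUMED definition of increase)."""
    inside = feature_frequencies(community)
    overall = feature_frequencies(all_users)
    result = []
    for name, f_c, f_all in zip(names, inside, overall):
        increase = f_c - f_all
        if increase > threshold:
            result.append((name, round(increase, 2)))
    return sorted(result, key=lambda pair: -pair[1])

names = ["sports", "finance", "internet"]          # hypothetical categories
all_users = [[1, 0, 0], [1, 1, 0], [0, 1, 1], [0, 1, 1]]
community = [[0, 1, 1], [0, 1, 1]]                 # a hypothetical community
print(characterize(community, all_users, names, 0.2))
# [('internet', 0.5), ('finance', 0.25)]
```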
Cluster mining
• Searches for cliques in a graph of the following form:
[Figure: example graph whose nodes are the categories hardware, mathematics of computing, software, computing milieux and computing methodologies, with edges labelled by co-occurrence frequencies]
Cluster mining
• Nodes: features in the user model.
• Edge labels: frequency at which the two nodes appear together in the data.
• Edge reduction: using a threshold.
• Clique: commonly met pattern in the behavior of the users.
• Each user may belong to several communities.
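The slides do not specify how edge weights are normalised or how cliques are enumerated. This sketch assumes the weight of an edge is the fraction of user models containing both features, keeps edges at or above a threshold, and enumerates maximal cliques with the basic Bron-Kerbosch algorithm; the data and function names are hypothetical.

```python
from itertools import combinations

def cooccurrence_graph(models, threshold):
    """Adjacency sets over feature indices; keep an edge when the fraction of
    user models containing both features is at least threshold."""
    n_users, n_features = len(models), len(models[0])
    adj = {f: set() for f in range(n_features)}
    for a, b in combinations(range(n_features), 2):
        weight = sum(1 for m in models if m[a] and m[b]) / n_users
        if weight >= threshold:
            adj[a].add(b)
            adj[b].add(a)
    return adj

def maximal_cliques(adj):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques

def behavioural_patterns(models, names, threshold):
    """Cliques of two or more features, as sorted lists of feature names."""
    adj = cooccurrence_graph(models, threshold)
    return sorted(sorted(names[f] for f in c)
                  for c in maximal_cliques(adj) if len(c) > 1)

names = ["hardware", "software", "theory", "maths"]   # hypothetical features
models = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 1, 0]]
print(behavioural_patterns(models, names, 0.3))
# [['hardware', 'software'], ['maths', 'theory']]
print(behavioural_patterns(models, names, 0.2))
# [['hardware', 'software'], ['maths', 'theory'], ['software', 'theory']]
```

Lowering the threshold keeps more edges, so a feature (and hence a user) can appear in several patterns, unlike the single-community assignment of COBWEB.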
Case studies
• Information broker (filtering)
• Digital library (retrieval): NCSTRL
• Web-site (navigation): ACAI99
Criteria for the communities
• We evaluate the quality of community descriptions (behavioral patterns) by:
  – Coverage: proportion of characteristics appearing in the descriptions.
  – Overlap: extent of overlap between descriptions.
  – Meaningfulness: do the descriptions make sense? Are they interesting?
I: Profile-based filtering
• User models: profiles of news categories for each user.
• User communities: users with common news-reading interests.
• Community descriptions: news categories for each community.
I: COBWEB
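Community hierarchy (community sizes in parentheses)
  – A (1078) splits into B (681) and C (397).
  – B splits into D (328) and E (353); C splits into F (98), G (181) and H (118).
  – Lowest level: J (104), K (161), L (95), M (102), N (156), O (38), P (17), Q (43), R (36), S (96), I (63), W (28), V (62), U (28), T (49).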
I: COBWEB
[Figure, two panels: Coverage (0 to 1) and Overlap (0 to 8), each plotted against the pruning parameter (0 to 1), for COBWEB level 2 and level 3.]
I: COBWEB
Community descriptions
D: (none)
E: Internet (0.55)
F: Economic ind. (0.73), Economics & Finance (0.68), Computers (0.6), Transport (0.53), Financial ind. (0.5)
G: Economic ind. (0.58), Economics & Finance (0.61)
H: Computers (0.53)
I: Cluster mining
Behavioral patterns
Telecom, Computers, Internet, Industries, Economics/Finance
Telecom, Computers, Networks
Telecom, Economic ind., Economics/Finance
Hardware, Software
Financial ind., Economic ind., Economics/Finance
Financial ind., Economic ind., Financial markets
Sport, Entertainment electronics
I: Comparison
[Figure: Overlap (0 to 8) plotted against the connectivity threshold (cluster mining) and the pruning parameter (COBWEB), both from 0 to 1, for cluster mining and COBWEB (level 2).]
II: Query-based retrieval
• User models: processed queries.
• User communities: user queries with common keywords.
• Community descriptions: characteristic keywords for each community.
• Pre-processing:
  – Lemmatization and synonyms (WordNet).
  – Generalization to top ACM categories.
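The pre-processing steps can be sketched as below. The original work used WordNet; here small hypothetical dictionaries stand in for WordNet lemmas and synonyms and for a keyword index over the top-level ACM categories, so only the shape of the pipeline (lemmatize, merge synonyms, generalize) is illustrated.

```python
# Hypothetical lookup tables standing in for WordNet (lemmas, synonyms)
# and for a keyword-to-category index over the top-level ACM categories.
LEMMAS = {"networks": "network", "compilers": "compiler", "databases": "database"}
SYNONYMS = {"dbms": "database"}
TOP_CATEGORY = {
    "network": "Computer Systems Organisation",
    "compiler": "Software",
    "database": "Information Systems",
}

def preprocess(query):
    """Map a free-text query to the set of top-level categories it mentions."""
    categories = set()
    for token in query.lower().split():
        word = LEMMAS.get(token, token)    # lemmatize
        word = SYNONYMS.get(word, word)    # merge synonyms
        if word in TOP_CATEGORY:           # generalize to a top category
            categories.add(TOP_CATEGORY[word])
    return categories

print(sorted(preprocess("Compilers for DBMS networks")))
# ['Computer Systems Organisation', 'Information Systems', 'Software']
```

Each processed query then becomes a user model over a handful of top categories, which keeps the feature space small enough for clustering.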
II: COBWEB
Community descriptions
Computer Systems Organisation (1.0)
Software (1.0)
Hardware (1.0)
Information Systems (1.0), Computing milieux (0.63), Computing methodologies (0.28)
Information Systems (1.0)
Computing methodologies (1.0), Hardware (1.0)
Computing methodologies (1.0), Software (1.0)
Computing methodologies (1.0)
II: Cluster mining
Behavioral patterns
Hardware, Software, Computing Milieux, Computing Methodologies
Hardware, Software, Computing Milieux, Maths of Computing
Hardware, Computer Systems Organisation
Theory of Computation, Maths of Computing
Information Systems, Software, Computing Milieux, Computing Methodologies
Information Systems, Software, Computing Milieux, Maths of Computing
III: Web-site navigation
• User models: access sessions as sets of pages or sets of page transitions.
• User communities: users with common navigation behavior.
• Community descriptions: Pages or page transitions for each community.
• Pre-processing:
  – Sessions extracted from access logs (based on duration).
  – Dimensionality reduction by feature selection.
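A minimal sketch of session extraction, assuming log entries of the form (visitor, timestamp in seconds, page id) and a hypothetical idle timeout of 30 minutes that closes a session; the slides do not state the actual rule. Each session is then represented as a set of page transitions written "a>b", the notation used in the results below. Feature selection is not shown.

```python
from collections import defaultdict

def sessions_from_log(entries, timeout=1800):
    """Group (visitor, timestamp_seconds, page) entries into sessions; a new
    session starts when the same visitor is idle longer than timeout."""
    by_visitor = defaultdict(list)
    for visitor, ts, page in sorted(entries, key=lambda e: (e[0], e[1])):
        by_visitor[visitor].append((ts, page))
    sessions = []
    for hits in by_visitor.values():
        current = [hits[0][1]]
        for (prev_ts, _), (ts, page) in zip(hits, hits[1:]):
            if ts - prev_ts > timeout:
                sessions.append(current)
                current = []
            current.append(page)
        sessions.append(current)
    return sessions

def transitions(session):
    """Represent a session as the set of page transitions, written 'a>b'."""
    return {f"{a}>{b}" for a, b in zip(session, session[1:])}

log = [("h1", 0, 1), ("h1", 60, 22), ("h1", 120, 31), ("h1", 5000, 1),
       ("h1", 5030, 30), ("h2", 10, 1), ("h2", 40, 8), ("h2", 70, 1)]
sessions = sessions_from_log(log)
print(sessions)                        # [[1, 22, 31], [1, 30], [1, 8, 1]]
print(sorted(transitions(sessions[0])))  # ['1>22', '22>31']
```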
III: COBWEB
Community descriptions
24>25, 23>24, 1>24, 1>19, 19>23
1>22, 22>20, 20>31, 31>27, 27>7, 19>23
22>31, 1>22
22>27, 1>22
1>30
1>30, 8>1, 1>8
30>31, 1>30
III: Cluster mining
Behavioral patterns
1>19, 19>23, 23>24, 24>25
1>24, 24>25
1>22, 22>31
1>22, 22>20
1>30, 30>31
22>20, 20>31, 31>27
22>20, 20>27
1>8, 8>1
1>9, 9>2
20>31, 31>27, 27>7
19>23, 23>14
23>14, 27>7
1>2, 2>11
2>11, 11>12
1>23, 23>24
Conclusions
• Community construction gives insight into the usage of information services.
• Unsupervised learning can do the job.
• Characterization makes the results useful.
• Substantial data engineering is needed for different types of information access.