Gradual Adaption Model for Estimation of User Information Access Behavior
J. Chen, R.Y. Shtykh and Q. Jin
Graduate School of Human Sciences, Waseda University, Japan
April 19, 2023 Waseda University 2
Background
• Why do we need information?
  – In leisure: searching for a route or map for a tour
  – In work: searching for business information
  – In learning: searching for academic papers
• Where do we get information?
  – From traditional media such as books and magazines
  – From the Internet
• How do we get information?
  – By searching in a bookstore, library, etc.
  – By searching the Internet
• What do we have to face in information search?
  – Too many search results, many of them irrelevant
Information Recommendation
• Information recommendation
  – Web mining approaches
    • Usage mining
    • Structure mining
    • Content mining
    • Semantic mining
  – Web mining data
    • Content data: text and multimedia provided by web sites.
    • Structure data: organization inside a web page, internal and external links, and the web site hierarchy.
    • Usage data: access log data of web sites.
    • User profile: information about users.
    • Semantic data: data describing the structure and definition of semantic web sites.
Study Approach
• Proposing a gradual adaption model for estimation of user information access behavior
• Analyzing users' information access data over short, medium, and long periods, and by remarkable and exceptional categories, using Full Bayesian Estimation
• Conducting an experimental simulation to show the operability and effectiveness of the proposed model
Related Works
• WUM (Web Usage Mining)
  – Based on implicit user feedback
• A new document representation model (Poblete and Baeza-Yates, WWW2008)
  – Evaluated on a web site with a small vocabulary and specific to certain topics.
• Identifying relevant web sites from user activities (Bilenko and White, WWW2008)
  – Needs more time to train the system
  – Personalizes information recommendation
• Dynamic Link Generation (Yan, et al., WWW1996)
  – Consists of off-line and on-line modules
• SUGGEST 3.0 (Baraglia and Silvestri, WT2004)
  – For large web sites; has only an on-line module. But the logs used to evaluate the system are small and limited.
• LinkSelector (Fang and Sheng, ACM2004)
  – Uses the hyperlink structure and its access logs.
Definitions of Keyword, Link and Concept
• Keyword
  – Keywords in web pages
• Link
  – Links of web pages
• Concept
  – Consists of a number of keywords and links
[Figure: example keywords (nature, painting, Aristotle, Andersen, cartoon, culture, Leonardo) and links (Link a, Link b) grouped into the concepts Philosophy, Literature, and Art]
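As a rough illustration of the definition above, a concept can be represented as a named group of keywords and links. This is only a sketch: the class name `Concept`, the `matches` helper, and the particular assignment of keywords and links to "Art" are hypothetical and not taken from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A concept groups keywords found in web pages and links to web pages."""
    name: str
    keywords: set = field(default_factory=set)
    links: set = field(default_factory=set)

    def matches(self, page_keywords, page_links=()):
        """Return True if any given keyword or link belongs to this concept."""
        return bool(self.keywords & set(page_keywords)) or bool(
            self.links & set(page_links)
        )


# Hypothetical assignment of keywords and links, for illustration only.
art = Concept("Art", keywords={"painting", "Leonardo"}, links={"Link a"})
print(art.matches(["painting", "history"]))  # True
```

A page (or a search query) can then be mapped to every concept whose `matches` call returns true.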
Full Bayesian Estimation
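• Full Bayesian Estimation
• Ð is the data collection of a concept
• d_t is the current number of times the concept was clicked
• d_f is the current number of times the concept was not clicked
• α_t is the historical number of times the concept was clicked
• α_f is the historical number of times the concept was not clicked

With θ denoting the unknown probability that the concept is clicked:

\[
P(D_{m+1} = t \mid Ð) = \int \theta \, p(\theta \mid Ð) \, d\theta
= \int \theta \, \frac{\Gamma(\alpha_t + \alpha_f + d_t + d_f)}{\Gamma(\alpha_t + d_t)\,\Gamma(\alpha_f + d_f)} \, \theta^{\alpha_t + d_t - 1} (1 - \theta)^{\alpha_f + d_f - 1} \, d\theta
= \frac{\alpha_t + d_t}{\alpha_t + \alpha_f + d_t + d_f}
\]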
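The closed form of this estimate, (α_t + d_t) / (α_t + α_f + d_t + d_f), is simple enough to sketch in a few lines of Python. The function name and the default history counts of 1.0 (a uniform prior) are assumptions made for illustration, not values from the slides.

```python
def predictive_click_probability(d_t, d_f, alpha_t=1.0, alpha_f=1.0):
    """Posterior predictive probability that the next access hits the concept.

    d_t, d_f: current counts of clicks / non-clicks for the concept.
    alpha_t, alpha_f: history counts; the default of 1.0 each is an
    assumption (uniform prior), not a value from the slides.
    """
    total = alpha_t + alpha_f + d_t + d_f
    if total <= 0:
        raise ValueError("at least one count must be positive")
    return (alpha_t + d_t) / total


# With no current data the estimate is just the prior mean.
print(predictive_click_probability(0, 0))  # 0.5
# 3 clicks and 1 non-click on top of the uniform prior: (1 + 3) / (1 + 1 + 3 + 1)
print(predictive_click_probability(3, 1))
```

Because each new click only increments a count, the estimate can be updated as accesses arrive, without a separate training phase.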
Gradual Adaption Model
[Figure: system architecture, split into off-line and on-line parts. Components: Web Documents, Concept KB, Access Logs, Concept Analyser, Probability Estimator, Estimation Base, Gradual Adaption Recommender (Short, Medium, Long, Remarkable / Exceptional), Matching. User actions: Search Query, Search, Click, Input.]
Gradual Adaption Model
• We divide users' interests by period (short, medium, long) and by category (remarkable, exceptional).
• The model is adaptive.
  – It can adapt to transitions in users' information access behavior.
• No training is needed, since the model uses Full Bayesian Estimation, which has a learning function.
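One possible reading of the period split is sketched below: the same Bayesian estimate is computed over access-log windows of different lengths. The log format, the function name, the uniform prior, and the rule that clicks on other concepts count as non-clicks are all assumptions; the window lengths of 7, 30 and 90 days follow the "such as" examples given for the simulation. The remarkable and exceptional categories are not covered here.

```python
from datetime import date, timedelta

# Window lengths in days (the examples given for short, medium, long periods).
PERIODS = {"short": 7, "medium": 30, "long": 90}


def period_probabilities(log, concept, today, alpha_t=1.0, alpha_f=1.0):
    """Estimate the access probability of `concept` for each period.

    log: list of (date, concept_name) click records (hypothetical format).
    Clicks on other concepts inside the window are counted as non-clicks
    (d_f); this counting rule and the uniform prior are assumptions.
    """
    result = {}
    for name, days in PERIODS.items():
        start = today - timedelta(days=days - 1)
        window = [c for (d, c) in log if start <= d <= today]
        d_t = sum(1 for c in window if c == concept)
        d_f = len(window) - d_t
        result[name] = (alpha_t + d_t) / (alpha_t + alpha_f + d_t + d_f)
    return result


# Made-up log: three recent clicks on "Art", older clicks on "Philosophers".
today = date(2023, 4, 19)
log = [(today - timedelta(days=i), "Art") for i in range(3)]
log += [(today - timedelta(days=i), "Philosophers") for i in range(10, 40)]
print(period_probabilities(log, "Art", today))
```

In this made-up log the recent interest in "Art" dominates the short window but is diluted in the medium and long windows, which is the kind of transition the three periods are meant to expose.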
Simulation and Evaluation
• Environment
  – Java, Tomcat, MySQL, and NekoHTML
• Data
  – Wikipedia on DVD Version 0.5
    • more than 2000 web pages that belong to more than 180 concepts
Simulation and Evaluation
• Short period (such as 7 days / 1 week)
  – Test case
    • The user has two interests, and these interests are easily affected by external factors.
    • The expectation is that the probability of the related concepts can change greatly in the short or medium period, but not in the long period.
    • Two concepts, "Art" and "Artists", are assumed to be used, with a dynamically varying number of clicks.
  – Test result
    • The concepts' rates change frequently.
    • On some days, the probability of the concepts in the short period is higher than in the long period.
[Figure: "short period" chart. x-axis: Date; y-axis: Probability (0 to 1.2); series: Art, Artists, Philosophers, Philosophical thought movements]
Simulation and Evaluation
• Medium period (such as 30 days / 1 month)
  – Test case
    • The user has a temporary interest.
    • The user accesses the concept of temporary interest only occasionally.
    • The expectation is that this concept keeps a low rate in all three periods.
    • One concept, "Philosophers", is assumed to be used once every three days.
  – Test result
    • The change becomes smaller.
    • But on some days the probability of the concepts in the short period is higher than in the medium period.
[Figure: "medium period" chart. x-axis: Date; y-axis: Probability (0 to 0.9); series: Art, Artists, Philosophers, Philosophical thought movements]
Simulation and Evaluation
• Long period (such as 90 days / 3 months)
  – Test case
    • The user has a long-term interest.
    • The expectation is that the probability of the concept of interest keeps a high rate in the long period.
    • One concept, "Philosophical thought movements", is assumed to be used every day.
  – Test result
    • The change becomes quite stable.
    • There is no big change in the long period.
[Figure: "long period" chart. x-axis: Date; y-axis: Probability (0 to 0.8); series: Art, Artists, Philosophers, Philosophical thought movements]
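The access patterns described in the test cases (one concept used every day, another once every three days) can be mimicked with a small synthetic log. The sketch below is not the authors' simulation and does not use the Wikipedia data; the log generator, the uniform prior, the start date, and the rule that clicks on other concepts count as non-clicks are assumptions chosen only to show how such an experiment might be set up.

```python
from datetime import date, timedelta

PERIODS = {"short": 7, "medium": 30, "long": 90}  # days, per the examples above


def estimate(log, concept, today, days, alpha_t=1.0, alpha_f=1.0):
    """Bayesian estimate over the last `days` days (uniform prior assumed)."""
    start = today - timedelta(days=days - 1)
    window = [c for (d, c) in log if start <= d <= today]
    d_t = window.count(concept)
    d_f = len(window) - d_t  # clicks on other concepts (an assumption)
    return (alpha_t + d_t) / (alpha_t + alpha_f + d_t + d_f)


def synthetic_log(start, n_days):
    """One daily click on the long-term concept, plus one click on the
    temporary concept every third day (a made-up access pattern)."""
    log = []
    for i in range(n_days):
        day = start + timedelta(days=i)
        log.append((day, "Philosophical thought movements"))
        if i % 3 == 0:
            log.append((day, "Philosophers"))
    return log


start = date(2023, 1, 1)  # arbitrary start date
log = synthetic_log(start, 90)
today = start + timedelta(days=89)
for concept in ("Philosophical thought movements", "Philosophers"):
    rates = {p: round(estimate(log, concept, today, n), 3) for p, n in PERIODS.items()}
    print(concept, rates)
```

With this pattern the daily concept keeps the higher estimate in every window, while the occasional concept stays low, which matches the expectations stated for the long-term and temporary interests.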
Conclusion
• In this study, we have proposed a gradual adaption model (GAM) for estimation of user information access behavior.
• The three periods of GAM can correctly distinguish users' long-term and temporary interests, even without system training.
Future Works
• To try more patterns for the short, medium and long periods to find more reasonable ones.
• To evaluate the proposed model with users' involvement.
• To compare our proposed approach with other related recommendation models.