[IEEE 2010 International Conference on Technologies and Applications of Artificial Intelligence (TAAI) - Hsinchu City, TBD, Taiwan (2010.11.18-2010.11.20)] 2010 International Conference

Using Wikipedia’s Content for Cross-Website Page Recommendations that Consider Serendipity

Pei-Chia Chang
Information & Computer Sciences
University of Hawaii at Manoa, USA

Luz M. Quiroga
Information & Computer Sciences, Library & Information Science
University of Hawaii at Manoa, USA

Abstract— A majority of web personalization research concentrates on customizing a single website. On the contrary, recommending web pages across websites is the focus of this study. We emphasize that eliciting user interests among different topics within a domain is an important concern in cross-website page recommendations. Enhancing Wikipedia’s categorization system through heuristic information extraction, we constructed a system to study the coverage of a user’s interests in order to promote serendipity, new and interesting information, in recommendations. We compared our system’s performance regarding topicality and serendipity with the classical vector space model and obtained a slightly superior result.

Recommender; personalization; Wikipedia; information filtering

I. INTRODUCTION

Our earlier work constructed a web page recommender using an ontological model that we derived from the computer science (CS) categories and pages in Wikipedia and used as a knowledge base, WikiBase [1]. Wikipedia’s categorization system is augmented with extracted page keywords to formulate a semantic ontology for web page modeling and topical elicitation. Furthermore, a semantic user model is formulated by accumulating and integrating elicited topics from usage pages. Our recommender relies on the semantic user model and identifies the topical diversity of a user’s interests for recommending pages across websites. Wikipedia is selected as a shared reference and a standardized knowledge base due to its accessibility and dynamic content. It is an ideal research platform for us to merge taxonomy (categories) with folksonomy (keywords), considering the advantages of both knowledge structures. Taxonomy may allow for conceptual association and disambiguation; folksonomy has the potential advantages of using ordinary vocabulary and providing serendipity [2].

Our main research question is “Does our recommender based on Wikipedia’s content provide semantically relevant recommendations, promoting serendipity, of pages from different websites in a selected domain?” We selected the CS domain for investigation due to its relatively rich categories and pages in Wikipedia. In this paper, the evaluation metrics cover two factors: topicality and serendipity. Topicality assesses whether a recommendation is related to the subject area of a user’s interests [3]. Serendipity implies novelty, the degree to which the recommendation is new to a user and beyond what the user already knows [3], in a positive or interesting way.

II. RELATED WORK

Content-based systems that recommend pages from different websites include WebWatcher [4], Syskill & Webert [5], and WebMate [6]. Each of these user models is tied to its own system and provides no reusable semantic model, i.e. topical interests, that another personalization system could utilize. In contrast, this work applies ontologies to capture a user’s topical interests using Wikipedia’s categories, which yields a simple and potentially interoperable model. Similar to our goal of semantic user modeling, Oberle et al. [7] apply usage mining to track users based on concepts. Dai and Mobasher utilize usage metadata to formulate domain-level profiles [8]. However, both studies focus on a single website and are not concerned with model reuse or serendipity in recommendations. Our system formulates a dynamic user model by matching keywords and mapping page topics from usage pages among different websites to Wikipedia’s categories. The user model can be readily reused, and the subjects of categories, or the domain scope for recommendations, can be controlled by users or by the system according to predefined areas of interest. This could possibly alleviate privacy concerns about usage monitoring.

Furthermore, our work studies the coverage of a user’s interests to promote serendipity in recommendations and to increase the awareness of interest coverage and topical diversity. Related work on promoting diversity in recommendations includes Smyth’s discussion of the tradeoff between similarity and diversity [9] and Ziegler’s effort in looking into intra-similarity among different recommendation lists [10]. Both studies are concerned with diversity within a selected topic, which differs from our focus on the diversity of a user’s interests across different topics within a domain. The identification of a user’s interests and interest coverage is important, as recommending pages across topics may go beyond content similarity to diversity or serendipity.

978-0-7695-4253-9/10 $26.00 © 2010 IEEE

DOI 10.1109/TAAI.2010.55

III. METHOD

Figure 1 depicts our system’s architecture. There are four major components in the system: the crawler (top dashed box), the sensor, the Wikipedia knowledge base (WikiBase), and the matcher (bottom dashed box). The crawler utilizes the sensor to generate a corresponding content model for each newly fetched page and each usage page. The sensor associates pages with categories based on WikiBase and updates the user model, which is an accumulation of content models. WikiBase consists of categories and keyword weights from Wikipedia. The matcher compares each page’s content model with the user model and provides recommendations. Details are as follows.

Figure 1 System Architecture

A. Constructing WikiBase

Our system automatically maps a new page to Wikipedia’s categories based on WikiBase. We construct WikiBase by augmenting Wikipedia’s categories with heuristic information extraction to obtain keywords from pages belonging to the same category. The heuristics draw on page titles, categorical labels, anchor texts, italic and bold terms, and terms with a high Term Frequency - Inverse Document Frequency (TF-IDF) score. The extracted keywords are kept as a separate collection for each category in Wikipedia. Each keyword has a significance weight that the sensor utilizes for modeling web pages. The weight is assigned based on how frequently the keyword appears among all Wikipedia pages belonging to the same category.
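The weighting step can be illustrated with a minimal sketch. It assumes that keywords have already been extracted per page by the heuristics, and that a keyword's weight is the fraction of a category's pages in which it appears; the paper states only that the weight is frequency-based, so this normalization, the function name `build_category_weights`, and the toy data are all hypothetical.

```python
from collections import Counter

def build_category_weights(category_pages):
    """Map {category: [keywords_of_page, ...]} to {category: {keyword: weight}}.

    Assumption: weight = share of the category's pages containing the keyword.
    """
    wikibase = {}
    for category, pages in category_pages.items():
        if not pages:
            continue  # a category without pages contributes no keywords
        counts = Counter()
        for keywords in pages:
            counts.update(set(keywords))  # count each keyword once per page
        wikibase[category] = {kw: n / len(pages) for kw, n in counts.items()}
    return wikibase

# Hypothetical toy input, not data from the study.
toy_pages = {
    "Databases": [{"sql", "index", "query"}, {"sql", "transaction"}],
    "Data mining": [{"clustering", "query"}, {"clustering", "classification"}],
}
wikibase = build_category_weights(toy_pages)
```

In this toy input, "sql" appears in both pages of "Databases" and therefore receives the largest weight in that category.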

B. Modeling Pages with Wikipedia’s Categories

The system maps a usage page or a crawler-fetched page into a content model by sensing any keyword that appears in the page, based on each category’s keyword collection in WikiBase. The system then associates the page with the corresponding categories according to each keyword’s weight in WikiBase. The mapping process formulates a “categorical” or “semantic” vector space model over the available categories in WikiBase for each usage or crawler-fetched page.
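A minimal sketch of this mapping follows. It assumes single-word keywords, a simple lowercase tokenizer, and a category score equal to the sum of the weights of the keywords found in the page; the paper does not specify how weights are combined, so the summation, the name `content_model`, and the toy weights are hypothetical.

```python
import re

def content_model(text, wikibase):
    """Return {category: score} for one page.

    Assumption: a category's score is the sum of the weights of its
    keywords that occur in the page text.
    """
    tokens = set(re.findall(r"[a-z0-9]+", text.lower()))
    return {
        category: sum(w for kw, w in keywords.items() if kw in tokens)
        for category, keywords in wikibase.items()
    }

# Hypothetical keyword weights, not values from WikiBase.
toy_wikibase = {
    "Databases": {"sql": 1.0, "index": 0.5, "query": 0.5, "transaction": 0.5},
    "Data mining": {"clustering": 1.0, "query": 0.5, "classification": 0.5},
}
model = content_model("An SQL query uses an index", toy_wikibase)
```

Every page yields a vector over the same set of categories, which is what later makes content models and the user model directly comparable.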

C. Interest Elicitation through Accumulating Usage Pages

The sensor formulates the user model by extracting categorical topics from usage pages. Similar to the content model, the user model is a vector over the available categories in WikiBase. Whenever the system maps a usage page into a content model, the sensor also increases the values of the page’s relevant categories in the user vector. Therefore, if a user accesses a specific topic multiple times or accesses multiple pages related to the same topic, the user model will score higher in the corresponding topical category. The user model thus evolves constantly as usage updates arrive.
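The accumulation can be sketched as follows, assuming that each update simply adds a page's category scores to the running user vector. The class name `UserModel` and the example vectors are hypothetical, and any decay or normalization the actual system may apply is not modeled.

```python
from collections import defaultdict

class UserModel:
    """Running sum of content models (one score per category)."""

    def __init__(self):
        self.scores = defaultdict(float)

    def update(self, content_vector):
        # Each usage page raises the scores of its relevant categories.
        for category, value in content_vector.items():
            self.scores[category] += value

# Hypothetical content models for two usage pages.
user = UserModel()
user.update({"Databases": 2.0, "Data mining": 0.5})
user.update({"Databases": 1.0, "Data mining": 0.0})
```

Repeated visits to database pages push the "Databases" score up, while the "Data mining" score stays where the first page left it.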

D. Considering Interest Coverage in Recommendations

The matcher computes the cosine similarity between each crawler-retrieved page and the user model and then generates recommendations. The comparison is straightforward because both the content model and the user model are categorical vectors of the same dimension. As a unique contribution of our system, the matcher also relies on the ontological structure of WikiBase, i.e. the topology of Wikipedia’s categories, to identify interesting topics and topical clusters according to the user model. Applying the concept of the minimal spanning tree, we further calculate the coverage of a user’s topical interests within the ontological structure as a Diversity Index (DI) [11]. We define interest coverage as the semantic association among identified categories of potential interest to a user within a selected domain. To calculate the DI value of a user, the system traverses all identified categories based on the ontological structure and formulates a minimal spanning tree. Edges are weighted differently according to the relation (hierarchical, adjacent, or associative) between two categories. DI is the cumulative weight of all edges in the tree, normalized to a 100% scale. Additionally, topical clusters are located by grouping identified categories together based on neighboring or hierarchical relationships.
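The DI computation can be sketched with Kruskal's algorithm. The relation weights in `RELATION_WEIGHT` and the normalization constant `max_total` are assumptions made for illustration only; the paper states that edges are weighted by relation type and that the sum is normalized to 100%, but gives neither the weights nor the normalization.

```python
# Hypothetical edge weights per relation type (not from the paper).
RELATION_WEIGHT = {"hierarchical": 1.0, "adjacent": 2.0, "associative": 3.0}

def minimum_spanning_tree(nodes, edges):
    """Kruskal's algorithm; edges are (category_a, category_b, relation)."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path compression
            n = parent[n]
        return n

    tree = []
    for a, b, relation in sorted(edges, key=lambda e: RELATION_WEIGHT[e[2]]):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # the edge joins two separate components
            parent[root_a] = root_b
            tree.append((a, b, relation))
    return tree

def diversity_index(nodes, edges, max_total):
    """Sum of tree edge weights, scaled to 0-100 by an assumed maximum."""
    tree = minimum_spanning_tree(nodes, edges)
    total = sum(RELATION_WEIGHT[r] for _, _, r in tree)
    return min(100.0, 100.0 * total / max_total)

# Hypothetical identified categories and relations for one user.
nodes = ["Databases", "Data mining", "Algorithms", "Computer graphics"]
edges = [
    ("Databases", "Data mining", "adjacent"),
    ("Data mining", "Algorithms", "associative"),
    ("Databases", "Algorithms", "hierarchical"),
    ("Algorithms", "Computer graphics", "associative"),
]
tree = minimum_spanning_tree(nodes, edges)
di = diversity_index(nodes, edges, max_total=9.0)
```

Under these assumed weights, interests that are only loosely (associatively) connected produce a heavier tree and hence a higher DI than interests clustered under one parent category.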

Using DI is our main strategy to diversify recommendations. The topic of a recommended page alternates among all identified topics of potential interest to a user, according to the user’s DI value. If a user has a DI value of 90%, the system has a 90% chance of switching the next recommended page from one topic to another within all identified topics. On the other hand, if a user’s interests are more specific, e.g. a DI value of 35%, the system tends to recommend the same or similar identified topics. By considering DI in the recommendation strategy, the system provides topical recommendations based on the interest coverage of a user. The importance of interest coverage is twofold. On one hand, the coverage of a user’s interests may imply the user’s expertise in a domain. Page recommendations could cover diverse topics for novices and focus on a single topic or similar topics for experts. On the other hand, we intend to increase the chance of gaining valuable, memorable or interesting information, i.e. serendipity, through a recommender system that considers the diversity of topical interests.
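The switching rule can be sketched as a single random draw per recommendation. This is a simplified reading of the strategy: the function name `next_topic` is hypothetical, a switch picks uniformly among the other identified topics, and staying means keeping exactly the same topic (the "similar topics" case is not modeled).

```python
import random

def next_topic(current, topics, di_percent, rng):
    """With probability di_percent/100, switch to another identified topic."""
    others = [t for t in topics if t != current]
    if others and rng.random() < di_percent / 100.0:
        return rng.choice(others)
    return current

# Hypothetical identified topics for one user.
topics = ["databases", "data mining", "computer graphics"]
rng = random.Random(0)  # seeded only to make the sketch repeatable
sequence = ["databases"]
for _ in range(5):
    sequence.append(next_topic(sequence[-1], topics, 90, rng))
```

With a DI of 90 the topic sequence changes at most steps, whereas a DI of 35 would keep the same topic for roughly two out of three recommendations.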

IV. EVALUATION

Recommendations were assessed based on user ratings on a scale of 1 (strongly disagree) to 5 (strongly agree) with respect to topicality and serendipity. The question statement for participants regarding topicality is: “I feel that the information provided in this page is relevant to at least one topic of my interests – providing information related to the subject area(s) of my interests.” Serendipity incorporates the concepts of novelty and interestingness. The question statement for participants regarding serendipity is: “I feel that the information in this page is novel -- providing unexpected and interesting information to me.” Precision is used to measure relevance on topicality and serendipity. We compared our system’s recommendations with those generated by the classical vector space model (VSM) [12], which applies cosine similarity as well.

Table 1 Interest Topics

algorithms, artificial intelligence, complexity theory, computer graphics, computational geometry, data format, data mining, databases, computer programming, computer science award, ecommerce, formal language theory, human computer interaction, informatics, information theory, information technology industry, management of information system, multimedia, mobile networking, parallel computing, programming languages, quantum computing, (computer/data) security, theoretical computer science, web hosting, and noisy topics within CS

We recruited 23 working CS professionals as participants to evaluate our main research question in an online survey setting. Participants were pre-screened to be interested in at least one topic in Table 1. They were recruited by emailing alumni of the computer science department. Participants provided their browsing histories (mostly covering 1 month) from the logs of their preferred web browser. A log extraction tool was provided to them to select only CS-relevant page visits. We checked that every participant’s visits to CS-related pages were consistent over time, to avoid biased sampling, e.g. visiting numerous CS pages only during a short part of the whole usage period. Given the usage pages, the system generated recommendations from a standardized page pool. The pool contained 632 randomly selected web pages about Table 1’s topics, with equal proportions of pages per topic plus noisy pages on other arbitrary topics of the CS domain. The selection included pages from search engine results and directory pages. We entered the topics as query words in Google and Yahoo and took commonly appearing or top-ranked results. Directory pages refer to pages that are classified under one of Table 1’s topics in Yahoo Directory, Open Directory or Google Directory. In order to control confounding variables that may influence a user’s ratings, such as page length or time-sensitive information, we selected pages that contain no more than 2000 words and had been updated within the past 3 years. Opinion-related or subjective pages were excluded.

In the VSM-based recommendations, we processed the one-month usage pages as a single vector. All pages in the pool were converted into VSM vectors as well. We selected recommendations by comparing the usage vector with every page vector in the pool and identifying the 15 most similar pages. Therefore, 15 pages were selected for each of the two types of recommendations, system-generated and VSM-based. Both types of recommendations were mixed randomly and presented together for user evaluation regarding relevance on topicality and serendipity. For each user, we took the average of the 15 ratings for each type in the result analysis. To allow for potential explanations of the ratings, participants had to provide a few topical keywords and could add optional comments. Additionally, they rated 10 categorical keywords predicted by our system as their interests. These 10 ratings were averaged as well. Our hypotheses are: H1) there is no significant difference in the topicality ratings between our system and the VSM-based system, and H2) the serendipity ratings of our system are higher than the VSM-based ones.
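The VSM baseline described above can be sketched as follows. The sketch assumes raw term-frequency vectors and a simple tokenizer, since the term weighting used in the study is not stated; the names `term_vector`, `cosine`, and `top_k`, and the toy pages, are hypothetical.

```python
import math
import re
from collections import Counter

def term_vector(text):
    """Raw term-frequency vector (assumed weighting)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(u, v):
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0

def top_k(usage_texts, pool, k=15):
    """Merge all usage pages into one vector and rank pool pages against it."""
    usage = Counter()
    for text in usage_texts:
        usage.update(term_vector(text))
    ranked = sorted(
        pool, key=lambda pid: cosine(usage, term_vector(pool[pid])), reverse=True
    )
    return ranked[:k]

# Hypothetical usage pages and page pool.
usage_texts = ["sql query index", "sql join"]
pool = {
    "p1": "sql index tuning",
    "p2": "ray tracing shading",
    "p3": "query optimizer sql join",
}
best = top_k(usage_texts, pool, k=2)
```

The same `cosine` function applies unchanged to the categorical vectors of our system, because both the content models and the user model share one set of dimensions.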

A. Results

Figure 2 Ratings on Topicality

We conducted both a paired t-test and the Wilcoxon Signed-Rank test to compare the ratings of our system with those of the VSM-based system. The t-test assumes a normal distribution while the Wilcoxon Signed-Rank test does not; both are commonly used to identify a significant rating difference between two systems. Results indicate that our system’s performance differs significantly (p < 0.05) in both topicality (blue and red bars in Figure 2) and serendipity (blue and red bars in Figure 3) using either test.
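For readers who wish to reproduce this kind of comparison, the two test statistics can be sketched with the Python standard library. The ratings below are hypothetical and are not the study's data; the sketch computes only the statistics, not p-values.

```python
import math
import statistics

def paired_t_statistic(a, b):
    """t = mean(d) / (stdev(d) / sqrt(n)) over paired differences d."""
    diffs = [x - y for x, y in zip(a, b)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))

def wilcoxon_signed_rank(a, b):
    """Return min(W+, W-), using average ranks for tied |d| and dropping zeros."""
    diffs = sorted((x - y for x, y in zip(a, b) if x != y), key=abs)
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(diffs):
        j = i
        while j < len(diffs) and abs(diffs[j]) == abs(diffs[i]):
            j += 1
        for k in range(i, j):
            ranks[k] = (i + 1 + j) / 2  # average of ranks i+1 .. j
        i = j
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return min(w_plus, w_minus)

# Hypothetical per-participant average ratings (illustration only).
ours = [4.0, 3.5, 4.5, 3.75, 3.0]
vsm = [3.5, 3.75, 3.5, 3.0, 2.75]
t_stat = paired_t_statistic(ours, vsm)
w_stat = wilcoxon_signed_rank(ours, vsm)
```

A statistics library such as SciPy (`scipy.stats.ttest_rel` and `scipy.stats.wilcoxon`) would additionally supply the p-values needed for a significance decision.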


Figure 3 Ratings on Serendipity

Additionally, we examined potential factors that may influence the ratings on topicality and serendipity: days of usage data, unique usage pages, and the number of page visits (including page revisits). Results show a correlation between topicality ratings and days of usage data (R = 0.4401, p <= 0.035), and another between serendipity ratings and unique usage pages (R = 0.4677, p <= 0.024). In Figure 2, the green triangles and the right vertical axis indicate the days of usage data for each user. Similarly, Figure 3’s yellow triangles indicate the number of unique usage pages.
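A correlation coefficient of this kind can be computed as sketched below. The sketch assumes that R denotes Pearson's correlation coefficient, which the paper does not state explicitly, and the input values are hypothetical rather than the study's data.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical days of usage data and average topicality ratings.
days = [7, 11, 30, 31, 32]
ratings = [2.5, 3.0, 3.5, 4.0, 3.75]
r = pearson_r(days, ratings)
```

The function is undefined when either variable is constant, because the denominator is then zero; a caller would need to handle that case separately.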

Figure 4 Interest, Topicality, Serendipity Ratings & Diversity Index

Figure 4 displays, for each user, the system-calculated diversity index (DI) value on the left axis and, on the right axis, the ratings on categorical keywords predicted by our system as user interests together with the topicality and serendipity ratings of our system. There is a correlation between topicality and interest ratings (R = 0.5339, p <= 0.009). This is expected since recommendations are generated based on predicted user interests. Other than that, there is no clear relation among other combinations of the four examined variables. Further analysis follows in the discussion.

B. Discussion

Hypotheses: H1 proposes no difference in the topicality ratings between our system and the VSM-based system. Both the paired t-test and the Wilcoxon Signed-Rank test (two-tailed) indicate a significant difference among the ratings of the 23 participants. Therefore, H1 is rejected. Furthermore, H2 states a directional difference in the serendipity ratings between the two systems. Applying the same tests as for H1, but one-tailed, to the serendipity ratings indicates a significant difference between the two. Therefore, H2 is accepted. Visually, our system generally performs better than the VSM-based system, except for a few participants, who are examined next.

Possible Explanations for Lower Ratings: Among the 23 participants, we examine those who rated our system’s recommendations lower than the VSM-based ones regarding topicality or serendipity. In Figure 2, participants Id 21 and 23 have higher ratings on VSM-based recommendations than on our system. Similarly, participants Id 18-20 rated VSM-based recommendations slightly higher than our system’s regarding topicality. In Figure 3, participants Id 21-23 rated VSM-based recommendations higher than our system’s regarding serendipity. According to both figures, Id 21 and 23 seem to prefer VSM over our system. A closer look at their data, which are not shown precisely in either figure, reveals that Id 21 has only 11 page visits (all unique) over 7 days; Id 23 has 363 unique page visits (419 visits are considered revisits) over the 32 days of usage data, which is the highest visit count among the participants.

Most participants provided approximately one month of usage data, so the amount of usage data for Id 21 is clearly lower than that of the others. Similarly, Id 22 has only 11 days of usage data. This may indicate that our system tends to perform worse than VSM in terms of topicality or serendipity when there is too little or too much usage data. Indeed, correlation analysis shows a relation between topicality ratings and days of usage data as well as a relation between serendipity ratings and unique usage pages. Possible explanations are underfitting or overfitting of the usage data to the user model, which can cause the lower ratings. Nevertheless, given the subjective ratings and limited sample size, it is difficult to generalize our speculation about the rating trend. Furthermore, differences in how participants use the same rating scale may introduce certain biases.

Additionally, usage data may include a few biased samples, as the case of Id 21 shows. A further inquiry to Id 21 disclosed that the usage data are not representative of his behavior. Data were not logged from his most preferred browser. Even though he provided one month of usage data, page visits that are relevant to the CS domain cover only 7 days. This is the danger of using self-provided data, but we compromised and accepted it due to privacy concerns. Participants may not trust installing “spyware” on their computers to monitor their website browsing, despite the fact that only CS-relevant pages are our focus.


User Interests Prediction: Our system predicted 10 categories to represent a user’s interests. The orange triangles in Figure 4 display the average ratings of the predictions for each participant. Our system seems to predict user interests well because most of the ratings are above 3, except for Id 8 (2) and Id 21 (2.75). Id 8 thus has relatively lower ratings compared to other participants. The case of Id 21 was discussed earlier. A closer look at the ratings from Id 8, however, found an inconsistency. His self-reported interests include “data mining” and “database structure,” both of which were predicted but rated only as 3. A majority of Id 8’s ratings are at most 3 on the 1-5 scale for either interests or recommendations, with only 2 exceptions. As a result, removing the two outlier cases of Id 8 and 21, our system generally predicts user interests with an average accuracy of 3.57 out of 5 for 21 participants. Nevertheless, the issue of rating inconsistency suggests future work on designing system functions for inconsistency detection, explanation (e.g. interest changes), and resolution.

An interesting issue in interest prediction lies in the specificity of WikiBase’s ontological structure. Almost all general or broader topics of user interest could be correctly identified by our system, but not the fine-grained topics, because specific categories were omitted during our extraction from Wikipedia. To reduce the evaluation complexity, we selected category-subcategory and category-neighbor relations only up to depth 2. Therefore, WikiBase does not include fine-grained categories for discerning specific user interests. This could also be due to the fact that the nature of taxonomy is to group finer information together into a general category. Despite the lack of fine granularity, our system instead identifies the broader category that fits best. For example, Id 11 is interested in “web application programming” while our system identified “software engineering or computer programming” and missed the “web application” scope. In view of this, metadata for categorical topics may be needed. Possible solutions include increasing the complexity of the ontological structure, or analyzing co-occurring topical interests and then inferring contextual metadata. Both are potential future work.

Diversity Index and Interest Clusters: Figure 4 does not disclose any relation between either the topicality or the serendipity ratings of our system and participants’ DI values. Generally speaking, most of the DI values and the identified clusters capture participants’ coverage and interests. Certain participants’ DI values may not truly reflect their interest coverage due to missing categories in WikiBase. All participants have a DI value ranging from 50 to 91 out of 100, except for Id 14 (38) and Id 15 (35). Interestingly, Id 14 and 15 have relatively higher ratings on topicality compared to other participants. On the other hand, Id 21 has the highest DI value but the lowest topicality and serendipity ratings. The cases of Id 14, 15 and 21 suggest a possible relationship between DI and topicality ratings. Nevertheless, this phenomenon could be limited to these three cases.

Page Topics: We studied the topicality and serendipity ratings that all participants gave to their 15 recommendations from our system and identified the following points. First, for relevant pages that provide fundamental information on a topic (e.g. the standard specification for SQL), there is a tendency toward lower serendipity ratings among participants, despite the fact that the topic is highly relevant to them. This is due to the same issue, namely that our system is currently unable to identify page topics to a finer degree. Therefore, identifying the level (e.g. beginning, intermediate, advanced etc.) of the information provided in a page is another area for potential future work.

Second, participants’ rating patterns vary. There are four participants whose topicality ratings are at least 80% the same as their serendipity ratings. They understood the survey questions correctly and claimed that the topically relevant pages were also novel and interesting to them if they had not seen those pages before. On the other hand, for pages discussing popular topics (e.g. cloud computing), ratings are low in topicality but high in serendipity. Some participants are not familiar with those hot topics; others are interested in them only for future reference. However, they all feel it is good to know about those popular topics. This pattern occurs occasionally among participants.

Third, in a few cases, it is reasonable to speculate that our page pool does not provide any pages that cover a user’s specific interests. Excluding noisy topics, there are approximately 20 pages for each topic in Table 1. For example, Id 9 reported her interests as “GIS, geo-visualization, and multivariate.” Our page pool does not include any of them. However, the closest topic related to visualization is computer graphics, which is indeed covered in her recommendations. Although not self-reported in the online survey, Id 9 was confirmed to be interested in computer programming during our recruitment prescreening. In view of this issue, a further measurement of recall is needed to reveal whether lower ratings are caused by missing topics in the page pool. However, it would be difficult for each participant to rate all 632 pages in the pool.

Participant Comments: We analyzed the participants’ comments and identified the following points. First, serendipity is related to the timeliness of a web page, especially for time-sensitive websites, such as technology gadget or news sites. Second, participants perceive topical granularity differently. Some participants define relevance as providing recommendations on very specific topics of their interests, while others accept recommendations on general or marginal topics as relevant. This could be attributed to the fact that topical relevance is subjective and individuals’ boundaries of relevance vary. Third, other page-related factors (e.g. timeliness, opinions, presentation style, authors/sources or context etc.) influence the ratings as well.


V. CONCLUSION

In conclusion, our system’s performance on topicality and serendipity is significantly better than that of the classical VSM. This is possibly due to the enhancement provided by the ontological vector space model in WikiBase and to the use of the topology of Wikipedia’s categories to examine the coverage of a user’s interests. Recommendations are diversified, and the performance regarding serendipity is thus better than that of the VSM-based system.

VI. FUTURE WORK

Our discussion covers certain directions for future work, including increasing the granularity of the WikiBase ontology, considering the level (beginner or advanced etc.) of the information provided in web pages when making recommendations, detecting and resolving user inconsistency or interest changes, conducting a recall measurement, and examining the representativeness of the participant and page pool sampling. We look forward to seeing the results of a larger-scale evaluation.

REFERENCES

[1] Chang, P.C. and L.M. Quiroga. Using Wikipedia Content to Derive an Ontology for Modeling and Recommending Web Pages across Systems. in ACM RecSys'09 Workshop on Recommender Systems & the Social Web. 2009. New York.

[2] Ohkura, T., Y. Kiyota, and H. Nakagawa. Browsing System for Weblog Articles based on Automated Folksonomy. in WWW 2006. 2006.

[3] Xu, Y. and Z. Chen, Relevance judgment: What do information users consider beyond topicality? J. Am. Soc. Inf. Sci. Technol., 2006. 57(7): p. 961-973.

[4] Joachims, T., D. Freitag, and T. Mitchell. WebWatcher: A Tour Guide for the World Wide Web. in Fifteenth International Joint Conference on Artificial Intelligence. 1997: Morgan Kaufmann.

[5] Pazzani, M., J. Muramatsu, and D. Billsus. Syskill & Webert: Identifying Interesting Web Sites. in Thirteenth National Conference on Artificial Intelligence. 1996. Portland, Oregon, United States: AAAI Press.

[6] Chen, L. and K. Sycara. WebMate: a personal agent for browsing and searching. in Second International Conference on Autonomous Agents. 1998. Minneapolis, Minnesota, United States.

[7] Oberle, D., et al. Conceptual User Tracking. in Advances in Web Intelligence. 2003.

[8] Dai, H. and B. Mobasher, Using Ontologies to Discover Domain-Level Web Usage Profiles, in 2nd workshop on Semantic Web Mining, PKDD. 2002: Helsinki, Finland.

[9] Smyth, B. and P. McClave, Similarity vs. Diversity, in Proceedings of the 4th International Conference on Case-Based Reasoning: Case-Based Reasoning Research and Development. 2001, Springer-Verlag.

[10] Ziegler, C.-N., G. Lausen, and L. Schmidt-Thieme. Taxonomy-driven computation of product recommendations. in ACM international conference on Information and knowledge management. 2004. Washington, D.C.: ACM.

[11] Chang, P.C. and L.M. Quiroga. Using Wikipedia's Categories to Study the Coverage of a User's Interests for a Web Page Recommender. in Poster and Demo Adjunct Proceedings of the 18th International Conference on User Modeling, Adaption, and Personalization (UMAP 2010). 2010. Hawaii, USA.

[12] Salton, G., A. Wong, and C.S. Yang, A vector space model for automatic indexing. Commun. ACM, 1975. 18(11): p. 613-620.
