Ubiquitous Search
By Shatabdi Kundu (2010EET2553)
Computer Technology, M.Tech, IIT Delhi
Email ID: [email protected]
Project Guide: Prof. Santanu Chaudhury
Electrical Engineering Department, IIT Delhi
Email ID: [email protected]
September 18, 2011
Shatabdi Kundu :: 2010EET2553, 1of 19 1/19
Outline
Introduction
State of the Art
Problem Statement
Contexts
Sources of Information
Work Done
Results
Future Work
References
Introduction
An Adaptive Learning Model Based on Context Aware Systems.
Conducting searches for users without them having to manually submit any query.
Inferring user’s intentions.
Inferring User Context for Information Access: the goal is to create a unified model of user information goals through the seamless integration of semantic knowledge with automatically learned user profiles.
Derivation of User Profiles from Document Clusters
Representation of Domain Knowledge
Utilizing User Context for Web Search
Experiments with User Profiles
State of the Art
Many current systems already work with:
the user’s profile
the user’s interests
the user’s past searches
the user’s location
None of them have worked on user-specific events or analysed the user’s facebook/twitter status.
There is no push application that understands the user’s needs and the global events around him.
No model has worked on multimedia information.
All these motivate us to work on user-specific events (the user’s calendar information and facebook/twitter updates) and to model a filter that pushes relevant information to the user.
Problem Statement
Given the following:
a user’s profile
his interests
global events to occur in the near future or that happened in the past
his facebook/twitter status
the GPS location of the user
his past expressed searches
the objective is to model a filter that pushes information relevant to the user’s needs.
Input Signals to infer user intent
Context
Location
1. Latitude and Longitude
2. Venue
3. Venue relationship to user
4. User Movement
5. Inferred User Activity
Weather
Time of day and date
News events near the user
Taste (Interests)
Past expressed intent (searches)
Behavior of other people towards the user
Sources of Information
User information: from user profiles fed to the model.
Geolocation: the GPS coordinates of a user’s current location.
Global Events: current events around the user.
User-specific Events: from the user’s daily calendar, plus information from facebook, twitter, etc.
User interests: from his profile as well as previously made searches.
Web Personalization based on User Interests
We score each page returned by Google based on the user’s interests and then return the re-scored results to the user, as shown in the figure below.
To build the User Interest Hierarchy (UIH), we apply the Divisive Hierarchical Clustering (DHC) algorithm to the web pages in the user’s bookmarks or to his documents.
Example of a UIH
The above figure describes how a User Interest Hierarchy is created from the documents or web pages that are interesting to the user.
Work Done
I have worked on the following contexts:
User Interests
User profile
Temporal Context
I have tried to find out the activity of a user given a certain set of conditions:
weather
day
time of the day
user’s age and gender
number of friends of a user
his morning routine (whether school or office)
User Interests
Found out user interests from the web pages in his bookmarks and documents.
Filtered the search results returned by Google based on the interests found.
For scoring a page I have used two techniques:
Uniform Scoring
Weighted Scoring
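The slides name the two scoring techniques without defining them. The sketch below shows one plausible reading, under loud assumptions: interest terms are stored with their depth in the UIH, Uniform Scoring counts every matching term equally, and Weighted Scoring weights each match by that depth. The names `INTEREST_DEPTH`, `uniform_score`, `weighted_score` and `rerank` are hypothetical and are not taken from the project.

```python
from typing import Callable, Dict, List

# Hypothetical interest terms mapped to their depth in a User Interest Hierarchy.
# Assumption: deeper terms describe more specific interests.
INTEREST_DEPTH: Dict[str, int] = {"learning": 1, "clustering": 2, "bayesian": 3}

Scorer = Callable[[str, Dict[str, int]], float]


def tokenize(text: str) -> List[str]:
    """Lowercase the text and strip simple punctuation from each word."""
    return [w.strip(".,;:!?()").lower() for w in text.split()]


def uniform_score(page_text: str, interests: Dict[str, int]) -> float:
    """Assumed uniform scheme: every occurrence of an interest term counts as 1."""
    return float(sum(1 for t in tokenize(page_text) if t in interests))


def weighted_score(page_text: str, interests: Dict[str, int]) -> float:
    """Assumed weighted scheme: each occurrence counts as the term's UIH depth."""
    return float(sum(interests[t] for t in tokenize(page_text) if t in interests))


def rerank(pages: List[str], interests: Dict[str, int], scorer: Scorer) -> List[str]:
    """Order pages by descending score; ties keep the search engine's order."""
    return sorted(pages, key=lambda p: scorer(p, interests), reverse=True)


pages = [
    "Bayesian clustering methods",
    "learning learning learning cooking",
    "cooking recipes",
]
print(rerank(pages, INTEREST_DEPTH, uniform_score))
print(rerank(pages, INTEREST_DEPTH, weighted_score))
```

Under these assumptions the two schemes can disagree: the uniform scheme favours a page that repeats a general term, while the weighted scheme favours a page that mentions fewer but more specific terms.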
Calculated the rank of a web page for three groups of pages (Top 5, 10 and 15) returned by Google and then compared the result with Google’s ranking.
The value in each cell is the average of the precision values for 22 search terms (11 subjects * 2 search terms).
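The averaging described above can be sketched as follows. This is a minimal illustration, assuming each search term comes with a list of per-result judgements (interesting or not) in ranked order; the function names are hypothetical.

```python
from typing import List, Sequence


def precision_at_k(relevant_flags: Sequence[bool], k: int) -> float:
    """Fraction of the top-k results judged interesting.

    The denominator is k even when fewer than k results were returned.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(relevant_flags[:k]) / k


def mean_precision_at_k(judgements: List[Sequence[bool]], k: int) -> float:
    """Average precision-at-k over all search terms (22 in the experiment)."""
    if not judgements:
        raise ValueError("need at least one search term")
    return sum(precision_at_k(flags, k) for flags in judgements) / len(judgements)


# Hypothetical judgements for one search term, in ranked order.
flags = [True, False, True, False, False, True, False, False, False, False]
print(precision_at_k(flags, 5), precision_at_k(flags, 10))
```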
User Interests Results contd...
Precision in Top 5, 10 and 15 for interesting web pages
Ranking Method   Top 5   Top 10   Top 15
Google           0.34    0.277    0.285
US               0.31    0.323    0.315
WS               0.34    0.314    0.327
The results show that our WS method was more accurate than Google in two of the three groups (Top 10 and Top 15), while WS ties with Google for Top 5.
In terms of the highest precision, WS had the highest value in two columns; Google had the highest value in only one column, where it is equal to WS.
Compared to US, WS showed higher precision in two (Top 5 and Top 15) of the three columns. These results indicate that WS achieves the highest overall precision.
User Profile and environmental conditions
I have considered the following attributes and tried to find out the user’s current activity using three classification algorithms.
age (below20, between20to30, abv30)
gender (male, female)
day (weekday, weekend)
time (morning, noon, evening, night)
number of user’s friends (less<5, between5to10, abv10)
weather (sunny, rainy, overcast, clear)
work/school is off or not (yes, no)
The user’s activity can be any of the following:
play, movies, clubs, sleep, food, work, shopping, party, study, travel
Bayesian Network for the Model
The diagram above shows that user activity, i.e. the root node, is determined by the given conditions (child nodes).
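A network with the activity as the root and every condition as a child has the same structure as a Naive Bayes classifier. The sketch below is a minimal from-scratch version with add-one smoothing, not the tool used in the project; the class name and the five training rows are hypothetical and only reuse attribute and activity names from the slides.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Example = Tuple[Dict[str, str], str]  # (attribute values, activity label)


class CategoricalNaiveBayes:
    """Naive Bayes over nominal attributes with additive (Laplace) smoothing."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha
        self.class_counts: Counter = Counter()
        # value_counts[(label, attribute)][value] = number of training rows
        self.value_counts: Dict[Tuple[str, str], Counter] = defaultdict(Counter)
        self.attribute_values: Dict[str, set] = defaultdict(set)

    def fit(self, examples: List[Example]) -> "CategoricalNaiveBayes":
        for attrs, label in examples:
            self.class_counts[label] += 1
            for name, value in attrs.items():
                self.value_counts[(label, name)][value] += 1
                self.attribute_values[name].add(value)
        return self

    def log_score(self, attrs: Dict[str, str], label: str) -> float:
        """Unnormalised log posterior: log P(label) + sum of log P(value | label)."""
        total = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / total)
        for name, value in attrs.items():
            count = self.value_counts[(label, name)][value]
            n_values = len(self.attribute_values[name] | {value})
            score += math.log(
                (count + self.alpha)
                / (self.class_counts[label] + self.alpha * n_values)
            )
        return score

    def predict(self, attrs: Dict[str, str]) -> str:
        return max(self.class_counts, key=lambda label: self.log_score(attrs, label))


# Hypothetical training rows, for illustration only.
train: List[Example] = [
    ({"day": "weekday", "time": "morning", "weather": "sunny"}, "work"),
    ({"day": "weekday", "time": "noon", "weather": "rainy"}, "work"),
    ({"day": "weekend", "time": "evening", "weather": "clear"}, "movies"),
    ({"day": "weekend", "time": "night", "weather": "clear"}, "party"),
    ({"day": "weekend", "time": "evening", "weather": "rainy"}, "movies"),
]
model = CategoricalNaiveBayes().fit(train)
print(model.predict({"day": "weekday", "time": "morning", "weather": "sunny"}))
```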
Results of Classifying Test Data
I have tested an unseen dataset (41 instances) using three supervised learning algorithms:
Naive Bayes
Decision Tree
Support Vector Polynomial
The classification accuracy for the three algorithms is as follows:
Algorithm                   Correctly classified instances   Classification Accuracy
Naive Bayes                 31                               75.6098%
Decision Tree               29                               70.7317%
Support Vector Polynomial   36                               87.561%
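Classification accuracy is the number of correctly classified instances divided by the size of the test set. The short check below recomputes it from the counts in the table. For 31 and 29 correct instances it reproduces the reported percentages; for 36 of 41 it gives about 87.80%, which differs slightly from the reported 87.561%, and the slides alone do not show which of the two figures is off.

```python
def accuracy_percent(correct: int, total: int) -> float:
    """Percentage of test instances that were classified correctly."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * correct / total


TEST_INSTANCES = 41  # size of the test set stated in the slides
correct_counts = {
    "Naive Bayes": 31,
    "Decision Tree": 29,
    "Support Vector Polynomial": 36,
}

for name, correct in correct_counts.items():
    print(f"{name}: {accuracy_percent(correct, TEST_INSTANCES):.4f}%")
```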
Temporal Context
Temporal LDA (Latent Dirichlet Allocation) is used to capture information about the user that may change with respect to time.
This is achieved by adding a variable, K, to the existing model and using this variable to calculate the probability of a change in the topic given the following:
the hyperparameters α and β of the existing model
the topics
the words that make up the topics
This variable K can be thought of as a vertical split in the LDA model, which is our temporal split in the document list.
The limitation that the TLDA model faces is that the list must be sorted in temporal order.
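The idea of a temporal split K can be illustrated with a much simpler stand-in than TLDA. The sketch below is not the TLDA model: it assumes one smoothed unigram word distribution on each side of the split and picks the K that maximises the total log-likelihood of a temporally ordered document list. Here K is the number of documents in the first segment, and all names and the toy documents are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence

Document = Sequence[str]


def segment_log_likelihood(docs: Sequence[Document], vocab_size: int,
                           smoothing: float = 1.0) -> float:
    """Log-likelihood of a segment under its own smoothed unigram model."""
    counts = Counter(word for doc in docs for word in doc)
    denom = sum(counts.values()) + smoothing * vocab_size
    return sum(c * math.log((c + smoothing) / denom) for c in counts.values())


def best_split(docs: List[Document], smoothing: float = 1.0) -> int:
    """Return K (documents in the first segment) with the highest likelihood.

    The list must already be sorted in temporal order.
    """
    if len(docs) < 2:
        raise ValueError("need at least two documents to split")
    vocab_size = len({word for doc in docs for word in doc})
    best_k, best_ll = 1, -math.inf
    for k in range(1, len(docs)):
        ll = (segment_log_likelihood(docs[:k], vocab_size, smoothing)
              + segment_log_likelihood(docs[k:], vocab_size, smoothing))
        if ll > best_ll:
            best_k, best_ll = k, ll
    return best_k


# Toy list of 10 documents: travel terms first, business terms afterwards.
travel = ["flight", "hotel", "ticket"]
business = ["meeting", "budget", "client"]
docs: List[Document] = [travel] * 4 + [business] * 6
print(best_split(docs))
```

Because the two segments use disjoint vocabularies, any split other than the true one forces one side to explain both word sets, which lowers its likelihood.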
Results of Temporal Context
I created a dataset with a known split. I defined a split location at document 4 in the dataset and forced the split to have a significantly different set of words for the topics.
The dataset comprised a total of 10 documents containing words from a list of travel and business terms.
The results of splitting the document list temporally are as follows:
Future Work
Work on user-specific events based on the user’s daily calendar and facebook/twitter status and updates.
Work on global events that may affect a user.
Merge all the contexts and create a model that filters out results pertaining to the user’s needs.
Create a model that pushes multimedia information to users based on their intentions and needs.
References
Hyoung-rae Kim and Philip K. Chan: “Personalized Ranking of Search Results with Learned User Interest Hierarchies from Bookmarks”, Livingstone College, Department of Computer Sciences, Florida Institute of Technology.
Kristopher W. Reese and Patrick Shafto: “Towards a Temporal Latent Dirichlet Allocation Model”, Computer Science and Engineering.
R. A. Baeza-Yates et al.: “Query recommendation using query logs in search engines”, EDBT 2004 Workshop on Clustering Information over the Web, pages 588-596, 2004.
Laurel D. Riek and Peter Robinson: “Challenges and Opportunities in Building Socially Intelligent Machines”.
H. Kim and P. K. Chan: “Identifying variable-length meaningful phrases with correlation functions”, IEEE International Conference on Tools with Artificial Intelligence, IEEE Press (2004) 30-38.