Midsem presentation

Ubiquitous Search

By Shatabdi Kundu (2010EET2553)

Computer Technology, M.Tech, IIT Delhi

Email ID: [email protected]

Project Guide: Prof. Santanu Chaudhury

Electrical Engineering Department, IIT Delhi

Email ID: [email protected]

September 18, 2011

Outline

Introduction

State of the Art

Problem Statement

Contexts

Sources of Information

Work Done

Results

Future Work

References


Introduction

An Adaptive Learning Model Based on Context Aware Systems.

Conducting searches for users without them having to manually submit any query.

Inferring the user’s intentions.

Inferring User Context for Information Access: the goal is to create a unified model of user information goals through the seamless integration of semantic knowledge with automatically learned user profiles.

    Derivation of User Profiles from Document Clusters
    Representation of Domain Knowledge
    Utilizing User Context for Web Search

Experiments with User Profiles


State of the Art

Many existing systems already work with:

    user profile
    user interests
    the user’s past searches
    the user’s location

None of them has worked on user-specific events or analysed the user’s Facebook/Twitter status.

There is no push application that understands both the user’s needs and the global events around the user.

No existing model has worked on multimedia information.

These gaps motivate us to work on user-specific events (the user’s calendar information and Facebook/Twitter updates) and to model a filter that pushes relevant information to the user.


Problem Statement

Given the following:

    a user’s profile
    the user’s interests
    global events that will occur in the near future or that happened in the past
    the user’s Facebook/Twitter status
    the GPS location of the user
    the user’s past expressed searches

the objective is to model a filter that pushes information relevant to the user’s needs.


Input Signals to infer user intent

Context

Location

    1. Latitude and Longitude
    2. Venue
    3. Venue relationship to user
    4. User Movement
    5. Inferred User Activity

Weather

Time of day and date

News events near the user

Taste (Interests)

Past expressed intent (searches)

Behavior of other people towards the user


Sources of Information

User information - From user profiles fed to the model.

Geolocation - The GPS coordinates of a user’s current location.

Global Events - Current events around the user.

User-specific Events - From the user’s daily calendar, plus information from Facebook, Twitter, etc.

User interests - From the user’s profile as well as previously made searches.


Web Personalization based on User Interests

We score each page returned by Google against the user’s interests and then return the re-ranked results to the user, as shown in the figure below.

To build the User Interest Hierarchy (UIH), we apply the Divisive Hierarchical Clustering (DHC) algorithm to the web pages in the user’s bookmarks or to the user’s documents.


Example of a UIH

The above figure describes how a User Interest Hierarchy is created from the documents or web pages that are interesting to the user.


Work Done

I have worked on the following contexts:

    User Interests
    User Profile
    Temporal Context

I have tried to infer the activity of a user given a certain set of conditions:

    weather
    day
    time of the day
    user’s age and gender
    number of friends of the user
    the user’s morning routine (whether school or office)


User Interests

Derived the user’s interests from the web pages in the user’s bookmarks and documents.

Filtered the search results returned by Google based on the interests found.

For scoring a page I have used two techniques:

    Uniform Scoring
    Weighted Scoring
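The difference between the two techniques can be sketched as follows. This is a minimal illustration under assumptions, not the exact formulas used in this work: uniform scoring is taken to count every matched interest term as 1, while weighted scoring sums a per-term weight. The function names, the whitespace tokenizer and the example weights are all hypothetical.

```python
from typing import Callable, Dict, List


def tokenize(text: str) -> List[str]:
    """Lowercase a text and split it on whitespace (deliberately simplistic)."""
    return text.lower().split()


def uniform_score(page_text: str, interest_weights: Dict[str, float]) -> float:
    """Count the interest terms found in the page; every match counts as 1."""
    words = set(tokenize(page_text))
    return float(sum(1 for term in interest_weights if term in words))


def weighted_score(page_text: str, interest_weights: Dict[str, float]) -> float:
    """Sum the weights of the interest terms found in the page."""
    words = set(tokenize(page_text))
    return sum(weight for term, weight in interest_weights.items() if term in words)


def rerank(pages: List[str], interest_weights: Dict[str, float],
           score_fn: Callable[[str, Dict[str, float]], float]) -> List[str]:
    """Order pages by descending score; ties keep the search engine's order."""
    return sorted(pages, key=lambda page: score_fn(page, interest_weights), reverse=True)


# Hypothetical interest terms; the weights are made up for illustration only.
interests = {"python": 1.0, "clustering": 2.0, "lda": 3.0}
pages = ["python tutorial for beginners", "lda topic clustering", "cooking recipes"]

print(rerank(pages, interests, uniform_score))
print(rerank(pages, interests, weighted_score))
```

In this toy example both scorers put the page that matches two interest terms first; the two methods only diverge when pages match the same number of terms but with different weights.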

Calculated the rank of each web page within three groups of results returned by Google (top 5, top 10 and top 15) and compared it with Google’s own ranking.

Each reported value is the average precision over 22 search terms (11 subjects × 2 search terms each).
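The evaluation measure can be sketched as precision in the top k results, averaged over search terms. The helper names and the example page identifiers below are hypothetical; the sketch assumes that each result page has already been judged interesting or not by the subject.

```python
from typing import List, Sequence, Set


def precision_at_k(ranked_pages: Sequence[str], interesting: Set[str], k: int) -> float:
    """Fraction of the first k ranked pages that the user marked as interesting."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for page in ranked_pages[:k] if page in interesting)
    return hits / k


def mean_precision(per_query_precisions: List[float]) -> float:
    """Average the precision values obtained for the individual search terms."""
    if not per_query_precisions:
        raise ValueError("need at least one precision value")
    return sum(per_query_precisions) / len(per_query_precisions)


# Toy ranking of ten result pages, three of which the user found interesting.
ranking = ["p1", "p2", "p3", "p4", "p5", "p6", "p7", "p8", "p9", "p10"]
liked = {"p1", "p4", "p7"}

for k in (5, 10):
    print(k, precision_at_k(ranking, liked, k))
```

With 22 search terms, one table cell would be `mean_precision` applied to the 22 per-term values for one ranking method and one value of k.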


User Interests Results contd...

Precision in Top 5, 10 and 15 for interesting web pages

Ranking Method    Top 5    Top 10    Top 15
Google            0.34     0.277     0.285
US                0.31     0.323     0.315
WS                0.34     0.314     0.327

US: Uniform Scoring; WS: Weighted Scoring.

The results show that our WS method was more accurate than Google in two of the three groups (Top 10 and Top 15), while WS ties with Google for Top 5.

In terms of the highest precision per column, WS had the highest value in two columns (Top 5 and Top 15); Google had the highest value in only one column (Top 5), where it equals WS.

Compared to US, WS showed higher precision in two of the three columns (Top 5 and Top 15). These results indicate that WS achieves the highest overall precision.


User Profile and environmental conditions

I have considered the following attributes and tried to infer the user’s current activity using three classification algorithms.

    age (below20, between20to30, abv30)
    gender (male, female)
    day (weekday, weekend)
    time (morning, noon, evening, night)
    number of user’s friends (less<5, between5to10, abv10)
    weather (sunny, rainy, overcast, clear)
    work/school is off or not (yes, no)

The user’s activity can be any of the following: play, movies, clubs, sleep, food, work, shopping, party, study, travel.
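Classification over categorical attributes like these can be sketched with a small Naive Bayes classifier. The following is a minimal pure-Python illustration with add-one (Laplace) smoothing; the training rows are made-up toy examples, not the dataset used in this work, only a subset of the attributes is used, and the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count label frequencies and per-label attribute-value frequencies."""
    label_counts = Counter(labels)
    # value_counts[label][attribute][value] -> number of training rows
    value_counts = defaultdict(lambda: defaultdict(Counter))
    attribute_values = defaultdict(set)
    for row, label in zip(rows, labels):
        for attribute, value in row.items():
            value_counts[label][attribute][value] += 1
            attribute_values[attribute].add(value)
    return label_counts, value_counts, attribute_values


def predict(model, row):
    """Return the label with the highest smoothed log-posterior for the row."""
    label_counts, value_counts, attribute_values = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total)  # log prior of the activity
        for attribute, value in row.items():
            seen = value_counts[label][attribute][value]
            n_values = len(attribute_values[attribute])
            # Add-one smoothing keeps unseen values from zeroing the product.
            score += math.log((seen + 1) / (count + n_values))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training rows (hypothetical, for illustration only).
rows = [
    {"day": "weekday", "time": "morning", "weather": "sunny", "work_off": "no"},
    {"day": "weekday", "time": "noon", "weather": "rainy", "work_off": "no"},
    {"day": "weekend", "time": "evening", "weather": "clear", "work_off": "yes"},
    {"day": "weekend", "time": "night", "weather": "rainy", "work_off": "yes"},
    {"day": "weekend", "time": "evening", "weather": "sunny", "work_off": "yes"},
]
labels = ["work", "work", "movies", "sleep", "movies"]

model = train_naive_bayes(rows, labels)
query = {"day": "weekday", "time": "morning", "weather": "overcast", "work_off": "no"}
print(predict(model, query))
```

The query uses a weather value (overcast) that never appears in the toy training rows, which is the case the smoothing term is there to handle.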


Bayesian Network for the Model

The diagram above shows that the user activity (the root node) is determined by the given conditions (the child nodes).
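If the network has the shape this description suggests, with the activity node as the single parent of every condition node, the posterior over activities factorizes in the usual Naive Bayes form. The symbols $A$ (the activity) and $X_1, \dots, X_7$ (the seven attributes) are introduced here for illustration, and the factorization is an assumption about the diagram rather than something stated on the slide.

```latex
P(A = a \mid X_1 = x_1, \dots, X_7 = x_7)
  \;\propto\;
  P(A = a) \prod_{i=1}^{7} P(X_i = x_i \mid A = a)
```

The predicted activity is then the value of $a$ that maximizes the right-hand side.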


Results of Classifying Test Data

I have tested an unseen dataset (41 instances) using three supervised learning algorithms:

    Naive Bayes
    Decision Tree
    Support Vector Polynomial

The classification accuracy for the three algorithms is as follows:

Algorithm                    Correctly classified instances    Classification Accuracy
Naive Bayes                  31                                75.6098%
Decision Tree                29                                70.7317%
Support Vector Polynomial    36                                87.561%
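The accuracy column is the number of correctly classified instances divided by the 41 test instances. A short check of that arithmetic, using a hypothetical helper name, is given below. The first two rows reproduce exactly, while 36 of 41 comes to roughly 87.80%, so the 87.561% reported for the third row presumably reflects a different count or evaluation setup that is not stated here.

```python
def accuracy_percent(correct: int, total: int) -> float:
    """Classification accuracy as a percentage of correctly classified instances."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * correct / total


TEST_INSTANCES = 41
reported_correct = {
    "Naive Bayes": 31,
    "Decision Tree": 29,
    "Support Vector Polynomial": 36,
}

for name, correct in reported_correct.items():
    print(f"{name}: {accuracy_percent(correct, TEST_INSTANCES):.4f}%")
# Naive Bayes: 75.6098%
# Decision Tree: 70.7317%
# Support Vector Polynomial: 87.8049%
```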


Temporal Context

Temporal LDA (Latent Dirichlet Allocation) is used to capture information about the user that may change over time.

This is achieved by adding a variable, K, to the existing model and using this variable to calculate the probability of a change in the topic given the following:

    the hyperparameters α and β of the existing model
    the topics
    the words that make up the topics

The variable K can be thought of as a vertical split in the LDA model, which is our temporal split in the document list.

A limitation of the TLDA model is that the document list must be sorted in temporal order.
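The idea of a temporal split in an ordered document list can be illustrated with a much simpler stand-in. The sketch below is not TLDA and does not use topics or the hyperparameters α and β: it simply tries every split position k in a temporally sorted list and keeps the one where the word distributions before and after the split differ most, measured by Jensen-Shannon divergence. The function names and the toy documents are hypothetical.

```python
import math
from collections import Counter


def word_distribution(documents):
    """Relative word frequencies over a list of whitespace-tokenized documents."""
    counts = Counter(word for doc in documents for word in doc.split())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: n / total for word, n in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (log base 2) between two word distributions."""
    divergence = 0.0
    for word in set(p) | set(q):
        pw, qw = p.get(word, 0.0), q.get(word, 0.0)
        mw = 0.5 * (pw + qw)
        if pw > 0:
            divergence += 0.5 * pw * math.log2(pw / mw)
        if qw > 0:
            divergence += 0.5 * qw * math.log2(qw / mw)
    return divergence


def best_split(documents):
    """Return k such that documents[:k] and documents[k:] differ the most."""
    if len(documents) < 2:
        raise ValueError("need at least two documents to split")
    return max(
        range(1, len(documents)),
        key=lambda k: js_divergence(
            word_distribution(documents[:k]), word_distribution(documents[k:])
        ),
    )


# Toy list in temporal order: four travel documents, then six business ones.
travel = ["flight hotel beach", "hotel passport flight",
          "beach tour hotel", "passport tour flight"]
business = ["meeting budget client", "client invoice meeting",
            "budget report invoice", "report meeting client",
            "invoice budget report", "client report budget"]
documents = travel + business

print(best_split(documents))
```

Because the search only considers contiguous prefixes and suffixes, it relies on the same assumption as the model above: the list has to be sorted in temporal order.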


Results of Temporal Context

I created a dataset with a known split. I defined a split location at document 4 in the dataset and forced the two sides of the split to have significantly different sets of words for the topics.

The dataset comprised a total of 10 documents containing words from a list of travel and business terms.

The results of splitting the document list temporally are as follows:


Future Work

Work on user-specific events based on the user’s daily calendar and Facebook/Twitter status and updates.

Work on global events that may affect a user.

Merge all the contexts and create a model that filters results down to those relevant to the user’s needs.

I will create a model that pushes multimedia information to users based on their intentions and needs.


References

Hyoung-rae Kim and Philip K. Chan: “Personalized Ranking of Search Results with Learned User Interest Hierarchies from Bookmarks”, Livingstone College, Department of Computer Sciences, Florida Institute of Technology

Kristopher W. Reese and Patrick Shafto: “Towards a Temporal Latent Dirichlet Allocation Model”, Computer Science and Engineering

R. A. Baeza-Yates et al.: “Query recommendation using query logs in search engines”, in EDBT 2004 Workshop on Clustering Information over the Web, pages 588-596, 2004

Laurel D. Riek and Peter Robinson: “Challenges and Opportunities in Building Socially Intelligent Machines”

H. Kim and P. K. Chan: “Identifying variable-length meaningful phrases with correlation functions”, IEEE International Conference on Tools with Artificial Intelligence, IEEE Press (2004) 30-38