incorporating personal information
brent chun, sims296a-3
letizia
recommends web pages during browsing based on user profile
learns user profile using simple heuristics
passive observation, recommend on request
provides relative ordering of link interestingness
assumes recommendations “near” current page are more valuable than others
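The relative ordering of links could be sketched as follows; this is a minimal illustration, not Letizia's actual algorithm. The profile as a keyword-weight dictionary, the link depth field, and the decay factor are all assumptions.

```python
# Sketch of Letizia-style link ordering (assumed profile = keyword weights).
# Interestingness is keyword overlap with the learned profile; links "near"
# the current page (smaller depth) get less of a distance discount.

def interestingness(page_terms, profile):
    """Score a page by summing the profile weights of its terms."""
    return sum(profile.get(t, 0.0) for t in page_terms)

def rank_links(candidates, profile, decay=0.5):
    """candidates: list of (url, terms, depth), where depth is the number of
    links followed from the current page. Nearer pages are favored."""
    scored = [(interestingness(terms, profile) * (decay ** depth), url)
              for url, terms, depth in candidates]
    return [url for score, url in sorted(scored, reverse=True)]

# hypothetical profile and candidate links
profile = {"agents": 2.0, "learning": 1.5, "coffee": 0.1}
links = [
    ("a.html", {"agents", "learning"}, 0),
    ("b.html", {"agents"}, 2),
    ("c.html", {"coffee"}, 0),
]
print(rank_links(links, profile))  # a.html ranked first
```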
[diagram: user ↔ letizia; letizia maintains a user profile and applies heuristics to produce recommendations]
why is this useful?
tracks and learns user behavior, provides user “context” to the application (browsing)
completely passive: no work for the user. consequences?
useful when user doesn’t know where to go
no modifications to application: letizia interposes between the web and the browser. consequences?
consequences of passiveness
weak heuristics
example: click through multiple uninteresting pages en route to interestingness
example: user browses to uninteresting page, heads to nefeli for a coffee
example: hierarchies tend to get more hits near root
cold start
no ability to fine tune profile or express interest without visiting “appropriate” pages
open issues
how far can passive observation get you? for what types of applications is passiveness sufficient?
profiles are maintained internally and used only by the application. some possibilities:
expose to the user (e.g. fine tune profile)?
expose to other applications (e.g. reinforce belief)?
expose to other users/agents (e.g. collaborative filtering)?
expose to web server (e.g. cnn.com custom news)?
personalization vs. closed applications
others?
lifestreams
lifestream = time-ordered stream of documents + filters + “agents”
filters provide views (like rdbms) called substreams
“agents” attach to the ui, streams, and documents and provide (condition, action) pairs. no machine learning
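The stream/filter/agent split could be sketched like this; a minimal illustration assuming a document is a dict with a timestamp, with filters as predicates and agents as (condition, action) pairs. The class and the tagging rule are hypothetical, not the Lifestreams implementation.

```python
# Sketch of a lifestream: a time-ordered stream of documents, filters
# that provide views (substreams), and (condition, action) agents that
# fire as documents arrive. No machine learning involved.

from datetime import date

class Lifestream:
    def __init__(self):
        self.docs = []      # time-ordered stream of documents
        self.agents = []    # list of (condition, action) pairs

    def add(self, doc):
        self.docs.append(doc)
        self.docs.sort(key=lambda d: d["time"])   # keep stream time-ordered
        for condition, action in self.agents:     # agents react to new docs
            if condition(doc):
                action(doc)

    def substream(self, condition):
        """A filter provides a view over the stream, like an rdbms query."""
        return [d for d in self.docs if condition(d)]

stream = Lifestream()
# hypothetical agent: tag incoming documents from a given sender
stream.agents.append((lambda d: d.get("from") == "boss",
                      lambda d: d.setdefault("tags", []).append("urgent")))
stream.add({"time": date(1998, 10, 19), "from": "boss", "subject": "hi"})
stream.add({"time": date(1998, 10, 20), "from": "alice", "subject": "yo"})
urgent_docs = stream.substream(lambda d: "urgent" in d.get("tags", []))
print(len(urgent_docs))  # 1
```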
[diagram: a lifestream of documents, with operations new, clone, xfer, find, summ]
lifestream operations
a lifestream
[diagram: timeline of documents, Oct 19, 1998 – Oct 21, 1998]
lifestreams assessment
linear stream of documents is a poor metaphor if used alone. don’t tell me to abandon my hierarchies!
problems: managing complexity, large “working sets”, etc.
stated problem: too many apps, too many file xfers, too many format xlations, too many hierarchies
lifestreams don’t help with any of these and simply replace the fourth
most of the techniques used apply equally well to hierarchies
no machine learning = more work for the user
lifestreams assessment cont.
filters are nice, but how do you write one? application-specific, but we already knew this
example: “all the email I haven’t responded to”
agents are nice, but how do you write one? application-specific, but we already knew this
agents have limited applicability
open issues
new metaphors to manage complexity
easy ways to create filters/agents
allow “fuzzy” filters: in lifestreams, filters need to be precisely specified; use machine learning + user feedback to relax this
associate actions with filters
tight integration of filters, agents w/ applications
apply ideas in lifestreams to hierarchies
others?
learning interface agents
add agents in the ui, delegate tasks to them
use machine learning to improve performance: learn user behavior, preferences
useful when: 1) past behavior is a useful predictor of the future, 2) wide variety of behaviors amongst users
examples: mail clerk: sort incoming messages into the right mailboxes; calendar manager: automatically schedule meeting times?
advantages
1) less work for user and application writer (compare w/ other agent approaches)
no user programming; significant a priori domain-specific and user knowledge not required
2) adaptive behavior: agent learns user behavior, preferences over time
3) user and agent build trust relationship gradually
claimed advantage: user constructs model of how agent makes decisions over time
real users: do the right thing!
machine learning
1) learn by observation
observe user, record (situation, action) pairs
use “similar” past (situation, action) pairs to predict actions for new situations
similarity = weighted difference of situation features; weights assigned based on feature/action correlations
algorithm: take n closest situations, compute scores for associated actions, recommend (or perform) the action with the highest score
use (situation, action) pairs to explain recommendations
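The memory-based algorithm above can be sketched as a weighted nearest-neighbor lookup. The feature weights are taken as given here (the system derives them from feature/action correlations), and the mail-sorting situations are hypothetical.

```python
# Sketch of the memory-based (situation, action) learner: similarity is a
# negative weighted feature-difference, and the recommended action is the
# highest-scoring action among the n closest past situations.

def similarity(s1, s2, weights):
    """Negative weighted count of differing features (higher = closer)."""
    return -sum(w * (s1.get(f, 0) != s2.get(f, 0)) for f, w in weights.items())

def recommend(situation, memory, weights, n=3):
    """memory: list of (situation, action) pairs observed from the user."""
    nearest = sorted(memory,
                     key=lambda sa: similarity(situation, sa[0], weights),
                     reverse=True)[:n]
    scores = {}
    for past, action in nearest:
        scores[action] = scores.get(action, 0.0) + similarity(situation, past, weights)
    return max(scores, key=scores.get)  # least-penalized action wins

# hypothetical mail-clerk memory: where did the user file past messages?
weights = {"sender": 2.0, "has_attachment": 0.5}
memory = [
    ({"sender": "boss", "has_attachment": 0}, "inbox"),
    ({"sender": "boss", "has_attachment": 1}, "inbox"),
    ({"sender": "list", "has_attachment": 0}, "archive"),
]
new = {"sender": "boss", "has_attachment": 1}
print(recommend(new, memory, weights))  # inbox
```

Because the stored (situation, action) pairs are kept around, the same nearest neighbors can be shown to the user to explain a recommendation.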
machine learning cont.
2) learn by user feedback
indirect feedback (e.g. ignore recommendation)
direct feedback (e.g. don’t do this again)
database of priority ratings
3) learn by being trained
train agent by giving examples of desired behavior, e.g. save all messages from [email protected] in the sims296a-3 mailbox
open issues
how far can black box treatment of apps get you?
example: mail clerk integration w/ ui requires access to application internals; what if this wasn’t the case?
tight integration with application user interface: access to internal events/state of significance
easy way to enable third-party developers to write personalization modules for applications?
chaining (situation, action) pairs to perform complex tasks, e.g. monitor ACM digital library -> look for interesting papers -> download them -> file them -> notify me via email -> print out
others?
sonia
automatic construction of document clusters
categorization based on full-text comparisons
automatically classify new docs into existing clusters
multiple cluster hierarchies imposed on same data
examples: categorize search results into clusters, categorize files in user’s home directory
[diagram: create clusters: documents -> stemmer -> feature selector -> clusterer; classify documents: documents -> classifier; example classes: cs298-1, is290-2, is296a-3, project, discussion]
creating clusters
stemmer: e.g. walking, walked, walk -> walk
feature selector: 1) remove stopwords, e.g. the, and, is, ...; 2) remove terms with freq < 3 or freq > 1000
clusterer: 1) hierarchical agglomerative clustering, 2) iterative clustering technique
document similarity based on term overlap
cluster similarity = pairwise average of document similarities
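The cluster-creation pipeline above can be sketched end to end. Assumptions to flag: the suffix-stripping stemmer is a crude stand-in for a real one, the frequency thresholds are shrunk to fit a toy corpus (the system uses freq < 3 and freq > 1000), term overlap is implemented as Jaccard similarity, and the merge-stop threshold is invented.

```python
# Sketch of SONIA-style cluster creation: stem, select features, then
# hierarchical agglomerative clustering with average pairwise similarity.

from itertools import combinations

STOPWORDS = {"the", "and", "is", "a", "of"}

def stem(word):
    """Crude suffix stripping: walking, walked, walks -> walk."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def features(docs, min_freq=2, max_freq=1000):
    """Stem, drop stopwords, drop too-rare and too-common terms."""
    terms = [{stem(w) for w in d.lower().split() if w not in STOPWORDS}
             for d in docs]
    freq = {}
    for t in terms:
        for w in t:
            freq[w] = freq.get(w, 0) + 1
    keep = {w for w, c in freq.items() if min_freq <= c <= max_freq}
    return [t & keep for t in terms]

def doc_sim(a, b):
    """Document similarity based on term overlap (Jaccard)."""
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster_sim(c1, c2):
    """Cluster similarity = pairwise average of document similarities."""
    pairs = [(a, b) for a in c1 for b in c2]
    return sum(doc_sim(a, b) for a, b in pairs) / len(pairs)

def hac(docsets, threshold=0.2):
    """Greedily merge the two most similar clusters until none pass threshold."""
    clusters = [[d] for d in docsets]
    while len(clusters) > 1:
        (i, j), best = max(
            ((pair, cluster_sim(clusters[pair[0]], clusters[pair[1]]))
             for pair in combinations(range(len(clusters)), 2)),
            key=lambda x: x[1])
        if best < threshold:
            break
        clusters[i] += clusters.pop(j)
    return clusters

docs = [
    "learning agents and the mail",
    "agents learning",
    "roasting the coffee beans",
    "coffee beans roasted",
]
print(len(hac(features(docs))))  # 2 clusters: agents vs. coffee
```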
classifying documents
pachinko machine (bayesian classification)
uses 50 “most informative features” for each cluster: significant reduction in computational cost
claim: often sufficient for accurate classification
obvious trade-off between compute time and accuracy
best case: compare new document with every document in every cluster and assign; compute time may not justify gain in accuracy
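A flat naive Bayes sketch (not the hierarchical pachinko machine itself) illustrates the idea of scoring against a reduced per-cluster feature set. Picking the most frequent terms is a stand-in for the paper's "most informative" feature selection, and the example clusters are invented.

```python
# Sketch of bayesian classification into existing clusters, keeping only
# the top n_features terms per cluster to cut computational cost.

import math

def train(clusters, n_features=50):
    """clusters: {name: [set of terms per document]}. Returns, per cluster,
    (log prior, per-term log-probabilities, log-prob for unseen terms)."""
    model = {}
    total = sum(len(ds) for ds in clusters.values())
    for name, docs in clusters.items():
        freq = {}
        for d in docs:
            for t in d:
                freq[t] = freq.get(t, 0) + 1
        top = sorted(freq, key=freq.get, reverse=True)[:n_features]
        n = len(docs)
        logp = {t: math.log((freq[t] + 1) / (n + 2)) for t in top}  # Laplace
        model[name] = (math.log(n / total), logp, math.log(1 / (n + 2)))
    return model

def classify(doc, model):
    """Assign doc (a set of terms) to the highest-scoring cluster."""
    def score(name):
        prior, logp, unseen = model[name]
        return prior + sum(logp.get(t, unseen) for t in doc)
    return max(model, key=score)

model = train({
    "systems": [{"kernel", "cache"}, {"kernel", "disk"}],
    "ml": [{"learning", "agents"}, {"agents", "feature"}],
})
print(classify({"kernel", "cache"}, model))  # systems
```

The compute/accuracy trade-off shows up directly: shrinking `n_features` shrinks the per-cluster dictionaries the classifier consults, at some cost in accuracy.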
why is this useful?
useful to help understand contents of a large collection of documents (e.g. results from a database query)
useful to automatically construct multiple categorizations of the same data
e.g. user may take the time to categorize personal files in a single hierarchy, unlikely to do this in multiple ways
saves time by automatically classifying documents
most applicable when consequences of error are low
open issues
adding importance, confidence to the system
using document structure for weighting terms (e.g. terms in abstract vs. terms in text)
support for different document types (e.g. PS!)
others?