Page 1: Approaches to Modeling and Learning User Preferences

Approaches to Modeling and Learning User Preferences

Marie desJardins

University of Maryland Baltimore County

Presented at SRI International AI Center

March 10, 2008

Joint work with Fusun Yaman, Michael Littman, and Kiri Wagstaff

Page 2: Approaches to Modeling and Learning User Preferences

Overview

Representing Preferences
Learning Planning Preferences
Preferences over Sets
Directions / Conclusions

Page 3: Approaches to Modeling and Learning User Preferences

Representing Preferences

Page 4: Approaches to Modeling and Learning User Preferences

What is a Preference?

(Partial) ordering over outcomes

Feature vector representation of “outcomes” (aka “objects”). Example: taking a vacation. Features:

Who (alone / family)
Where (Orlando / Paris)
Flight type (nonstop / one-stop / multi-stop)
Cost (low / medium / high)
…

Languages:
Weighted utility function
CP-net
Lexicographic ordering

Page 5: Approaches to Modeling and Learning User Preferences

Weighted Utility Functions

Each value vij of feature fi has an associated utility uij

Utility Uj of object oj = <v1j, v2j, …, vkj>: Uj = ∑i wi uij

Commonly used in preference elicitation
Easy to model
Independence of features is convenient

Flight example:

U(flight) = .8*u(Who) + .8*u(Cost) + .6*u(Where) + .4*u(Flight Type) + …
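To make the additive form concrete, here is a minimal Python sketch (not from the talk) that scores the flight example; the weights and per-value utilities are illustrative placeholders.

```python
# A minimal sketch of an additive weighted utility function for the flight
# example; the feature weights and value utilities below are illustrative.

WEIGHTS = {"who": 0.8, "cost": 0.8, "where": 0.6, "flight_type": 0.4}

VALUE_UTILITY = {
    "who": {"family": 1.0, "alone": 0.3},
    "cost": {"low": 1.0, "medium": 0.6, "high": 0.1},
    "where": {"orlando": 0.9, "paris": 0.7},
    "flight_type": {"nonstop": 1.0, "one-stop": 0.5, "multi-stop": 0.2},
}

def utility(outcome):
    """U(o) = sum over features i of w_i * u_i(v_i)."""
    return sum(WEIGHTS[f] * VALUE_UTILITY[f][v] for f, v in outcome.items())

flight = {"who": "family", "cost": "low", "where": "orlando", "flight_type": "nonstop"}
print(utility(flight))  # 0.8*1.0 + 0.8*1.0 + 0.6*0.9 + 0.4*1.0 = 2.54
```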

Page 6: Approaches to Modeling and Learning User Preferences

CP-Nets

Conditional Preference Network: an intuitive, graphical representation of conditional preferences under a ceteris paribus (“all else being equal”) assumption

who: family > alone

where: family: Orlando > Paris; alone: Paris > Orlando

I prefer to take a vacation with my family, rather than going alone. If I am with my family, I prefer Orlando to Paris. If I am alone, I prefer Paris to Orlando.
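A minimal sketch of how the vacation CP-net above could be represented as data, with each node carrying its parents and a conditional preference table; the encoding is an illustration, not the authors' implementation.

```python
# A minimal sketch of the vacation CP-net: each node stores its parents and a
# conditional preference table mapping parent values to an ordered list of
# values (most preferred first).

CPNET = {
    "who":   {"parents": [], "cpt": {(): ["family", "alone"]}},
    "where": {"parents": ["who"],
              "cpt": {("family",): ["orlando", "paris"],
                      ("alone",):  ["paris", "orlando"]}},
}

def preferred_value(feature, outcome):
    """Most preferred value of `feature`, given the outcome's assignment to
    the feature's parents (ceteris paribus semantics)."""
    parents = tuple(outcome[p] for p in CPNET[feature]["parents"])
    return CPNET[feature]["cpt"][parents][0]

o = {"who": "family", "where": "paris"}
# With the family, Orlando is preferred, so flipping `where` would improve o:
print(preferred_value("where", o))  # orlando
```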

Page 7: Approaches to Modeling and Learning User Preferences

Every CP-net induces a preference graph on outcomes:

The partial ordering of outcomes is given by the transitive closure of the preference graph

Induced Preference Graph

[Figure: the CP-net (who: family > alone; where: family: Orlando > Paris, alone: Paris > Orlando) and its induced preference graph over the four outcomes: (alone, Orlando), (family, Orlando), (alone, Paris), (family, Paris)]
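To illustrate the transitive-closure semantics, here is a small sketch that encodes the induced preference graph as "worse -> better" edges and tests dominance by reachability; the edge list is derived by hand from the CP-net above.

```python
# Edges of the induced preference graph (worse -> better), i.e. the single
# improving flips implied by the vacation CP-net. Dominance under the partial
# order is reachability in this graph (its transitive closure).

EDGES = {
    ("alone", "orlando"):  [("alone", "paris"), ("family", "orlando")],
    ("alone", "paris"):    [("family", "paris")],
    ("family", "paris"):   [("family", "orlando")],
    ("family", "orlando"): [],
}

def preferred_to(better, worse, edges=EDGES):
    """True if `better` is reachable from `worse` via improving flips."""
    stack, seen = [worse], set()
    while stack:
        node = stack.pop()
        if node == better and node != worse:
            return True
        seen.add(node)
        stack.extend(n for n in edges[node] if n not in seen)
    return False

print(preferred_to(("family", "orlando"), ("alone", "orlando")))  # True
print(preferred_to(("alone", "paris"), ("family", "orlando")))    # False
```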

Page 8: Approaches to Modeling and Learning User Preferences

Lexicographic Orderings

Features are prioritized with a total ordering f1, …, fk

Each value of each feature is prioritized with a total ordering, vi1…vim

To compare o1 and o2:
Find the first feature in the feature ordering on which o1 and o2 differ
Choose the outcome with the preferred value for that feature

Travel example: Who > Where > Cost > Flight-Type > …

Family > Alone; Orlando > Paris; … ; Cheap > Expensive
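A minimal sketch of the lexicographic comparison procedure just described, using an illustrative feature priority and value orders for the travel example.

```python
# Lexicographic comparison: scan features in priority order and decide on the
# first feature where the two outcomes differ. Orders below are illustrative.

FEATURE_ORDER = ["who", "where", "cost", "flight_type"]
VALUE_ORDER = {
    "who": ["family", "alone"],              # most preferred first
    "where": ["orlando", "paris"],
    "cost": ["low", "medium", "high"],
    "flight_type": ["nonstop", "one-stop", "multi-stop"],
}

def lex_compare(o1, o2):
    """Return the preferred outcome, or None if they agree on every feature."""
    for f in FEATURE_ORDER:
        if o1[f] != o2[f]:
            ranks = VALUE_ORDER[f]
            return o1 if ranks.index(o1[f]) < ranks.index(o2[f]) else o2
    return None

a = {"who": "family", "where": "paris", "cost": "high", "flight_type": "nonstop"}
b = {"who": "family", "where": "orlando", "cost": "low", "flight_type": "multi-stop"}
print(lex_compare(a, b))  # b wins: same "who", but Orlando beats Paris on "where"
```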

Page 9: Approaches to Modeling and Learning User Preferences

Representation Tradeoffs

Each representation has some limitations
Additive utility functions can’t capture conditional preferences, and can’t easily represent “hard” constraints or preferences

CP-nets, in general, only give a partial ordering, can’t model integer/real features easily, and can’t capture tradeoffs

Lexicographic preferences can’t capture tradeoffs, and can’t represent conditional preferences

Page 10: Approaches to Modeling and Learning User Preferences

Learning Planning Preferences

Page 11: Approaches to Modeling and Learning User Preferences

Planning Algorithms

Domain-independent
Inputs: initial state, goal state, possible actions
Domain-independent but not efficient

Domain-specific
Works for only one domain
(Near-) optimal reasoning
Very fast

Domain-configurable
Use additional planning knowledge to customize the search automatically
Broadly applicable and efficient

Page 12: Approaches to Modeling and Learning User Preferences

Domain Knowledge for Planning

Provide search control information
Hierarchy of abstract actions (HTN operators)
Logical formulas (e.g., temporal logic)

Experts must provide planning knowledge
May not be readily available
Difficult to express knowledge declaratively

Page 13: Approaches to Modeling and Learning User Preferences

Learning Planning Knowledge

Alternative: Learn planning knowledge by observation (i.e., from example plans)

Possibly even learn from a single complex example
DARPA’s Integrated Learning Program

Our focus: Learn preferences at various decision points
CHARM: Charming Hybrid Adaptive Ranking Model

Currently: Learns preferences over variable bindings
Future: Learn goal and operator preferences

Page 14: Approaches to Modeling and Learning User Preferences

HTN: Hierarchical Task Network

Objectives are specified as high-level tasks to be accomplished

Methods describe how high-level tasks are decomposed down to primitive tasks

[Figure: HTN for travel(X,Y). The high-level task travel(X,Y) can be decomposed by a short-distance method into the primitive actions getTaxi(X), rideTaxi(X,Y), payDriver, or by a long-distance method into travel(X,Ax), buyTicket(Ax,Ay), fly(Ax,Ay), travel(Ay,Y). HTN operators map high-level tasks down to primitive actions.]
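As a rough illustration (not the authors' planner), the decomposition in the figure can be written as a table of methods mapping the high-level travel task to alternative subtask sequences.

```python
# A minimal sketch of HTN methods for the travel example: a high-level task
# maps to alternative decompositions into subtasks.

METHODS = {
    "travel(X,Y)": [
        # short-distance method: take a taxi
        ["getTaxi(X)", "rideTaxi(X,Y)", "payDriver"],
        # long-distance method: get to the airport, fly, then finish the trip
        ["travel(X,Ax)", "buyTicket(Ax,Ay)", "fly(Ax,Ay)", "travel(Ay,Y)"],
    ],
}

def decompose(task):
    """Alternative subtask sequences for a task; primitive actions (which have
    no methods) decompose to themselves."""
    return METHODS.get(task, [[task]])

for option in decompose("travel(X,Y)"):
    print(option)
```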

Page 15: Approaches to Modeling and Learning User Preferences

CHARM: Charming Hybrid Adaptive Ranking Model

Learns preferences in HTN methods
Which objects to choose when using a particular method? Which flight to take? Which airport to choose?
Which goal to select next during planning?
Which method to choose to achieve a task? By plane or by train?

Preferences are expressed as lexicographic orderings
A natural choice for many (not all) planning domains

Page 16: Approaches to Modeling and Learning User Preferences

Summary of CHARM

CHARM learns a preference rule for each method.
Given: an HTN, initial state, and the plan tree
Find: an ordering on variable values for each decision point (planning context)

CHARM has two modes:
Gather training data for each method (e.g., Orlando = (tropical, family-oriented, expensive) is preferred to Boise = (cold, outdoors-oriented, cheap))
Learn a preference rule in each method

Page 17: Approaches to Modeling and Learning User Preferences

Preference Rules

A preference rule is a function that returns <, =, or >, given two objects represented as vectors of attributes.

Assumption: Preference rules are lexicographic
For every attribute there is a preferred value
There is a total order on the attributes representing the order of importance

Example: A warm destination is preferred to a cold one. Among destinations of the same climate, an inexpensive one is better than an expensive one. …

Page 18: Approaches to Modeling and Learning User Preferences

Learning Lexicographic Preference Models

Existing algorithms return one of many models consistent with the data

The worst-case performance of such algorithms is worse than random selection
Higher probability of poor performance if there are fewer training observations

A novel democratic approach: Variable Voting
Sample the possible consistent models
Implicit sampling: models that satisfy certain properties are permitted to vote
Preference decision is based on the majority of votes

Page 19: Approaches to Modeling and Learning User Preferences

Variable Voting

Given a partial order, <, on the attributes and two objects, A and B:
D = { attributes that are different in A and B }
D* = { most salient attributes in D with respect to < }
The object with the largest number of preferred values for the attributes in D* is the preferred object

     X1  X2  X3  X4  X5
A     1   0   1   0   0
B     0   0   1   1   1
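A minimal sketch of the Variable Voting decision rule, applied to the A/B example above; the attribute ranks and the assumption that value 1 is preferred are illustrative, since the slide does not specify them.

```python
# Variable Voting sketch: decide between two objects using only the most
# salient attributes on which they differ. Ranks and preferred values below
# are assumptions (lower rank number = more salient; 1 assumed preferred).

RANK = {"X1": 1, "X2": 2, "X3": 2, "X4": 3, "X5": 3}
PREFERRED = {f: 1 for f in RANK}

def variable_vote(a, b):
    """Return 'A', 'B', or 'tie'."""
    diff = [f for f in RANK if a[f] != b[f]]             # D
    if not diff:
        return "tie"
    best_rank = min(RANK[f] for f in diff)
    salient = [f for f in diff if RANK[f] == best_rank]   # D*
    votes_a = sum(a[f] == PREFERRED[f] for f in salient)
    votes_b = sum(b[f] == PREFERRED[f] for f in salient)
    return "A" if votes_a > votes_b else "B" if votes_b > votes_a else "tie"

A = {"X1": 1, "X2": 0, "X3": 1, "X4": 0, "X5": 0}
B = {"X1": 0, "X2": 0, "X3": 1, "X4": 1, "X5": 1}
print(variable_vote(A, B))  # "A": X1 is the most salient differing attribute
```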

Page 20: Approaches to Modeling and Learning User Preferences

Learning Variable Ranks

Initially, all attributes are equally important
Loop until ranks converge:

Given two objects, predict a winner using the current beliefs

If the prediction was wrong, decrease the importance of the attribute values that led to the wrong prediction

The importance of an attribute never goes beyond its actual place in the order of attributes

Mistake-bound algorithm: learns from its mistakes
Mistake bound is O(n²), where n is the number of attributes
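The update loop can be sketched roughly as follows; this is an illustrative mistake-driven demotion rule in the spirit of the slide, not the exact published algorithm.

```python
# Illustrative rank-learning sketch: all attributes start at the top rank, and
# an attribute is demoted whenever it helped make a wrong prediction.

def update_ranks(rank, a, b, observed_winner, preferred):
    """Demote the salient attributes whose votes favored the losing object."""
    diff = [f for f in rank if a[f] != b[f]]
    if not diff:
        return rank
    best = min(rank[f] for f in diff)
    salient = [f for f in diff if rank[f] == best]
    loser = b if observed_winner is a else a
    for f in salient:
        if loser[f] == preferred[f]:       # this attribute voted for the loser
            rank[f] += 1                   # push it to a less important rank
    return rank

rank = {"X1": 1, "X2": 1, "X3": 1}         # everything starts equally important
preferred = {f: 1 for f in rank}
a = {"X1": 1, "X2": 0, "X3": 0}
b = {"X1": 0, "X2": 1, "X3": 1}
# Suppose the user actually prefers a; X2 and X3 voted for b, so demote them.
print(update_ranks(rank, a, b, a, preferred))  # {'X1': 1, 'X2': 2, 'X3': 2}
```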

Page 21: Approaches to Modeling and Learning User Preferences

Democracy vs. Autocracy

[Figure: results comparing Variable Voting with existing single-model algorithms]

Page 22: Approaches to Modeling and Learning User Preferences

Preferences Over Sets

Page 23: Approaches to Modeling and Learning User Preferences

Preferences over Sets

Subset selection applications: remote sensing, sports teams, music playlists, planning

Ranking, like a search engine? Doesn’t capture dependencies between items

Encode, apply, learn set-based preferences

[Figure: example image pairs illustrating complementarity and redundancy between items in a set]

Page 24: Approaches to Modeling and Learning User Preferences

User Preferences

Depth: utility function (desirable values)

Diversity: variety and coverage

Geologist: near + far views (context)

Example: prefer images with more rock than sky

[Figure: two rover images with feature breakdowns (Rock: 25%, Soil: 75%, Sky: 0% vs. Rock: 10%, Soil: 50%, Sky: 40%)]

Page 25: Approaches to Modeling and Learning User Preferences

Encoding User Preferences

DD-PREF: a language for expressing preferred depth and diversity, for sets

[Figure: per-feature utility functions for Sky, Soil, and Rock (depth), and example sets illustrating diversity]

Page 26: Approaches to Modeling and Learning User Preferences

Finding the Best Subset

Maximize the valuation of subset s under the subset preference: the valuation combines the depth of s (aggregated per-item utility) with the diversity value of s (per-feature diversity, computed as 1 - skew).
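Since the original formulas are not legible here, the following is only a schematic sketch of a valuation with this depth-plus-diversity structure; the skew measure, weights, and tradeoff parameters are assumptions, not the published DD-PREF definitions.

```python
# Schematic subset valuation: per feature, mix depth (average per-item
# utility) and diversity (1 - skew), then combine with feature weights.
# All specific formulas here are illustrative assumptions.

import numpy as np

def depth(values, utility_fn):
    """Average per-item utility of a feature's values across the subset."""
    return float(np.mean([utility_fn(v) for v in values]))

def diversity(values):
    """1 - skew: near 1.0 when values cover their range evenly, lower when bunched."""
    values = np.sort(np.asarray(values, dtype=float))
    span = values[-1] - values[0]
    if span == 0:
        return 0.0
    ideal = np.linspace(values[0], values[-1], len(values))
    skew = np.mean(np.abs(values - ideal)) / span
    return float(1.0 - skew)

def valuation(subset, weights, alphas, utility_fns):
    """Weighted per-feature mix of depth and diversity for the subset."""
    total = 0.0
    for f in weights:
        col = [item[f] for item in subset]
        total += weights[f] * (alphas[f] * depth(col, utility_fns[f])
                               + (1 - alphas[f]) * diversity(col))
    return total

subset = [{"rock": 0.25, "sky": 0.0}, {"rock": 0.10, "sky": 0.4}]
w = {"rock": 1.0, "sky": 0.5}
a = {"rock": 0.7, "sky": 0.7}
u = {"rock": lambda v: v, "sky": lambda v: 1 - v}   # prefer rock, avoid sky
print(valuation(subset, w, a, u))
```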

Page 27: Approaches to Modeling and Learning User Preferences

Learning Preferences from Examples

Hard for users to specify quantitative values (especially with more general quality functions)

Instead, adopt a machine learning approach

1. Users provide example sets with high valuation

2. System infers:

• Utility functions

• Desired diversity

• Feature weights

• Once trained, the system can select subsets of new data (blocks, images, songs, food)

Page 28: Approaches to Modeling and Learning User Preferences

Learning a Preference Model

Depth: utility functions

Probability density estimation: KDE (kernel density estimation) [Duda et al., 01]

Diversity: average of observed diversities

Feature weights: minimize difference between computed valuation and true valuation

BFGS bounded optimization [Gill et al., 81]

[Figure: learned utility functions over % Sky and % Rock]
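A minimal sketch of these learning steps under stated assumptions: SciPy's gaussian_kde stands in for the density-based depth utility, and L-BFGS-B (a bounded quasi-Newton method) fits feature weights to example valuations; the data are toy numbers, not from the experiments.

```python
# Sketch of the two learning steps on the slide, with toy data.

import numpy as np
from scipy.stats import gaussian_kde
from scipy.optimize import minimize

# 1. Depth utility: density estimate over feature values seen in example sets.
observed_rock = np.array([0.25, 0.30, 0.22, 0.28, 0.35])
rock_utility = gaussian_kde(observed_rock)           # utility(v) ~ density at v
print(rock_utility(0.27)[0])                          # high: near the observed mass

# 2. Feature weights: minimize squared error between computed and true valuations.
#    feature_scores[i, f] = combined depth/diversity score of example set i on feature f.
feature_scores = np.array([[0.9, 0.2], [0.7, 0.4], [0.3, 0.8]])
true_valuations = np.array([0.85, 0.70, 0.40])

def loss(w):
    return np.sum((feature_scores @ w - true_valuations) ** 2)

result = minimize(loss, x0=np.array([0.5, 0.5]), method="L-BFGS-B",
                  bounds=[(0.0, 1.0)] * 2)
print(result.x)                                       # learned feature weights
```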

Page 29: Approaches to Modeling and Learning User Preferences

Results: Blocks World

Compute valuation of sets chosen by true preference, learned preference, and random selection

As more training sets are available, performance increases (learned approximates true)

[Figure: valuations for the Mosaic and Tower block sets; the random-selection baseline is lower than the true and learned preferences]

Page 30: Approaches to Modeling and Learning User Preferences

Rover Image Experiments

Methodology
Six users: 2 geologists, 4 computer scientists
Five sets of 20 images each
Each user selects a subset of 5 images from each set

Evaluation
Learn preferences on (up to 4) example sets, select a new subset from a held-out set
Metrics:
Valuation of the selected subset
Functional similarity between learned preferences

Page 31: Approaches to Modeling and Learning User Preferences

Learned Preferences

Subset of 5 images, chosen by a geologist, from 20 total

Learned diversities:

Rock 0.8, Soil 0.9, Sky 0.5

Learned feature weights:

Rock 0.3, Soil 0.1, Sky 1.0

Learned utility functions:
[Figure: learned utility curves for Sky, Soil, and Rock]

Page 32: Approaches to Modeling and Learning User Preferences

Subset Selection

Subset of 5 images, chosen by a geologist, from 20 total

5 images chosen from 20 images, using greedy DD-Select and learned prefs

5 images chosen by the same geologist from the same 20 new images
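A rough sketch of greedy subset selection in the spirit of DD-Select (the published algorithm may differ): repeatedly add the item that most increases the current subset's valuation.

```python
# Greedy subset selection sketch: grow the subset one item at a time, always
# adding the candidate that most increases the subset's valuation.

def greedy_select(items, k, valuation):
    """Pick k items greedily to (approximately) maximize valuation(subset)."""
    chosen = []
    remaining = list(items)
    while len(chosen) < k and remaining:
        best = max(remaining, key=lambda x: valuation(chosen + [x]))
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Toy usage: prefer high "rock" values but penalize picking near-duplicates.
items = [0.10, 0.25, 0.27, 0.80, 0.82]
def toy_valuation(subset):
    depth = sum(subset)
    redundancy = sum(abs(a - b) < 0.05 for i, a in enumerate(subset)
                     for b in subset[i + 1:])
    return depth - redundancy

print(greedy_select(items, 3, toy_valuation))  # [0.82, 0.27, 0.1]: spread beats near-duplicates
```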

Page 33: Approaches to Modeling and Learning User Preferences

Future Directions

Page 34: Approaches to Modeling and Learning User Preferences

Future Directions

Hybrid preference representation

Decision tree with lexicographic orderings at the leaves

Permits conditional preferences
How to learn the “splits” in the tree?

Support operator, goal orderings for planning

Incorporate concept of set-based preferences into planning domains