Overcoming Temptation: Theory and Practice
Michael Mozer, Computer Science Dept. and Institute of Cognitive Science, University of Colorado Boulder
Adrian F. Ward, McCombs School of Business, University of Texas Austin
John Lynch, Leeds School of Business, University of Colorado Boulder
Brett Israelsen, Ian Smith, Computer Science, University of Colorado Boulder
Shruthi Sukumar, Electrical & Computer Engineering, University of Colorado Boulder
Shabnam Hakimi, Institute of Cognitive Science, University of Colorado Boulder
Retirement Planning Fail
Among US 55-64 year olds
62% have retirement assets
median savings for those who have assets: $42k
Pre-retirement defection in the US
For every $1 contributed to the accounts of savers under age 55, $0.40 simultaneously flows out of the 401(k)/IRA system, not counting loans (Argento, Bryant, & Sabelhaus, 2014)
National Institute on Retirement Security
Can Financial Education Change Behavior?
US Government and nonprofits spent $670M on financial education in 2013.
Financial education explains 0.1% of the variance in financial outcomes (Fernandes, Lynch, & Netemeyer, 2015)
[Figure: effect size (r²) of educational interventions by domain, social sciences vs. finance, against small/medium/large effect-size benchmarks; for financial education, r² = .0011]
Behavioral Control Problem
Agent acts in the world
Some actions can lead to immediate payoffs
e.g., buy a new car
Other actions can lead to delayed payoffs
e.g., increase contributions to retirement account
How do you incentivize people to stay focused on the long-term?
Other Domains
Dieting
Exercise
Cleaning house
Waiting for bus / elevator
Listening to a research talk
Delay Discounting Paradigm
A way to quantify preference for now vs. later rewards
Find point of subjective indifference
Yields hyperbolic discounting
Would you rather have $100 now
or $X in Y days?
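The indifference-point procedure can be sketched in a few lines, assuming the standard hyperbolic form V = A / (1 + kD); the discount parameter k below is purely illustrative, not a value from the talk.

```python
# Hyperbolic discounting sketch. The form V = A / (1 + k*D) is the
# standard one; the value of k used below is illustrative, not fitted.

def hyperbolic_value(amount, delay_days, k):
    """Subjective present value of `amount` received after `delay_days`."""
    return amount / (1.0 + k * delay_days)

def indifference_amount(now_amount, delay_days, k):
    """Delayed amount X subjectively equal to `now_amount` today:
    X / (1 + k*delay) = now_amount  =>  X = now_amount * (1 + k*delay)."""
    return now_amount * (1.0 + k * delay_days)

# A discounter with k = 0.05/day is indifferent between $100 now and
# $250 in 30 days:
print(indifference_amount(100, 30, k=0.05))  # 250.0
```

Sweeping Y and recording the indifference point X traces out the hyperbolic discount curve.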
Delayed Gratification Paradigm
Marshmallow Test (Mischel and Ebbesen, 1970)
Delay Discounting | Delayed Gratification
one-shot decision | continuous decision
reveals intrinsic future value | future value confounded with grit
Grit, Willpower, and Self Control
All refer to tendency to sustain interest and effort toward a goal
Grit
enduring personality trait
Willpower (= self control)
depends on grit but also varies as a function of mood, time of day, food and beverage intake, ego depletion
Formalization Of Delayed Gratification Task
Choice at every instant to
grab small reward ⟸ end
wait for later large reward ⟸ continue
Finite-state machine (FSM) representation
[Figure: FSM with states 1, 2, 3, 4, 5, …, τ; at each state the agent can end (small reward μe) or continue (reward μc) to the next state; reaching state τ yields the large reward κμe]
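The FSM can be sketched directly; the semantics below (end pays μe and terminates, continue pays μc and advances, reaching τ pays κμe) follow the diagram's labels, while the function and parameter names are our own.

```python
# A minimal sketch of the delayed-gratification FSM. Semantics assumed
# from the diagram: at each t < tau the agent Ends (small reward mu_e,
# episode over) or Continues (reward mu_c, advance one state); reaching
# state tau pays the large reward kappa * mu_e.

def run_episode(policy, tau, mu_e, mu_c, kappa):
    """policy(t) -> 'E' or 'C'; returns total undiscounted reward."""
    total = 0.0
    for t in range(1, tau):
        if policy(t) == 'E':
            return total + mu_e           # grab the small reward and stop
        total += mu_c                     # per-step reward for waiting
    return total + kappa * mu_e           # large delayed reward at tau

# An agent that always waits earns the large reward:
print(run_episode(lambda t: 'C', tau=10, mu_e=1.0, mu_c=0.0, kappa=2.0))  # 2.0
```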
What Is The Optimal Policy?
The optimal policy chooses the action at time t that maximizes cumulative summed reward
Or cumulative discounted reward
more discounting -> agent more likely to succumb to temptation
Form of discounting
Exponential vs. hyperbolic
Value Function = Policy
In state S1
Choose action A if V(S2) > V(S3)
Choose action B otherwise
[Figure: from state S1, action A leads to state S2 and action B leads to state S3]
Dynamic Programming
Efficient way of computing value function of optimal policy
The Delayed Gratification FSM has a particularly restricted structure, leading to only a few possible state sequences: E, CE, CCE, CCCE, CCCCE, …, CCCCCCCCCE (a run of continues terminated by a single end)
Dynamic Programming
Dynamic programming finds the value function that satisfies the Bellman equation
V(t) = max{μe, μc + γ V(t+1)}, with V(τ) = κμe
Depending on the discount rate γ, this yields a policy that either
ends at time 1
continues until time τ
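Backward induction over this chain takes one line per state. The Bellman form below, V(t) = max(μe, μc + γ·V(t+1)) with V(τ) = κμe, is our reconstruction from the slide's definitions, and the numeric parameters are illustrative.

```python
# Backward-induction sketch for the chain. Assumed Bellman recursion:
# V(t) = max(mu_e, mu_c + gamma * V(t+1)), with boundary V(tau) = kappa*mu_e.

def value_function(tau, mu_e, mu_c, kappa, gamma):
    V = [0.0] * (tau + 1)                 # V[0] unused; states are 1..tau
    V[tau] = kappa * mu_e                 # large delayed reward
    for t in range(tau - 1, 0, -1):
        V[t] = max(mu_e, mu_c + gamma * V[t + 1])
    return V

# Little discounting -> the agent continues until tau:
print(value_function(10, 1.0, 0.0, 2.0, gamma=0.99)[1] > 1.0)   # True
# Heavy discounting -> ending at time 1 is already optimal:
print(value_function(10, 1.0, 0.0, 2.0, gamma=0.50)[1] == 1.0)  # True
```

The policy falls out of the comparison inside the max, which is why, as the slide says, the deterministic agent either ends immediately or waits out the full delay.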
Modeling Human Behavior
DP is not a good model of human behavior
People may wait a while and then succumb to temptation
If you test the same person in the same situation, they may not behave identically each time
What do we need to better model people?
Willpower!
W(t) ~ Gaussian(0, σ²)
σ²: grit parameter (smaller σ ⇒ more grit)
[Figure: same FSM as before, but the reward for ending at time t is perturbed by willpower: μe − w(t)]
Willpower Model
State consists of
t: current time
w: agent’s current willpower level
Agent plans optimally given state {t,w}
Takes a deterministic action
However, behavior varies each time the task is performed due to fluctuations in w
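The run-to-run variability can be illustrated with a simulation. One simplifying assumption is ours, not the talk's: the agent defects at the first t where the willpower-perturbed end reward μe − w(t) beats a value of continuing computed from the deterministic always-continue recursion, rather than from the talk's expectation over future willpower.

```python
# A simulation sketch of the willpower model. Defection rule (our
# simplification): defect at the first t where mu_e - w(t) exceeds the
# deterministic always-continue value V(t) = mu_c + gamma * V(t+1).
import random

def simulate(tau, mu_e, mu_c, kappa, gamma, sigma, rng):
    """Return the time of defection, or tau if the agent resists."""
    V = [0.0] * (tau + 1)
    V[tau] = kappa * mu_e
    for t in range(tau - 1, 0, -1):
        V[t] = mu_c + gamma * V[t + 1]    # value of continuing from t
    for t in range(1, tau):
        w = rng.gauss(0.0, sigma)         # willpower fluctuation (sigma ~ grit)
        if mu_e - w > V[t]:
            return t                      # succumbs to temptation
    return tau                            # waits out the full delay

# With sigma = 0 behavior is deterministic; with sigma > 0 the same
# agent defects at different times on different runs:
rng = random.Random(1)
print([simulate(10, 1.0, 0.0, 2.0, 0.99, 0.25, rng) for _ in range(5)])
```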
Dynamic Programming With State Uncertainty
Agent can only partially predict future states
Value function is based on expectation over this uncertainty
Expectation Has A (Mostly) Intuitive Form
measure of temptation
Theory Predicts Agent’s Temptation Resistance
Two Limitations on Human Behavior
Stochastic fluctuations in willpower
parameter σ
Exponential discounting
parameter γ
Agent is optimal subject to these constraints
Canonical notion of grit: Small σ + large γ
Simulation
Finish line effect
Low grit moderates effect of discount rate
10 time steps (τ = 10); delayed reward is 2× the immediate reward (κ = 2)
Temptation Resistance as a Function of γ and σ
[Figure: temptation resistance, ranging from high to low, across values of γ and σ]
What Magnitude Delayed Reward Leads To Temptation Resistance?
Given a wait time for the delayed reward, what relative magnitude does the reward have to be in order for there to be a 50% chance the agent will wait for it?
Effective discount rate is exponential, as reflected by the log-linear scaling of the delayed reward
Although γ determines the discount rate, σ determines a time-invariant multiplicative factor
[Figure: delayed-reward magnitude required for 50% temptation resistance vs. wait time, shown for γ = 0.89 and γ = 0.95, low grit vs. high grit]
Prize-Linked Savings Accounts:Incentivizing A Long-Term Focus
“For every $100 you put in your retirement account, we’ll give you one ticket for a lottery for a $10000 prize.”
Potential of an immediate reward for focusing on long-term goal
One-size-fits-all solution
Maybe different individuals would benefit from different reward structures: frequent small rewards vs. infrequent large rewards
Simulating Prize-Linked Savings Account
Borrow η reward units from delayed reward as incentive
At each time t, hold a lottery for reward ω(t) obtained with probability ρ(t)
Risk
Our reward-maximizing framework is risk neutral.
lottery(ρ,ω) is equivalent to lottery(ρ’,ω’) if ρω = ρ’ω’
Risk seeking vs. risk averse behavior
Prospect theory (Tversky & Kahneman, 1979)
When gains are being considered, people underestimate high probabilities and overestimate low probabilities
Risk-sensitive RL (Shen, Tobia, Sommer, & Obermayer, 2013)
replace ρ with a subjective probability
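One concrete way to implement the subjective-probability replacement: the one-parameter Tversky-Kahneman (1992) weighting function below is an illustrative choice, with an illustrative α; the slide does not commit to a specific functional form.

```python
# Probability-weighting sketch: replace objective rho by a subjective
# decision weight. The one-parameter Tversky-Kahneman (1992) form and
# the alpha value are illustrative; the slide does not specify them.

def subjective_p(p, alpha=0.61):
    """Decision weight for objective probability p (0 <= p <= 1)."""
    return p**alpha / (p**alpha + (1 - p)**alpha) ** (1 / alpha)

# Low probabilities are overweighted and high probabilities
# underweighted, which is why a rare large lottery prize can punch
# above its expected value:
print(subjective_p(0.01) > 0.01)   # True
print(subjective_p(0.99) < 0.99)   # True
```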
Incorporating Lottery Into Model
Assumes lottery at every time step (TBD)
[Figure: FSM with lottery: the large delayed reward is reduced to κμe − η, and continuing at time t yields a time-varying reward μc(t) (lottery payout ω(t) with probability ρ(t)); end rewards remain μe − w(t)]
Optimization Problem
Given an agent with discount rate γ and grit σ, what is the lottery
L = {ρ(t), ω(t): t = 1 …τ}
that maximizes agent’s temptation resistance?
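A Monte Carlo sketch of this search, under assumptions that are ours rather than the talk's: a uniform lottery (constant ρ, ω = η/ρ), a crude defection rule (defect when the perturbed end reward exceeds an always-continue value), and a Tversky-Kahneman probability weighting so that ρ matters at all (under pure risk neutrality, lotteries with equal ρω are equivalent). All numbers are illustrative.

```python
# Grid-search sketch for the lottery design. Modeling choices here are
# illustrative: uniform lottery (constant rho, omega = eta/rho),
# Tversky-Kahneman probability weighting, and a simplified defection
# rule comparing the perturbed end reward to an always-continue value.
import random

def subjective_p(p, alpha=0.61):
    return p**alpha / (p**alpha + (1 - p)**alpha) ** (1 / alpha)

def resistance(rho, eta, tau, mu_e, kappa, gamma, sigma, n=500, seed=0):
    """Fraction of simulated episodes in which the agent waits until tau."""
    rng = random.Random(seed)
    step = subjective_p(rho) * (eta / rho)     # subjective per-step lottery value
    V = [0.0] * (tau + 1)
    V[tau] = kappa * mu_e - eta                # delayed reward net of borrowed eta
    for t in range(tau - 1, 0, -1):
        V[t] = step + gamma * V[t + 1]
    survived = 0
    for _ in range(n):
        if all(mu_e - rng.gauss(0.0, sigma) <= V[t] for t in range(1, tau)):
            survived += 1
    return survived / n

# Search a small grid of lottery probabilities; overweighting of small
# probabilities favors rare, large prizes for this agent:
grid = [0.01, 0.05, 0.2, 1.0]
best = max(grid, key=lambda r: resistance(r, eta=0.4, tau=10, mu_e=1.0,
                                          kappa=2.0, gamma=0.8, sigma=0.3))
print(best)  # 0.01
```

With probability weighting removed (subjective_p = identity), every ρ on the grid yields the same expected step value, and the search degenerates, matching the risk-neutrality point above.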
[Figure: optimal lotteries for varying discount rate γ = 0.950, 0.942, 0.932, 0.921, 0.907, 0.892, 0.874, 0.853, 0.829, 0.800, with other parameters fixed: σ = .10, η = .40, ρ = .01, κ = 2]
Interesting Ideas
We can analyze delayed-gratification tasks as an MDP
Grit is helpful if agent does not heavily discount the future; but it can be harmful if the agent does.
Behavioral noise can improve performance
Optimal incentive structures depend on an agent’s discount parameter and grit
Experimental Explorations of the Model
1. Develop a laboratory task for adults that
involves choice between smaller-sooner and larger-later rewards
requires continual decision making
induces impulsive behavior
2. Demonstrate that model accounts for human behavior
3. Use the model to optimize human behavior
i.e., resist temptation
Experiment
Demo
Reward per unit time
short: 1.0 points
long: 1.5 points
Experiment
Four minute duration
Mechanical Turk participants
Up to 25% bonus payment depending on score
25 participants in control condition
Accumulated Points Over Time
Defection To Short Line
Model parameters: γ = 0.84, σ = 0.25
Two Versions Of Model
Willpower at successive moments is independent
Willpower follows a random walk
Current Directions
Now that we have a model that fits our population, can we determine incentive structure that boosts likelihood of waiting in long line?
Current Directions
Fit parameters of model to an individual’s data
Correlate model parameters with standard assessments like the delay discounting paradigm.
Extend theory to handle
uncertainty in the arrival time of the delayed reward (e.g., marshmallow task)
non-terminal temptations (e.g., Starbucks)
compounded interest (e.g., retirement savings)
human learning from experience (e.g., recency effects)
Thank You!
The game seems to be more interesting than we intended.
The original intention was to simulate a series of independent episodes, but episodes are interdependent due to variations in line length from one episode to the next.
Retirement Planning Fail
Among US 55-64 year olds
62% have retirement assets
median savings for those who have assets: $42k
Among Canadian 55-64 year olds
81% have retirement assets (RRSP or EEP)
median savings for those who have assets: $245k
24% contributed to RRSP in 2011
Pre-retirement defection in the US
For every $1 contributed to the accounts of savers under age 55, $0.40 simultaneously flows out of the 401(k)/IRA system, not counting loans (Argento, Bryant, & Sabelhaus, 2014)
statcan.gc.ca
cbc.ca
National Institute on Retirement Security
Experiment
short vs. long: 1.0 vs. 1.5 points per time step
control condition
bonus condition
Hazard Function For Each Line Length
Defection increases with line length
Finish line effect
Seems like fewer defections in bonus condition
Issue With The Game
Intention was to simulate a series of independent episodes but they are interdependent because
time limitation
information provided about next episode’s line length
Optimization Problem
Formalization of delayed gratification task
μc: reward for continuing
μe: reward for ending early
κ: relative magnitude of delayed reward
τ: wait time for delayed reward
η: expected lottery payout, Σρ(t)ω(t) = η
Given an agent with discount rate γ and grit σ, what is the lottery L = {ρ(t), ω(t): t = 1 …τ} that maximizes agent’s temptation resistance?
Simplified
ρ(i) = ρ(j) = ρ; ω(i) = ω(j) = η/ρ
Varying η and ρ
γ = .92, σ = .10