ADAPTIVE HEDONIC UTILITY1
Arthur J. Robson and Lorne A. Whitehead
April, 2018
ABSTRACT
Recent research in neuroscience provides a foundation for a cardinal
utility function that is both hedonic and adaptive. We model such ad-
aptation as arising from a limited capacity to make fine distinctions,
where the utility functions adapt in real time. For minimizing the
probability of error, an optimal mechanism is particularly simple. For
maximizing expected fitness, a still simple mechanism is approxim-
ately optimal. The model predicts the hedonic treadmill and a variety
of behavioral anomalies related to “behavioral contrast”, including a
preference for rising consumption streams.
Keywords: Utility, hedonic, adaptation, choice, neuroscience, bounded
rationality. JEL Codes: A12, D11.
Arthur J. Robson, Department of Economics, Simon Fraser University,
Burnaby BC Canada V5A 1S6. Email: [email protected].
Lorne Whitehead, Department of Physics and Astronomy, University of
British Columbia, Vancouver BC Canada. Email: [email protected].
1We thank Paul Glimcher, Kenway Louie, Larry Samuelson, Michael Shadlen, Philippe To-
bler, Ryan Webb and Michael Woodford for helpful discussions. We also thank audiences at
the workshop “Biological Basis of Preferences and Strategic Behavior” at SFU, at the conference
“Economics and Biology of Contests” at the Queensland University of Technology, at the Warwick
Economic Theory Workshop, at the World Congress of the Game Theory Society in Maastricht,
at the Society for Neuroeconomics Conference in Berlin, at the Canada Series Seminars at the
Weatherhead Center at Harvard, at the Workshop on Information Processing and Behavioral
Variability at Columbia University, and at seminars at the University of Queensland, the Univer-
sity of Melbourne, SFU, SFU Human Evolutionary Studies Program, HESP, Stanford University,
Columbia University, Chapman University, and Penn State University. Robson thanks HESP, the
Canada Research Chairs Program and the Social Sciences and Humanities Research Council of
Canada for financial support.
1. Introduction
Jeremy Bentham is famous for, among other things, the dictum “the
greatest happiness for the greatest number”, which, as Paul Samuel-
son was fond of observing, involved one too many “greatests” to be
operationally meaningful. The happiness that Bentham described was
cardinal, and so capable of being summed across individuals to obtain
a basic welfare criterion. Conventional welfare economics, even allow-
ing for non-additivity, remains needful of some degree of cardinality.
However, in the context of individual decisions, and of consumer theory
in particular, economics has completely repudiated any need for
cardinality, on the basis of "Occam's Razor," a theme that culminates
in the theory of revealed preference.
On the other hand, there is persuasive neurological evidence that eco-
nomic decisions are actually orchestrated within the brain by a mech-
anism that relies on hedonic signals, which signals are measurable and
therefore cardinal. In particular, there is evidence that economic de-
cisions are mediated by means of neurons that produce dopamine, a
neurotransmitter that is associated with pleasure.2
Although there is no logical need for the utility used in consumer theory
to be hedonic and cardinal, it is so, as a brute fact.
Further, there is neurological evidence that this hedonic utility is ad-
aptive, so that dopamine-producing neurons adapt rapidly to changes
in the distribution of physical rewards.3 Thus, if the variance of re-
wards increases, the sensitivity of such a neuron to a given increase in
reward is reduced.
A primary motive of the present paper is then to harmonize the neur-
ological view of hedonic utility with economics. We develop a model of
hedonic adaptive utility that draws on neuroscience. The model is not
fundamentally at odds with conventional economic theory in that the
2 See, for example, a key paper for the present purpose, Stauffer, Lak and Schultz (2014).
3 See Tobler, Fiorillo and Schultz (2005), for example.
only reason for a divergence from economics is the inability to make
arbitrarily fine distinctions.
The key paper by Stauffer, Lak and Schultz (2014) is described in
the next section. It provides evidence linking increments in von Neu-
mann Morgenstern utility, in a cardinal sense, to the activity of the
dopamine neurons. These neurons evaluate economic options, in an
adaptive fashion. Our model also concerns how these evaluations feed
into a decision rule that is noisy. The rule involves “just noticeable
differences”—JND’s—in the activity of dopamine neurons.4 Adapta-
tion involves shifting the thresholds at which a just noticeable jump in
dopamine neuron activity occurs.5
It pays to shift the capacity to discriminate to the thick of the action,
so hedonic utility needs to adapt, and it needs to adapt rapidly. We
show that simple neural adjustment mechanisms exist by exhibiting a
particular simple automatic mechanism.6
The present paper then presents a mechanism that generates rapid
adaptation to an entirely novel distribution. When the objective is to
minimize the probability of error, a particularly simple rule of thumb
yields optimal adaptation for an arbitrary number of thresholds.7 When
the objective is the more plausible one of maximizing the expected
outcome chosen, a different rule of thumb yields adaptation that is
approximately optimal for a large number of thresholds.
A common feature of all previous models is that the time frame for
adaptation is undefined.8 That is, adaptation to the distribution is
shown to be optimal, but it is left open how such adaptation occurs. It
4 Matlin (1988) is a textbook account of JND's.
5 This formulation was used by Laughlin (1981) to introduce the efficient coding hypothesis to capture maximal informational transfer by neurons.
6 We abstract from the interesting but challenging question of how conscious inputs influence automatic processing.
7 This rule of thumb generates efficient coding, as in Laughlin (1981). His criterion is the formal one of informational transfer; ours is to minimize the probability of error in a concrete binary choice problem.
8 This observation applies to Robson (2001), Rayo and Becker (2007), and Netzer (2009), for example.
is crucial for most realistic applications that the time frame over which
adaptation occurs be short. We address this issue here.
We demonstrate the empirical power of this approach with an
application to the so-called hedonic treadmill.9 The adaptation of
utility here predicts that the answers to surveys concerning life
satisfaction will revert to previous levels subsequent to a shift in
circumstances.10
There are also behavioral implications of the model that derive from the
theory of "behavioral contrast."11 Individuals bid higher in auctions
if they were previously exposed to auctions with a low range of values.12
The model also suggests an explanation for the puzzling preference of
individuals for rising consumption streams.
The following effective but annoying sales strategy offers an informal
example of these behavioral implications of the model. When you are
buying a car, the salesman suggests that you need various more-or-
less-worthless add-ons, undercoating for example, that cost perhaps
hundreds of dollars. The salesman is relying on the hundreds of dollars
seeming insignificant relative to the thousands that the car costs. The
effectiveness of this sales technique is consistent with the shift in utility
that larger stakes induce in the present model.
2. Background from Neuroscience
A remarkable paper that grounds the current work in neuroscientific
fact is Stauffer, Lak, and Schultz (2014).13 They argue that von Neu-
mann Morgenstern utility is realized in the brain, in an hedonic fashion,
by the activity of dopamine-producing neurons. These neurons number
about a million in humans and are located in the midbrain, between
the ears and behind the mouth. Dopamine is a neurotransmitter, a
9 See Schkade and Kahneman (1998), as a key example.
10 The model does not corroborate, however, the position of Schkade and Kahneman (1998) that such a shift is evidence of a mistake.
11 See Crespi (1942).
12 See Vestergaard and Schultz (2015), and Khaw, Glimcher and Louie (2017), for example.
13 See Schultz (2016) for a less formal treatment of the issues.
chemical that relays a signal from one neuron to the next, and it has
a number of functions in the brain, a key one of which is to generate
hedonic motivation. These dopamine producing neurons have forward
connections—“projections”—to all of the sites in the brain that are
known to implement decisions.
Most basically, perhaps, a burst of activity of the dopamine neurons
is associated with the arrival of an unanticipated physical reward.14
Furthermore, a larger reward generates a greater intensity of the burst
of activity in the neuron (as measured by the number of impulses per
second).
One of the most firmly established results in the neuroscience literature
that bears on decisions is the “reward prediction error”, which is as
follows.15 Suppose the individual is trained to anticipate a particular
reward by a cue, perhaps a unique visual signal. The dopamine neurons
then shift much of their firing activity back in time from the actual
reward to the cue. If the size of the reward is as expected, there is
indeed no further response by the neuron. If the reward is larger than
expected, however, there is a supplementary burst of activity upon the
receipt of the reward, which is larger the larger the discrepancy on the
upside; if the reward is smaller than expected, the firing rate of the
neuron is reduced below the base rate.16
Stauffer, Lak, and Schultz argue that von Neumann Morgenstern util-
ity can be related rather convincingly to this reward prediction error,
14 Many of these experiments are done on monkeys, and involve implanted electrodes reading the activity of individual dopamine neurons. The rewards that the monkeys obtain are typically food or fruit juice.
15 Caplin and Dean (2008) present a model of this phenomenon, and sketch several applications to economics.
16 The dopamine neuron system apparently represents unexpected upside shifts in rewards more accurately than unexpected downside shifts. Different systems, sometimes involving another neurotransmitter, serotonin, may help generate appropriate responses to downside surprises. Rogers (2011) reviews experimental evidence involving drugs concerning the roles of dopamine and serotonin. See also Weller, Levin, Shiv, and Bechara (2007) for evidence that the neural systems that deal with gains and losses may be partially dissociated.
by proceeding as follows. First they estimate von Neumann Morgen-
stern utility in a revealed preference manner, by deriving the certainty
equivalents for a variety of binary gambles that are presented to the
monkeys. That is, this step makes no use of neural data. The von
Neumann Morgenstern utility is convex at low levels of juice reward
but concave at higher levels, so monkeys adhere to this property of
prospect theory.
Next, Stauffer, Lak, and Schultz consider the response by dopam-
ine neurons to several binary gambles, where the absolute difference
between the high and the low reward is held constant. Each gamble
is signalled by an associated cue. Neural activity then occurs with the
arrival of the cue, but there is additional activity if the higher reward
from the binary gamble is obtained. The extra neural activity is low
for gambles involving low rewards and for those involving high rewards,
but is high for gambles involving intermediate rewards. This additional
dopamine neuron activity is then in close cardinal agreement with the
incremental (“marginal”) utility estimated from revealed preference.17
Finally, Stauffer, Lak and Schultz establish that the dopamine neuron
responses that anticipate a binary gamble reflect the expected utility of
the gamble. They consider a pair of gambles with a common mean,
but where one of these gambles has a higher expected von Neumann
Morgenstern utility. A cue announcing the first of these gambles then
generates higher dopamine neuron firing rates than does one announ-
cing the second.
Further enhancing the neuroscientific foundation of utility, Lak, Stauffer,
and Schultz (2014) show that the relationship between dopamine neuron
firing rates and choice extends to multidimensional preferences. First
they examine monkeys' preferences over multidimensional rewards from
a purely behavioral point of view. They show that such preferences exist,
in a sense that allows for noise. Monkeys chose between rewards
that were either quantitatively safe or risky. In addition, there could
17See their Figure 3, in particular.
be risk concerning the type of juice. They derive a value for each
gamble in terms of a certainty equivalent of a third type of juice, and
show that these values predict the choices made between the gambles.
Next, they tie these behaviorally constructed reward valuations to
dopamine neuron firing rates. They do this by looking at each reward
separately, rather than in a choice context, hypothesizing, that is, that
this context makes no difference. The behavioral choice preferences
reflect these separately observed dopamine neuron firing rates.
Tobler, Fiorillo, and Schultz (2005) help complete the background
needed. The important new twist arises from conditioning the mon-
keys to expect one of two 50-50 gambles, where these two gambles arise
in random order. One gamble pairs a low volume of juice with a me-
dium volume, the other pairs the same medium volume with a high
volume. In each gamble, the higher volume generates an increase in
neural activity, the lower volume a decrease. The identical medium
volume generates greater activity when paired with the low volume than
when paired with the high volume.18
Baddeley, Tobler, and Schultz (2016) further investigate adaptation
in humans, finding partial rather than complete adaptation. The
rationale they advance for its desirability is that unlikely
signals still need to generate appropriate reactions.
The process by which rewards are encoded is then relatively well un-
derstood. Indeed, so are some of the precise ways in which choice is
implemented.19 Less well understood is how the encoded rewards are
18 Adaptation is a pervasive property of neurons. Baccus and Meister (2002), for example, consider the adaptive properties of visual neurons. The full adaptation of dopamine neurons that we hypothesize here is analogous to that under the efficient coding hypothesis of Laughlin (1981), who illustrates the hypothesis with data for visual neurons.
19 Shadlen and Shohamy (2016), for example, discuss how sequential sampling drives the choice of a physical action. A random walk arises in the premotor neurons of a monkey seeking a reward for predicting the preponderant drift of a moving pattern of dots. (A premotor neuron, as the name suggests, is immediately upstream from the motor neurons that cause the monkey to press one button or another, for example.) The random walk is driven up or down by accumulating evidence favoring one or the other of two choices. When the random walk hits an endogenously determined barrier, the corresponding choice is implemented. These models work remarkably well,
compared prior to implementing a decision.20 However, there are
tantalizing hints that the comparison of value is the comparison of
dopamine neuron activity.21
3. The Model
The neuroscientific evidence discussed in the previous section suggests
the following broad hypotheses. These hypotheses are based on the
evidence but also sharpen it.
1) Consumption of an unanticipated deterministic amount of a good
generates dopamine neuron firing rates that correlate with the quantity
of the good. Further, a cue that predicts the imminent arrival of such
an outcome or, more generally, of a gamble, generates dopamine neuron
firing rates that linearly reflect expected utility. That is, a choice made
between several such outcomes or gambles is consistent with maximizing
the expected utility based on these firing rates.22
2) Since neurons reflecting consumption (or reflecting a cue predict-
ing a gamble) produce dopamine, a supplementary hypothesis is that
decisions are made on the basis of hedonic utility.23
3) The encoded values of gambles adapt to the statistical circum-
stances. This adaptation is not always complete in reality, but we
with convincing details accounted for, but the issues they raise concerning motor implementation of a choice are not central here.
20 Possibly, for example, there is further processing of value prior to comparison. It could even be that the process of comparing values proceeds in parallel to the encoding of values. See Hare, Schultz, Camerer, O'Doherty, and Rangel (2001), for example. Indeed, the hedonic interpretation is not crucial to the validity of the model here, which could reflect processing by neurons that do not produce dopamine. Nevertheless, the present formulation seems parsimonious in the light of the empirical findings. That is, value is represented hedonically by dopamine neurons and this is mirrored in comparison and decision.
21 Jocham, Klein, and Ullsperger (2011), for example, administered a D2-selective antagonist, amisulpride, to humans engaging first in learning values and then in exploiting these. (D2 is a particular type of dopamine receptor.) Amisulpride did not affect reinforcement learning but enhanced some subsequent choices made on the basis of the learned values.
22 However, the neurons that encode the value of each hypothetical option in the context of a particular choice do not seem to be dopamine producing. See Schultz (2015, p. 920).
23 This is not strictly necessary. For example, if the model captured the activity of non-dopamine-producing decision neurons, it could remain otherwise valid.
consider the limiting case of complete adaptation for simplicity and
to see most clearly the novel consequences of adaptation. It is hypo-
thesized that the ability to make fine perceptual distinctions between
outcomes is limited, and that it is this that drives adaptation.
The above suggests the following model. Suppose there are two pro-
spective rewards whose actual fitness values are yi ∈ [0, 1] for i = 1, 2.24
After processing, the prospect of each reward is encoded as a neural
firing rate given by h(yi) for i = 1, 2, where h : [0, 1] → [0, 1].25 This
formulation of h allows for noise in these neuron firing rates.26
Given the neural firing rates h(yi), a noisy choice is then made, where
J(h(y1) − h(y2)), say, is the probability of choosing y1. It is assumed
that
J(0) = 1/2; J(h)→ 1, as h→ 1,
and J(h)→ 0, as h→ −1.27
We wish to obtain a simple tractable form of this model that retains
the following desirable properties—
i) Choice is noisy, since this is one of the most salient and familiar
features of actual choice.28
ii) There is a simple explicit parametric formulation of adaptation,
which arises from a limited ability to make fine distinctions. This
24 The approach can readily be generalized to allow the outcomes to be food, or something else that is monotonically related to fitness.
25 The restriction of rewards to [0, 1] is without loss of generality, given only that there are some bounds on rewards. The restriction of neural activity to [0, 1] is similarly mathematically harmless, given bounds on neural activity. The existence of some such bounds on neural activity is empirically clear and relevant. Such bounds imply that error in choice cannot be eliminated by exploiting extreme neural activity levels.
26 All neural activity is noisy. See, for example, Tolhurst, Movshon, and Dean (1983), who investigate noise in single visual neurons. They argue that behavior is, however, less noisy since it may be driven by integrating signals from a (small) number of such neurons. See also Renart and Machens (2014) for a more recent survey of neuron noise and its effect on behavior.
27 A more general formulation would allow J to depend on h1 and h2 as independent arguments, which would allow the absolute level of the arguments to matter.
28 Mosteller and Nogee (1951), for example, were forced to allow noisy choice when evaluating expected utility theory in the laboratory. Indeed, they describe this noise with a function akin to J.
ability can be sharpened, at an implicit cost, so that, in the limiting
extreme, choice becomes perfect.
These goals can be attained without relying on multiple sources of
noise. Consider the matched pair of functions h and J as follows. For
each integer N take δ = 1/N and assume—
A. The deterministic function h : [0, 1] → {0, δ, 2δ, ..., (N − 1)δ, 1} is
nondecreasing. It is specified by the thresholds xn, for n = 1, ..., N, at
which h jumps from (n − 1)δ to nδ, where we set x0 = 0 and xN+1 = 1.
Such a step function can approximate an arbitrary continuous function
if δ is small.
B. The function J is given by
J(h1 − h2) = 1/2, for |h1 − h2| < δ,
J(h1 − h2) = 0, for h1 − h2 ≤ −δ, J(h1 − h2) = 1, for h1 − h2 ≥ δ.
That is, the functions h and J are described in terms of the same
parameter, δ > 0, interpreted as a “just noticeable difference”—JND.
A difference in firing rate of δ can be fully distinguished, but smaller
differences cannot.29
Choice represented in this way remains noisy. That is, if it is repeatedly
true that h(y1) = h(y2) then sometimes y1 is chosen and sometimes
y2. Adaptation is captured by the function h; in particular by the
parameters x1 < x2 < ... < xN . These parameters can advantageously
adapt to the environment because choice is made imperfectly, given the
JND δ > 0. As δ → 0, however, y1 will be chosen if and only if y1 > y2.
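The matched pair h and J can be made concrete in a few lines. The following Python sketch is ours, not code from the paper; the function names and the particular thresholds are illustrative only.

```python
import random

def make_h(thresholds):
    """Step function h: the fraction of the N thresholds that the
    reward y surpasses, so h takes values in {0, 1/N, ..., 1}."""
    N = len(thresholds)
    return lambda y: sum(1 for x in thresholds if y > x) / N

def choose(y1, y2, h, delta):
    """JND choice rule J: if the firing rates differ by at least the
    JND delta, the larger is chosen for sure; otherwise the choice is
    an even coin flip (J = 1/2)."""
    d = h(y1) - h(y2)
    if d >= delta:
        return y1
    if d <= -delta:
        return y2
    return random.choice([y1, y2])

# Example with N = 4 thresholds and delta = 1/4.
h = make_h([0.2, 0.4, 0.6, 0.8])
# 0.5 and 0.55 share an interval, so the choice between them is noisy;
# 0.5 and 0.85 are separated by a threshold and are ranked without error.
assert choose(0.5, 0.85, h, 0.25) == 0.85
```

As the text notes, shrinking δ (adding thresholds) makes the noisy region vanish, so the larger reward is eventually chosen for sure.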
The foregoing motivates the following choice problem.30 The individual
must choose one of two outcomes, i = 1, 2. These are realizations
29 That choice is driven by differences and that valuations attached to thresholds are fixed leads to full adaptation. One way in which absolute values could be incorporated would be to have the valuations attached to each interval update using estimated absolute expected fitness from that interval. Absolute values would then also matter. Such an updating procedure entails collecting more information and might be slow. The present model plausibly then captures the short run effects.
30 This is now as in Robson (2001).
yi ∈ [0, 1], for i = 1, 2, that were drawn independently from the cu-
mulative distribution function, F . This has a continuous probability
density function, f > 0 on (0, 1). The cdf F represents the background
distribution of rewards to which the individual is accustomed.
As implied by the construction of h and the JND formulation of J , the
only precise information that the individual has prior to choosing one
of the arms is the interval [xn, xn+1] that contains each realization. If
the two realizations belong to different intervals, the gamble lying in
the interval further to the right is clearly better; if the two realizations
lie in the same interval, choice is noisy with an error being made with
probability 1/2.
We interpret the number of thresholds that an outcome surpasses as
utility, so that Un = n/N, for n = 0, ..., N , consistent with the evid-
ence in Stauffer, Lak, and Schultz (2014). However, for the essentially
deterministic choices considered here, we could assign any utility levels
Un ∈ [0, 1] to the outcomes lying in the interval [xn, xn+1), for
n = 0, ..., N, such that 0 = U0 < U1 < ... < UN = 1.31
What are the optimal 0 ≤ x1 ≤ .... ≤ xN ≤ 1? Robson (2001) shows
that the thresholds that minimize the probability of error are equally
spaced in terms of probability. If N = 1, for example, the threshold
should be at the median of F . At the other extreme, when N →∞, it
follows that the limiting density of thresholds matches the pdf, f , and
that U(y) = F (y), where U(y) is the utility assigned to y in this limit.
This result is in striking agreement with the efficient coding hypothesis
proposed by Laughlin (1981). He considers a function precisely analog-
ous to h, where a continuous input intensity is mapped onto a finite set
of “responses”, spaced apart by the just noticeable difference. Laughlin
31 Robson (2018) investigates the implications of the model for choice between gambles. This entails setting utility Un = n/N, for n = 0, ..., N, in a cardinal sense. The model becomes more complex in that it considers choice between gambles rather than deterministic outcomes; it does not model the adaptation of the thresholds, on the other hand, but presumes they are set optimally. Robson (2018) provides a basis for key aspects of Kahneman-Tversky (1979) and a resolution of the puzzle raised by Rabin (2000).
then argues that the response function of a neuron to a single y should
match the cumulative distribution function in order to maximize the
information content of the neural responses. (See Louie and Glimcher,
2012, for a recent review of this efficient coding hypothesis.) We replace
the abstract notion of information transfer with a more concrete binary
choice problem but arrive at precisely the same conclusion.32 However,
this agreement only holds for the probability of error criterion.
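The N = 1 case of the probability-of-error criterion can be verified directly. An error (made with probability 1/2) occurs only when both independent draws fall on the same side of the threshold x, so the error probability is (1/2)[F(x)^2 + (1 − F(x))^2], which is minimized where F(x) = 1/2, the median. A quick numerical check (our sketch, not code from the paper):

```python
# Error probability with a single threshold, as a function of F(x):
# an error (probability 1/2) occurs only when both draws fall in the
# same interval, i.e. both below or both above the threshold.
def error_prob(Fx):
    return 0.5 * (Fx ** 2 + (1 - Fx) ** 2)

# Scan F(x) over a fine grid: the minimum is at F(x) = 1/2, the median.
grid = [i / 1000 for i in range(1001)]
best = min(grid, key=error_prob)
assert abs(best - 0.5) < 1e-12
```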
Woodford (2012) has a similar basic motivation to the present paper,
and produces related predictions, but adopts an alternative approach
based on informational transfer and rational inattention. Rather than
the constraint being the number of thresholds, the constraint is the
informational transmission capacity of the channel. Adaptation arises
to use the available resources to make the most accurate distinctions
possible and the key predictions arise from this adaptation. There is no
counterpart in Woodford, however, to the explicit process of adaptation
here.
If the criterion were instead to maximize the expected value of the yi
chosen, expected fitness, that is, and N = 1, the optimal threshold is
at the mean of F . With N thresholds, each threshold should be at the
mean of the distribution of outcomes, conditional on the outcome lying
between the next threshold to the left and the next threshold to the
right. This uniquely characterizes the optimal thresholds in this case.
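This fixed-point characterization can be computed by simple iteration: repeatedly move each threshold to the conditional mean between its neighbors. A minimal sketch (ours) for the uniform distribution, where the conditional mean reduces to the midpoint of the two neighbors and the optimal thresholds are xn = n/(N + 1):

```python
def iterate_thresholds(x, steps=2000):
    """Repeatedly set each threshold to the mean of the distribution
    conditional on lying between its neighbours, with fixed endpoints
    x_0 = 0 and x_{N+1} = 1.  For U[0,1] the conditional mean is the
    midpoint of the two neighbours."""
    x = list(x)
    N = len(x)
    for _ in range(steps):
        for n in range(N):
            left = x[n - 1] if n > 0 else 0.0
            right = x[n + 1] if n < N - 1 else 1.0
            x[n] = 0.5 * (left + right)
    return x

# From an arbitrary start, N = 3 thresholds converge to 1/4, 1/2, 3/4.
x = iterate_thresholds([0.1, 0.2, 0.3])
assert all(abs(xn - (n + 1) / 4) < 1e-6 for n, xn in enumerate(x))
```

For a non-uniform f, the midpoint step would be replaced by the conditional mean of f over the interval between the neighbors.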
Now, in the limit as N → ∞, Netzer (2009) shows that the density of
thresholds is proportional to f(y)^{2/3}, so that

U(y) = ∫_0^y f(s)^{2/3} ds / ∫_0^1 f(s)^{2/3} ds.
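As an illustration, suppose f(y) = 2y on [0, 1]. Then f(y)^{2/3} is proportional to y^{2/3}, and U(y) = y^{5/3}. The probability-of-error criterion would instead give U(y) = F(y) = y^2, so the fitness criterion also concentrates thresholds where f is high, but less strongly so.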
In either case, the thresholds are optimally concentrated where the ac-
tion is—where f is high. That is, if the distribution shifts, the pattern
of thresholds must shift to match.
Previous work has not considered the mechanism of adaptation. If
the thresholds were chosen by evolution, this would make adjustment
32 The rate of informational transfer is a symmetric concave function of the probabilities of each output. Hence maximizing this entails equalizing these probabilities.
painfully slow, too slow, indeed, to fit the stylized facts. How then
could the thresholds adjust to a novel distribution, F?
In order to study this question, suppose then that the thresholds react
to draws, are allowed to move, that is, but are confined to a finite grid
KG = {0, ε, 2ε, ..., Gε, 1}, for an integer G such that (G+ 1)ε = 1. This
restriction is for technical simplicity since it means that the adjustment
process for the thresholds will be a (finite) Markov chain. Define the
state space SG = (KG)N and let S = [0, 1]N .
At first we abstract from the choice between the two arms. Instead,
we focus on the process by which the thresholds adjust.33 Hence we
use y at first to represent either y1 or y2. Eventually, when considering
the performance of the limiting rule of thumb, we will again need to
consider the outcomes on both arms.
Suppose the thresholds are time dependent, given as x_n^t ∈ KG, where
0 ≤ x_1^t ≤ ... ≤ x_N^t ≤ 1, at time t = 1, 2, ....
3.1. Minimizing the Probability of Error. The first of the two
criteria considered here is to minimize the probability of error. This
is less basic than maximizing expected fitness, but it leads to simpler
results, and is intuitively illuminating. It is also noteworthy that the
rule of thumb for the probability of error case generates efficient neural
coding.
Consider the rule of thumb for adjusting the thresholds—
(3.1) x_n^{t+1} =
  x_n^t + ε, with probability ξ, if y ∈ (x_n^t, x_{n+1}^t];
  x_n^t − ε, with probability ξ, if y ∈ [x_{n−1}^t, x_n^t);
  x_n^t, otherwise,
for n = 1, ..., N.
33It simplifies matters to suppose that most draws adjust the thresholds, but choices are made
only occasionally.
The parameter ξ ∈ (0, 1) represents additional idiosyncratic uncer-
tainty about whether each threshold will actually move, even if the
outcome lies in a neighboring subinterval.34
This is perhaps the simplest possible rule in this context. It moves
thresholds towards where the action is, roughly speaking. This seems
like a step in the right direction, at least. More than that, we will show
that, in the limit of the invariant distribution as the grid size ε → 0,
the thresholds are in exactly the right place.
The rule may not provide the most rapid possible adjustment, but it
is sufficient for the present purpose.35
We have—
Theorem 3.1. In the limit as G → ∞, so that ε → 0, the invariant
joint distribution of the thresholds x_n^t converges to one with point mass
on the vector with components x_n^*, where F(x_n^*) = n/(N + 1), for n =
1, ..., N.
This theorem is a corollary of a more general result—Theorem 3.2—to
follow.
That is, this rule of thumb generates optimal adaptation of the utility
function to any unknown distribution, in a non-parametric way, for any
number of thresholds, N .
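The convergence in Theorem 3.1 is easy to see in simulation. The following Python sketch is ours, not code from the paper: it drops the grid KG and instead clamps steps of size ε at the neighboring thresholds, so thresholds cannot cross. For uniform draws with N = 3, the limiting thresholds F(x_n^*) = n/(N + 1) are the quartiles 0.25, 0.5, 0.75.

```python
import random

def adapt(draw, N=3, eps=0.001, xi=0.5, T=200_000, seed=1):
    """Simulate rule (3.1): with probability xi, a threshold steps eps
    towards an outcome that lands in one of its two adjacent
    subintervals.  Steps are clamped at the neighbouring thresholds."""
    rng = random.Random(seed)
    x = [0.1 * (n + 1) for n in range(N)]     # arbitrary starting point
    for _ in range(T):
        y = draw(rng)
        for n in range(N):
            lo = x[n - 1] if n > 0 else 0.0
            hi = x[n + 1] if n < N - 1 else 1.0
            if x[n] < y <= hi and rng.random() < xi:
                x[n] = min(x[n] + eps, hi)    # outcome just to the right
            elif lo <= y < x[n] and rng.random() < xi:
                x[n] = max(x[n] - eps, lo)    # outcome just to the left
    return x

# Uniform draws: thresholds settle near the quartiles 0.25, 0.5, 0.75.
x = adapt(lambda rng: rng.random())
assert all(abs(xn - q) < 0.05 for xn, q in zip(x, (0.25, 0.5, 0.75)))
```

Replacing the uniform draw with any other distribution moves the thresholds towards the corresponding quantiles, with no knowledge of F built into the rule.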
34 This technical device simplifies the argument that the Markov chain is irreducible.
35 A Bayesian optimal updating rule entails a prior distribution over distributions F. Suppose, for example, there is one threshold, and the pdf is either f1(y) = 2 with support [0, 1/2] or f2(y) = 2 with support [1/2, 1], where these pdf's are equally likely. The optimal initial threshold should then be at 1/2. If the outcomes are to the left, the pdf must be f1 and the next position of the threshold should be at 1/4; if the outcomes are to the right, the pdf must be f2 and the threshold should be set next at 3/4. That is, there is rapid resolution of the uncertainty about the distribution. On the other hand, this procedure would be wildly inappropriate for a different prior. Even with a definite general prior, it is not obvious that placing the threshold at the median of the posterior is always fully optimal. Furthermore, if the distribution is subject to occasional change, this will also affect the Bayesian optimal rule. Although the current rule can only be slower than the optimal rule with a specified prior and mechanism for change, it ultimately yields optimal placement of the thresholds, in a robust fashion, without a specified prior, and without a specified mechanism for redrawing the distribution.
3.2. General Case—Maximizing Fitness. The most basic general
criterion is to maximize expected fitness. That is, individuals who
successfully do this should outperform those who do not.36
The situation is now more complicated than it was with the criterion of
minimizing the probability of error. There are no longer simple rules of
thumb that implement the optimum exactly. However, there do exist
simple rules of thumb that implement the optimum approximately, for
large N . These rules of thumb involve conditioning on the arrival of
a realization in the adjacent interval, as above, but also modify the
probability of moving using the distance to the next threshold, in a
symmetric way.
Although it is possible to accurately estimate the median of a distri-
bution from the limited information available to such a rule of thumb,
it is not possible to do this for the mean. Hence the results for the
probability of error case are sharper than those for the expected
fitness case.
To see that simple rules of thumb like this cannot implement the op-
timum exactly, consider first the case that N = 1. Suppose that F has
median 1/2 but a mean that is not 1/2. Consider a symmetric rule of
thumb based on the arrival of an outcome to the left or the right of
the current position of the threshold at x, say, and the distance to the
ends—x or 1 − x. This will then generate a limiting position for the
threshold at 1/2, thus failing to implement the optimum. This is also
an issue for any number of thresholds, since this argument applies to
the position of any threshold relative to its two neighbors.
It is important that this general rule of thumb uses only informa-
tion that is available—the location of the neighboring thresholds and
whether an outcome lies in the subinterval just to the right or just to
the left. It would contradict the interpretation of the model here to use
detailed information about the precise location of the outcome within
a subinterval.
36 This assumes that the risk is independent across individuals. See Robson (1996) for a
treatment of this issue. Another possibility would be that fitness depends on relative payoffs.
At the same time, the general rule of thumb here makes greater demands
on neural processing than does the rule of thumb for the probability
of error case. The need to utilize the position of adjacent thresholds
must entail a greater complexity cost.
The general rule of thumb considered here is—
(3.2)
x^{t+1}_n =
  x^t_n + ε   with probability ξ(x^t_{n+1} − x^t_n)^β   if y ∈ (x^t_n, x^t_{n+1}]
  x^t_n − ε   with probability ξ(x^t_n − x^t_{n−1})^β   if y ∈ (x^t_{n−1}, x^t_n]
  x^t_n       otherwise
Again, the parameter ξ ∈ (0, 1) and the draws that are made with
probability ξ(x^t_{n+1} − x^t_n)^β or ξ(x^t_n − x^t_{n−1})^β conditional on the outcome
lying in the subinterval just to the right or left, respectively, are made
independently across thresholds.37
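A minimal implementation of rule (3.2) can make this concrete (our own sketch; ε, ξ, the seed, and the cdf F(y) = y² are illustrative choices). With β = 0, the thresholds should settle at the quartiles of F, where all subintervals carry equal probability.

```python
import random

def step_rule(x, y, eps, xi, beta, rng):
    # One period of rule (3.2): threshold x[n] moves +eps with probability
    # xi * (gap to the right)**beta if y lies in the subinterval just to its
    # right, and -eps with probability xi * (gap to the left)**beta if y lies
    # just to its left; the draws are independent across thresholds.
    z = [0.0] + x + [1.0]     # x_0 = 0 and x_{N+1} = 1 bracket the interior thresholds
    new = list(x)
    for n in range(1, len(z) - 1):
        if z[n] < y <= z[n + 1] and rng.random() < xi * (z[n + 1] - z[n]) ** beta:
            new[n - 1] += eps
        elif z[n - 1] < y <= z[n] and rng.random() < xi * (z[n] - z[n - 1]) ** beta:
            new[n - 1] -= eps
    return sorted(new)        # renumber if an update reverses the order

rng = random.Random(1)
x = [0.25, 0.50, 0.75]
for _ in range(150_000):
    y = rng.random() ** 0.5   # a draw from F(y) = y^2 on [0, 1]
    x = step_rule(x, y, eps=0.0005, xi=0.5, beta=0.0, rng=rng)
# the thresholds settle near the quartiles of F, roughly (0.50, 0.71, 0.87)
```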
If the parameter β = 0, we have the old rule of thumb. Formally, then,
Theorem 3.1 follows from Theorem 3.2.
If β > 0, this will encourage the closing up of large gaps that arise where
f is small, which is useful for maximizing expected fitness. Consider, for
example, a threshold situated so that the probability of an outcome in
the adjacent interval to the left equals the probability of an outcome
37 A few technical considerations are as follows. Given the Markov chain described here, it is
possible that the order of thresholds is reversed at some stage so that x^{t+1}_{n+1} < x^{t+1}_n, for example.
In such a case assume that the thresholds are renumbered so as to preserve the natural order.
It is also possible that the process superimposes one threshold on another so that x^{t+1}_{n+1} = x^{t+1}_n,
for example. In this case the independence of the draws made conditional on an outcome lying in
a neighboring subinterval will eventually break such a tie. These possibilities become vanishingly
improbable as G → ∞, so these issues are purely details.
The above considerations simplify the proof that the Markov chain defined here is irreducible.
That is, there exists a number of repetitions such that, for any initial configuration, x^0, say, there
is positive probability of being in any final configuration, x^T, say. There is therefore a unique
invariant distribution for this chain. See Footnote 61 in the Appendix.
just to the right. Suppose, however, that the distance to the next
threshold on the right exceeds the distance to the left, because the pdf
f is lower to the right. It will then pay to move to the right, since the
expected fitness stakes on the right exceed those on the left. Indeed, if
β = 1/2, the resulting rule will be shown to be approximately optimal
for large N.
We have—
Theorem 3.2. In the limit as G → ∞ so that ε → 0, the invariant
joint distribution of the thresholds x^t_n converges to one that assigns a
point mass to the vector with components x^*_n, n = 1, ..., N. These are
the unique solutions to
(F(x^*_{n+1}) − F(x^*_n))(x^*_{n+1} − x^*_n)^β = (F(x^*_n) − F(x^*_{n−1}))(x^*_n − x^*_{n−1})^β,
for n = 1, ..., N.
Proof. See the Appendix.
The intuition for Theorem 3.2 straightforwardly extends that given for
Theorem 3.1. Again, the limiting position of each threshold is such
that the probability of moving to the left is equal to the probability
of moving to the right. The proof can be outlined more precisely as
follows—
Step 1. The Markov chain can be approximated by an ordinary differential
equation (ODE) system, when ε is small, given by
dx_n/dt = H(x_n, x_{n+1}) − H(x_{n−1}, x_n), n = 1, ..., N,
where H(x_n, x_{n+1}) = ξ(x_{n+1} − x_n)^β(F(x_{n+1}) − F(x_n)), n = 0, ..., N.38
This is obtained by letting ε → 0 over a fixed small interval of time.
With no compensation for this, the system would grind to a halt. Hence
this shrinkage of ε is precisely offset by increasing the number of repetitions.
This increase in the number of repetitions has no effect on the
invariant distribution, however. The ODE system is then obtained by
shrinking the small interval of time.
38 This is Eq 8.2 from the Appendix.
Step 2. A Lyapunov function is then used to show that the ODE
system is globally asymptotically stable. The system converges to the
vector x∗n, n = 1, ..., N , as t→∞, from any initial vector.
Step 3. Finally, we suppose that some limit of the invariant dis-
tributions of the Markov chain does not put full weight on x∗n, n =
1, ..., N . The ODE system must then strictly lower the expected value
of the Lyapunov function, under this limit of the invariant distribu-
tions. Given that the requisite continuity properties can be shown to
hold, the Markov chain also lowers the expected value of the Lyapunov
function, using the invariant distribution of that Markov chain as the
initial distribution, for all small enough ε. This establishes the desired
contradiction.
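The ODE of Step 1 can also be integrated directly, which illustrates Step 2 (a sketch under our own illustrative choices: forward-Euler stepping, F(x) = x², β = 1/2, and ξ = 1, since ξ only rescales time). From an arbitrary admissible starting vector, the system settles where the quantities (F(x_{n+1}) − F(x_n))(x_{n+1} − x_n)^β are equalized across subintervals, as Theorem 3.2 requires.

```python
def H(a, b, F, beta):
    # H(x_n, x_{n+1}) = (x_{n+1} - x_n)**beta * (F(x_{n+1}) - F(x_n)), with xi = 1
    return (b - a) ** beta * (F(b) - F(a))

def integrate(x, F, beta, dt=0.01, T=300.0):
    # forward-Euler integration of dx_n/dt = H(x_n, x_{n+1}) - H(x_{n-1}, x_n)
    for _ in range(int(T / dt)):
        z = [0.0] + x + [1.0]
        x = [x[i] + dt * (H(z[i + 1], z[i + 2], F, beta) - H(z[i], z[i + 1], F, beta))
             for i in range(len(x))]
    return x

F = lambda t: t * t                  # cdf F(x) = x^2 on [0, 1]
x = integrate([0.2, 0.4, 0.6], F, beta=0.5)
z = [0.0] + x + [1.0]
balance = [(z[i + 1] - z[i]) ** 0.5 * (F(z[i + 1]) - F(z[i])) for i in range(4)]
# the four balance terms are (nearly) equal at the limiting configuration
```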
3.3. Approximate Optimality of the Rule of Thumb. We now
consider the efficiency of the rule of thumb relative to the optimum con-
figuration of thresholds, for the expected fitness criterion. We demon-
strate approximate efficiency, as N →∞, for the particular case of f ’s
that are step functions, so that
f(y) =
  α_1 > 0 if y ∈ [0, 1/M)
  ...
  α_m > 0 if y ∈ [(m − 1)/M, m/M)
  ...
  α_M > 0 if y ∈ [(M − 1)/M, 1]
where Σ^M_{m=1} α_m = M.
For each N, there exists a unique positioning of the N interior thresholds,
under the rule of thumb, in the limit as G→∞ so that ε→ 0. Suppose
that the expected deficit in y, for the limiting rule of thumb relative
to the full information ideal, is given by L(N). (The “full information
ideal” entails always choosing the higher outcome.)
Theorem 3.3. As N → ∞, the limiting efficiency of the rule of
thumb is characterized by N²L → (Σ_m α_m^{2β/(1+β)})(Σ_m α_m^{1/(1+β)})²/(6M³).
This expression is uniquely minimized by choice of β = 1/2. Hence
the rule of thumb with the best limiting efficiency satisfies N²L →
(Σ_m α_m^{2/3})³/(6M³) as N → ∞.
Proof. See the Appendix. The strict optimality of β = 1/2 follows
from the Hölder Inequality.
Suppose the expected deficit in y, relative to the full information ideal,
for the optimal positioning of thresholds, is given by L∗(N).
Theorem 3.4. The optimal allocation of the thresholds has limiting
efficiency characterized by N²L* → (Σ_m α_m^{2/3})³/(6M³), as N → ∞.
Proof. See the Appendix.
Hence the rule of thumb, when β = 1/2, has the same limiting efficiency
as the optimal allocation of thresholds. That is, the rule of thumb is
approximately optimal for large N.39 Suppose the efficiencies here can
be represented as Taylor series in 1/N.40 The first nonzero term is the
term in 1/N², which is the same for L* and L.41 They may then only
disagree for terms of higher order.42
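The limiting expression of Theorem 3.3 is straightforward to evaluate. The sketch below (our own; the step heights α are an arbitrary example averaging one) scans a grid of β and confirms that β = 1/2 minimizes N²L, with the minimum agreeing with the expression in Theorem 3.4.

```python
def limiting_eff(alpha, beta):
    # the N^2 L limit of Theorem 3.3 for a step density with M equal-width
    # steps of heights alpha (the heights must average one)
    M = len(alpha)
    s1 = sum(a ** (2 * beta / (1 + beta)) for a in alpha)
    s2 = sum(a ** (1 / (1 + beta)) for a in alpha)
    return s1 * s2 ** 2 / (6 * M ** 3)

alpha = [0.5, 1.0, 1.5, 1.0]                       # an arbitrary example
value, best_beta = min((limiting_eff(alpha, b / 100), b / 100)
                       for b in range(1, 201))
optimum = sum(a ** (2 / 3) for a in alpha) ** 3 / (6 * len(alpha) ** 3)
# best_beta sits at 0.5, and value coincides with the Theorem 3.4 expression
```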
39 This approximation is additional to those already involved in i) the convergence of the
Markov chain to an invariant distribution and ii) taking the limit of the invariant distribution as
ε → 0.
40 The approximation result here remains valid, as an "asymptotic expansion", even if a Taylor
series does not exist.
41 The minimized probability of error is easily seen to be 1/(2(N + 1)). The efficiency loss for
maximum fitness has a leading term in 1/N² instead because the size of an error is of order 1/N.
42 Netzer uses the same device of considering a step function f. However, he does not consider
the adjustment process that is the focus here as the underpinning of utility adaptation. There is,
then, no counterpart of the rule of thumb used here. His main result is to show that, in the limit
as M → ∞, the density of thresholds is proportional to f^{2/3}, in contrast to the limiting density
of thresholds in the probability of error case, which is simply f. This main result of Netzer is an
incidental by-product of the present approach. This observation does not, however, help extend
our approximation results to a general f.
4. Self-Reported Happiness
4.1. Hedonic Treadmill. The model has immediate implications for
phenomena that rely on self-reported life satisfaction. Conspicuous
among these is the hedonic treadmill. Schkade and Kahneman (1998)
formulate a well-known version of this treadmill that concerned students
at the University of Michigan and at UCLA.43 Students in the
two locations reported similar degrees of life satisfaction, but Michigan
students projected that UCLA students would be significantly happier.
Schkade and Kahneman describe this as a conflict between “decision
utility”—which is applied when deciding whether to move from Michigan,
and which is based on a substantial increase in life satisfaction in
California—and “experienced utility”—which reflects the more mod-
est increase ultimately obtained once there. Schkade and Kahneman
argue then that “decision utility” is defective.
The model here entails the adaptation of utility, which implies such
a distinction between decision and experienced utilities. There is no
sense, however, in which either decision or experienced utility is defect-
ive, in contrast to Schkade and Kahneman.44 Either utility function is
optimal for the corresponding circumstances, in the light of the inabil-
ity to make arbitrarily fine distinctions.
We can demonstrate the hedonic treadmill by simulating the follow-
ing specific version of the model. Consider the class of cdf’s given by
F (x) = xγ with pdf’s f(x) = γxγ−1, with γ > 0, for all x ∈ [0, 1]. Sup-
pose ε = 0.0005. Consider the probability of error case, for example,
so that β = 0, with nine thresholds, so that these thresholds will be
optimally positioned at the deciles of the distribution. Take 100,000
periods, where γ = 1 for the first 20,000 periods and γ = 5 thereafter,
so that probability mass is shifted to the upper end of the interval [0, 1].
43 See also Frederick and Loewenstein (1999).
44 Robson and Samuelson (2011) addressed these issues in the Rayo and Becker (2005) model
of a limited ability to make fine distinctions. However, this model, in common with other papers
in this small literature, lacks an explicit account of real time adaptation.
Suppose the thresholds are placed initially at 0.1, 0.2,...,0.9—that is, at
the deciles of the distribution for γ = 1. This is essentially equivalent
to supposing that the γ = 1 regime has been in effect for a long time.
The results of simulating this version of the model are presented in Fig-
ure 1.45 The distribution of thresholds quickly puts most mass near the
deciles, as shown by the restoration of the uniform empirical frequency
of outcomes in each interval.46
An outcome in Figure 1 near 0.5, for example, once generated utility
near 0.5; after the shift to a substantially more favorable distribution,
it generates a rock-bottom level of utility. But the distribution of
thresholds was optimal for the original distribution and became optimal
again for the new distribution.
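The experiment just described can be reproduced in a few lines (our own sketch, not the authors' Excel code; the seed and ξ = 1/2 are our choices). Starting the nine thresholds at the deciles of the uniform regime and then drawing from F(y) = y⁵, the thresholds re-converge to the deciles of the new distribution, (k/10)^{1/5}.

```python
import random

def adapt(x, gamma, periods, eps=0.0005, xi=0.5, seed=0):
    # beta = 0 rule of thumb; each period an outcome y with cdf F(y) = y^gamma
    # is drawn, and the thresholds bordering its subinterval move by +/- eps
    rng = random.Random(seed)
    for _ in range(periods):
        y = rng.random() ** (1 / gamma)
        z = [0.0] + x + [1.0]
        for n in range(1, len(z) - 1):
            if z[n] < y <= z[n + 1] and rng.random() < xi:
                x[n - 1] += eps
            elif z[n - 1] < y <= z[n] and rng.random() < xi:
                x[n - 1] -= eps
        x.sort()
    return x

x = [k / 10 for k in range(1, 10)]   # deciles of the long-standing uniform regime
x = adapt(x, gamma=5.0, periods=250_000)
# the thresholds approach the deciles of F(y) = y^5, i.e. (k/10)**(1/5)
```

An outcome near 0.5, which previously delivered mid-range utility, now falls below the lowest threshold, exactly the reversal described in the text.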
For β = 0, as in Figure 1, average utility reverts completely to its original
level, after a shift in the cdf F. This is a pure form of the "hedonic
treadmill". For β = 1/2, however, reversion is generally incomplete
or can overshoot. Figure 2 illustrates partial reversion, presenting a
rolling average of utility, where utility is defined so that average utility
for γ = 1 is normalized to 0.5.47
Indeed, long run average expected utility generally depends on the
distribution, when β > 0.48 To illustrate overshooting, consider before
and after distributions F and G, respectively, say, such that the
average utility under G exceeds that under F. Suppose now that G is
45 All the simulations here were done using Excel.
46 These simulations also serve to demonstrate the robustness of the limiting results, Theorems
3.1 and 3.2, which concern limits of invariant distributions as the grid size ε tends to 0. That is,
these results hold approximately for finite time and reasonable positive grid sizes. Theorems 3.3
and 3.4 rely on taking the additional limit as N → ∞, then showing that β = 1/2 yields a rule
of thumb that is approximately optimal. Simulations also show that β = 1/2 is approximately
optimal even for small values of N. These results are also then robust. Further, although there is
a gain from β > 0, this gain is not overwhelming, relative to β = 0. The additional complexity
cost of rules of thumb with β > 0 might then outweigh the gain over the rule with β = 0.
47 Average utility needs to be smoothed to be meaningful. We use a rolling average of the last
1,000 periods.
48 Expected utility also depends on the distribution under the optimal allocation of a finite
number of thresholds.
Figure 1. Rapid Adaptation of the Thresholds to a Novel Distribution. (The figure plots the frequency of outcomes in each interval, by interval number, over the last 30,000 steps.)
scaled down so that F first-order stochastically dominates G.49 Such
scaling ensures that average utility will plunge with the advent of G.
However, in the long run, average utility is independent of this scaling-
down, no matter how dramatic this might be, and so average utility
will eventually climb back past its old level.50
From a welfare point of view, not merely is average utility an unreliable
guide as to the extent of the change in the distribution, it may reverse
the apparent direction.
49 This is always possible if G has a strictly positive pdf.
50 That is, suppose the distribution is shifted to become F_k(x) = F(kx), for k ≥ 1. The new
thresholds are then x^k_n = x_n/k since the unique characterization of thresholds in Theorem 3.2 is
satisfied with these x^k_n.
Figure 2. Rule of Thumb with β = 1/2. Modified Hedonic Treadmill.
4.2. Happiness Economics. Recently, there has been an upsurge of
interest within economics in using national surveys on self-reported life
satisfaction to at least partially substitute for Gross National Product.
It is clear that GNP has serious defects—it is highly incomplete, in
particular. Clark et al. (2018) make an eloquent case in favor of this
new strategy, for example. The approach they propose goes beyond
the present positive approach—where hedonic utility is used to predict
behavior—to embrace a normative view—where hedonic utility is used
to make welfare judgments.
The adaptive nature of happiness raises issues for such a normative
approach. In the model with complete adaptation (with β = 0), suppose
the distribution of material outcomes doubles across the board. Is it
desirable to claim that there is no increase in welfare?51 Indeed, as a
positive matter, wouldn’t potential immigrants, for example, strictly
prefer the doubled outcomes?
51 Average utility is always independent of the distribution F when β = 0.
More generally (when β > 0), average hedonic utility is not merely
an unreliable guide as to the extent of a welfare change derived from
a change in the distribution of material outcomes F; there may be
reversals. That is, a distribution G that is first-order stochastically
dominated by another distribution F may nevertheless generate higher
average utility, as discussed in the previous subsection.
These issues merit reexamination in a more general model that allows
incomplete adaptation. However, the scenarios described above seem
bound to remain as possibilities, and the interpretation of national
happiness surveys in terms of welfare seems problematic.
5. Behavioral Applications
There are behavioral implications as well that fall under the rubric of
the classic psychological phenomena of behavioral contrast effects. In
an early experiment, Crespi (1942) found that rats run faster towards
larger rewards. Further, rats trained to run towards a small reward
run faster towards a large reward than do rats trained on large rewards
throughout.
In the present model, once the rats were inured to low rewards, the
advent of higher rewards would occasion a (temporary) increase in
hedonic utility—as in Figures 1 and 2—and hence a greater willingness
to undergo exertion for the reward.
These effects are not limited to rats.52 Vestergaard and Schultz (2017),
for example, consider human bidding behavior in second-price auctions.
Lesser-valued options with rising value profiles were shown to be
systematically preferred to better options with declining time profiles.53
52 Neither are the effects limited to the upside. Kahneman, Fredrickson, Schreiber, and
Redelmeier (1993) confront human subjects with a choice between a short series of painful experiences
and a longer series, where the longer series is the shorter series augmented by the addition of a
somewhat less painful interval to conclude. They show that people may prefer the longer series
that ends less painfully. See also Kahneman, Wakker, and Sarin (1997).
53 Khaw, Glimcher, and Louie (2017) also present experimental evidence in favor of such
maladaptation. They show that the subjective value of an option in an auction varies inversely
with the average value of recently observed items. Wittman, et al. (2016) investigate the neural
5.1. Behavioral Adaptation—Simulations. In the model, an in-
creasing sequence of rewards leads to a sequence of pleasant surprises
that are then associated with higher levels of utility than would arise
with a decreasing sequence of rewards.
Suppose, for example, that rewards are produced by an arbitrary cdf F
that shifts to the right with time in a linear fashion. That is, the time-shifted
cdf's are given by F_t(y) = F(y − αt), for some α > 0. Consider
thresholds x_n for n = 1, ..., N and redefine these, relative to the moving
distribution, as x_n − αt.
It follows then that the approximating ODE as in Eq 8.2 from the
Appendix becomes
dx_n/dt = H(x_n, x_{n+1}) − H(x_{n−1}, x_n) − α, n = 1, ..., N,
where H(x_n, x_{n+1}) = ξ(x_{n+1} − x_n)^β(F(x_{n+1}) − F(x_n)), n = 0, ..., N, as
long as α is small enough.
The proof of Proposition 8.1 from the Appendix can readily be adapted
to show that x_n → x^*_n, where the x^*_n are characterized by H(x^*_n, x^*_{n+1}) =
H(x^*_{n−1}, x^*_n) + α, n = 1, ..., N.
It follows immediately that the long run effect of such an increase in
rewards is to shift all thresholds down, with each threshold being shifted
down relative to the one before. Hence average utility is unambiguously
shifted up, with this effect being more pronounced for larger α.54
An analogous argument applies if α < 0 so that average utility is shifted
down by a decreasing cdf.
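For the uniform-F case recorded in footnote 54, the stationary thresholds can be computed directly from the balance condition H(x_{n−1}, x_n) + α = H(x_n, x_{n+1}): with β = 0 and ξ = 1 it says that successive gaps grow by α. A minimal sketch (our own):

```python
def drift_thresholds(N, alpha):
    # Uniform F on [0, 1], beta = 0, xi = 1: stationarity under a rightward
    # drift at rate alpha requires successive gaps to grow by alpha, with the
    # N + 1 gaps summing to one (footnote 54's closed form).
    assert N * (N + 1) * alpha / 2 < 1
    d1 = (1 - N * (N + 1) * alpha / 2) / (N + 1)   # the first gap
    x, out = 0.0, []
    for n in range(N):
        x += d1 + n * alpha                        # gaps d1, d1 + alpha, ...
        out.append(x)
    return out

x = drift_thresholds(3, 0.01)
# every threshold sits below its no-drift counterpart n/(N + 1), so average
# utility is shifted up, and the gaps between successive thresholds grow
```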
This analysis is illustrated in Figure 3, where the cdf is the uniform
distribution, and there are three thresholds. This cdf shifts up in a
linear fashion over the first 50, 000 periods. During this phase, the
(Footnote 53, continued) mechanisms by which humans track the time trends of a series of rewards. Blanchard et al.
(2014) show that monkeys are also biased in favor of sequences of rewards that increase relative
to those that decrease.
54 If F is the uniform distribution on [0, 1], for example, it follows that x^*_n =
n(1 − N(N+1)α/2)/(N+1) + n(n−1)α/2, for n = 1, ..., N, given that N(N+1)α/2 < 1. Hence
the gap between successive x^*_n grows with n.
thresholds lag behind the quartiles of the current cdf, as the above
argument implies. This raises average utility.
At period 50,000, there is an abrupt jump up in the cdf, which creates a
spike in average utility. After period 50,000, the cdf declines linearly, at
the same rate that it rose before; the thresholds are above the quartiles,
and average utility falls, despite the higher average rewards in this
second phase.
Consider then a choice between rising consumption—as in the first
phase in Figure 3—and falling consumption—as in the second phase.
After a temporal reversal, the rewards in the second phase vector dominate
those in the first, but, except for the immediate aftermath of the
jump in rewards at the midpoint, which is an artifact of the precise
sequence here, utility levels in the first phase similarly vector dominate
those in the second. Applying an extended version of Hypothesis
2 of Section 3, to allow for vectors of utility levels, and for temporal
permutations of these vectors, this provides a firm basis for predicting
a preference for the first phase.55
5.2. Speed versus Accuracy—Simulations. In the current model,
when ε is small, convergence to the invariant distribution is slow, but
ultimately precise.56 This issue could be sharpened by assuming that
the underlying cdf, F , was subject to occasional change. Suppose, to
be more precise, that there is a (finite, say) set of cdf’s {Fj}. With a
Poisson arrival rate, the current cdf from this set is switched to a new
one, drawn at random from this set. It is intuitively compelling that
there should then be an optimal ε > 0 and that this should vary with
the rate of introduction of novelty, in particular.57
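This intuition can be illustrated with a small experiment (our own sketch; for simplicity the cdf alternates deterministically between F(y) = y and F(y) = y² every 20,000 periods, rather than at Poisson times, and ξ = 1/2). A very small ε cannot track the switches, a very large ε tracks them instantly but wobbles widely, and an intermediate ε does best.

```python
import random

def avg_error(eps, periods=120_000, switch=20_000, xi=0.5, seed=7):
    # Track three thresholds while the cdf alternates between F(y) = y and
    # F(y) = y^2; return the time-averaged distance of the thresholds from
    # the quartiles of whichever cdf is currently in force.
    rng = random.Random(seed)
    x = [0.25, 0.50, 0.75]
    targets = {1.0: [0.25, 0.50, 0.75],
               2.0: [0.25 ** 0.5, 0.50 ** 0.5, 0.75 ** 0.5]}
    total = 0.0
    for t in range(periods):
        gamma = 1.0 if (t // switch) % 2 == 0 else 2.0
        y = rng.random() ** (1 / gamma)
        z = [0.0] + x + [1.0]
        for n in range(1, len(z) - 1):
            if z[n] < y <= z[n + 1] and rng.random() < xi:
                x[n - 1] += eps
            elif z[n - 1] < y <= z[n] and rng.random() < xi:
                x[n - 1] -= eps
        x.sort()
        total += max(abs(a - b) for a, b in zip(x, targets[gamma]))
    return total / periods

slow, mid, fast = avg_error(0.00001), avg_error(0.001), avg_error(0.05)
# mid should beat both extremes: slow cannot track, fast wobbles
```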
55 This explanation obviates the need to invoke "recency bias", which could also predict a
preference for the first phase over the second.
56 Increasing N must also slow convergence, if only because, although there are now more
thresholds, in general at most two of these are adjusted in each period.
57 In a similar spirit, the number of thresholds might be allowed to vary with the problem at
hand. That is, if a problem has particularly high stakes, N might be allowed to increase, but at
a cost.
Figure 3. Behavioral Adaptation
This tradeoff between speed and accuracy seems bound to be theoretically
robust. That is, other models that differ in detail but still capture
rapid adaptation seem bound to also produce such a tradeoff.
Figure 4 illustrates these observations for the specific version of the
model in Section 4. It depicts the evolution of the three thresholds
over time, now contrasting two different values of the grid size ε; namely
0.002 and 0.000125, top and bottom, respectively. It is evident here
that a smaller value of ε slows down the speed of adjustment but
improves the precision of the ultimate allocation of thresholds.
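The contrast in Figure 4 is easy to reproduce (our own sketch; uniform F, three thresholds, ξ = 1/2, a fixed seed). The larger grid size reaches the neighborhood of the quartiles sooner, while the smaller grid size ends up closer to them.

```python
import random

def run(eps, periods=100_000, xi=0.5, seed=3):
    # beta = 0 rule with uniform F; returns (first period at which all three
    # thresholds are within 0.05 of the quartiles, mean distance from the
    # quartiles over the last 20,000 periods)
    rng = random.Random(seed)
    x = [0.05, 0.10, 0.15]                  # start far from the quartiles
    hit, tail = None, 0.0
    for t in range(periods):
        y = rng.random()
        z = [0.0] + x + [1.0]
        for n in range(1, len(z) - 1):
            if z[n] < y <= z[n + 1] and rng.random() < xi:
                x[n - 1] += eps
            elif z[n - 1] < y <= z[n] and rng.random() < xi:
                x[n - 1] -= eps
        x.sort()
        err = max(abs(x[i] - 0.25 * (i + 1)) for i in range(3))
        if hit is None and err < 0.05:
            hit = t
        if t >= periods - 20_000:
            tail += err
    return hit, tail / 20_000

hit_big, err_big = run(eps=0.002)
hit_small, err_small = run(eps=0.000125)
# larger eps: faster to arrive; smaller eps: more precise in the long run
```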
Adaptation should be slow when circumstances change infrequently,
but fast when circumstances change frequently. (This would consider
the parameter ε as endogenous, tailored to the circumstances.) This
Figure 4. Speed versus Accuracy
is consistent with adaptation to living in a new locale taking several
years, but adaptation to playing a game of penny ante poker being
much faster.
5.3. A Long Run Behavioral Application. The analysis of Sections
5.1 and 5.2 provides an application to a puzzle in labor economics—why
do wage streams paid to an individual by a given firm typically
rise over time?
This question is posed forcefully by Frank and Hutchens (1993).58 They
illuminate the issues by considering airline pilots and intercity bus
drivers, occupations in which it is hard to see commensurate productivity
gains.59
Frank and Hutchens propose that utility should be modified to include
a term in the rate of growth of consumption. The present model suggests
an alternative neurological explanation for increasing sequences
of consumption to be favored, ceteris paribus.
6. Sketch of Application to Choice over Gambles
The current model involves essentially deterministic choices. That is,
although the two outcomes are chosen from a common distribution,
choice is made subsequent to their realization. There are fruitful
applications that involve choice between gambles. For this, it is desirable
that preferences should be shaped by choices between gambles, where
the gambles themselves are drawn from a common distribution.
Robson (2018) develops such a model and shows that it accounts for
the adaptive S-shaped utility that is a key feature of Kahneman and
Tversky (1979). Furthermore, the paper resolves the puzzle posed most
forcefully by Rabin (2000) concerning the substantial aversion to risk
that can be observed with small stakes.
58 Dustmann and Meghir (2005) provide recent evidence of the returns to tenure. Perhaps
most relevantly, unskilled workers' wages grow noticeably with tenure, but grow with experience
only for a short period, and not at all with sector tenure.
59 Frank and Hutchens also argue that other resolutions to this puzzle in the literature do not
seem convincing for airline pilots or intercity bus drivers. For example, Becker (1964) argues
that firm-specific human capital might mean that premium wages in later years might motivate
workers to stay with the firm. But most training of airline pilots is done by the military prior to
their being hired by an airline. Similarly, Frank and Hutchens argue that the bonding contract
explanation of Lazear (1981) is not well-suited to pilots.
7. Conclusions
A key motivation here was to develop a model that is consistent with
the burgeoning neuroscience evidence about how decisions are orchestrated
in the brain. There is good evidence that economic decisions
are made by a neural mechanism with hedonic underpinnings.
We present a model where the cardinal levels of hedonic utility shift in
response to changing circumstances, as is also consistent with neuroscience.
This adaptation acts to reduce the error caused by a limited
ability to make fine distinctions, and is evolutionarily optimal.
There is no ultimate conflict with economics, however, since this limited
ability is the only reason there are mistakes at all; as this ability
improves, behavior converges to that implied by a standard economic
model.
The model provides a direct explanation of the "hedonic treadmill",
where reported utility levels tend to revert to the former levels after a
positive (or negative) shock to the circumstances. More generally, this
reversion may be partial, or there may be overshooting. Accordingly,
it provokes skepticism about a welfare interpretation of national
happiness surveys.
The model generates predictions concerning observed behavior. The
most straightforward of these is that individuals should make more
mistakes over small stakes decisions when they are inured to larger
stakes (and vice versa).
There are behavioral implications of the theory related to "behavioral
contrast". For example, rapid but incomplete adaptation explains why
bidding behavior in auctions would exhibit memory effects that appear
irrational. Finally, slow adaptation would explain why increasing
consumption profiles might be preferred to decreasing profiles even if the
latter dominate the former.
Department of Economics, Simon Fraser University and Department
of Physics and Astronomy, University of British Columbia
8. Appendix—Proofs
Proof of Theorem 3.2.
Suppose that the (finite) Markov chain described by Equation (3.2) is
represented by the matrix A_G.60 This is a |S_G| by |S_G| matrix which is
irreducible, so that there exists an integer P such that (A_G)^P has only
strictly positive entries.61
Consider an initial state x^t_G ∈ S_G where 0 ≤ x^t_{1,G} ≤ x^t_{2,G} ≤ ... ≤ x^t_{N,G} ≤ 1.
Consider then the random variable x^{t+∆}_G that represents the state
of the chain at t + ∆, where ∆ > 0 is fixed, for the moment. Suppose
there are R iterations of the chain, where R = ⌊∆/ε⌋. These iterations
arise at times t + rε for r = 1, ..., R. Suppose the process is constant
between iterations, so it is defined for all t′ ∈ [t, t + ∆].
We consider the limit as G → ∞. Taking this limit implies ε → 0,
but also speeds up the process in a compensatory way, in that R→∞,
making the limit non-trivial. This speeding up is only a technical device
and has no effect on the invariant distribution, in particular.
We adopt the notational simplification that
H(x_n, x_{n+1}) = ξ(x_{n+1} − x_n)^β(F(x_{n+1}) − F(x_n)), n = 0, ..., N.
Indeed, the key results here only rely on the properties that H_1(x_n, x_{n+1}) < 0
and H_2(x_n, x_{n+1}) > 0.
We have then that x^{t+∆}_{n,G} = x^t_{n,G} + Σ^R_{r=1} ε_r, where
60 See Mathematical Society of Japan (1987, 260, p. 963), for example.
61 Consider any initial configuration, x^0, say, and any desired final configuration, x^T, say. First
move x^0_1 to x^T_1 by means of outcomes just to the right or left, as required, that do not affect any
other thresholds. This might entail x_1 crossing the position of other thresholds, but temporarily
suspend the usual convention of renumbering the thresholds, if so. This will take at most G + 1
periods. Then move x^0_2 to x^T_2 in an analogous way. And so on. There is a finite time, (G + 1)N,
such that the probability of all this is positive, given the assumptions in Section 3.2.
(8.1)
ε_r =
  ε    with probability H(x^{t+rε}_{n,G}, x^{t+rε}_{n+1,G})
  −ε   with probability H(x^{t+rε}_{n−1,G}, x^{t+rε}_{n,G})
  0    otherwise.
It follows that
(x^{t+∆}_{n,G} − x^t_{n,G})/∆ = (Σ^R_{r=1} ε_r/ε)/(R(∆/(εR))),
where (∆/(εR)) → 1, as G → ∞.
Since
x^{t+rε}_n ∈ [x^t_{n,G} − ∆, x^t_{n,G} + ∆], r = 1, ..., R,
it follows that
Pr{ε_r/ε = 1} ∈ [H(x^t_{n,G} + ∆, x^t_{n+1,G} − ∆), H(x^t_{n,G} − ∆, x^t_{n+1,G} + ∆)], r = 1, ..., R,
and that
Pr{ε_r/ε = −1} ∈ [H(x^t_{n−1,G} + ∆, x^t_{n,G} − ∆), H(x^t_{n−1,G} − ∆, x^t_{n,G} + ∆)], r = 1, ..., R,
with probability 1, in the limit as G → ∞, so that ε → 0 and R → ∞.
Hence, if, finally, ∆ → 0, it follows that
(x^{t+∆}_{n,G} − x^t_{n,G})/∆ → H(x^t_n, x^t_{n+1}) − H(x^t_{n−1}, x^t_n),
with probability 1, so that, with probability 1—
(8.2) dx_n/dt = H(x_n, x_{n+1}) − H(x_{n−1}, x_n), n = 1, ..., N.62
Lemma 8.1. There exist unique x^*_n, n = 1, ..., N, such that dx_n/dt = 0,
n = 1, ..., N.
Proof. Choose any x_1 > 0. Then there exist x_2 < x_3 < ... < x_{N+1}
such that H(0, x_1) = H(x_1, x_2) = ... = H(x_N, x_{N+1}). Clearly x_n, n =
2, ..., N + 1, are strictly increasing and continuous in x_1, with x_{N+1} → 0
as x_1 → 0 and x_{N+1} → ∞ as x_1 → ∞. Hence there exists a unique x_1
such that x_{N+1} = 1. This generates the x^*_n, n = 1, ..., N, as claimed in
Theorem 3.2.
62 This expression is valid even if there are ties so that x_n = x_{n+1}, for example. In this case,
x_n and x_{n+1} immediately split apart, relying on the convention that x_n ≤ x_{n+1}.
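The construction in this proof is effectively an algorithm: pick x_1, extend the sequence so that all the H values are equal, and adjust x_1 by bisection until x_{N+1} = 1. A sketch (our own; it assumes F is supplied as an increasing function that extends monotonically past 1, so the inner bisection can bracket):

```python
def solve_star(F, N, beta):
    # Shooting construction from Lemma 8.1: given x1, build x2 < ... < x_{N+1}
    # with H(0, x1) = H(x1, x2) = ... = H(x_N, x_{N+1}), where
    # H(a, b) = (b - a)**beta * (F(b) - F(a)); then bisect on x1 so x_{N+1} = 1.
    def H(a, b):
        return (b - a) ** beta * (F(b) - F(a))

    def extend(x1):
        xs, c = [0.0, x1], H(0.0, x1)
        for _ in range(N):
            lo, hi = xs[-1], 10.0        # H is increasing in its second argument
            for _ in range(100):
                m = (lo + hi) / 2
                lo, hi = (m, hi) if H(xs[-1], m) < c else (lo, m)
            xs.append((lo + hi) / 2)
        return xs

    lo, hi = 1e-9, 1.0                   # x_{N+1} is increasing in x1
    for _ in range(100):
        m = (lo + hi) / 2
        lo, hi = (m, hi) if extend(m)[-1] < 1.0 else (lo, m)
    return extend((lo + hi) / 2)[1:-1]

# uniform F, beta = 0, three thresholds: the solution is the quartiles
u = solve_star(lambda t: t, 3, 0.0)
# F(x) = x^2, beta = 0, one threshold: F(x1) = 1 - F(x1) gives x1 = sqrt(1/2)
q = solve_star(lambda t: t * t, 1, 0.0)
```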
Proposition 8.1. The differential equation system given by Equation
(8.2) is globally asymptotically stable. That is, given any initial x(0)
where 0 ≤ x1(0) ≤ x2(0) ≤ ... ≤ xN(0) ≤ 1, it follows that x(t) → x∗
as t→∞.
Proof. The proof proceeds by finding a Lyapunov function.63 Reversing
the usual order of the thresholds, for expositional clarity, the
second derivatives are given by
d²x_N/dt² = (H^N_1 − H^{N−1}_2) dx_N/dt − H^{N−1}_1 dx_{N−1}/dt, ...,
(8.3)
d²x_n/dt² = H^n_2 dx_{n+1}/dt + (H^n_1 − H^{n−1}_2) dx_n/dt − H^{n−1}_1 dx_{n−1}/dt, ...,
d²x_1/dt² = H^1_2 dx_2/dt + (H^1_1 − H^0_2) dx_1/dt,
where H^n_1 = H_1(x_n, x_{n+1}) < 0 and H^n_2 = H_2(x_n, x_{n+1}) > 0, for
compactness of notation.
Shifting to vector notation and using "dot" notation for derivatives,
for further compactness, Equations (8.2) and (8.3) can be written as
(8.4) ẋ = D(x) and ẍ = E(x)ẋ, respectively,
where vectors are by default column vectors and "T" denotes transpose
so that, for example, x^T = (x_N, x_{N−1}, ..., x_1).
The vector D(x) is implied by Equation (8.2); the matrix E(x) is given
as follows—
E(x) =
  [ H^N_1 − H^{N−1}_2    −H^{N−1}_1               0                       ...
    H^{N−1}_2            H^{N−1}_1 − H^{N−2}_2    −H^{N−2}_1              ...
    0                    H^{N−2}_2                H^{N−2}_1 − H^{N−3}_2   ...
    ...                  ...                      ...                     ... ].
63 See Mathematical Society of Japan (1987, 126F, p. 492), for example.
Define B_n, n = 1, ..., N, as the n-th principal nested minor of E(x),
where these minors are defined relative to the lower right corner of
E(x).
Lemma 8.2. The matrix E(x) is negative definite because the sign of
B_n is (−1)^n for n = 1, 2, ..., N.
Proof. From the definition of B_n, it follows that

  B_n = (H^n_1 - H^{n-1}_2) B_{n-1} + H^{n-1}_1 H^{n-1}_2 B_{n-2},

so that, rearranging,

  B_n - H^n_1 B_{n-1} = -H^{n-1}_2 (B_{n-1} - H^{n-1}_1 B_{n-2}).

It also follows that B_1 = H^1_1 - H^0_2 < 0 and B_2 - H^2_1 B_1 = H^1_2 H^0_2 > 0.
Hence the sign of B_n - H^n_1 B_{n-1} is (-1)^n.

Suppose, as an induction hypothesis, that the sign of B_{n-1} is (-1)^{n-1}.
Since B_n = (B_n - H^n_1 B_{n-1}) + H^n_1 B_{n-1}, it follows that the sign of
B_n is (-1)^n, as required to complete the proof of Lemma 8.2.
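The sign pattern in Lemma 8.2 can be checked numerically. The sketch below builds a matrix of the form E(x) from arbitrary values H^n_1 < 0 and H^n_2 > 0 (the particular numbers are illustrative assumptions, not derived from the model) and verifies that the nested minors B_n, taken from the lower right corner, alternate in sign.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 6
# H1[n] stands for H_1(x_n, x_{n+1}) < 0; H2[n] for H_2(x_n, x_{n+1}) > 0.
H1 = -rng.uniform(0.5, 2.0, N + 1)
H2 = rng.uniform(0.5, 2.0, N + 1)

# Build E(x) with columns ordered (x_N, x_{N-1}, ..., x_1), as in the text.
E = np.zeros((N, N))
for i in range(N):
    n = N - i                      # row i is the equation for x_n
    E[i, i] = H1[n] - H2[n - 1]    # diagonal entry H^n_1 - H^{n-1}_2
    if i > 0:
        E[i, i - 1] = H2[n]        # coefficient on dx_{n+1}/dt
    if i < N - 1:
        E[i, i + 1] = -H1[n - 1]   # coefficient on dx_{n-1}/dt

# B_n is the determinant of the lower-right n x n block; its sign is (-1)^n.
signs = [np.sign(np.linalg.det(E[N - n:, N - n:])) for n in range(1, N + 1)]
print(signs)  # alternating, starting negative
```

Since the induction in the proof uses only the signs of H^n_1 and H^n_2, the alternation holds for any admissible choice of these values.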
Global asymptotic stability now follows. Define a Lyapunov function as

  V(x) = ẋ^T ẋ = D(x)^T D(x), so that V(x) ≥ 0 and V(x) = 0 iff x = x*.

Hence, since E(x) is negative definite,

  V̇ = 2 ẋ^T ẍ = 2 ẋ^T E(x) ẋ ≤ 0, and V̇ = 0 iff x = x*.

It follows that the ordinary differential equation system given by
Equation (8.2) is globally asymptotically stable. That is, given any
initial x(0) with 0 ≤ x_1(0) ≤ x_2(0) ≤ ... ≤ x_N(0) ≤ 1, it must be that
x(t) → x* as t → ∞. This completes the proof of Proposition 8.1.
We can now complete the proof of Theorem 3.2. Suppose that F_G is the
cdf representing the unique invariant distribution of the Markov chain
with transition matrix A_G. Extend F_G to be defined on the entire space
S. By compactness, it follows that there exists a subsequence of the
F_G that converges weakly to a cdf F defined on S (Billingsley, 1968,
Chapter 1). That is, F_G ⇒ F as G → ∞. We will show that F puts
full weight on the singleton x*. Once this is shown, it follows that the
full sequence must also converge to this F.

Suppose then, by way of contradiction, that F does not put full weight
on x*, so that ∫ V(x) dF(x) > 0.
Reconsider then the construction that led to the differential equation
system that approximates the Markov chain, as described from the
beginning of this Appendix. Choose any x ∈ S, where x ≠ x* and
0 ≤ x_1 < x_2 < ... < x_N ≤ 1. Now let x_G be any of the points in S_G
that are closest to x. Let x^Δ_G(x) be the random variable describing the
Markov chain at t = Δ that starts at x_G at t = 0. Consider now the
limit as G → ∞, so that the number of repetitions in the fixed time
Δ, given by R = ⌊Δ/ε⌋, also tends to infinity. Suppose x^Δ(x) is the
solution to Equation (8.2), that is, to ẋ = D(x), at t = Δ, given it has
initial value x at t = 0.

Given that x ≠ x* and 0 ≤ x_1 < x_2 < ... < x_N ≤ 1, it follows
that V(x^Δ(x)) < V(x), since we showed that V̇ < 0 on [0, Δ]. By
hypothesis, ∫ V(x) dF(x) > 0. It follows that

(8.5)  ∫ V(x^Δ(x)) dF(x) < ∫ V(x) dF(x).

That this inequality holds in the limit implies that it must hold for
large enough G, as follows.

First, the derivation of the approximating system ẋ = D(x) implies, in
particular, that

(8.6)  E V(x^Δ_G(x)) → V(x^Δ(x)) as G → ∞.
It now follows that

  | ∫ E V(x^Δ_G(x)) dF_G(x) − ∫ V(x^Δ(x)) dF(x) |
    ≤ | ∫ E V(x^Δ_G(x)) dF_G(x) − ∫ V(x^Δ(x)) dF_G(x) |
    + | ∫ V(x^Δ(x)) dF_G(x) − ∫ V(x^Δ(x)) dF(x) |.

The first term on the right hand side tends to zero as G → ∞, by the
Lebesgue dominated convergence theorem, given Equation (8.6). The
second term on the right hand side also tends to zero as G → ∞, since
F_G ⇒ F and the integrand is continuous. Hence

(8.7)  ∫ E V(x^Δ_G(x)) dF_G → ∫ V(x^Δ(x)) dF as G → ∞.
Secondly, since F_G ⇒ F as G → ∞ and V is continuous, it follows
that

(8.8)  ∫ V(x) dF_G(x) → ∫ V(x) dF(x).

Altogether, then, Equations (8.5), (8.7) and (8.8) imply that, whenever
G is sufficiently large,

(8.9)  ∫ E V(x^Δ_G(x)) dF_G(x) < ∫ V(x) dF_G(x),

which is a contradiction, since F_G is the invariant distribution.
To show this explicitly, revert to matrix notation for the finite Markov
chain with transition matrix A_G. Suppose then that f_G is the column
vector describing the associated invariant distribution, so that
f_G^T = f_G^T A_G. As before, let R = ⌊Δ/ε⌋. We have

  E V(x^Δ_G(x)) = Σ_{x ∈ S_G} [e(x)^T (A_G)^R](x) V(x),

where e(x) is the unit vector that assigns 1 to x ∈ S_G and 0 to all
other elements of S_G. It follows that Equation (8.9) becomes

  Σ_{x ∈ S_G} [f_G^T (A_G)^R](x) V(x) < Σ_{x ∈ S_G} f_G^T(x) V(x),

which is a contradiction, since f_G^T (A_G)^R = f_G^T. This completes the
proof of Theorem 3.2.
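The final step turns on nothing more than the identity f_G^T (A_G)^R = f_G^T, which forces the expectation of V under the invariant distribution to equal the average of V itself, ruling out the strict inequality (8.9). A minimal numerical sketch, using an assumed 3-state transition matrix (the entries and the values of V are illustrative, not derived from the model):

```python
import numpy as np

# Illustrative 3-state stochastic matrix (rows sum to 1).
A = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])

# Invariant distribution f: the left eigenvector of A for eigenvalue 1.
w, v = np.linalg.eig(A.T)
f = np.real(v[:, np.argmin(np.abs(w - 1.0))])
f = f / f.sum()

R = 7                                  # number of repetitions, as in R = floor(Delta/eps)
V = np.array([3.0, 1.0, 2.0])          # arbitrary values of V on the three states

lhs = f @ np.linalg.matrix_power(A, R) @ V   # analogue of the left side of (8.9)
rhs = f @ V                                  # analogue of the right side of (8.9)
print(abs(lhs - rhs) < 1e-12)                # the two sides coincide
```

Since f^T A^R = f^T for every R, the two integrals in (8.9) are identical under any invariant distribution, whatever V is.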
Proof of Theorem 3.3.

Lemma 8.3. Consider a uniform distribution with pdf 1/s on the interval
[0, s]. The loss from choosing at random, relative to the full
information ideal, is s/6.

The expected payoff from choosing randomly between the two arms is
clearly s/2. The expected payoff from choosing the higher of the two
arms, on the other hand, as under the full information ideal, is 2s/3.
To see this, note that

  K(y) = Pr{max{y_1, y_2} < y} = Pr{y_1 < y and y_2 < y} = (y/s)^2,

so that ∫_0^s y dK(y) = 2s/3. It follows that the expected loss from
choosing at random is 2s/3 − s/2 = s/6, proving Lemma 8.3.
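Lemma 8.3 is easy to confirm by simulation. The sketch below (with an assumed interval length s = 2) estimates both expected payoffs by Monte Carlo:

```python
import numpy as np

s = 2.0                                        # interval length; illustrative value
rng = np.random.default_rng(1)
y = rng.uniform(0.0, s, size=(2, 1_000_000))   # payoffs of the two arms

random_pick = y[0].mean()                      # choosing an arm at random: about s/2
best_pick = y.max(axis=0).mean()               # full-information ideal: about 2s/3
print(best_pick - random_pick)                 # about s/6
```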
Define now the expected fitness loss, relative to the full information
ideal, for the step function pdf as in the statement of Theorem 3.4, to
be L(N). It follows that

(8.10)  L = (1/6) Σ_{m=1}^{M} (n_m − 1) s_m (α_m s_m)^2 + Σ_{m=1}^{M} d_m.

Here, n_m is the number of thresholds that lie in the subinterval
[(m−1)/M, m/M]; these must be evenly spaced, a distance s_m apart, except
at the ends of the subinterval. In the intervals that overlap m/M, the
expected loss is d_m, say.

This expression for L(N) holds because the loss between each pair of
adjacent thresholds in [(m−1)/M, m/M] is s_m/6, conditional on both
outcomes being in that range; there are n_m − 1 such ranges; and the
probability of both outcomes lying in each range is (α_m s_m)^2.
The limiting equilibrium of the rule of thumb entails

  H(x_{n−1}, x_n) = H(x_n, x_{n+1}) = H(x_{n+1}, x_{n+2}).

If x_n ≤ m/M < x_{n+1}, it follows that H(x_{n−1}, x_n) = H(x_{n+1}, x_{n+2}),
so that

  α_m (s_m)^{1+β} = α_{m+1} (s_{m+1})^{1+β},

since s_m = x_n − x_{n−1} and s_{m+1} = x_{n+2} − x_{n+1}. It follows that
there exists λ such that

  s_m = λ (α_m)^{−1/(1+β)}, m = 1, ..., M.

It also follows that

  d_m ≤ (1/6) ᾱ^2 s̄^3, where ᾱ = max_m α_m and s̄ = 2 max_m s_m.
Furthermore,

  (n_m − 1) s_m ≤ 1/M and, since (n_m + 1) s_m ≥ 1/M, (n_m − 1) s_m ≥ 1/M − 2 s_m.

Each value of N induces a corresponding value of λ; further, λ → 0 as
N → ∞.
The foregoing implies that

  L ≥ (1/6) Σ_{m=1}^{M} (1/M − 2 s_m)(α_m s_m)^2 and
  L ≤ (1/6) Σ_{m=1}^{M} (1/M)(α_m s_m)^2 + (M/6) s̄^3 ᾱ^2.

There exists η such that s̄ ≤ ηλ. Since it is also true that
s_m = λ (α_m)^{−1/(1+β)}, m = 1, ..., M, it follows that

  L/λ^2 → Σ_{m=1}^{M} α_m^{2β/(1+β)} / (6M) as λ → 0.
Furthermore, since (n_m − 1) s_m ∈ [1/M − 2 s_m, 1/M], it follows that

  n_m ≤ α_m^{1/(1+β)}/(Mλ) + 1 and n_m ≥ α_m^{1/(1+β)}/(Mλ) − 1.

Since Σ_m n_m = N, it follows that

  Nλ ∈ [Σ_m α_m^{1/(1+β)}/M − Mλ, Σ_m α_m^{1/(1+β)}/M + Mλ].

Thus

  Nλ → Σ_m α_m^{1/(1+β)}/M as λ → 0.

Hence

  N^2 L → Σ_m α_m^{2β/(1+β)} (Σ_m α_m^{1/(1+β)})^2 / (6M^3) as N → ∞.
Lemma 8.4. The expression Σ_m α_m^{2β/(1+β)} (Σ_m α_m^{1/(1+β)})^2 is
minimized uniquely by the choice β = 1/2.

This follows from the Hölder inequality (Royden, 1988, p. 119), since

  Σ_m α_m^{2β/(3(1+β))} α_m^{2/(3(1+β))} = Σ_m α_m^{2/3}
    ≤ (Σ_m α_m^{2β/(1+β)})^{1/3} (Σ_m α_m^{1/(1+β)})^{2/3}.

Furthermore, equality can only hold here if α_m^{2β/(1+β)} is
proportional to α_m^{1/(1+β)} across m; that is, only if β = 1/2.

It follows that, when β = 1/2,

  N^2 L → (Σ_m α_m^{2/3})^3 / (6M^3).
This completes the proof of Theorem 3.3.
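The role of β = 1/2 in Lemma 8.4 can be checked by direct computation. The sketch below evaluates the expression Σ_m α_m^{2β/(1+β)} (Σ_m α_m^{1/(1+β)})^2 on a grid of β values, for assumed step heights α_m (the numbers are illustrative):

```python
import numpy as np

alpha = np.array([0.5, 1.0, 2.0, 4.0])    # illustrative step heights

def objective(beta):
    p = 1.0 / (1.0 + beta)
    return (alpha ** (2 * beta * p)).sum() * (alpha ** p).sum() ** 2

betas = np.linspace(0.05, 3.0, 600)
best = betas[np.argmin([objective(b) for b in betas])]
print(best)                                # close to 0.5

# At beta = 1/2 the expression collapses to (sum_m alpha_m^{2/3})^3,
# the bound given by the Holder inequality.
print(np.isclose(objective(0.5), (alpha ** (2 / 3)).sum() ** 3))   # True
```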
Proof of Theorem 3.4.

We will now show that the optimal rule has the same leading term as the
rule of thumb with β = 1/2. Suppose that the expected loss from the
optimal placement of the thresholds, relative to the full information
ideal, is given by L*. An entirely similar argument to that used for L
shows, upon multiplying by N^2, that

  N^2 L* ∈ [(1/6) Σ_{m=1}^{M} (1/M − 2 s_m)(α_m N s_m)^2,
            (1/6) Σ_{m=1}^{M} (1/M)(α_m N s_m)^2 + (M N^2/6) s̄^3 ᾱ^2].

Consider the vector (n_1/N, ..., n_m/N, ..., n_M/N). By compactness,
there must exist a convergent subsequence such that n_m/N → γ_m as
N → ∞. We will characterize the γ_m uniquely, so that the entire
sequence must then converge to these values as well.

It must be that s_m → 0 for all m as N → ∞, since otherwise it would not
be true that L* → 0, contradicting the optimality of L*. It follows from
(n_m − 1) s_m ≤ 1/M and (n_m − 1) s_m ≥ 1/M − 2 s_m that n_m s_m → 1/M.

If γ_m = 0, it follows from (n_m/N)(N s_m) ≥ 1/M − s_m that N s_m → ∞.
This implies that N^2 L* → ∞, which is not optimal. Hence we have
N s_m → 1/(M γ_m).

It follows now that N s̄ is bounded above and that s̄ = 2 max_m s_m → 0
as N → ∞. Hence

  N^2 L* → (Σ_m α_m^2/γ_m^2)/(6M^3).

The optimal rule must minimize this expression over the choice of the
γ_m ≥ 0, m = 1, ..., M, where Σ_m γ_m = 1. Since this function is convex
in the γ_m ≥ 0, m = 1, ..., M, the first-order conditions are necessary
and sufficient for a global minimum. There must then exist a λ such that
α_m^2/γ_m^3 = λ^3, so γ_m = α_m^{2/3}/λ. It follows that λ = Σ_m α_m^{2/3}.
Hence

  N^2 L* → (Σ_m α_m^{2/3})^3/(6M^3).

This completes the proof of Theorem 3.4.
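The closing optimization can also be verified numerically: with γ_m proportional to α_m^{2/3}, the limit Σ_m α_m^2/γ_m^2 equals (Σ_m α_m^{2/3})^3, and no other feasible allocation does better. A sketch with assumed α_m values:

```python
import numpy as np

alpha = np.array([0.5, 1.0, 2.0, 4.0])            # illustrative step heights

def loss(gamma):
    # leading term of N^2 L*, up to the common factor 1/(6 M^3)
    return (alpha ** 2 / gamma ** 2).sum()

gamma_star = alpha ** (2 / 3) / (alpha ** (2 / 3)).sum()
print(np.isclose(loss(gamma_star), (alpha ** (2 / 3)).sum() ** 3))   # True

# Random feasible allocations (gamma > 0, summing to 1) never beat gamma_star.
rng = np.random.default_rng(2)
worse = [loss(rng.dirichlet(np.ones(alpha.size))) for _ in range(1000)]
print(min(worse) >= loss(gamma_star) - 1e-9)                          # True
```

This matches the limiting loss of the rule of thumb with β = 1/2, as the theorem asserts.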
References

[1] Stephen A. Baccus and Markus Meister. Fast and slow contrast adaptation in retinal circuitry.
Neuron, 36:909–919, 2002.
[2] Gary S. Becker. Human Capital. Columbia University Press, New York, 1964.
[3] Patrick Billingsley. Convergence of Probability Measures. John Wiley and Sons, New York,
1968.
[4] Tommy C. Blanchard, Lauren S. Wolfe, Ivo Vlaev, Joel S. Winston, and Benjamin Y. Hayden.
Biases in preferences for sequences of outcomes in monkeys. Cognition, 130:289–299, 2014.
[5] Christopher J. Burke, Michelle Baddeley, Philippe Tobler, and Wolfram Schultz. Partial
adaptation of obtained and observed value signals preserves information about gains and
losses. Journal of Neuroscience, 36(39):10016–25, 2016.
[6] Andrew Caplin and Mark Dean. Dopamine, reward prediction, and economics. Quarterly
Journal of Economics, 123(2):663–701, 2008.
[7] Andrew E. Clark, Sarah Fleche, Richard Layard, Nattavudh Powdthavee, and George Ward.
The Origins of Happiness: The Science of Well-Being Over the Life Course. Princeton
University Press, Princeton and Oxford, 2018.
[8] Leo P. Crespi. Quantitative variation of incentive and performance in the white rat. The
American Journal of Psychology, 55(4):467–517, 1942.
[9] Christian Dustmann and Costas Meghir. Wages, experience and seniority. Review of Economic
Studies, 72:77–108, 2005.
[10] Robert H. Frank and Robert M. Hutchens. Wages, seniority, and the demand for rising
consumption profiles. Journal of Economic Behavior and Organization, 21:251–276, 1993.
[11] Shane Frederick and George Loewenstein. Hedonic adaptation. In Daniel Kahneman, Edward
Diener, and Norbert Schwarz, editors, Well-Being: The Foundations of Hedonic Psychology,
pages 302–329. Russell Sage Foundation Press, New York, 1999.
[12] Todd A. Hare, Wolfram Schultz, Colin F. Camerer, John P. O'Doherty, and Antonio Rangel.
Transformation of stimulus values into motor commands during simple choice. Proceedings
of the National Academy of Sciences of the USA, 108:18120–25, 2011.
[13] Gerhard Jocham, Tilmann A. Klein, and Markus Ullsperger. Dopamine-mediated reinforcement
learning signals in the striatum and ventromedial prefrontal cortex underlie value-based
choices. Journal of Neuroscience, 31:1606–13, 2011.
[14] Daniel Kahneman, Barbara L. Fredrickson, Charles A. Schreiber, and Donald Redelmeier.
When more pain is preferred to less: Adding a better end. Psychological Science, 4(6):401–405,
1993.
[15] Daniel Kahneman and Amos Tversky. Prospect theory: An analysis of decision under risk.
Econometrica, 47(2):263–291, 1979.
[16] Daniel Kahneman, Peter P. Wakker, and Rakesh Sarin. Back to Bentham? Explorations of
experienced utility. Quarterly Journal of Economics, 112(2):375–406, 1997.
[17] Mel W. Khaw, Paul W. Glimcher, and Kenway Louie. History-dependent adaptation in subjective
value: A waterfall illusion for choice. Center for Neural Science, New York University, 2017.
[18] Simon Laughlin. A simple coding procedure enhances a neuron's information capacity.
Zeitschrift für Naturforschung C, 36:910–912, 1981.
[19] Edward Lazear. Agency, earnings profiles, productivity, and hours restrictions. American
Economic Review, 71:606–620, 1981.
[20] Kenway Louie and Paul W. Glimcher. Efficient coding and the neural representation of value.
Annals of the New York Academy of Sciences, 1251:13–32, 2012.
[21] Mathematical Society of Japan, Kiyoshi Ito, ed. Encyclopedic Dictionary of Mathematics,
Third Ed. MIT Press, Cambridge, MA, 1987.
[22] Frederick Mosteller and Philip Nogee. An experimental measurement of utility. Journal of
Political Economy, 59(5):371–404, 1951.
[23] Nick Netzer. Evolution of time preferences and attitudes towards risk. American Economic
Review, 99(3):937–55, 2009.
[24] Matthew Rabin. Risk aversion and expected-utility theory: A calibration theorem.
Econometrica, 68:1281–1292, 2000.
[25] Luis Rayo and Gary Becker. Evolutionary efficiency and happiness. Journal of Political
Economy, 115(2):302–337, 2007.
[26] Alfonso Renart and Christian K. Machens. Variability in neural activity and behavior. Current
Opinion in Neurobiology, 25:211–220, 2014.
[27] Arthur J. Robson. A biological basis for expected and non-expected utility. Journal of
Economic Theory, 68(2):397–424, 1996.
[28] Arthur J. Robson. The biological basis of economic behavior. Journal of Economic Literature,
39(1):11–33, 2001.
[29] Arthur J. Robson. Adaptive hedonic utility and risk-taking. Working paper, Simon Fraser
University, 2018.
[30] Arthur J. Robson and Larry Samuelson. The evolutionary optimality of decision and experienced
utility. Theoretical Economics, 6:311, 2011.
[31] Robert D. Rogers. The roles of dopamine and serotonin in decision making: Evidence from
pharmacological experiments in humans. Neuropsychopharmacology, 36:114–132, 2011.
[32] H. L. Royden. Real Analysis. Prentice Hall, Englewood Cliffs, NJ, 1988.
[33] David A. Schkade and Daniel Kahneman. Does living in California make people happy? A
focusing illusion in judgments of life satisfaction. Psychological Science, 9(5):340–346, 1998.
[34] Wolfram Schultz. Neuronal reward and decision signals: From theories to data. Physiological
Reviews, 95:853–951, 2015.
[35] Wolfram Schultz. Dopamine reward prediction error coding. Dialogues in Clinical
Neuroscience, 18(1), 2016.
[36] Michael N. Shadlen and Daphna Shohamy. Decision making and sequential sampling from
memory. Neuron, 90:927–39, 2016.
[37] William R. Stauffer, Armin Lak, and Wolfram Schultz. Dopamine reward prediction error
responses reflect marginal utility. Current Biology, 2014.
[38] Philippe N. Tobler, Christopher D. Fiorillo, and Wolfram Schultz. Adaptive coding of reward
value by dopamine neurons. Science, 307:1642–1645, 2005.
[39] D. J. Tolhurst, J. A. Movshon, and A. F. Dean. The statistical reliability of signals in single
neurons in cat and monkey visual cortex. Vision Research, 23(8):775–785, 1983.
[40] Martin Vestergaard and Wolfram Schultz. Choice mechanisms for past, temporally extended
outcomes. Proceedings of the Royal Society: Series B, 282:20141766, 2015.
[41] Joshua A. Weller, Irwin P. Levin, Baba Shiv, and Antonio Bechara. Neural correlates of
adaptive decision making for risky gains and losses. Psychological Science, 18(11):958–64,
2007.
[42] Marco K. Wittmann, Nils Kolling, Rei Akaishi, Bolton K. H. Chau, Joshua W. Brown, Natalie
Nelissen, and Matthew F. S. Rushworth. Predictive decision making driven by multiple time-linked
reward representations in the anterior cingulate cortex. Nature Communications, 2016.
DOI 10.1038/ncomms12327.
[43] Michael Woodford. Inattentive valuation and reference-dependent choice. Department of
Economics, Columbia University, 2012.