
The Short-Run and Long-Run Effects of Behavioral Interventions:

Experimental Evidence from Energy Conservation

Hunt Allcott and Todd Rogers*

May 30, 2013

Abstract

Interventions to affect behaviors such as smoking, exercise, workplace effort, or pro-social behaviors can often have large short-run impacts but uncertain or disappointing long-run effects. We study the three longest-running sites of a larger program designed to induce energy conservation, in which home energy reports containing personalized feedback, social comparisons, and energy conservation information are being repeatedly mailed to more than six million households across the United States. We show that treatment group households reduce electricity use within days of receiving each of their initial few reports, but these immediate responses decay rapidly in the months between reports. As more reports are delivered, the average treatment effect grows but the high-frequency pattern of action and backsliding attenuates. When a randomly-selected group of households has reports discontinued after two years, the effects are much more persistent than they had been between the initial reports, implying that households have formed a new "capital stock" of physical capital or consumption habits. We show how assumptions about long-run persistence can be important enough to change program adoption decisions, and we illustrate how program design that accounts for the capital stock formation process can significantly improve cost effectiveness.

JEL Codes: D03, D11, L97, Q41. Keywords: Energy efficiency, habit formation, persistence, social comparisons.

We thank Ken Agnew, David Cesarini, Gary Charness, Paul Cheshire, Xavier Gabaix, Francesca Gino, Uri Gneezy, Judd Kessler, Sendhil Mullainathan, Karen Palmer, Charlie Sprenger, staff at the utilities we study, and a number of seminar participants for feedback and helpful conversations. Thanks also to Tyler Curtis, Lisa Danz, Rachel Gold, Arkadi Gerney, Marc Laitin, Laura Lewellyn, Elena Washington, and many others at Opower for sharing data and insight with us. We are grateful to the Sloan Foundation for financial support of our research on the economics of energy efficiency. Stata code for replicating the analysis is available from Hunt Allcott's website.

*Allcott: NYU Department of Economics. [email protected]. Rogers: Harvard Kennedy School. [email protected].


1 Introduction

When making choices such as whether or not to smoke, exercise, eat healthfully, work or study hard, pay bills on time, and conserve resources, our choices sometimes differ from those that would maximize social welfare, or perhaps even our own long-run welfare. Individuals, employers, and government agencies have therefore experimented with many different kinds of interventions to "improve" behaviors, including financial incentives, information provision, commitment contracts, appeals to the public good, and social comparisons. In an effort to produce useful and timely insights about these interventions, evaluations often only examine short-run effects. The studies that do examine long-run effects often find that it is difficult to sustainably change behaviors.1

Given this, several related questions are crucial to designing and evaluating programs aimed at changing human behaviors. First, do people continue to respond to repeated intervention, or do we eventually stop paying attention? Second, how persistent are effects after the intervention ends? Third, do longer interventions cause more persistent post-intervention effects? These questions often determine whether an intervention is cost-effective, and understanding the answers can help optimize program design.

This paper is a comprehensive study of the short-run and long-run effects of an intervention that aims to induce energy conservation by sending home energy reports that feature personalized feedback, social comparisons, and energy conservation information. The reports are mailed to households monthly or every few months for an indefinite period of time. The program, which is managed by a company called Opower, is typically implemented as a randomized control trial, giving particularly credible estimates of its effects on energy use. Opower's programs are being implemented at 85 utilities across the United States, and there are now 8.9 million households in their experimental populations, including 6.2 million in treatment. Utilities hire Opower to implement the intervention primarily because the resulting energy savings help to comply with state regulations requiring energy conservation.

We study three Opower programs with three important features. First, these are the three longest-running Opower programs, having begun between early 2008 and early 2009. By the end of

1 For example, see Cahill and Perera (2009) for a review of the long-run effects of interventions to encourage smoking cessation, as well as a number of studies of exercise, weight loss, school performance, and other behaviors that we discuss later in the introduction.


our sample, some treatment group households have received home energy reports for 60 consecutive months, and one might wonder whether people attenuate to this repeated intervention over such a long time horizon. Second, a randomly-selected subset of treatment group households was dropped from treatment after about two years, allowing us to measure the persistence of effects for two to three years after the intervention stops. Third, while most utilities manually record household electricity use on a monthly basis, one of our three utilities uses advanced meters that record consumption each day. Although in recent years, millions of households have been outfitted with similar "smart meters" (Joskow 2012, Joskow and Wolfram 2012), the granularity of these data has generated privacy concerns that make them especially difficult to acquire for research. At this site alone, we have 225 million observations of daily electricity use at 123,000 de-identified households.

The data reveal a remarkable pattern of "action and backsliding": consumers reduce electricity use markedly within days of receiving each of their initial reports, but these immediate efforts decay at a rate that might cause the effects to disappear in well under a year if the treatment were not repeated. However, this cyclical pattern of action and backsliding attenuates after the first few reports: the immediate consumption decreases after report arrivals become much smaller, and the decay rate between reports becomes statistically indistinguishable from zero. Although the short-run pattern of cyclical responses attenuates, the average treatment effects do not: for the groups that continue to receive reports at the same frequency throughout our samples, the effects continue to grow, even after 60 consecutive months.

Compared to behavioral interventions in other domains, the effects are remarkably persistent. For the groups whose reports are discontinued after about two years, the effects begin to decay at about 10 to 20 percent per year. Furthermore, this decay rate is at least six times slower than the decay rate between the initial reports. This difference implies that as the intervention is repeated, people gradually develop a new "capital stock" that makes the effects persistent. This capital stock might be physical capital, such as new energy efficient lightbulbs, or "consumption capital", a stock of energy use habits in the sense of Becker and Murphy (1988).
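The gap between the two decay regimes can be made concrete with a small sketch. The numbers below are illustrative choices consistent with the ranges quoted in the text (a 20 percent per year post-intervention rate, and a between-reports rate six times faster), not the paper's estimated parameters:

```python
import math

def remaining_share(annual_decay_rate, years):
    """Share of an initial treatment effect remaining after `years`
    of constant exponential decay at `annual_decay_rate` per year."""
    return math.exp(-annual_decay_rate * years)

slow = 0.20      # post-intervention decay: upper end of the 10-20%/year range
fast = 6 * slow  # between-reports decay: "at least six times" faster, so the
                 # true between-reports rate could be higher still

# One year out, the fast rate leaves roughly 30% of the effect, while the
# slow post-intervention rate leaves roughly 82%.
print(round(remaining_share(fast, 1), 2))
print(round(remaining_share(slow, 1), 2))
```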

Tangibly, what are consumers doing in response to the intervention? To answer this, we analyze detailed surveys of 6,000 households in treatment and control groups across five sites. The results suggest that the intervention makes very little difference on the "extensive margin": that is, the set of energy conservation actions that households report taking. Although these self-reports should be interpreted very cautiously, one interpretation is that some of the behavior changes are on the "intensive margin," by which we mean that the program motivates households to take more of the same actions that they were already taking.

The surveys do suggest that the intervention increases participation in other utility-run energy efficiency programs, which involve improvements to physical capital stock, such as insulation or refrigerators, that would mechanically generate persistent energy savings. To corroborate this, we analyze two utilities' administrative data on program participation by treatment and control groups. We show that the intervention does cause statistically significant increases in utility program participation, but these differences are not economically large: they explain only a small share of the treatment effects. Our interpretation of the treatment effect patterns, the surveys, and the program participation data is that instead of responding in one or two major ways, different households respond by taking a wide variety of actions, no one of which constitutes an important share of the total reductions by the treatment group.

After presenting the empirical results, we turn to the economic implications. We show how long-run persistence can play an important role in whether or not it is cost-effective to adopt an intervention. In these three sites, different assumptions about persistence would have changed the ex-ante predicted cost effectiveness by a factor of two or more. One broader implication is that when deciding whether or not to scale up other interventions when long-run results such as these are not available, it will in some cases be optimal to first measure whether short-run effects are persistent, even if this measurement delays the decision-making process. In our context, it turns out that analysts had systematically made assumptions that were too conservative, but in other settings this may not be the case.
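A stylized illustration of why the persistence assumption matters (with made-up numbers, not the sites' actual figures): cost effectiveness is program cost divided by cumulative energy savings, so assuming effects vanish at the end of treatment, rather than decaying slowly, can easily move predicted cost per kWh by more than a factor of two:

```python
def cumulative_savings_kwh(initial_effect_kwh, annual_decay, years):
    """Total savings over `years` when the annual effect shrinks
    geometrically at rate `annual_decay` per year."""
    total, effect = 0.0, initial_effect_kwh
    for _ in range(years):
        total += effect
        effect *= (1.0 - annual_decay)
    return total

program_cost = 10.0  # hypothetical cost per household, in dollars

# Assumption A: the effect lasts one year and then vanishes entirely.
savings_a = cumulative_savings_kwh(200.0, 1.0, 10)
# Assumption B: the effect persists, decaying 20% per year.
savings_b = cumulative_savings_kwh(200.0, 0.2, 10)

cost_per_kwh_a = program_cost / savings_a
cost_per_kwh_b = program_cost / savings_b
# Cumulative savings under B are several times those under A, so the two
# assumptions imply predicted costs per kWh that differ by well over 2x.
```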

We also show how the dynamics of persistence can play an important role in designing behavioral interventions. In doing this, we conceptually distinguish two separate channels through which repeated intervention changes cumulative outcomes. The additionality effect reflects the extent to which repeated intervention increases effects during the treatment period, holding constant the post-intervention decay rate. The persistence effect captures the extent to which repeated intervention induces households to change physical capital or consumption habits, which in turn causes the changes in outcomes to last longer after the intervention ends. We quantify these effects in the context of the Opower program, showing that the persistence effect is the primary reason why repeated intervention improves cost effectiveness. For Opower and for other interventions in other contexts, this suggests that it is important to repeat an intervention until participants have developed a new "capital stock" of habits or other technologies. After that point, it may be optimal to reduce the frequency of intervention, unless the additionality effect is very powerful.
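The two channels can be separated in a stylized numerical decomposition (all parameters below are hypothetical): moving from a short to a repeated intervention both raises the end-of-treatment effect (additionality) and slows post-intervention decay (persistence), and the total gain in post-intervention savings splits exactly into those two pieces:

```python
def post_savings(end_effect, annual_decay, years):
    """Post-intervention savings when the end-of-treatment effect
    decays geometrically at `annual_decay` per year."""
    return sum(end_effect * (1.0 - annual_decay) ** t for t in range(years))

# Hypothetical parameters: a short intervention ends with a smaller effect
# that decays quickly; a repeated intervention ends with a larger effect
# that decays slowly.
short_effect, fast_decay = 1.0, 0.9
long_effect, slow_decay = 1.5, 0.2

total_gain = (post_savings(long_effect, slow_decay, 10)
              - post_savings(short_effect, fast_decay, 10))
# Additionality: the larger end-of-treatment effect, holding decay fixed.
additionality = (post_savings(long_effect, fast_decay, 10)
                 - post_savings(short_effect, fast_decay, 10))
# Persistence: the slower decay, evaluated at the larger effect.
persistence = (post_savings(long_effect, slow_decay, 10)
               - post_savings(long_effect, fast_decay, 10))

# The two channels sum exactly to the total gain; with these numbers the
# persistence channel dominates.
assert abs(total_gain - (additionality + persistence)) < 1e-12
```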

Our results are related to several different literatures. The action and backsliding in response to home energy reports is reminiscent of evidence that consumers "learn" about late fees and other charges as we incur them, but we act as if we forget that knowledge over time (Agarwal et al. 2011, Haselhuhn et al. 2012). For some consumers, the home energy report acts simply as a reminder to conserve energy, making this related to studies of reminders to save money (Karlan, McConnell, Mullainathan, and Zinman 2010) or take medicine (Macharia et al. 1992). Ebbinghaus (1885), Rubin and Wenzel (1996), and others have quantified the decay of memory and the functional form of "forgetting curves." Our results are novel in that they provide a clear illustration of how consumers' attention waxes and wanes in response to repeated stimuli, but this cyclical action and backsliding eventually attenuates.

There are also studies of the medium- and long-run effects of interventions to affect exercise (Charness and Gneezy 2009, Royer, Stehr, and Sydnor 2013), smoking (Giné, Karlan, and Zinman 2010, Volpp et al. 2009), weight loss (Anderson et al. 2010, Burke et al. 2012, John et al. 2011), water conservation (Ferraro, Miranda, and Price 2011, Ferraro and Price 2013), academic performance (Jackson 2010, Levitt, List, and Sadoff 2010), voting (Gerber, Green, and Shachar 2003), charitable donations (Landry et al. 2010), labor effort (Gneezy and List 2006), and other repeated choices. Compared to these studies, we document relatively persistent changes in outcomes over a relatively long time horizon.

Aside from being of scientific interest, these results have very concrete practical importance. Each year, electric and natural gas utilities spend several billion dollars on energy conservation programs in an effort to both reduce energy use externalities and ameliorate other market failures that affect investment in energy-using durable goods (Allcott and Greenstone 2012). Traditionally, one significant disposition of these funds has been to subsidize energy efficient investments, such as Energy Star appliances or home energy weatherization. Recently, there has been significant interest in "behavioral" energy conservation programs, by which is meant information, persuasion, and other non-price interventions.2 The Opower programs are perhaps the most salient example of this approach. One of the foremost questions on practitioners' minds has been the extent to which behavioral interventions have persistent long-run effects: while capital stock changes like new insulation are believed to reduce energy use for many years, it was not obvious what would happen after several years of home energy reports. Our results give some initial evidence on this issue.

The paper proceeds as follows. Section 2 gives additional background on the program and describes the data. Section 3 presents short-run analysis using high-frequency data, while Section 4 presents the long-run analysis. Section 5 presents evidence on the physical actions that consumers take in response to the intervention. Section 6 carries out the cost effectiveness analysis, and Section 7 concludes.

    2 Experiment Overview

    2.1 Background

Aside from selling energy, most large electric and natural gas utilities in the U.S. also run energy conservation programs, such as home energy audit and weatherization programs and rebates for energy efficient light bulbs and appliances. While energy conservation can reduce revenues for private investor-owned utilities, many states require utilities to fund conservation programs out of small surcharges called System Benefits Charges, and in recent years many states have passed Energy Efficiency Resource Standards that require utilities to cause consumers to reduce energy use by some amount relative to counterfactual, often 0.5 to 1 percent per year.

Opower is a firm that contracts with utilities to help meet these energy conservation requirements. Their "technology" is unusual: instead of renovating houses or subsidizing energy efficiency, they send two-page Home Energy Report letters to residential consumers every month or every several months. Figure 1 reproduces a home energy report for an example utility. The first page

2 Abrahamse et al. (2005) is a useful literature review of behavioral interventions centered around energy conservation, and Allcott and Mullainathan (2010b) cite some of the more recent work.


features a "Social Comparison Module," which compares the household's energy use from the previous month to that of 100 neighbors with similar house characteristics. The second page includes personalized energy consumption feedback, which varies from report to report. This feedback might include individualized comparisons to usage from the same season in previous years or trends in usage compared to neighbors. The second page also includes an "Action Steps Module," which provides energy conservation tips. These tips are drawn from a large library of possible tips, and they vary with each report. Some tips are targeted at specific types of households: for example, a household with relatively heavy summer usage is more likely to see information about purchasing energy efficient air conditioners.

The initial proof of concept that social comparisons could affect energy use was developed in a pair of papers by Nolan et al. (2008) and Schultz et al. (2007). There is also a body of evidence that social comparisons affect choices in a variety of domains, such as voting (Gerber and Rogers 2009), retirement savings (Beshears et al. 2012), water use (Ferraro and Miranda 2013, Ferraro and Price 2013) and charitable giving (Frey and Meier 2004), as well as a broader literature in psychology on social norms, including Cialdini, Reno, and Kallgren (1990) and Cialdini et al. (2006). There is now a mini-literature focusing on the Opower programs. Allcott and Mullainathan (2012) show that the average treatment effects across the first 14 Opower sites range from 1.4 to 2.8 percent of electricity use. Opower programs have also been studied by Allcott (2011), Ayres, Raseman, and Shih (2012), Costa and Kahn (2013), Davis (2011), and a number of industry practitioners and official program evaluators such as Ashby et al. (2012), Integral Analytics (2012), KEMA (2012), Opinion Dynamics (2012), Perry and Woehleke (2013), and Violette, Provencher, and Klos (2009).

Within this literature on Opower, our contributions are clear. First, we are the first to document consumers' "action and backsliding" using high-frequency data.3 Second, this is the first (publicly available) study that compares long-run persistence across multiple sites. Third, we bring together the short-run and long-run analyses to analyze how persistence affects cost effectiveness and optimal program design.

3 Although we are the first to be able to document it, this potential effect has been of previous interest. For example, Ayres, Raseman, and Shih (2012) test for what they call a "staleness effect" using monthly data for recipients of quarterly reports, but find no evidence that effects vary with the time since reports. It would have been unlikely for Ayres, Raseman, and Shih (2012) or others to even find suggestive evidence in monthly data because the report arrival dates do not match up well with the monthly billing cycles.


2.2 Experimental Design

We study the three longest-running Opower home energy report programs, which take place at three utilities which we have been asked not to identify directly. Table 1 outlines experimental design and provides descriptive statistics. The three sites are in different parts of the country: Site 1 is in the upper Midwest, with cold winters and mild summers, while Sites 2 and 3 are on the West coast.

The initial experimental populations across the three sites comprise 234,000 residential electricity consumers. The populations have several common features: all households must be single-family homes, have at least one to two years of valid pre-experiment energy use data, and satisfy some additional technical conditions.4 Some aspects of the populations differ across sites. Site 1 is a relatively small utility, and its entire residential customer population was included. Site 2 is a larger utility, with about 930,000 residential customers in 2007 (U.S. Energy Information Administration 2013). The utility decided to limit the program to households in particular parts of their service territory that purchase both electricity and natural gas, and they included only relatively heavy energy users (more than 80 million British thermal units per year). Site 3 is also a relatively large utility, with about 520,000 residential customers. In Site 3, Opower selected census tracts within the customer territory to maximize the number of eligible households.

The experimental populations were randomly assigned to treatment and control. In Site 3, which was Opower's first program ever, households were grouped into 952 geographically-contiguous "block batch" groups, each with an average of 88 households. This was done because of initial concern over geographic spillovers: that people would talk with their neighbors about the reports. No evidence of this materialized, and all programs since then, including Sites 1 and 2, have been randomized at the household level. In Sites 1 and 2, treatment group households were randomly assigned to receive either monthly or quarterly reports. In Site 3, heavier users were assigned to receive monthly reports, while lighter users were assigned to quarterly.

4 Typically, households in Opower's experimental populations need to have valid names and addresses, no negative electricity meter reads, at least one meter read in the last three months, no significant gaps in usage history, exactly one account per customer per location, and a sufficient number of neighbors to construct the neighbor comparisons. Households that have special medical rates or photovoltaic panels are sometimes also excluded. Utility staff and "VIPs" are sometimes automatically enrolled in the reports, and we exclude these non-randomized report recipients from any analysis. These technical exclusions eliminate only a small portion of the potential population.


The three experiments began between early 2008 and early 2009. After about two years, a subset of treatment group households was randomly selected to stop receiving reports. We call this group the "dropped group." The remainder of the treatment group, which we call the "continued group," is still receiving reports. In Sites 2 and 3, the entire continued group is still receiving reports at their original assigned frequency. In Site 1, however, the utility wished to taper the frequency for the continued group, so this group is now receiving biannual reports.

    2.3 Data for Long-Run Analysis

In the "long-run analysis," we analyze monthly billing data from three sites and estimate treatment effects over the past four to five years. The three utilities in our analysis bill customers approximately once a month, and our outcome variable is mean electricity use per day over a billing period. We therefore have about 12 observations per household per year, or 16.7 million total observations in the three sites.

In each site, we construct baseline usage from the earliest one-year period when we observe electricity bills for nearly all households. In each site, average baseline usage is around 30 kilowatt-hours (kWh) per day, or between 11,000 and 11,700 kWh per year. These figures are close to the mean residential electricity usage for 2007 that the utilities reported to the U.S. Energy Information Administration (2013), although in Sites 2 and 3 the slightly higher means reflect the fact that the populations were drawn from heavier users. The figures are also close to the national average of 11,280 kWh (U.S. Energy Information Administration 2011). For context, one kilowatt-hour is enough energy to run either a typical new refrigerator or a standard 60-watt incandescent lightbulb for about 17 hours. In the average American home, space heating and cooling are the two largest uses of electricity, comprising 26 percent of consumption. Refrigerators and hot water heaters use 17 and 9 percent of electricity, respectively, while lighting also uses about 9 percent (U.S. Energy Information Administration 2009). Appendix Figure A1 provides more detail on nationwide household electricity use.
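The kWh figures above check out with back-of-the-envelope arithmetic:

```python
# A standard 60-watt bulb running for 17 hours:
bulb_kwh = (60 / 1000) * 17  # 0.060 kW * 17 h = 1.02 kWh, roughly one kWh

# Baseline usage of ~30 kWh/day implies about 30 * 365 = 10,950 kWh/year,
# close to the reported 11,000-11,700 kWh annual range.
annual_kwh = 30 * 365
print(bulb_kwh, annual_kwh)
```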

The three utilities also have fairly standard pricing policies. Site 1 charges 10 to 11 cents/kWh, depending on the season. Sites 2 and 3 have increasing block schedules, with marginal prices of 8 to 11 cents/kWh and 8 to 18 cents/kWh, respectively, depending again on the season.


While there appear to be very few errors in the dataset, there are a small number of very high meter reads that may be inaccurate. We exclude any observations with more than 1,500 kilowatt-hours per day. In Site 2, for example, this is 0.00035 percent of observations. In all three sites, baseline energy usage is balanced between treatment and control groups, as well as between the dropped and continued groups within the treatment group.

We also observe temperature data from the National Climatic Data Center, which are used to construct heating degree-days (HDDs) and cooling degree-days (CDDs). The HDDs for a particular day are the difference between 65 degrees Fahrenheit and the mean temperature, or zero, whichever is greater. Similarly, the CDDs for a particular day are the difference between the mean temperature and 65 degrees, or zero, whichever is greater. For example, a day with average temperature 95 has 30 CDDs and zero HDDs, and a day with average temperature 60 has zero CDDs and 5 HDDs. HDDs and CDDs vary at the household level, as households are mapped to different nearby weather stations. Because heating and cooling are such important uses of electricity in the typical household, heating and cooling degrees are important correlates of electricity demand.
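The degree-day construction described above is simple to compute; a minimal sketch (assuming mean daily temperature in degrees Fahrenheit):

```python
def degree_days(mean_temp_f, base_f=65.0):
    """Return (HDD, CDD) for a single day relative to a 65F base."""
    hdd = max(base_f - mean_temp_f, 0.0)
    cdd = max(mean_temp_f - base_f, 0.0)
    return hdd, cdd

# The examples from the text:
print(degree_days(95))   # zero HDDs, 30 CDDs
print(degree_days(60))   # 5 HDDs, zero CDDs
```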

There is one source of attrition from the data: households that become "inactive," typically when they move houses. If a customer becomes inactive, he or she no longer receives reports after the inactive date, and in most cases we do not observe the electricity bills. In our primary specifications, we do include the households that become inactive, but we exclude any data observed after the inactive date. As Table 1 shows, 20 to 26 percent of households move in the four to five years after treatment begins, or about five percent per year. The table presents six tests of balanced attrition: treatment vs. control and dropped vs. continued in each of the three sites. One of those six tests rejects equality: in Site 1, dropped group households are slightly more likely to move than continued households. We are not very concerned that this could bias the results, because the two groups are balanced on pre-treatment usage, Figure 4a shows that the treatment effects during the joint treatment period are almost visually indistinguishable, and Appendix Table A5 confirms that the treatment effects are statistically indistinguishable during the first and second years of joint treatment.

There is also a source of attrition from the program: people in the treatment group can contact the utility and opt out of treatment. In these sites, about two percent of the treatment group has opted out since the programs began. We continue to observe electricity bills for households that opt out, and we of course cannot drop them from our analysis because this would generate imbalance between treatment and control. We estimate an average treatment effect (ATE) of the program, where by "treatment" we more precisely mean "receiving reports or opting out." Our treatment effects could also be viewed as intent-to-treat estimates, where by the end of the sample, the Local Average Treatment Effect on the compliers who do not opt out is about 1/0.98 times as large as our reported ATE. Because the opt-out rate is so low, we do not make any more of this distinction in our analysis. However, when calculating cost effectiveness, we make sure to include costs only for letters actually sent, not letters that would have been sent to households that opted out or moved.
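The 1/0.98 scaling is the standard intent-to-treat adjustment for non-compliance: with a roughly two percent opt-out rate, dividing the ATE by the share still treated recovers the effect on compliers (the ATE value below is a made-up illustration, not an estimate from the paper):

```python
opt_out_rate = 0.02  # about 2% of the treatment group opts out
ate = -0.60          # hypothetical ATE in kWh/day (illustrative only)

# LATE on compliers = ATE / (1 - opt-out rate), i.e. scaled up by 1/0.98.
scaling = 1.0 / (1.0 - opt_out_rate)  # about 1.02
late = ate * scaling
print(round(late, 4))
```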

    2.4 Data for Short-Run Analysis

In Sites 1 and 3, electricity meters are read each month by utility staff, who record the total consumption over the billing period. By contrast, Site 2 has Advanced Metering Infrastructure (AMI), which records daily electricity consumption. In the "short-run analysis," we exploit these daily energy use data to test for high-frequency variation in the average treatment effects (ATEs).

Table 2 presents Site 2's daily electricity use data. We separate the households into three different groups based on the frequency of the intervention. The first two are the monthly and quarterly groups from the initial experimental population of about 79 thousand households, which began in October 2008 and is discussed above. The third group is a "second wave" of about 44 thousand households from a nearby suburb that were added in February 2011. Half of these households were assigned to bimonthly treatment and half to control. Within each of the three frequency groups, all households were scheduled to receive reports on the same days. For this part of the analysis, we exclude the dropped group households in the monthly and quarterly groups after their reports are discontinued. This reduces the sample size somewhat after September 2010 but does not generate imbalance because the households were randomly selected.

    While pre-treatment usage is balanced between treatment and control for the monthly and quarterly groups, this is not the case for the bimonthly group that begins in February 2011: the treatment group's average pre-treatment usage is lower than its control group's by 0.69 kWh/day, with a robust standard error of 0.20 kWh/day. The reason is that the partner utility asked Opower to allocate these households to treatment and control based on odd vs. even street address numbers. Thus, it is especially important that we control for baseline usage when analyzing this group of households. After controlling appropriately, we have found no evidence that the imbalance biases our results. We include this group to provide additional supporting evidence, although readers may prefer to focus on the results from the monthly and quarterly groups.

    In order to analyze how daily average treatment effects respond to the receipt of home energy reports, we must predict when the reports actually arrive. In this experiment, all reports delivered in a given month for any of the three frequency groups are generated and mailed at the same time. Opower's computer systems generate the reports between Tuesday and Thursday of the first or second week of the month. The computer file of reports for all households in each utility is sent to a printing company in Ohio, which prints and mails them on the Tuesday or Wednesday of the following week. According to the U.S. Postal Service "Modern Service Standards," the monthly and quarterly groups are in a location where expected transit time is eight USPS "business days," which include Saturdays but not Sundays or holidays. The bimonthly group is in a nearby suburb where the expected transit time is nine business days. Of course, reports may arrive before or after the predicted day, and people may not open the letters immediately.
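    The transit-time rule just described can be sketched as follows (our own illustrative code; the paper's arrival predictions come from Opower's actual mailing dates, and the holiday handling and example mailing date here are ours): a mailing date is advanced by USPS "business days" that count Saturdays but skip Sundays and holidays.

```python
from datetime import date, timedelta

def predicted_arrival(mail_date, business_days, holidays=()):
    """Advance mail_date by USPS 'business days': Saturdays count,
    Sundays and any listed holidays do not."""
    d = mail_date
    remaining = business_days
    while remaining > 0:
        d += timedelta(days=1)
        if d.weekday() == 6 or d in holidays:  # skip Sundays and holidays
            continue
        remaining -= 1
    return d

# A hypothetical report mailed on Tuesday, October 14, 2008 with an
# eight-business-day transit time (no holidays in the window):
print(predicted_arrival(date(2008, 10, 14), 8))  # 2008-10-23
```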

    3 Short-Run Analysis

    3.1 Graphical

    We begin by plotting the average treatment effects for each day of the first year of the Site 2 experiment for the quarterly groups, using a seven-day moving window to smooth over idiosyncratic variation. These ATEs are calculated simply by regressing Y_it, household i's electricity use on day t, on treatment indicator T_i, for all days t within a seven-day window around day d. We control for a vector of three baseline usage variables Y^b_i: average baseline usage (January-December 2007), average summer baseline usage (June-September 2007), and average winter baseline usage (January-March and December 2007). We also include a set of day-specific constants α_t. For each day d, the regression is:

    Y_it = τ_d T_i + β Y^b_i + α_t + ε_it,  ∀t ∈ [d−3, d+3]    (1)
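    A minimal sketch of this moving-window estimator, using a difference-in-means version of Equation (1) on synthetic data (randomization makes the simple difference in means a valid ATE estimator; the paper's regression adds the baseline-usage controls and day dummies for precision, and all names and numbers here are ours):

```python
def window_ate(usage, treated, d, half=3):
    """Difference-in-means ATE over the seven-day window [d-3, d+3].

    usage: dict mapping household id -> list of daily kWh readings
    treated: dict mapping household id -> 0/1 treatment indicator
    """
    days = range(d - half, d + half + 1)
    t_obs = [usage[i][t] for i in usage for t in days if treated[i] == 1]
    c_obs = [usage[i][t] for i in usage for t in days if treated[i] == 0]
    return sum(t_obs) / len(t_obs) - sum(c_obs) / len(c_obs)

# Synthetic example: control households use 30 kWh/day; treated households
# cut use by 0.45 kWh/day from day 10 onward (a stylized report arrival).
usage, treated = {}, {}
for i in range(6):
    treated[i] = 1 if i < 3 else 0
    usage[i] = [30.0 - (0.45 if treated[i] and t >= 10 else 0.0)
                for t in range(30)]

print(round(window_ate(usage, treated, d=20), 2))  # -0.45
```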

    Figure 2 plots the quarterly group's ATEs τ_d with 90 percent confidence intervals, while Appendix Figure A2 plots both the monthly and quarterly groups. In this regression and all others in the paper, standard errors are robust and clustered at the household level to control for arbitrary serial correlation in ε_it, per Bertrand, Duflo, and Mullainathan (2004). Here and everywhere else in the paper, superscripts always index time periods; we never use exponents.

    Figure 2 shows that the τ_d coefficients increase rapidly around October 24th, 2008, the date when the first report is predicted to arrive. Two weeks before the predicted arrival date, the quarterly group's ATE is -0.03 kWh/day, with a 90 percent confidence interval of (-0.17, 0.11). By November 3rd, 10 days after the predicted arrival date, the ATE is -0.45 kWh/day, with a confidence interval of (-0.60, -0.31). This ATE is about 1.5 percent of average electricity use, which from Table 1 was about 30 kWh/day. Tangibly, this is equivalent to each treatment group household turning off seven standard 60-watt lightbulbs for an hour, every day. Between early November and early January, however, the treatment effect weakens by about 0.2 kWh/day. About half of the conservation efforts that were taken initially were abandoned within two months.
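    The lightbulb equivalences used throughout the paper follow from a simple unit conversion (a sketch; the helper name is ours): a 60-watt bulb draws 0.06 kWh per hour, so dividing a daily saving in kWh/day by 0.06 gives equivalent bulb-hours per day.

```python
BULB_KW = 0.06  # a standard 60-watt lightbulb draws 0.06 kW

def bulb_hours_per_day(kwh_per_day):
    """Convert a daily electricity saving into the equivalent number of
    hours one 60-watt bulb would have to be switched off each day."""
    return kwh_per_day / BULB_KW

# The -0.45 kWh/day ATE corresponds to about 7.5 bulb-hours per day,
# i.e., roughly seven bulbs off for an hour:
print(round(bulb_hours_per_day(0.45), 2))  # 7.5
```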

    The quarterly group's second report arrives on approximately January 23rd, 2009. In the 14 days between January 19th and February 2nd, the point estimates of the quarterly group's treatment effect increase in absolute value from -0.25 to -0.58. These effects similarly decay away until mid-April, when the third report arrives. Around this and the fourth report, the effects similarly jump and decay, although these cyclical responses appear to become less pronounced.

    This initial presentation of raw data makes clear the basic trends in households' responses to the treatment. However, the standard errors are wide, and the point estimates fluctuate quite a bit on top of this basic potential pattern of jumps and decays. Furthermore, holidays and weather are likely to influence the treatment effects. Collapsing across multiple report arrivals helps to address this, by reducing standard errors and smoothing over idiosyncratic variation. In addition, controlling for weather can both increase efficiency and, if weather is correlated with report arrivals, remove bias when estimating trends in effects associated with report arrivals. For example, if the second or third report happened to arrive in an extremely cold week, the treatment effects would likely have been stronger in that week even if the report had arrived a week later. Failing to control for weather could cause us to falsely attribute such treatment effect fluctuations to immediate responses to report arrivals.

    We therefore control for weather and estimate the treatment effects in "event time," meaning days before and after predicted report arrival. We include a set of report-specific treatment effects τ_t, the interaction of T_i with indicators for the window around each individual report. This controls for the fact that the treatment effect could be weaker or stronger over the entire window around each particular report, due to seasonality or other factors. We index the 48-hour periods before and after report arrival by a and define an indicator variable A^a_t that takes value 1 if day t falls in 48-hour period a relative to a report arrival date. Denote π^a as the ATE for each period a. The omitted period is the day of and the day after predicted report arrival, so the π^a coefficients represent the difference between the ATE for period a and the ATE for those two days. The vector M_it includes heating degrees and cooling degrees on day t at the weather station closest to household i. The event time regression is:

    Y_it = τ_t T_i + Σ_a π^a A^a_t T_i + γ_1 T_i M_it + γ_2 M_it + β Y^b_i + α_t + ε_it    (2)
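    The mapping from calendar days to event-time periods can be sketched as follows (our own illustrative code, assuming 48-hour periods counted from the predicted arrival day, with period 0 the omitted day-of/day-after pair):

```python
def event_period(day, arrival_day):
    """Index the 48-hour event-time period a for a given day.

    Period 0 covers the arrival day and the day after (the omitted
    period); period 1 the next two days; period -1 the two days before.
    """
    offset = day - arrival_day
    return offset // 2  # floor division handles negative offsets correctly

assert event_period(100, 100) == 0 and event_period(101, 100) == 0
assert event_period(102, 100) == 1 and event_period(103, 100) == 1
assert event_period(99, 100) == -1 and event_period(98, 100) == -1
```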

    Figure 3a plots the π^a coefficients and 90 percent confidence intervals for the monthly, quarterly, and bimonthly groups using data around each group's first four reports. Within each group, the effects follow remarkably similar patterns in event time. The treatment group decreases consumption by about 0.2 kWh/day in the several days around the predicted arrival date. Some reports arrive and are opened before the predicted date, which causes consumption to decrease before a = 0. The absolute value of the treatment effect reaches its maximum eight to ten days after the report arrives. After that point, the treatment effect decays, as the treatment group's conservation efforts diminish. This decay is difficult to observe for the monthly group, because before much decay happens, the next report arrives, causing event time to re-start. For the quarterly group, the treatment effect decays by 0.2 kWh/day between 10 days and 80 days after the report arrival.

    Figure 3b is analogous to Figure 3a, except that it uses data for all reports beginning with the fifth report. The treatment effects are very close to constant in event time. Coefficients for the bimonthly group are very imprecisely estimated because there are only six reports delivered to this group, meaning that this graph is estimated off of the event windows around only two reports. The treatment group does appear to decrease consumption just after the arrival of these fifth and sixth reports.

    3.2 Empirical Strategy

    We now formally estimate the "action and backsliding" patterns suggested in the figures. First, we estimate the magnitude of the increases in the absolute value of the treatment effect around the report arrival window. Second, we estimate the rate of decay in the treatment effect between reports.

    Define S^0_t as an indicator variable for the "arrival period": the seven days beginning three days before and ending three days after the predicted arrival date. S^1_t is an indicator for the seven-day period after that, which we call the "post-arrival period," and S^{-1}_t is an indicator for the seven-day period before. Define S^a_t = S^{-1}_t + S^0_t + S^1_t as an indicator for all 21 days in that window. As above, τ_t is a vector of indicators for the window around each individual report, M_it captures heating and cooling degrees near household i on day t, Y^b_i is the three seasonal baseline usage controls, and α_t are day-specific dummies. The regression is:

    Y_it = (τ_t S^a_t + θ^0 S^0_t + θ^1 S^1_t + τ) · T_i + γ_1 T_i M_it + γ_2 M_it + β Y^b_i + α_t + ε_it    (3)

    The coefficient θ^1 reflects the change in the treatment effect in period S^1 relative to period S^{-1}, while τ captures the treatment effect for days not included in the 21-day windows where S^a_t = 1.
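    These window definitions can be sketched directly (our own illustrative code; day offsets are measured from the predicted arrival date):

```python
def arrival_window(day, arrival_day):
    """Classify day t into the S^{-1}, S^0, S^1 windows around a
    predicted report arrival, or None outside the 21-day window."""
    offset = day - arrival_day
    if -10 <= offset <= -4:
        return "S-1"  # seven-day pre-arrival period
    if -3 <= offset <= 3:
        return "S0"   # seven-day arrival period
    if 4 <= offset <= 10:
        return "S1"   # seven-day post-arrival period
    return None

assert arrival_window(97, 100) == "S0"
assert arrival_window(106, 100) == "S1"
assert arrival_window(92, 100) == "S-1"
assert arrival_window(115, 100) is None
```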

    To estimate the decay rate, we define an indicator variable S^w_t to take value 1 if day t is in a window beginning eight days after a predicted arrival date and ending four days before the earliest arrival of a subsequent report. The variable d_t equals the number of days past the beginning of that window, divided by 365. For example, for a t that is 18 days after a predicted arrival date, d_t takes value (18-8)/365. Thus, the coefficient on d_t, denoted λ, measures the decay of the treatment effect over the window S^w in units of kWh/day per year. The regression is:

    Y_it = (τ_t S^w_t + λ d_t S^w_t + τ) · T_i + γ_1 T_i M_it + γ_2 M_it + β Y^b_i + α_t + ε_it    (4)
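    A sketch of the decay-window variables (our own code; the next-arrival argument stands in for "the earliest arrival of a subsequent report"):

```python
def decay_vars(day, arrival_day, next_arrival_day):
    """Return (S_w, d) for day t: S_w flags the decay window starting
    eight days after the predicted arrival and ending four days before
    the next report's earliest arrival; d counts days into the window,
    expressed in years."""
    start = arrival_day + 8
    end = next_arrival_day - 4
    if start <= day <= end:
        return 1, (day - arrival_day - 8) / 365.0
    return 0, 0.0

# A day 18 days after the predicted arrival has d = (18 - 8)/365:
s_w, d = decay_vars(118, 100, 190)
print(s_w, round(d * 365))  # 1 10
```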

    In this model, treatment effects decay linearly over time. One might hypothesize that the decay process could be convex or concave, and it is almost certainly unrealistic to extrapolate beyond the time when the predicted treatment effect reaches zero. However, the time between reports is not long enough for the effects to return to zero or to precisely estimate non-linearities. We therefore use the linear model for simplicity.

    3.3 Results

    Analogously to Figures 3a and 3b, we run the above regressions separately for the initial set of four reports and all reports after that initial set. Table 3a presents the estimates of Equation (3) for the windows around the first four reports. There are three pairs of columns, for the monthly, quarterly, and bimonthly groups. Within each pair, the regression on the right includes the degree-day controls γ_1 T_i M_it and γ_2 M_it from the weather station nearest to household i, while the one on the left does not. In all cases where they are statistically significant, the γ̂_2 coefficients on heating and cooling degrees are positive, as less mild weather causes more electricity use, while the γ̂_1 coefficients are negative, as more counterfactual electricity use is associated with stronger (more negative) treatment effects. These weather controls would have even more explanatory power if the day-specific dummies α_t and the report-specific treatment effects τ_t were excluded.

    Across the six regressions, the coefficient θ^1 on the T·S^1 interaction ranges from -0.129 to -0.201 kWh/day. This means that between the week before the seven-day arrival windows and the week after those windows, the average household that receives a letter reduces consumption relative to counterfactual by the equivalent of about three 60-watt lightbulbs for one hour. The coefficients are very robust to the weather controls, implying that the treatment group's immediate conservation is caused by the report arrivals, not by the weather simultaneously becoming less mild.

    These results are especially remarkable in that they highlight how much of consumers' responses to the intervention happen almost immediately after receiving reports. Consider first the monthly group. Multiplying the incremental post-arrival period effect θ̂^1 by four gives a total decrease of 0.72 kWh/day - the equivalent of turning off a standard 60-watt lightbulb for an additional 12 hours. This means that if the intervention's only effect were to generate immediate action in the post-arrival period, and if that immediate action were sustained over time, the treatment effect after the first four reports would be -0.72 kWh/day. However, the average treatment effect just before the fifth report (the monthly group's τ̂_d estimated by Equation (1) for February 13, 2009) is -0.52. The reason for this discrepancy is that the treatment group's immediate action is not sustained over time - the effects decay in the intervening days between the seven-day post-arrival period S^1 and the arrival of the next report.

    The story is the same for the quarterly group. Multiplying θ̂^1 by four gives a total decrease of 0.8 kWh/day. Thus, if these immediate actions were sustained, the treatment effect after the first four reports would be -0.8 kWh/day. In contrast, the ATE just before the fifth report is -0.35. This discrepancy is larger than for the monthly group because the quarterly group has three times longer to backslide on its immediate actions.

    Table 4a formally measures this backsliding using Equation (4). The estimates of λ vary across the three groups, but the coefficients are positive in all regressions and statistically positive in all but one. The quarterly and bimonthly estimates are highly robust to weather controls. For the monthly group, the point estimates differ somewhat when weather is included. This is because relative to the quarterly and bimonthly groups, the monthly group has shorter event windows S^w that can be used to estimate λ̂, and because the sample period is limited to the first four reports, there are fewer days that can be used to estimate the weather controls γ̂. The monthly estimates for the first four reports in both Tables 3a and 4a do not include cooling degrees because all days in this period between October 2008 and February 2009 have zero cooling degrees.

    To put the magnitudes of λ in context, focus on the estimates for the quarterly group, controlling for weather. A λ̂ of 0.7 means that a treatment effect of -0.7 kWh/day would decay to zero in one year, if the linear decay continued to hold. Thus, the jump in treatment effects of θ̂^1 = -0.2 from column (4) of Table 3a would decay away fully within just over three months. This never happens, because the next report arrives less than three months after the window S^w begins.
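    That three-month figure is simple arithmetic (a sketch using the point estimates quoted above):

```python
# Time for a treatment-effect jump to decay to zero under linear decay:
# |jump| / decay rate, converted from years to days.
jump = 0.2        # kWh/day jump after a report (theta^1, in absolute value)
decay_rate = 0.7  # kWh/day per year (lambda-hat for the quarterly group)

days_to_zero = jump / decay_rate * 365
print(round(days_to_zero))  # 104 days, i.e., just over three months
```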

    Tables 3b and 4b replicate Tables 3a and 4a for the remainder of the samples, beginning with the fifth report. Table 3b shows that in the monthly and quarterly groups from the first wave, the θ^1 coefficients are still statistically significant, but they are only about one-fifth the magnitude of θ̂^1 for the initial four reports. The coefficient for the bimonthly group, however, is relatively large. Because this is estimated off of only the fifth and sixth reports, it is difficult to infer much of a pattern. It could be that there are unobserved moderators of the treatment effects that coincide with these reports, or that the information included in these particular reports was different in a particularly compelling way. The standard errors are wider, and the estimates are not statistically different from the estimates from the first four reports.

    Table 4b shows that there is no statistically significant decay of the effects after the first four reports. Interestingly, all of the point estimates of λ are positive, suggesting that there may still be some decay, but the event windows are not long enough for precise estimates. This highlights the importance of the next section, in which we exploit the discontinuation of reports to estimate a decay rate over a much longer period: two years instead of two to ten weeks.

    Appendix Tables A1a-b and A2a-b replicate Tables 3a-b and 4a-b with two sets of additional robustness checks. The left column of each pair includes more flexible functions of weather in place of the original M_it.5 The right column of each pair excludes outliers: all observations of Y_it greater than 300 kWh/day and all households i with average baseline usage greater than 150 kWh/day, which is five times the mean. Based on our inspection of the data, these high-usage observations appear to be correct, not measurement errors. However, they implicitly receive significant weight in the OLS estimation, so a small number of high-usage households could in theory drive the results. Neither of these alternative specifications meaningfully changes the results, and the exact point estimates change only slightly.

    All households in all treatment groups receive reports around the same day of the month, typically between the 19th and the 25th. One might worry that our results could somehow be spuriously driven by underlying monthly patterns in the treatment effect. Of course, these underlying patterns would have to take a very specific form: they would need to generate cycles in treatment effects that begin in October 2008 and eventually attenuate for the monthly and quarterly groups, then appear beginning in February 2011 for second wave households but do not re-appear for the monthly and

    5This function was based on inspection of the relationship between ATEs and degree days for this site. The relationship is close to linear in heating and cooling degree days, as assumed for the primary regressions, but not exactly. For this robustness check, M_it contains six variables: 1(CDD_it > 0), CDD_it, 1(0 < HDD_it ≤ 5), 1(5 < HDD_it ≤ 35), HDD_it · 1(5 < HDD_it ≤ 35), and 1(HDD_it > 35).

    quarterly groups. We can explicitly test for spurious monthly patterns by exploiting the differences in report frequencies to generate placebo report arrivals. We focus on the monthly vs. quarterly frequencies, because they are randomly assigned, and consider only the period after the first four reports, because before that, the quarterly ATE changes significantly in the time between reports. If there were spurious day-of-month effects, the quarterly group's treatment effects would jump in absolute value at the times when the monthly group receives reports but the quarterly group does not. Appendix Table A3 shows that the θ̂^0 and θ̂^1 coefficients for these placebo report arrival dates are statistically zero and economically small relative to those estimated in Tables 3a and 3b.

    4 Long-Run Analysis

    In the long-run analysis, we ask two questions. First, do treatment effects grow as the intervention continues, or do people eventually attenuate? Second, do treatment effects persist after the intervention is discontinued?

    4.1 Graphical

    For the long-run analysis, we analyze the household-by-month billing data at each of the three sites.

    We do not study the bimonthly treatment group from Site 2, because this "second wave" began

    much later, so long-run outcomes are not yet available.

    We first plot the time path of the average treatment effects over the sample for both the continued and dropped groups, for each of the three sites. For this graph, we combine the monthly and quarterly groups. Analogously to the short-run graphical analysis, we use a three-month moving window to smooth over idiosyncratic variation. The variables D_i and E_i are indicator variables for whether household i was assigned to the dropped group and the continued group, respectively. Both variables take value 0 if the household was assigned to the control group, meaning that D_i + E_i = T_i. The sets of coefficients τ^D_n and τ^E_n are the average treatment effects for the three-month window around month n for the dropped and continued groups, respectively. The variable m indexes calendar months. We include month-by-year controls for baseline usage, denoted β_m Y^b_im, where Y^b_im is household i's average usage in the same calendar month during the baseline period. The α_m are month-by-year intercepts.

    For each month n, the regression is:

    Y_im = τ^D_n D_i + τ^E_n E_i + β_m Y^b_im + α_m + ε_im,  ∀m ∈ [n−1, n+1]    (5)

    In this regression, standard errors are clustered at the level of randomization to allow for serial correlation over time, per Bertrand, Duflo, and Mullainathan (2004). We cluster by household in Sites 1 and 2, and by block batch in Site 3.

    Figures 4a-4c present the results for Sites 1-3. The y-axis is the treatment effect, which is negative because the treatment causes a reduction in energy use. The three figures all illustrate the same basic story. To the left of the first vertical line, the intervention has not yet started, and the treatment effects are statistically zero. The effects grow fairly rapidly over the intervention's first year, after which the growth rate slows. Until the second vertical line, both the continued and dropped groups receive the same treatment, and the effects for the two groups are indistinguishable, as would be expected due to random assignment. The average treatment effects in the second year range from 0.7 to 1.0 kWh/day, or about three percent of total consumption. After the dropped group's last report, the effects begin to decay relative to what they had been during the intervention, but the effects are remarkably persistent. The dropped group's ATEs seem to diminish by about 0.1 to 0.15 kWh/day each year.

    The effects are highly seasonal. In all three sites, effects are stronger in the winter compared to the adjacent fall and spring. Although the great majority of households in the populations primarily use natural gas instead of electricity for heat, the fans for natural gas heating systems use electricity, and many homes also have portable electric heaters. In Sites 1 and 3, the effects are also stronger in the summer compared to the fall and the spring. This suggests that an important way in which people respond to the treatment is to reduce heating and cooling energy, either through reducing utilization or perhaps changing to more energy efficient physical capital stock. In Site 2, the average daily temperature in July is a mild 67 degrees, so air conditioner use is more limited, and the treatment effects are relatively weak in the summer. In Site 3, the monthly point estimates jump around more because of the block batch-level randomization, but they do not move more than we would expect given the confidence intervals and underlying seasonality.

    The graphs also illustrate that the continued groups do not attenuate to treatment, even after four years. In Sites 2 and 3, continued group households continue to receive reports at their original assigned monthly or quarterly frequencies, and their effects strengthen throughout the sample. In Site 1, however, the continued group's effects begin to diminish in 2011. This diminution is associated with the reduction in frequency to biannual reports from the monthly and quarterly mix of the first two years.

    In Sites 1 and 2, households were randomly assigned to monthly and quarterly frequencies. Does the monthly group attenuate more quickly to the intervention? Does the more frequent contact induce more capital stock formation and thus more persistence? Figure 5a presents the ATEs over time for the monthly and quarterly groups in Site 2, with standard errors omitted for legibility. The solid line is the monthly group, while the dot-dashed line is the quarterly group. As treatment ends for the dropped group, a second line splits off for each frequency, representing the ATEs for the subset of households that were dropped.

    Remarkably, even after more than four years of home energy reports, the monthly continued group's ATE continues to strengthen. Households receiving quarterly reports also continue to respond more intensely as long as treatment is continued. By January 2013, the ATEs for the continued monthly and quarterly groups are -1.22 and -1.08 kWh/day. The fact that the monthly ATE is much less than three times the quarterly ATE means that there are sharply diminishing marginal returns to treatment intensity in this range. The point estimates suggest that the two effects are gradually converging, suggesting that the marginal returns might actually be close to zero.

    During the first two years after treatment is discontinued, the effects for the dropped groups at each frequency decay in similar proportion to their initial treatment effects. Although the standard errors on these point estimates are wide, the dropped monthly and quarterly group ATEs also may be converging. This again suggests sharply diminishing marginal returns to treatment intensity. Figure 5b replicates the same analysis for Site 1. Here the results are even more stark: throughout the sample, households assigned to monthly vs. quarterly treatment frequencies have statistically indistinguishable effects.

    4.2 Empirical Strategy

    For the long-run analysis, we break the samples into four periods. Period 0 is the pre-treatment period, period 1 is the first year of treatment, and period 2 runs from the beginning of the second year to the time when treatment is discontinued for the dropped group. Period 3 is the post-drop period: the remainder of the sample after the dropped group is discontinued. We denote P^p_m as indicator variables for whether month m is in period p. The variable r_m measures the time (in years) since the beginning of period 3. Analogous to the short-run analysis, M_im represents two weather controls: average heating degrees and average cooling degrees for household i in month m. The primary estimating equation is:

    Y_im = (τ^0 P^0_m + τ^1 P^1_m + τ^2 P^2_m) · T_i
         + ω D_i P^3_m + φ E_i P^3_m
         + λ^LR r_m D_i P^3_m + δ r_m E_i P^3_m
         + γ_1 M_im (P^2_m + P^3_m) · T_i + γ_2 M_im (P^2_m + P^3_m)
         + β_m Y^b_im + α_m + ε_im    (6)

    In the first line of this equation, the coefficients τ^0, τ^1, and τ^2 are ATEs for the treatment groups (the continued and dropped groups combined) for the pre-treatment period and the two joint treatment periods. The second and third lines parameterize the treatment effects for the continued and dropped groups in the post-drop period. The coefficient λ^LR captures the treatment effect decay rate for the dropped group, while δ measures the change in the continued group treatment effect. Because r_m has units in years, the units on λ^LR and δ are kWh/day per year. Because r_m takes value zero at the beginning of the post-drop period, ω and φ reflect fitted treatment effects for the dropped and continued groups, respectively.

    The fourth line includes controls for the interaction of (P^2_m + P^3_m) · T_i with heating and cooling degrees M_im. This means that the τ^2, ω, and φ coefficients reflect fitted treatment effects for a month when the mean temperature each day is 65 degrees. Similarly, the λ^LR and δ coefficients reflect changes in effects after controlling for weather-related fluctuations. Controlling for weather is important because if temperatures were more (less) mild later in the post-drop period, this would likely make the treatment effects weaker (stronger), which would otherwise load onto λ^LR and δ. In other words, when we estimate λ^LR and δ unconditionally, we measure all causes of persistence, while when we condition on changes in the economic environment such as weather, we more closely capture changes in underlying consumer behaviors.
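    The period assignment and the post-drop clock r_m can be sketched as follows (our own illustrative code; the integer month indices and the period boundary dates are hypothetical placeholders, not the sites' actual dates):

```python
def long_run_vars(month, treat_start, year2_start, drop_month):
    """Assign an integer month index to periods 0-3 and compute r,
    the years elapsed since the start of the post-drop period."""
    if month < treat_start:
        period = 0  # pre-treatment
    elif month < year2_start:
        period = 1  # first year of treatment
    elif month < drop_month:
        period = 2  # second year until discontinuation
    else:
        period = 3  # post-drop period
    r = max(0, month - drop_month) / 12.0
    return period, r

# With treatment starting in month 12, year two in month 24, and the
# dropped group's reports discontinued in month 36:
assert long_run_vars(6, 12, 24, 36) == (0, 0.0)
assert long_run_vars(30, 12, 24, 36) == (2, 0.0)
assert long_run_vars(42, 12, 24, 36) == (3, 0.5)
```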

    4.3 Statistical Results

    Table 5 begins by simply estimating the average treatment effects for each period. The estimating equation includes only the first, second, and fifth lines of Equation (6), omitting the growth rate parameters and weather controls. The estimates closely map to the illustrations in Figures 4a-4c. Treatment effects are statistically zero in the pre-treatment period, rising in magnitude to 0.45 to 0.64 kWh/day in the first year and 0.66 to 0.87 kWh/day for the remainder of the joint treatment period. In tangible terms, a treatment effect of -0.66 kWh/day means that the average treatment group household took actions equivalent to turning off a standard 60-watt lightbulb for about 11 hours each day. Recalling that average usage is around 30 kWh/day, this corresponds to two percent of electricity use.

    Table 5 demonstrates basic evidence of persistence: in all three sites, treatment e¤ects for the

    dropped group are at least somewhat weaker in the post-drop period compared to the second year.

    This also provides basic evidence that people do not attenuate to the intervention: treatment e¤ects

    for the continued group are at least somewhat stronger in the post-drop period than before.

    Table 6 presents estimates of the λ^LR and δ parameters. For each site, the left column presents estimates without conditioning on weather, corresponding to Equation (6) with the fourth line omitted. The right column presents estimates conditional on weather, corresponding exactly to Equation (6). The λ^LR and δ coefficients are the bottom two coefficients in each column. As in the short-run analysis, the coefficients are highly insensitive to the weather controls.

In Sites 2 and 3, people do not habituate to the treatment during the post-drop period: the negative δ coefficient means that the treatment effect is getting stronger (more negative) as time moves into the future. The δ parameter is not quite statistically significant for Site 2 (p=0.127), although the point estimate is negative and Figure 4b seems to illustrate that the continued group's treatment effect becomes more negative through 2011, 2012, and 2013. In Site 3, the point estimate δ = -0.08 implies that every year, the average continued group household does the equivalent of turning off one additional 60-watt lightbulb for another 80 minutes. In this site, the treatment effect is becoming about 10 percent stronger each year. This result is quite remarkable: even as the quarterly group receives its 20th home energy report, and the monthly group receives its 60th, consumers have not habituated to the intervention.
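Both Site 3 figures follow directly from the -0.08 kWh/day-per-year trend (a sketch; the 0.865 kWh/day second-year treatment effect is taken from Table 5 as discussed below):

```python
# Translating the estimated trend of -0.08 kWh/day per year into lightbulb-minutes,
# and into a percentage of the roughly 0.865 kWh/day second-year treatment effect.
trend_kwh_per_day_per_year = 0.08
bulb_kw = 0.060          # a 60-watt bulb
ate_kwh_per_day = 0.865  # approximate second-year ATE (Table 5)

extra_minutes_per_day = trend_kwh_per_day_per_year / bulb_kw * 60
pct_stronger_per_year = trend_kwh_per_day_per_year / ate_kwh_per_day * 100

print(round(extra_minutes_per_day))   # about 80 minutes
print(round(pct_stronger_per_year))   # about 9, i.e. roughly 10 percent
```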

In Site 1, the δ coefficient is positive. This matches the graphical result in Figure 4a. As we have discussed, this is associated with the decrease in treatment intensity. Of course, a decrease in efficacy does not necessarily imply a decrease in cost effectiveness. In Section 6, we analyze this issue in more detail.

The δ^LR parameters are our primary measure of persistence. They range from 0.09 kWh/day per year in Site 3 to 0.18 kWh/day per year in Site 1. If the linear trend continues, the effects would not return to zero until five to ten years after treatment was discontinued. Based on Figures 4a to 4c, we hypothesize that the decay rate will tail off over time, meaning that the effects will be more persistent than the linear decay model predicts, but not enough time has passed to meaningfully test this formally. To the extent that the linear model understates persistence, our cost effectiveness projections in the next section are conservative.
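The five-to-ten-year horizon is just the effect at discontinuation divided by the linear decay rate (a sketch; the decay rates are the estimated range above, while the effect sizes are illustrative values of the magnitude reported in Table 5, not site-specific point estimates):

```python
# Years until a linearly decaying effect reaches zero: effect at drop / decay rate.
# Decay rates are the estimated delta^LR range; effect sizes are illustrative
# values of the magnitude reported in Table 5.
def years_to_zero(effect_kwh_per_day, decay_kwh_per_day_per_year):
    return effect_kwh_per_day / decay_kwh_per_day_per_year

print(round(years_to_zero(0.87, 0.18), 1))   # fastest decay: about five years
print(round(years_to_zero(0.865, 0.09), 1))  # slowest decay: about ten years
```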

One of the basic questions we asked in the introduction was whether longer interventions lead to more persistent post-intervention effects. We can answer this for Site 2 by comparing the estimated long-run decay rate δ^LR to the decay rate between each of the first four reports, the δ estimated in the previous section. In five of the six specifications, this estimated δ was between 0.7 and 1.4 kWh/day per year, which is six to 12 times faster than δ^LR. This implies that between the first four reports and the time when treatment is discontinued, the dropped group forms some kind of "capital stock" which causes substantially more persistence. In the next two sections, we discuss the potential causes and consequences of this process.6

6We note that the long-run persistence is measured from one to four years after the period when the short-run decay rate is measured. It is possible that changes in macroeconomic conditions or other time-varying factors might cause differences in these decay rates. Ultimately, however, we are not very concerned with this issue.
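The six-to-twelve-times comparison is simply a ratio of the two decay rates (a sketch; the 0.117 kWh/day-per-year long-run rate is an assumed illustrative value consistent with the stated ratios, not the estimated Site 2 coefficient):

```python
# Ratio of the short-run decay rate (between the first four reports) to the
# long-run post-discontinuation decay rate. The long-run rate here is an
# assumed illustrative value, not the Site 2 point estimate.
long_run_decay = 0.117                    # assumed, kWh/day per year
short_run_decays = [0.7, 1.4]             # range across five of six specifications

ratios = [d / long_run_decay for d in short_run_decays]
print([round(r) for r in ratios])         # roughly six to 12 times faster
```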

In Figures 5a and 5b, we saw that the monthly frequency generates at best only slightly larger effects than quarterly, but it was not clear whether the effects decay at different rates. Relatedly, Allcott (2011) documents that the program causes more conservation by heavier baseline users. Appendix Table A4 formally confirms these results for Sites 1 and 2, but the standard errors are too large to infer much about whether effects decay proportionally faster or slower for the different frequency groups or for heavier baseline users.

The online appendix also presents several robustness checks. Appendix Table A5 estimates the differences in ATEs between continued and dropped groups for all periods. In all three sites, the effects do not differ statistically in the pre-treatment and joint treatment periods, and the point estimates are almost exactly the same. Tables A6 and A7 replicate Tables 5 and 6, except excluding all data for households that move at any point. These checks are important because by the end of the sample, 20 to 26 percent of households have moved. Even though this share is balanced between treatment and control, if the movers had systematically different treatment effects, this could cause the estimated treatment effects to change over time. This would in turn bias our inference about persistence. However, the coefficients are statistically and economically the same when limiting to this balanced panel.

    5 Mechanisms

Concretely, what actions underlie the observed effects? In many behavioral interventions, this question can be difficult to answer. For example, if a program incentivizes people to lose weight, it may not be clear how much of the observed weight loss comes from exercise vs. reduced calorie intake, and within these categories, what form of exercise is the most useful and what foods have been cut out of their diets. In our setting, we can use utility program participation data and households' self-reported actions to shed some light on two questions. First, what do treatment group households report that they are doing differently than control group households? Second, how much of the effect comes from large observed changes to the physical capital stock?


5.1 Surveys of Self-Reported Actions

In the past two years, Opower has surveyed about six thousand people in treatment and control groups in six sites nationwide, including 800 people in Site 2. The surveys are conducted via telephone on behalf of the utility, and for practical reasons no effort is made to obscure the fact that the surveys are about energy use and conservation behaviors. Completion rates are typically between 15 and 25 percent of attempts. Because these are self-reported actions and the majority of people refuse, the results should be interpreted with caution.

In these surveys, respondents are first asked if they have taken any steps to reduce energy use in the past 12 months. Those who report that they have are asked whether or not they have taken a series of specific actions in the past twelve months. The particular actions vary by survey, but many of the same actions are queried in multiple sites. We group actions into three categories: repeated actions such as switching off power strips and turning computers off at night, physical capital changes such as purchasing Energy Star appliances, and intermittent actions such as replacing air filters on air conditioning or heating systems.

Table 7 presents the survey data. Column (1) presents the share of all respondents that report taking the action in the past 12 months. Column (2) presents the difference between the shares of respondents in treatment vs. control that report taking the action. As a robustness check, column (3) presents the difference in shares in treatment vs. control after controlling for five respondent characteristics: gender, age, whether homeowner or renter, education, and annual income.

There is little difference in self-reported actions across treatment and control groups. The only difference that is consistent across different surveys is that treatment group households are more likely to report having had a home energy audit. Audits often include installation of new compact fluorescent lightbulbs, which typically last several years until they must be replaced. Audits are also typically required before moving forward with larger investments such as weather sealing or new insulation. Data from one utility in a relatively warm climate show that its treatment group is more likely to report using fans to keep cool instead of running air conditioners. Across all sites, the treatment group is more likely to report participating in utility energy efficiency programs, although the difference is not statistically significant in the specification that conditions on observables.

For each of the three categories, the first row (in bold) presents a test of whether the average probability of taking actions in each category differs between treatment and control. When aggregating in this way across actions and across sites, our standard errors are small enough to rule out with 90 percent confidence that the intervention increases the probability of reporting energy conservation actions by more than one to two percent. Throughout Table 7, this lack of statistical significance would only be further reinforced by adjusting the p-values for multiple hypothesis testing.

There are multiple interpretations of these results. First, the true probabilities of taking actions might differ between treatment and control, but the surveys might not pick this up due to demand effects, respondents' general tendency to systematically over-report, biased non-response, and the fact that different respondents might interpret questions differently. Second, the treatment could cause small changes in the true probability of taking a wide variety of actions. Such changes could potentially add up to the observed effects on electricity use even though no one action accounts for much on its own. Third, it is possible that the intervention does not change whether or not people take specific actions to conserve energy, which is what Table 7 reports, but instead changes the intensity with which some people take actions they were already taking. In other words, an important impact of the intervention could be not to give information about new ways to conserve, but instead to increase attention and motivation to conserve in more of the same ways.

5.2 Utility Energy Efficiency Program Participation

Perhaps the only area where the survey data suggest differences between treatment and control is participation in utility-sponsored energy efficiency programs. Like many utilities across the country, the utilities we study run a series of other energy conservation programs that subsidize or directly install energy efficient capital stock. The utilities maintain household-level program participation data, which are primarily used to estimate the total amount of energy conserved through each program. These household-level data are also useful in estimating whether the Opower intervention affects energy use through an increase in program participation. For this study, we examine the administrative data from Sites 2 and 3.

In Site 3, the utility offers rebates or low-interest loans in three categories: appliances, "home improvement," and HVAC (heating, ventilation, and air conditioning). For appliances, the utility mails $50 to $75 rebate checks to consumers who purchase energy efficient clothes washers, dishwashers, or refrigerators. To claim the rebate, the homeowner needs to fill out a one-page rebate form and mail it with the purchase receipt and a current utility bill to the utility within 30 days of purchase. For "home improvement," the utility offers up to $5,000 in rebates for households that install better insulation or otherwise retrofit their homes in particular ways. For HVAC, the utility offers $400 to $2,000 for energy efficient central air conditioning systems or heat pumps, or $50 for energy efficient window air conditioners. Most home improvement and HVAC jobs are done by contractors. Some consumers probably buy energy efficient appliances and window air conditioners without claiming the utility rebates, and thus these capital stock changes might be unobserved in the data. However, because the home improvement and HVAC rebates are larger, and because the contractors coordinate with the utility and facilitate the rebate process, consumers who undertake these large physical capital improvements are very likely to claim the rebates and thus be observed in the administrative data.

The top panel of Table 8 presents Site 3's program participation statistics for the first two years of the program, from April 2008 through June 2010. In total, 3,855 households in the experimental population participated in one of the three programs, a relatively high participation rate compared to other utilities. In column (1), we list relatively optimistic estimates of the savings that might accrue for the average participant. Column (3) presents the difference in participation rates between treatment and control, in units of percentage points ranging from 0 to 100. The results confirm the qualitative conclusions from the household surveys: the treatment group is 0.417 percentage points more likely to participate in energy conservation programs. Out of every 1,000 households in control, 44 participated, while about 48 out of 1,000 treatment group households participated.

How much of the treatment effect does this explain? Table 5 shows that by the program's second year, it is causing an 865 Watt-hour/day (0.865 kilowatt-hour/day) reduction in energy use. Column (4) multiplies the difference in participation rates by the optimistic estimate of savings, showing that the difference in program participation might cause energy use to decrease by 14 Watt-hours per day. Thus, while there are statistically significant changes in program participation, this explains less than two percent of the treatment effect.
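The "less than two percent" figure is simply the ratio of the participation-driven savings to the total treatment effect (a sketch using the two numbers above):

```python
# Share of the Site 3 treatment effect explained by incremental program participation.
total_effect_wh_per_day = 865.0           # second-year ATE (Table 5)
participation_savings_wh_per_day = 14.0   # incremental participation x optimistic savings

share = participation_savings_wh_per_day / total_effect_wh_per_day
print(round(share * 100, 1))   # about 1.6 percent, i.e. less than two percent
```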

The short-run and long-run treatment effects suggest a shift in the composition of responses over time toward capital stock changes. For this reason, it is also important to examine program participation data from later in the program. Therefore, we also examine similar administrative data for 2011 in Site 2. This utility offers a similar set of programs as Site 3, except that the exact rebate amounts may vary, and some rebate forms can be submitted online instead of by mail. In these data, we observe more precisely the action that the consumer took, as well as the utility's estimate of the electricity savings.

The bottom panel of Table 8 presents these data. The most popular programs are clothes washer rebates, insulation, removal of old energy-inefficient refrigerators and freezers, installation of low-flow showerheads, energy efficient windows, and compact fluorescent lightbulbs (CFLs). Savings are zero for insulation and windows because for regulatory purposes, the utility deems that these programs reduce natural gas use but not electricity use.

Columns (3) and (4) compare the takeup rates and implied electricity savings between the control group and the continued treatment group, which was still receiving home energy reports during 2011. There is a statistically significant difference for only one program: CFL replacement, which generates 2.25 Watt-hours/day of incremental savings in the continued treatment group. Using the estimates in the bottom row, which combine the savings across all programs, the upper bound of the 90 percent confidence interval on savings is about 6 Watt-hours/day. By contrast, the continued group's treatment effect in the post-drop period was (negative) 870 Watt-hours/day (0.870 kilowatt-hours/day), which was an increment of 181 Watt-hours/day compared to the year before. Thus, as in Site 3, only a small fraction of the savings is due to participation in utility energy efficiency programs.7

    5.3 Summary: The Dynamics of Household Responses

Taken together, our analyses of short-run and long-run treatment effects, along with information on program participation and self-reported actions, start to paint a picture of how consumers respond to the Opower intervention. As the initial reports arrive, some consumers are immediately motivated to conserve. They take actions that are feasible within a short period of time, probably changing utilization choices by adjusting thermostats, turning off lights, and unplugging unused electronics. However, behaviors soon begin to "backslide" toward their pre-intervention levels. In this intervention, subsequent reports arrive before behaviors fully return to their pre-intervention state, and these additional reports repeatedly induce at least some of the same households to conserve. This process of action and backsliding is a repeated version of the phenomena documented by Agarwal et al. (2011) and Haselhuhn et al. (2012), who show that consumers learn to avoid credit card and movie rental late fees after incurring a fee, but they act as if this learning depreciates over time.

7Several recent consulting reports and official program evaluations have also examined the intervention's effect on utility program participation at these sites and others, including Integral Analytics (2012), KEMA (2012), Opinion Dynamics (2012), and Perry and Woehleke (2013). Their findings are very similar to ours: the Opower intervention sometimes causes increases in program participation, but this accounts for only a small fraction of the overall reduction in energy use.

Several mechanisms might cause this repeated action and backsliding. Repeated learning and forgetting could generate this pattern, although it seems unlikely that people would forget this information at this rate. Alternatively, it could be that consumers learn that energy conservation actions are more difficult or save less energy than they had thought. However, to generate repeated cycles in energy use, consumers must be experimenting with different energy conservation actions after receiving each new report, and it is not clear whether this is the case. A third mechanism is what we call malleable attention: reports immediately draw attention to energy conservation, but attention gradually returns to its baseline allocation.

After the first few reports, this repeated action and backsliding attenuates, and decay of treatment effects can only be observed over a longer period after reports are discontinued for part of the treatment group. This means that in the intervening one to two years between the initial reports and the time when the reports are discontinued, consumers form some type of new "capital stock" that decays at a much slower rate. The program participation data show that very little of this capital stock consists of large changes to physical capital such as insulation or home energy retrofits. However, consumers may make a wide variety of smaller changes to physical capital stock, such as installing energy efficient compact fluorescent lightbulbs or window air conditioners.

Much of this capital stock may also reflect changes to consumers' utilization habits, which Becker and Murphy (1988) call "consumption capital." This stock of past conservation behaviors lowers the future marginal cost of conservation, because the behavior has become automatic and can be carried out with little mental attention (Ouellette and Wood 1998, Schneider and Shiffrin 1977, Shiffrin and Schneider 1977). However, just as in the Becker and Murphy (1988) model and most models of capital stock, consumption capital also decays. This story is consistent with the results of Charness and Gneezy (2009), who show that financial incentives to exercise have some long-run effects after the incentives are removed, suggesting that they induced people to form new habits of going to the gym.

6 Cost Effectiveness and Program Design

In this section, we assess the importance of persistence for cost effectiveness and for program design. We use a simple measure of cost effectiveness: the cost to produce and mail reports divided by the kilowatt-hours of electricity conserved. Although cost effectiveness is a common metric by which interventions are assessed, we emphasize several reasons why it is not the same as a welfare evaluation. First, consumers might experience additional unobserved costs and benefits from the intervention: they may spend money to buy more energy efficient appliances or spend time turning off the lights, and they might be more or less happy after learning how their energy use compares to their neighbors'. Second, the treatment causes households to reduce natural gas use along with electricity; we have left this for a separate analysis. Third, this measure does not take into account the fact that electricity has different social costs depending on the time of day when it is consumed.

Of course, this distinction between the observed outcome and welfare is not unique to this domain: with the exception of DellaVigna, Malmendier, and List (2012), most studies of weight loss, smoking, charitable contributions, and other behaviors are only able to estimate effects on behaviors, not on welfare. In our setting, however, the focus on cost effectiveness is still extremely relevant: rightly or wrongly, utilities are mandated to cause cost-effective energy conservation, with little explicit assessment of the consumer welfare effects.

We assume that the cost per report is $1 and that there are no fixed costs of program implementation. In our first analysis below, we study the "extensive margin" policy decision: how different assumptions about persistence can affect cost effectiveness, which can in turn affect a policymaker's decision of whether or not to adopt an intervention. In our second analysis, we show how the dynamics of persistence can inform "intensive margin" policy decisions: how to optimize program design to maximize cost effectiveness.

6.1 Persistence Matters for Cost Effectiveness

When assessing the cost effectiveness of Opower home energy reports and other "behavioral" energy conservation programs, utilities have typically assumed zero persistence, either implicitly or explicitly. For example, when calculating cost effectiveness for regulatory purposes, utility analysts have typically tallied energy conserved and program costs incurred in a given year, without accounting for the fact that the reports delivered during that year also cause incremental energy conservation in future years. This is a sharp contrast to how utilities evaluate traditional energy conservation programs that directly replace physical capital such as air conditioners or lightbulbs. For these programs, utilities include all future savings from capital stock changes in a given year using estimated "measure lives." The reason for this difference is that until now, there was significant concern that "behavioral" interventions like the home energy reports would not generate persistent savings. When evaluating interventions still in progress, academic studies such as Ayres, Raseman, and Shih (2012) and our own past work (Allcott 2011) have similarly calculated cost effectiveness by considering only the costs accrued and energy savings up to a given date.

How important are these assumptions? Table 9 presents electricity savings and cost effectiveness for the two-year intervention in Site 2, under alternative assumptions about long-run persistence after treatment is discontinued. The table uses Section 4's empirical estimates for the "dropped group" and provides a stylized example of the calculations that a policymaker might go through in order to decide whether to adopt a similar two-year intervention. To keep the results transparent and avoid extrapolating out of sample, we assume no time discounting and limit the time horizon to the observed 4.5 years after treatment began.

Scenario 1 in Table 9 assumes zero persistence: after the treatment stops, the effects end immediately. This parallels the most common assumption. Scenario 2 makes the opposite extreme assumption: that the full effect will persist over the remaining 30 months of the sample. Scenario 3 uses the estimated actual persistence.

The electricity savings estimates are simply the average treatment effects for each period estimated in column (2) of Table 5 multiplied by the length of each period. For example, the 405 kWh of conservation during the first two years of treatment is (τ1 = 0.450 kWh/day) × 365 days + (τ2 = 0.659 kWh/day) × 365 days. Under the full persistence assumption, post-treatment savings equal the ATE from the second year of treatment (τ2 = 0.659 kWh/day) multiplied by the 910 days in the observed post-treatment period. Post-treatment savings under observed persistence are (τ = 0.584 kWh/day) × (910 days). Total electricity cost savings for the two-year intervention would sum to $6.76 million if applied to all households in the experimental population. Standard errors are calculated using the Delta method.

Under the zero persistence assumption, dividing a total cost of $18 per household by the 405 kWh of observed savings gives cost effectiveness of 4.44 cents/kWh. By contrast, the observed post-treatment savings show that the true cost effectiveness to date is 1.92 cents/kWh. This underscores the importance of these empirical results: the program is more than twice as cost effective as policymakers had often previously assumed.
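The Scenario 1 vs. Scenario 3 comparison can be reproduced from the figures above (a sketch in Python; the small gap relative to the 4.44 cents/kWh in the text comes from rounding the in-treatment savings to 405 kWh before dividing):

```python
# Reproducing the Site 2 cost effectiveness numbers under the zero-persistence
# and observed-persistence scenarios, using the ATEs discussed above.
cost_per_household = 18.0                     # dollars of report costs per household

in_treatment_kwh = 0.450 * 365 + 0.659 * 365  # about 405 kWh over the two treatment years
post_drop_kwh = 0.584 * 910                   # observed persistence over 910 post-drop days

zero_persistence_cents_per_kwh = cost_per_household / in_treatment_kwh * 100
observed_cents_per_kwh = cost_per_household / (in_treatment_kwh + post_drop_kwh) * 100

print(round(in_treatment_kwh))                  # about 405 kWh
print(round(zero_persistence_cents_per_kwh, 2)) # about 4.45 cents/kWh (4.44 in the text)
print(round(observed_cents_per_kwh, 2))         # about 1.92 cents/kWh
```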

Appendix Tables A9 and A10 re-create this table for Sites 1 and 3, respectively. These two sites have slightly larger treatment effects and are thus slightly more cost effective, but because the decay rates are similar across programs, the relative cost effectiveness under the three assumptions is about the same: cost effectiveness is about twice as good under observed persistence compared to the zero persistence assumption.

Extrapolating further into the future would magnify the importance of long-run persistence assumptions, while time discounting would diminish their importance. Appendix Table A11 re-creates Table 9 under a ten-year time horizon, extrapolating future treatment effects in Scenario 3 using the estimated linear decay parameter δ^LR through the point when the effects are projected to hit zero. Total savings are about 20 percent larger than in Scenario 3 of Table 9, and cost effectiveness is thus about 20 percent better.

In places comparable to Site 2, assuming the observed persistence instead of zero persistence could impact program adoption decisions. Opower competes against other energy conservation programs, and cost effectiveness is one of the most important metrics for comparison. There are some benchmark numbers available, although they are controversial (Allcott and Greenstone 2012). Using nationwide data, Arimura et al. (2011) estimate cost effectiveness to be about 5.0 cents/kWh, assuming a discount rate of five percent. A research and advocacy group c