Information Technology in the Property Market

Yong Suk Lee^{a,*,1}, Yuya Sasaki^{b}

^a Freeman Spogli Institute for International Studies, Stanford University
^b Department of Economics, Vanderbilt University

Abstract

Information technology is increasingly being utilized in the property market. This paper examines how sensitive house transaction prices are to online price estimates using data collected from Zillow. We find that online property price estimates strongly predict transaction prices even when observable and unobservable house and neighborhood characteristics are controlled for. In addition, we find evidence that suggests that online price estimates may have a direct impact on transaction prices.

Keywords: information technology, online price estimates, house price

JEL Codes: L86, R30

* Correspondence to: Freeman Spogli Institute for International Studies, Stanford University, Stanford, CA 94305. email address: [email protected].

   


1. Introduction

With the advent of information technology, researchers have examined how information

technology affects various aspects of our economy, including the labor market, firm productivity,

intellectual property, and social networks to name a few (Bailey et al. 2016, Cardona et al. 2013,

Bresnahan et al. 2002, Bloom et al. 2012, Varian et al. 2004). The property market has also seen

a surge in the use of information technology, especially in the form of price information. Because each property is location-specific and unique, obtaining information on real estate has been costlier than for other types of assets or commodities. Now potential sellers and

buyers can easily examine property markets online. Not only can one examine the sales prices

for a large number of comparable properties, one can now easily find out the price estimates for a

specific property. In this paper, we examine how well online price estimates predict actual

transaction prices using data we collect from Zillow, a major real estate information provider in

the U.S.

In order to examine the impact of online house price information on transaction prices,

we develop a reduced-form pricing equation as the convex combination of the online price

estimate and the hedonic valuation of a property. The main challenge for estimation is

controlling for the unobserved house and neighborhood attributes in the model. We present a

method that proxies for unobserved house specific attributes using the difference between the

listing price and the online price estimate at the time of listing. Quasi-experimental research

designs, and in particular, boundary discontinuities (Black 1999, Bayer et al. 2007) have been

used to control for unobservable area specific attributes in the literature. On the other hand,

Bajari et al. (2012) propose a method that relies less on the research design but on the structural

assumption that prior house sales prices can be used to control for time-varying unobservable

attributes in a hedonic regression. Better data, such as repeat-sales house transaction data, have

also enabled researchers to deal with unobservable house and neighborhood attributes (Bayer et

al. 2013). This paper incorporates weak structural assumptions into a hedonic model to examine

the impact of online house information on transaction prices.

We collect home value estimates, list prices, sales prices, and house and neighborhood

attributes from Zillow.com for 1,200 houses across 30 Metropolitan Statistical Areas (MSAs) in

the US. We find that the elasticity of house sales prices with respect to Zillow estimates is large

   


and quite close to one. We also find that changes in Zillow estimates are related to how sales prices adjust from the list prices a month before sales.

In addition to the large literature that examines the impact of information technology on

markets and productivity (Cardona et al. 2013, Martínez et al. 2010, Bresnahan et al. 2002,

Bloom et al. 2012, Varian et al. 2004), our paper is also related to the literature that examines

how information affects asset prices. The role of information in the equity market has been

widely studied. For example, Fang and Peress (2009) find that mass media reduces informational

frictions and affects security prices. Easley and O’Hara (1987, 2004) show that large trades in the

securities market reflect better information and impact security prices, and that investors demand

higher returns on stocks for which there is less public information. In the real estate market,

Levitt and Syverson (2008) show that informational advantage translates to higher house sales

prices. They find that properties owned by real estate brokers sell at about a 4 percent premium

relative to non-realtor owned comparable properties. Finally, this paper is related to the literature

that examines how information technology affects market activity. Bailey et al. (2016) examine

how online social networks affect people's perceptions of property investments, and ultimately

find that friend networks on Facebook significantly influence individual housing investment

decisions and aggregate housing prices. Also, mobile phones have been found to facilitate

economic development in developing countries by facilitating the exchange of information in the

labor and agriculture markets (Aker and Mbiti, 2010).

2. The Econometric Model and Estimation Strategy

In order to estimate how sensitive transaction prices are to online price estimates, we

consider a reduced-form pricing equation as the convex combination of the online price estimate

and the hedonic valuation of the property, that is,

(eq.1) $Y_{in} = \beta Z_{in} + (1-\beta)\,[\alpha + X_{in}\gamma + \delta_n + \mu_i + \varepsilon_{in}]$,

where $Y_{in}$ is the sales price of home $i$ in neighborhood $n$, $Z_{in}$ the Zillow estimate at the time of sale, $X_{in}$ the set of observable home characteristics (e.g., square footage and number of rooms), $\delta_n$ the neighborhood fixed effect, $\mu_i$ the home-specific fixed effect, and $\varepsilon_{in}$ the error term. We suppress time subscripts for simplicity but include season fixed effects to capture seasonal trends in sales prices.

The expression in the square brackets in the second term, $\alpha + X_{in}\gamma + \delta_n + \mu_i + \varepsilon_{in}$, constitutes the traditional hedonic pricing model. We add the first term $Z_{in}$ to reflect the potential effects of Zillow's estimate $Z_{in}$ on transaction prices $Y_{in}$. As such, the parameter $\beta$ may be interpreted as the degree to which Zillow's estimate predicts the sales price. The null hypothesis that $Z_{in}$ does not impact actual transaction prices, i.e., $\beta = 0$, can be readily tested once a consistent estimate of $\beta$ is obtained. The OLS estimators of the parameters $\alpha$, $\beta$, and $\gamma$ would be consistent if $(\delta_n, \mu_i)$ were mean independent of both $Z_{in}$ and $X_{in}$. However, this independence assumption is hard to justify for at least two reasons. First, the unobserved house-specific amenities $\mu_i$ are likely to be correlated with the observed house-specific amenities $X_{in}$. Second, and more importantly in our study, the introduction of $Z_{in}$ in the extended hedonic pricing model creates another source of endogeneity. To see this, it may help to think of how the home price information $Z_{in}$ is generated by online real estate information providers. Although these service agencies do not disclose their formulas, the estimates $Z_{in}$ are constructed using recent transaction data in the neighborhood $n$ of house $i$. As such, independence between the online price estimate and unobserved neighborhood characteristics will probably not hold even if we control for the observed house-specific amenities $X_{in}$. We therefore propose approaches to handle these two sources of endogeneity.

First, to control for the endogenous unobserved house-specific amenities $\mu_i$, we use a proxy variable. Specifically, we construct a proxy variable using listing prices, denoted by $L_i$. The seller can perceive house-specific amenities $\mu_i$ that the econometrician cannot observe. Denote the Zillow estimate of the house at the time of listing as $Z_{i0}$. $Z_{i0}$ and $L_i$ are both public information, while $\mu_i$ is unobserved to prospective buyers. In order to send signals about the amenities $\mu_i$ to prospective buyers, sellers may add their values to $Z_{i0}$ when setting listing prices, i.e.,

$L_i = Z_{i0} + g(\mu_i)$.

List prices $L_i$ may differ from the online estimates $Z_{i0}$ for various reasons. List prices tend to start high since the seller predicts that the negotiation process will ultimately result in a lower sales price. How quickly the seller needs to sell the property could also impact the list price. The function $g$ thus captures the seller's adjustment of the self-valuation of $\mu_i$. Note that the identity function $g(\mu_i) = \mu_i$ implies that there is no markup or markdown in the listing prices above the observed and unobserved value of the house.² Finally, to take this structure into estimation, we

                                                                                                                         2 We find that initial list prices are higher than the Zillow estimates in about 68 percent and lower in about 32 percent of the observations in our sample.

   


assume that $g$ is strictly increasing so that its inverse $g^{-1}$ exists. With this inverse function, we can recover the unobserved house-specific amenities $\mu_i$ by

$\mu_i = g^{-1}(L_i - Z_{i0})$.

We use the linear difference $(L_i - Z_{i0})$, as well as a third-order nonparametric (polynomial) approximation, which we denote $f(L_i - Z_{i0})$. We use a nonparametric proxy in addition to the linear proxy to be as flexible as possible in estimating the unobservable component. We have no ex-ante reason to believe that it should be linear or follow a specific structural form, and hence we allow flexibility in our estimation method.
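To make the role of the proxy explicit, the substitution can be written out step by step. The following is a sketch in the notation of eq. 1, not a formula stated in the text: $Z_{i0}$ labels the Zillow estimate at the time of listing, $h \equiv g^{-1}$ is shorthand introduced here, and the cubic coefficients $a_1, a_2, a_3$ are assumed names for the polynomial approximation.

```latex
\begin{align*}
Y_{in} &= \beta Z_{in} + (1-\beta)\bigl[\alpha + X_{in}\gamma + \delta_n + \mu_i + \varepsilon_{in}\bigr], \\
\mu_i  &= g^{-1}(L_i - Z_{i0}) \equiv h(L_i - Z_{i0}), \\
Y_{in} &= \beta Z_{in} + (1-\beta)\bigl[\alpha + X_{in}\gamma + \delta_n + h(L_i - Z_{i0}) + \varepsilon_{in}\bigr], \\
h(d)   &\approx a_1 d + a_2 d^2 + a_3 d^3 \quad \text{(third-order approximation; linear proxy if } a_2 = a_3 = 0\text{)}.
\end{align*}
```

After this substitution the only remaining unobservable on the right-hand side, apart from the error term, is the neighborhood effect $\delta_n$.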

This operation removes one of the two sources of endogeneity, namely $\mu_i$. We still need to handle the other unobserved variable $\delta_n$ for estimation. If we have multiple observations per neighborhood, we can take first differences within a neighborhood to cancel out the $\delta_n$ terms. Our data collection strategy thus intentionally collects multiple houses per neighborhood. Hence, we control for neighborhood attributes by differencing two properties $i$ and $j$ within the same neighborhood. Ultimately, the main regression we run is

$\Delta Y_{ij} = \beta \Delta Z_{ij} + \Delta X_{ij}\pi + \Delta f(L - Z_0)_{ij} + \epsilon_{ij}$,

where $\Delta_{ij}$ denotes the first difference between houses $i$ and $j$ in the same neighborhood, and $(L - Z_0)_i = L_i - Z_{i0}$ is the gap between the list price and the Zillow estimate at the time of listing for house $i$. We estimate the above equation where $\Delta f(L - Z_0)_{ij}$ is either the linear difference, i.e., $(L - Z_0)_i - (L - Z_0)_j$, or a third-order nonparametric approximation. To estimate the parameters with this additive nonparametric function, we first partial out each observable ($\Delta Y_{ij}$, $\Delta Z_{ij}$, $\Delta X_{ij}$) with a nonparametric regression on $\Delta(L - Z_0)_{ij}$, and then perform OLS estimation of the nonparametrically partialled-out $\Delta Y_{ij}$ on the nonparametrically partialled-out $\Delta Z_{ij}$ and $\Delta X_{ij}$.
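As an illustration, the two-step partialling-out procedure can be sketched in a few lines of Python. This is not the authors' code. It uses simulated data, approximates the nonparametric regression by a cubic polynomial in the differenced proxy, and every name (`d_proxy`, `d_z`, `d_x`, `d_y`, `partial_out`) and every parameter value is an assumption made for the example.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= factor * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def ols(columns, y):
    """OLS coefficients of y on the given regressor columns (no intercept is added)."""
    k = len(columns)
    xtx = [[sum(u * v for u, v in zip(columns[i], columns[j])) for j in range(k)]
           for i in range(k)]
    xty = [sum(u * v for u, v in zip(columns[i], y)) for i in range(k)]
    return solve(xtx, xty)

def partial_out(v, d):
    """Residual of v after regressing it on a constant, d, d^2 and d^3."""
    basis = [[1.0] * len(d), d, [t ** 2 for t in d], [t ** 3 for t in d]]
    coef = ols(basis, v)
    return [v[i] - sum(c * col[i] for c, col in zip(coef, basis)) for i in range(len(v))]

# Simulated within-neighborhood differences (hypothetical data, not the paper's sample).
random.seed(0)
n = 400
d_proxy = [random.uniform(-1.0, 1.0) for _ in range(n)]         # differenced list price minus Zillow estimate
d_z = [0.6 * d + random.gauss(0.0, 1.0) for d in d_proxy]       # differenced Zillow estimate
d_x = [0.3 * d ** 2 + random.gauss(0.0, 1.0) for d in d_proxy]  # differenced observable characteristic
d_y = [0.95 * z + 0.10 * x + 0.5 * d - 0.3 * d ** 3 + random.gauss(0.0, 0.05)
       for z, x, d in zip(d_z, d_x, d_proxy)]

# Step 1: partial the cubic in the proxy out of every observable.
r_y, r_z, r_x = (partial_out(v, d_proxy) for v in (d_y, d_z, d_x))
# Step 2: OLS of the residualized outcome on the residualized regressors.
beta_hat, pi_hat = ols([r_z, r_x], r_y)
print(round(beta_hat, 3), round(pi_hat, 3))
```

In this simulated setting the true coefficients are 0.95 and 0.10, so the two printed estimates should land close to those values. By the Frisch-Waugh-Lovell result, the two-step estimate coincides with a single OLS regression that includes the cubic terms directly.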

3. Data

Zillow does not disclose the formula that it uses to generate its price estimates, but

Zillow's website mentions that it uses the physical attributes of the property, tax assessments, and

prior and current transactions of the property itself and the comparable properties nearby. In addition to its current house price estimates, Zillow provides its past estimates, the most recent sales price, and, when available, current and past listing prices and past sales prices. We collect the sales date and price, the Zillow estimate at the time of sale, the estimates up to three months before the sale, the initial listing price, and historical sales and listing prices when available.

We also collect the address of the house, square footage, number of bedrooms, number of

bathrooms, lot size, year built, and property tax. Zillow also provides nearby school names. In

constructing the sample, we collect data on multiple houses for each neighborhood to enable first

differencing within neighborhoods. We use the 30 MSAs that Zillow lists in explaining its

estimates (https://www.zillow.com/zestimate/) as the base sample, and replace a few MSAs that

did not have sales prices in the data. For each MSA we find 10 neighborhoods with median

Zillow estimates closest to the MSA median Zillow estimate and collect data on four houses per

neighborhood. If there are fewer than 10 neighborhoods in an MSA we additionally collect four

more houses from existing neighborhoods, starting with neighborhoods that have median Zillow

estimates closest to the MSA median Zillow estimate. Within each neighborhood, we restrict the

search to single family houses that are 2,000 sqft or above, have 3 bedrooms or more, 2

bathrooms or more, and were last sold between July 2012 and July 2013. For each neighborhood,

we narrow down to houses whose Zillow estimates are closest to the Zip Code median

Zillow estimate, and to houses that list the same set of nearby public elementary schools. We

then randomly select the first four houses that have non-missing information on the Zillow

estimates at time of sales, sales price, initial listing price, number of beds and baths, house square

footage, lot size, and year built. This procedure returns 40 houses in each of 30 MSAs for a total of 1,200

observations. Our main motivation behind the above data collection strategy was to collect a

sample of single family houses that were representative of the median market segment in each

MSA. Table 1 Panel A summarizes the characteristics of these houses.
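The house-level selection rule described above can be sketched as a filter over listing records. The sketch below is an assumption-laden illustration rather than the procedure actually used: the field names, the `make` helper, the exact day boundaries of the sale window, and the choice of the Zillow estimate at sale as the value compared with the Zip Code median are all hypothetical, and the matching on nearby public elementary schools is omitted.

```python
from datetime import date

# Fields that must be non-missing, following the list in the text (names are hypothetical).
REQUIRED = ("zestimate_at_sale", "sale_price", "list_price",
            "beds", "baths", "sqft", "lot_sqft", "year_built", "sale_date")

def eligible(house):
    """Apply the completeness, type, size and sale-window filters."""
    if any(house.get(field) is None for field in REQUIRED):
        return False
    return (house.get("home_type") == "single_family"
            and house["sqft"] >= 2000
            and house["beds"] >= 3
            and house["baths"] >= 2
            and date(2012, 7, 1) <= house["sale_date"] <= date(2013, 7, 31))

def pick_houses(houses, zip_median, k=4):
    """Return the k eligible houses whose Zillow estimate is closest to the Zip Code median."""
    candidates = [h for h in houses if eligible(h)]
    candidates.sort(key=lambda h: abs(h["zestimate_at_sale"] - zip_median))
    return candidates[:k]

def make(house_id, zestimate, sqft=2400, sale_date=date(2013, 1, 15)):
    """Build a hypothetical listing record for the example."""
    return {"id": house_id, "home_type": "single_family", "zestimate_at_sale": zestimate,
            "sale_price": zestimate, "list_price": zestimate, "beds": 3, "baths": 2,
            "sqft": sqft, "lot_sqft": 8000, "year_built": 1990, "sale_date": sale_date}

listings = [
    make("a", 310_000),
    make("b", 295_000),
    make("c", 500_000),
    make("d", 305_000, sqft=1800),                    # below the 2,000 sqft threshold
    make("e", 301_000, sale_date=date(2014, 2, 1)),   # sold outside the window
    make("f", 330_000),
    make("g", 280_000),
    make("h", None),                                  # missing Zillow estimate
]
selected = [h["id"] for h in pick_houses(listings, zip_median=300_000)]
print(selected)
```

Ranking by distance from the median before truncating to four houses is one way to target the median market segment that the sampling strategy aims for.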

In order to explore possible factors that might explain the variation in the elasticity

estimates across the 30 MSAs, we also collect MSA level data from the US Census. We collect

data on household internet accessibility and the number of residential real estate agents by MSA.

In addition, we compile data on the population, land area, number of families, number of housing

units, median household income, educational attainment, and unemployment rate for each MSA.

The variables are for the year 2013, except for the number of real estate agents in the MSA,

which is for 2012. Table 1 Panel B presents the summary statistics for the MSA level variables.

[Table 1 here]

   


4. Empirical results

4.1 Main results

Table 2 presents the results. Column (1) controls for the unobserved house specific

characteristics by including the linear proxy, i.e., $(L - Z_0)$. Columns (2) onwards use the higher-order nonlinear proxy, $f(L - Z_0)$. Column (3) additionally controls for the number of days on

the market. Column (4) uses higher order covariates of the observable house characteristics, and

column (5) adds season fixed effects.

[Table 2 here]

The coefficient estimate on log Zillow estimate is 0.89 in column (1). Once we use the

nonlinear proxy the estimates are stable across all specifications at around 0.94 to 0.95 and are

statistically significant. Panel A presents the p-value of the test that examines whether the

coefficient estimate on log Zillow estimate is equal to one. Except for column (1), we are unable

to reject the null that the coefficient estimate is equal to one at the 10 percent level. Panel B

reports the p-value of the test that jointly examines whether all the coefficient estimates on the

observable house characteristics are equal to zero. Again, except for column (1) we are unable to

reject the joint hypothesis that the estimates on the housing characteristics are jointly equal to

zero.
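The Panel A test can be approximated from the reported coefficients and standard errors alone. The sketch below assumes a two-sided test against a standard normal reference distribution, which may differ from the exact test statistic used for Table 2. Because the inputs are rounded, the results should only roughly match the reported p-values. The function name `p_value_equal_to` is hypothetical.

```python
import math

def p_value_equal_to(estimate, std_error, null_value=1.0):
    """Two-sided p-value for H0: coefficient == null_value, using a standard normal reference."""
    z = (estimate - null_value) / std_error
    return math.erfc(abs(z) / math.sqrt(2.0))

# Coefficients and robust standard errors on log Zillow estimate, Table 2 columns (1)-(5).
table2 = [(0.893, 0.038), (0.948, 0.035), (0.948, 0.035), (0.947, 0.036), (0.941, 0.036)]
p_values = [p_value_equal_to(b, se) for b, se in table2]
for col, p in enumerate(p_values, start=1):
    print(f"column ({col}): p = {p:.3f}")
```

Under this approximation only column (1) rejects a unit coefficient at conventional levels, in line with the pattern reported in Panel A.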

One may wonder whether the Zillow estimates, which are periodically updated, contain much information beyond the list price. However, as Table 2 column (1) indicates, the coefficient estimate on the Zillow estimate is significant when the linear proxy variable is included. Recall that the list price enters the regression linearly when we use the linear proxy, which is the difference between the list price and the Zillow estimate at the time of listing. In other words, the Zillow estimate is significantly related to the sales price even when the list price is linearly controlled for in the OLS regression. Moreover, the linear proxy itself is statistically significant. The fact that this differential is statistically significant in explaining actual sales prices, even when the observables and the Zillow estimate are controlled for, indicates that there is useful information captured in the Zillow estimate beyond list prices.

   


The literature has found that Zillow estimates tend to overestimate sales prices (Hollas et al. 2010) and are less accurate for lower-priced homes (Corcoran and Liu 2014). However, these studies have focused on only one or two specific regional markets, so it is difficult to generalize whether there are systematic biases in Zillow estimates across all MSAs. Our econometric strategy assumes that the difference between the Zillow estimate and the list price contains some information on unobserved house characteristics, and we utilize that information. Hence, our estimation strategy takes the Zillow estimate at face value. The results in Table 2 confirm that there is informational value in the difference between the list price and the Zillow estimate.

The main focus of our paper is simply to examine how well Zillow estimates predict sales prices, and we do not aim to estimate the causal impact of Zillow estimates on transaction prices.

The latter would be a much more ambitious goal. However, future research that aims to estimate

the causal impact of online property estimates would need to further explore whether there are

systematic biases in the online estimates, and whether those are positively or negatively

correlated with unobserved house characteristics.

4.2 Do Zillow estimates predict sales prices around the time of sales?

Overall, Table 2 indicates that (1) Zillow's estimates are highly correlated with

transaction prices, even when observable and unobservable house and neighborhood

characteristics are controlled for, and (2) observable house characteristics have no additional

predictive power once the Zillow estimate is controlled for. This may not be surprising given that Zillow uses complex forecasting methods relative to the hedonic frameworks generally used in the literature.³ Though we do not directly address causality in this paper, given such high predictive power and the increasing use of information technology, one could wonder whether

                                                                                                                         3 Zillow indicates that they use proprietary automated valuation models that apply advanced algorithms. Zillow analyzes home-related data and actual sales prices to identify relationships within a specific geographic area. Home characteristics, such as square footage, location or the number of bathrooms, are given different weights according to their influence on home sale prices in each specific geography over a specific period of time. This creates a set of valuation rules, or models that are applied to generate each home's Zillow estimate. Specifically, some of the data Zillow uses in the algorithm include (1) physical attributes including location, lot size, square footage, number of bedrooms and bathrooms and many other details; (2) tax assessments including property tax information, actual property taxes paid, exceptions to tax assessments and other information provided in the tax assessors' records; and (3) prior and current transactions including actual sale prices over time of the home itself and comparable recent sales of nearby homes. The above and additional description of Zillow's estimate can be found at https://www.zillow.com/zestimate/.

   


the price estimates provided by Zillow can actually influence sales prices. We are unable to

provide a definitive answer to this question in this paper, given that we do not have random

variation in the Zillow estimates. Instead, we examine whether the short term change in Zillow

estimates when sales prices are being negotiated relates to how the final sales price adjusts from

the list price one month prior to sales.

Table 3 presents the results from a regression where the dependent variable is sales price

minus list price one month prior to sales, and the main regressor is the change in the Zillow estimate between two months and one month prior to sales. We control for the number of days the

house was listed on the market, time (year and month of sales) fixed effects, and the change in

MSA level Zillow estimates. We exclude the own house observation when calculating the MSA

level change.
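Excluding the own-house observation amounts to a leave-one-out average. A minimal sketch follows, assuming the MSA-level change is the simple mean of the house-level changes in the sample, which the text does not specify. The function name and the example figures are hypothetical.

```python
def leave_one_out_means(changes):
    """For each house, average the changes of all other houses in the same MSA."""
    n = len(changes)
    if n < 2:
        raise ValueError("need at least two houses to leave one out")
    total = sum(changes)
    return [(total - c) / (n - 1) for c in changes]

# Hypothetical one-month changes in Zillow estimates (dollars) for four houses in one MSA.
msa_changes = [1000.0, 3000.0, -2000.0, 6000.0]
loo = leave_one_out_means(msa_changes)
print([round(v, 2) for v in loo])
```

Computing the control this way keeps a house's own Zillow change out of the regional trend it is compared against, so the two regressors are not mechanically linked.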

[Table 3 here]

Columns (1) and (2) indicate that the Zillow estimates are positively related to how prices

adjust in the final month leading to sales, but the relationship is not statistically significant.

However, there were several observations where the Zillow estimates fluctuate drastically over a

short period of time, and may not be perceived as reliable. In columns (3) and (4) we drop

observations where the change in Zillow estimates over the last two months is greater than

$100,000. The coefficient estimates in the trimmed sample are positive and statistically significant. The results indicate that a thousand dollar change in the Zillow estimate is related to the final sales price adjusting from the list price one month prior to sales by about 200 dollars. This

effect is not driven by regional trends as we do control for the MSA level change in Zillow

estimates. Though our evidence is not causal, it does suggest that online price estimates may

have a direct impact on prices.
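The trimming rule and the dollar interpretation can be illustrated as follows. The sketch assumes the $100,000 cutoff applies to the absolute value of the change, which the text does not state explicitly. The observations and field names are hypothetical, and only the two coefficients are taken from Table 3.

```python
def trim(rows, cutoff=100_000):
    """Drop observations whose Zillow change exceeds the cutoff in absolute value."""
    return [r for r in rows if abs(r["zillow_change"]) <= cutoff]

# Hypothetical observations: change in Zillow estimate and sales price minus list price.
rows = [
    {"zillow_change": 4_000, "price_gap": -6_000},
    {"zillow_change": -150_000, "price_gap": -2_000},   # dropped as an outlier
    {"zillow_change": 120_000, "price_gap": 1_000},     # dropped as an outlier
    {"zillow_change": -3_000, "price_gap": -9_000},
]
kept = trim(rows)

# Point estimates from Table 3, columns (3) and (4), applied to a $1,000 change.
implied = [round(coef * 1_000, 1) for coef in (0.217, 0.196)]
print(len(kept), implied)
```

The implied adjustments of roughly 217 and 196 dollars per thousand-dollar change correspond to the figure of about 200 dollars quoted in the text.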

Another indirect way of examining the information shared between listing prices and the Zillow estimate is to explore how Zillow estimates are related to the movement of list prices around the time of sales, which we do in Table 4. Using our data we construct a panel of list prices and Zillow estimates. Each house has Zillow estimates from 1, 2, and 3 months prior to sales. We were able to collect historical list prices for a subset of the initial sample. We perform a simple house-level fixed effects regression of list price on current or one-month-prior Zillow estimates. As the results indicate, we find no significant relation between Zillow estimates and list prices in the months before sales. The information contained in these two variables is not simply redundant, and people do not immediately adjust list prices based on short-term changes in Zillow estimates.
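A house-level fixed effects regression with a single regressor reduces to the within estimator: demean both variables by house and regress the demeaned list price on the demeaned Zillow estimate. The sketch below illustrates that estimator on made-up numbers. It is not the specification behind Table 4, and the function name and the data (in thousands of dollars) are hypothetical.

```python
from collections import defaultdict

def within_slope(rows):
    """Fixed-effects (within) slope of y on x, where rows are (house_id, x, y) tuples."""
    groups = defaultdict(list)
    for house_id, x, y in rows:
        groups[house_id].append((x, y))
    sxy = sxx = 0.0
    for obs in groups.values():
        mean_x = sum(x for x, _ in obs) / len(obs)
        mean_y = sum(y for _, y in obs) / len(obs)
        sxy += sum((x - mean_x) * (y - mean_y) for x, y in obs)
        sxx += sum((x - mean_x) ** 2 for x, _ in obs)
    if sxx == 0.0:
        raise ValueError("no within-house variation in the regressor")
    return sxy / sxx

# Hypothetical panel: (house, Zillow estimate, list price) at 3, 2 and 1 months before sale.
panel = [
    ("A", 300.0, 350.0), ("A", 310.0, 355.0), ("A", 320.0, 360.0),
    ("B", 500.0, 350.0), ("B", 520.0, 360.0), ("B", 510.0, 355.0),
]
slope = within_slope(panel)
print(slope)
```

Demeaning by house removes any time-invariant house-specific level, so the slope is identified only from month-to-month movements within each house.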

[Table 4 here]

4.3 What Explains the Variation in the Elasticity Estimates across MSAs?

We next examine how the elasticity estimates vary across MSAs. Table 5 presents the

impact of one month prior Zillow estimates on sales prices in each of the 30 MSAs using the full

model in Table 2 column (5). For each MSA we conduct hypothesis tests on whether the

elasticity estimate is one and whether all covariates are jointly equal to zero. Even with 40

observations per MSA, many of the estimates are statistically different from zero at the 10

percent level. The estimates vary considerably across MSAs, ranging from -0.9 in Las Vegas

to 2 in Boston. Many of the estimates are statistically indistinguishable from one even at the

MSA level.

[Table 5 here]

What might explain the variation in the elasticity estimates across MSAs? We

hypothesize that the availability of internet at home would increase the demand for online house

price information and the relative valuation of online house price information in determining

sales prices, i.e., the elasticity estimate. We use the household internet accessibility rate,

measured as one minus the share of households without internet and computer access in the

MSA, as our main proxy for accessibility to online house price information. Despite the sample

size of 30, the bivariate regression in Table 6 column (1) confirms that the relationship is

statistically significant at the 5 percent level. A 1 percentage point increase in household internet

accessibility is associated with an increase in the elasticity estimate by 0.2. In column (2) we

include the number of residential real estate agents in the MSA, but this variable has no

significant impact and the coefficient estimate on household internet accessibility remains

unchanged. In column (3), we control for the size of the MSA by controlling for the land area

and population. The coefficient estimate on household internet accessibility decreases slightly

   


but is still significant. In column (4), we additionally control for the number of families. The

number of families conditional on the population captures the potential demand for housing in

the MSA. The coefficient estimate on the log number of families is positive and the coefficient

estimate on population is now negative. This likely suggests that cities with more families relative to population demand more housing and put more value on online price information. In column (5),

we control for the median household income and the number of adults with a college degree.

There is a weak negative relationship with median income but no significant relationship with

the size of the college educated population. We note that the coefficient estimate on household

internet accessibility increases to 32.57. Finally, we control for housing supply and the economic

condition in the city. The negative coefficient estimate on the log number of housing units

suggests that the supply of housing conditional on potential demand reduces the elasticity

estimate, potentially by reducing the relative demand for online house price information. The

coefficient estimate on household internet accessibility barely changes and is now statistically

significant at the 1 percent level.
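Because household internet accessibility is measured as a share between zero and one, a coefficient of this size needs a unit conversion before it can be read. The arithmetic below is only an illustration using the coefficient quoted in the text for column (5) and the sample minimum and maximum from Table 1 Panel B. The variable names are hypothetical, and applying the coefficient across the full sample range is an extrapolation made here, not a result reported by the authors.

```python
coef_col5 = 32.57     # coefficient on the accessibility share, Table 6 column (5), as quoted in the text
one_point = 0.01      # one percentage point expressed as a share

effect_one_point = coef_col5 * one_point

share_min, share_max = 0.8945243, 0.9439421   # Table 1 Panel B
effect_sample_range = coef_col5 * (share_max - share_min)

print(round(effect_one_point, 3), round(effect_sample_range, 2))
```

Read this way, a one percentage point difference in accessibility corresponds to a difference of about 0.33 in the elasticity estimate under the column (5) specification, compared with 0.2 in the bivariate regression of column (1).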

[Table 6 here]

5. Conclusions

Our empirical results show that online property price estimates strongly predict

transaction prices. Moreover, we find suggestive evidence that indicates that such strong

predictive power may have an impact on determining actual transaction prices. Though the

literature has examined how informational advantage translates to higher home sales prices

(Levitt and Syverson 2008), we believe our paper is a novel attempt at understanding the role

information technology plays in the property market.

The high correlation between house sales prices and online price estimates potentially has

significant implications. How information is generated could have a non-negligible impact on

house prices. Also, the prevalence and accessibility of online house price information and

people's reliance on such information could influence house price dynamics. Future research that

can better identify the causal impact of information technology on property transaction prices

would be valuable.

   


References

Aker, Jenny C., and Isaac M. Mbiti. 2010. "Mobile Phones and Economic Development in Africa," Journal of Economic Perspectives 24(3): 207-232.

Bailey, Cao, Kuchler, and Stroebel. 2016. "Social Networks and Housing Markets," NBER Working Paper w22258.

Bajari, Patrick, Jane Cooley, Kyoo il Kim, and Christopher Timmins. 2012. "A Rational Expectations Approach to Hedonic Price Regressions with Time-Varying Unobserved Product Attributes: The Price of Pollution," American Economic Review 102(5): 1898-1926.

Bayer, Patrick, Fernando Ferreira, and Robert McMillan. 2007. "A Unified Framework for Measuring Preferences for Schools and Neighborhoods," Journal of Political Economy 115(4): 588-638.

Bayer, Patrick, Marcus Casey, Fernando Ferreira, and Robert McMillan. 2013. "Estimating Racial Price Differentials in the Housing Market," mimeo.

Black, Sandra. 1999. "Do Better Schools Matter? Parental Valuation of Elementary Education," Quarterly Journal of Economics 114(2): 577-599.

Bloom, Sadun, and Van Reenen. 2012. "Americans Do IT Better: US Multinationals and the Productivity Miracle," American Economic Review 102(1): 167-201.

Bresnahan, Brynjolfsson, and Hitt. 2002. "Information Technology, Workplace Organization, and the Demand for Skilled Labor: Firm-Level Evidence," Quarterly Journal of Economics 117(1): 339-376.

Cardona, M., T. Kretschmer, and T. Strobel. 2013. "ICT and Productivity: Conclusions from the Empirical Literature," Information Economics and Policy 25(3): 109-125.

Corcoran and Liu. 2014. "Accuracy of Zillow's Home Value Estimates," Real Estate Issues 39(1).

Easley, David, and Maureen O'Hara. 1987. "Price, Trade Size, and Information in Securities Markets," Journal of Financial Economics 19: 69-90.

Easley, David, and Maureen O'Hara. 2004. "Information and the Cost of Capital," Journal of Finance 59(4): 1553-1583.

Fang, Lily, and Joel Peress. 2009. "Media Coverage and the Cross-section of Stock Returns," Journal of Finance 64(5): 2023-2052.

Hollas, Rutherford, and Thomson. 2010. "Zillow's Estimates of Single-family Home Values," Appraisal Journal 78(1).

Levitt, Steven, and Chad Syverson. 2008. "Market Distortions When Agents Are Better Informed: The Value of Information in Real Estate Transactions," Review of Economics and Statistics 90(4): 599-611.

Martínez, Diego, Jesús Rodríguez, and José L. Torres. 2010. "ICT-specific Technological Change and Productivity Growth in the US: 1980-2004," Information Economics and Policy 22(2): 121-129.

Varian, Farrell, and Shapiro. 2004. Economics of Information Technology: An Introduction. Cambridge: Cambridge University Press.

   


Table 1. Summary statistics

| Variable | Mean | Std. Dev. | Min | Max | Obs |
|---|---|---|---|---|---|
| Panel A: House level data | | | | | |
| Sales price | 321914 | 245299 | 10000 | 2950000 | 1200 |
| Zillow estimate when sold | 322056 | 230858 | 36000 | 2600000 | 1200 |
| Zillow estimate 1 month prior to sale | 320108 | 230967 | 38000 | 2600000 | 1200 |
| Zillow estimate in the month listed for sale | 324022 | 232077 | 33000 | 2900000 | 1189 |
| List price | 340012 | 251240 | 19900 | 2900000 | 1199 |
| Number of days between listing and sales | 185.1 | 269.9 | 2 | 3349 | 1195 |
| Number of bedrooms | 3.85 | 0.87 | 3 | 9 | 1200 |
| Number of bathrooms | 2.68 | 0.64 | 2 | 6 | 1199 |
| Square footage | 2373 | 547 | 2000 | 10890 | 1200 |
| Lot square footage | 8365 | 13936 | 595 | 304920 | 1199 |
| Year built | 1960 | 37 | 1810 | 2013 | 1200 |
| Panel B: MSA level data | | | | | |
| Land area | 6040.69 | 4783.888 | 1600.9 | 27259.9 | 30 |
| Population, 2013 | 4254068 | 3822144 | 1601565 | 1.97E+07 | 30 |
| Housing units, 2013 | 1711144 | 1458024 | 654120 | 7797490 | 30 |
| Number of families, 2013 | 1013282 | 876799.7 | 400899 | 4550781 | 30 |
| Median household income, 2013 | 60789.97 | 11027.35 | 46497 | 90962 | 30 |
| Population above 25 years old with a bachelor degree, 2013 | 20.74333 | 3.258589 | 12.7 | 26.9 | 30 |
| Unemployment rate, 2013 | 9.943333 | 1.777771 | 7 | 14.7 | 30 |
| Household internet accessibility, 2013 | 0.9149734 | 0.0129532 | 0.8945243 | 0.9439421 | 30 |
| Number of residential real estate agents, 2012 | 4311.6 | 4262.602 | 749 | 22063 | 30 |

Notes: Variables listed in Panel A were collected manually from Zillow. Panel B data were extracted from the US Census.

   


Table 2. Zillow estimate and sales price

Dependent variable: Log sales price

| | (1) | (2) | (3) | (4) | (5) |
|---|---|---|---|---|---|
| Log Zillow estimate | 0.893 | 0.948 | 0.948 | 0.947 | 0.941 |
| | (0.038) | (0.035) | (0.035) | (0.036) | (0.036) |
| Length of listing | | | -0.514 | -0.5 | -0.674 |
| | | | (0.842) | (0.844) | (0.843) |
| Proxy | 0.21 | | | | |
| | (0.020) | | | | |
| Proxy format | Linear | Non-linear | Non-linear | Non-linear | Non-linear |
| Neighborhood first differencing | Yes | Yes | Yes | Yes | Yes |
| Higher order covariates | | | | Yes | Yes |
| Season fixed effects | | | | | Yes |
| A. Null hypothesis: coefficient estimate of log Zillow estimate=1, p-value | 0.005 | 0.138 | 0.136 | 0.148 | 0.104 |
| B. Null hypothesis: All coefficient estimates of the house characteristics variables=0, p-value | 0.035 | 0.503 | 0.506 | 0.501 | 0.443 |
| R-Squared | 0.918 | 0.935 | 0.935 | 0.935 | 0.935 |
| N | 1200 | 1200 | 1200 | 1200 | 1200 |

Notes: The Zillow estimates are estimates one month prior to the time of sales. The linear proxy is list price minus the Zillow estimate around the time of listing and the non-linear proxy is the nonparametric third order polynomial approximation. Higher order covariates are third order polynomials for bed, bath, square feet, lot square feet, and year built. Robust standard errors are in parentheses.
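
As a rough check on row A of Table 2, the p-values for the null hypothesis that the coefficient on the log Zillow estimate equals 1 can be approximated from the reported coefficients and robust standard errors. The sketch below assumes a two-sided test with a standard normal reference distribution; the function name is hypothetical, and the paper does not state which distribution was used. Because the inputs are the rounded figures printed in the table, the results should only be expected to land close to the reported p-values, not to match them exactly.

```python
import math


def two_sided_p_value(estimate, std_error, null_value):
    """Two-sided p-value for H0: coefficient == null_value (normal approximation)."""
    z = (estimate - null_value) / std_error
    # For a standard normal Z, P(|Z| > |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


# (coefficient, robust standard error) on log Zillow estimate, Table 2, columns (1)-(5).
table2 = [(0.893, 0.038), (0.948, 0.035), (0.948, 0.035), (0.947, 0.036), (0.941, 0.036)]

# Test each column against a unit elasticity.
p_values = [two_sided_p_value(b, se, 1.0) for b, se in table2]

for column, p in enumerate(p_values, start=1):
    print(f"column ({column}): p = {p:.3f}")
```

Under this approximation, column (1) rejects a unit elasticity at the 1 percent level, while columns (2) through (5) do not reject at conventional levels, which is the same pattern as row A.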

   


Table 3. Price movements one month prior to sales

Dependent variable: Sales price - List price one month prior to sales

| | (1) | (2) | (3) | (4) |
|---|---|---|---|---|
| Sample | Full sample | Full sample | Drop outliers | Drop outliers |
| Zillow estimate 1 month prior to sales - Zillow estimate 2 months prior to sales | 0.0933 | 0.0727 | 0.217 | 0.196 |
| | (0.0970) | (0.0962) | (0.111) | (0.110) |
| Days listed | -25.50 | -24.53 | -26.28 | -25.66 |
| | (6.038) | (6.269) | (6.217) | (6.454) |
| Change in MSA level Zillow estimate | 0.774 | 0.776 | 0.632 | 0.619 |
| | (0.184) | (0.184) | (0.177) | (0.177) |
| Time fixed effects | | Y | | Y |
| Observations | 801 | 801 | 769 | 769 |
| R-squared | 0.073 | 0.084 | 0.074 | 0.083 |

Notes: Observations where the change in Zillow estimates between the month of sales and 3 months prior to sales was greater than $100,000 are dropped in columns (3) and (4). Robust standard errors are in parentheses.

   


Table 4. Zillow estimates and list prices

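| | (1) | (2) | (3) | (4) |
|---|---|---|---|---|
| Dependent variable | List price | List price | Ln(list price) | Ln(list price) |
| Zillow estimate | 0.00286 | | | |
| | (0.0353) | | | |
| Zillow estimate one month ago | | 0.000661 | | |
| | | (0.0158) | | |
| Ln(Zillow estimate) | | | -0.0210 | |
| | | | (0.0330) | |
| Ln(Zillow estimate one month ago) | | | | -0.00261 |
| | | | | (0.0138) |
| House fixed effects | Yes | Yes | Yes | Yes |
| Observations | 2,257 | 1,620 | 2,256 | 1,619 |
| R-squared | 0.023 | 0.042 | 0.016 | 0.023 |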

Notes: Robust standard errors are in parentheses.

   


Table 5. Zillow estimate and sales price: elasticity estimates by MSA

| MSA | Elasticity estimate | p-Value: Elasticity estimate=0 | p-Value: Elasticity estimate=1 | p-Value: All coefficient estimates on covariates=0 | R2 | N |
|---|---|---|---|---|---|---|
| Atlanta | 0.976 (0.234) | 0.004 | 0.920 | 0.500 | 0.976 | 40 |
| Baltimore | 0.201 (0.297) | 0.518 | 0.518 | 0.113 | 0.726 | 40 |
| Boston | 2.007 (0.263) | 0.000 | 0.007 | 0.001 | 0.934 | 40 |
| Charlotte | -0.169 (2.926) | 0.955 | 0.699 | 0.942 | 0.367 | 40 |
| Chicago | 1.000 (0.115) | 0.000 | 0.997 | 0.701 | 0.969 | 40 |
| Cincinnati | 0.857 (0.290) | 0.025 | 0.640 | 0.881 | 0.801 | 40 |
| Columbus | 0.513 (0.303) | 0.125 | 0.142 | 0.906 | 0.600 | 40 |
| Denver | 1.139 (0.239) | 0.001 | 0.575 | 0.960 | 0.901 | 40 |
| Las Vegas | -0.901 (0.755) | 0.267 | 0.036 | 0.222 | 0.036 | 40 |
| Los Angeles | 0.799 (0.142) | 0.001 | 0.197 | 0.179 | 0.940 | 40 |
| Miami-Fort Lauderdale | 0.928 (0.355) | 0.031 | 0.845 | 0.288 | 0.900 | 40 |
| Minneapolis-St. Paul | 0.894 (0.148) | 0.000 | 0.495 | 0.074 | 0.914 | 40 |
| Nashville | 1.095 (0.192) | 0.000 | 0.631 | 0.476 | 0.770 | 40 |
| New York | 0.973 (0.501) | 0.088 | 0.959 | 0.513 | 0.910 | 40 |
| Orlando | 0.587 (0.220) | 0.026 | 0.093 | 0.714 | 0.835 | 40 |
| Philadelphia | 1.282 (0.344) | 0.007 | 0.440 | 0.300 | 0.465 | 40 |
| Phoenix | 0.563 (0.361) | 0.154 | 0.257 | 0.638 | 0.470 | 40 |
| Pittsburgh | 1.056 (0.120) | 0.000 | 0.652 | 0.599 | 0.988 | 40 |
| Portland | 0.853 (0.491) | 0.126 | 0.774 | 0.687 | 0.886 | 40 |
| Providence-Warwick | 1.617 (0.616) | 0.034 | 0.350 | 0.689 | 0.570 | 40 |
| Riverside | 0.248 (0.385) | 0.536 | 0.083 | 0.224 | 0.513 | 40 |
| Sacramento | 0.280 (0.584) | 0.642 | 0.246 | 0.812 | 0.600 | 40 |
| San Diego | 1.656 (1.540) | 0.314 | 0.681 | 0.316 | 0.232 | 40 |
| San Francisco | 0.476 (0.441) | 0.311 | 0.269 | 0.170 | 0.458 | 40 |
| San Jose | 0.884 (0.195) | 0.002 | 0.568 | 0.039 | 0.555 | 40 |
| Seattle | 0.749 (0.255) | 0.022 | 0.358 | 0.627 | 0.628 | 40 |
| St. Louis | 0.305 (0.220) | 0.202 | 0.013 | 0.209 | 0.797 | 40 |
| Tampa | 1.291 (0.148) | 0.000 | 0.086 | 0.260 | 0.172 | 40 |
| Virginia Beach | 1.628 (0.675) | 0.042 | 0.379 | 0.628 | 0.685 | 40 |
| Washington DC | 0.470 (0.222) | 0.067 | 0.044 | 0.565 | 0.696 | 40 |

Notes: Regression for each MSA uses the specification in column (5) of Table 2. Standard errors are in parentheses.

   


Table 6. Factors that explain the elasticity estimates across MSAs

Dependent variable: Elasticity estimate (Effect of online price estimates on sales prices)

| | (1) | (2) | (3) | (4) | (5) | (6) |
|---|---|---|---|---|---|---|
| Household internet accessibility | 22.28 | 22.57 | 19.00 | 18.02 | 32.57 | 31.73 |
| | (9.244) | (9.412) | (10.22) | (9.191) | (13.34) | (10.83) |
| Ln(number of residential real estate agents) | | -0.0342 | -0.334 | -0.244 | -0.278 | -0.160 |
| | | (0.0920) | (0.311) | (0.288) | (0.352) | (0.326) |
| Ln(land area) | | | -0.141 | -0.168 | 0.00672 | -0.107 |
| | | | (0.201) | (0.165) | (0.252) | (0.243) |
| Ln(population) | | | 0.454 | -3.472 | -2.824 | 0.0484 |
| | | | (0.407) | (1.304) | (1.925) | (2.175) |
| Ln(number of families) | | | | 3.920 | 3.283 | 5.156 |
| | | | | (1.286) | (1.773) | (1.312) |
| Ln(median household income) | | | | | -2.061 | -3.701 |
| | | | | | (1.056) | (1.034) |
| Ln(number of college degree holders) | | | | | 1.203 | 1.376 |
| | | | | | (0.994) | (0.999) |
| Ln(housing units) | | | | | | -4.896 |
| | | | | | | (2.530) |
| Unemployment rate | | | | | | -0.0825 |
| | | | | | | (0.0872) |
| Observations | 30 | 30 | 30 | 30 | 30 | 30 |
| R-squared | 0.246 | 0.248 | 0.289 | 0.406 | 0.522 | 0.636 |

Notes: The elasticity estimates for each MSA in Table 5 are the dependent variable. All control variables are for 2013, except for the number of real estate agents, which is for 2012. Household internet accessibility is 1 minus the share of households without internet and computer access. Robust standard errors are in parentheses.
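
To give a sense of magnitude for the internet accessibility coefficient, the short calculation below combines the column (1) estimate in Table 6 with the summary statistics for household internet accessibility in Table 1, Panel B. The one-standard-deviation and minimum-to-maximum comparisons are illustrative assumptions chosen here, not calculations reported in the paper, and the variable names are hypothetical.

```python
# Inputs copied from the tables; the thought experiments below are illustrative assumptions.
coef_internet = 22.28        # Table 6, column (1): household internet accessibility
sd_internet = 0.0129532      # Table 1, Panel B: standard deviation across the 30 MSAs
min_internet = 0.8945243     # Table 1, Panel B: minimum
max_internet = 0.9439421     # Table 1, Panel B: maximum

# Implied change in the MSA-level elasticity for a one-standard-deviation increase.
effect_one_sd = coef_internet * sd_internet

# Implied change when moving from the least to the most connected MSA in the sample.
effect_full_range = coef_internet * (max_internet - min_internet)

print(f"one standard deviation: {effect_one_sd:.3f}")
print(f"minimum to maximum:     {effect_full_range:.3f}")
```

Under the column (1) estimate, a one-standard-deviation increase in internet accessibility is associated with an elasticity that is roughly 0.29 higher. With only 30 observations and sizeable standard errors, this should be read as an order-of-magnitude illustration only.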