Information Technology in the Property Market
Yong Suk Lee (a, *), Yuya Sasaki (b)
(a) Freeman Spogli Institute for International Studies, Stanford University
(b) Department of Economics, Vanderbilt University
Abstract
Information technology is increasingly used in the property market. This paper examines how sensitive house transaction prices are to online price estimates, using data collected from Zillow. We find that online property price estimates strongly predict transaction prices even when observable and unobservable house and neighborhood characteristics are controlled for. In addition, we find evidence suggesting that online price estimates may have a direct impact on transaction prices.
Keywords: information technology, online price estimates, house price
JEL Codes: L86, R30
*Correspondence to: Freeman Spogli Institute for International Studies, Stanford University, Stanford, CA 94305. email address: [email protected].
1. Introduction
With the advent of information technology, researchers have examined how it affects various aspects of the economy, including the labor market, firm productivity, intellectual property, and social networks, to name a few (Bailey et al. 2016, Cardona et al. 2013, Bresnahan et al. 2002, Bloom et al. 2012, Varian et al. 2004). The property market has also seen a surge in the use of information technology, especially in the form of price information. Because each property is unique and its value is location specific, obtaining information on real estate had been relatively more costly than for other types of assets or commodities. Now potential sellers and buyers can easily examine property markets online. Not only can one examine the sales prices of a large number of comparable properties, one can also easily find price estimates for a specific property. In this paper, we examine how well online price estimates predict actual transaction prices using data we collect from Zillow, a major real estate information provider in the U.S.
In order to examine the impact of online house price information on transaction prices, we develop a reduced-form pricing equation as the convex combination of the online price estimate and the hedonic valuation of a property. The main challenge for estimation is controlling for the unobserved house and neighborhood attributes in the model. We present a method that proxies for unobserved house-specific attributes using the difference between the listing price and the online price estimate at the time of listing. Quasi-experimental research designs, in particular boundary discontinuities (Black 1999, Bayer et al. 2007), have been used in the literature to control for unobservable area-specific attributes. By contrast, Bajari et al. (2012) propose a method that relies less on the research design and more on the structural assumption that prior house sales prices can be used to control for time-varying unobservable attributes in a hedonic regression. Better data, such as repeat-sales house transaction data, have also enabled researchers to deal with unobservable house and neighborhood attributes (Bayer et al. 2013). This paper incorporates weak structural assumptions into a hedonic model to examine the impact of online house price information on transaction prices.
We collect home value estimates, list prices, sales prices, and house and neighborhood attributes from Zillow.com for 1,200 houses across 30 Metropolitan Statistical Areas (MSAs) in the US. We find that the elasticity of house sales prices with respect to Zillow estimates is large and quite close to one. We also find that changes in Zillow estimates are related to how sales prices adjust from the list prices a month before sale.
In addition to the large literature that examines the impact of information technology on markets and productivity (Cardona et al. 2013, Martínez et al. 2010, Bresnahan et al. 2002, Bloom et al. 2012, Varian et al. 2004), our paper is also related to the literature that examines how information affects asset prices. The role of information in the equity market has been widely studied. For example, Fang and Peress (2009) find that mass media coverage reduces informational frictions and affects security prices. Easley and O’Hara (1987, 2004) show that large trades in the securities market reflect better information and affect security prices, and that investors demand higher returns on stocks for which there is less public information. In the real estate market, Levitt and Syverson (2008) show that an informational advantage translates into higher house sales prices: properties owned by real estate brokers sell at about a 4 percent premium relative to comparable properties not owned by realtors. Finally, this paper is related to the literature that examines how information technology affects market activity. Bailey et al. (2016) examine how online social networks affect people's perceptions of property investments, and find that friendship networks on Facebook significantly influence individual housing investment decisions and aggregate house prices. Also, mobile phones have been found to promote economic development in developing countries by easing the exchange of information in labor and agricultural markets (Aker and Mbiti 2010).
2. The Econometric Model and Estimation Strategy
In order to estimate how sensitive transaction prices are to online price estimates, we consider a reduced-form pricing equation as the convex combination of the online price estimate and the hedonic valuation of the property, that is,

$$Y_{in} = \beta Z_{in} + (1-\beta)\left[\alpha + X_{in}\gamma + \delta_n + \mu_i + \varepsilon_{in}\right], \qquad \text{(eq. 1)}$$

where $Y_{in}$ is the sales price of home $i$ in neighborhood $n$, $Z_{in}$ the Zillow estimate at the time of sale, $X_{in}$ the set of observable home characteristics (e.g., square footage, number of rooms), $\delta_n$ the neighborhood fixed effect, $\mu_i$ the home-specific fixed effect, and $\varepsilon_{in}$ the idiosyncratic error. We suppress time subscripts for simplicity but include season fixed effects to capture seasonal trends in sales prices.
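Distributing the weight $(1-\beta)$ over the bracket in eq. 1 is plain algebra, but it makes explicit which composite parameters a regression can recover:

```latex
Y_{in} = \beta Z_{in} + (1-\beta)\alpha + X_{in}\,(1-\beta)\gamma + (1-\beta)\delta_n + (1-\beta)\mu_i + (1-\beta)\varepsilon_{in}
```

The coefficient on $X_{in}$ is therefore $\pi = (1-\beta)\gamma$. If $\beta = 1$, then $\pi = 0$ and the observable characteristics carry no weight beyond $Z_{in}$; if $\beta = 0$, the equation collapses to the standard hedonic model. Under eq. 1, an estimate of $\beta$ near one thus implies small coefficients on the house characteristics.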
The expression in square brackets in the second term, $\alpha + X_{in}\gamma + \delta_n + \mu_i + \varepsilon_{in}$, constitutes the traditional hedonic pricing model. We add the first term, $\beta Z_{in}$, to reflect the potential effect of Zillow's estimate $Z_{in}$ on the transaction price $Y_{in}$. As such, the parameter $\beta$ may be interpreted as the degree to which Zillow's estimate predicts the sales price. The null hypothesis that $Z_{in}$ does not affect actual transaction prices, i.e., $\beta = 0$, can be readily tested once a consistent estimate of $\beta$ is obtained. The OLS estimators of the parameters $\alpha$, $\beta$, and $\gamma$ would be consistent if $(\delta_n, \mu_i)$ were mean independent of both $Z_{in}$ and $X_{in}$. However, this independence assumption is hard to justify, for at least two reasons. First, the unobserved house-specific amenities $\mu_i$ are likely to be correlated with the observed house-specific amenities $X_{in}$. Second, and more importantly for our study, introducing $Z_{in}$ into the extended hedonic pricing model creates another source of endogeneity. To see this, consider how the home price information $Z_{in}$ is generated by online real estate information providers. Although these providers do not disclose their formulas, the estimates $Z_{in}$ are constructed using recent transaction data in neighborhood $n$ of house $i$. As such, independence between the online price estimate and unobserved neighborhood characteristics will probably not hold even if we control for the observed house-specific amenities $X_{in}$. We therefore propose approaches to handle these two sources of endogeneity.
First, to control for the endogenous unobserved house-specific amenities $\mu_i$, we use a proxy variable constructed from listing prices, denoted by $L_i$. The seller can perceive house-specific amenities $\mu_i$ that the econometrician cannot observe. Denote the Zillow estimate of the house at the time of listing as $Z^L_{in}$. Both $Z^L_{in}$ and $L_i$ are public information, while $\mu_i$ is unobserved by prospective buyers. In order to signal the amenities $\mu_i$ to prospective buyers, sellers may add their value to $Z^L_{in}$ when setting listing prices, i.e.,

$$L_i = Z^L_{in} + g(\mu_i).$$

List prices $L_i$ may differ from the online estimates $Z^L_{in}$ for various reasons. List prices tend to start high because the seller expects the negotiation process to result in a lower sales price. How quickly the seller needs to sell the property could also affect the list price. The function $g$ thus captures the seller's adjustment of the self-valuation of $\mu_i$. Note that the identity function $g(\mu_i) = \mu_i$ implies that there is no markup or markdown in the listing price above the observed and unobserved value of the house.² Finally, to bring this structure into estimation, we assume that $g$ is strictly increasing so that its inverse $g^{-1}$ exists. With this inverse function, we can recover the unobserved house-specific amenities $\mu_i$ by

$$\mu_i = g^{-1}(L_i - Z^L_{in}).$$

² We find that initial list prices are higher than the Zillow estimates in about 68 percent of the observations in our sample and lower in about 32 percent.
We use the linear difference $(L_i - Z^L_{in})$ as well as a third-order nonparametric approximation, which we denote $f(L - Z^L)$. We use a nonparametric proxy in addition to the linear proxy in order to be as flexible as possible in estimating the unobservable component: we have no ex ante reason to believe that it should be linear or follow a specific functional form.
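Substituting the recovered amenity $\mu_i = g^{-1}(L_i - Z^L_{in})$, where $Z^L_{in}$ is the Zillow estimate at the time of listing, into eq. 1, and writing $\pi = (1-\beta)\gamma$ and $f(\cdot) = (1-\beta)\,g^{-1}(\cdot)$ (notation introduced here for this step), gives an equation whose only remaining unobservable besides the error is the neighborhood effect:

```latex
Y_{in} = \beta Z_{in} + (1-\beta)\alpha + X_{in}\pi + f\!\left(L_i - Z^L_{in}\right) + (1-\beta)\delta_n + (1-\beta)\varepsilon_{in}
```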
This operation removes one of the two sources of endogeneity, namely $\mu_i$. We still need to handle the other unobserved variable, $\delta_n$. If we have multiple observations per neighborhood, we can take first differences within a neighborhood to cancel out the $\delta_n$ terms. Our data collection strategy therefore intentionally collects multiple houses per neighborhood, and we control for neighborhood attributes by differencing two properties $i$ and $j$ within the same neighborhood. Ultimately, the main regression we run is

$$\Delta Y_{ij} = \beta\,\Delta Z_{ij} + \Delta X_{ij}\,\pi + \Delta f(L - Z^L)_{ij} + \epsilon_{ij},$$

where $\Delta_{ij}$ denotes the first difference between houses $i$ and $j$ in the same neighborhood and, from eq. 1, $\pi = (1-\beta)\gamma$. We estimate this equation with $\Delta f(L - Z^L)_{ij}$ either as the linear difference, i.e., $(L - Z^L)_i - (L - Z^L)_j$, or as a third-order nonparametric approximation. To estimate the parameters with this additive nonparametric function, we first partial out each observable ($\Delta Y_{ij}$, $\Delta Z_{ij}$, $\Delta X_{ij}$) with a nonparametric regression on $\Delta(L - Z^L)_{ij}$, and then perform OLS estimation of the partialled-out $\Delta Y_{ij}$ on the partialled-out $\Delta Z_{ij}$ and $\Delta X_{ij}$.
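The estimation steps above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' code. (i) Each house is differenced against the first house listed in its neighborhood; this pairing rule is hypothetical, since the text does not say how pairs are formed. (ii) Because $\Delta f(L - Z^L)_{ij} = f((L - Z^L)_i) - f((L - Z^L)_j)$, the sketch builds the third-order approximation from differences of cubic polynomial terms in each house's listing-time proxy $L_i - Z^L_{in}$; the text describes partialling out on $\Delta(L - Z^L)_{ij}$ instead, so this is one possible implementation, not necessarily theirs. (iii) Partialling out uses OLS residuals on that basis, after which $\beta$ and $\pi$ come from OLS on the residuals, which by the Frisch-Waugh-Lovell theorem equals including the basis directly. The simulated data are invented for the demonstration.

```python
import numpy as np


def neighborhood_pairs(groups):
    """Pair every house with the first house of its neighborhood (hypothetical rule)."""
    groups = np.asarray(groups)
    i_idx, j_idx = [], []
    for g in np.unique(groups):
        members = np.flatnonzero(groups == g)
        i_idx.extend(members[1:])
        j_idx.extend([members[0]] * (len(members) - 1))
    return np.array(i_idx, dtype=int), np.array(j_idx, dtype=int)


def residualize(v, basis):
    """OLS residuals of v (vector or matrix) on the columns of basis."""
    coef, *_ = np.linalg.lstsq(basis, v, rcond=None)
    return v - basis @ coef


def estimate_beta(y, z, x, proxy, groups, degree=3):
    """Within-neighborhood differencing, series partialling-out of Delta f, then OLS."""
    i, j = neighborhood_pairs(groups)
    dy = y[i] - y[j]                                   # Delta Y_ij
    dw = np.column_stack([z[i] - z[j], x[i] - x[j]])   # Delta Z_ij and Delta X_ij
    # Delta f(p) = f(p_i) - f(p_j): differences of p**1..p**degree (the constant cancels)
    basis = np.column_stack([proxy[i] ** k - proxy[j] ** k for k in range(1, degree + 1)])
    coef, *_ = np.linalg.lstsq(residualize(dw, basis), residualize(dy, basis), rcond=None)
    return coef  # coef[0]: beta; coef[1:]: pi = (1 - beta) * gamma


# Simulated data for illustration only
rng = np.random.default_rng(0)
groups = np.repeat(np.arange(300), 4)                # 300 neighborhoods x 4 houses
n = groups.size
beta, gamma = 0.9, np.array([1.0, -0.5])
delta = rng.normal(0.0, 1.0, 300)[groups]            # neighborhood effect delta_n
proxy = rng.normal(0.0, 0.5, n)                      # L_i - Z^L_in
mu = proxy + 0.2 * proxy ** 3                        # mu_i = g^{-1}(proxy), strictly increasing
x = rng.normal(0.0, 1.0, (n, 2))                     # observable characteristics X_in
z = delta + mu + 0.5 * (x @ gamma) + rng.normal(0.0, 0.3, n)  # Z loads on delta and mu
y = beta * z + (1 - beta) * (x @ gamma + delta + mu + rng.normal(0.0, 0.1, n))

coef = estimate_beta(y, z, x, proxy, groups)
naive, *_ = np.linalg.lstsq(np.column_stack([np.ones(n), z, x]), y, rcond=None)
print("differenced beta:", round(coef[0], 3), "levels OLS beta:", round(naive[1], 3))
```

In this simulation the levels regression overstates $\beta$ because $Z_{in}$ is correlated with the omitted $(1-\beta)(\delta_n + \mu_i)$, while the differenced, partialled-out estimate should land close to the true 0.9, with $\pi$ close to $(1-\beta)\gamma = (0.1, -0.05)$.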
3. Data
Zillow does not disclose the formula it uses to generate its price estimates, but Zillow's website states that it uses the physical attributes of the property, tax assessments, and prior and current transactions of the property itself and of comparable properties nearby. In addition to its current house price estimates, Zillow provides its past estimates, current and past listing prices when available, the most recent sales price, and past sales prices when available. We collect the sales date and price, the Zillow estimate at the time of sale, the estimates up to three months before the sale, the initial listing price, and historical sales and listing prices when available.
We also collect the address of the house, square footage, number of bedrooms, number of bathrooms, lot size, year built, and property tax. Zillow also provides the names of nearby schools. In constructing the sample, we collect data on multiple houses in each neighborhood to enable first differencing within neighborhoods. We use the 30 MSAs that Zillow lists in explaining its estimates (https://www.zillow.com/zestimate/) as the base sample, and replace a few MSAs that did not have sales prices in the data. For each MSA, we find the 10 neighborhoods whose median Zillow estimates are closest to the MSA median Zillow estimate and collect data on four houses per neighborhood. If an MSA has fewer than 10 neighborhoods, we collect four additional houses from existing neighborhoods, starting with the neighborhoods whose median Zillow estimates are closest to the MSA median Zillow estimate. Within each neighborhood, we restrict the search to single-family houses that are 2,000 sqft or larger, have 3 or more bedrooms and 2 or more bathrooms, and were last sold between July 2012 and July 2013. For each neighborhood, we narrow the search to houses whose Zillow estimates are closest to the ZIP Code median Zillow estimate and that list the same set of nearby public elementary schools. We then randomly select the first four houses that have non-missing information on the Zillow estimate at the time of sale, sales price, initial listing price, number of bedrooms and bathrooms, house square footage, lot size, and year built. This procedure returns 40 houses in each of 30 MSAs, for a total of 1,200 observations. Our main motivation behind this data collection strategy was to collect a sample of single-family houses representative of the median market segment in each MSA. Table 1 Panel A summarizes the characteristics of these houses.
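The neighborhood choice and house screens described above can be sketched as follows. This is a simplified illustration, not the authors' collection procedure: the record layout and field names (`zestimate_at_sale`, `last_sold`, and so on) are hypothetical, the July 2012 to July 2013 window is read as inclusive of both months, the ZIP-median ranking uses the estimate at the time of sale, and the school-list match, the random draw, and the top-up rule for MSAs with fewer than 10 neighborhoods are left out.

```python
from datetime import date

# Hypothetical field names: the text does not describe how Zillow records are stored.
REQUIRED = ("zestimate_at_sale", "sale_price", "initial_list_price",
            "beds", "baths", "sqft", "lot_sqft", "year_built")


def closest_neighborhoods(nbhd_medians, msa_median, k=10):
    """Ids of the k neighborhoods whose median Zillow estimate is closest to the MSA median."""
    return sorted(nbhd_medians, key=lambda nb: abs(nbhd_medians[nb] - msa_median))[:k]


def passes_screens(house):
    """Single-family, >= 2,000 sqft, >= 3 beds, >= 2 baths, sold July 2012-July 2013, no missing fields."""
    if any(house.get(field) is None for field in REQUIRED):
        return False
    sold = house.get("last_sold")
    return (house.get("type") == "single_family"
            and house["sqft"] >= 2000
            and house["beds"] >= 3
            and house["baths"] >= 2
            and sold is not None
            and date(2012, 7, 1) <= sold <= date(2013, 7, 31))


def pick_houses(houses, zip_median, n=4):
    """Screened houses whose Zillow estimate is closest to the ZIP Code median (first n)."""
    eligible = [h for h in houses if passes_screens(h)]
    return sorted(eligible, key=lambda h: abs(h["zestimate_at_sale"] - zip_median))[:n]


# Usage with made-up records
medians = {"A": 300_000, "B": 250_000, "C": 420_000, "D": 312_000}
print(closest_neighborhoods(medians, msa_median=305_000, k=2))  # ['A', 'D']
```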
In order to explore possible factors that might explain the variation in the elasticity
estimates across the 30 MSAs, we also collect MSA level data from the US Census. We collect
data on household internet accessibility and the number of residential real estate agents by MSA.
In addition, we compile data on the population, land area, number of families, number of housing
units, median household income, educational attainment, and unemployment rate for each MSA.
The variables are for the year 2013, except for the number of real estate agents in the MSA,
which is for 2012. Table 1 Panel B presents the summary statistics for the MSA level variables.
[Table 1 here]
4. Empirical results
4.1 Main results
Table 2 presents the results. Column (1) controls for the unobserved house-specific characteristics by including the linear proxy, i.e., $(L - Z^L)$. Columns (2) onward use the higher-order nonlinear proxy, $f(L - Z^L)$. Column (3) additionally controls for the number of days on the market, column (4) uses higher-order covariates of the observable house characteristics, and column (5) adds season fixed effects.
[Table 2 here]
The coefficient estimate on the log Zillow estimate is 0.89 in column (1). Once we use the nonlinear proxy, the estimates are stable across all specifications at around 0.94 to 0.95 and are statistically significant. Panel A of Table 2 reports the p-value of the test of whether the coefficient estimate on the log Zillow estimate is equal to one. Except in column (1), we cannot reject the null that the coefficient is equal to one at the 10 percent level. Panel B reports the p-value of the test of whether the coefficient estimates on all the observable house characteristics are jointly equal to zero. Again, except in column (1), we cannot reject the joint hypothesis that the estimates on the housing characteristics are equal to zero.
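The Panel A p-values can be approximately recomputed from the reported coefficients and robust standard errors with a two-sided test of $H_0: \beta = 1$. The sketch below assumes a normal approximation (reasonable with 1,200 observations); the function name is ours.

```python
import math


def two_sided_p(coef, se, null=1.0):
    """Two-sided p-value for H0: coefficient == null, using a normal approximation."""
    t = (coef - null) / se
    # erfc(|t| / sqrt(2)) equals 2 * (1 - Phi(|t|)) for the standard normal CDF Phi
    return math.erfc(abs(t) / math.sqrt(2.0))


# Log Zillow estimate coefficients and robust standard errors from Table 2
table2 = {"(1)": (0.893, 0.038), "(2)": (0.948, 0.035), "(5)": (0.941, 0.036)}
for column, (coef, se) in table2.items():
    print(column, round(two_sided_p(coef, se), 3))
```

For columns (1), (2), and (5) this gives roughly 0.005, 0.137, and 0.101, close to the reported 0.005, 0.138, and 0.104; the small gaps are consistent with rounding of the reported coefficients and standard errors.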
One may wonder whether the Zillow estimates, which are updated periodically, contain much information beyond the list price. However, as Table 2 column (1) indicates, the coefficient estimate on the Zillow estimate is significant when the linear proxy variable is included. Recall that the list price enters the regression linearly when we use the linear proxy, since the proxy is the difference between the list price and the Zillow estimate at the time of listing. In other words, the Zillow estimate is significantly related to the sales price even when the list price is linearly controlled for in the OLS regression. Moreover, the linear proxy itself is statistically significant. The fact that this differential helps explain actual sales prices, even when the observables and the Zillow estimate are controlled for, indicates that the Zillow estimate captures useful information beyond list prices.
The literature has found that Zillow estimates tend to overestimate sales prices (Hollas et al. 2010) and are less accurate for lower-priced homes (Corcoran and Liu 2014). However, these studies focus on one or two specific regional markets, so it is difficult to generalize about whether there are systematic biases in Zillow estimates across all MSAs. Our econometric strategy assumes that the difference between the Zillow estimate and the list price contains some information on unobserved house characteristics, and we use that information. Hence, our estimation strategy takes the Zillow estimate at face value. The results in Table 2 confirm that there is informational value in the difference between the list price and the Zillow estimate.
The main focus of our paper is simply to examine how well Zillow estimates predict sales prices; we do not aim to estimate the causal impact of Zillow estimates on transaction prices, which would be a much more ambitious goal. However, future research that aims to estimate the causal impact of online property estimates would need to explore further whether there are systematic biases in the online estimates, and whether those biases are positively or negatively correlated with unobserved house characteristics.
4.2 Do Zillow estimates predict sales prices around the time of sales?
Overall, Table 2 indicates that (1) Zillow's estimates are highly correlated with transaction prices, even when observable and unobservable house and neighborhood characteristics are controlled for, and (2) observable house characteristics have no additional predictive power once the Zillow estimate is controlled for. This may not be surprising given that Zillow uses more complex forecasting methods than the hedonic frameworks generally used in the literature.³ Although we do not directly address causality in this paper, given such high predictive power and the increasing use of information technology, one could wonder whether the price estimates provided by Zillow can actually influence sales prices. We are unable to provide a definitive answer to this question, given that we do not have random variation in the Zillow estimates. Instead, we examine whether the short-term change in Zillow estimates while sales prices are being negotiated is related to how the final sales price adjusts from the list price one month prior to sale.

³ Zillow indicates that it uses proprietary automated valuation models that apply advanced algorithms. Zillow analyzes home-related data and actual sales prices to identify relationships within a specific geographic area. Home characteristics, such as square footage, location, or the number of bathrooms, are given different weights according to their influence on home sale prices in each specific geography over a specific period of time. This creates a set of valuation rules, or models, that are applied to generate each home's Zillow estimate. Specifically, some of the data Zillow uses in the algorithm include (1) physical attributes, including location, lot size, square footage, number of bedrooms and bathrooms, and many other details; (2) tax assessments, including property tax information, actual property taxes paid, exceptions to tax assessments, and other information provided in the tax assessors' records; and (3) prior and current transactions, including actual sale prices over time of the home itself and comparable recent sales of nearby homes. This and additional description of Zillow's estimate can be found at https://www.zillow.com/zestimate/.
Table 3 presents the results from a regression in which the dependent variable is the sales price minus the list price one month prior to sale, and the main regressor is the difference between the Zillow estimates one month and two months prior to sale. We control for the number of days the house was listed on the market, the change in MSA-level Zillow estimates, and, in columns (2) and (4), time (year and month of sale) fixed effects. We exclude the house itself when calculating the MSA-level change.
[Table 3 here]
Columns (1) and (2) indicate that changes in the Zillow estimate are positively related to how prices adjust in the final month before sale, but the relationship is not statistically significant. However, in several observations the Zillow estimates fluctuate drastically over a short period of time and may not be perceived as reliable. In columns (3) and (4), we drop observations where the change in Zillow estimates over the last two months exceeds $100,000. The coefficient estimates in the trimmed sample are positive and statistically significant at the 10 percent level. They indicate that a $1,000 change in the Zillow estimate is associated with the final sales price adjusting from the list price one month prior to sale by about $200. This effect is not driven by regional trends, as we control for the MSA-level change in Zillow estimates. Although our evidence is not causal, it does suggest that online price estimates may have a direct impact on prices.
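The magnitudes in Table 3 can be checked with simple arithmetic: a coefficient of about 0.2 on the change in the Zillow estimate means roughly $200 of price adjustment per $1,000 change. The sketch also computes two-sided p-values under a normal approximation (our assumption; the regressions have several hundred observations), which place both trimmed-sample coefficients between the 5 and 10 percent thresholds and the full-sample coefficients well above 10 percent. The function name is ours.

```python
import math


def summarize(coef, se, zillow_change=1_000.0):
    """Implied dollar adjustment, t-statistic, and two-sided normal p-value."""
    implied = coef * zillow_change  # dollars of sales-price adjustment
    t = coef / se
    p = math.erfc(abs(t) / math.sqrt(2.0))
    return implied, t, p


# Coefficient on the one-month change in the Zillow estimate and its standard error (Table 3)
table3 = {"(1)": (0.0933, 0.0970), "(2)": (0.0727, 0.0962),
          "(3)": (0.217, 0.111), "(4)": (0.196, 0.110)}
for column, (coef, se) in table3.items():
    implied, t, p = summarize(coef, se)
    print(f"{column}: ${implied:.0f} per $1,000, t = {t:.2f}, p = {p:.3f}")
```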
Another indirect way of examining the relationship between listing prices and the Zillow estimate is to explore how Zillow estimates relate to the movement of list prices around the time of sale, which we do in Table 4. Using our data, we construct a panel of list prices and Zillow estimates. Each house has Zillow estimates from 1, 2, and 3 months prior to sale, and we were able to collect historical list prices for a subset of the initial sample. We perform a simple house-level fixed effects regression of the list price on the current or one-month-prior Zillow estimate. As the results indicate, we find no significant relationship between Zillow estimates and list prices in the months before sale. This suggests that the information contained in these two variables is not simply redundant, and that sellers do not immediately adjust list prices in response to short-term changes in Zillow estimates.
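A house fixed-effects regression of the kind used for Table 4 can be sketched with the within transformation: demean the list price and the Zillow estimate within each house, then run OLS on the demeaned data. The function and the simulated panel below are hypothetical. With a single regressor the slope reduces to a ratio of sums, and the simulation builds in house effects correlated with the regressor to show why pooled OLS would be misleading.

```python
import numpy as np


def within_slope(y, x, ids):
    """Fixed-effects slope of y on a single regressor x, with effects indexed by ids."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    _, inv = np.unique(np.asarray(ids), return_inverse=True)
    inv = inv.ravel()
    counts = np.bincount(inv)
    y_dm = y - (np.bincount(inv, weights=y) / counts)[inv]  # subtract house means
    x_dm = x - (np.bincount(inv, weights=x) / counts)[inv]
    return float(x_dm @ y_dm / (x_dm @ x_dm))


# Hypothetical panel: 200 houses observed 3 times (e.g., months before sale)
rng = np.random.default_rng(1)
ids = np.repeat(np.arange(200), 3)
house_level = rng.normal(300.0, 50.0, 200)[ids]     # persistent house effect
x = house_level + rng.normal(0.0, 5.0, ids.size)    # estimate tracks the house effect
y = 2.0 * house_level + 0.5 * x + rng.normal(0.0, 1.0, ids.size)

fe = within_slope(y, x, ids)
pooled = np.polyfit(x, y, 1)[0]                      # ignores house effects
print(round(fe, 3), round(pooled, 3))
```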
[Table 4 here]
4.3 What Explains the Variation in the Elasticity Estimates across MSAs?
We next examine how the elasticity estimates vary across MSAs. Table 5 presents the effect of one-month-prior Zillow estimates on sales prices in each of the 30 MSAs, using the full model in Table 2 column (5). For each MSA, we test whether the elasticity estimate equals one and whether all covariates are jointly equal to zero. Even with only 40 observations per MSA, many of the estimates are statistically different from zero at the 10 percent level. The estimates vary considerably across MSAs, ranging from -0.90 in Las Vegas to 2.01 in Boston. Many of the estimates are statistically indistinguishable from one even at the MSA level.
[Table 5 here]
What might explain the variation in the elasticity estimates across MSAs? We hypothesize that the availability of internet at home increases the demand for online house price information and the weight placed on it in determining sales prices, i.e., the elasticity estimate. We use the household internet accessibility rate, measured as one minus the share of households without internet and computer access in the MSA, as our main proxy for access to online house price information. Despite the sample size of 30, the bivariate regression in Table 6 column (1) shows that the relationship is statistically significant at the 5 percent level. A 1 percentage point increase in household internet accessibility is associated with an increase in the elasticity estimate of about 0.2. In column (2), we include the number of residential real estate agents in the MSA, but this variable has no significant effect and the coefficient estimate on household internet accessibility remains essentially unchanged. In column (3), we control for the size of the MSA through its land area and population. The coefficient estimate on household internet accessibility decreases slightly but remains significant at the 10 percent level. In column (4), we additionally control for the number of families. The number of families conditional on the population captures the potential demand for housing in the MSA. The coefficient estimate on the log number of families is positive, and the coefficient estimate on population is now negative. This suggests that cities with more families relative to their population demand more housing and place more value on online price information. In column (5), we control for the median household income and the number of adults with a college degree. There is a weak negative relationship with median income but no significant relationship with the size of the college-educated population. We note that the coefficient estimate on household internet accessibility increases to 32.57. Finally, in column (6), we control for housing supply and the economic condition of the city. The negative coefficient estimate on the log number of housing units suggests that the supply of housing, conditional on potential demand, reduces the elasticity estimate, potentially by reducing the relative demand for online house price information. The coefficient estimate on household internet accessibility barely changes and is now statistically significant at the 1 percent level.
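For scale: with internet accessibility measured as a share, the column (1) coefficient of 22.28 implies 22.28 × 0.01 ≈ 0.22 per percentage point, and a one-standard-deviation difference across MSAs (0.013 in Table 1) corresponds to about 22.28 × 0.013 ≈ 0.29 in the elasticity. Table 6 reports robust standard errors without naming the variant; the sketch below computes OLS with HC1 standard errors, which is an assumption on our part, as is the simulated cross-section used to exercise it.

```python
import numpy as np


def ols_hc1(y, X):
    """OLS coefficients with HC1 heteroskedasticity-robust standard errors.

    X must already include a column of ones if an intercept is wanted.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    b = xtx_inv @ X.T @ y
    e = y - X @ b
    meat = (X * (e ** 2)[:, None]).T @ X          # sum_i e_i^2 x_i x_i'
    cov = xtx_inv @ meat @ xtx_inv * n / (n - k)  # HC1 small-sample scaling
    return b, np.sqrt(np.diag(cov))


# Hypothetical cross-section of 30 MSAs
rng = np.random.default_rng(2)
access = rng.uniform(0.89, 0.95, 30)              # internet accessibility share
noise = rng.normal(0.0, 0.3, 30) * (1 + 10 * (access - 0.89))  # heteroskedastic errors
elasticity = -19.0 + 22.0 * access + noise
X = np.column_stack([np.ones(30), access])
b, se = ols_hc1(elasticity, X)
print("slope:", round(b[1], 2), "robust SE:", round(se[1], 2), "per pp:", round(b[1] * 0.01, 3))
```

With 30 observations and ten estimated coefficients in column (6), the HC1 factor n/(n - k) = 30/20 = 1.5 inflates the variance by 50 percent relative to HC0, so the choice of variant matters in a cross-section this small.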
[Table 6 here]
5. Conclusions
Our empirical results show that online property price estimates strongly predict transaction prices. Moreover, we find suggestive evidence that online estimates may themselves affect actual transaction prices. Although the literature has examined how an informational advantage translates into higher home sales prices (Levitt and Syverson 2008), we believe our paper is a novel attempt at understanding the role information technology plays in the property market.
The high correlation between house sales prices and online price estimates potentially has
significant implications. How information is generated could have a non-negligible impact on
house prices. Also, the prevalence and accessibility of online house price information and
people's reliance on such information could influence house price dynamics. Future research that
can better identify the causal impact of information technology on property transaction prices
would be valuable.
References
Aker, Jenny C., and Isaac M. Mbiti. 2010. "Mobile Phones and Economic Development in Africa," Journal of Economic Perspectives 24(3): 207-232.
Bailey, Cao, Kuchler, and Stroebel. 2016. "Social Networks and Housing Markets," NBER Working Paper w22258.
Bajari, Patrick, Jane Cooley, Kyoo il Kim, and Christopher Timmins. 2012. "A Rational Expectations Approach to Hedonic Price Regressions with Time-Varying Unobserved Product Attributes: The Price of Pollution," American Economic Review 102(5): 1898-1926.
Bayer, Patrick, Fernando Ferreira, and Robert McMillan. 2007. "A Unified Framework for Measuring Preferences for Schools and Neighborhoods," Journal of Political Economy 115(4): 588-638.
Bayer, Patrick, Marcus Casey, Fernando Ferreira, and Robert McMillan. 2013. "Estimating Racial Price Differentials in the Housing Market," mimeo.
Black, Sandra. 1999. "Do Better Schools Matter? Parental Valuation of Elementary Education," Quarterly Journal of Economics 114(2): 577-599.
Bloom, Sadun, and Van Reenen. 2012. "Americans Do IT Better: US Multinationals and the Productivity Miracle," American Economic Review 102(1): 167-201.
Bresnahan, Brynjolfsson, and Hitt. 2002. "Information Technology, Workplace Organization, and the Demand for Skilled Labor: Firm-Level Evidence," Quarterly Journal of Economics 117(1): 339-376.
Cardona, M., T. Kretschmer, and T. Strobel. 2013. "ICT and Productivity: Conclusions from the Empirical Literature," Information Economics and Policy 25(3): 109-125.
Corcoran and Liu. 2014. "Accuracy of Zillow's Home Value Estimates," Real Estate Issues 39(1).
Easley, David, and Maureen O'Hara. 1987. "Price, Trade Size, and Information in Securities Markets," Journal of Financial Economics 19: 69-90.
Easley, David, and Maureen O'Hara. 2004. "Information and the Cost of Capital," Journal of Finance 59(4): 1553-1583.
Fang, Lily, and Joel Peress. 2009. "Media Coverage and the Cross-Section of Stock Returns," Journal of Finance 64(5): 2023-2052.
Hollas, Rutherford, and Thomson. 2010. "Zillow's Estimates of Single-Family Home Values," Appraisal Journal 78(1).
Levitt, Steven, and Chad Syverson. 2008. "Market Distortions When Agents Are Better Informed: The Value of Information in Real Estate Transactions," Review of Economics and Statistics 90(4): 599-611.
Martínez, Diego, Jesús Rodríguez, and José L. Torres. 2010. "ICT-Specific Technological Change and Productivity Growth in the US: 1980–2004," Information Economics and Policy 22(2): 121-129.
Varian, Farrell, and Shapiro. 2004. Economics of Information Technology: An Introduction. Cambridge: Cambridge University Press.
Table 1. Summary statistics

| Variable | Mean | Std. Dev. | Min | Max | Obs |
|---|---|---|---|---|---|
| Panel A: House-level data | | | | | |
| Sales price | 321914 | 245299 | 10000 | 2950000 | 1200 |
| Zillow estimate when sold | 322056 | 230858 | 36000 | 2600000 | 1200 |
| Zillow estimate 1 month prior to sale | 320108 | 230967 | 38000 | 2600000 | 1200 |
| Zillow estimate in the month listed for sale | 324022 | 232077 | 33000 | 2900000 | 1189 |
| List price | 340012 | 251240 | 19900 | 2900000 | 1199 |
| Number of days between listing and sale | 185.1 | 269.9 | 2 | 3349 | 1195 |
| Number of bedrooms | 3.85 | 0.87 | 3 | 9 | 1200 |
| Number of bathrooms | 2.68 | 0.64 | 2 | 6 | 1199 |
| Square footage | 2373 | 547 | 2000 | 10890 | 1200 |
| Lot square footage | 8365 | 13936 | 595 | 304920 | 1199 |
| Year built | 1960 | 37 | 1810 | 2013 | 1200 |
| Panel B: MSA-level data | | | | | |
| Land area | 6040.69 | 4783.888 | 1600.9 | 27259.9 | 30 |
| Population, 2013 | 4254068 | 3822144 | 1601565 | 1.97E+07 | 30 |
| Housing units, 2013 | 1711144 | 1458024 | 654120 | 7797490 | 30 |
| Number of families, 2013 | 1013282 | 876799.7 | 400899 | 4550781 | 30 |
| Median household income, 2013 | 60789.97 | 11027.35 | 46497 | 90962 | 30 |
| Population above 25 years old with a bachelor's degree, 2013 | 20.74333 | 3.258589 | 12.7 | 26.9 | 30 |
| Unemployment rate, 2013 | 9.943333 | 1.777771 | 7 | 14.7 | 30 |
| Household internet accessibility, 2013 | 0.9149734 | 0.0129532 | 0.8945243 | 0.9439421 | 30 |
| Number of residential real estate agents, 2012 | 4311.6 | 4262.602 | 749 | 22063 | 30 |

Notes: Variables listed in Panel A were collected manually from Zillow. Panel B data were extracted from the US Census.
Table 2. Zillow estimate and sales price
Dependent variable: Log sales price

| | (1) | (2) | (3) | (4) | (5) |
|---|---|---|---|---|---|
| Log Zillow estimate | 0.893 | 0.948 | 0.948 | 0.947 | 0.941 |
| | (0.038) | (0.035) | (0.035) | (0.036) | (0.036) |
| Length of listing | | | -0.514 | -0.5 | -0.674 |
| | | | (0.842) | (0.844) | (0.843) |
| Proxy | 0.21 | | | | |
| | (0.020) | | | | |
| Proxy format | Linear | Non-linear | Non-linear | Non-linear | Non-linear |
| Neighborhood first differencing | Yes | Yes | Yes | Yes | Yes |
| Higher order covariates | | | | Yes | Yes |
| Season fixed effects | | | | | Yes |
| A. Null hypothesis: coefficient estimate of log Zillow estimate = 1 (p-value) | 0.005 | 0.138 | 0.136 | 0.148 | 0.104 |
| B. Null hypothesis: all coefficient estimates of the house characteristics variables = 0 (p-value) | 0.035 | 0.503 | 0.506 | 0.501 | 0.443 |
| R-squared | 0.918 | 0.935 | 0.935 | 0.935 | 0.935 |
| N | 1200 | 1200 | 1200 | 1200 | 1200 |

Notes: The Zillow estimates are estimates one month prior to the time of sale. The linear proxy is the list price minus the Zillow estimate around the time of listing, and the non-linear proxy is the nonparametric third-order polynomial approximation. Higher order covariates are third-order polynomials for bedrooms, bathrooms, square feet, lot square feet, and year built. Robust standard errors are in parentheses.
Table 3. Price movements one month prior to sales
Dependent variable: Sales price - List price one month prior to sales

| | (1) | (2) | (3) | (4) |
|---|---|---|---|---|
| Sample | Full sample | Full sample | Drop outliers | Drop outliers |
| Zillow estimate 1 month prior to sales - Zillow estimate 2 months prior to sales | 0.0933 | 0.0727 | 0.217 | 0.196 |
| | (0.0970) | (0.0962) | (0.111) | (0.110) |
| Days listed | -25.50 | -24.53 | -26.28 | -25.66 |
| | (6.038) | (6.269) | (6.217) | (6.454) |
| Change in MSA-level Zillow estimate | 0.774 | 0.776 | 0.632 | 0.619 |
| | (0.184) | (0.184) | (0.177) | (0.177) |
| Time fixed effects | | Y | | Y |
| Observations | 801 | 801 | 769 | 769 |
| R-squared | 0.073 | 0.084 | 0.074 | 0.083 |

Notes: Observations where the change in Zillow estimates between the month of sales and 3 months prior to sales was greater than $100,000 are dropped in columns (3) and (4). Robust standard errors are in parentheses.
Table 4. Zillow estimates and list prices

| | (1) | (2) | (3) | (4) |
|---|---|---|---|---|
| Dependent variable | List price | List price | Ln(list price) | Ln(list price) |
| Zillow estimate | 0.00286 | | | |
| | (0.0353) | | | |
| Zillow estimate one month ago | | 0.000661 | | |
| | | (0.0158) | | |
| Ln(Zillow estimate) | | | -0.0210 | |
| | | | (0.0330) | |
| Ln(Zillow estimate one month ago) | | | | -0.00261 |
| | | | | (0.0138) |
| House fixed effects | Yes | Yes | Yes | Yes |
| Observations | 2,257 | 1,620 | 2,256 | 1,619 |
| R-squared | 0.023 | 0.042 | 0.016 | 0.023 |

Notes: Robust standard errors are in parentheses.
Table 5. Zillow estimate and sales price: elasticity estimates by MSA

| MSA | Elasticity estimate | p-value: elasticity = 0 | p-value: elasticity = 1 | p-value: all covariate coefficients = 0 | R-squared | N |
|---|---|---|---|---|---|---|
| Atlanta | 0.976 (0.234) | 0.004 | 0.920 | 0.500 | 0.976 | 40 |
| Baltimore | 0.201 (0.297) | 0.518 | 0.518 | 0.113 | 0.726 | 40 |
| Boston | 2.007 (0.263) | 0.000 | 0.007 | 0.001 | 0.934 | 40 |
| Charlotte | -0.169 (2.926) | 0.955 | 0.699 | 0.942 | 0.367 | 40 |
| Chicago | 1.000 (0.115) | 0.000 | 0.997 | 0.701 | 0.969 | 40 |
| Cincinnati | 0.857 (0.290) | 0.025 | 0.640 | 0.881 | 0.801 | 40 |
| Columbus | 0.513 (0.303) | 0.125 | 0.142 | 0.906 | 0.600 | 40 |
| Denver | 1.139 (0.239) | 0.001 | 0.575 | 0.960 | 0.901 | 40 |
| Las Vegas | -0.901 (0.755) | 0.267 | 0.036 | 0.222 | 0.036 | 40 |
| Los Angeles | 0.799 (0.142) | 0.001 | 0.197 | 0.179 | 0.940 | 40 |
| Miami-Fort Lauderdale | 0.928 (0.355) | 0.031 | 0.845 | 0.288 | 0.900 | 40 |
| Minneapolis-St. Paul | 0.894 (0.148) | 0.000 | 0.495 | 0.074 | 0.914 | 40 |
| Nashville | 1.095 (0.192) | 0.000 | 0.631 | 0.476 | 0.770 | 40 |
| New York | 0.973 (0.501) | 0.088 | 0.959 | 0.513 | 0.910 | 40 |
| Orlando | 0.587 (0.220) | 0.026 | 0.093 | 0.714 | 0.835 | 40 |
| Philadelphia | 1.282 (0.344) | 0.007 | 0.440 | 0.300 | 0.465 | 40 |
| Phoenix | 0.563 (0.361) | 0.154 | 0.257 | 0.638 | 0.470 | 40 |
| Pittsburgh | 1.056 (0.120) | 0.000 | 0.652 | 0.599 | 0.988 | 40 |
| Portland | 0.853 (0.491) | 0.126 | 0.774 | 0.687 | 0.886 | 40 |
| Providence-Warwick | 1.617 (0.616) | 0.034 | 0.350 | 0.689 | 0.570 | 40 |
| Riverside | 0.248 (0.385) | 0.536 | 0.083 | 0.224 | 0.513 | 40 |
| Sacramento | 0.280 (0.584) | 0.642 | 0.246 | 0.812 | 0.600 | 40 |
| San Diego | 1.656 (1.540) | 0.314 | 0.681 | 0.316 | 0.232 | 40 |
| San Francisco | 0.476 (0.441) | 0.311 | 0.269 | 0.170 | 0.458 | 40 |
| San Jose | 0.884 (0.195) | 0.002 | 0.568 | 0.039 | 0.555 | 40 |
| Seattle | 0.749 (0.255) | 0.022 | 0.358 | 0.627 | 0.628 | 40 |
| St. Louis | 0.305 (0.220) | 0.202 | 0.013 | 0.209 | 0.797 | 40 |
| Tampa | 1.291 (0.148) | 0.000 | 0.086 | 0.260 | 0.172 | 40 |
| Virginia Beach | 1.628 (0.675) | 0.042 | 0.379 | 0.628 | 0.685 | 40 |
| Washington DC | 0.470 (0.222) | 0.067 | 0.044 | 0.565 | 0.696 | 40 |

Notes: The regression for each MSA uses the specification in column (5) of Table 2. Standard errors are in parentheses.
Table 6. Factors that explain the elasticity estimates across MSAs
Dependent variable: Elasticity estimate (effect of online price estimates on sales prices)

| | (1) | (2) | (3) | (4) | (5) | (6) |
|---|---|---|---|---|---|---|
| Household internet accessibility | 22.28 | 22.57 | 19.00 | 18.02 | 32.57 | 31.73 |
| | (9.244) | (9.412) | (10.22) | (9.191) | (13.34) | (10.83) |
| Ln(number of residential real estate agents) | | -0.0342 | -0.334 | -0.244 | -0.278 | -0.160 |
| | | (0.0920) | (0.311) | (0.288) | (0.352) | (0.326) |
| Ln(land area) | | | -0.141 | -0.168 | 0.00672 | -0.107 |
| | | | (0.201) | (0.165) | (0.252) | (0.243) |
| Ln(population) | | | 0.454 | -3.472 | -2.824 | 0.0484 |
| | | | (0.407) | (1.304) | (1.925) | (2.175) |
| Ln(number of families) | | | | 3.920 | 3.283 | 5.156 |
| | | | | (1.286) | (1.773) | (1.312) |
| Ln(median household income) | | | | | -2.061 | -3.701 |
| | | | | | (1.056) | (1.034) |
| Ln(number of college degree holders) | | | | | 1.203 | 1.376 |
| | | | | | (0.994) | (0.999) |
| Ln(housing units) | | | | | | -4.896 |
| | | | | | | (2.530) |
| Unemployment rate | | | | | | -0.0825 |
| | | | | | | (0.0872) |
| Observations | 30 | 30 | 30 | 30 | 30 | 30 |
| R-squared | 0.246 | 0.248 | 0.289 | 0.406 | 0.522 | 0.636 |

Notes: The elasticity estimates for each MSA in Table 5 are the dependent variable. All control variables are for 2013, except for the number of real estate agents, which is for 2012. Household internet accessibility is 1 minus the share of households without internet and computer access. Robust standard errors are in parentheses.