Measuring the Coworker Effects on Wages∗
Jianhong Xin†
Saturday 9th January, 2021
Abstract
The fast-growing literature studying the impact of coworkers on an individual’s wages has recently made significant progress by developing techniques that allow it to move from small and idiosyncratic case studies to more generalizable studies based on large labor markets. However, I show that the empirical methodology underlying this shift delivers a large positive or negative bias in measured coworker effects in realistic settings. I combine insights from assortative matching theory with recent computer science advances in graph embedding techniques to develop a machine learning method that allows researchers to obtain efficient and unbiased estimates in those settings. The proposed method makes it possible to measure non-parametrically the potentially heterogeneous impact of different coworkers on an individual’s wages. I am currently using the proposed method to measure coworker effects in matched employer-employee panel data covering the entire population of Denmark.
Keywords: Coworker Effects, Two-Sided Unobserved Heterogeneity, Assortative Matching, Machine Learning, Graph Embedding, Matrix Completion.
∗I am deeply indebted to Iourii Manovskii, Marcus Hagedorn, and Dirk Krueger for their invaluable guidance and support. I would like to thank Harold Cole, José-Víctor Ríos-Rull, Andrew Postlewaite, Xu Cheng, Guillermo Ordonez, Hanming Fang and all other seminar participants at the University of Pennsylvania and the 2020 Joint Statistical Meetings.
†University of Pennsylvania, Department of Economics, The Ronald O. Perelman Center for Political Science and Economics, 133 South 36th Street, Philadelphia, PA 19104. Email: jxin@sas.upenn.edu.
https://economics.sas.upenn.edu/people/jianhong-xin
1 Introduction
How does the wage of a worker depend on where she works and whom she works with? How
can we disentangle the contributions to wages of the unobserved components of a worker’s
individual productivity, her firm’s productivity and her coworkers’ productivities? What is
the magnitude of the complementarity between the productivity of the worker and the firm?
What is the magnitude of the complementarity between the productivities of coworkers? How
can we predict the potential wage and output of any worker relocated to any firm she never
worked at, with coworkers she might never have encountered? Measuring coworker effects on
wages at the scale of a local labor market provides the key to answering these questions. The
empirics of coworker effects also pave the way for subsequent research, for instance on the
efficiency of the labor market allocation: a method that predicts wages and outputs in a
counterfactual meeting of any workers at any firm is capable of projecting the optimal
assignment of workers in the presence of search frictions, and can shed light on the design of
policies, such as taxation and the unemployment insurance system, that achieve the optimal
assignment.
To estimate coworker effects, the central empirical challenge is well-known as the selection
problem (Manski (1993), Brock and Durlauf (2001), Angrist (2014), Bramoulle et al. (2020)).
The problem is rooted in the sorting in the labor market: workers may be endogenously
matched into peers across firms and occupations. The sorting may be based on similar
productive attributes or common external causes, which would potentially confound the
peer effects. Moreover, a large proportion of these attributes may not be observed in the data,
considering that observable worker and firm characteristics account for only a small fraction
of the observed variation in wages. 1
The conventional approach to circumvent the selection problem is to exploit exogenous
sources of variation in exposure to coworker influences. Researchers can randomize peer
components or treatments affecting outcomes through field experiments (Sacerdote (2001),
Duflo et al. (2011), Banerjee et al. (2015), Eckles et al. (2016) and Feld and Zölitz (2017)),
or they have to rely on a variety of strong and context-specific exogeneity assumptions for
observational data. 2 Perhaps not surprisingly, this empirical research has been limited to
small and idiosyncratic case studies. While it is difficult to apply these methods to
investigate coworker effects at a broader scale, the empirical findings in these studies also
exhibit a large degree of heterogeneity, making it difficult to generalize their results to the
whole market. 3
1 A large number of studies have found evidence that workers with similar unobservable productive attributes systematically self-select across firms and become peers. To name just a few, Andrews et al. (2012) find a positive correlation between worker and firm contributions to wages in German data. Hagedorn et al. (2017) identify sorting based on unobserved characteristics of both workers and firms and find positive assortative matching (PAM), which reinforces the previous finding. Abowd et al. (2018) find that unobservable differences in worker productivity are strongly positively correlated with unobservable differences in firm productivity across sectors. Crane (2014) finds that productive, fast-growing firms tend to hire more productive workers, using U.S. census data. Mendes et al. (2007) document PAM in Portuguese data. Borovikova and Shimer (2017) find that high-wage workers work for high-wage firms in Austria.
2 For example, models studying peer effects in classrooms typically assume idiosyncratic variations in peer
Yet, there is a recent growing interest in the literature moving beyond these case studies
to investigate coworker effects in large labor markets. In a recent advance, Cornelissen et al.
(2017) (henceforth CDS) are the first to estimate the average coworker effects on wages for the
whole labor market in Germany. They proposed an empirical method to measure the average
worker’s wage response to a change of her coworker quality with a linear-in-means model.
Building on the worker and firm fixed effects model pioneered by Abowd et al. (1999) (AKM
hereafter), the method accounts for selection into peer groups by including additive worker
and firm fixed effects. Taking the approach of Arcidiacono et al. (2012), the method estimates
coworker effects emanating from unobservable characteristics by iteratively estimating the
focal worker’s ability, the firm ability, and the spillover coefficient mapping her own wage
response to her coworker quality measured by the average of their fixed effects. Similar
strategies based on including two-way additive fixed effects are also adopted by Hanushek
et al. (2003), Betts and Zau (2004), Lavy et al. (2012), Burke and Sass (2013) and Sanz de
Galdeano (2020).
However, this method faces two challenges. First, by assuming linearly additive fixed
effects, the model cannot capture potential interactions between the unobserved
heterogeneity of agents on both sides of the market. In particular, wages that are monotone
in worker and firm productivity are inconsistent with standard sorting models where the
complementarity between worker and firm productivities plays a key role (e.g. Becker (1973),
Eeckhout and Kircher (2011), Gautier and Teulings (2006), Lopes de Melo (2013), Lentz
(2010), Lise et al. (2016), Hagedorn et al. (2014)). The estimated worker and firm fixed effects
may therefore fail to reflect the true worker abilities and firm productivities when
these attributes are truly complementary in the underlying production process. 4 Based on
components across cohorts (Hoxby (2000)). Models studying peer effects on networks often abstract from correlated effects based on unobservable characteristics. The typical assumption is that the network and observables are exogenous to the outcome (Bramoulle et al. (2009)), or that the endogenous formation of the network is conditional only on observed characteristics (Bramoulle et al. (2020)). These assumptions abstract from correlated effects based on unobserved heterogeneity.
3 Mas and Moretti (2009), for instance, find strong evidence of positive productivity spillovers conditional on the physical presence of coworkers in a supermarket chain, while in a different scenario, Bloom et al. (2014) conclude quite the opposite. Waldinger (2009) investigates spillovers among university researchers and finds no evidence of localized peer effects, in sharp contrast to the findings of Herbst and Mas (2015).
4 From a theoretical point of view, Gautier and Teulings (2006) and Eeckhout and Kircher (2011) show the role of the complementarity in determining the sign and magnitude of sorting and its implication for reallocation. From an empirical point of view, Hagedorn et al. (2017) non-parametrically estimate the wage and output profile and find overall positive assortative matching between workers and firms and positive
these estimates, the CDS method can induce a sizable misspecification bias in the measured
coworker effects: I show that the bias can be large, with its sign being either positive or
negative, depending on the strength and sign of the wage complementarity.5
Second, the linear-in-means model captures only average coworker effects. Despite its
tractability, the linear-in-means model cannot reconcile the vast heterogeneity in coworker
effects documented in the empirical literature: across various occupations and industries,
different workers are found to be affected by their coworkers differently. 6 This second
challenge stems from the fact that coworker influences are heterogeneous and that these
heterogeneities may not be observed in the data. However, there has been little discussion
in the literature of estimating heterogeneous coworker effects based on unobservables.
In this paper, I propose a new semi-parametric methodology to measure the effects of
coworkers on wages. Combining economic theory and recent advances in machine learning,
the method considers the dependence of wages on the heterogeneous productive attributes of
both workers and firms, whether observed or unobserved. First, the method delivers
efficient and robust estimation in the presence of potential complementarities between
these attributes on both sides of the market. Second, going beyond linear-in-means models,
the method is also the first to capture heterogeneous coworker effects based on unobserved
heterogeneity.
The main idea for non-parametrically identifying wages that are non-linear in the
complementarities and the coworker effects is to partition the set of workers into clusters
such that workers inside a cluster are similar to each other in their productivities and
coworker influences. Then, at the cluster level, I estimate the complementarities between
each worker cluster and the firms, as well as the coworker effects between worker clusters.
The identification consists of two stages. In the first stage, I identify similar workers and
cluster workers based on their similarity. To achieve this goal, I develop a method in the
spirit of Hagedorn et al. (2020) but taking into account the potential time-varying impact
from the evolving set of coworkers. The key underlying assumption is that wages are driven
by the unobserved “types” of the worker, the firm she matched into and the “types” of
her coworkers. Here, “types” are interpreted as a time-invariant unobserved membership
of groups with certain productive worker attributes that govern the worker’s productivity
complementarity between their productivities in output, in sharp contrast to what the additive firm and worker fixed effect model would imply.
5 Specifically, “wage complementarity” refers to the wage component that is specific to the combination of the productivity types of the worker and the firm in the match.
6 The literature has raised two other major problems with the linear-in-means model. First, the model is of limited interest because it constrains the net effect of regrouping peers, so the estimates have only limited policy implications. Second, the linear-in-means model is often misspecified and not supported by numerous empirical studies (Hoxby and Weingarth (2005), Sacerdote (2011)).
and her peer influence on coworkers. The vision is that there could be a large number of
unobserved types of workers and coworkers that meaningfully determine the observed
wages in the data, and the core of the method is to partition workers of a large number of
types into a relatively small number of clusters so that each cluster is populated with workers
of similar types. I leverage an insight shared by most sorting models in the literature: two
workers with similar attributes and similar work histories must earn similar wages in the
same firm when influenced by the same set of coworkers. These restrictions allow me to
identify pairs of workers with similar unobserved types by comparing their wages observed
in the same firm at each point in time.
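The within-firm comparison described in this paragraph can be illustrated with a minimal sketch. It assumes a toy record layout of (worker, firm, year, log wage) tuples and a hypothetical helper `pairwise_wage_gaps`; the mean absolute wage gap used as a distance is an illustrative choice, not necessarily the criterion used in the paper.

```python
from collections import defaultdict
from itertools import combinations

def pairwise_wage_gaps(records):
    """Mean absolute log-wage gap for every pair of workers, averaged over
    the (firm, year) cells in which both are observed together.

    records: iterable of (worker, firm, year, log_wage) tuples (toy layout).
    """
    # Group observations by firm-year cell, i.e. by peer group.
    cells = defaultdict(list)
    for worker, firm, year, wage in records:
        cells[(firm, year)].append((worker, wage))

    totals = defaultdict(float)
    counts = defaultdict(int)
    for members in cells.values():
        # Only workers sharing a firm in the same year are compared.
        for (a, wa), (b, wb) in combinations(sorted(members), 2):
            totals[(a, b)] += abs(wa - wb)
            counts[(a, b)] += 1
    return {pair: totals[pair] / counts[pair] for pair in totals}

records = [
    ("ann", "f1", 2001, 3.0),
    ("bob", "f1", 2001, 3.1),
    ("cat", "f1", 2001, 4.0),
    ("ann", "f1", 2002, 3.2),
    ("bob", "f1", 2002, 3.1),
]
gaps = pairwise_wage_gaps(records)
```

In this toy example, "ann" and "bob" earn similar wages whenever they work together, so they would be candidates for the same cluster, while "cat" would not.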
Based on a worker partition, the method makes it possible to predict counterfactually the
“out-of-sample” wage of any worker relocated to any firm she never worked at, with
coworkers she might never have encountered, using the wages of other workers belonging to
her “similar worker cluster” in the target firm. The clustering-based estimation can be
viewed as a “matrix completion” process that makes it possible to measure the global
consequences for wages (and other potential economic outcomes) between every worker and
every firm with any combination of coworkers after a reallocation. Methodologically,
completing the wage matrix is also important for two reasons. First, it allows me to further
evaluate the similarity of workers who have never worked together in the same firm by
comparing their “completed wages” based on the wages of other workers in their cluster.
Second, it makes it possible to validate the quality of the clustering by testing the accuracy
of the out-of-sample predictions.
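A minimal sketch of the completion step follows. It assumes a static worker-by-firm wage matrix with missing entries and a given vector of cluster labels, and it ignores coworker effects for simplicity; the function name `complete_wage_matrix` is hypothetical.

```python
import numpy as np

def complete_wage_matrix(wages, clusters):
    """Fill missing worker-firm wages with the mean observed wage of the
    worker's cluster at that firm.

    wages:    (n_workers, n_firms) array, np.nan where no match is observed.
    clusters: length-n_workers array of cluster labels.
    Cells stay np.nan when no cluster member was ever observed at the firm.
    """
    wages = np.asarray(wages, dtype=float)
    clusters = np.asarray(clusters)
    completed = wages.copy()
    for c in np.unique(clusters):
        rows = clusters == c
        block = wages[rows]
        observed = ~np.isnan(block)
        n_obs = observed.sum(axis=0)
        sums = np.where(observed, block, 0.0).sum(axis=0)
        # Cluster-by-firm mean, left missing when the cluster never met the firm.
        means = np.where(n_obs > 0, sums / np.maximum(n_obs, 1), np.nan)
        completed[rows] = np.where(np.isnan(block), means, block)
    return completed

wages = np.array([[3.0, np.nan],
                  [np.nan, 4.0],
                  [5.0, np.nan]])
completed = complete_wage_matrix(wages, clusters=[0, 0, 1])
```

Here the first two workers share a cluster, so each one's missing wage is predicted from the other's observed wage, while the third worker's wage at the second firm remains missing because no member of her cluster was observed there.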
In the data, however, most individual workers have worked for only a few firms with a
limited number of coworkers, and not many similar workers have worked together in the
same firm. This poses a sparsity problem for the proposed method since, as implied by the
theory, wages are comparable only within firms. To account for the sparsity problem, I adopt
a hierarchical clustering approach that groups workers iteratively. I start with the most
restrictive criterion, joining only pairs of workers with the minimal average wage difference
throughout their coworkership. Despite the limited number of initial merges due to the
strictness of the criterion, joining these workers into bigger clusters mitigates the sparsity
problem by expanding the connectivity of the coworker network and the set of available
comparisons. This is because I can now make comparisons not only between a worker and
her immediate coworkers in the same firm, but also between her and the coworkers that
another individual from her cluster encountered at different firms, based on her wages
“completed” by the cluster average. I continue to merge similar workers until no more
workers can be merged under the current criterion. Then, I iteratively move through a set of
progressively relaxed similarity criteria, allowing workers to be merged within an
increasingly large cutoff for their wage differences. The process delivers a hierarchical
sequence of worker clusterings.
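The iterative procedure can be sketched as follows under simplifying assumptions: wages are stored in a static worker-by-firm matrix, the distance between two clusters is the mean absolute gap between their average wages over the firms they share, and the cutoffs are supplied by the user. All function names are hypothetical, and the sketch is a simplified stand-in for the algorithm described above rather than a reproduction of it.

```python
import numpy as np

def cluster_means(wages, labels):
    """Cluster-by-firm mean wages; np.nan where a cluster has no observation."""
    ids = np.unique(labels)
    out = np.full((len(ids), wages.shape[1]), np.nan)
    for k, c in enumerate(ids):
        block = wages[labels == c]
        seen = ~np.isnan(block)
        n = seen.sum(axis=0)
        s = np.where(seen, block, 0.0).sum(axis=0)
        out[k, n > 0] = s[n > 0] / n[n > 0]
    return ids, out

def closest_pair(means):
    """Closest pair of clusters by mean absolute wage gap over shared firms."""
    best = (np.inf, None, None)
    for a in range(len(means)):
        for b in range(a + 1, len(means)):
            shared = ~np.isnan(means[a]) & ~np.isnan(means[b])
            if shared.any():
                d = np.abs(means[a, shared] - means[b, shared]).mean()
                if d < best[0]:
                    best = (d, a, b)
    return best

def hierarchical_partitions(wages, cutoffs):
    """Return one label vector per cutoff, from strictest to loosest."""
    wages = np.asarray(wages, dtype=float)
    labels = np.arange(wages.shape[0])  # start from singleton clusters
    sequence = []
    for cutoff in sorted(cutoffs):
        while True:
            ids, means = cluster_means(wages, labels)
            d, a, b = closest_pair(means)
            if a is None or d > cutoff:
                break  # nothing left to merge under the current criterion
            labels = np.where(labels == ids[b], ids[a], labels)
        sequence.append(labels.copy())
    return sequence

wages = np.array([[1.00, np.nan],
                  [1.05, 2.00],
                  [np.nan, 2.02],
                  [3.00, np.nan]])
sequence = hierarchical_partitions(wages, cutoffs=[0.1, 5.0])
```

In this toy example the first and third workers never share a firm, yet they end up in the same cluster under the strict cutoff because both are close to the second worker. This is the expansion of connectivity that merging is meant to deliver.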
There is an apparent trade-off in deciding when to terminate the algorithm: as the procedure
iterates forward, joining more individuals into larger clusters, the coworker network becomes
increasingly dense. The method is therefore able to make predictions for a wider range of
workers as larger sets of comparable workers become available. Meanwhile, however, the
accuracy of the predictions gradually deteriorates, subject to a greater “approximation
error”, since the purity of each cluster is increasingly contaminated by the inclusion of
workers with lower similarity. To best balance the trade-off, I resort to machine learning: I
split the data into three random subsets, the training set, the validation set and the test set.
I estimate the hierarchical sequence of worker clusterings based only on observations in the
training set. The optimal worker partition is then chosen from the sequence as the one that
best predicts observations in the validation set. For evaluation purposes, the performance of
the estimated worker partition is assessed and reported using the test set.
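The selection step can be sketched as below. The sketch assumes observations are (worker, firm, wage) triples, predicts a held-out wage by the training mean of the worker's cluster at that firm, and falls back to the overall training mean when no such cell exists. The fallback rule, the split fractions and the function names are illustrative assumptions.

```python
import numpy as np
from collections import defaultdict

def split_indices(n_obs, fractions=(0.6, 0.2, 0.2), seed=0):
    """Randomly split observation indices into train/validation/test sets."""
    order = np.random.default_rng(seed).permutation(n_obs)
    n_train = int(fractions[0] * n_obs)
    n_val = int(fractions[1] * n_obs)
    return order[:n_train], order[n_train:n_train + n_val], order[n_train + n_val:]

def validation_mse(labels, train, val):
    """MSE on held-out triples using training cluster-by-firm mean wages."""
    sums, counts = defaultdict(float), defaultdict(int)
    for worker, firm, wage in train:
        sums[(labels[worker], firm)] += wage
        counts[(labels[worker], firm)] += 1
    fallback = np.mean([wage for _, _, wage in train])
    errors = []
    for worker, firm, wage in val:
        key = (labels[worker], firm)
        pred = sums[key] / counts[key] if counts[key] else fallback
        errors.append((wage - pred) ** 2)
    return float(np.mean(errors))

def select_partition(candidates, train, val):
    """Index of the candidate partition with the lowest validation MSE."""
    scores = [validation_mse(labels, train, val) for labels in candidates]
    return int(np.argmin(scores)), scores

train = [(0, "A", 1.0), (1, "A", 1.1), (1, "B", 2.0), (2, "A", 3.0), (3, "B", 4.0)]
val = [(0, "B", 2.1), (2, "B", 4.1)]
candidates = [[0, 1, 2, 3],   # singletons: too sparse to predict
              [0, 0, 1, 1],   # intermediate partition
              [0, 0, 0, 0]]   # one cluster: too coarse
best, scores = select_partition(candidates, train, val)
```

The toy data reproduce the trade-off in miniature: singleton clusters cannot predict the held-out matches, a single cluster predicts them poorly, and the intermediate partition attains the lowest validation error.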
In the second stage of this approach, I estimate the wage complementarities and coworker
effects given the worker clustering obtained from the first stage. The key to accounting for
the potential self-selection problem in estimating coworker effects is to control for the wage
complementarity that governs the sorting of workers across firms based on their unobserved
productivity types. To control for the wage complementarity empirically, I include
two-dimensional “match fixed effects” cross-indexed by the identity of the individual’s worker
cluster and the identity of the individual firm in the match. Moving beyond the AKM
approach, which relies on restrictive worker and firm fixed effects decompositions, the
proposed method non-parametrically identifies the wage complementarity reflecting
non-linear interactions in each type of match.
Conditional on the wage complementarity, I non-parametrically identify coworker effects
using the focal worker’s wage response to variations in her coworker productivity
distribution driven by her coworkers’ job mobility. Here, the important identification
assumption is that, conditional on the productivity of the worker, the productivity of the
firm and the productivity of all her coworkers, the wages of the worker are exogenous to her
coworkers’ job mobility. Empirically, I approximate the coworker productivity distribution
of each individual with discrete “coworker components”, the mass function describing the
measure of coworkers assigned to each worker bin. Then I estimate the coworker effects
function by projecting the wages of the worker on her coworker components conditional on
the match effects. Each coefficient captures a coworker influence, namely the elasticity of
the wage with respect to the coworker share of each productivity type. Going beyond the
linear-in-means model, the proposed method non-parametrically estimates heterogeneous
coworker influences, as these coefficients are unrestricted by functional form assumptions.
In an important extension of the method, I further generalize the framework to measure
complementarities between coworkers. I estimate a two-dimensional coworker effect
function that allows for asymmetric coworker influence from one cluster to another. In the
general framework, I show that the identification of the worker clustering follows the
identical machine learning algorithm, and that the two-dimensional coworker effects can be
estimated separately for the focal workers in each worker cluster.
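As a minimal sketch of the second stage, the projection can be written as an ordinary least squares regression of wages on cluster-by-firm dummies (the match fixed effects) and the coworker shares. The sketch assumes a single coefficient vector common to all focal workers, drops the last share as a reference category because the shares sum to one, and uses noise-free simulated data with hypothetical parameter values; the function name `second_stage` is also hypothetical.

```python
import numpy as np

def second_stage(wage, cluster, firm, shares):
    """OLS of wages on cluster-by-firm dummies and coworker shares.

    wage:    (n,) log wages.
    cluster: (n,) cluster label of the focal worker.
    firm:    (n,) firm identifier.
    shares:  (n, K) share of coworkers in each of K clusters (rows sum to 1).
    The last share column is dropped as the reference category.
    """
    pairs = list(zip(cluster.tolist(), firm.tolist()))
    cells = sorted(set(pairs))
    index = {cell: k for k, cell in enumerate(cells)}
    dummies = np.zeros((len(wage), len(cells)))
    for row, cell in enumerate(pairs):
        dummies[row, index[cell]] = 1.0
    design = np.hstack([dummies, shares[:, :-1]])
    coef, *_ = np.linalg.lstsq(design, wage, rcond=None)
    return dict(zip(cells, coef[:len(cells)])), coef[len(cells):]

# Hypothetical noise-free data: two worker clusters, two firms.
n = 200
idx = np.arange(n)
cluster = idx % 2
firm = (idx // 2) % 2
share0 = np.random.default_rng(0).uniform(size=n)
shares = np.column_stack([share0, 1.0 - share0])
# Non-additive match effects: the firm premium differs across worker clusters.
true_match = {(0, 0): 1.0, (0, 1): 1.5, (1, 0): 2.0, (1, 1): 3.5}
wage = np.array([true_match[p] for p in zip(cluster.tolist(), firm.tolist())])
wage = wage - 0.3 * share0

match_fx, peer_fx = second_stage(wage, cluster, firm, shares)
```

Because the match effects enter as unrestricted cell dummies, the regression recovers the non-additive pattern in `true_match` together with the coefficient on the coworker share. Estimating the regression separately for each focal cluster would give the two-dimensional version described in the text.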
I am now taking the algorithm to the administrative matched employer-employee data
of Denmark, which cover a population of 3 million workers over a span of 20 years. Despite
its high accuracy, the proposed hierarchical clustering algorithm is relatively slow for large
datasets: its computational complexity is quadratic in space and cubic in time. To meet the
demand for scalability, I integrate the baseline method with Graph Convolutional Networks
(GCN), a recent computer science advance in graph embedding techniques (Kipf and
Welling (2016), Hamilton et al. (2017)). In recent years, GCN-based graph embedding
techniques have enjoyed tremendous success in applications across disciplines, ranging from
natural language processing and knowledge graphs to online recommender systems (Xu et
al. (2018), Cai et al. (2018), Zhou et al. (2018), Wu et al. (2019)). The fundamental idea is
to represent each node of the graph by a vector in a low-dimensional space such that similar
pairs of nodes are embedded close to each other. However, while these graph embedding
methods are computationally fast, I found that their accuracy is limited when tested on
simulated data.
Following Hagedorn et al. (2020), I integrate the baseline hierarchical clustering algorithm
with the graph embedding techniques using a divide-and-conquer strategy. The “dividing”
step computes worker embeddings using GraphSAGE, groups closely embedded workers
and divides them into separate subsets. The “conquering” step applies the baseline
hierarchical clustering algorithm only within each local subset: when the dividing step is
relatively accurate, only similar workers are assigned to each subset. Therefore, the
divide-and-conquer strategy can significantly reduce the dimension of the problem by
eliminating a large number of redundant comparisons without compromising accuracy.
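A back-of-the-envelope calculation illustrates the size of the saving. It uses the 3 million workers mentioned above and assumes, purely for illustration, that the dividing step produces 3,000 equally sized subsets; the number of subsets is hypothetical.

```python
def n_pairs(n):
    """Number of unordered worker pairs among n workers."""
    return n * (n - 1) // 2

n_workers = 3_000_000
n_blocks = 3_000                      # hypothetical number of subsets
block_size = n_workers // n_blocks    # 1,000 workers per subset

full = n_pairs(n_workers)             # comparisons without dividing
blocked = n_blocks * n_pairs(block_size)  # comparisons within subsets only
reduction = full / blocked
```

Under these assumptions the number of candidate pairwise comparisons falls from about 4.5 trillion to about 1.5 billion, a reduction by a factor of roughly 3,000.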
This paper is related to multiple strands of the literature. The first contribution is to
the fast-moving research on peer effects using large matched employer-employee (MEE)
data. I develop a parsimonious machine-learning-based approach that delivers reliable and
testable results. My framework extends Cornelissen et al. (2017) by allowing wages to reflect
flexible worker-firm complementarities and to capture heterogeneous peer effects across
unobserved worker productivities. Moving beyond homogeneous peer effects, Sanz de
Galdeano (2020) find heterogeneous peer effects across observed characteristics in MEE
data for Brazil, taking a similar approach to Arcidiacono et al. (2012) and Cornelissen et al.
(2017). In parallel with the literature focusing on contemporaneous peer effects, Herkenhoff
et al. (2018) and Jarosch et al. (2019) find an asymmetric flow of knowledge spillovers from
high-wage workers to low-wage workers over the years. As for evaluating the efficacy of the
estimates, Eckles and Bakshy (2020) conduct a constructed observational study that
compares the predictions of observational estimators of peer behavior, based on a
nonexperimental control group, to a randomized experiment. My method takes a machine
learning approach and evaluates out-of-sample predictions on the test set. Second, the paper
contributes to the literature on identifying labor market sorting based on unobserved
heterogeneity. Bonhomme et al. (2019) and Lentz et al. (2018) propose random-effect-based
approaches to bicluster workers and firms based on the wage distribution. Complementary
to these existing methods, my approach delivers, in finite samples, precise and accurate
counterfactual predictions for any individual worker if allocated to any firm, conditional on
the set of coworkers. The third contribution of the paper is to the literature on team
production. The method identifies worker-firm sorting and wage complementarities
separately from the wage effects of coworkers. With additional assumptions on the
bargaining protocol and the market structure, my method also provides a way to estimate
non-parametrically worker production in teams given observed wages, which are an
equilibrium object and can be non-parametrically inverted for outputs. In contrast,
Bonhomme (2020) quantifies individual contributions given observed team output. Finally,
the paper contributes to the literature on search frictions and assortative matching. To my
knowledge, this paper is the first to jointly estimate the complementarity between worker
and firm productivity while taking coworker effects into account.
The rest of the paper is organized as follows. Section 2 sets up the environment and
illustrates the extent of the misspecification bias in measured coworker effects if the standard
assumption that wages are linear in worker and firm effects is violated in the data. Section 3
proposes a novel machine-learning-based approach and applies it to an extended framework
of Cornelissen et al. (2017). The extension allows for worker-firm complementarity and
captures heterogeneous peer effects. Section 4 presents simulation results to show the
efficiency and efficacy of the algorithm. Section 5 integrates my baseline method with recent
graph embedding techniques to achieve scalability. Section 6 concludes.
2 Background
This section introduces the framework of Cornelissen et al. (2017), the first and leading
empirical methodology that attempts to estimate peer effects for a large local labor market.
I then show that the method is not robust in the presence of wage complementarity between
workers and firms, which can induce a sizable misspecification bias in the measured
coworker effects.
2.1 Environment
In matched employer-employee data for workers $I = \{1, \ldots, N\}$ and firms $J = \{1, \ldots, M\}$, we can keep track of the wages of each individual worker $i \in I$ when he is matched to firm $j \in J$ in year $t \in T$. For every observed match $(i, j, t)$ in each period, denote the log wage as $w_{ijt}$ and the match indicator as $m_{ijt} = 1$; otherwise $m_{ijt} = 0$. Each worker can hold at most one job in each year. The set of workers at firm $j$ in period $t$ is referred to as the peer group $P_{jt} = \{i \in I \mid m_{ijt} = 1\}$. Denote $N_{jt} = |P_{jt}| - 1$. Denote the coworker set of a reference worker $i$ as $P_{\sim i,jt} = \{i' \in I,\ i' \neq i \mid m_{i'jt} = 1\}$.
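These objects map directly onto simple data structures. The sketch below assumes matches are stored as (worker, firm, year) triples; the helper names `peer_groups` and `coworkers` are hypothetical.

```python
from collections import defaultdict

def peer_groups(matches):
    """P_jt as a dict {(firm, year): set of workers} from (i, j, t) triples."""
    groups = defaultdict(set)
    for worker, firm, year in matches:
        groups[(firm, year)].add(worker)
    return dict(groups)

def coworkers(groups, worker, firm, year):
    """Coworker set of the reference worker: the peer group without her."""
    return groups[(firm, year)] - {worker}

matches = [(1, "f1", 2001), (2, "f1", 2001), (3, "f1", 2001), (4, "f2", 2001)]
groups = peer_groups(matches)
peers_of_1 = coworkers(groups, 1, "f1", 2001)
n_jt = len(groups[("f1", 2001)]) - 1  # N_jt = |P_jt| - 1
```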
2.2 The Method of Cornelissen et al. (2017)
Cornelissen et al. (2017) is the leading empirical method to measure peer effects based on
matched employer-employee data. To account for the selection problem, i.e. the endogenous
sorting of high-productivity workers into high-productivity peer groups or high-productivity
firms based on unobserved attributes, CDS extend the worker and firm fixed effects model
of Abowd et al. (1999) by including control variables and multiple fixed effects. The goal is
to estimate the following wage equation: 7
$$ w_{ijt} = \alpha_i + \phi_{jt} + \gamma \bar{\alpha}_{\sim i,jt} + \epsilon_{ijt}. \tag{1} $$
Here $\alpha_i$ is a worker fixed effect for worker $i$ that captures permanent worker ability, and $\phi_{jt}$ is
a time-varying firm fixed effect for firm $j$ at time $t$. The model captures peer effects in a
“linear-in-means” setup: the term $\bar{\alpha}_{\sim i,jt}$ is peer productivity, defined as the average worker
fixed effect in the peer group, computed excluding individual $i$:
$$ \bar{\alpha}_{\sim i,jt} = \frac{1}{N_{jt}} \sum_{i' \in P_{\sim i,jt}} \alpha_{i'}. $$
The spillover coefficient $\gamma$, which measures the coworkers’ impact on wages, is the key
parameter of interest.
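For a single peer group, the leave-one-out peer mean and the implied wages can be computed as in the sketch below. The parameter values are hypothetical, the error term is set to zero, and the peer group is assumed to contain at least two workers.

```python
import numpy as np

def linear_in_means_wages(alpha, phi_jt, gamma):
    """Wages from the linear-in-means equation for one peer group (no noise).

    alpha:  worker fixed effects of all members of the peer group.
    phi_jt: firm-year fixed effect.
    gamma:  spillover coefficient.
    """
    alpha = np.asarray(alpha, dtype=float)
    # Average fixed effect of the other group members (excludes worker i).
    peer_mean = (alpha.sum() - alpha) / (alpha.size - 1)
    return alpha + phi_jt + gamma * peer_mean, peer_mean

wages, peer_mean = linear_in_means_wages([1.0, 2.0, 3.0], phi_jt=0.5, gamma=0.2)
```

Note that the most productive worker mechanically faces the lowest peer mean within her own group, since her own fixed effect is excluded from the average.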
7 For expositional clarity, this wage equation is simplified relative to CDS by abstracting from occupation fixed effects and from the influence of other observable time-varying characteristics, as they do not affect the conclusions of this section.
2.2.1 Homogeneous peer effects
Two underlying assumptions in CDS are restrictive. The first is that the peer effect function
is homogeneous. This is inconsistent with the heterogeneity found in empirical studies.
2.2.2 Worker-firm complementarity
The second important assumption, inherited from AKM, is that the worker-firm wage
component is additively separable into a worker fixed effect and a time-varying firm fixed
effect:
$$ \alpha_i + \phi_{jt}. $$
The first implication of this assumption is that the expected wage gap between two coworkers
does not depend on the productivity of the firm: if worker $i$ is more productive and earns a
higher wage than his coworker $i'$ at one firm $j$, he is expected to do so at every other firm
$j'$ in the economy, irrespective of the productivity of the firm:
$$ E(w_{ijt} - w_{i'jt}) = (\alpha_i - \alpha_{i'}) \left( 1 - \frac{\gamma}{N_{jt}} \right), \quad \forall j \in J. $$
Second, a high-wage firm always pays a high wage premium: for two firms $j$ and $j'$
with equal peer productivity $\bar{\alpha}_{\sim i,jt} = \bar{\alpha}_{\sim i,j't}$, the expected wage difference is independent
of the productivity of the worker employed:
$$ E(w_{ijt} - w_{ij't} \mid \bar{\alpha}_{\sim i,jt} = \bar{\alpha}_{\sim i,j't}) = \phi_{jt} - \phi_{j't}, \quad \forall i \in I. $$
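The within-firm wage gap can be derived from Equation (1) in two steps, as sketched below. The derivation assumes that the error term has mean zero conditional on the fixed effects, which is an assumption added here for the sketch.

```latex
\begin{align*}
\bar{\alpha}_{\sim i,jt} - \bar{\alpha}_{\sim i',jt}
  &= \frac{1}{N_{jt}} \Big( \sum_{k \in P_{jt}} \alpha_k - \alpha_i \Big)
   - \frac{1}{N_{jt}} \Big( \sum_{k \in P_{jt}} \alpha_k - \alpha_{i'} \Big)
   = -\frac{\alpha_i - \alpha_{i'}}{N_{jt}}, \\
E(w_{ijt} - w_{i'jt})
  &= (\alpha_i - \alpha_{i'}) + (\phi_{jt} - \phi_{jt})
   + \gamma \left( \bar{\alpha}_{\sim i,jt} - \bar{\alpha}_{\sim i',jt} \right)
   = (\alpha_i - \alpha_{i'}) \left( 1 - \frac{\gamma}{N_{jt}} \right).
\end{align*}
```

The firm effect cancels because both workers are employed at the same firm in the same period, so the gap depends on the firm only through the size of the peer group.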
These two implications combined rule out the interdependence of worker and firm
productivity in wages, so that there is no gain for firms from selecting the right job
applicants, nor is there any extra reward for job seekers from selecting the best job. These
implications are inconsistent with most structural models in the assortative matching
literature, where the production function is such that it is optimal to sort workers into the
firms where joint output is maximized.
In more realistic settings, however, wages can reflect complementarities between worker
and firm productivity. Worker ability could be complementary (or substitutable) to firm
productivity, such that a high-ability worker gains more (or less) in productivity than a
low-ability worker does when moving from a low-productivity firm to a high-productivity
firm. As a consequence, the interdependence of worker and firm productivity gives rise to
positive (or negative) assortative matching, i.e. high-productivity workers sort into
high-productivity (or low-productivity) firms with other high-productivity colleagues in
equilibrium. Hagedorn et al. (2014), for instance, find positive assortative matching in
German administrative data, in line with theories that attribute sorting to worker-firm
complementarity.
The complementarity has two important implications. First, it induces sorting between
workers and firms, which poses the well-known empirical challenge of the “selection problem”
in the estimation of coworker effects. The selection problem arises when cohorts of coworkers
are not formed at random. Second, the interdependence of worker and firm productivity
implies that wages cannot be decomposed into additively separable worker and firm fixed
effects, in sharp contrast to the specification of CDS.
2.2.3 Misspecification bias
The CDS method cannot correctly account for the selection problem induced by the
complementarities between workers and firms. The misspecification bias can be sizable, with
its sign being either positive or negative, depending on the magnitude and sign of the
complementarity.
Data generating process. To illustrate the misspecification bias, I present the performance
of the CDS estimator applied to wages simulated from a simple alternative data generating
process. In contrast to Equation (1), the wage reflects the complementarities between
workers and firms:
$$ w_{ift} = w(\alpha_i, \phi_f) = \left( \alpha_i^{\rho} + \phi_f^{\rho} \right)^{1/\rho}. \tag{2} $$
Here, each worker $i \in I$ is endowed with a permanent latent productivity $\alpha_i$ and each firm $f \in J$ with a permanent latent productivity $\phi_f$ on entering the market. Both $\alpha_i$ and $\phi_f$ are independently drawn from the standard uniform distribution and cannot be observed in the
data. Importantly, I focus on the case where wages incorporate no coworker effect: wages
are solely determined by the productivities of the worker $\alpha_i$ and the firm $\phi_f$ in the match,
not by coworkers. The substitution parameter $\rho$ controls the curvature of the wage function
$w$ and represents the magnitude of the complementarity. In particular, when $\rho = 1$, Equation
(2) reduces to Equation (1) with $\gamma = 0$. To generate realistic peer selection and
positive assortative matching, assume that a worker and a firm match if and only if
$$ |\alpha_i - \phi_f| < 0.1. $$
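A minimal sketch of the wage function and the acceptance rule is given below. It draws the productivities, enumerates every acceptable worker-firm pair and evaluates Equation (2) on those pairs. It does not model the search and mobility dynamics of the full simulation, and the function names, sample sizes and seed are hypothetical.

```python
import numpy as np

def ces_wage(alpha, phi, rho):
    """Wage as a CES aggregate of worker and firm productivity (rho != 0)."""
    return (alpha ** rho + phi ** rho) ** (1.0 / rho)

def simulate_matches(n_workers, n_firms, rho, band=0.1, seed=0):
    """Draw uniform productivities and keep pairs with |alpha - phi| < band."""
    rng = np.random.default_rng(seed)
    alpha = rng.uniform(size=n_workers)
    phi = rng.uniform(size=n_firms)
    # Acceptance set: workers and firms match only if they are close in type.
    accept = np.abs(alpha[:, None] - phi[None, :]) < band
    workers, firms = np.nonzero(accept)
    wages = ces_wage(alpha[workers], phi[firms], rho)
    return alpha, phi, workers, firms, wages

alpha, phi, workers, firms, wages = simulate_matches(50, 20, rho=1.0)
```

With `rho=1.0` the wage is exactly the sum of the two productivities, which is the additive benchmark; other values of `rho` introduce curvature and hence complementarity or substitutability.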
The rest of the model follows a basic search and matching paradigm. The model uses a
standard calibration based on moments of labor market mobility. The details are relegated
to Section 4.1.
Bias in finite sample. The simulation results for the estimator $\hat{\gamma}$ from Equation (1) are
shown in Figure 1. The 95% bootstrap confidence interval is constructed using B = 100
bootstrap samples.
Note that when $\rho = 1$, the worker and firm productivities $\alpha_i$ and $\phi_f$ are linearly additive. In
this case, the estimator $\hat{\gamma}$ correctly recovers the true magnitude of the peer effect, i.e. the true
value $\gamma = 0$. As alternative cases, I simulate data for four different values of $\rho \in \{-3, -1, 0.5, 1.5\}$. In each case, the estimate $\hat{\gamma}$ is subject to misspecification bias. The size and sign of the bias
depend on the strength of the complementarity measured by $|\rho - 1|$: when $\rho > 1$, the bias is upward, $\hat{\gamma} > 0$; when $\rho < 1$, the bias is downward, $\hat{\gamma} < 0$.
Figure 1: $E\hat{\gamma} \neq 0$ when $\rho \neq 1$.
Asymptotic performance The CDS estimator γ̂ does not converge to the true γ = 0 asymptotically: the misspecification bias does not vanish as the length of the simulation T approaches infinity.
Figure 2: γ̂ is inconsistent.
The CDS method detects a negative spillover parameter in a simulation where the true one is zero. The intuition for the bias can be illustrated by contradiction: restricting the spillover parameter to be zero in (1) amounts to fitting an AKM regression with worker effects and time-varying firm effects only,
$$w_{ift} = \alpha_i + \phi_{ft} + u_{ift},$$
in which case a negative correlation can be found between the peer quality ᾱ_{−i,ft} and the regression residual u_ift. The negative correlation implies that, once the restriction is lifted, the spillover parameter can be lowered to a negative number to better fit the model. That is, in the case of (1):
$$\gamma = \frac{\operatorname{cov}(\bar{\alpha}_{-i,ft},\, u_{ift})}{\operatorname{var}(\bar{\alpha}_{-i,ft})} < 0.$$
The correlation is negative because, in the presence of complementarity, the estimated constant worker fixed effect tends to underpredict the wage of a high-productivity worker and overpredict that of a low-productivity worker. Focusing only on the within-peer-group variation that the identification utilizes, the regression residual is systematically higher for better workers, who are by construction paired with worse coworkers within the same peer group.
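The mechanics of this argument can be sketched in a few lines of Python. The snippet computes leave-one-out peer means within one peer group and the slope cov/var from the expression for γ. The productivity values and residuals below are hypothetical numbers chosen only to mimic the pattern described in the text (residuals rising with own productivity); they are not taken from the paper's simulation.

```python
def leave_one_out_means(values):
    """Mean of the other group members' values, for each member."""
    total, n = sum(values), len(values)
    return [(total - v) / (n - 1) for v in values]


def cov_over_var(x, u):
    """Slope cov(x, u) / var(x), as in the expression for gamma."""
    n = len(x)
    mean_x, mean_u = sum(x) / n, sum(u) / n
    cov = sum((a - mean_x) * (b - mean_u) for a, b in zip(x, u)) / n
    var = sum((a - mean_x) ** 2 for a in x) / n
    return cov / var


# Hypothetical peer group: residuals are higher for more productive workers.
alphas = [0.40, 0.45, 0.50]
residuals = [-0.01, 0.00, 0.01]

# Within a group, a better worker mechanically faces a lower peer mean.
peer_quality = leave_one_out_means(alphas)
gamma_hat = cov_over_var(peer_quality, residuals)
```

In this toy group the peer mean falls as own productivity rises, so residuals that increase in own productivity translate into a negative slope.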
The example implies that estimating the match component w(x, y) is vital for estimating coworker effects. It controls for variation in wages accounted for by the movement of unobservable worker and firm characteristics that may be endogenously correlated with unobserved coworker attributes. If complementarity in w(x, y) is correctly measured, estimating the coworker effect will be free from the selection problem. This is the main focus of the paper and is developed in the next section.
3 Machine-learning based Approach
In this section, I propose an economic-theory-based machine-learning approach to estimate wage peer effects in a more general framework, extending Cornelissen et al. (2017) in two important directions. First, the framework allows wages to reflect the flexible interdependence of worker and firm productivities, so that complementarities or substitutabilities between them can be captured. In accordance with assortative matching theories, these complementarities can potentially account for endogenous peer selection. Second, the framework allows for heterogeneous coworker effects across workers with different productivity levels. By doing so, the model can reconcile the mixed empirical findings in the peer effects literature, where different workers may exhibit heterogeneous impacts on the wages of their coworkers in various scenarios.
In addition to these two relaxations, my method makes it possible to form precise counterfactual predictions of the wage of an individual if reallocated to any firm in the economy with a different set of coworkers. From a macroeconomic perspective, the wage is an equilibrium object of a structural model and can be non-parametrically inverted to recover other equilibrium outcomes such as output and productivity. Thus, being able to empirically estimate the complementarities and coworker effects in wages, researchers can draw further inference about complementarities and coworker effects in these outcomes as well. This opens the door to substantive questions: for instance, how to assess labor market efficiency and how to design policies that achieve the efficient assignment of coworkers. Of course, answering such questions involves additional assumptions about the labor market, regarding the market structure, the bargaining protocol, etc.
3.1 The Framework
The goal is to jointly estimate worker-firm complementarities and heterogeneous coworker
effects in wages in the following framework:
$$w_{ift} = \underbrace{w(x_i, y_f)}_{\text{match effects}} + \underbrace{\frac{1}{N_{ft}} \sum_{j \in P_{-i,ft}} a(x_j)}_{\text{coworker effects}} + \nu_{ift}, \tag{3}$$
where w_ift are observed wages in a large matched employer-employee dataset, x_i is the unobserved productivity of worker i, and y_f is the latent productivity of firm f. Productivities x_i and y_f are drawn at the birth of the worker and of the vacancy from exogenous distributions whose support can be normalized to the closed unit interval,
$$x_i \in \mathcal{X} = [0, 1], \quad y_f \in \mathcal{Y} = [0, 1],$$
and remain constant over time. Denote by match effects w(x_i, y_f) the component that captures the complementarity between worker productivity x_i and firm productivity y_f in wages. Denote by P_{−i,ft} the (self-exclusive) coworker set of worker i in her peer group. Peer groups are defined as the set of workers at workplace (firm) f at time t, and can therefore be indexed by (f, t). Denote by N_ft = |P_{−i,ft}| the number of coworkers. The disturbance ν_ift captures the variation accounted for by all other factors, and satisfies E[ν_ift | x_i, y_f] = 0.
The key objects of interest are the match and coworker effects components w(x_i, y_f) and a(x_j). The first key underlying assumption is that wages are not determined by the identity of the worker i, firm f or coworker j conditional on their productivities x_i, y_f and x_j. The second key assumption is that the mappings w and a are both finite and continuous mappings defined on compact sets:
$$w : \mathcal{X} \times \mathcal{Y} \xrightarrow{\text{cts}} \mathbb{R}, \qquad a : \mathcal{X} \xrightarrow{\text{cts}} \mathbb{R}.$$
I start with the baseline model (3) as it is an immediate generalization of the CDS method and thus serves as a good benchmark.8 Importantly, I do not impose restrictive functional form assumptions on the match and coworker effect components w(x, y) and a(x), so that w(x, y) can flexibly reflect arbitrary interactions between worker and firm productivity, and a(x) can capture heterogeneous peer effects between coworkers.
The example in Section 2 implies that estimating the match component w(x, y) is vital for estimating coworker effects a(x), as it controls for variation in wages accounted for by the movement of unobservable worker and firm characteristics that may be endogenously correlated with unobserved coworker attributes. Once w(x, y) is correctly measured, estimating the coworker effect a(x) will be free from the selection problem.
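To make the structure of the baseline equation (3) concrete, the sketch below evaluates noise-free wages for one peer group, given arbitrary functions w and a supplied by the user. The function name `peer_group_wages` is hypothetical, and the example plugs in the linear special case w(x, y) = x + y, a(x) = γx only as an illustration; the value γ = 0.1 is an arbitrary assumption.

```python
def peer_group_wages(x, y_f, w, a):
    """Noise-free wages from Equation (3) for one peer group (f, t).

    x   : productivities of all workers in the group (at least two workers)
    y_f : productivity of the firm
    w   : match-effect function w(x_i, y_f)
    a   : coworker-effect function a(x_j)
    Each worker's coworker term averages a(x_j) over the other members,
    so N_ft = len(x) - 1.
    """
    n_coworkers = len(x) - 1
    total_a = sum(a(xj) for xj in x)
    return [w(xi, y_f) + (total_a - a(xi)) / n_coworkers for xi in x]


# Illustrative linear special case with an assumed gamma = 0.1.
gamma = 0.1
wages = peer_group_wages(
    [0.2, 0.4, 0.6],
    0.5,
    w=lambda xi, yf: xi + yf,
    a=lambda xj: gamma * xj,
)
```

Any continuous w and a can be passed in place of the linear ones, which is the sense in which the framework leaves both components unrestricted.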
8Specifically, when the match and spillover components are both linear,
$$w(x, y) = x + y, \quad a(x) = \gamma x,$$
equation (3) degenerates into the CDS specification (1). Similar to CDS, my framework abstracts from endogenous peer effects: own wage is independent of peer wages conditional on the productivity of the worker, her firm and her coworkers. Therefore, the method bypasses the challenge of the reflection problem, as well as that of distinguishing between endogenous and exogenous peer effects, highlighted in the literature since Manski (1993).
3.1.1 Identification
When the types of workers {x_i}_{i∈I} and firms {y_f}_{f∈F} are observable, the coworker effects function a(x) in equation (3) can be identified when the following identification assumption holds:
Assumption 3.1. Identification: For every worker i and all her coworkers j ∈ P_{−i,ft}:
$$\nu_{ift} \perp\!\!\!\perp h_{-i,ft}(k) \;\Big|\; x_i,\, y_f,\, \{x_j\}_{j \in P_{-i,ft}},$$
where h_{−i,ft}(k) is the measure of her coworkers j belonging to productivity type x_j = k:
$$h_{-i,ft}(k) = \frac{1}{N_{ft}} \sum_{j \in P_{-i,ft}} \mathbf{1}\{x_j = k\}.$$
Identification assumption (3.1) states that, conditional on the types of the worker, the firm and her coworkers, the wage disturbances ν_ift are exogenous to the mobility of her coworkers. Importantly, identification holds under this assumption for a general wage function incorporating contemporaneous coworker effects as follows:
$$w_{ift} = g_n\left(x_i, y_f, \{x_j\}_{j \in P_{-i,ft}}\right) + \nu_{ift}, \tag{4}$$
where n is the size of the peer group and {x_j}_{j∈P_{−i,ft}} is the collection of coworker productivities.
Theorem 3.1. Under Assumption (3.1), general wage equation (4) is identified.
The baseline (3) is identified as it is a specific form of equation (4).9 Identification holds given that the types x and y of both workers and firms are observed. The greater challenge is how to measure the match and spillover functions when these types are not observed. This is discussed in the next section.
9One concern regarding Assumption (3.1) is that there could be peer-group-level shocks that simultaneously affect wages and coworker composition. Assumption (3.1) is then violated and estimating (3) may lead to biased results. The issue can be addressed using a “within estimator” of w and a, i.e. by estimating (3) conditional on time-varying peer-group fixed effects:
$$w_{ift} = w(x_i, y_f) + \frac{1}{N_{ft}} \sum_{j \in P_{-i,ft}} a(x_j) + Z_{ft} + \epsilon_{ift}. \tag{5}$$
Equation (5) is identified when
$$\epsilon_{ift} \perp\!\!\!\perp h_{-i,ft}(x_j) \;\Big|\; x_i,\, y_f,\, \{x_j\}_{j \in P_{-i,ft}}.$$
The identification only utilizes movements of wages and coworker composition within peer groups.
3.2 The Method
I develop an economic-theory-based semi-parametric approach to estimate the wage function (3). The method can be viewed as an extension of the non-parametric method proposed by Hagedorn, Manovskii and Xin (2020) that allows for the coworker impact on wages. The main takeaway from their work is that the wage function can be non-parametrically estimated by grouping workers with similar unobserved productivities using a hierarchical clustering approach. From a machine learning perspective, non-parametrically estimating w(x_i, y_f) and a(x_j) can be viewed as a matrix completion problem: the goal is to best predict wages if any worker i ∈ I is counterfactually matched to any firm f ∈ F, based on wages in observed matches {i, f, t, w_ift} in the matched employer-employee dataset, and under the constraint of (3).
Taking coworker effects into account, the wages of the same worker matched to the same firm can move in response to the evolution of the peer composition. To identify similar workers, I leverage the insight that workers with similar matching history would get similar wages working in the same firm at the same point in time. Thus, the similarity between two coworkers can be measured by their average wage distance during the coworkership. Notice that, as implied by both (3) and (4), any pair of similar coworkers i and i′ ∈ P_ft at the same firm in the same period should receive similar coworker influence on their wages, as these two workers share almost identical coworker sets P_{−i,ft} and P_{−i′,ft}. To my knowledge, this feature is aligned with most assortative matching models in the literature, including Hagedorn et al. (2014), Gautier and Teulings (2012), Eeckhout and Kircher (2011), and Lise et al. (2016).
The identification of unobserved worker productivities, wage complementarities and coworker effects can therefore be conducted in two consecutive stages: “clustering” and “estimation”. In the “clustering” stage, the target is to identify in the data groups of workers that are similar in latent productivity, assign them to the same group, and predict the wage at firms where a worker did not work based on what workers assigned to the same group earn at that firm in the same year. The worker clustering takes an agglomerative hierarchical approach: the algorithm starts with each worker initialized as a single-point cluster, iteratively merges the most similar pair of “child” clusters at the current stage into a new “parent” cluster, and updates the similarity between the new cluster and the remaining clusters. In the “estimation” stage, I estimate the match and coworker effect functions w(x, y) and a(x) given the assigned worker clustering.
I adopt a cross validation method to decide the number of clusters. The optimal clustering
is chosen to make the best out-of-sample prediction of wages.
3.3 The “Clustering” Stage
This section explains how the algorithm works in the “clustering” stage.
3.3.1 Notations
A collection C is a clustering of a set of workers I if it forms a partition of I:
$$\mathcal{C} = \{C_1, C_2, ..., C_K\}, \quad C_k = \{i \in I \mid c_i = k\}.$$
The assignment function c maps individuals to their clusters, c : I → 𝒦. Clusters are indexed by integers from the cluster-label set 𝒦 ≡ {1, ..., K}. The number of clusters in C is K. Cluster C_k and firm f match at t if some worker i ∈ C_k works at firm f at t. Denote the matching set:
$$C_{kft} \equiv \{i \in C_k : i \text{ works at firm } f \text{ at } t\}.$$
Denote the matching indicator between worker clusters and firms in given periods on the set C × F × T:
$$\Omega_{k,f,t} = \begin{cases} 1 & \text{if some worker } i \in C_k \text{ works at firm } f \text{ at time } t, \\ 0 & \text{otherwise.} \end{cases}$$
When worker cluster C_k matches firm f at year t, the cluster mean μ_kft can be evaluated:
$$\mu_{kft} = \frac{1}{|C_{kft}|} \sum_{k' \in C_{kft}} w_{k'ft}, \quad \text{if } \Omega_{kft} = 1.$$
Dissimilarity Within firm f at t, the wage distance between individual workers j and k is
$$D_{jkft} = w_{jft} - w_{kft} = w(x_j, y_f) - w(x_k, y_f) + \frac{a(x_k) - a(x_j)}{N_{ft}}. \tag{6}$$
Note that when workers j and k are similar, x_j ≈ x_k, then D_jkft ≈ 0 given that both w and a are continuous. The average wage distance between individual workers j and k is taken over all wage distances observed at the periods t and firms f where both are present:
$$D_{jk} = \operatorname*{mean}_{\{f,t \,\mid\, \Omega_{jft}=1,\ \Omega_{kft}=1\}} D_{jkft}. \tag{7}$$
Worker similarity between each pair of workers is measured by the absolute average wage distance over their coworkership:
$$d(j, k) = \begin{cases} |D_{jk}| & \text{if } \sum_{t \in T} \Omega_{jft}\Omega_{kft} > 0, \\ \infty & \text{otherwise.} \end{cases}$$
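The pairwise dissimilarity just defined can be computed directly from matched wage records. The following is a minimal sketch under stated assumptions: wages arrive as a dictionary keyed by (worker, firm, period), worker identifiers are sortable, and the helper names `pairwise_distance` and `d` are hypothetical.

```python
import math
from collections import defaultdict


def pairwise_distance(wages):
    """Absolute average wage gap |D_jk| for every pair of one-time coworkers.

    wages : dict {(worker, firm, period): wage}
    Returns dict {(j, k): |D_jk|} with j < k; pairs that never shared a
    (firm, period) cell are absent and treated as infinitely distant.
    """
    by_group = defaultdict(dict)
    for (i, f, t), w in wages.items():
        by_group[(f, t)][i] = w

    gaps = defaultdict(list)
    for members in by_group.values():
        ids = sorted(members)
        for pos, j in enumerate(ids):
            for k in ids[pos + 1:]:
                gaps[(j, k)].append(members[j] - members[k])  # D_jkft

    return {pair: abs(sum(g) / len(g)) for pair, g in gaps.items()}


def d(dist, j, k):
    """d(j, k): |D_jk| if j and k were ever coworkers, infinity otherwise."""
    return dist.get((min(j, k), max(j, k)), math.inf)
```

Because signed gaps are averaged before the absolute value is taken, offsetting gaps across workplaces cancel, which follows the definition of D_jk above rather than an average of absolute gaps.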
In particular, worker j is similar to k at cutoff κ if d(j, k) < κ; otherwise worker j is dissimilar to k at κ. The hierarchical clustering algorithm sequentially merges individual workers into worker clusters. I therefore also define the wage distance and similarity measure between a pair of worker clusters. Given the clustering set C, the wage distance between worker clusters C_j and C_k within firm f at t is
$$D_{jkft} = \mu_{jft} - \mu_{kft}. \tag{8}$$
The average wage distance and dissimilarity between C_j and C_k are
$$D_{jk} = \operatorname*{mean}_{\{f,t \,\mid\, \Omega_{jft}=1,\ \Omega_{kft}=1\}} D_{jkft}, \tag{9}$$
$$d(C_j, C_k) = \begin{cases} |D_{jk}| & \text{if } \sum_{t \in T} \Omega_{jft}\Omega_{kft} > 0, \\ \infty & \text{otherwise.} \end{cases} \tag{10}$$
Worker cluster C_j is similar to C_k at cutoff κ if d(C_j, C_k) ≤ κ. Likewise, worker cluster C_j is dissimilar to C_k at cutoff κ if d(C_j, C_k) > κ. Define the affinity graph A ∈ S^K of C as follows. Vertex j ∈ 𝒦 of A represents a cluster C_j ∈ C, and edge (j, k) represents the similarity between C_j and C_k at κ:
$$A_{j,k}(\kappa) = \begin{cases} 1 & \text{if } d(C_j, C_k) < \kappa, \\ -1 & \text{if } d(C_j, C_k) > \kappa, \\ 0 & \text{if } C_j \text{ and } C_k \text{ do not match.} \end{cases} \tag{11}$$
Components Denote by Π ∈ P(C) a partition of the worker clustering set C with L subsets. The l-th subset of Π consists of K_l member clusters, S_l = {C_{1_l}, C_{2_l}, ..., C_{K_l}}:
$$\Pi = \{\{C_{1_1}, ..., C_{K_1}\}, ..., \{C_{1_L}, ..., C_{K_L}\}\} = \{S_1, ..., S_L\}, \quad \sum_{l=1}^{L} K_l = K.$$
Subset S is a path-similar component at cutoff κ if all its member clusters form a connected component of the affinity graph A. That is, for any pair of member clusters C_k, C_k′ ∈ S, there exists a path on S, {C_k → C_p(1) → ... → C_p(N) → C_k′}, such that any two consecutive clusters on the path are similar at κ: A_{p(n),p(n+1)}(κ) = 1, ∀ n ∈ {0, ..., N}. Partition Π is a path-similar partition of C at κ if all its subsets are path-similar components at κ. Denote it as Π*(C).
Subset S is a disagreement-free component at cutoff κ if there is no dissimilar pair of member clusters C_k, C_k′ ∈ S at κ, i.e. A_{k,k′}(κ) > −1, ∀ C_k, C_k′ ∈ S. Partition Π is a disagreement-free partition of C at κ if all its subsets are disagreement-free components at κ. Denote it as Π**(C).
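The affinity graph and the two component notions can be sketched as follows. This is an illustration under stated assumptions rather than the paper's implementation: distances are supplied as a dictionary keyed by cluster pairs, missing pairs are treated as never co-located, and a distance exactly equal to κ (a case the definition leaves open) is treated as dissimilar. All function names are hypothetical.

```python
import math


def affinity(dist, labels, kappa):
    """A[j][k] following Equation (11): 1 similar, -1 dissimilar, 0 no overlap."""
    A = {j: {} for j in labels}
    for j in labels:
        for k in labels:
            if j == k:
                continue
            djk = dist.get((j, k), dist.get((k, j), math.inf))
            if math.isinf(djk):
                A[j][k] = 0       # clusters never observed together
            elif djk < kappa:
                A[j][k] = 1
            else:
                A[j][k] = -1      # assumption: d == kappa counts as dissimilar
    return A


def path_similar_components(A):
    """Connected components of the graph whose edges are the A == 1 pairs."""
    seen, components = set(), []
    for start in A:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:
            j = stack.pop()
            comp.add(j)
            for k, value in A[j].items():
                if value == 1 and k not in seen:
                    seen.add(k)
                    stack.append(k)
        components.append(comp)
    return components


def is_disagreement_free(component, A):
    """True if no pair inside the component is marked dissimilar (-1)."""
    return all(A[j][k] > -1 for j in component for k in component if j != k)
```

A chain in which cluster 1 is similar to 2 and 2 is similar to 3, while 1 and 3 are observed to be far apart, forms a single path-similar component that is not disagreement-free.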
3.3.2 The Algorithm
The input of the algorithm is the wage information w_ift for all worker-firm pairs (i, f) that match at time t, i.e. Ω_{i,f,t} = 1. The output of the algorithm is a worker clustering C.
The algorithm goes through a number of iterations ι ∈ {0, 1, 2, ..., ῑ}, each associated with a regularization parameter κ ∈ {0, ε, 2ε, ..., κ̄ ≡ ῑε}, where ε represents a small number. At the first iteration ι = 0, initialize the worker clustering C^0 = I to be the set of single-worker clusters, i.e. N worker clusters each containing an individual worker: C^0 = {C^0_1 = {1}, C^0_2 = {2}, ..., C^0_N = {N}}. The corresponding initial assignment function is c^0_i = i, ∀ i ∈ I. The outcome of each iteration ι is a worker clustering C^ι.
Workers are clustered in a hierarchical order of their similarities: at each iteration ι, the algorithm decides whether or not to group the current worker clusters C^ι_j, C^ι_k ∈ C^ι based on the current cutoff κ. If C^ι_j and C^ι_k are similar at κ, they are merged into the same worker cluster:
merge C^ι_j and C^ι_k if A_{j,k}(κ) = 1.
The algorithm can be summarized as follows:
Algorithm 1 Worker Clustering (Baseline)
function Worker Clustering
    Initialize clustering: one worker = one cluster
    Distance d(C_j, C_k) = mean_{f∈F, t∈T} |μ_jft − μ_kft|
    for κ ∈ [0, ..., κ̄] do
        if d(C_j, C_k) < κ then
            Merge (C_j, C_k) and update cluster wage ŵ, â
        end if
        Repeat until distance for all pairs ≥ κ
    end for
end function
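A compact Python sketch of the baseline agglomerative loop is given below. It is an illustration under assumptions, not the paper's code: wages are passed as a dictionary keyed by (worker, firm, period), the cluster distance follows Equations (8) to (10) (absolute value of the average gap in cluster means), the closest admissible pair is merged first, and the names `cluster_distance` and `worker_clustering` are hypothetical. The quadratic search over pairs is kept for readability only.

```python
import math
from collections import defaultdict


def cluster_distance(cj, ck, wages_by_group):
    """d(C_j, C_k): |average gap in cluster mean wages| over shared (f, t) cells."""
    gaps = []
    for members in wages_by_group.values():
        wj = [members[i] for i in cj if i in members]
        wk = [members[i] for i in ck if i in members]
        if wj and wk:
            gaps.append(sum(wj) / len(wj) - sum(wk) / len(wk))
    return abs(sum(gaps) / len(gaps)) if gaps else math.inf


def worker_clustering(wages, kappas):
    """Return one clustering (list of frozensets) per cutoff in `kappas`.

    wages  : dict {(worker, firm, period): wage}
    kappas : increasing sequence of cutoffs
    """
    by_group = defaultdict(dict)
    for (i, f, t), w in wages.items():
        by_group[(f, t)][i] = w

    # Initialization: one worker = one cluster.
    clusters = [frozenset([i]) for i in sorted({i for (i, _, _) in wages})]
    history = []
    for kappa in kappas:
        while True:
            best = None
            for a in range(len(clusters)):
                for b in range(a + 1, len(clusters)):
                    dab = cluster_distance(clusters[a], clusters[b], by_group)
                    if dab < kappa and (best is None or dab < best[0]):
                        best = (dab, a, b)
            if best is None:          # all remaining distances are >= kappa
                break
            _, a, b = best
            merged = clusters[a] | clusters[b]
            clusters = [c for n, c in enumerate(clusters) if n not in (a, b)]
            clusters.append(merged)
        history.append(list(clusters))
    return history
```

Cluster means are recomputed from the raw wages after every merge, which plays the role of the "update cluster wage" step in Algorithm 1.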
3.3.3 Theoretical Properties
This subsection derives theoretical properties of the worker clustering produced by Algorithm 1 under additional assumptions.
If κ increases slowly, i.e. the step increment ε is small for each iteration, at most one pair of workers is merged into the same cluster in the next iteration's clustering C^{ι+1}, and Algorithm 1 can accurately group similar workers.
Theorem 3.2. When κ increases slowly, each iteration ι associated with cutoff κ delivers a clustering C^ι that is a path-similar partition of workers I at κ. That is, for each pair of individuals j, k ∈ I in the same cluster, j, k ∈ C, C ∈ C^ι, there exists a path connecting j and k on C such that each pair of adjacent workers on the path is similar at κ.
Proof. Proof by induction: Theorem 3.2 obviously holds for ι = 1. Assume it holds for ι = n with cutoff κ = ιε. Without loss of generality, suppose that C_k, C_j ∈ C are the two clusters to be agglomerated at this iteration, with j ∈ C_j, k ∈ C_k. Each of C_j and C_k is path-similar at the previous cutoff κ′ = (n − 1)ε < κ, and thus also path-similar at κ. Consider the closest pair of workers across the two clusters, j* ∈ C_j, k* ∈ C_k: j* and k* are similar at κ, as d(j*, k*) ≤ d(j*, C_k) ≤ d(C_j, C_k) < κ. The newly formed cluster C_{j,k} is also path-similar, since for any two members j, k ∈ C_{j,k} one can find a path j → j* → k* → k such that all adjacent workers are similar at κ. Therefore, the theorem also holds for ι = n + 1.
For computational efficiency, ε is set to a reasonably large number in practice, and there can be multiple worker clusters in the same path-similar component (constituting the same connected component of the affinity graph A(κ)) assigned in each iteration ι. Ruling out disagreement within each path-similar component requires additional constraints. This discussion is deferred to Section 3.7.
Homophily A worker of productivity x ∈ X exhibits local homophily in the neighborhood B_r(x) ≡ {x′ ∈ X : |x′ − x| < r} if any pair of workers with similar types k, k′ ∈ B_r(x) have worked as coworkers with independent probability
$$p(k, k') > \frac{\log\left(\mu_r(x) N\right)}{\mu_r(x) N}.$$
Here μ_r(x) is the fraction of workers whose type lies in B_r(x), μ_r(x) = ∫_{B_r(x)} φ(x) dx, and φ(x) is the probability density at x. Local homophily assumes that workers with similar latent productivities have a higher tendency to become coworkers. It guarantees sufficient local matching density for workers that are close in productivity space to meet in the same workplace. Algorithm 1 can detect and group workers with similar unobserved productivity where local homophily holds.
Single-crossing A worker of productivity x_j ∈ X satisfies the single-crossing condition if the average wage distance between individual workers j and k, D_jk(x_j), is monotonically increasing in the productivity x_j, for all x_k ∈ X and y_f ∈ Y. Recall that by (6) and (7):
$$D_{jk}(x_j) = \operatorname*{mean}_{\{f,t \,\mid\, \Omega_{jft}=1,\ \Omega_{kft}=1\}} \left[ w(x_j, y_f) - w(x_k, y_f) + \frac{a(x_k) - a(x_j)}{N_{ft}} \right].$$
Note that D_jk is a finite function defined on the compact set X. The single-crossing condition assumes that a worker of higher productivity should always get a higher wage. Under this condition, D_jk(x_j) crosses zero only once, at x_j = x_k. Consequently, small wage distances between workers can be mapped into proximity of their productivities. A sufficient condition for single-crossing is that the wage w(x, y) is increasing in x and the size of the peer group N_ft is large, so that |a(x_k) − a(x_j)| / N_ft is always dominated by |w(x_j, y_f) − w(x_k, y_f)|.
Theorem 3.3. (No global split) Suppose x ∈ X exhibits local homophily in B_r(x). For any pair of workers j and k with similar productivity x_j, x_k ∈ B_r(x), one can find κ > 0 at which Algorithm 1 terminates and delivers a path-similar clustering C* at κ that assigns both workers to the same cluster with high probability:
$$\lim_{N \to \infty} p(c^*_k = c^*_j) = 1.$$
Lemma 3.1. (Erdős and Rényi, 1960) Denote by G(N, p) the random graph that has N vertices and whose edges between any pair of vertices form independently with probability
$$p = \frac{p_0 \log N}{N}.$$
Graph G(N, p) is connected with high probability if and only if p_0 > 1.
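The threshold in Lemma 3.1 can be explored numerically. The sketch below draws random graphs with edge probability p = p_0 log N / N and reports the share that turn out connected; it is a simulation aid under assumed names (`random_graph`, `is_connected`, `connectivity_rate`) and default settings, not part of the formal argument. For finite N the transition around p_0 = 1 is gradual rather than sharp.

```python
import math
import random


def random_graph(n, p, rng):
    """Adjacency sets of G(n, p): each edge is present independently with prob. p."""
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj


def is_connected(adj):
    """Depth-first search from an arbitrary vertex reaches every vertex."""
    if not adj:
        return True
    start = next(iter(adj))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == len(adj)


def connectivity_rate(n, p0, trials=200, seed=0):
    """Share of simulated graphs that are connected when p = p0 * log(n) / n."""
    rng = random.Random(seed)
    p = min(1.0, p0 * math.log(n) / n)
    hits = sum(is_connected(random_graph(n, p, rng)) for _ in range(trials))
    return hits / trials
```

Comparing, for example, `connectivity_rate(500, 0.5)` with `connectivity_rate(500, 2.0)` gives a sense of how local matching density governs whether similar workers can be linked through chains of coworkers.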
Proof. When N → ∞, it follows immediately from Lemma 3.1 that the sub-graph G(μ_r(x)N, p) is connected with high probability, i.e. there exists a path connecting all workers in B_r(x) such that any adjacent workers j, k on the path are coworkers. It remains to show that one can find κ > 0 such that all adjacent j and k on the path are also similar at κ. Since j, k ∈ B_r(x), |x_j − x_k| < r. Because the wage distance function (6) is continuous, the average distance is bounded: ∃ δ_jkft : |D_jk(x_j) − D_jk(x_k)| < δ_jkft. Therefore, one can choose κ = max_{j,k,f,t} δ_jkft such that all adjacent workers j and k are similar at κ. That is, B_r(x) constitutes a path-similar component and all its members are assigned to the same cluster.
Theorem 3.4. (No local contamination) Assume that all x ∈ X satisfy the single-crossing condition. There exists κ > 0 at which Algorithm 1 terminates and delivers a path-similar clustering C* such that
$$\forall k : |x_j - x_k| > r, \quad p(c^*_k = c^*_j) = 0.$$
Proof. Theorem 3.4 follows immediately from the continuity and the monotonicity of the wage distance D_jk(x_j): without loss of generality, assume that x_j > x_k + r, so that κ_0 ≡ D_jk(x_j) − D_jk(x_k) > 0. For all small κ < κ_0, workers j and k are dissimilar and will not be grouped into the same cluster.
Theorem 3.3 states that if Algorithm 1 is terminated at a relatively large cutoff κ, the resulting clustering C can detect and group all similar workers in a densely connected neighborhood. On the other hand, Theorem 3.4 states that if Algorithm 1 is terminated at a relatively small cutoff κ′, the corresponding clustering C′ can distinguish any dissimilar pair of workers j and k. The optimal stopping criterion for κ must best balance the tradeoff between “splitting similar workers into multiple clusters” and “contaminating a cluster by introducing dissimilar workers”. The discussion of choosing the optimal κ is deferred to Section 3.5.
3.4 The “Estimation” Stage
Based on the worker clustering C and assignment function c_i, the “estimation” stage recovers the complementarity and coworker effects functions w(x_i, y_f) and a(x_j) at the worker cluster level. The idea is that when workers assigned to the same worker cluster are close in productivity space, I can recover the coworker influence of any individual from that of her cluster average:
$$\begin{aligned} E\, w_{ift} &= w(x_i, y_f) + \frac{1}{N_{ft}} \sum_{j \in P_{-i,ft}} a(x_j) \\ &\approx \hat{w}(c_i, f) + \frac{1}{N_{ft}} \sum_{j \in P_{-i,ft}} \hat{a}(c_j), \end{aligned} \tag{12}$$
where ŵ : C × F → R and â : C → R.
The estimation recovers the unobserved heterogeneous characteristics of workers and firms that determine wages. Despite the fact that x_i and y_f are latent, the cluster membership c_i and firm identity f are now observable from the “clustering” stage. Conditional on observed c_i and f, equation (13) is identified under Assumption 3.1:
$$\begin{aligned} w_{ift} &= \hat{w}(c_i, f) + \frac{1}{N_{ft}} \sum_{j \in P_{-i,ft}} \hat{a}(c_j) + \tilde{\nu}_{ift} \\ &= \hat{w}(c_i, f) + \frac{1}{N_{ft}} \sum_{k \in \mathcal{K}} h_{-i,ft}(k)\, \hat{a}(k) + \tilde{\nu}_{ift}. \end{aligned} \tag{13}$$
The coworker component h_{−i,ft}(k) = Σ_{j∈P_{−i,ft}} 1{c_j = k} counts the number of coworkers in P_{−i,ft} assigned to cluster k. The match component ŵ(c_i, f) is captured by the joint worker-cluster-by-firm fixed effect, and the coworker effect â(c_j) can be estimated as the coefficient of the wage response to changes in the coworker composition.
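To illustrate how the regressors of equation (13) could be assembled once cluster labels are available, the sketch below builds one design row per worker observation: a dummy for the (cluster, firm) cell and, for each cluster k, the share h_{−i,ft}(k)/N_ft of coworkers assigned to k. The function name `design_row` and the feature keys are hypothetical, and the regression step itself is left to whatever least-squares routine is at hand.

```python
from collections import Counter


def design_row(i, group, cluster_of, firm):
    """Regressors of Equation (13) for worker i in one peer group.

    i          : focal worker
    group      : all workers present at (firm, period), including i
    cluster_of : dict mapping worker -> cluster label c_i
    firm       : firm identifier f
    Returns {feature: value} with a match dummy for (c_i, f) and one
    coworker-share feature per cluster present among i's coworkers.
    """
    coworkers = [j for j in group if j != i]          # P_{-i,ft}
    counts = Counter(cluster_of[j] for j in coworkers)  # h_{-i,ft}(k)
    row = {("match", cluster_of[i], firm): 1.0}
    for k, h in counts.items():
        row[("peer", k)] = h / len(coworkers)         # h_{-i,ft}(k) / N_ft
    return row
```

Since the coworker shares sum to one within each row, they are collinear with the match dummies; in practice one cluster's coefficient would need to be normalized (for instance dropped) before estimating â, a choice that is an assumption here and not specified in the text.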
3.5 Regularization
I use a regularization criterion to select, among the sequence of all iterations {C^ι}, the worker clustering that minimizes the generalization error of the machine learning algorithm. In particular, I place a penalty on cluster sizes that are too small, and select the clustering C^ι (and the ŵ and â it implies) that minimizes the RMSE of the out-of-sample forecast.
To evaluate the criterion, I randomly split the data into three components: the training set (80%), the validation set (10%), and the test set (10%).10 Based only on observations in the training set, I estimate the sequence of worker clusterings and functions {C^ι, ŵ^ι, â^ι} for each iteration ι. Based on that, I can make out-of-sample predictions for each observation w_{i,f′,t′} in the validation set and the test set. If C^ι_i matches f′ at t′ in the training set, i.e. the algorithm can find one or more workers assigned to the same cluster C^ι_i who worked at firm f′ at t′, it should optimally predict the average wage in that cluster:
$$\tilde{w}_{if't'} = \hat{w}(c_i, f') + \frac{1}{N_{f't'}} \sum_{j \in P_{-i,f't'}} \hat{a}(c_j) = \mu_{c_i, f't'}.$$
If no such worker exists, predictions cannot be made based on a similar reference worker. The best predictor is then worker i’s average wage in the training sample conditional on all available information. The predicted wage of i at a new firm f′ at t′ is
$$\hat{w}^{\iota}_{if't'} = \begin{cases} \tilde{w}_{if't'} & \text{if } C^{\iota}_i \text{ and } f' \text{ match at } t', \\ \text{worker } i\text{'s training-sample average} & \text{otherwise.} \end{cases}$$
10Since wages are interdependent within each peer group, the split is random at the peer-group level. For example, if peer group (f, t) is drawn and assigned to the training set, all workers in P_ft are assigned to the training set.
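The criterion function is the RMSE evaluated on the validation set,
$$Q(w, \hat{w}^{\iota}) = \left( \operatorname*{mean}_{(i,f,t) \in \{\text{validation set}\}} \left(\hat{w}^{\iota}_{ift} - w_{ift}\right)^2 \right)^{1/2}, \tag{14}$$
and the optimal clustering is selected as
$$\iota^* = \arg\min_{\iota} \{Q(w, \hat{w}^{\iota})\}. \tag{15}$$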
Starting from the initial iteration ι = 0, each worker forms its own cluster. At this iteration, the average of each single-worker cluster accurately reflects wages for the individual. However, I cannot make out-of-sample predictions based on “similar reference workers”, but can only rely on the worker’s personal average. Therefore, criterion (14) for the initial iteration is large. As the algorithm proceeds, the wage cutoff κ = ιε gradually increases, more workers can be grouped as similar, and the average size of clusters grows. Criterion (14) first decreases as more workers can be predicted with a cluster average instead of a personal average. It increases again when the average size of clusters gets too large to accurately predict individual wages. When the wage cutoff goes to infinity, all workers are assigned to one single big cluster, and the algorithm predicts the same average wage for all workers, which cannot be accurate. The lowest point of the U-shaped regularization curve represents the optimal trade-off.
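The selection rule can be sketched in Python as follows. The snippet implements the prediction rule (cluster mean at the target firm and period when available, otherwise the worker's own training average), the RMSE criterion (14), and the arg min in (15). It assumes every validation worker also appears in the training data, and the names `predict`, `rmse` and `select_clustering` as well as the toy numbers in the usage example are hypothetical.

```python
import math


def predict(i, f, t, cluster_of, train):
    """Cluster mean at (f, t) in the training data, else i's own training mean."""
    same_cluster = [
        w for (j, g, s), w in train.items()
        if g == f and s == t and cluster_of[j] == cluster_of[i]
    ]
    if same_cluster:
        return sum(same_cluster) / len(same_cluster)
    own = [w for (j, _, _), w in train.items() if j == i]
    return sum(own) / len(own)   # assumes worker i appears in the training set


def rmse(validation, cluster_of, train):
    """Criterion (14): root mean squared prediction error on the validation set."""
    errors = [
        (predict(i, f, t, cluster_of, train) - w) ** 2
        for (i, f, t), w in validation.items()
    ]
    return math.sqrt(sum(errors) / len(errors))


def select_clustering(assignments, train, validation):
    """Return (iota*, scores): the index with the smallest validation RMSE."""
    scores = [rmse(validation, c, train) for c in assignments]
    best = min(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Toy example: workers 1 and 2 are alike, worker 3 earns much more.
train = {(1, "f", 1): 10.0, (2, "f", 1): 10.0, (3, "f", 1): 30.0,
         (2, "g", 2): 12.0, (3, "g", 2): 32.0}
validation = {(1, "g", 2): 12.0}
assignments = [
    {1: 1, 2: 2, 3: 3},          # every worker alone
    {1: "L", 2: "L", 3: "H"},    # similar workers grouped
    {1: 0, 2: 0, 3: 0},          # everyone in one cluster
]
best, scores = select_clustering(assignments, train, validation)
```

In the toy data the criterion is high for singleton clusters, lowest for the intermediate clustering, and highest when all workers are pooled, which reproduces the U shape described above.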
Figure 3: Criterion Q(w, ŵ) on the validation set.
3.6 Measuring coworker complementarities
In this section, I show how to further generalize the framework (3) to allow for complementarity between coworker productivities in the coworker effects. Allowing for coworker complementarities, in addition to the complementarity between the worker and the firm, has important implications for efficiency and for the reallocation of workers across firms.
3.6.1 Framework
In a more relaxed framework, wages can be determined as11
$$w_{ift} = \underbrace{w(x_i, y_f)}_{\text{match effects}} + \underbrace{\frac{1}{N_{ft}} \sum_{j \in P_{-i,ft}} a(x_i, x_j)}_{\text{general coworker effects}} + \nu_{ift}. \tag{17}$$
Here, I adopt the two-dimensional continuous spillover function a : X² → R to capture the wage complementarity between worker i and her coworkers j ∈ P_{−i,ft}. In particular, a(x_i, x_j) reflects the coworker influence exerted by coworker j on the focal worker i. Note that, conditional on the set of productivities x_i, y_f, {x_j}_{j∈P_{−i,ft}} being observed, identification holds for (17), as it takes a specific form of the general wage function (4).
3.6.2 The “clustering” stage
Importantly, the identification of unobserved worker productivity is identical to the baseline equation (6), i.e. worker similarity can be measured by the wage distance between individual workers j and k within firm f at t:
$$D_{jkft} = w_{jft} - w_{kft} = w(x_j, y_f) - w(x_k, y_f) + \frac{a(x_i, x_k) - a(x_i, x_j)}{N_{ft}}. \tag{18}$$
As the similarity can be measured pairwise with an identical distance function, the worker clustering obtained in the “clustering” stage identifies the types {x_i}_I for the extended model.
11Similarly to (3), one can include time-varying fixed effects to account for potential endogeneity due to common shocks (e.g. technology shocks at the firm or other cohort level):
$$w_{ift} = w(x_i, y_f) + \frac{1}{N_{ft}} \sum_{j \in P_{-i,ft}} a(x_i, x_j) + Z_{ft}(x_i) + \epsilon_{ift}. \tag{16}$$
This equation is identified under the exogeneity assumption
$$\epsilon_{ift} \perp\!\!\!\perp h_{-i,ft}(x_j) \;\Big|\; x_i,\, y_f,\, \{x_j\}_{j \in P_{-i,ft}},$$
and empirically requires no multicollinearity between the realizations of
$$X = \left\{\mathbf{1}\{c_i = l, f\},\ \{h_{-i,ft}(c_j)\},\ \mathbf{1}\{c_i = l, f, t\}\right\}.$$
3.6.3 The “estimation” stage
Given the worker clustering C acquired in the clustering stage, I estimate the coworker effect function â(c_i = l, c_j = k) defined for each pair of interacting worker clusters C_k and C_l. The estimation and identification proceed in a similar fashion to (13), but separately for each group of workers in the same cluster c_i = l:
$$\begin{aligned} w_{ift} &= \hat{w}(c_i, f) + \frac{1}{N_{ft}} \sum_{j \in P_{-i,ft}} \hat{a}(c_i, c_j) + \tilde{\nu}_{ift} \\ &= \hat{w}(l, f) + \frac{1}{N_{ft}} \sum_{k \in \mathcal{K}} h_{-i,ft}(k)\, \hat{a}(l, k) + \tilde{\nu}_{ift}. \end{aligned} \tag{19}$$
The coworker component h_{−i,ft}(k) = Σ_{j∈P_{−i,ft}} 1{c_j = k} counts the number of coworkers in P_{−i,ft} assigned to cluster k.
3.7 Disagreement-free Partition
When the cutoff step ε is small, at most one pair of workers is merged into the same cluster in the next clustering C^{ι+1}. While this is accurate, the algorithm is slow. In practice, I use a reasonably large cutoff step ε to merge multiple worker clusters in the same path-similar component in each iteration ι. This is more efficient. The problem is that there could be disagreement within the same path-similar component. For example, a component {C_i, C_j, C_k} can be path-similar at a certain κ if worker cluster C_i is similar to cluster C_j, and cluster C_j is similar to cluster C_k. It can also contain a “disagreement” if worker clusters C_i and C_k are observably dissimilar at a workplace where C_j is absent. Ruling out disagreement within each path-similar component requires additional constraints.
The idea is to partition each path-similar component into a finer collection of disagreement-free components:
$$\{S^{**}_{l(1)}, ..., S^{**}_{l(M)}\} = \text{split\_disagreement}(S^{*}_{l}).$$
The subroutine split_disagreement is a fine-tuning device developed to rule out dissimilar workers in a path-similar component. It takes each path-similar component S*_l ∈ Π* as input, and recursively finds the minimum cut that splits the component into multiple disconnected “child components” once a disagreement is found. Finding the minimum cut of a component means removing the fewest edges, weighted by similarity, such that the component is split into disconnected child sub-components. If both child components are disagreement-free, the algorithm stops. Otherwise, the procedure is repeated until no “child component” contains any dissimilar pair of worker clusters at the cutoff κ. The procedure can be viewed as a tree traversal with depth-first search.
Algorithm 2 Split Disagreement
function Split Disagreement        ▷ Input: S*   ▷ Output: S**
    Initialize partition list S** = {}
    if component S* is disagreement-free then
        Add S* to partition list S**
    else
        Find the most distant clusters
            C_j, C_k = arg max_{C_j, C_k ∈ S*} d(C_j, C_k)
        Find minimum cut
            {S*_l, S*_r} = maxflow(S*, C_j, C_k)
        Repeat for component containing C_j:
            {S**_l(1), ..., S**_l(M)} = split_disagreement(S*_l)
        Repeat for component containing C_k:
            {S**_r(1), ..., S**_r(M′)} = split_disagreement(S*_r)
        Add {S**_l(1), ..., S**_l(M), S**_r(1), ..., S**_r(M′)} to partition list S**
    end if
end function
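A possible Python rendering of the recursive split is sketched below. Several simplifications are assumptions rather than details from the text: every similar pair is given unit capacity (so the cut removes the fewest edges instead of weighting them by similarity), the minimum cut is obtained with a basic Edmonds-Karp augmenting-path routine, a distance exactly equal to κ is treated as dissimilar, and only observed dissimilar pairs are candidates for the most distant pair. The names `min_cut_side` and `split_disagreement` are hypothetical.

```python
from collections import deque


def min_cut_side(nodes, edges, s, t):
    """Nodes on the s side of a minimum s-t cut (unit capacities, Edmonds-Karp)."""
    cap = {u: {} for u in nodes}
    for u, v in edges:                       # undirected edge = two unit arcs
        cap[u][v] = cap[u].get(v, 0) + 1
        cap[v][u] = cap[v].get(u, 0) + 1

    def reachable():
        parent = {s: None}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    while True:
        parent = reachable()
        if t not in parent:
            return set(parent)               # residual graph no longer reaches t
        v = t
        while parent[v] is not None:         # push one unit along the path
            u = parent[v]
            cap[u][v] -= 1
            cap[v][u] += 1
            v = u


def split_disagreement(component, dist, kappa):
    """Recursively split a component until no observed pair is dissimilar."""
    comp = sorted(component)

    def d(j, k):
        return dist.get((j, k), dist.get((k, j), float("inf")))

    observed = [(d(j, k), j, k) for n, j in enumerate(comp)
                for k in comp[n + 1:] if d(j, k) != float("inf")]
    dissimilar = [pair for pair in observed if pair[0] >= kappa]
    if not dissimilar:
        return [set(comp)]
    _, s, t = max(dissimilar)                # most distant dissimilar pair
    edges = [(j, k) for djk, j, k in observed if djk < kappa]
    left = min_cut_side(comp, edges, s, t)
    right = set(comp) - left
    return (split_disagreement(left, dist, kappa)
            + split_disagreement(right, dist, kappa))
```

For the chain example in the text (cluster 1 similar to 2, 2 similar to 3, 1 and 3 dissimilar) with distances 0.1, 0.1 and 0.5 and κ = 0.2, this sketch separates cluster 1 from clusters 2 and 3. When several minimum cuts exist, the routine returns the one closest to the first seed, which is a tie-breaking choice of this sketch.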
Incorporating this modification, I propose an efficient version of the worker clustering algorithm as follows:
Algorithm 3 Worker Clustering (Efficient)
function Worker Clustering
    Initialize clustering: one worker = one cluster
    Distance d(C_j, C_k) = mean_{f∈F, t∈T} |μ_jft − μ_kft|
    for ι ∈ [0, ..., ῑ] do
        Current clustering C^ι, cutoff κ = ιε
        Evaluate affinity graph A_{j,k}(κ)
        Find its connected components Π*(C^ι) = {S*_1, ..., S*_L}
        for l ∈ [1, ..., L] do
            {S**_l(1), ..., S**_l(M)} = split_disagreement(S*_l)
        end for
        Disagreement-free partition
            Π** = {S**_1(1), ..., S**_1(M′), ..., S**_L(1), ..., S**_L(M″)}
        Merge the clusters in each component of Π** into a single cluster
        The output is the clustering for the next iteration C^{ι+1}
    end for
end function
Illustration of Worker Clustering
1 23
5 6
Firm 𝑓, Time 𝑡
Wag
e 𝑊
𝑖,𝑓,𝑡
Worker 𝑖
1. The input of the algo-rithm is wages observed in thematched employer employeedata wift (y axis) for all work-ers i (x axis) at the sameworkplace, i.e. in firm f at thesame period t. Individuals ex-ert influence on their cowork-ers’ wages.
[Figure panel: left, wages $W_{i,f,t}$ by worker $i$ in firm $f$ at time $t$; right, worker hierarchy with dissimilarity $|D_{i,i'}|$ on the y-axis.]
2. Starting from the first iteration, associated with the minimum $\kappa$ (y-axis of the right panel), the algorithm assigns the most similar workers (“1” and “2”) to the same cluster.
[Figure panel: left, wages $W_{i,f,t}$ by worker $i$ in firm $f$ at time $t$; right, worker hierarchy with dissimilarity $|D_{i,i'}|$ on the y-axis.]
3. Continuing to the next iteration with a larger $\kappa$, the next most similar pair of workers (“3” and “4”) is grouped.
[Figure panel: left, wages $W_{i,f,t}$ by worker $i$ in firm $f$ at time $t$; right, worker hierarchy with dissimilarity $|D_{i,i'}|$ on the y-axis.]
4. As the cutoff $\kappa$ increases further, a larger fraction of single-worker clusters is agglomerated into bigger clusters. Note that merges take place not only between individual workers but also between worker clusters (e.g. between cluster “12” and individual “3”).
[Figure panel: left, wages $W_{i,f'',t''}$ by worker $i$ in firm $f''$ at time $t''$; right, worker hierarchy with dissimilarity $|D_{i,i'}|$ on the y-axis.]
5. As the size of the clusters increases, the algorithm can compare and group a wider range of workers. For example, workers “4” and “6” have never been coworkers. Nonetheless, they are identified as similar and clustered together via the intermediate worker “5”. Note that “4” and “5” earn similar wages in firm $f'$ at $t'$ while “5” and “6” earn similar wages in firm $f$ at $t$.
[Figure panel: left, wages $W_{i,f,t}$ by worker $i$ in firm $f$ at time $t$; right, worker hierarchy with dissimilarity $|D_{i,i'}|$ on the y-axis.]
6. When $\kappa$ increases to infinity, the algorithm ultimately agglomerates all workers. The output is a “worker hierarchy”: for every individual worker, a group of similar workers given the cutoff $\kappa$.
Illustration of Regularization
[Figure panel: left (“train”), wages $W_{i,f,t}$ by worker $i$ in firm $f$ at time $t$; middle, worker hierarchy with dissimilarity $|D_{i,i'}|$ and the cutoff $\kappa$; right (“predict”), predicted and actual wages $W_{i,f',t'}$ in firm $f'$ at time $t'$, with the cross-validation error.]
1. Each level of the cutoff $\kappa$ corresponds to a unique worker clustering, based on which the algorithm predicts out-of-sample wages in the validation set. With $\kappa$ fixed at the level shown in the middle panel, the corresponding clustering is $C = \{\{1, 2, 3\}, \{4, 5, 6\}\}$. The algorithm uses the average wage of workers 2 and 3 in firm $f'$ at $t'$ to predict the wage of worker 1 (the dashed square) and compares it to the actual wage (the solid square) in the validation set to evaluate the RMSE.
[Figure panel: same layout as the previous panel (“train”, worker hierarchy, “predict”), evaluated at a lower cutoff $\kappa$.]
2. Search for the $\kappa$ that minimizes the prediction error. Note that in this case the RMSE decreases when moving to a lower $\kappa$, corresponding to the clustering $C = \{\{1, 2\}, \{3\}, \{4, 5, 6\}\}$, and the counterfactual wage for worker 1 is predicted with worker 2's wage only.
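The selection of the cutoff illustrated in the two panels above can be sketched as follows. This is a minimal illustration, assuming that a held-out wage is predicted by the average wage of same-cluster workers observed in the same firm and period; the function names and all numbers are hypothetical and chosen only to mirror the two panels.

```python
import math


def predict_wage(worker, cell_wages, clustering):
    """Average wage of the worker's cluster mates observed in the same firm-period cell."""
    cluster = next(c for c in clustering if worker in c)
    mates = [w for i, w in cell_wages.items() if i in cluster and i != worker]
    return sum(mates) / len(mates) if mates else None


def cv_rmse(clustering, cell_wages, held_out):
    """RMSE of the predictions for held-out (worker -> actual wage) observations."""
    errors = []
    for worker, actual in held_out.items():
        pred = predict_wage(worker, cell_wages, clustering)
        if pred is not None:
            errors.append((pred - actual) ** 2)
    return math.sqrt(sum(errors) / len(errors)) if errors else math.inf


# Hypothetical numbers: worker 1 is held out in firm f' at time t'.
train = {2: 1.0, 3: 1.6}            # observed wages of workers 2 and 3 in (f', t')
held_out = {1: 1.05}                # actual wage of worker 1 in the validation set
candidates = {
    0.5: [{1, 2, 3}, {4, 5, 6}],    # clustering at the higher cutoff
    0.2: [{1, 2}, {3}, {4, 5, 6}],  # clustering at the lower cutoff
}
scores = {kappa: cv_rmse(c, train, held_out) for kappa, c in candidates.items()}
best_kappa = min(scores, key=scores.get)
```

Here the lower cutoff is selected because predicting worker 1's wage from worker 2 alone gives a smaller error than averaging over workers 2 and 3.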
4 Simulation Results
To evaluate the accuracy of the estimator in finite samples, I run Monte Carlo simulations. I
simulate wages from a data generating process with a standard calibration.
4.1 Simulation
This section shows that Algorithm 1 effectively estimates the worker clustering, the wage
complementarity and the coworker effects function. Each worker $i \in I$ has productivity
$x_i$, $x : I \to \mathbb{R}$, and each firm $f \in F$ has productivity $y_f$, $y : F \to \mathbb{R}$. For each year
$t \in \{1, ..., T\}$, unemployed workers randomly search and apply for job vacancies in firms.
The average monthly job finding rate $\bar{\lambda}$ is calibrated to 40%. To generate realistic
positive assortative matching, i.e. high-productivity workers work in high-productivity
firms, assume that an offer is accepted if and only if
$$|x_i - y_f| < 0.1.$$
For each type of match $(x_i, y_f)$, worker $i$ separates from firm $f$ at an exogenous job separation
rate $\delta = 3\%$. Importantly, the match effect component for $(x_i, y_f)$ takes the CES form
$$w(x_i, y_f) = \left(x_i^{\rho} + y_f^{\rho}\right)^{1/\rho},$$
and the coworker effect exerted by a coworker $j$ of type $x_j$ is $a(x_j)$. The wage is determined
by Equation (3) given worker productivity $x_i$, firm productivity $y_f$ and all the coworker
productivities $\{x_j\}_{j \in P_{-i,ft}}$:
$$w_{ift} = w(x_i, y_f) + \frac{1}{N_{ft}} \sum_{j \in P_{-i,ft}} a(x_j) + \nu_{ift}.$$
The parameters are summarized in Table I:
Workers               $N = 10{,}000$
Firms                 $M = 200$
Years                 $T = 20$
Worker types          $x_i \sim U[0, 1]$
Firm types            $y_f \sim U[0, 1]$
Job finding rate      $\lambda = 40\%$
Job separation rate   $\delta = 3\%$
Meeting               random
Matching set          $\{(x, y) : \|x - y\| < 0.1\}$
Table I: Data Generating Process
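The wage equation and the matching rule above can be sketched in Python for a single workplace. This is an illustrative sketch only: the function names are hypothetical, $N_{ft}$ is assumed here to be the number of coworkers of worker $i$ in the peer group, and the full search-and-separation dynamics are not simulated. The values $\rho = 1/2$ and $a(x) = 0.05x$ are those of the two data generating processes in this section.

```python
import random


def ces_match_effect(x, y, rho):
    """CES match effect w(x, y) = (x^rho + y^rho)^(1/rho)."""
    return (x ** rho + y ** rho) ** (1.0 / rho)


def accepts(x, y, band=0.1):
    """Matching rule: an offer is accepted if and only if |x - y| < band."""
    return abs(x - y) < band


def wage(x_i, y_f, coworker_types, a, rho, noise=0.0):
    """Wage = match effect + average coworker effect + noise."""
    # Assumption: N_ft is taken to be the number of coworkers of worker i.
    n_ft = len(coworker_types)
    peer = sum(a(x_j) for x_j in coworker_types) / n_ft if n_ft else 0.0
    return ces_match_effect(x_i, y_f, rho) + peer + noise


rng = random.Random(0)
y_f = 0.5  # hypothetical firm type
# Draw worker types uniformly on [0, 1) and keep those inside the firm's matching set.
staff = [x for x in (rng.random() for _ in range(1000)) if accepts(x, y_f)]
x_i, coworkers = staff[0], staff[1:]
w_no_peer = wage(x_i, y_f, coworkers, a=lambda x: 0.0, rho=0.5)
w_peer = wage(x_i, y_f, coworkers, a=lambda x: 0.05 * x, rho=0.5)
```

Because every coworker in the matching set has a positive type, the wage with $a(x) = 0.05x$ is strictly above the wage without peer effects for the same match.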
4.1.1 Data Generating Process #1
In the first DGP, the match effect takes a CES functional form and there are no peer effects:
$$w(x, y) = \left(x^{1/2} + y^{1/2}\right)^{2}, \qquad a(x_i) = 0.$$
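As a brief worked step, added here for illustration and not part of the original derivation, expanding the CES form with exponent $1/2$ makes the worker-firm complementarity explicit:
$$\left(x^{1/2} + y^{1/2}\right)^{2} = x + y + 2\sqrt{xy}, \qquad \frac{\partial^{2} w}{\partial x \, \partial y} = \frac{1}{2\sqrt{xy}} > 0 \quad \text{for } x, y > 0.$$
The cross-partial derivative is strictly positive, so the match effect is supermodular: the wage gain from a more productive firm is larger for a more productive worker.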
Clustering The resulting cluster assignment $c_i$ for all workers is displayed in Figure 12.
For each point, the x-axis shows the true worker productivity $x_i$ and the y-axis shows the label
of the cluster $c_i$ assigned to the same worker $i$. The clustering $C$ displayed in Figure 12 is
highly accurate: workers that are close in $X$ are assigned to the same or adjacent clusters,
and the assignment function $c_i$ lies on the 45-degree line.
Figure 12: Worker cluster assignment $c_i$. Note that cluster labels are identified up to a permutation. Cluster labels are ranked by the average true worker productivity.
Coworker effects The coworker effect function is accurately estimated. As Figure 13 shows, when
there is no coworker effect in the DGP, the algorithm correctly detects zero: $\hat{a}(x) \approx 0$.
Figure 13: Estimated coworker effects â(cj) and the ground truth a(xj)
To evaluate the accuracy of the estimator $\hat{a}$, I use the relative risk, defined as the
root mean squared error of the estimator $\hat{a}$ on $X$ relative to the root mean square of the
true coworker effects in the population:
$$RR = \left( \frac{\int_0^1 (\hat{a}(x) - a(x))^2 \phi(x)\,dx}{\int_0^1 a(x)^2 \phi(x)\,dx} \right)^{1/2} \times 100\%,$$
where $\phi(x)$ is the probability density function of workers of type $x$. To create a comparable
benchmark, I report the performance of the implied CDS estimator of the coworker influence of
each individual $i$:
$$\hat{a}^{CDS}_i = \hat{\gamma} \cdot \hat{\psi}_i.$$
The results are summarised in Table II.
                 Baseline estimator $\hat{a}$    CDS estimator $\hat{a}^{CDS}$
Relative Risk    6.87%                           inf
Table II: Relative Risk for both estimators
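The relative risk defined above can be evaluated numerically. The sketch below is an illustration under stated assumptions: the function name is hypothetical, the integrals are approximated with a midpoint rule, the default density is uniform on $[0, 1]$ as in Table I, and returning infinity when the denominator is zero is a convention chosen here for the sketch.

```python
import math


def relative_risk(a_hat, a_true, phi=lambda x: 1.0, n=10_000):
    """RR = sqrt( int (a_hat - a)^2 phi dx / int a^2 phi dx ) * 100, on [0, 1]."""
    xs = [(i + 0.5) / n for i in range(n)]  # midpoint rule on n equal cells
    num = sum((a_hat(x) - a_true(x)) ** 2 * phi(x) for x in xs) / n
    den = sum(a_true(x) ** 2 * phi(x) for x in xs) / n
    # The ratio is not finite when the true effect is identically zero.
    return math.inf if den == 0 else 100.0 * math.sqrt(num / den)


# Hypothetical estimate that overstates a(x) = 0.05 x by 10 percent everywhere.
rr = relative_risk(lambda x: 0.055 * x, lambda x: 0.05 * x)
```

In this hypothetical case the estimate is off by a constant 10 percent of the true effect, so the relative risk is 10 percent regardless of the density.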
4.1.2 Data Generating Process #2
In the second DGP, I simulate wages with the same match effect function, but now with a
positive coworker effect function:
$$w(x_i, y_f) = \left(x_i^{1/2} + y_f^{1/2}\right)^{2}, \qquad a(x_j) = 0.05\,x_j.$$
Coworker effects Again, the coworker effect function is accurately estimated. In Figure 14
the estimated and true coworker effect functions lie on top of each other: $\hat{a}(x) \approx a(x)$.
Figure 14: Estimated coworker effects â(ci) and the ground truth a(xi)
General coworker effects The results for the general estimator between types,
$\hat{a}(x = l, x' = k)$, are displayed in Figure 15.
Figure 15: Estimated coworker effects â(ci, cj) and the ground truth a(xi, xj)
The performance of the baseline estimator $\hat{a}(l)$, the two-dimensional estimator $\hat{a}(l, k)$ and
the benchmark estimator $\hat{a}^{CDS}$ is summarised in Table III.
                 Baseline est. $\hat{a}(k)$    General est. $\hat{a}(l, k)$    CDS est. $\hat{a}^{CDS}$
Relative Risk    3.42%                         5.68%                           44.81%
Table III: Relative Risk for the three estimators
5 “Divide and Conquer”: Achieving Scalability
While the baseline hierarchical clustering algorithm is accurate in identifying
unobserved worker productivities and estimating the coworker effects, one impediment to
applying it to large administrative data lies in its computational complexity in both time
and space. The time complexity of an algorithm measures how the number of operations
scales with the size of the data, and the space complexity measures its memory usage. For a typical
hierarchical clustering, the time complexity is $O(N^2 \log N)$ and the space requirement is
quadratic, $O(N^2)$, where $N$ is the number of workers in the data. This is a formidable volume
of computation and memory at the scale of the matched employer-employee data
of Denmark, where $N = 3$ million workers.
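A back-of-the-envelope calculation shows what the quadratic space requirement means at this scale. The byte size per entry and the embedding dimension below are assumptions made only for illustration; the number of workers is the one stated in the text.

```python
N = 3_000_000        # workers in the Danish data, as stated in the text
BYTES_PER_ENTRY = 8  # assumption: one 64-bit float per stored number
EMBED_DIM = 64       # assumption: hypothetical embedding dimension

pairs = N * (N - 1) // 2                       # distinct worker pairs
dense_tb = N * N * BYTES_PER_ENTRY / 1e12      # full N x N distance matrix, in terabytes
condensed_tb = pairs * BYTES_PER_ENTRY / 1e12  # upper triangle only, in terabytes
linear_gb = N * EMBED_DIM * BYTES_PER_ENTRY / 1e9  # one embedding per worker, in gigabytes
```

Under these assumptions a full pairwise distance matrix needs about 72 terabytes (about 36 terabytes if only the upper triangle is stored), whereas storage that grows linearly in $N$ stays in the range of a few gigabytes.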
5.1 Graph Embedding Techniques
To meet the demand for scalability, I combine the baseline Algorithm 1 with recent
advances in graph neural network (GNN) based graph embedding techniques.
Graph embedding is a widely applied graph analysis method that has achieved
groundbreaking success in recent applications in multiple domains, ranging from recommender
systems to pharmaceutics (see Cai et al. (2018), Wu et al. (2019) and Zhou et al. (2018) for detailed
reviews of these recent developments and applications). The goal is to map a graph into a
low-dimensional space in which the graph information is best preserved. In my application
in particular, the focus is to find embeddings for each worker $i$ and each peer group (time-varying
firm, $f$ by $t$) given the wage matrix $w \in \mathbb{R}^{N \times FT}$. Here, $w$ is viewed as a bipartite graph between
worker nodes $i \in I$ and time-varying firm nodes $ft \in F \times T$, with the weight on edge $(i, ft)$
being the observed wage $w_{i,ft}$. The embedding is a low-dimensional vector representation of an individual
worker or peer group, $h_i \in \mathbb{R}^v$ and $h_{ft} \in \mathbb{R}^v$, that best preserves the
wage information in the data.
Graph embeddings can be computed efficiently with GNN techniques, pioneered by Kipf
and Welling (2016). The GNN method follows a neighborhood aggregation scheme: the
embedding of a worker node is computed by recursively aggregating and transforming the
embeddings of its neighboring coworkers (Xu et al. (2018)). The scheme therefore enables
flexible interactions between coworkers, reflected in the embeddings of the workers and the
time-varying firms.
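The neighborhood aggregation scheme can be illustrated with a stripped-down Python sketch on the bipartite worker-by-firm-period graph. This is not GraphSAGE or any trained network: the learned aggregation function is replaced here by a fixed average, edge weights (wages) are ignored, and the function names and toy graph are hypothetical. It only shows how information propagates between worker nodes and time-varying firm nodes.

```python
def aggregate(neighbors, embeddings, dim):
    """Mean of the neighbors' embedding vectors (zero vector if there are none)."""
    if not neighbors:
        return [0.0] * dim
    return [sum(embeddings[n][d] for n in neighbors) / len(neighbors) for d in range(dim)]


def propagate(edges, h_worker, h_cell, rounds=2):
    """Alternately update firm-period and worker embeddings from their neighbors."""
    dim = len(next(iter(h_worker.values())))
    workers_of, cells_of = {}, {}
    for worker, cell in edges:
        workers_of.setdefault(cell, []).append(worker)
        cells_of.setdefault(worker, []).append(cell)
    for _ in range(rounds):
        # Stand-in for the learned aggregator: a fixed average, with no trained parameters.
        h_cell = {c: aggregate(workers_of.get(c, []), h_worker, dim) for c in h_cell}
        h_worker = {i: aggregate(cells_of.get(i, []), h_cell, dim) for i in h_worker}
    return h_worker, h_cell


# Hypothetical toy graph: workers i and j share firm-period ft; j also works in f't'.
edges = [("i", "ft"), ("j", "ft"), ("j", "f't'")]
h_w, h_c = propagate(edges, {"i": [1.0], "j": [3.0]}, {"ft": [0.0], "f't'": [0.0]}, rounds=1)
```

After one round, the embedding of worker i already reflects the initial embedding of coworker j through their shared firm-period node, which is the sense in which the scheme lets coworkers interact.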
Illustration of GNN-based Graph Embedding
[Figure: left panel “Wage Matrix”, a bipartite graph linking worker nodes $i$, $j$ to time-varying firm nodes $ft$, $f't'$ with edge weights $W_{ift}$; right panel “Graph Convolutional Network”, showing the aggregation $h^{k}_{ft} = g_\theta(\{h^{k-1}_i, h^{k-1}_j\})$ and the loss $L(H; W)$, which sums over $(i, ft)$ the discrepancy between $W_{ift}$ and $w_\theta(h_{ft}, h_i)$.]
Figure 16: The implementation of GraphSAGE, considering second-order neighborhood aggregation. Each circle in both panels represents a worker node (“i”), each square a time-varying firm node (“ft”), and each edge the wage for the match (“$w_{i,ft}$”). All grey squares in the right panel represent the same graph neural network function $g_\theta$ (parameterized by $\theta$) that aggregates the embeddings of neighbor nodes (“i” and “j”) to update the embedding of the current node (“ft”). Once the embeddings are computed, wages can be recovered by the neural network function $w_\theta$. The objective is to find embeddings $H = \{\{h_i\}, \{h_{ft}\}\}$, $w_\theta$, and $g_\theta$ that minimize the loss function $L(H, W)$.
The worker embeddings can be computed efficiently with GNN. The time and space
complexity is only linear, $O(N)$. In addition, the computation and optimization of neural networks
is highly modular and parallelizable, and can be easily distributed on GPU.
5.2 Worker embeddings
Figure 17 displays the worker embeddings for wages simulated by the data generating
process in Section 4.1. The coordinates of a node are the two-dimensional t-SNE representation of the embedding
of a worker. Each node is colored by the true productivity type $x$. The GNN algorithm
distinguishes workers well globally, as workers in similar colors tend to appear at similar
locations, but the result is subject to many local mistakes and is less accurate than the
baseline hierarchical clustering Algorithm 1.
$$\{\text{worker-by-time-varying-firm wage matrix}\} \xrightarrow{\ \text{graph embedding}\ } \mathbb{R}^{v} \xrightarrow{\ k\text{-means}\ } \text{subsets}$$
Figure 17: t-SNE representation of Worker Embedding
5.3 “Divide and Conquer”
The baseline hierarchical clustering is accurate but very costly in both computation and
memory usage in big-data applications. Despite being less accurate, the GNN-based graph
embedding can be implemented with high performance and efficiency. This section
proposes to integrate the baseline algorithm with the GNN approach through a divide-and-conquer
strategy. The integration is closely related to the proposal by Hagedorn et al. (2020).
In the “division” step, I compute worker embeddings using GNN, group closely
embedded workers, and divide them into separate subsets. In the “conquering” step, I apply
hierarchical clustering to each local subset only: on the premise that only similar workers
are assigned to the same cluster, this step significantly reduces the dimension of the problem by
eliminating a large number of redundant comparisons without compromising accuracy. In case
GNN mistakenly splits similar workers into different subsets, I reshuffle the subsets and
repeat the procedure.
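The saving from the division step can be illustrated by counting pairwise comparisons. The sketch below is a simple arithmetic illustration: the number of subsets and the assumption that they are equally sized are hypothetical, and the number of workers is the one used in the simulation of Section 5.4.

```python
def pairwise_comparisons(n):
    """Number of distinct pairs among n workers."""
    return n * (n - 1) // 2


N = 100_000  # workers in the simulation of Section 5.4
K = 100      # assumption: number of equally sized subsets from the division step

full = pairwise_comparisons(N)              # comparisons without division
divided = K * pairwise_comparisons(N // K)  # comparisons within the subsets only
reduction = full / divided
```

Under these assumptions the number of comparisons falls by a factor of roughly $K$, since comparisons between workers in different subsets are never made.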
Unlike the random-walk-based “node2vec” embedding employed by Hagedorn,
Manovskii and Xin (2020), GNN-based embedding has a number of advantages. First, it
is suitable for studying coworker effects, as it explicitly models interactions between neighboring
coworkers in a flexible manner. Second, the GNN approach allows node feature
information to be incorporated. Moreover, the computation for GNN can be easily parallelized and
run efficiently on GPU.
5.4 Simulation Results
To illustrate the efficiency of the divide-and-conquer strategy, I simulate Data Generating
Process #2 in Section 4.1 with a large number of workers, $N = 100{,}000$.
Clustering The resulting cluster assignment $c_i$ for all workers under the divide-and-conquer
algorithm is displayed in Figure 18. The clustering $C$ displayed is accurate: workers that are
close in $X$ are assigned to the same or adjacent clusters, and the assignment function $c_i$ lies on
the 45-degree line.
Figure 18: Worker cluster assignment $c_i$ (divide-and-conquer).
Coworker effects Figure 19 indicates that the coworker effect function is accurately
estimated.
Figure 19: Estimated coworker effects â(ci) and the ground truth a(xi)
For the estimator $\hat{a}$ on $X$:
$$\text{RMSE} = \left( \int_0^1 (\hat{a}(x) - a(x))^2 \phi(x)\,dx \right)^{1/2} = 0.16\%$$
6 Conclusion
In this paper I have developed a new empirical methodology for studying peer effects.
I show that the leading empirical methodology is biased under worker-firm complementarity.
I develop a semi-parametric approach to jointly estimate the wage complementarities
and coworker effects. The method can also capture heterogeneous coworker effects, which
helps to reconcile the diverging results in the microeconomic literature. The approach combines
recent advances in machine learning with insights from economic theory. To
meet the demand for computational efficiency, I integrate the baseline algorithm
with GNN-based graph embedding techniques. I am currently using the proposed method to
measure co-worker effects in the matched employer-employee panel data covering the entire
population of Denmark.
References
Abowd, J. M., F. Kramarz, and D. N. Margolis (1999): “High Wage Workers and
High Wage Firms,” Econometrica, 67, 251–334.
Abowd, J. M., F. Kramarz, S. Pérez-Duarte, and I. M. Schmutte (2018): “Sorting
Between and Within Industries: A Testable Model of Assortative Matching,” Annals of
Economics and Statistics, 1–32.
Andrews, M. J., L. Gill, T. Schank, and R. Upward (2012): “High Wage Workers
Match with High Wage Firms: Clear Evidence of the Effects of Limited Mobility Bias,”
Economics Letters, 117, 824–827.
Angrist, J. (2014): “The perils of peer effects,” Labour Economics, 30, 98–108.
Arcidiacono, P., G. Foster, N. Goodpaster, and J. Kinsler (2012): “Estimating
spillovers using panel data, with an application to the classroom,” Quantitative Economics,
3, 421–470.
Banerjee, A., E. Duflo, R. Glennerster, and C. Kinnan (2015): “The Miracle of
Microfinance? Evidence from a Randomized Evaluation,” American Economic Journal:
Applied Economics, 7, 22–53.
Becker, G. (1973): “A Theory of Marriage: Part I,” Journal of Political Economy, 81,
813–846.
Betts, J. and A. Zau (2004): “Peer groups and academic achievement: Panel evidence
from administrative data,” .
Bloom, N., J. Liang, J. Roberts, and Z. J. Ying (2014): “Does Working from Home
Work? Evidence from a Chinese Experiment,” The Quarterly Journal of Economics,
130, 165–218.
Bonhomme, S. (2020): “Heterogeneity, Sorting and Complementarity,” Working paper,
National Bureau of Economic Research.
Bonhomme, S., T. Lamadon, and E. Manresa (2019): “A Distributional F