Upload
others
View
0
Download
0
Embed Size (px)
Citation preview
Electronic copy available at: http://ssrn.com/abstract=1915511
Hedge Funds: The Good, the Bad, and the Lucky
Yong Chen†
Texas A&M University
Michael Cliff‡
Analysis Group
Haibei Zhao§
Georgia State University
August 5, 2015
* We are grateful to Vikas Agarwal, Charles Cao, Heber Farnsworth, Wayne Ferson, Will Goetzmann, Feng Guo,
Michael Halling, Petri Jylha, Greg Kadlec, Andrew Karolyi, Robert Kieschnick, Bing Liang, Andrew Lo, Hugues
Pirotte, Jeffrey Pontiff, Zheng Sun, Josef Zechner, Harold Zhang, and seminar/conference participants at
Cornerstone Research, the Institute for Quantitative Asset Management (IQAM), Pennsylvania State University,
Shanghai University of Finance and Economics, Texas A&M University, University of North Carolina at Chapel
Hill, University of Texas at Dallas, University of Virginia, Vienna University of Economics and Business, Virginia
Tech, VU University of Amsterdam, the FBE 654 Asset Pricing class at University of Southern California, the 4th
NYSE Euronext Hedge Fund Conference in Paris, and the 2015 Financial Intermediation Research Society (FIRS)
Conference for helpful comments. The paper was previously circulated under the title “Hedge Funds: The Good, the
(Not-so) Bad, and the Ugly.” All remaining errors are ours alone. The views expressed in this article do not
necessarily represent those of Analysis Group, Inc.
† Mays Business School, Texas A&M University, College Station, TX 77843; [email protected].
‡ Analysis Group, Washington, DC 20006; [email protected].
§ Robinson College of Business, Georgia State University, Atlanta, GA 30303; [email protected].
Electronic copy available at: http://ssrn.com/abstract=1915511
Hedge Funds: The Good, the Bad, and the Lucky
August 5, 2015
Abstract
We develop a new method to evaluate hedge fund skill in the presence of luck. In the cross
section, by assuming each fund comes from one of several skill groups, we estimate the number
of groups, the fraction of each group, and the mean and variability of skill within each group.
Our method allows luck to affect both unskilled and skilled funds. At the individual fund level,
we propose a performance measure that combines the fund’s estimated alpha with the cross-
sectional distribution of fund skill. In out-of-sample tests, a strategy using our measure
outperforms those using estimated alpha and t-statistic.
JEL Classification: C13, G11, G23
Keywords: Hedge funds, performance evaluation, EM algorithm, performance persistence
1
1. Introduction
The past two decades witnessed hedge funds, with less regulatory rigidity and more
trading flexibility, grow into an important investment vehicle. Tremendous interest has emerged
from both academics and practitioners in assessing whether hedge funds add value for investors.
Indeed, a growing literature examines hedge fund performance from different angles. So far,
there is no consensus about whether an average hedge fund can add value.1 However, at the
individual hedge fund level, prior studies have shown strong evidence of the existence and
heterogeneity of fund skill.2 Two important questions naturally arise from these findings. First,
how many hedge funds have enough skill to add value? Second, how can we identify skilled
hedge funds? These questions motivate our study.
One major challenge in addressing the above questions is that fund managers’ true skill is
not observable.3 In practice, researchers typically measure skill with estimated performance
measures such as alpha. Consequently, due to inaccuracies associated with the estimates, a zero-
skill manager may be lucky and exhibit superior performance, while a good manager may be
unlucky and show inferior performance. Prior studies have proposed several methods to control
for the effect of luck on inference about fund skill in multiple hypothesis testing. Kosowski,
Timmermann, White, and Wermers (2006) and Fama and French (2010) use bootstrap
1 For example, Ackermann, McEnally, and Ravenscraft (1999), Brown, Goetzmann, and Ibbotson (1999), and Liang
(1999) show that in aggregate, hedge funds realize positive risk-adjusted performance. However, Griffin and Xu
(2009) find little evidence that hedge funds, on average, deliver abnormal performance.
2 Kosowski, Naik, and Teo (2007) show that the superior performance of top hedge funds cannot be attributed to
pure randomness. Several papers also investigate the cross-sectional relationship between hedge fund performance
and fund characteristics. Aragon (2007) finds that hedge funds with stricter redemption restrictions offer higher
returns. Agarwal, Daniel, and Naik (2009) find that hedge fund performance is positively related to fund managers’
incentives and discretion. Li, Zhang, and Zhao (2010) link hedge fund performance to fund managers’ educational
background and work experience. Titman and Tiu (2011) show that hedge funds with lower R-squares against
systematic factors realize better future performance. Sun, Wang, and Zheng (2012) find that hedge funds with
different return patterns from peer funds are associated with better subsequent performance.
3 We use “fund” and “fund manager” interchangeably in this paper.
2
simulations to infer skill among mutual funds. Barras, Scaillet, and Wermers (2010) apply a false
discovery approach to mutual funds and detect skill in only a small fraction of funds.4
In this paper, we develop a new method to estimate the prevalence of fund skill and apply
it to a sample of hedge funds. Our approach is based on the assumption that the skill of each
fund, characterized by its alpha 𝛼𝑖, comes from one of several skill groups with mean alpha 𝜇𝑗
and variability of alpha 𝜎𝑗.5 As a stylized example, we can view funds as being “Good” (say 𝜇𝐺 =
3% per year), “Neutral” (𝜇𝑁= 0%), or “Bad” (e.g., 𝜇𝐵 = −2% per year), though our approach can
accommodate more than three skill groups. Accordingly, the observed cross-sectional
distribution of fund alphas is a mixture of the three distributions.
Figure 1 illustrates the mixture of these distributions. Given an observed distribution of
alphas (dashed line), our estimation algorithm identifies the three sub-distributions (solid lines)
that match the cross-sectional distribution when combined together.6 The shape of the cross-
sectional distribution dictates the number of skill groups and their distributional parameters. As
in Fama and French (2010), skill gives rise to fat tails in the distribution of alpha. As shown in
the figure, funds in the Good skill group can have bad realized performance.
We use a modified Expectation-Maximization (EM) algorithm to estimate the average
skill (𝜇𝑗), the variability of skill (𝜎𝑗), and the size of the group (𝜋𝑗) for each skill group j. These
parameter estimates not only describe the cross-sectional distribution of alphas across different
4 See Ferson and Chen (2015) for a refinement and generalization of the Barras et al. method, by using more of the
structure of the model suggested by Barras et al.
5 We focus on net-of-fee returns. Hence, we adopt an investor’s perspective in asking whether the manager can earn
a gross return that is sufficient to cover costs.
6 In this example, we set 𝜋𝐺 = 0.2, 𝜋𝑁 = 0.7, 𝜋𝐵 = 0.1; 𝜇𝐺 = 2%, 𝜇𝑁 = 0, 𝜇𝐵 = −2%, and 𝜎𝑗 = 0.7% for all
groups. This simple example does not incorporate estimation errors in alpha that are introduced later in the paper.
All the parameter estimates in our empirical analysis incorporate the effects of estimation errors.
3
skill groups, but also provides useful information to make inference about skill of individual
funds. In practice, fund alphas are estimated with noises. However, we show that the information
from the cross section can be combined with estimated alphas to make more accurate inference
for individual funds.
At the individual fund level, we construct a new performance measure—the conditional
probability a fund comes from the highest-skilled group. This performance measure incorporates
both a fund’s estimated alpha and the information about the cross-sectional fund skill. When
estimated alpha is very noisy with large estimation error, the measure relies more on the cross-
sectional information as opposed to estimated alpha. On the other hand, if estimated alpha has a
high precision, it receives a great weight in the performance measure.
This performance measure has advantages over the conventional way of using the t-
statistic to adjust for the precision of estimated alpha. Though the t-statistic tells how strongly we
can reject the null hypothesis of zero skill, it does not identify which funds are more skilled. For
example, a fund with a t-statistic of 3.0 does not necessarily have more skill than another fund
with a t-statistic of 2.0. This is because the t-statistic, as the product of estimated alpha and its
precision, does not differentiate between these two components. In contrast, by weighting the
fund’s estimated alpha and the prior information about the cross section, our approach
incorporates the magnitude of estimated alpha based on its precision and provides a ranking of
fund skill. Having such a ranking is important for investors (like funds of hedge funds) facing
capital constraints that limit the number of funds in which they can invest.
In our empirical analysis, we employ a sample of 8,695 hedge funds by merging two
major hedge fund databases—Lipper TASS and Hedge Fund Research—over the period of
1994–2011. We use the Fung and Hsieh (2004) seven-factor model to estimate alpha from
4
historical fund returns, and we consider alternative models for robustness. We mitigate hedge
fund data biases and propose a new way to correct backfill bias. Empirically, we find that a
mixture of four skill groups best fits the empirical distribution of actual fund performance
(compared with other numbers of skill groups), which we refer to as Excellent, Good, Neutral,
and Bad. The first two groups have positive mean alpha, including 9% excellent funds with �̂� =
0.72%/month and 38% good funds with �̂� = 0.35%/month. Meanwhile, 43% of the fund are
neutral funds with zero-alpha after fees (i.e., having skill just enough to cover their fees), and 9%
are deemed as bad funds with �̂� = −0.80%/month. This finding is consistent with the notion that
hedge fund skill tends to be heterogeneous. This result also depicts a remarkably different picture
about hedge fund skill than the limited evidence of skill that prior studies find for mutual funds
(e.g., Barras, Scaillet, and Wermers, 2010; Fama and French, 2010).
To identify superior individual funds, we use the performance measure that computes the
conditional probability a fund comes from each skill group, by combining the fund’s estimated
alpha with parameter estimates for the cross section. Specifically, in each month we form four
portfolios based on funds’ conditional probabilities of being excellent, good, neutral, and bad
estimated from the previous 24 months. Then, we examine “out-of-sample” performance of these
monthly-rebalanced portfolios. We find that the portfolio of the “predicted excellent” funds (i.e.,
those with the greatest likelihood of being excellent) subsequently realize high alpha over a long
horizon. In fact, the alpha spread between the predicted excellent and the predicted bad portfolios
remains significantly positive even three years post-formation. This suggests that our
performance measure is able to detect skill. Further, when comparing the investment value of our
approach with alternative strategies based on past estimated alpha and its t-statistic, we find that
our approach outperforms those competing strategies in out-of-sample tests.
5
Our paper makes several contributions to the literature. First, as an alternative to the false
discovery method applied in Barras, Scaillet, and Wermers (2010), we use the EM algorithm to
make inferences about mixture distributions of fund skill. Barras, Scaillet, and Wermers (2010)
allow luck to affect zero-skill funds (i.e., “false discoveries”), but by using a large test size (e.g.,
a size of 30%) they rule out the possibility that skilled fund can have zero-alpha due to bad luck.
Our method allows luck to affect both zero-skill funds and skilled funds. The fact that we
identify a larger fraction of skilled funds than simply counting statistically significant alphas
suggests that it is important to consider imperfect test power. More importantly, for each
individual fund, we construct a performance measure that combines the fund’s own estimated
alpha and the cross-sectional distribution of fund skill. Thus, the performance measure involves
learning about skill from other funds. Jones and Shanken (2005) demonstrate how learning
across funds affects the inference about the cross sectional distribution of fund skill. We extend
their intuition to a setting of asset allocation across many funds with different skill. While Jones
and Shanken (2005) consider one homogenous skill distribution, our method accommodates
multiple skill groups, which is a necessary condition for comparing skill across funds.
The rest of the paper proceeds as follows. In Section 2, we outline our approach to
inferring fund skill. Section 3 describes the data. Section 4 presents the empirical results about
the fractions of funds from different skill groups and fund performance persistence. Section 5
discusses additional analyses and robustness checks. Finally, Section 6 concludes.
6
2. Methodology
In this section, we first lay out the general setup for inferring the characteristics of the
skill groups and estimating the conditional probability that a fund belongs to the top skill group.
Next, we relate our method to existing studies and discuss some properties of our performance
measure. Finally, we describe our estimation procedure. Technical details about the estimation
approach and simulations are provided in the Appendix.
2.1. The model
We start by assuming that there is an unknown number J groups of funds with different
skill levels. For each group j (j = 1, 2, … , J), a representative fund is characterized by its alpha,
which is assumed to follow a Normal distribution 𝑁(𝜇𝑗, 𝜎𝑗2), where 𝜇𝑗 is the mean alpha in the
group and 𝜎𝑗 captures the dispersion in true skill across funds within the group. The clustering of
performance within a group around the mean (𝜇𝑗) can be attributed to common investment styles
(e.g., Brown and Goetzmann, 1997), while the variability 𝜎𝑗 is driven by fund-specific traits 𝜔𝑖
(e.g., infrastructure or trading intensity). Hence, the true alpha for manager i who belongs to
group j is 𝛼𝑖 = 𝜇 𝑗 + 𝜔𝑖. True fund skill can vary through time due to changes in fund
management or because the informational advantage that can generate alpha in one period erodes
over time.
We use 𝜋𝑗 to denote the fraction of the funds that come from skill group j, which is also
the unconditional probability a fund belongs to the group. Thus, the sum of the group fractions
equals one, i.e., ∑ 𝜋𝑗𝐽𝑗=1 = 1. Consequently, the J sets of triples {𝜇𝑗, 𝜎𝑗, 𝜋𝑗} jointly define a
composite distribution for fund i with the following density function:
7
𝑓(𝛼𝑖) = ∑ 𝜋𝑗𝜙𝐽𝑗=1 (𝛼𝑖; 𝜇𝑗 , 𝜎𝑗), (1)
where 𝛼𝑖 denotes skill of the fund, 𝜙(𝛼𝑖; 𝜇𝑗 , 𝜎𝑗) is the Normal probability density with mean 𝜇𝑗
and standard deviation 𝜎𝑗 evaluated at 𝛼𝑖.7 The probability of observing 𝛼𝑖 in a population equals
the weighted probability of observing 𝛼𝑖 in each group, weighted by that group’s fraction in the
population. As illustrated in Figure 1, the density function 𝑓(𝛼𝑖) also describes the cross-
sectional distribution of fund skill.
So far, we treat fund alpha 𝛼𝑖 as observable. However, empirical analysis in the literature
routinely uses ordinary least squares (OLS) estimated alpha with a sample-specific estimation
error 𝑒𝑖 for fund i. As a result, estimated alpha equals true alpha plus estimation error: �̂�𝑖 = 𝛼𝑖 +
𝑒𝑖 = 𝜇𝑗 + 𝜔𝑖 + 𝑒𝑖. Thus, estimated alpha satisfies the following density function:
𝑓(�̂�𝑖|𝛼𝑖) = 𝜙(�̂�𝑖; 𝛼𝑖, 𝑠𝑖), (2)
where 𝑠𝑖 is the standard deviation of estimation error 𝑒𝑖 (i.e., the standard error of �̂�𝑖). The
estimation error, 𝑒𝑖, is assumed to follow a Normal distribution, which is a common assumption
of OLS. By combining Equations (1) and (2) and marginalizing the joint distribution, we obtain
the distribution of estimated alpha �̂�𝑖 as follows:
𝑓(�̂�𝑖) = ∫ 𝑓(�̂�𝑖|𝛼𝑖)𝑓(𝛼𝑖)𝑑𝛼𝑖+∞
−∞= ∑ 𝜋𝑗 ∫ 𝜙(�̂�𝑖; 𝛼𝑖, 𝑠𝑖)
+∞
−∞𝜙(𝛼𝑖; 𝜇𝑗, 𝜎𝑗)𝑑𝛼𝑖
𝐽𝑗=1 . (3)
7 We assume a Normal distribution for several reasons. First, the parameters of Normal distribution have clear
economic meaning about the mean and variability of skill in our setting, as opposed to other distributions like t-
distribution or inverse gamma. Second, according to the central limit theorem, Normal distribution seems natural to
characterize true alpha as true alpha can be viewed as a sum of several random variables (e.g., fund-specific traits).
Third, Normal distribution provides technical tractability to derive the iteration scheme used in our approach (see
details in the Appendix A.1). Finally, even though each skill group is assumed to follow a Normal distribution, the
composite distribution is non-Normal with fat-tails, consistent with the empirical distribution for our data.
8
Next, evaluating the integral for each type j, we have:8
𝑓(�̂�𝑖) = ∑ 𝜋𝑗𝜙(�̂�𝑖; 𝜇𝑗 , 𝜎𝑖,𝑗),𝐽𝑗=1
where (𝜎𝑖,𝑗)2 = (𝑠𝑖)2 + (𝜎𝑗)2.
(4)
This equation characterizes the density function for estimated alpha of fund i. Compared
with Equation (1), the combined variance 𝜎𝑖,𝑗 incorporates two sources of variation in estimated
alpha—fund-specific estimation error 𝑠𝑖 and within-group variation 𝜎𝑗. Empirically, we find the
average 𝑠𝑖 across funds (as shown in Table 2) to be of the same order of magnitude as 𝜎𝑗 (as
shown in Table 3). This suggests that the two sources of variations are roughly equally
important. Given the importance of estimation error, making inferences based on estimated alpha
alone without considering estimation error would lose a significant amount of information about
fund skill.
Figure 2 illustrates the effects of the two sources of variation from 𝑠𝑖 and 𝜎𝑗. Suppose we
obtain a positive estimated alpha �̂� for a fund. Here, both sources of variation in the estimated
alpha affect our inference. The fund may come from the zero-skill group but exhibit a positive �̂�
due to 𝜔 and estimation error, i.e., (𝜔0 + 𝑒0). Alternatively, this fund may come from the
positive-skill group, but a net negative (𝜔1 + 𝑒1) leads to an estimated alpha that is smaller than
the group mean. Thus, we are uncertain exactly which group the fund comes from. As such, our
method allows a zero-skill fund to appear to have positive estimated alpha, as well as allowing a
skilled fund to have small (close to zero) estimated alpha. In other words, luck (𝑒𝑖) can affect
both unskilled and skilled funds in our model.9
8 The detailed derivation of Equation (4) is provided in Equation (A6) of the Appendix A1.
9 This is different from Barras, Scaillet, and Wermers (2010), who assume perfect test power by using a large test
size (e.g., 30%) and hence rule out the possibility that luck can affect skilled fund.
9
Once we have the estimates of {𝜇𝑗 , 𝜎𝑗 , 𝜋𝑗} (using the estimation procedure described
below in Section 2.3), we can characterize 𝑓(�̂�𝑖) using by Equation (4) and make inference
about each fund’s skill. Specifically, we can make a probabilistic statement about how likely a
fund is from each skill group, and use the probability as the basis for our performance measure.
The conditional probability that the fund belongs to group j equals:10
𝑃𝑗 = 𝑃𝑟𝑜𝑏(fund 𝑖 is from group 𝑗|�̂�𝑖, �̂�𝑖) = 𝜋𝑗𝜙(�̂�𝑖; 𝜇𝑗 , 𝜎𝑖,𝑗)/𝑓(�̂�𝑖). (5)
For ease of illustration, we order the groups such that 𝜇1 < 𝜇2 < ⋯ < 𝜇𝐽, and thus group
J has the highest mean skill. From the perspective of investment practice, our focus is on the
conditional probability that a fund comes from the group with highest mean skill, namely 𝑃𝐽. The
higher 𝑃𝐽 is, the more likely the fund has superior skill. The idea of this performance measure is
as follows. When we make inference about fund i’s skill, the estimation error of its alpha affects
the relative importance between our prior based on the cross-sectional distribution and the fund’s
estimated alpha. In Equation (4), the total variation of estimated alpha �̂�𝑖 is decomposed into
fund-specific estimation error 𝑠𝑖 and within-group variation 𝜎𝑗. Hence, if fund alpha is estimated
with high precision (i.e., when 𝑠𝑖 is small), our method assigns a relatively great weight to
estimated alpha. On the other hand, if estimated alpha is of low precision, we rely less on
estimated alpha and more on the prior. In the extreme case, when 𝑠𝑖 goes to infinity (i.e., no
precision), the 𝑃𝑗 measure converges to 𝜋𝑗 (i.e., our prior knowledge).
10 To avoid notational clutter, we do not index 𝑃𝐽 for fund i, though it is a function of fund i’s estimated alpha and
standard error.
10
2.2. Relation of our method to existing studies
Our estimation method builds on several existing studies. First, our paper is related to
Kosowski, Timmermann, White, and Wermers (2006) and Fama and French (2010), who control
for the effect of estimation error when inferring fund skill. However, unlike their studies that
focus on whether the performance of top-performed managers comes from skill or luck, we
estimate the probability that a fund comes from the highest-skilled group and our 𝑃𝐽 measure
allows us to rank and compare skill across many funds.
Second, our approach is closely related to Barras, Scaillet, and Wermers (2010), who
apply the false discovery method to infer fund skill. Similar to their work, we consider the
existence of multiple skill groups. However, unlike their study assuming perfect test power by
using a large test size (e.g., a size of 30%), we allow for imperfect power that an estimated alpha
close to zero could be a good (bad) manager with bad (good) luck.11 More importantly, while
their inference relies on the alpha t-statistic, we propose a fund-specific performance measure 𝑃𝐽
that accounts for estimated alpha and estimation error separately, and we show empirically in
Section 4 that this separation leads to a significant improvement in out-of-sample investment
value.
Third, our measure 𝑃𝐽 depends on both a fund’s own estimated alpha and the performance
of other funds. As a result, learning about skill of other funds provides useful information to infer
skill for a given fund. This is related to Jones and Shanken (2005), who study how learning
across funds affects the inference about the cross-sectional distribution of fund skill in a
Bayesian framework. Our study extends their idea of cross learning to a setting of asset
11 See Ferson and Chen (2015) for a study that considers imperfect power when inferring fund skill in the false
discovery framework.
11
allocation across many funds. In out-of-sample tests, we show that the information about the
cross sectional distribution of skill indeed has important implications for asset allocation across
funds. Furthermore, our method accommodates a mixture of multiple skill groups, while theirs
considers only one group. As shown below, our empirical analysis strongly rejects one skill
group and favors a mixture of multiple skill groups in hedge funds. In fact, the existence of
multiple skill groups is a necessary condition for obtaining a ranking of skill across funds.12
2.3. The estimation procedure
We now introduce the procedure to estimate the parameters with more technical details
provided in the Appendix A.1. As explained above, our goal is to estimate a set of parameters
{𝜇𝑗, 𝜎𝑗, 𝜋𝑗} that define the cross sectional distribution of true skill. To do so, we need to aggregate
information about all individual funds’ estimated alphas. However, these estimated alphas are
not true skill but estimates with noises. As a result, we refine our inference about each fund’s
skill by calculating the probability it belongs to each sub-distribution. As shown in equation (5),
such probability estimates in turn depend on the cross sectional distribution of true skill. Thus,
we have a simultaneous estimation problem.
12 Jones and Shanken (2005, p.545) state that “although two examples demonstrate the substantial effects of learning
on the allocation to a particular fund, the implications for asset allocation across funds remain unexplored.” Since
they assume one skill group, all funds come from the same distribution in their framework. As a result, learning
about the cross-sectional distribution of skill provides the same information for all funds, and ranking based on
posterior alpha would be the same as ranking based on the t-statistic. In our study, however, we consider multiple
skill groups and allow the probability of belonging to each group to be different across funds. Therefore, learning
from the cross sectional distribution is different for different funds.
12
We use a modified Expectation-Maximization (EM) algorithm to simultaneously estimate
the cross sectional parameters {𝜇𝑗, 𝜎𝑗 , 𝜋𝑗} and individual funds’ conditional probabilities.13 The
EM algorithm uses iterations of two steps. First, the expectation step calculates a conditional
probability that �̂�𝑖 is from group j, given previous estimates of the group parameters (or preset
initial values in the case of the first iteration) and the funds’ estimated alpha and standard error.
The expectation step refines our estimates of the fund’s skill based on the cross-sectional
information. Then, the maximization step aggregates the skill distribution of individual funds to
obtain updated estimates of the cross sectional parameters. These two steps iterate until
parameter estimates converge, and through the iteration process we solve the simultaneous
estimation problem.
In our setting, however, estimation errors in alphas complicate the EM algorithm, as the
estimator 𝜎𝑗 in the maximization step is highly non-linear without a closed-form solution. To
overcome this difficulty, we modify the EM algorithm by deriving a separate iteration scheme to
estimate 𝜎𝑗 until convergence. To the best of our knowledge, our method is the first to
incorporate estimation errors in the EM algorithm. The Appendix A.1 describes the details of the
algorithm.
Our method is flexible about the number of skill types. This is useful for inferring skill
among entities (such as hedge funds) with highly heterogeneous skill. Empirically, we use the
13 The original EM algorithm developed by Dempster, Laird, and Rubin (1977) has been widely used in estimation
and inference related to mixture distribution models and incomplete data. See, e.g., Wu (1983), Rudd (1991),
McLachlan and Peel (2000), and McLachlan and Krishnan (2008). The EM algorithm, though powerful, was rarely
used in finance research. Some exceptions are Kon (1984) who examines the distribution of daily stock returns and
Asquith, Jones, and Kieschnick (1998) who study the heterogeneity of IPO returns. However, these previous studies
examining stock returns do not deal with the complicating effects of estimation errors as we do, since in our setting
fund alphas are not directly observed but estimated with estimation errors.
13
Bayesian Information Criterion (BIC) to identify the number of skill groups that best fit the
actual data and confirm the existence of multiple skill groups in hedge funds.
We perform sensitivity tests to validate our method. First, to address the concern that our
parameter estimation of {𝜇𝑗, 𝜎𝑗 , 𝜋𝑗} might be sensitive to the choice of initial values, we
experiment with a grid of initial values and search for the global maxima of the likelihood
function. Second, we run simulations in which we generate artificial datasets with known
(population) group parameters, and compare the parameter estimates from the artificial datasets
to their population counterparts. We find that our parameter estimates are reasonably close to the
true parameters. Third, we run simulations in which we set values of alpha, and then rank the
funds using our performance measure 𝑃𝐽 in comparison with alternative measures of estimated
alpha and its t-statistic. The results show that the measure 𝑃𝐽 is better able to identify skilled
funds than the alternative measures. Furthermore, an investment strategy based on our
performance measure outperforms those based on the alternative measures. (In Section 4, we
perform this comparison with actual hedge fund data in out-of-sample tests.) These results
validate our estimation procedure. The Appendix A.2 provides the details of the simulations.
3. The Data
3.1. Hedge funds
For our empirical analysis, we employ a large sample of hedge funds by merging data
from the Lipper TASS and Hedge Fund Research (HFR) databases. Although these databases
contain fund returns going back to as early as 1977, they do not retain information of defunct
funds before 1994 and thus data in early years have survivorship bias (Fung and Hsieh, 2000;
14
Liang, 2000). To mitigate survivorship bias (see Brown, Goetzmann, Ibbotson, and Ross, 1992),
we focus on the period from January 1994 onwards. Following the hedge fund literature, we only
include funds that report net-of-fee returns on a monthly basis and have at least 24 months of
returns. We exclude funds of funds from our analysis. Over the period January 1994–December
2011, our sample contains 8,695 funds, of which 3,076 are alive as of the end of the sample
period and 5,619 are defunct funds.
Table 1 reports summary statistics of fund returns. The average fund age in our sample is
73 months, slightly longer than six years. The mean (median) return is 0.70% (0.65%) per month
or about 8.40% (7.80%) per year. At the top 25th percentile, the mean return is 1.01% per month
or about 12.12% per year. The average return volatility is 3.44% per month. Higher moments of
fund returns suggest negative skewness and fat tails relative to a Normal distribution. Consistent
with prior research, fund returns exhibit autocorrelation; the average first-order autocorrelation is
0.15. Such autocorrelation is interpreted in prior studies as an indication of illiquidity holdings or
return smoothing (e.g., Getmansky, Lo, and Makarov, 2004).
3.2. Correcting backfill bias
Hedge fund returns, as voluntarily reported, may have potential backfill bias (e.g., Fung
and Hsieh, 2000; Liang, 2000). This bias arises as historical returns are often backfilled when
new funds are added into a database. Since funds with good track records tend to join a database,
neglecting backfilling generates an upward bias in average fund return. This is similar to the
incubation bias (Evans, 2010). In the hedge fund literature, a typical treatment is to drop the first
one or two years’ data. When estimating alpha for funds, it is common to require a certain
number of observations (e.g., 24 months) to ensure test power. Thus, when early years are
15
truncated, only funds with relatively long track records remain in the analysis, which may
introduce a survivorship bias.
We propose a new way to correct backfill bias by simply adding an incubation dummy
variable to the factor regression when estimating alpha. Specifically, the dummy takes a value of
one for the backfill period, i.e. from the month when a fund’s return becomes available in the
data to the month when the fund joined the database. This dummy variable captures the
incremental return during the backfill period, which we allow to vary for each fund. The
advantages of our approach are that, first, we more accurately capture the actual backfill period
than applying a same backfill period to all funds, and second, it retains a more complete fund
history, which provides more information for the alpha estimates.
3.3. Factor model
As hedge funds trade across different asset classes, a multi-factor model is often used to
capture their risk exposures. We use the Fung and Hsieh (2004) seven-factor model as the
benchmark model to estimate fund alpha. The seven factors include an equity market factor, a
size factor, the change in the constant maturity yield of the ten-year Treasury, the change in the
spread between Moody’s Baa yield and the ten-year treasury, and three trend-following factors
for bonds, currency, and commodities.14 Our regression model has the following specification:
𝑟 𝑖,𝑡 = 𝛼 + 𝛼1𝐼(𝑡 ≤ 𝑡𝑖,𝐵) + 𝜷′𝒇𝑡 + 𝜀𝑖,𝑡, (6)
14 The bond, currency and commodity trend-following factors are constructed as portfolios of lookback straddle
options on these assets (see Fung and Hsieh, 2001). The data on these factors are obtained from David Hsieh’s
website at http://faculty.fuqua.duke.edu/_dah7/DataLibrary/TF-FAC.xls.
16
where 𝑟𝑖,𝑡 is fund i’s return in excess of the risk-free rate (proxied by the one-month T-bill rate)
in month t, 𝐼(∙) is an indicator function, 𝑡𝑖,𝐵 is the month when fund i starts to report a database,
and the vector f denotes the seven factors. The intercept 𝛼 is the fund’s alpha, and the coefficient
on the dummy variable controls for potential backfill bias.
3.4. Estimated alpha
Table 2 describes the result on estimated alpha, with and without controlling for backfill
bias. As can be seen, controlling for backfill bias is important. Without the control, the average
alpha for all funds is 0.39% per month, while with the control the average alpha shrinks to 0.11%
per month. Hence, neglecting backfill bias would inflate alpha estimate by about threefold.
Backfill bias appears particularly strong for defunct funds. The average alpha drops from
0.32% without the control to virtually zero after the adjustment. It turns out that the observed
positive alpha for defunct funds is mostly concentrated in the backfill period. This result suggests
that those funds that had no skill but joined the database after good incubation returns are more
likely to fail. Thus, the superior performance observed for early months is likely to reflect a
backfill bias.15 In addition, live funds substantially outperform defunct funds, confirming the
importance of controlling for the survivorship bias. The alpha estimates and their standard errors,
after adjusting for backfill bias, are used in our later analysis as the main inputs to make
inferences about hedge fund skill.
15 Aggarwal and Jorion (2010) find that, after adjusting for data biases, emerging hedge funds exhibit strong
performance, which may reflect new funds’ incentive to perform well.
17
4. Empirical Results
This section reports the main results from our empirical analysis. We first present the
estimates of the parameters governing the different skill distributions. We then explore the
performance persistence based on our performance measure. Finally, we compare the out-of-
sample performance of an investment strategy based on our performance measure with that of
alternative strategies based on estimated alpha and its t-statistic.
4.1. The distributions of fund skill
As we do not observe the number of skill groups in data, we estimate our model using
different values of 𝐽 = 2, 3, 4, 5, 6, 7 and then compare model fit using the BIC. The BIC result
indicates four skill groups in our sample. The BIC of 3, 4 and 5 groups are 16927.29, 16911.10
and 16956.31, respectively. The BIC of other numbers of groups are even greater than that of
five groups. For robustness, we conduct a Likelihood Ratio test for the null H0: J = 3 (and J=5)
against H1: J = 4, which rejects the null at the 1% significance level, suggesting that the case of
four groups is significantly better than other numbers of groups in terms of fitting the data.
Table 3 reports the parameter estimates. Since two skill groups have positive mean alpha,
we refer to the groups as “Excellent” (�̂� =0.72% per month), “Good” (�̂� = 0.35%), “Neutral” (�̂�
= 0% by assumption), and “Bad” (�̂� = −0.60%). The estimates of 𝜋𝑗 suggest the composition of
funds is 9.3% excellent, 38.4% good, 43.0% neutral, and 9.3% bad. For investing in hedge funds
in practice, the focus should be on the excellent group that has highest mean skill. The estimated
variability of skill, i.e., �̂�𝑗, of the excellent and bad groups is higher than that in the two middle
groups, suggesting that funds with extreme skill have less in common and more specific to
18
themselves. This is consistent with Sun, Wang, and Zheng (2012) who find that hedge funds with
distinctive return patterns from peer funds tend to have better performance. Equation (4)
decomposes the total variation in fund skill into the within-group variation and fund-specific
estimation error. We find that the estimated within-group variations �̂�𝑗 are in the same order of
magnitude as the fund-specific estimation error as reported in Table 2, which suggests that both
of them are important determinants of the total variation in estimated alphas.
The fraction of funds with positive skill from our estimation is significantly higher than
that judged by the t-statistic. Based on our method, 48% of the sample funds belong to either the
Excellent or the Good group, whereas only 20.3% of the funds have t-statistic greater than 1.65
(see Table 2). The false discovery rate based on the Barras, Scaillet, and Wermers (2010) method
is 3.7%, so after adjusting for the false discovery problem, the fraction of skilled funds inferred
by the t-statistic would be even smaller (i.e., 20.3%–3.7%=16.6%) at the size of 10%. Thus, the
result suggests that accounting for imperfect test power (i.e., allowing skilled funds to have bad
luck) is important.
We employ simulations to assess the statistical significance of our parameter estimates.
We construct 1,000 artificial samples by drawing from the original sample with replacement. We
estimate the parameters in each artificial sample, and then calculate standard errors as the
standard deviation of the parameter estimates across the simulations. The bootstrapped standard
errors, reported in parentheses Table 3, suggest that our parameters are estimated with reasonable
precision. For instance, in the excellent group �̂�𝐸 , �̂�𝐸 , and �̂�𝐸 are all more than three standard
errors above zero.
In sum, using a modified EM algorithm, we estimate skill distributions for hedge funds.
Our results suggest that a significant portion of hedge funds have skill more than just covering
19
their fees. The finding is in sharp contrast with previous results (e.g., Barras, Scaillet, and
Wermers, 2010; Fama and French, 2010) for mutual funds where few funds are found to deliver
alpha after fees.
4.2. Performance persistence
Now, we address another important question: Does superior performance persist in hedge
funds?16 In our setting, testing performance persistence is important because it can validate our
grouping technique. If our grouping contains no information about fund skill, then there would
be no persistence in performance identified by our method. Otherwise, if our method identifies
skilled funds with superior performance, we expect a certain level of performance persistence.
We examine performance persistence using portfolios formed in rolling windows. In each
month starting from January 1996, we estimate the group parameters {𝜇𝑗, 𝜎𝑗, 𝜋𝑗} and each fund’s
performance measure 𝑃𝑗 from the previous 24 months. Then, we assign funds into one of four
skill-based portfolios—Excellent, Good, Neutral, or Bad. Specifically, each fund receives four
conditional probabilities (i.e., 𝑃𝑗’s) corresponding to the different skill groups. In each of the
rolling 24-month subperiods, for the fraction of each skill group estimated by our algorithm in
that subperiod, we assign that fraction of funds that show the highest conditional probability of
belonging to that group.17 For example, if 10% of the funds are excellent in a subperiod
according to our estimation procedure, then we assign the 10% of funds with the highest
16 Prior research shows mixed evidence about performance persistence in hedge funds. Brown, Goetzmann, and
Ibbotson (1999) and Agarwal and Naik (2000) find little support for performance persistence, while Kosowski,
Naik, and Teo (2007) and Jagannathan, Malakhov, and Novikov (2010) document significant evidence of
performance persistence.
17 In reality, some years may have the number of skill groups different from four. However, assuming four groups
for the whole sample facilitates the presentation of the results. For robustness, we allow the group number to change
over time, and our inference about performance persistence is unchanged.
20
conditional probability of being excellent 𝑃𝐽 to the excellent group. This way, we assign funds
into each group sequentially so that no fund will be assigned into multiple groups. As a result,
four equal-weighted portfolios are formed with funds out of these groups. The portfolios are
rebalanced monthly and held for different periods from three months to three years. The group
parameters are re-estimated each month so that we only use information up to the month of
portfolio formation. Funds that disappear during a holding period are included in the equal-
weighted portfolio until they disappear, and then their weights are reallocated to the remaining
funds. In practice, it may not be realistic to immediately invest into these portfolios after
formation, so we insert a one-month waiting window between the formation period and the
holding period.
Table 4 presents strong evidence of performance persistence. The out-of-sample alpha of
the excellent portfolio is both economically and statistically significant. For example, for a 12-
month holding period, the excellent portfolio has an alpha of 0.46% per month (t-statistic =
6.47), or about 5.52% per year. Further, the excellent portfolio outperforms the other portfolios
significantly for as long as three years. The alpha spread between the excellent and the bad
portfolios is about 0.48% per month (t-statistic = 5.69) for a 12-month holding period. The result
of performance persistence suggests that our method groups funds with different skill well.
4.3. Comparing the 𝑷𝑱 measure with estimated alpha and t-statistic
Next, is our measure 𝑃𝐽 better at identifying skilled funds than the conventional measures
of estimated alpha and its t-statistic? A priori, we have good reasons to believe so. First, using
estimated alpha alone omits important information about estimation precision. Second, though
21
the t-statistic equals estimated alpha multiplied by its precision (i.e., the inverse of the standard
error of estimated alpha), it does not differentiate the contribution from the two components. As
a result, the t-statistic only tells whether we can reject the null hypothesis of zero skill, but it
cannot be used to rank individual fund skill. In contrast, our performance measure adjusts for
precision of estimated alpha more efficiently by weighing the prior information and estimated
alpha.
Here, we make a comparison based on actual data. In particular, since most funds of
hedge funds (FOFs) invest in about 20-80 hedge funds (Brown, Gregoriou, and Pascalau, 2012),
we form a strategy by selecting top 20 funds based on the 𝑃𝐽 measure, and compare its out-of-
sample performance with that of alternative strategies picking top 20 funds based on estimated
alpha and t-statistic. Similar to the procedure in Table 4, in each month starting from January
1996, we compute the performance measure 𝑃𝐽 for each fund from the previous 24 months.
Then, we form an equal-weighted portfolio of investing in top 20 funds ranked by 𝑃𝐽.18 In a
similar way, we form two other equal-weighted portfolios by selecting top 20 funds based on
estimated alpha and t-statistic from the previous 24 months. The three portfolios are all
rebalanced monthly and held for different periods.
Table 5 reports the out-of-sample performance for the three strategies. The strategy based
on the 𝑃𝐽 measure significantly outperforms the other two for up to 24 months. For example, for
a six-month holding period, the portfolio of top funds ranked by our 𝑃𝐽 measure generates a risk-
adjusted return of 0.72% (t-statistic = 7.36), whereas the other two strategies yield 0.45% (t-
18 Note that if we hold the portfolio for multiple months, the actual number of funds held will exceed 20 since some
of the top 20 funds in one month will not stay in top 20 in later months. For example, for a three-month holding
period, the number of funds held in the portfolio will fall in the range of 20-60, depending on the transition of top
funds over time. We report the transition probabilities for the hedge funds in our sample in the next section.
22
statistic = 2.52) and 0.35% (t-statistic = 6.03), respectively. In untabulated test, we find that the
strategy based on the 𝑃𝐽 measure selects quite different funds from the other two strategies.19
This confirms that our 𝑃𝐽 measure is substantially different from both estimated alpha and its t-
statistic, and it provides more accurate information about fund skill.
Hedge funds with longer lockup periods generally have better performance (Aragon
2007; Agarwal, Daniel and Naik, 2009). Hence, we are concerned that the 𝑃𝐽 measure may
simply select funds with long lockup periods that restrict money redemption. As a robustness
check, in Panel B of Table 5, we repeat the analysis by removing the funds with lockup periods
longer than three months, and our inference is unchanged. Thus, the superior performance of
𝑃𝐽 is not driven by funds with long lockup periods.
5. Additional Tests and Robustness Checks
In this section, we conduct additional tests to gain further insights about hedge fund skill
as well as check the robustness of our results. We start with examining the transition
probabilities across the skill groups. Next, we test the relation between fund skill and investor
flows. Then, we link fund skill to fund characteristics. Finally, we examine the sensitivity of our
results to alternative factor models.
19 In fact, the fraction of common funds selected by the 𝑃𝐽 measure and t-statistic is 48.5% for the sample, and the
fraction of common funds selected by 𝑃𝐽 and estimated alpha is only 25.7%.
23
5.1. Transition probabilities across skill groups
In Table 6, we present the transition probabilities across the skill groups. For each month
we assign funds into one of four skill groups based on their 𝑃𝐽 measures from the previous 24
months. We then check how likely funds in each group remain in the group in 3, 6 and 12
months conditional on fund survival. As shown in Panel A, in three months, about 59% of
excellent funds will remain to be excellent and 71% of good funds remain to be good.
Furthermore, it is highly unlikely for an excellent fund in the current month to become a neutral
or bad fund in the future. As discussed above, it is natural to expect some decay in skill as
informational advantages erode over time. On the other hand, most bad funds either remain bad
or improve to become neutral if they continue to survive.
5.2. Fund skill and investor flows
How do investors affect and respond to fund skill? Given the persistence of performance,
can investors infer fund skill from past performance? We answer the questions by examining the
relation between fund skill and both prior and subsequent investor flows. As before, we use a 24-
month period to evaluate fund skill. Then, we examine fund flows subsequent to the evaluation
period as well as prior to the period. Following prior research (e.g., Sirri and Tufano, 1998), we
measure fund flows as the percentage change of fund total assets adjusting for fund returns.
Table 7 reports the results. First, investor flows chase past fund performance, as indicated
by a significantly higher level of money flowing into recent excellent and good funds than into
the other two groups. This is consistent with prior findings of Goetzmann, Ingersoll and Ross
(2003) and Getmansky, Liang, Schwarz, and Wermers (2010), suggesting that hedge fund
24
investors infer managerial skill from past fund performance. Meanwhile, the level of money
flows prior to the evaluation period is similar across the skill groups.
5.3. Skill type and fund characteristics
Table 8 relates the skill types to fund characteristics by regressing the performance
measures (i.e., conditional probabilities) on various fund characteristics. Funds with high
probability of being excellent or good tend to be large funds. These funds charge high
management and incentive fees and have long lockup and redemption notice periods. On the
other hand, neutral and bad funds seem unable to retain capital and have small fund size; they
also charge less incentive fee, perhaps indicating a lack of confidence in adding value. The
results are consistent with prior studies (e.g., Aragon 2007; Agarwal, Daniel and Naik, 2009).
Since our approach separates funds into different skill groups, our analysis has the richness to
check the relation between each skill type and fund characteristics, rather than examining the
association between one performance measure (e.g., estimated alpha) and fund characteristics. In
untabulated tests, we assign funds into four skill groups and perform probit regressions and the
main results are unchanged. Given the relationship between fund skill and fund characteristics,
future work can incorporate fund characteristics into the estimation procedure. We leave this
extension for future research.
5.4. Fund skill by investment styles
As differential investment styles may have different skill levels, we now examine the
skill distribution within each investment style. Since the two databases—TASS and HFR—use
25
somewhat overlapping but not identical classifications for investment strategies. We follow
Agarwal, Daniel and Naik (2009) to reclassify the funds into four broad styles: directional trade,
relative value, security selection, and multiple strategies. We exclude 479 funds in the analysis,
as their strategy information is either undefined or missing. Table 9 reports the results. Overall,
we observe a certain extent of variation in skill distribution across the styles.20 The directional
trade strategy group has the highest fraction (14.3%) of excellent funds and the skill distribution
among the remaining strategies tends to be similar in general. Meanwhile, we find that the
variability of skill in the directional trade style is also higher than that for the other styles.
Finally, the result also suggests that our skill types do not simply reflect the difference in
investment styles, since no single style seems to dominate others in terms of skill distribution.
5.5. Alternative factor models
So far, we have estimated fund alpha using the Fung and Hsieh seven-factor model. In
untabulated tests, we confirm the robustness of our results to alternative factor models. First,
Agarwal and Naik (2004) show that returns of several hedge fund strategies bear significant
exposure to factors built on returns on S&P 500 index options. We augment the Fung-Hsieh
factor model with two out-of-money call and put option factors proposed by Agarwal and Naik
(2004). Second, given their dynamic strategies, hedge funds’ risk exposures can vary over time.
To control for the potential impact of time-varying risk exposures on alpha estimate, we use the
Ferson and Schadt (1996) conditional model in which funds’ market beta varies with lagged
macro variables such as the three-month T-bill rate, a term spread, a default spread, and the
20 The variability of alpha and bootstrap standard errors for the parameters are not reported in the table to conserve
space, but they are available upon request.
26
dividend yield of the S&P index. Third, Getmansky, Lo, and Makarov (2004) show that hedge
fund returns exhibit substantial serial correlation. Serial correlation in returns can bias the
estimate of risk exposure (e.g., Scholes and Williams, 1977). In the spirit of Scholes and
Williams, we alleviate this concern by adding the one- and two-month lagged market returns to
the base model. Overall, our inference remains unchanged by using these alternative models. The
test details are untabulated to conserve space but available upon request.
6. Conclusions
In this paper, we present a new approach as well as empirical evidence about hedge fund
skill. Our study is motivated by the existence of skill and its heterogeneity among hedge funds.
Our method groups funds by skill based on their estimated alphas and standard errors, allowing
for the influence of luck. By assuming each fund comes from one of several distinct skill groups,
we use a modified EM algorithm to infer the number of skill groups, the fraction of funds in each
group, and the conditional probability that a fund belongs to each group.
Using data on monthly returns of a large sample of hedge funds, we find that about 48%
of the funds have positive skill, 43% zero skill, and 9% negative skill. Further, an investment
strategy based on our performance measure generates highly significant out-of-sample risk-
adjusted performance. We also show that our performance measure outperforms traditional
measures of estimated alpha and t-statistic in identifying skilled funds. Our inference is robust to
a wide array of sensitivity checks.
Our approach is flexible about the number of skill groups, which need not be restricted to
any particular number. Also, the only inputs required in our approach are estimated alpha and its
27
estimation error. Given these features, our approach can be applied to detect skill for other types
of investment vehicles such as private equity funds and other forms of skill such as market
timing ability, as long as skill is heterogeneous and so multiple skill groups exist.
28
References
Ackermann, Carl, Richard McEnally, and David Ravenscraft, 1999, The performance of hedge
funds: Risk, return, and incentives, Journal of Finance 54, 833–974.
Agarwal, Vikas and Narayan Naik, 2000, Multi-period performance persistence analysis of
hedge funds, Journal of Financial and Quantitative Analysis 35, 327–342.
Agarwal, Vikas and Narayan Naik, 2004, Risk and portfolio decisions involving hedge funds,
Review of Financial Studies 17, 63–98.
Agarwal, Vikas, Naveen Daniel, and Narayan Naik, 2009, Role of managerial incentives and
discretion in hedge fund performance, Journal of Finance 64, 2221−2256.
Aggarwal, Rajesh, Philippe Jorion, 2010, The performance of emerging hedge funds and
managers, Journal of Financial Economics 96, 238–256.
Aragon, George, 2007, Share restrictions and asset pricing: Evidence from the hedge fund
industry, Journal of Financial Economics 83, 33–58.
Asquith, Daniel, Jonathan Jones, and Robert Kieschnick, 1998, Evidence on price stabilization
and underpricing in early IPO returns, Journal of Finance 53, 1759–1773.
Bali, Turan, Stephen Brown, and Mustafa Caglayan, 2011, Do hedge funds’ exposures to risk
factors predict their future returns? Journal of Financial Economics 101, 36–68.
Barras, Laurent, Olivier Scaillet, and Russ Wermers, 2010, False discoveries in mutual fund
performance: Measuring luck in estimated alphas, Journal of Finance 65, 179–216.
Brown, Stephen and William Goetzmann, 1997, Mutual fund styles, Journal of Financial
Economics 43, 373–399.
Brown, Stephen, William Goetzmann, and Roger Ibbotson, 1999, Offshore hedge funds:
Survival and performance, 1989-95, Journal of Business 72, 91–117.
Brown, Stephen, William Goetzmann, Roger Ibbotson, and Stephen Ross, 1992, Survivorship
bias in performance studies, Review of Financial Studies 5, 553–580.
Brown, Stephen, Greg Gregoriou and Razvan Pascalau, 2012, Diversification in funds of hedge
Funds: Is it possible to overdiversify? Review of Asset Pricing Studies 2, 89-110.
Dempster, Arthur, Nan Laird, and Donald Rubin, 1977, Maximum likelihood from incomplete
data via the EM algorithm, Journal of the Royal Statistical Society Series B 39, 1–38.
Evans, Richard, 2010, Mutual fund incubation, Journal of Finance 65, 1581–1611.
29
Fama, Eugene and Kenneth French, 2000, Luck versus skill in the cross-section of mutual fund
returns, Journal of Finance 65, 1915–1947.
Ferson, Wayne and Yong Chen, 2015, How many good and bad fund managers are there, really?,
Working paper, University of Southern California and Texas A&M University.
Ferson, Wayne and Rudi Schadt, 1996, Measuring fund strategy and performance in changing
economic conditions, Journal of Finance 51, 425–460.
Fung, William and David Hsieh, 2000, Performance characteristics of hedge funds and
commodity funds: Natural vs spurious biases, Journal of Financial and Quantitative Analysis 35,
291–307.
Fung, William and David Hsieh, 2001, The risk in hedge fund strategies: Theory and evidence
from trend followers, Review of Financial Studies14, 313–341.
Fung, William and David Hsieh, 2004, Hedge fund benchmarks: A risk-based approach,
Financial Analysts Journal 60, 65–80.
Getmansky, Mila, Bing Liang, Chris, Schwarz, and Russ Wermers, 2011, Share restrictions and
investor flows in the hedge fund industry, working paper, University of Massachusetts and
University of Maryland.
Getmansky, Mila, Andrew Lo, and Igor Makarov, 2004, An econometric model of serial
correlation and illiquidity in hedge fund returns, Journal of Financial Economics 74, 529–609.
Goetzmann, William, Jonathan Ingersoll, and Stephen Ross, 2003, High-water marks and hedge
fund management contracts, Journal of Finance 58, 1685–1717.
Jagannathan, Ravi, Alexey Malakhov, and Dmitry Novikov, 2010, Do hot hands exist among
hedge fund managers? An empirical evaluation, Journal of Finance 65, 217–255.
Jones, Christopher and Jay Shanken, 2005, Mutual fund performance with learning across funds,
Journal of Financial Economics 78, 507–552.
Kon, Stanley, 1984, Models of stock returns: a comparison, Journal of Finance 39, 147–165.
Kosowski, Robert, Narayan Naik, and Melvyn Teo, 2007, Do hedge funds deliver alpha? A
bayesian and bootstrap analysis, Journal of Financial Economics 84, 229–264.
Kosowski, Robert, Alan Timmermann, Halbert White, and Russ Wermers, 2006, Can mutual
fund “stars” really pick stocks? New evidence from a bootstrap analysis, Journal of Finance 61,
2551–2595.
Li, Haitao, Xiaoyan Zhang, and Rui Zhao, 2011, Investing in talents: Manager characteristics
and hedge fund performances, Journal of Financial and Quantitative Analysis 46, 59–82.
30
Liang, Bing, 1999, On the performance of hedge funds, Financial Analysts Journal 55, 72–85.
Liang, Bing, 2000, Hedge funds: The living and the dead, Journal of Financial and Quantitative
Analysis 35, 309-326.
McLachlan, Geoffrey and Thriyambakam Krishnan, 2008, The EM algorithm and extensions
(2nd edition), John Wiley & Sons, New York.
McLachlan, Geoffrey and David Peel, 2000, Finite mixture models, John Wiley & Sons, New
York.
Rudd, Paul, 1991, Extensions of estimation methods using the EM algorithm, Journal of
Econometrics 49, 305–341.
Scholes, Myron and Joseph Williams, 1977, Estimating betas from nonsynchronous data,
Journal of Financial Economics 5, 309–328.
Schwarz, Gideon, 1978, Estimating the dimension of a model, Annals of Statistics 6, 461–464.
Sirri, Eric and Peter Tufano, 1998, Costly search and mutual fund flows, Journal of Finance 53,
1589–1622.
Sun, Zheng, Ashley Wang, and Lu Zheng, 2012, The road less traveled: Strategy distinctiveness
and hedge fund performance, Review of Financial Studies 25, 96–143.
Titman, Sheridan and Cristian Tiu, 2011, Do the best hedge funds hedge? Review of Financial
Studies 24, 123–168.
Wu, C. F. Jeff, 1983, On the convergence properties of the EM algorithm, Annals of Statistics
11, 95–103.
31
Table 1
Summary statistics
This table summarizes the sample of 8,695 hedge funds. Each fund is required to have at least 24 monthly
returns. T is the number of monthly observations for each fund. Backfill is the length of the backfill
period in months. �̅� is the average monthly return, and 𝜎 is the standard deviation of fund returns. 𝜌1 and
𝜌2 are the first- and second-order autocorrelations of fund returns, respectively. The sample period is
from January 1994 to December 2011.
Mean Median Std Dev 10% 25% 75% 90%
T 73 60 43 30 40 94 140
Backfill 28.95 21.00 26.63 3.00 9.00 40.00 72.00
�̅� 0.70 0.65 0.69 −0.01 0.33 1.01 1.48
𝜎 3.44 3.15 1.83 1.30 2.03 4.61 6.02
Skewness −0.20 −0.10 1.08 −1.40 −0.66 0.37 0.86
Ex. Kurtosis 2.83 1.44 4.77 −0.15 0.44 3.33 6.92
𝜌1 0.15 0.14 0.21 −0.10 0.01 0.27 0.42
𝜌2 0.06 0.05 0.18 −0.16 −0.06 0.17 0.29
32
Table 2
Estimated alphas and standard errors
This table summarizes estimated alphas and their standard errors from the sample funds based on the Fung-Hsieh seven-factor model. Alpha is
estimated as the intercept from the regression of fund excess returns on the factors. Panel A does not control for backfill bias, while Panel B
controls for backfill bias by including a dummy variable for each fund’s backfill period in the factor model regression. We report mean, median
and standard deviation of estimated alphas, as well as the percentage of funds exceeding the specified thresholds of t-statistic. Alpha is in percent
per month. N is the number of funds. The sample period is from January 1994 to December 2011.
Estimated Alpha Standard Error t-statistics
N Mean Median Std Dev Mean Median Std Dev % t<–1.96 % t<–1.65 % t>1.65 % t>1.96
Panel A: Without control for backfill bias
All funds 8,695 0.39 0.35 0.72 0.43 0.36 0.26 1.98 3.08 35.80 29.21
Live HF 3,076 0.52 0.47 0.63 0.40 0.34 0.23 0.91 1.33 45.12 36.48
Defunct HF 5,619 0.32 0.29 0.75 0.44 0.37 0.26 2.56 4.04 30.70 25.24
Panel B: With control for backfill bias
All funds 8,695 0.11 0.16 1.04 0.59 0.48 0.38 4.51 6.60 20.30 15.41
Live HF 3,076 0.32 0.31 0.83 0.55 0.45 0.36 2.47 3.84 27.31 20.87
Defunct HF 5,619 0.00 0.07 1.12 0.60 0.49 0.39 5.62 8.12 16.46 12.42
33
Table 3
Parameter estimates for fund skill distributions
This table presents the parameter estimates for different skill groups in the sample funds. Estimated alpha
is from the Fung-Hsieh seven-factor model with control for backfill bias. The Bayesian information
criterion suggests a mixture of four skill groups that are labeled as Excellent, Good, Neutral, and Bad,
respectively. The parameters �̂�𝑗, �̂�𝑗 and �̂�𝑗 are mean, variability, and the fraction of each skill group.
Alpha is in percent per month. Bootstrap standard errors are reported in parentheses. The sample period is
from January 1994 to December 2011.
�̂�𝑗 �̂�𝑗 �̂�𝑗
Excellent 0.722 0.362 0.093
(0.218) (0.128) (0.029)
Good 0.348 0.253 0.384
(0.132) (0.119) (0.036)
Neutral 0.000 0.177 0.430
( — ) (0.072) (0.033)
Bad −0.804 0.596 0.093
(0.221) (0.240) (0.026)
34
Table 4
Performance persistence
This table reports the result of performance persistence. We present the out-of-sample alpha for equal-
weighted portfolios consisting of hedge funds in different skill groups. For each fund in each month from
January 1996 through December 2011, we compute the fund’s conditional probability of coming from
each of the four skill groups (Excellent, Good, Neutral, and Bad), given the estimates from the previous
24 months. Then, the fund is assigned into a skill group depending on its conditional probabilities. Next,
we form equal-weighted portfolios of funds in the four skill groups. The portfolios are rebalanced
monthly and held for different holding periods. The out-of-sample alpha is estimated using the Fung-
Hsieh factor model. Alpha is in percent per month.
3 mo. 6 mo. 9 mo. 12 mo. 24 mo. 36 mo.
Excellent alpha 0.555 0.531 0.496 0.456 0.400 0.380
t-stat 7.407 7.472 7.004 6.471 5.882 5.794
Good alpha 0.287 0.271 0.263 0.254 0.243 0.239
t-stat 4.903 4.659 4.549 4.419 4.263 4.252
Neutral alpha 0.031 0.051 0.065 0.077 0.090 0.088
t-stat 0.509 0.853 1.088 1.281 1.491 1.446
Bad alpha −0.151 −0.099 −0.061 −0.027 0.014 0.041
t-stat −1.983 −1.335 −0.824 −0.372 0.190 0.566
Excellent–Good alpha 0.268 0.260 0.232 0.202 0.157 0.141
t-stat 6.921 7.575 7.022 6.234 5.215 4.899
Excellent–Neutral alpha 0.525 0.480 0.430 0.379 0.310 0.292
t-stat 7.254 7.045 6.387 5.740 4.940 4.830
Excellent–Bad alpha 0.707 0.629 0.556 0.483 0.387 0.339
t-stat 7.314 7.061 6.349 5.692 4.749 4.275
35
Table 5
Out-of-sample performance comparison
This table reports the out-of-sample alpha for three equal-weighted portfolios consisting of top 20 hedge
funds ranked by the 𝑃𝐽 measure, estimated alpha, and t-statistic, respectively, from January 1996 through
December 2011. 𝑃𝐽 is a fund’s conditional probability of being Excellent given the parameter estimates
from the previous 24 months. The portfolios are rebalanced monthly and held for different holding
periods. The out-of-sample alpha is estimated using the Fung-Hsieh factor model. Similarly, we form two
equal-weighted portfolios of top 20 funds ranked by estimated alpha and its t-statistic from the previous
24 months. Panel A presents the results using top 20 funds, and Panel B presents the results using only
top 20 funds whose lockup periods are shorter than 3 months.
Panel A
3 mo. 6 mo. 9 mo. 12 mo. 24 mo. 36 mo.
𝑃𝐽 alpha 0.820 0.716 0.652 0.607 0.552 0.491
t-stat 8.02 7.36 6.75 6.33 5.87 5.34
Est. alpha alpha 0.430 0.446 0.395 0.361 0.340 0.377
t-stat 2.25 2.52 2.36 2.24 2.21 2.55
t-statistic alpha 0.360 0.352 0.341 0.342 0.357 0.359
t-stat 5.79 6.03 6.29 6.58 7.00 7.05
𝑃𝐽– Est. alpha alpha 0.389 0.270 0.256 0.246 0.212 0.113
t-stat 2.56 1.95 2.03 2.09 1.97 1.12
𝑃𝐽– t-statistic alpha 0.459 0.364 0.310 0.266 0.195 0.132
t-stat 4.42 3.77 3.33 2.93 2.24 1.59
Panel B
3 mo. 6 mo. 9 mo. 12 mo. 24 mo. 36 mo.
𝑃𝐽 alpha 0.806 0.711 0.666 0.626 0.555 0.497
t-stat 7.58 6.98 6.69 6.36 5.85 5.37
Est. alpha alpha 0.434 0.439 0.453 0.425 0.348 0.362
t-stat 2.23 2.43 2.65 2.57 2.25 2.43
t-statistic alpha 0.365 0.346 0.33 0.326 0.324 0.313
t-stat 5.83 5.57 5.46 5.55 5.60 5.53
𝑃𝐽– Est. alpha alpha 0.372 0.272 0.213 0.200 0.207 0.136
t-stat 2.58 2.05 1.81 1.80 2.05 1.45
𝑃𝐽– t-statistic alpha 0.441 0.365 0.336 0.299 0.231 0.184
t-stat 4.26 3.70 3.50 3.20 2.64 2.18
36
Table 6
Transition probabilities
This table reports transition probabilities across the four skill groups from the current month to the next 3,
6 and 12 months. In each month from January 1996 through December 2011, we use a rolling window of
the previous 24 months to evaluate fund skill and form four groups based on funds’ conditional
probabilities of being Excellent, Good, Neutral, and Bad. Then, for each skill group we report the portion
of its funds that are Excellent, Good, Neutral, or Bad in the next 3, 6, and 12 months.
Excellent Good Neutral Bad
Panel A: Next 3 months
Excellent 58.64% 38.54% 2.59% 0.23%
Good 9.92% 70.77% 18.60% 0.70%
Neutral 0.81% 18.63% 69.23% 11.33%
Bad 0.33% 3.29% 37.20% 59.18%
Panel B: Next 6 months
Excellent 45.29% 47.32% 6.65% 0.73%
Good 11.70% 60.45% 25.62% 2.23%
Neutral 2.13% 25.29% 58.89% 13.69%
Bad 0.85% 8.99% 45.17% 44.99%
Panel C: Next 12 months
Excellent 30.23% 50.84% 16.60% 2.34%
Good 12.12% 50.32% 32.10% 5.45%
Neutral 4.90% 31.88% 49.06% 14.16%
Bad 2.88% 19.75% 49.37% 28.00%
37
Table 7
Fund skill and investor flows
This table reports the relation between hedge fund skill and investor flows. We use a 24-month period to
evaluate fund skill and form four portfolios based on funds’ conditional probability of being Excellent,
Good, Neutral, and Bad. Then, the average fund flows prior and subsequent to the evaluation period are
reported for each skill portfolio. For example, when the evaluation period is January 2000–December
2001, “−1” denotes December 1999, “−12” denotes the year of 1999 from January to December, “+1”
denotes January 2002, and “+12” denotes the year of 2002 from January to December, respectively. Fund
flows are in percent (of fund total assets) per month. Newey-West t-statistics are reported in parentheses.
Prior flows (%/mo.) Subsequent flows (%/mo.)
–12 mo. –6 mo. –3 mo. –1 mo. +1 mo. +3 mo. +6 mo. +12 mo.
Excellent 0.236 0.363 0.420 0.480 1.216 1.179 0.976 0.693
Good 0.254 0.305 0.325 0.380 0.336 0.317 0.223 0.075
Neutral 0.163 0.189 0.221 0.329 −0.692 −0.712 −0.715 −0.701
Bad 0.205 0.262 0.306 0.403 −1.179 −1.162 −1.099 −1.061
Excel–Bad 0.031 0.101 0.114 0.076 2.395 2.341 2.075 1.755
(0.23) (0.93) (1.24) (0.89) (18.00) (17.22) (14.46) (16.51)
38
Table 8
Skill type and fund characteristics
This table reports the results from the panel regressions of the conditional probability (in percentage) of
being Excellent, Good, Neutral, and Bad on hedge fund characteristics. Management fee and Incentive fee
are the management fees and incentive fees, respectively. High-water mark dummy is an indicator
variable equal to one if the fund uses high water mark and zero otherwise. Lockup period is the capital
lockup period. Notice period is the advance notice period required for money redemption. Fund Age is the
age of the fund. Fund Asset is the monthly log asset. t-statistics, reported in parentheses, are calculated
using standard errors clustered at both fund and monthly levels.
Excellent Good Neutral Bad
Management fee (%) 0.91 1.07 −1.96 −0.02
(4.52) (3.15) (−4.98) (−0.14)
Incentive fee (%) 0.09 0.11 −0.15 −0.06
(3.56) (2.22) (−2.64) (−3.18)
High-water mark dummy −0.42 0.30 0.23 −0.11
(−1.41) (0.67) (0.46) (−0.59)
Lockup period (year) 0.80 0.78 −1.40 −0.18
(2.98) (2.75) (−3.76) (−1.67)
Notice period (year) 1.54 1.92 −2.85 −0.62
(3.04) (3.00) (−3.42) (−3.24)
Fund age 0.03 0.01 −0.13 0.09
(0.63) (0.10) (−1.58) (3.26)
Fund asset 0.58 1.49 −1.43 −0.64
(10.30) (14.15) (−12.62) (−15.19)
Fund flow 0.36 0.67 −0.74 −0.30
(12.23) (11.46) (−12.67) (−9.75)
Strategy effect Yes Yes Yes Yes
Adj. R2(%) 3.25 5.94 5.57 4.81
39
Table 9
Distribution parameters by investment style
This table reports parameter estimates for different skill groups across hedge fund investment styles:
directional trade (DT), security selection (SS), relative value (RV), and multiple strategy (MS). Estimated
alphas are from the Fung-Hsieh seven-factor model with control for backfill bias. The parameter 𝜇𝑗
denotes mean of alpha for each skill group, and 𝜋𝑗 is the fraction of each group. Alpha is in percent per
month.
DT SS RV MS
�̂�𝐸 0.967 0.618 0.923 0.890
�̂�𝐺 0.215 0.246 0.290 0.397
�̂�𝑁 0.000 0.000 0.000 0.000
�̂�𝐵 −0.204 −0.447 −0.554 −0.394
�̂�𝐸 0.143 0.065 0.057 0.056
�̂�𝐺 0.358 0.456 0.443 0.459
�̂�𝑁 0.353 0.388 0.414 0.374
�̂�𝐵 0.147 0.091 0.086 0.111
N 1,844 3,744 994 1,634
40
Figure 1. This figure shows three performance groups (solid lines) and the composite distribution (dashed
line). The distribution parameters are set as 𝜋𝐺 = 0.2, 𝜋𝑁 = 0.7, 𝜋𝐵 = 0.1; 𝜇𝐺 = 2%, 𝜇𝑁 = 0, 𝜇𝐵 =
−2%, and 𝜎𝑗 = 0.7% for all groups. The distributions are based on true alphas.
41
Figure 2. This figure shows the effects of two sources of variability in fund skill—within-group skill
variability 𝜔𝑖 and estimation error 𝑒𝑖. A fund with estimated alpha equal to �̂� could be from the zero-skill
group (solid line) if the combined effect 𝜔0 + 𝑒0 is positive, or from the positive mean alpha group
(dashed line) if the combined effect 𝜔1 + 𝑒1 is negative.
42
Appendix
In this appendix, we first describe our estimation procedure, and then we present
simulation results that confirm the validity of our approach. Finally, we provide some additional
evidence on the potential investment value of our method.
A.1 The estimation method
We index all the estimated alphas for N funds as �̂�1, �̂�2, … … , �̂�𝑁 and their corresponding
standard errors as 𝑠1, 𝑠2, … … , 𝑠𝑁. As described in Equation (4), the combined variance (𝜎𝑖,𝑗)2 =
(𝑠𝑖)2 + (𝜎𝑗)2. Further, we use a random variable 𝑧𝑗ϵ{0,1} to denote the unconditional probability
that any fund belongs to group j, i.e., 𝑃(𝑧𝑗 = 1) = 𝜋𝑗 . Next, we rewrite the marginal distribution
for each �̂�𝑖 as:
𝑓(�̂�𝑖) = ∑ 𝜋𝑗𝜙(�̂�𝑖; 𝜇𝑗, 𝜎𝑖,𝑗)
𝐽
𝑗=1
= ∑ 𝑃(𝑧𝑗 = 1)𝑃(�̂�𝑖|𝑧𝑗 = 1). (A1)
𝐽
𝑗=1
We use an Expectation-Maximization (EM) iteration, similar in spirit to Dempster, Laird,
and Rubin (1977) to find the maxima. The expectation step in the iteration is essentially
calculating the conditional probability that �̂�𝑖 comes from group j, given the estimates of
{𝜇𝑗, 𝜎𝑗, 𝜋𝑗} from the previous step:
𝑧𝑖𝑗 = 𝑃(𝑧𝑗 = 1|�̂�𝑖, 𝑠𝑖) =𝑃(𝑧𝑗 = 1)𝑃(�̂�𝑖|𝑧𝑗 = 1)
∑ 𝑃(𝑧𝑘 = 1)𝑃(�̂�𝑖|𝑧𝑘 = 1)𝐽𝑘=1
=𝜋𝑗𝜙(�̂�𝑖; 𝜇𝑗 , 𝜎𝑖,𝑗)
∑ 𝜋𝑘𝜙(�̂�𝑖; 𝜇𝑘 , 𝜎𝑖,𝑘)𝐽𝑘=1
. (A2)
The maximization step is from the first order conditions of the log likelihood function.
The likelihood function of these N data points is:
𝐿𝑁 = ∑ ln 𝑓
𝑁
𝑖=1
(�̂�𝑖) + 𝜆(∑ 𝜋𝑗 − 1
𝐽
𝑗=1
) = ∑ ln [
𝑁
𝑖=1
∑ 𝜋𝑗𝜙
𝐽
𝑗=1
(�̂�𝑖; 𝜇𝑗, 𝜎𝑖,𝑗)] + 𝜆(∑ 𝜋𝑗 − 1
𝐽
𝑗=1
). (A3)
43
Following the EM algorithm, the first order conditions for 𝜇𝑗, 𝜎𝑗 , and 𝜋𝑗 are:
∑ 𝑧𝑖𝑗
𝜇𝑗 − �̂�𝑖
(𝜎𝑖,𝑗)2
𝑁
𝑖=1
= 0,
∑ 𝑧𝑖𝑗
𝑁
𝑖=1
[(𝜇𝑗 − �̂�𝑖)
2
(𝜎𝑖,𝑗)4−
1
(𝜎𝑖,𝑗)2] = 0,
1
𝑁∑ 𝑧𝑖𝑗
𝑁
𝑖=1
= 𝜋𝑗.
(A4)
In the equations above, mean 𝜇𝑗 is calculated by taking estimated alpha for each fund, weighted
by the precision of the estimate and the probability that this fund comes from group j. The weight
𝑧𝑖𝑗 pays more attention to funds that are deemed more likely to come that group. The parameter
𝜋𝑗 is the average of all the conditional probabilities that the funds come from group 𝑗.
However, the variance 𝜎𝑗2, as part of (𝜎𝑖,𝑗)2, is difficult to estimate directly, because it
does not have a closed-form solution unlike 𝜇𝑗 and 𝜋𝑗 from Equation (A4). To address this
difficulty, we use the first-order condition from Equation (A3) and obtain
0 =𝑑𝐿𝑁
𝑑𝜎𝑗
=𝑑 ∑ ln 𝑓𝑁
𝑖=1 (�̂�𝑖)
𝑑𝜎𝑗
= ∑
𝑑𝑓(�̂�𝑖)𝑑𝜎𝑗
𝑓(�̂�𝑖)
𝑁
𝑖=1
= ∑𝑑(∑ 𝜋𝑗 ∫ 𝜙(�̂�𝑖; 𝛼𝑖, 𝑠𝑖)
+∞
−∞𝜙(𝛼𝑖; 𝜇𝑗, 𝜎𝑗) 𝑑𝛼𝑖)/𝑑𝜎𝑗
𝐽𝑗=1
𝑓(�̂�𝑖)
𝑁
𝑖=1
= ∑𝜋𝑗𝑑(∫ 𝜙(�̂�𝑖; 𝛼𝑖, 𝑠𝑖)
+∞
−∞𝜙(𝛼𝑖; 𝜇𝑗, 𝜎𝑗) 𝑑𝛼𝑖)/𝑑𝜎𝑗
𝑓(�̂�𝑖)
𝑁
𝑖=1
= ∑𝜋𝑗(−
1
𝜎𝑗∫ 𝜙(�̂�𝑖;𝛼𝑖,𝑠𝑖)
+∞
−∞𝜙(𝛼𝑖;𝜇𝑗,𝜎𝑗) 𝑑𝛼𝑖+∫ 𝜙(�̂�𝑖;𝛼𝑖,𝑠𝑖)
+∞
−∞𝜙(𝛼𝑖;𝜇𝑗,𝜎𝑗)
(𝛼𝑖−𝜇𝑗)2
𝜎𝑗3 𝑑𝛼𝑖)
𝑓(�̂�𝑖)𝑁𝑖=1 . (A5)
44
Further, we recognize that
∫ 𝜙(�̂�𝑖; 𝛼𝑖 , 𝑠𝑖)+∞
−∞𝜙(𝛼𝑖; 𝜇𝑗, 𝜎𝑗) 𝑑𝛼𝑖
= ∫1
√2𝜋𝑠𝑖
𝑒−
(�̂�𝑖−𝛼𝑖)2
2𝑠𝑖2
+∞
−∞
1
√2𝜋𝜎𝑗
𝑒−
(𝛼𝑖−𝜇𝑗)2
2𝜎𝑗2
𝑑𝛼𝑖
= ∫1
2𝜋𝑠𝑖𝜎𝑗
𝑒−
(�̂�𝑖−𝛼𝑖)2
2𝑠𝑖2 −
(𝛼𝑖−𝜇𝑗)2
2𝜎𝑗2
+∞
−∞
𝑑𝛼𝑖
= ∫1
2𝜋𝑠𝑖𝜎𝑗
𝑒−𝛼𝑖
2(1
2𝑠𝑖2+
1
2𝜎𝑗2)+𝛼𝑖(
�̂�𝑖
𝑠𝑖2+
𝜇𝑗
𝜎𝑗2)−
(�̂�𝑖)2
2𝑠𝑖2 −
(𝜇𝑗)2
2𝜎𝑗2
+∞
−∞
𝑑𝛼𝑖
=𝑒
12
(𝑀2−(�̂�𝑖)2
𝑠𝑖2 −
(𝜇𝑗)2
𝜎𝑗2 )
√2𝜋𝜎𝑖,𝑗
∫1
√2𝜋𝑒
−(𝛽𝑖−𝑀)2
2𝑠𝑖2
+∞
−∞
𝑑𝛽𝑖
=1
√2𝜋𝜎𝑖,𝑗
𝑒−
(�̂�𝑖−𝜇𝑗)2
2(𝑠𝑖2+𝜎𝑗
2)
= 𝜙(�̂�𝑖; 𝜇𝑗, 𝜎𝑖,𝑗), (A6)
where 𝛽𝑖 = 𝛼𝑖𝜎𝑖,𝑗
𝑠𝑖𝜎𝑗 and 𝑀 =
�̂�𝑖𝜎𝑗
𝑠𝑖𝜎𝑖,𝑗+
𝜇𝑗𝑠𝑖
𝜎𝑗𝜎𝑖,𝑗. Using Equation (A6), we rewrite (A5) as:
𝜎𝑗2 ∑
𝜋𝑗𝜙(�̂�𝑖; 𝜇𝑗 , 𝜎𝑖,𝑗)
𝑓(�̂�𝑖)
𝑁
𝑖=1
= ∑
𝜋𝑗 ∫ 𝜙(�̂�𝑖; 𝛼𝑖, 𝑠𝑖)+∞
−∞𝜙(𝛼𝑖; 𝜇𝑗, 𝜎𝑗)
(𝛼𝑖 − 𝜇𝑗)2
𝜎𝑗2 𝑑𝛼𝑖
𝑓(�̂�𝑖)
𝑁
𝑖=1
. (A7)
We can further write it using Equations (A1) and (A2):
𝜎𝑗2 ∑
𝜋𝑗𝜙(�̂�𝑖; 𝜇𝑗, 𝜎𝑖,𝑗)
𝑓(�̂�𝑖)
𝑁
𝑖=1
45
= 𝜎𝑗2 ∑
𝜋𝑗𝜙(�̂�𝑖; 𝜇𝑗, 𝜎𝑖,𝑗)
∑ 𝜋𝑗𝜙(�̂�𝑖; 𝜇𝑗, 𝜎𝑖,𝑗)𝐽𝑘=1
𝑁
𝑖=1
= ∑ 𝜎𝑗2𝑧𝑖𝑗
𝑁
𝑖=1
= ∑𝜋𝑗 ∫ 𝜙(�̂�𝑖; 𝛼𝑖, 𝑠𝑖)
+∞
−∞𝜙(𝛼𝑖; 𝜇𝑗, 𝜎𝑗)(𝛼𝑖 − 𝜇𝑗)2𝑑𝛼𝑖
𝑓(�̂�𝑖)
𝑁
𝑖=1
. (A8)
Finally, using the expression of 𝜋𝑗 in Equation (A4), we obtain:
𝜎𝑗2 =
1
𝑁∑
∫ 𝜙(�̂�𝑖; 𝛼𝑖 , 𝑠𝑖)+∞
−∞𝜙(𝛼𝑖; 𝜇𝑗, 𝜎𝑗)(𝛼𝑖 − 𝜇𝑗)2𝑑𝛼𝑖
𝑓(�̂�𝑖)
𝑁
𝑖=1
. (A9)
From this equation, we use the 𝜎𝑗 from previous step to evaluate the right-hand-side expression,
and use this result as our new estimate of 𝜎𝑗. We iterate to solve for 𝜎𝑗 recursively until
convergence. To avoid local maxima, we use different initial values for the iterations and select
the set of parameters with the maximum likelihood. To ensure economic meaningfulness, we
restrict the difference in mean alpha 𝜇𝑗 between two skill groups to be at least 0.17% per month
(i.e., 2% per year) and 𝜋𝑗 to be at least 5%. Finally, for each fund, we obtain 𝑠𝑖 as the estimated
standard error of alpha from the factor model.
When estimating the parameters in actual data, we do not directly observe the number of
the underlying distributions. Hence, we compare estimates from different numbers of
distributions and select the one that has the lowest Bayesian Information Criterion (BIC) as
proposed by Schwarz (1978). The BIC, defining the model fit, is
𝐵𝐼𝐶= −2𝑙𝑛(𝐿) + 𝑘𝑙𝑛(𝑁), (A10)
46
where 𝐿 is the maximized likelihood function, 𝑘 is the number of free parameters, and 𝑁 is the
number of funds.
A.2 The simulations
To check the performance of our estimation procedure, we use simulations to compare
our estimates with true values of the parameters. The simulations as outlined below.
Step 1: We select J sets of group parameters {𝜇𝑗, 𝜎𝑗 , 𝜋𝑗} as true values.
Step 2: Based on {𝜇𝑗, 𝜎𝑗, 𝜋𝑗}, the composite distribution of true alphas (𝑓(𝛼𝑖)) is known.
We randomly draw N samples {𝛼1, 𝛼2, … … , 𝛼𝑁} from the composite distribution, representing
the N managers’ true skill. We set N equal to 9,000, which roughly matches the number of funds
in our sample.
Step 3: We then add random noise to each of manager’s skill using the empirical
estimation errors from the actual data. Specifically, suppose the empirical standard errors are
{𝑠1, 𝑠2, … … , 𝑠𝑁}, then the managers’ skill after adding noise is {𝛼1 + 𝜀1, 𝛼2 + 𝜀2, … … , 𝛼𝑁 +
𝜀𝑁}, where 𝜀𝑖~𝑁(0, 𝑠𝑖2). Denote this vector of skill, after adding noise, as {�̂�1, �̂�2, … … , �̂�𝑁},
which serve as “observed” alpha in our simulation.
Step 4: We use {�̂�1, �̂�2, … … , �̂�𝑁} as the inputs to our estimation procedures and start the
expectation and maximization iteration. Since it is possible that the iterations are sensitive to the
initial values, for each iteration we generate 1,000 random initial values of {𝜇𝑗, 𝜎𝑗, 𝜋𝑗} where
𝜇𝑗 ∈ U[−1.5%, 1.5%] per month, 𝜎𝑗 ∈ U[0.05,1] and 𝜋𝑗 ∈ U[5%, 50%] where U[a,b] is the
uniform distribution on interval [a,b]. We use the same procedure to generate initial values when
we estimate our parameters from the actual data.
47
Table A1 presents the results of four skill groups with one of them having zero mean
alpha. (The results using three and five groups are similar.) To check the sensitivity of our
estimation to different parameter values, we select three sets of true values for the parameters
that are close to the estimates from the actual data (as reported in Table 3). For each set of true
values, we perform 1,000 simulations from which we obtain means and standard deviations of
the estimates. Overall, our algorithm produces estimates that, while imperfect, are largely
accurate in characterizing different skill groups among funds.
In Table A2, we check how well the 𝑃𝐽 measure identifies Excellent managers, since we
know whether or not the managers come from the Excellent group in the simulations. In each
simulation, funds are ranked by 𝑃𝐽 and the top 9.3% of the funds are selected. We report the
identification rate of 𝑃𝐽, defined as the percentage of the selected funds that are truly Excellent.
Similarly, we report the identification rates for the top 9.3% of funds when ranked by estimated
alpha and t-statistic. The results shows that the 𝑃𝐽 measure is better able to identify skilled funds
than the alternative measures.
In addition, the average alpha for the group of funds identified by 𝑃𝐽 is higher than those
based on the alternative measures. Note that top 9.3% contains more than 800 funds, given that
the simulations use 9,000 funds. When holding so many funds, the effects of good luck and bad
luck tend to cancel out, which attenuates the benefit of our approach to some extent. In practice,
however, an investor (such as a fund of hedge funds) typically invest in a much smaller number
of hedge funds (e.g., Brown, Gregoriou, and Pascalau, 2012). In Section 4.3, with actual hedge
fund data, we evaluate the performance difference for a portfolio that starts by holding 20 hedge
funds, and the results are reported in Table 5.
48
Table A1
Comparing parameter estimates with true values
This table presents the estimates of distribution parameters for a set of simulated hedge funds for which
the true parameters are known. Excellent, Good, Neutral, and Bad are the skill groups, and True stands
for true values. The parameters �̂�𝑗, �̂�𝑗 and �̂�𝑗 are the mean, variability and the fraction of skill groups.
Standard errors estimated from 1,000 simulations are reported in parentheses. Panels A, B, and C use
three different sets of true values for the parameters that are close to the actual estimates from our sample.
Panel A
�̂�𝑗 �̂�𝑗 �̂�𝑗
Excellent 0.930 0.657 0.103
(0.291) (0.099) (0.028)
True 1.000 0.700 0.100
Good 0.283 0.764 0.384
(0.108) (0.144) (0.052)
True 0.300 0.700 0.400
Neutral 0.000 0.688 0.412
( — ) (0.204) (0.112)
True 0.000 0.700 0.400
Bad −0.884 0.624 0.101
(0.325) (0.251) (0.029)
True −1.000 0.700 0.100
Panel B
�̂�𝑗 �̂�𝑗 �̂�𝑗
Excellent 0.710 0.936 0.096
(0.251) (0.144) (0.033)
True 0.600 1.000 0.100
Good 0.301 0.298 0.402
(0.056) (0.022) (0.082)
True 0.300 0.300 0.400
Neutral 0.000 0.302 0.406
( — ) (0.170) (0.042)
True 0.000 0.300 0.400
Bad −0.703 0.926 0.097
(0.197) (0.275) (0.037)
True −0.600 1.000 0.100
49
Table A1, continued
Panel C
�̂�𝑗 �̂�𝑗 �̂�𝑗
Excellent 0.580 0.910 0.066
(0.182) (0.258) (0.023)
True 0.600 1.000 0.060
Good 0.293 0.293 0.409
(0.067) (0.101) (0.099)
True 0.300 0.300 0.440
Neutral 0.000 0.295 0.460
( — ) (0.095) (0.124)
True 0.000 0.300 0.440
Bad −0.329 0.953 0.065
(0.096) (0.317) (0.022)
True −0.400 1.000 0.060
50
Table A2
Identification rate of different methods
This table reports the identification rate (i.e., the ability to identify skilled funds) using three different
measures—the 𝑃𝐽 measure, estimated alpha, and t-statistic. The true alphas are generated using the
parameter estimates in Table 3. We add noise to each true alpha using estimation error 𝑠𝑖, and the noise is
assigned randomly to each fund. Then, funds ranked as top 9.3% by each measure are selected.
Identification rate reports the percentage of the selected funds that are truly Excellent. Performance is the
average alpha (in percent per month) of the selected funds identified by each measure.
Top 𝑃𝐽 Top est. alpha Top t-stat
Identification rate 36.18% 28.51% 31.27%
Performance (%/mo.) 0.70 0.53 0.64