
Electronic copy available at: http://ssrn.com/abstract=1915511

Hedge Funds: The Good, the Bad, and the Lucky

Yong Chen†

Texas A&M University

Michael Cliff‡

Analysis Group

Haibei Zhao§

Georgia State University

August 5, 2015

* We are grateful to Vikas Agarwal, Charles Cao, Heber Farnsworth, Wayne Ferson, Will Goetzmann, Feng Guo,

Michael Halling, Petri Jylha, Greg Kadlec, Andrew Karolyi, Robert Kieschnick, Bing Liang, Andrew Lo, Hugues

Pirotte, Jeffrey Pontiff, Zheng Sun, Josef Zechner, Harold Zhang, and seminar/conference participants at

Cornerstone Research, the Institute for Quantitative Asset Management (IQAM), Pennsylvania State University,

Shanghai University of Finance and Economics, Texas A&M University, University of North Carolina at Chapel

Hill, University of Texas at Dallas, University of Virginia, Vienna University of Economics and Business, Virginia

Tech, VU University of Amsterdam, the FBE 654 Asset Pricing class at University of Southern California, the 4th

NYSE Euronext Hedge Fund Conference in Paris, and the 2015 Financial Intermediation Research Society (FIRS)

Conference for helpful comments. The paper was previously circulated under the title “Hedge Funds: The Good, the

(Not-so) Bad, and the Ugly.” All remaining errors are ours alone. The views expressed in this article do not

necessarily represent those of Analysis Group, Inc.

† Mays Business School, Texas A&M University, College Station, TX 77843; [email protected].

‡ Analysis Group, Washington, DC 20006; [email protected].

§ Robinson College of Business, Georgia State University, Atlanta, GA 30303; [email protected].


Abstract

We develop a new method to evaluate hedge fund skill in the presence of luck. In the cross

section, by assuming each fund comes from one of several skill groups, we estimate the number

of groups, the fraction of each group, and the mean and variability of skill within each group.

Our method allows luck to affect both unskilled and skilled funds. At the individual fund level,

we propose a performance measure that combines the fund’s estimated alpha with the cross-

sectional distribution of fund skill. In out-of-sample tests, a strategy using our measure

outperforms those using estimated alpha and t-statistic.

JEL Classification: C13, G11, G23

Keywords: Hedge funds, performance evaluation, EM algorithm, performance persistence


1. Introduction

The past two decades witnessed hedge funds, with less regulatory rigidity and more

trading flexibility, grow into an important investment vehicle. Tremendous interest has emerged

from both academics and practitioners in assessing whether hedge funds add value for investors.

Indeed, a growing literature examines hedge fund performance from different angles. So far,

there is no consensus about whether an average hedge fund can add value.1 However, at the

individual hedge fund level, prior studies have shown strong evidence of the existence and

heterogeneity of fund skill.2 Two important questions naturally arise from these findings. First,

how many hedge funds have enough skill to add value? Second, how can we identify skilled

hedge funds? These questions motivate our study.

One major challenge in addressing the above questions is that fund managers’ true skill is

not observable.3 In practice, researchers typically measure skill with estimated performance

measures such as alpha. Consequently, due to inaccuracies associated with the estimates, a zero-

skill manager may be lucky and exhibit superior performance, while a good manager may be

unlucky and show inferior performance. Prior studies have proposed several methods to control

for the effect of luck on inference about fund skill in multiple hypothesis testing. Kosowski,

Timmermann, White, and Wermers (2006) and Fama and French (2010) use bootstrap

1 For example, Ackermann, McEnally, and Ravenscraft (1999), Brown, Goetzmann, and Ibbotson (1999), and Liang

(1999) show that in aggregate, hedge funds realize positive risk-adjusted performance. However, Griffin and Xu

(2009) find little evidence that hedge funds, on average, deliver abnormal performance.

2 Kosowski, Naik, and Teo (2007) show that the superior performance of top hedge funds cannot be attributed to

pure randomness. Several papers also investigate the cross-sectional relationship between hedge fund performance

and fund characteristics. Aragon (2007) finds that hedge funds with stricter redemption restrictions offer higher

returns. Agarwal, Daniel, and Naik (2009) find that hedge fund performance is positively related to fund managers’

incentives and discretion. Li, Zhang, and Zhao (2010) link hedge fund performance to fund managers’ educational

background and work experience. Titman and Tiu (2011) show that hedge funds with lower R-squares against

systematic factors realize better future performance. Sun, Wang, and Zheng (2012) find that hedge funds with

different return patterns from peer funds are associated with better subsequent performance.

3 We use “fund” and “fund manager” interchangeably in this paper.


simulations to infer skill among mutual funds. Barras, Scaillet, and Wermers (2010) apply a false

discovery approach to mutual funds and detect skill in only a small fraction of funds.4

In this paper, we develop a new method to estimate the prevalence of fund skill and apply

it to a sample of hedge funds. Our approach is based on the assumption that the skill of each

fund, characterized by its alpha 𝛼𝑖, comes from one of several skill groups with mean alpha 𝜇𝑗

and variability of alpha 𝜎𝑗.5 As a stylized example, we can view funds as being “Good” (say 𝜇𝐺 =

3% per year), “Neutral” (𝜇𝑁= 0%), or “Bad” (e.g., 𝜇𝐵 = −2% per year), though our approach can

accommodate more than three skill groups. Accordingly, the observed cross-sectional

distribution of fund alphas is a mixture of the three distributions.

Figure 1 illustrates the mixture of these distributions. Given an observed distribution of

alphas (dashed line), our estimation algorithm identifies the three sub-distributions (solid lines)

that match the cross-sectional distribution when combined together.6 The shape of the cross-

sectional distribution dictates the number of skill groups and their distributional parameters. As

in Fama and French (2010), skill gives rise to fat tails in the distribution of alpha. As shown in

the figure, funds in the Good skill group can have bad realized performance.
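The stylized mixture can be sketched numerically. The snippet below is a minimal illustration using the example parameters reported in footnote 6 (alphas in percent per year, no estimation error); the function names are hypothetical and are not part of the paper.

```python
import math

# Illustrative parameters from footnote 6 (percent per year).
PI = {"Good": 0.2, "Neutral": 0.7, "Bad": 0.1}
MU = {"Good": 2.0, "Neutral": 0.0, "Bad": -2.0}
SIGMA = 0.7  # common within-group dispersion in this example


def normal_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def normal_cdf(x, mu, sigma):
    """Normal distribution function, computed from the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def mixture_pdf(x):
    """Cross-sectional density: group densities weighted by group fractions."""
    return sum(PI[g] * normal_pdf(x, MU[g], SIGMA) for g in PI)


# Probability that a fund from the Good group has alpha below zero.
p_good_below_zero = normal_cdf(0.0, MU["Good"], SIGMA)

# Riemann-sum check that the mixture density integrates to about one.
step = 0.01
total_mass = sum(mixture_pdf(-8.0 + step * k) for k in range(1601)) * step
```

In this example the probability that a Good fund has negative alpha is small but strictly positive, which is the overlap between sub-distributions that the figure depicts.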

We use a modified Expectation-Maximization (EM) algorithm to estimate the average

skill (𝜇𝑗), the variability of skill (𝜎𝑗), and the size of the group (𝜋𝑗) for each skill group j. These

parameter estimates not only describe the cross-sectional distribution of alphas across different

4 See Ferson and Chen (2015) for a refinement and generalization of the Barras et al. method, by using more of the

structure of the model suggested by Barras et al.

5 We focus on net-of-fee returns. Hence, we adopt an investor’s perspective in asking whether the manager can earn

a gross return that is sufficient to cover costs.

6 In this example, we set 𝜋𝐺 = 0.2, 𝜋𝑁 = 0.7, 𝜋𝐵 = 0.1; 𝜇𝐺 = 2%, 𝜇𝑁 = 0, 𝜇𝐵 = −2%, and 𝜎𝑗 = 0.7% for all

groups. This simple example does not incorporate estimation errors in alpha that are introduced later in the paper.

All the parameter estimates in our empirical analysis incorporate the effects of estimation errors.


skill groups, but also provides useful information to make inference about skill of individual

funds. In practice, fund alphas are estimated with noise. However, we show that the information

from the cross section can be combined with estimated alphas to make more accurate inference

for individual funds.

At the individual fund level, we construct a new performance measure—the conditional

probability a fund comes from the highest-skilled group. This performance measure incorporates

both a fund’s estimated alpha and the information about the cross-sectional fund skill. When

estimated alpha is very noisy with large estimation error, the measure relies more on the cross-

sectional information as opposed to estimated alpha. On the other hand, if estimated alpha has a

high precision, it receives a great weight in the performance measure.

This performance measure has advantages over the conventional way of using the t-

statistic to adjust for the precision of estimated alpha. Though the t-statistic tells how strongly we

can reject the null hypothesis of zero skill, it does not identify which funds are more skilled. For

example, a fund with a t-statistic of 3.0 does not necessarily have more skill than another fund

with a t-statistic of 2.0. This is because the t-statistic, as the product of estimated alpha and its

precision, does not differentiate between these two components. In contrast, by weighting the

fund’s estimated alpha and the prior information about the cross section, our approach

incorporates the magnitude of estimated alpha based on its precision and provides a ranking of

fund skill. Having such a ranking is important for investors (like funds of hedge funds) facing

capital constraints that limit the number of funds in which they can invest.

In our empirical analysis, we employ a sample of 8,695 hedge funds by merging two

major hedge fund databases—Lipper TASS and Hedge Fund Research—over the period of

1994–2011. We use the Fung and Hsieh (2004) seven-factor model to estimate alpha from


historical fund returns, and we consider alternative models for robustness. We mitigate hedge

fund data biases and propose a new way to correct backfill bias. Empirically, we find that a

mixture of four skill groups best fits the empirical distribution of actual fund performance

(compared with other numbers of skill groups), which we refer to as Excellent, Good, Neutral,

and Bad. The first two groups have positive mean alpha, including 9% excellent funds with 𝜇̂ =

0.72%/month and 38% good funds with 𝜇̂ = 0.35%/month. Meanwhile, 43% of the funds are

neutral funds with zero alpha after fees (i.e., having skill just enough to cover their fees), and 9%

are deemed bad funds with 𝜇̂ = −0.80%/month. This finding is consistent with the notion that

hedge fund skill tends to be heterogeneous. This result also depicts a remarkably different picture

about hedge fund skill than the limited evidence of skill that prior studies find for mutual funds

(e.g., Barras, Scaillet, and Wermers, 2010; Fama and French, 2010).

To identify superior individual funds, we use the performance measure that computes the

conditional probability a fund comes from each skill group, by combining the fund’s estimated

alpha with parameter estimates for the cross section. Specifically, in each month we form four

portfolios based on funds’ conditional probabilities of being excellent, good, neutral, and bad

estimated from the previous 24 months. Then, we examine “out-of-sample” performance of these

monthly-rebalanced portfolios. We find that the portfolio of the “predicted excellent” funds (i.e.,

those with the greatest likelihood of being excellent) subsequently realizes high alpha over a long

horizon. In fact, the alpha spread between the predicted excellent and the predicted bad portfolios

remains significantly positive even three years post-formation. This suggests that our

performance measure is able to detect skill. Further, when comparing the investment value of our

approach with alternative strategies based on past estimated alpha and its t-statistic, we find that

our approach outperforms those competing strategies in out-of-sample tests.


Our paper makes several contributions to the literature. First, as an alternative to the false

discovery method applied in Barras, Scaillet, and Wermers (2010), we use the EM algorithm to

make inferences about mixture distributions of fund skill. Barras, Scaillet, and Wermers (2010)

allow luck to affect zero-skill funds (i.e., “false discoveries”), but by using a large test size (e.g.,

a size of 30%) they rule out the possibility that a skilled fund can have zero alpha due to bad luck.

Our method allows luck to affect both zero-skill funds and skilled funds. The fact that we

identify a larger fraction of skilled funds than simply counting statistically significant alphas

suggests that it is important to consider imperfect test power. More importantly, for each

individual fund, we construct a performance measure that combines the fund’s own estimated

alpha and the cross-sectional distribution of fund skill. Thus, the performance measure involves

learning about skill from other funds. Jones and Shanken (2005) demonstrate how learning

across funds affects the inference about the cross sectional distribution of fund skill. We extend

their intuition to a setting of asset allocation across many funds with different skill. While Jones

and Shanken (2005) consider one homogeneous skill distribution, our method accommodates

multiple skill groups, which is a necessary condition for comparing skill across funds.

The rest of the paper proceeds as follows. In Section 2, we outline our approach to

inferring fund skill. Section 3 describes the data. Section 4 presents the empirical results about

the fractions of funds from different skill groups and fund performance persistence. Section 5

discusses additional analyses and robustness checks. Finally, Section 6 concludes.


2. Methodology

In this section, we first lay out the general setup for inferring the characteristics of the

skill groups and estimating the conditional probability that a fund belongs to the top skill group.

Next, we relate our method to existing studies and discuss some properties of our performance

measure. Finally, we describe our estimation procedure. Technical details about the estimation

approach and simulations are provided in the Appendix.

2.1. The model

We start by assuming that there are an unknown number J of groups of funds with different

skill levels. For each group j (j = 1, 2, … , J), a representative fund is characterized by its alpha,

which is assumed to follow a Normal distribution 𝑁(𝜇𝑗, 𝜎𝑗²), where 𝜇𝑗 is the mean alpha in the

group and 𝜎𝑗 captures the dispersion in true skill across funds within the group. The clustering of

performance within a group around the mean (𝜇𝑗) can be attributed to common investment styles

(e.g., Brown and Goetzmann, 1997), while the variability 𝜎𝑗 is driven by fund-specific traits 𝜔𝑖

(e.g., infrastructure or trading intensity). Hence, the true alpha for manager i who belongs to

group j is 𝛼𝑖 = 𝜇𝑗 + 𝜔𝑖. True fund skill can vary through time due to changes in fund

management or because the informational advantage that can generate alpha in one period erodes

over time.

We use 𝜋𝑗 to denote the fraction of the funds that come from skill group j, which is also

the unconditional probability a fund belongs to the group. Thus, the sum of the group fractions

equals one, i.e., $\sum_{j=1}^{J} \pi_j = 1$. Consequently, the J sets of triples {𝜇𝑗, 𝜎𝑗, 𝜋𝑗} jointly define a

composite distribution for fund i with the following density function:


$$f(\alpha_i) = \sum_{j=1}^{J} \pi_j \phi(\alpha_i; \mu_j, \sigma_j), \tag{1}$$

where 𝛼𝑖 denotes skill of the fund, 𝜙(𝛼𝑖; 𝜇𝑗 , 𝜎𝑗) is the Normal probability density with mean 𝜇𝑗

and standard deviation 𝜎𝑗 evaluated at 𝛼𝑖.7 The probability of observing 𝛼𝑖 in a population equals

the weighted probability of observing 𝛼𝑖 in each group, weighted by that group’s fraction in the

population. As illustrated in Figure 1, the density function 𝑓(𝛼𝑖) also describes the cross-

sectional distribution of fund skill.

So far, we treat fund alpha 𝛼𝑖 as observable. However, empirical analysis in the literature

routinely uses ordinary least squares (OLS) estimated alpha with a sample-specific estimation

error 𝑒𝑖 for fund i. As a result, estimated alpha equals true alpha plus estimation error: 𝛼̂𝑖 = 𝛼𝑖 + 𝑒𝑖 = 𝜇𝑗 + 𝜔𝑖 + 𝑒𝑖. Thus, estimated alpha satisfies the following density function:

$$f(\hat{\alpha}_i \mid \alpha_i) = \phi(\hat{\alpha}_i; \alpha_i, s_i), \tag{2}$$

where 𝑠𝑖 is the standard deviation of estimation error 𝑒𝑖 (i.e., the standard error of 𝛼̂𝑖). The

estimation error, 𝑒𝑖, is assumed to follow a Normal distribution, which is a common assumption

of OLS. By combining Equations (1) and (2) and marginalizing the joint distribution, we obtain

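the distribution of estimated alpha 𝛼̂𝑖 as follows:

$$f(\hat{\alpha}_i) = \int_{-\infty}^{+\infty} f(\hat{\alpha}_i \mid \alpha_i)\, f(\alpha_i)\, d\alpha_i = \sum_{j=1}^{J} \pi_j \int_{-\infty}^{+\infty} \phi(\hat{\alpha}_i; \alpha_i, s_i)\, \phi(\alpha_i; \mu_j, \sigma_j)\, d\alpha_i. \tag{3}$$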

7 We assume a Normal distribution for several reasons. First, the parameters of Normal distribution have clear

economic meaning about the mean and variability of skill in our setting, as opposed to other distributions like t-

distribution or inverse gamma. Second, according to the central limit theorem, Normal distribution seems natural to

characterize true alpha as true alpha can be viewed as a sum of several random variables (e.g., fund-specific traits).

Third, Normal distribution provides technical tractability to derive the iteration scheme used in our approach (see

details in the Appendix A.1). Finally, even though each skill group is assumed to follow a Normal distribution, the

composite distribution is non-Normal with fat-tails, consistent with the empirical distribution for our data.


Next, evaluating the integral for each type j, we have:8

$$f(\hat{\alpha}_i) = \sum_{j=1}^{J} \pi_j \phi(\hat{\alpha}_i; \mu_j, \sigma_{i,j}), \quad \text{where } (\sigma_{i,j})^2 = (s_i)^2 + (\sigma_j)^2. \tag{4}$$

This equation characterizes the density function for estimated alpha of fund i. Compared

with Equation (1), the combined standard deviation 𝜎𝑖,𝑗 incorporates two sources of variation in estimated

alpha: fund-specific estimation error 𝑠𝑖 and within-group variation 𝜎𝑗. Empirically, we find the

average 𝑠𝑖 across funds (as shown in Table 2) to be of the same order of magnitude as 𝜎𝑗 (as

shown in Table 3). This suggests that the two sources of variations are roughly equally

important. Given the importance of estimation error, making inferences based on estimated alpha

alone without considering estimation error would lose a significant amount of information about

fund skill.
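The step from Equation (3) to Equation (4) is the standard result that integrating one Normal density against another yields a Normal density whose variance is the sum of the two variances. The following sketch checks this numerically for a single group; the input values are hypothetical and chosen only for illustration.

```python
import math


def normal_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def marginal_by_integration(alpha_hat, s_i, mu_j, sigma_j, n=20001, width=12.0):
    """Trapezoid-rule approximation of the inner integral in Equation (3)."""
    lo = mu_j - width * sigma_j
    hi = mu_j + width * sigma_j
    h = (hi - lo) / (n - 1)
    total = 0.0
    for k in range(n):
        a = lo + k * h  # candidate value of true alpha
        weight = 0.5 if k in (0, n - 1) else 1.0
        total += weight * normal_pdf(alpha_hat, a, s_i) * normal_pdf(a, mu_j, sigma_j)
    return total * h


def marginal_closed_form(alpha_hat, s_i, mu_j, sigma_j):
    """Group-j term of Equation (4): variance is s_i^2 + sigma_j^2."""
    return normal_pdf(alpha_hat, mu_j, math.sqrt(s_i ** 2 + sigma_j ** 2))


# Hypothetical inputs (percent per month): alpha_hat, s_i, mu_j, sigma_j.
numeric = marginal_by_integration(0.5, 0.4, 0.35, 0.3)
closed = marginal_closed_form(0.5, 0.4, 0.35, 0.3)
```

The two values agree up to numerical integration error, so the closed form can be used directly in estimation without evaluating the integral fund by fund.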

Figure 2 illustrates the effects of the two sources of variation from 𝑠𝑖 and 𝜎𝑗. Suppose we

obtain a positive estimated alpha 𝛼̂ for a fund. Here, both sources of variation in the estimated

alpha affect our inference. The fund may come from the zero-skill group but exhibit a positive 𝛼̂

due to 𝜔 and estimation error, i.e., (𝜔0 + 𝑒0). Alternatively, this fund may come from the

positive-skill group, but a net negative (𝜔1 + 𝑒1) leads to an estimated alpha that is smaller than

the group mean. Thus, we are uncertain exactly which group the fund comes from. As such, our

method allows a zero-skill fund to appear to have positive estimated alpha, as well as allowing a

skilled fund to have small (close to zero) estimated alpha. In other words, luck (𝑒𝑖) can affect

both unskilled and skilled funds in our model.9

8 The detailed derivation of Equation (4) is provided in Equation (A6) of the Appendix A1.

9 This is different from Barras, Scaillet, and Wermers (2010), who assume perfect test power by using a large test

size (e.g., 30%) and hence rule out the possibility that luck can affect skilled funds.


Once we have the estimates of {𝜇𝑗 , 𝜎𝑗 , 𝜋𝑗} (using the estimation procedure described

below in Section 2.3), we can characterize 𝑓(𝛼̂𝑖) using Equation (4) and make inference

about each fund’s skill. Specifically, we can make a probabilistic statement about how likely a

fund is from each skill group, and use the probability as the basis for our performance measure.

The conditional probability that the fund belongs to group j equals:10

$$P_j = \mathrm{Prob}(\text{fund } i \text{ is from group } j \mid \hat{\alpha}_i, s_i) = \frac{\pi_j \phi(\hat{\alpha}_i; \mu_j, \sigma_{i,j})}{f(\hat{\alpha}_i)}. \tag{5}$$

For ease of illustration, we order the groups such that 𝜇1 < 𝜇2 < ⋯ < 𝜇𝐽, and thus group

J has the highest mean skill. From the perspective of investment practice, our focus is on the

conditional probability that a fund comes from the group with highest mean skill, namely 𝑃𝐽. The

higher 𝑃𝐽 is, the more likely the fund has superior skill. The idea of this performance measure is

as follows. When we make inference about fund i’s skill, the estimation error of its alpha affects

the relative importance between our prior based on the cross-sectional distribution and the fund’s

estimated alpha. In Equation (4), the total variation of estimated alpha 𝛼̂𝑖 is decomposed into

fund-specific estimation error 𝑠𝑖 and within-group variation 𝜎𝑗. Hence, if fund alpha is estimated

with high precision (i.e., when 𝑠𝑖 is small), our method assigns a relatively great weight to

estimated alpha. On the other hand, if estimated alpha is of low precision, we rely less on

estimated alpha and more on the prior. In the extreme case, when 𝑠𝑖 goes to infinity (i.e., no

precision), the 𝑃𝑗 measure converges to 𝜋𝑗 (i.e., our prior knowledge).
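The behavior of the measure at the two extremes of precision can be illustrated with a short sketch of Equation (5). The three-group parameters below are hypothetical placeholders, not estimates from the paper, and `group_probabilities` is an illustrative name.

```python
import math


def normal_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def group_probabilities(alpha_hat, s_i, pis, mus, sigmas):
    """Conditional probability of each group given (alpha_hat, s_i), per Equation (5)."""
    weighted = [
        pi * normal_pdf(alpha_hat, mu, math.sqrt(s_i ** 2 + sigma ** 2))
        for pi, mu, sigma in zip(pis, mus, sigmas)
    ]
    total = sum(weighted)  # this is f(alpha_hat) from Equation (4)
    return [w / total for w in weighted]


# Hypothetical parameters, ordered so the last group has the highest mean.
pis = [0.1, 0.7, 0.2]
mus = [-0.8, 0.0, 0.6]
sigmas = [0.3, 0.2, 0.3]

# Same estimated alpha, very different precision.
precise = group_probabilities(0.6, 0.1, pis, mus, sigmas)
noisy = group_probabilities(0.6, 1e6, pis, mus, sigmas)
```

With a small standard error, the probability of the top group rises well above its unconditional fraction; with an extremely large standard error, the probabilities collapse to the group fractions, as described above.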

10 To avoid notational clutter, we do not index 𝑃𝐽 for fund i, though it is a function of fund i’s estimated alpha and

standard error.


2.2. Relation of our method to existing studies

Our estimation method builds on several existing studies. First, our paper is related to

Kosowski, Timmermann, White, and Wermers (2006) and Fama and French (2010), who control

for the effect of estimation error when inferring fund skill. However, unlike their studies that

focus on whether the performance of top-performing managers comes from skill or luck, we

estimate the probability that a fund comes from the highest-skilled group and our 𝑃𝐽 measure

allows us to rank and compare skill across many funds.

Second, our approach is closely related to Barras, Scaillet, and Wermers (2010), who

apply the false discovery method to infer fund skill. Similar to their work, we consider the

existence of multiple skill groups. However, whereas their study assumes perfect test power by

using a large test size (e.g., a size of 30%), we allow for imperfect power, so that a fund with an estimated alpha

close to zero could be run by a good (bad) manager with bad (good) luck.11 More importantly, while

their inference relies on the alpha t-statistic, we propose a fund-specific performance measure 𝑃𝐽

that accounts for estimated alpha and estimation error separately, and we show empirically in

Section 4 that this separation leads to a significant improvement in out-of-sample investment

value.

Third, our measure 𝑃𝐽 depends on both a fund’s own estimated alpha and the performance

of other funds. As a result, learning about skill of other funds provides useful information to infer

skill for a given fund. This is related to Jones and Shanken (2005), who study how learning

across funds affects the inference about the cross-sectional distribution of fund skill in a

Bayesian framework. Our study extends their idea of cross learning to a setting of asset

11 See Ferson and Chen (2015) for a study that considers imperfect power when inferring fund skill in the false

discovery framework.


allocation across many funds. In out-of-sample tests, we show that the information about the

cross sectional distribution of skill indeed has important implications for asset allocation across

funds. Furthermore, our method accommodates a mixture of multiple skill groups, while theirs

considers only one group. As shown below, our empirical analysis strongly rejects one skill

group and favors a mixture of multiple skill groups in hedge funds. In fact, the existence of

multiple skill groups is a necessary condition for obtaining a ranking of skill across funds.12

2.3. The estimation procedure

We now introduce the procedure to estimate the parameters with more technical details

provided in the Appendix A.1. As explained above, our goal is to estimate a set of parameters

{𝜇𝑗, 𝜎𝑗, 𝜋𝑗} that define the cross sectional distribution of true skill. To do so, we need to aggregate

information about all individual funds’ estimated alphas. However, these estimated alphas are

not true skill but noisy estimates of it. As a result, we refine our inference about each fund’s

skill by calculating the probability it belongs to each sub-distribution. As shown in equation (5),

such probability estimates in turn depend on the cross sectional distribution of true skill. Thus,

we have a simultaneous estimation problem.

12 Jones and Shanken (2005, p.545) state that “although two examples demonstrate the substantial effects of learning

on the allocation to a particular fund, the implications for asset allocation across funds remain unexplored.” Since

they assume one skill group, all funds come from the same distribution in their framework. As a result, learning

about the cross-sectional distribution of skill provides the same information for all funds, and ranking based on

posterior alpha would be the same as ranking based on the t-statistic. In our study, however, we consider multiple

skill groups and allow the probability of belonging to each group to be different across funds. Therefore, learning

from the cross sectional distribution is different for different funds.


We use a modified Expectation-Maximization (EM) algorithm to simultaneously estimate

the cross sectional parameters {𝜇𝑗, 𝜎𝑗 , 𝜋𝑗} and individual funds’ conditional probabilities.13 The

EM algorithm uses iterations of two steps. First, the expectation step calculates a conditional

probability that 𝛼̂𝑖 is from group j, given previous estimates of the group parameters (or preset

initial values in the case of the first iteration) and the funds’ estimated alpha and standard error.

The expectation step refines our estimates of the fund’s skill based on the cross-sectional

information. Then, the maximization step aggregates the skill distribution of individual funds to

obtain updated estimates of the cross sectional parameters. These two steps iterate until

parameter estimates converge, and through the iteration process we solve the simultaneous

estimation problem.

In our setting, however, estimation errors in alphas complicate the EM algorithm, as the

estimator 𝜎𝑗 in the maximization step is highly non-linear without a closed-form solution. To

overcome this difficulty, we modify the EM algorithm by deriving a separate iteration scheme to

estimate 𝜎𝑗 until convergence. To the best of our knowledge, our method is the first to

incorporate estimation errors in the EM algorithm. The Appendix A.1 describes the details of the

algorithm.
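The two-step iteration can be sketched as follows. This is a minimal illustration only: the inner update for 𝜎𝑗 is a fixed-point iteration on the first-order condition of the expected log-likelihood, which is an assumption of this sketch and not necessarily the scheme derived in Appendix A.1. The function name, defaults, and simulated data are all hypothetical.

```python
import math
import random


def normal_pdf(x, mu, sd):
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def em_mixture(alpha_hat, se, pis, mus, sigmas, n_iter=50, inner=5):
    """EM-style fit of {mu_j, sigma_j, pi_j} when each alpha_hat has its own standard error."""
    n, J = len(alpha_hat), len(pis)
    pis, mus, sigmas = list(pis), list(mus), list(sigmas)
    for _ in range(n_iter):
        # Expectation step: conditional group probabilities, as in Equation (5).
        resp = []
        for a, s in zip(alpha_hat, se):
            w = [pis[j] * normal_pdf(a, mus[j], math.sqrt(s * s + sigmas[j] ** 2))
                 for j in range(J)]
            total = sum(w)
            resp.append([x / total for x in w])
        # Maximization step: update each group's fraction, mean, and dispersion.
        for j in range(J):
            r = [resp[i][j] for i in range(n)]
            pis[j] = sum(r) / n
            var = sigmas[j] ** 2
            # Assumed inner fixed-point iteration for (mu_j, sigma_j^2).
            for _step in range(inner):
                prec = [1.0 / (s * s + var) for s in se]
                mus[j] = (sum(ri * p * a for ri, p, a in zip(r, prec, alpha_hat))
                          / sum(ri * p for ri, p in zip(r, prec)))
                num = sum(ri * p * p * ((a - mus[j]) ** 2 - s * s)
                          for ri, p, a, s in zip(r, prec, alpha_hat, se))
                den = sum(ri * p * p for ri, p in zip(r, prec))
                var = max(num / den, 1e-8)  # keep the variance strictly positive
            sigmas[j] = math.sqrt(var)
    return pis, mus, sigmas


# Simulated cross section with two hypothetical skill groups.
random.seed(0)
true_pi, true_mu, true_sigma = [0.6, 0.4], [0.0, 1.0], [0.2, 0.2]
alpha_hat, se = [], []
for _ in range(600):
    j = 0 if random.random() < true_pi[0] else 1
    s = random.uniform(0.1, 0.3)  # fund-specific standard error
    alpha = random.gauss(true_mu[j], true_sigma[j])  # true skill
    alpha_hat.append(alpha + random.gauss(0.0, s))  # estimated alpha
    se.append(s)

est_pi, est_mu, est_sigma = em_mixture(alpha_hat, se, [0.5, 0.5], [-0.2, 1.2], [0.3, 0.3])
```

Because the inner loop runs only a few steps per outer iteration, the sketch does not guarantee a monotone increase in the likelihood at every iteration; it is meant to convey the structure of the problem rather than to reproduce the paper's estimator.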

Our method is flexible about the number of skill types. This is useful for inferring skill

among entities (such as hedge funds) with highly heterogeneous skill. Empirically, we use the

13 The original EM algorithm developed by Dempster, Laird, and Rubin (1977) has been widely used in estimation

and inference related to mixture distribution models and incomplete data. See, e.g., Wu (1983), Rudd (1991),

McLachlan and Peel (2000), and McLachlan and Krishnan (2008). The EM algorithm, though powerful, was rarely

used in finance research. Some exceptions are Kon (1984) who examines the distribution of daily stock returns and

Asquith, Jones, and Kieschnick (1998) who study the heterogeneity of IPO returns. However, these previous studies

examining stock returns do not deal with the complicating effects of estimation errors as we do, since in our setting

fund alphas are not directly observed but estimated with estimation errors.


Bayesian Information Criterion (BIC) to identify the number of skill groups that best fit the

actual data and confirm the existence of multiple skill groups in hedge funds.
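A model-selection step of this kind might be sketched as below. The log-likelihood follows Equation (4); the parameter count of 3J − 1 (J means, J dispersions, and J − 1 free fractions) and the sign convention under which a lower BIC is preferred are assumptions of this sketch, since the text does not spell them out.

```python
import math


def mixture_log_likelihood(alpha_hat, se, pis, mus, sigmas):
    """Sum over funds of log f(alpha_hat_i), with f as in Equation (4)."""
    log_lik = 0.0
    for a, s in zip(alpha_hat, se):
        density = 0.0
        for pi, mu, sigma in zip(pis, mus, sigmas):
            sd = math.sqrt(s * s + sigma * sigma)
            z = (a - mu) / sd
            density += pi * math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))
        log_lik += math.log(density)
    return log_lik


def bic(log_lik, n_groups, n_funds):
    """BIC = -2 log L + k log n, with k = 3J - 1 free parameters (assumed)."""
    k = 3 * n_groups - 1
    return -2.0 * log_lik + k * math.log(n_funds)
```

One would fit the mixture for each candidate J, evaluate `bic` at the fitted parameters, and retain the J with the smallest value.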

We perform sensitivity tests to validate our method. First, to address the concern that our

parameter estimation of {𝜇𝑗, 𝜎𝑗 , 𝜋𝑗} might be sensitive to the choice of initial values, we

experiment with a grid of initial values and search for the global maximum of the likelihood

function. Second, we run simulations in which we generate artificial datasets with known

(population) group parameters, and compare the parameter estimates from the artificial datasets

to their population counterparts. We find that our parameter estimates are reasonably close to the

true parameters. Third, we run simulations in which we set values of alpha, and then rank the

funds using our performance measure 𝑃𝐽 in comparison with alternative measures of estimated

alpha and its t-statistic. The results show that the measure 𝑃𝐽 is better able to identify skilled

funds than the alternative measures. Furthermore, an investment strategy based on our

performance measure outperforms those based on the alternative measures. (In Section 4, we

perform this comparison with actual hedge fund data in out-of-sample tests.) These results

validate our estimation procedure. The Appendix A.2 provides the details of the simulations.

3. The Data

3.1. Hedge funds

For our empirical analysis, we employ a large sample of hedge funds by merging data

from the Lipper TASS and Hedge Fund Research (HFR) databases. Although these databases

contain fund returns going back as early as 1977, they do not retain information on defunct

funds before 1994 and thus data in early years have survivorship bias (Fung and Hsieh, 2000;


Liang, 2000). To mitigate survivorship bias (see Brown, Goetzmann, Ibbotson, and Ross, 1992),

we focus on the period from January 1994 onwards. Following the hedge fund literature, we only

include funds that report net-of-fee returns on a monthly basis and have at least 24 months of

returns. We exclude funds of funds from our analysis. Over the period January 1994–December

2011, our sample contains 8,695 funds, of which 3,076 are alive as of the end of the sample

period and 5,619 are defunct funds.

Table 1 reports summary statistics of fund returns. The average fund age in our sample is

73 months, slightly longer than six years. The mean (median) return is 0.70% (0.65%) per month

or about 8.40% (7.80%) per year. At the top 25th percentile, the mean return is 1.01% per month

or about 12.12% per year. The average return volatility is 3.44% per month. Higher moments of

fund returns suggest negative skewness and fat tails relative to a Normal distribution. Consistent

with prior research, fund returns exhibit autocorrelation; the average first-order autocorrelation is

0.15. Such autocorrelation is interpreted in prior studies as an indication of illiquid holdings or

return smoothing (e.g., Getmansky, Lo, and Makarov, 2004).

3.2. Correcting backfill bias

Hedge fund returns, as voluntarily reported, may have potential backfill bias (e.g., Fung

and Hsieh, 2000; Liang, 2000). This bias arises as historical returns are often backfilled when

new funds are added into a database. Since funds with good track records tend to join a database,

neglecting backfilling generates an upward bias in average fund return. This is similar to the

incubation bias (Evans, 2010). In the hedge fund literature, a typical treatment is to drop the first

one or two years’ data. When estimating alpha for funds, it is common to require a certain

number of observations (e.g., 24 months) to ensure test power. Thus, when early years are


truncated, only funds with relatively long track records remain in the analysis, which may

introduce a survivorship bias.

We propose a new way to correct backfill bias by simply adding an incubation dummy

variable to the factor regression when estimating alpha. Specifically, the dummy takes a value of

one for the backfill period, i.e. from the month when a fund’s return becomes available in the

data to the month when the fund joined the database. This dummy variable captures the

incremental return during the backfill period, which we allow to vary for each fund. The

advantages of our approach are that, first, we more accurately capture the actual backfill period

than applying the same backfill period to all funds, and second, we retain a more complete fund

history, which provides more information for the alpha estimates.

3.3. Factor model

As hedge funds trade across different asset classes, a multi-factor model is often used to

capture their risk exposures. We use the Fung and Hsieh (2004) seven-factor model as the

benchmark model to estimate fund alpha. The seven factors include an equity market factor, a

size factor, the change in the constant maturity yield of the ten-year Treasury, the change in the

spread between Moody’s Baa yield and the ten-year treasury, and three trend-following factors

for bonds, currency, and commodities.14 Our regression model has the following specification:

$$r_{i,t} = \alpha + \alpha_1 I(t \le t_{i,B}) + \boldsymbol{\beta}' \boldsymbol{f}_t + \varepsilon_{i,t}, \qquad (6)$$

14 The bond, currency and commodity trend-following factors are constructed as portfolios of lookback straddle

options on these assets (see Fung and Hsieh, 2001). The data on these factors are obtained from David Hsieh’s

website at http://faculty.fuqua.duke.edu/~dah7/DataLibrary/TF-FAC.xls.

where 𝑟𝑖,𝑡 is fund i’s return in excess of the risk-free rate (proxied by the one-month T-bill rate)

in month t, 𝐼(∙) is an indicator function, 𝑡𝑖,𝐵 is the month when fund i starts to report to a database,

and the vector f denotes the seven factors. The intercept 𝛼 is the fund’s alpha, and the coefficient

on the dummy variable controls for potential backfill bias.
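
The regression in Equation (6) can be illustrated with a minimal sketch. It builds the backfill dummy and estimates the model by ordinary least squares on simulated data. The function names (`ols`, `alpha_with_backfill_dummy`), the single simulated factor, and all numbers are illustrative assumptions rather than the authors' code or data; an actual application would use the seven Fung-Hsieh factors.

```python
import random

def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(row[a] * row[b] for row in X) for b in range(k)]
         + [sum(row[a] * yi for row, yi in zip(X, y))] for a in range(k)]
    for col in range(k):
        # Partial pivoting for numerical stability
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [v - f * p for v, p in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

def alpha_with_backfill_dummy(excess_returns, factors, join_month):
    """Regress r_t on [1, I(t <= join_month), f_t]; return (alpha, alpha_1, betas)."""
    X = [[1.0, 1.0 if t <= join_month else 0.0] + list(f)
         for t, f in enumerate(factors)]
    coef = ols(X, excess_returns)
    return coef[0], coef[1], coef[2:]

# Simulated fund: true alpha 0.10, plus 0.50 per month while backfilled.
rng = random.Random(0)
T, join_month = 120, 35          # months 0..35 form the backfill period
factors = [[rng.gauss(0.5, 4.0)] for _ in range(T)]
returns = [0.10 + (0.50 if t <= join_month else 0.0) + 0.3 * f[0] + rng.gauss(0, 0.2)
           for t, f in enumerate(factors)]

alpha, alpha_1, betas = alpha_with_backfill_dummy(returns, factors, join_month)
naive_alpha = ols([[1.0] + f for f in factors], returns)[0]   # no dummy
print(f"alpha={alpha:.3f}, backfill premium={alpha_1:.3f}, naive alpha={naive_alpha:.3f}")
```

In this simulation the regression without the dummy attributes part of the backfill-period return to alpha, which is the upward bias the dummy is meant to absorb.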

3.4. Estimated alpha

Table 2 describes the result on estimated alpha, with and without controlling for backfill

bias. As can be seen, controlling for backfill bias is important. Without the control, the average

alpha for all funds is 0.39% per month, while with the control the average alpha shrinks to 0.11%

per month. Hence, neglecting backfill bias would inflate the average alpha estimate more than threefold.

Backfill bias appears particularly strong for defunct funds. The average alpha drops from

0.32% without the control to virtually zero after the adjustment. It turns out that the observed

positive alpha for defunct funds is mostly concentrated in the backfill period. This result suggests

that those funds that had no skill but joined the database after good incubation returns are more

likely to fail. Thus, the superior performance observed for early months is likely to reflect a

backfill bias.15 In addition, live funds substantially outperform defunct funds, confirming the

importance of controlling for the survivorship bias. The alpha estimates and their standard errors,

after adjusting for backfill bias, are used in our later analysis as the main inputs to make

inferences about hedge fund skill.

15 Aggarwal and Jorion (2010) find that, after adjusting for data biases, emerging hedge funds exhibit strong

performance, which may reflect new funds’ incentive to perform well.

4. Empirical Results

This section reports the main results from our empirical analysis. We first present the

estimates of the parameters governing the different skill distributions. We then explore the

performance persistence based on our performance measure. Finally, we compare the out-of-

sample performance of an investment strategy based on our performance measure with that of

alternative strategies based on estimated alpha and its t-statistic.

4.1. The distributions of fund skill

As we do not observe the number of skill groups in data, we estimate our model using

different values of 𝐽 = 2, 3, 4, 5, 6, 7 and then compare model fit using the BIC. The BIC result

indicates four skill groups in our sample. The BIC values for 3, 4, and 5 groups are 16927.29, 16911.10,

and 16956.31, respectively. The BIC values for other numbers of groups are even greater than that for

five groups. For robustness, we conduct likelihood ratio tests of the nulls H0: J = 3 and H0: J = 5

against H1: J = 4. Both tests reject the null at the 1% significance level, suggesting that

four groups fit the data significantly better than the neighboring specifications.
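
The model-selection step can be sketched in a few lines. The sketch assumes the standard Schwarz (1978) definition, BIC = −2 ln L + k ln n, under which lower values are preferred; the three BIC figures are those reported above, and the function names are hypothetical.

```python
import math

def bic(log_likelihood, n_params, n_obs):
    """Schwarz (1978) criterion: lower values indicate a better trade-off."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)

def best_group_count(bic_by_groups):
    """Return the number of groups with the smallest BIC."""
    return min(bic_by_groups, key=bic_by_groups.get)

# BIC values reported in the text for J = 3, 4, 5.
reported = {3: 16927.29, 4: 16911.10, 5: 16956.31}
print(best_group_count(reported))  # 4
```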

Table 3 reports the parameter estimates. Since two skill groups have positive mean alpha,

we refer to the groups as “Excellent” (𝜇̂ = 0.72% per month), “Good” (𝜇̂ = 0.35%), “Neutral” (𝜇̂

= 0% by assumption), and “Bad” (𝜇̂ = −0.80%). The estimates of 𝜋𝑗 suggest the composition of

funds is 9.3% excellent, 38.4% good, 43.0% neutral, and 9.3% bad. For investing in hedge funds

in practice, the focus should be on the excellent group, which has the highest mean skill. The estimated

variability of skill, i.e., 𝜎̂𝑗, of the excellent and bad groups is higher than that of the two middle

groups, suggesting that funds with extreme skill have less in common and are more idiosyncratic.

This is consistent with Sun, Wang, and Zheng (2012), who find that hedge funds with

return patterns distinct from those of peer funds tend to perform better. Equation (4)

decomposes the total variation in fund skill into the within-group variation and fund-specific

estimation error. We find that the estimated within-group variations 𝜎̂𝑗 are of the same order of

magnitude as the fund-specific estimation errors reported in Table 2, which suggests that both

are important determinants of the total variation in estimated alphas.
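
To make the role of the two variance components concrete, the following sketch computes a fund's conditional probability of belonging to each group. It assumes, in the spirit of the decomposition in Equation (4), that a fund's estimated alpha within group j is normal with mean 𝜇𝑗 and variance equal to the within-group variance plus the fund's squared standard error. The group parameters are taken from Table 3; the functional form, function names, and example funds are illustrative assumptions, not the authors' implementation.

```python
import math

def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def group_probabilities(alpha_hat, std_err, groups):
    """Conditional probability of each group given one fund's alpha and standard error."""
    weights = {name: pi * normal_pdf(alpha_hat, mu, sigma ** 2 + std_err ** 2)
               for name, (mu, sigma, pi) in groups.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

# (mean, variability, fraction) from Table 3, alpha in percent per month.
TABLE_3 = {
    "Excellent": (0.722, 0.362, 0.093),
    "Good": (0.348, 0.253, 0.384),
    "Neutral": (0.000, 0.177, 0.430),
    "Bad": (-0.804, 0.596, 0.093),
}

# Two hypothetical funds with the same estimated alpha but different precision.
precise = group_probabilities(1.0, 0.2, TABLE_3)
noisy = group_probabilities(1.0, 1.5, TABLE_3)
print(round(precise["Excellent"], 3), round(noisy["Excellent"], 3))
```

Under these assumptions, the noisier estimate is pulled toward the more populous middle groups, so the same estimated alpha yields a lower probability of being excellent.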

The fraction of funds with positive skill from our estimation is significantly higher than

that judged by the t-statistic. Based on our method, 48% of the sample funds belong to either the

Excellent or the Good group, whereas only 20.3% of the funds have a t-statistic greater than 1.65

(see Table 2). The false discovery rate based on the Barras, Scaillet, and Wermers (2010) method

is 3.7%, so after adjusting for the false discovery problem, the fraction of skilled funds inferred

by the t-statistic would be even smaller (i.e., 20.3%–3.7%=16.6%) at the size of 10%. Thus, the

result suggests that accounting for imperfect test power (i.e., allowing skilled funds to have bad

luck) is important.

We employ simulations to assess the statistical significance of our parameter estimates.

We construct 1,000 artificial samples by drawing from the original sample with replacement. We

estimate the parameters in each artificial sample, and then calculate standard errors as the

standard deviation of the parameter estimates across the simulations. The bootstrapped standard

errors, reported in parentheses in Table 3, suggest that our parameters are estimated with reasonable

precision. For instance, in the excellent group 𝜇̂𝐸, 𝜎̂𝐸, and 𝜋̂𝐸 are each roughly three standard

errors above zero.
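
The resampling scheme described above can be sketched as follows. For brevity the estimator here is the sample mean of hypothetical alphas, standing in for the full re-estimation of the mixture parameters in each artificial sample; the function name `bootstrap_se` and the simulated data are assumptions.

```python
import random
import statistics

def bootstrap_se(sample, estimator, n_boot=1000, seed=0):
    """Standard deviation of an estimator across resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(sample)
    estimates = [estimator([sample[rng.randrange(n)] for _ in range(n)])
                 for _ in range(n_boot)]
    return statistics.stdev(estimates)

# Hypothetical cross-section of estimated alphas (percent per month).
rng = random.Random(1)
alphas = [rng.gauss(0.11, 1.04) for _ in range(500)]

se_mean = bootstrap_se(alphas, statistics.mean)
print(f"bootstrap SE of the mean alpha: {se_mean:.3f}")
```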

In sum, using a modified EM algorithm, we estimate skill distributions for hedge funds.

Our results suggest that a significant portion of hedge funds have skill more than just covering

their fees. The finding is in sharp contrast with previous results (e.g., Barras, Scaillet, and

Wermers, 2010; Fama and French, 2010) for mutual funds where few funds are found to deliver

alpha after fees.

4.2. Performance persistence

Now, we address another important question: Does superior performance persist in hedge

funds?16 In our setting, testing performance persistence is important because it can validate our

grouping technique. If our grouping contains no information about fund skill, then there would

be no persistence in performance identified by our method. Otherwise, if our method identifies

skilled funds with superior performance, we expect a certain level of performance persistence.

We examine performance persistence using portfolios formed in rolling windows. In each

month starting from January 1996, we estimate the group parameters {𝜇𝑗, 𝜎𝑗, 𝜋𝑗} and each fund’s

performance measure 𝑃𝑗 from the previous 24 months. Then, we assign funds into one of four

skill-based portfolios—Excellent, Good, Neutral, or Bad. Specifically, each fund receives four

conditional probabilities (i.e., 𝑃𝑗’s) corresponding to the different skill groups. In each of the

rolling 24-month subperiods, for the fraction of each skill group estimated by our algorithm in

that subperiod, we assign that fraction of funds that show the highest conditional probability of

belonging to that group.17 For example, if 10% of the funds are excellent in a subperiod

according to our estimation procedure, then we assign the 10% of funds with the highest

16 Prior research shows mixed evidence about performance persistence in hedge funds. Brown, Goetzmann, and

Ibbotson (1999) and Agarwal and Naik (2000) find little support for performance persistence, while Kosowski,

Naik, and Teo (2007) and Jagannathan, Malakhov, and Novikov (2010) document significant evidence of

performance persistence.

17 In reality, some years may have the number of skill groups different from four. However, assuming four groups

for the whole sample facilitates the presentation of the results. For robustness, we allow the group number to change

over time, and our inference about performance persistence is unchanged.

conditional probability of being excellent 𝑃𝐽 to the excellent group. This way, we assign funds

into each group sequentially so that no fund will be assigned into multiple groups. As a result,

four equal-weighted portfolios are formed with funds out of these groups. The portfolios are

rebalanced monthly and held for different periods from three months to three years. The group

parameters are re-estimated each month so that we only use information up to the month of

portfolio formation. Funds that disappear during a holding period are included in the equal-

weighted portfolio until they disappear, and then their weights are reallocated to the remaining

funds. In practice, it may not be realistic to immediately invest into these portfolios after

formation, so we insert a one-month waiting window between the formation period and the

holding period.
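
The sequential assignment described above can be sketched as follows. The order in which groups are filled, the rounding of each group's quota, and the rule that the last group receives all remaining funds are assumptions made for illustration; the fund identifiers and probabilities are made up.

```python
def assign_groups(probs, fractions, order=("Excellent", "Good", "Neutral", "Bad")):
    """Assign each fund to exactly one group.

    probs:     {fund_id: {group: conditional probability}}
    fractions: {group: estimated share of funds in that group}
    Groups are filled in `order`; the last group receives all remaining funds.
    """
    unassigned = set(probs)
    assignment = {}
    n_funds = len(probs)
    for i, group in enumerate(order):
        if i == len(order) - 1:
            chosen = list(unassigned)
        else:
            quota = round(fractions[group] * n_funds)
            # Highest conditional probability first; ties broken by fund id.
            ranked = sorted(unassigned, key=lambda f: (-probs[f][group], f))
            chosen = ranked[:quota]
        for fund in chosen:
            assignment[fund] = group
        unassigned -= set(chosen)
    return assignment

# Ten hypothetical funds with made-up conditional probabilities.
probs = {}
for k in range(10):
    p_exc = k / 10.0                      # fund9 looks most "Excellent"
    rest = (1.0 - p_exc) / 3.0
    probs[f"fund{k}"] = {"Excellent": p_exc, "Good": rest, "Neutral": rest, "Bad": rest}

fractions = {"Excellent": 0.1, "Good": 0.4, "Neutral": 0.4, "Bad": 0.1}
groups = assign_groups(probs, fractions)
print(groups["fund9"])  # Excellent
```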

Table 4 presents strong evidence of performance persistence. The out-of-sample alpha of

the excellent portfolio is both economically and statistically significant. For example, for a 12-

month holding period, the excellent portfolio has an alpha of 0.46% per month (t-statistic =

6.47), or about 5.52% per year. Further, the excellent portfolio outperforms the other portfolios

significantly for as long as three years. The alpha spread between the excellent and the bad

portfolios is about 0.48% per month (t-statistic = 5.69) for a 12-month holding period. The result

of performance persistence suggests that our method separates funds with different skill levels well.

4.3. Comparing the 𝑷𝑱 measure with estimated alpha and t-statistic

Next, is our measure 𝑃𝐽 better at identifying skilled funds than the conventional measures

of estimated alpha and its t-statistic? A priori, we have good reasons to believe so. First, using

estimated alpha alone omits important information about estimation precision. Second, though

the t-statistic equals estimated alpha multiplied by its precision (i.e., the inverse of the standard

error of estimated alpha), it does not differentiate the contribution from the two components. As

a result, the t-statistic only tells whether we can reject the null hypothesis of zero skill, but it

cannot be used to rank individual fund skill. In contrast, our performance measure adjusts for

precision of estimated alpha more efficiently by weighing the prior information and estimated

alpha.
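
A small numerical illustration of the second point, using two hypothetical funds: because the t-statistic is the product of estimated alpha and its precision, very different combinations of the two produce the same value.

```python
def t_statistic(alpha_hat, std_err):
    """t-statistic = estimated alpha times its precision (1 / standard error)."""
    return alpha_hat * (1.0 / std_err)

# Two hypothetical funds (alpha in percent per month).
fund_a = {"alpha": 0.30, "se": 0.10}   # small alpha, precisely estimated
fund_b = {"alpha": 1.20, "se": 0.40}   # large alpha, noisily estimated

t_a = t_statistic(fund_a["alpha"], fund_a["se"])
t_b = t_statistic(fund_b["alpha"], fund_b["se"])
print(round(t_a, 6), round(t_b, 6))
```

Both funds have a t-statistic of about 3, so the statistic alone cannot say which of the two has the higher skill.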

Here, we make a comparison based on actual data. In particular, since most funds of

hedge funds (FOFs) invest in about 20-80 hedge funds (Brown, Gregoriou, and Pascalau, 2012),

we form a strategy by selecting top 20 funds based on the 𝑃𝐽 measure, and compare its out-of-

sample performance with that of alternative strategies picking top 20 funds based on estimated

alpha and t-statistic. Similar to the procedure in Table 4, in each month starting from January

1996, we compute the performance measure 𝑃𝐽 for each fund from the previous 24 months.

Then, we form an equal-weighted portfolio of investing in top 20 funds ranked by 𝑃𝐽.18 In a

similar way, we form two other equal-weighted portfolios by selecting top 20 funds based on

estimated alpha and t-statistic from the previous 24 months. The three portfolios are all

rebalanced monthly and held for different periods.
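
The selection step can be sketched as below. The helper names and the simulated cross-section are hypothetical; only the alpha and t-statistic rankings are shown, and a ranking by the 𝑃𝐽 measure would simply be a third score dictionary passed to the same function.

```python
import random

def top_n(scores, n=20):
    """Ids of the n funds with the highest score (ties broken by id)."""
    return set(sorted(scores, key=lambda f: (-scores[f], f))[:n])

def overlap_fraction(selected_a, selected_b):
    """Share of funds in selected_a that also appear in selected_b."""
    return len(selected_a & selected_b) / len(selected_a)

# Hypothetical cross-section: (estimated alpha, standard error) per fund.
rng = random.Random(42)
funds = {f"fund{i:03d}": (rng.gauss(0.1, 1.0), rng.uniform(0.2, 1.5)) for i in range(200)}

by_alpha = top_n({f: a for f, (a, se) in funds.items()})
by_tstat = top_n({f: a / se for f, (a, se) in funds.items()})
common = overlap_fraction(by_alpha, by_tstat)
print(f"funds common to both top-20 lists: {common:.0%}")
```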

Table 5 reports the out-of-sample performance for the three strategies. The strategy based

on the 𝑃𝐽 measure significantly outperforms the other two for up to 24 months. For example, for

a six-month holding period, the portfolio of top funds ranked by our 𝑃𝐽 measure generates a risk-

adjusted return of 0.72% (t-statistic = 7.36), whereas the other two strategies yield 0.45% (t-

18 Note that if we hold the portfolio for multiple months, the actual number of funds held will exceed 20 since some

of the top 20 funds in one month will not stay in top 20 in later months. For example, for a three-month holding

period, the number of funds held in the portfolio will fall in the range of 20-60, depending on the transition of top

funds over time. We report the transition probabilities for the hedge funds in our sample in the next section.

statistic = 2.52) and 0.35% (t-statistic = 6.03), respectively. In untabulated tests, we find that the

strategy based on the 𝑃𝐽 measure selects quite different funds from the other two strategies.19

This confirms that our 𝑃𝐽 measure is substantially different from both estimated alpha and its t-

statistic, and it provides more accurate information about fund skill.

Hedge funds with longer lockup periods generally have better performance (Aragon,

2007; Agarwal, Daniel and Naik, 2009). Hence, we are concerned that the 𝑃𝐽 measure may

simply select funds with long lockup periods that restrict money redemption. As a robustness

check, in Panel B of Table 5, we repeat the analysis by removing the funds with lockup periods

longer than three months, and our inference is unchanged. Thus, the superior performance of

𝑃𝐽 is not driven by funds with long lockup periods.

5. Additional Tests and Robustness Checks

In this section, we conduct additional tests to gain further insights about hedge fund skill

as well as check the robustness of our results. We start with examining the transition

probabilities across the skill groups. Next, we test the relation between fund skill and investor

flows. Then, we link fund skill to fund characteristics. Finally, we examine the sensitivity of our

results to alternative factor models.

19 In fact, the fraction of common funds selected by the 𝑃𝐽 measure and t-statistic is 48.5% for the sample, and the

fraction of common funds selected by 𝑃𝐽 and estimated alpha is only 25.7%.

5.1. Transition probabilities across skill groups

In Table 6, we present the transition probabilities across the skill groups. For each month

we assign funds into one of four skill groups based on their 𝑃𝐽 measures from the previous 24

months. We then check how likely funds in each group are to remain in that group after 3, 6, and 12

months, conditional on fund survival. As shown in Panel A, after three months, about 59% of

excellent funds remain excellent and 71% of good funds remain good.

Furthermore, it is highly unlikely for an excellent fund in the current month to become a neutral

or bad fund in the future. As discussed above, it is natural to expect some decay in skill as

informational advantages erode over time. On the other hand, most bad funds either remain bad

or improve to become neutral if they continue to survive.
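
A transition matrix of this kind can be computed as in the following sketch, which conditions on survival by dropping funds that have no label at the later date. The function name and the six example funds are hypothetical.

```python
from collections import Counter

GROUPS = ("Excellent", "Good", "Neutral", "Bad")

def transition_matrix(labels_now, labels_later):
    """Row-normalised transition frequencies, conditional on fund survival."""
    counts = {g: Counter() for g in GROUPS}
    for fund, g_now in labels_now.items():
        if fund in labels_later:          # drop funds that did not survive
            counts[g_now][labels_later[fund]] += 1
    matrix = {}
    for g in GROUPS:
        total = sum(counts[g].values())
        matrix[g] = {h: (counts[g][h] / total if total else float("nan")) for h in GROUPS}
    return matrix

# Hypothetical labels for six funds now and three months later (f6 disappeared).
now = {"f1": "Excellent", "f2": "Excellent", "f3": "Good",
       "f4": "Good", "f5": "Bad", "f6": "Bad"}
later = {"f1": "Excellent", "f2": "Good", "f3": "Good", "f4": "Good", "f5": "Neutral"}

P = transition_matrix(now, later)
print(P["Excellent"]["Excellent"])  # 0.5
```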

5.2. Fund skill and investor flows

How do investors affect and respond to fund skill? Given the persistence of performance,

can investors infer fund skill from past performance? We answer the questions by examining the

relation between fund skill and both prior and subsequent investor flows. As before, we use a 24-

month period to evaluate fund skill. Then, we examine fund flows subsequent to the evaluation

period as well as prior to the period. Following prior research (e.g., Sirri and Tufano, 1998), we

measure fund flows as the percentage change of fund total assets adjusting for fund returns.
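
The text does not spell out the flow formula; a common definition in the spirit of Sirri and Tufano (1998), assumed here for illustration, is the growth in assets net of the fund's return, scaled by beginning-of-period assets.

```python
def net_flow(assets_prev, assets_now, fund_return):
    """Net flow as a fraction of prior assets: (A_t - A_{t-1} * (1 + r_t)) / A_{t-1}."""
    return (assets_now - assets_prev * (1.0 + fund_return)) / assets_prev

# Hypothetical fund: assets grow from 100 to 112 while the fund earns 5%.
flow = net_flow(100.0, 112.0, 0.05)
print(f"{flow:.2%}")  # 7.00%
```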

Table 7 reports the results. First, investor flows chase past fund performance, as indicated

by a significantly higher level of money flowing into recent excellent and good funds than into

the other two groups. This is consistent with prior findings of Goetzmann, Ingersoll and Ross

(2003) and Getmansky, Liang, Schwarz, and Wermers (2010), suggesting that hedge fund

investors infer managerial skill from past fund performance. Meanwhile, the level of money

flows prior to the evaluation period is similar across the skill groups.

5.3. Skill type and fund characteristics

Table 8 relates the skill types to fund characteristics by regressing the performance

measures (i.e., conditional probabilities) on various fund characteristics. Funds with high

probability of being excellent or good tend to be large funds. These funds charge high

management and incentive fees and have long lockup and redemption notice periods. On the

other hand, neutral and bad funds seem unable to retain capital and have small fund size; they

also charge less incentive fee, perhaps indicating a lack of confidence in adding value. The

results are consistent with prior studies (e.g., Aragon, 2007; Agarwal, Daniel and Naik, 2009).

Since our approach separates funds into different skill groups, our analysis has the richness to

check the relation between each skill type and fund characteristics, rather than examining the

association between one performance measure (e.g., estimated alpha) and fund characteristics. In

untabulated tests, we assign funds into four skill groups and perform probit regressions and the

main results are unchanged. Given the relationship between fund skill and fund characteristics,

incorporating fund characteristics into the estimation procedure is a natural extension, which we

leave for future research.

5.4. Fund skill by investment styles

As different investment styles may have different skill levels, we now examine the

skill distribution within each investment style. Since the two databases (TASS and HFR) use

somewhat overlapping but not identical classifications for investment strategies, we follow

Agarwal, Daniel and Naik (2009) to reclassify the funds into four broad styles: directional trade,

relative value, security selection, and multiple strategies. We exclude 479 funds from the analysis,

as their strategy information is either undefined or missing. Table 9 reports the results. Overall,

we observe a certain extent of variation in skill distribution across the styles.20 The directional

trade strategy group has the highest fraction (14.3%) of excellent funds and the skill distribution

among the remaining strategies tends to be similar in general. Meanwhile, we find that the

variability of skill in the directional trade style is also higher than that for the other styles.

Finally, the result also suggests that our skill types do not simply reflect the difference in

investment styles, since no single style seems to dominate others in terms of skill distribution.

5.5. Alternative factor models

So far, we have estimated fund alpha using the Fung and Hsieh seven-factor model. In

untabulated tests, we confirm the robustness of our results to alternative factor models. First,

Agarwal and Naik (2004) show that returns of several hedge fund strategies bear significant

exposure to factors built on returns on S&P 500 index options. We augment the Fung-Hsieh

factor model with two out-of-the-money call and put option factors proposed by Agarwal and Naik

(2004). Second, given their dynamic strategies, hedge funds’ risk exposures can vary over time.

To control for the potential impact of time-varying risk exposures on alpha estimate, we use the

Ferson and Schadt (1996) conditional model in which funds’ market beta varies with lagged

macro variables such as the three-month T-bill rate, a term spread, a default spread, and the

20 The variability of alpha and bootstrap standard errors for the parameters are not reported in the table to conserve

space, but they are available upon request.

dividend yield of the S&P index. Third, Getmansky, Lo, and Makarov (2004) show that hedge

fund returns exhibit substantial serial correlation. Serial correlation in returns can bias the

estimate of risk exposure (e.g., Scholes and Williams, 1977). In the spirit of Scholes and

Williams, we alleviate this concern by adding the one- and two-month lagged market returns to

the base model. Overall, our inference remains unchanged by using these alternative models. The

test details are untabulated to conserve space but available upon request.
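
As an illustration of the third check, the sketch below builds the market regressors with one- and two-month lags. The function name and the return series are hypothetical; in an actual regression the first two months of fund returns would be dropped so that the dependent variable lines up with these rows.

```python
def add_market_lags(market, n_lags=2):
    """Rows of [m_t, m_{t-1}, ..., m_{t-n_lags}] for every t with a full history."""
    return [[market[t - k] for k in range(n_lags + 1)]
            for t in range(n_lags, len(market))]

market = [1.0, -2.0, 0.5, 3.0, -1.5]   # hypothetical monthly market excess returns
rows = add_market_lags(market)
print(rows[0])  # [0.5, -2.0, 1.0]
```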

6. Conclusions

In this paper, we present a new approach as well as empirical evidence about hedge fund

skill. Our study is motivated by the existence of skill and its heterogeneity among hedge funds.

Our method groups funds by skill based on their estimated alphas and standard errors, allowing

for the influence of luck. By assuming each fund comes from one of several distinct skill groups,

we use a modified EM algorithm to infer the number of skill groups, the fraction of funds in each

group, and the conditional probability that a fund belongs to each group.

Using data on monthly returns of a large sample of hedge funds, we find that about 48%

of the funds have positive skill, 43% zero skill, and 9% negative skill. Further, an investment

strategy based on our performance measure generates highly significant out-of-sample risk-

adjusted performance. We also show that our performance measure outperforms traditional

measures of estimated alpha and t-statistic in identifying skilled funds. Our inference is robust to

a wide array of sensitivity checks.

Our approach is flexible about the number of skill groups, which need not be restricted to

any particular number. Also, the only inputs required in our approach are estimated alpha and its

estimation error. Given these features, our approach can be applied to detect skill for other types

of investment vehicles such as private equity funds and other forms of skill such as market

timing ability, as long as skill is heterogeneous and so multiple skill groups exist.

References

Ackermann, Carl, Richard McEnally, and David Ravenscraft, 1999, The performance of hedge

funds: Risk, return, and incentives, Journal of Finance 54, 833–874.

Agarwal, Vikas and Narayan Naik, 2000, Multi-period performance persistence analysis of

hedge funds, Journal of Financial and Quantitative Analysis 35, 327–342.

Agarwal, Vikas and Narayan Naik, 2004, Risk and portfolio decisions involving hedge funds,

Review of Financial Studies 17, 63–98.

Agarwal, Vikas, Naveen Daniel, and Narayan Naik, 2009, Role of managerial incentives and

discretion in hedge fund performance, Journal of Finance 64, 2221−2256.

Aggarwal, Rajesh and Philippe Jorion, 2010, The performance of emerging hedge funds and

managers, Journal of Financial Economics 96, 238–256.

Aragon, George, 2007, Share restrictions and asset pricing: Evidence from the hedge fund

industry, Journal of Financial Economics 83, 33–58.

Asquith, Daniel, Jonathan Jones, and Robert Kieschnick, 1998, Evidence on price stabilization

and underpricing in early IPO returns, Journal of Finance 53, 1759–1773.

Bali, Turan, Stephen Brown, and Mustafa Caglayan, 2011, Do hedge funds’ exposures to risk

factors predict their future returns? Journal of Financial Economics 101, 36–68.

Barras, Laurent, Olivier Scaillet, and Russ Wermers, 2010, False discoveries in mutual fund

performance: Measuring luck in estimated alphas, Journal of Finance 65, 179–216.

Brown, Stephen and William Goetzmann, 1997, Mutual fund styles, Journal of Financial

Economics 43, 373–399.

Brown, Stephen, William Goetzmann, and Roger Ibbotson, 1999, Offshore hedge funds:

Survival and performance, 1989-95, Journal of Business 72, 91–117.

Brown, Stephen, William Goetzmann, Roger Ibbotson, and Stephen Ross, 1992, Survivorship

bias in performance studies, Review of Financial Studies 5, 553–580.

Brown, Stephen, Greg Gregoriou and Razvan Pascalau, 2012, Diversification in funds of hedge

funds: Is it possible to overdiversify? Review of Asset Pricing Studies 2, 89–110.

Dempster, Arthur, Nan Laird, and Donald Rubin, 1977, Maximum likelihood from incomplete

data via the EM algorithm, Journal of the Royal Statistical Society Series B 39, 1–38.

Evans, Richard, 2010, Mutual fund incubation, Journal of Finance 65, 1581–1611.

Fama, Eugene and Kenneth French, 2010, Luck versus skill in the cross-section of mutual fund

returns, Journal of Finance 65, 1915–1947.

Ferson, Wayne and Yong Chen, 2015, How many good and bad fund managers are there, really?,

Working paper, University of Southern California and Texas A&M University.

Ferson, Wayne and Rudi Schadt, 1996, Measuring fund strategy and performance in changing

economic conditions, Journal of Finance 51, 425–460.

Fung, William and David Hsieh, 2000, Performance characteristics of hedge funds and

commodity funds: Natural vs spurious biases, Journal of Financial and Quantitative Analysis 35,

291–307.

Fung, William and David Hsieh, 2001, The risk in hedge fund strategies: Theory and evidence

from trend followers, Review of Financial Studies 14, 313–341.

Fung, William and David Hsieh, 2004, Hedge fund benchmarks: A risk-based approach,

Financial Analysts Journal 60, 65–80.

Getmansky, Mila, Bing Liang, Chris Schwarz, and Russ Wermers, 2011, Share restrictions and

investor flows in the hedge fund industry, working paper, University of Massachusetts and

University of Maryland.

Getmansky, Mila, Andrew Lo, and Igor Makarov, 2004, An econometric model of serial

correlation and illiquidity in hedge fund returns, Journal of Financial Economics 74, 529–609.

Goetzmann, William, Jonathan Ingersoll, and Stephen Ross, 2003, High-water marks and hedge

fund management contracts, Journal of Finance 58, 1685–1717.

Jagannathan, Ravi, Alexey Malakhov, and Dmitry Novikov, 2010, Do hot hands exist among

hedge fund managers? An empirical evaluation, Journal of Finance 65, 217–255.

Jones, Christopher and Jay Shanken, 2005, Mutual fund performance with learning across funds,

Journal of Financial Economics 78, 507–552.

Kon, Stanley, 1984, Models of stock returns: a comparison, Journal of Finance 39, 147–165.

Kosowski, Robert, Narayan Naik, and Melvyn Teo, 2007, Do hedge funds deliver alpha? A

Bayesian and bootstrap analysis, Journal of Financial Economics 84, 229–264.

Kosowski, Robert, Alan Timmermann, Halbert White, and Russ Wermers, 2006, Can mutual

fund “stars” really pick stocks? New evidence from a bootstrap analysis, Journal of Finance 61,

2551–2595.

Li, Haitao, Xiaoyan Zhang, and Rui Zhao, 2011, Investing in talents: Manager characteristics

and hedge fund performances, Journal of Financial and Quantitative Analysis 46, 59–82.

Liang, Bing, 1999, On the performance of hedge funds, Financial Analysts Journal 55, 72–85.

Liang, Bing, 2000, Hedge funds: The living and the dead, Journal of Financial and Quantitative

Analysis 35, 309–326.

McLachlan, Geoffrey and Thriyambakam Krishnan, 2008, The EM algorithm and extensions

(2nd edition), John Wiley & Sons, New York.

McLachlan, Geoffrey and David Peel, 2000, Finite mixture models, John Wiley & Sons, New

York.

Ruud, Paul, 1991, Extensions of estimation methods using the EM algorithm, Journal of

Econometrics 49, 305–341.

Scholes, Myron and Joseph Williams, 1977, Estimating betas from nonsynchronous data,

Journal of Financial Economics 5, 309–328.

Schwarz, Gideon, 1978, Estimating the dimension of a model, Annals of Statistics 6, 461–464.

Sirri, Eric and Peter Tufano, 1998, Costly search and mutual fund flows, Journal of Finance 53,

1589–1622.

Sun, Zheng, Ashley Wang, and Lu Zheng, 2012, The road less traveled: Strategy distinctiveness

and hedge fund performance, Review of Financial Studies 25, 96–143.

Titman, Sheridan and Cristian Tiu, 2011, Do the best hedge funds hedge? Review of Financial

Studies 24, 123–168.

Wu, C. F. Jeff, 1983, On the convergence properties of the EM algorithm, Annals of Statistics

11, 95–103.

Table 1

Summary statistics

This table summarizes the sample of 8,695 hedge funds. Each fund is required to have at least 24 monthly

returns. T is the number of monthly observations for each fund. Backfill is the length of the backfill

period in months. 𝑟̅ is the average monthly return, and 𝜎 is the standard deviation of fund returns. 𝜌1 and

𝜌2 are the first- and second-order autocorrelations of fund returns, respectively. The sample period is

from January 1994 to December 2011.

                   Mean   Median  Std Dev      10%      25%      75%      90%
T                    73       60       43       30       40       94      140
Backfill          28.95    21.00    26.63     3.00     9.00    40.00    72.00
𝑟̅                  0.70     0.65     0.69    −0.01     0.33     1.01     1.48
𝜎                  3.44     3.15     1.83     1.30     2.03     4.61     6.02
Skewness          −0.20    −0.10     1.08    −1.40    −0.66     0.37     0.86
Ex. Kurtosis       2.83     1.44     4.77    −0.15     0.44     3.33     6.92
𝜌1                 0.15     0.14     0.21    −0.10     0.01     0.27     0.42
𝜌2                 0.06     0.05     0.18    −0.16    −0.06     0.17     0.29

Table 2

Estimated alphas and standard errors

This table summarizes estimated alphas and their standard errors from the sample funds based on the Fung-Hsieh seven-factor model. Alpha is

estimated as the intercept from the regression of fund excess returns on the factors. Panel A does not control for backfill bias, while Panel B

controls for backfill bias by including a dummy variable for each fund’s backfill period in the factor model regression. We report mean, median

and standard deviation of estimated alphas, as well as the percentage of funds exceeding the specified thresholds of t-statistic. Alpha is in percent

per month. N is the number of funds. The sample period is from January 1994 to December 2011.

                         Estimated Alpha            Standard Error                       t-statistics
                  N     Mean   Median  Std Dev     Mean   Median  Std Dev  % t<–1.96  % t<–1.65   % t>1.65   % t>1.96

Panel A: Without control for backfill bias
All funds     8,695     0.39     0.35     0.72     0.43     0.36     0.26       1.98       3.08      35.80      29.21
Live HF       3,076     0.52     0.47     0.63     0.40     0.34     0.23       0.91       1.33      45.12      36.48
Defunct HF    5,619     0.32     0.29     0.75     0.44     0.37     0.26       2.56       4.04      30.70      25.24

Panel B: With control for backfill bias
All funds     8,695     0.11     0.16     1.04     0.59     0.48     0.38       4.51       6.60      20.30      15.41
Live HF       3,076     0.32     0.31     0.83     0.55     0.45     0.36       2.47       3.84      27.31      20.87
Defunct HF    5,619     0.00     0.07     1.12     0.60     0.49     0.39       5.62       8.12      16.46      12.42

Table 3

Parameter estimates for fund skill distributions

This table presents the parameter estimates for different skill groups in the sample funds. Estimated alpha

is from the Fung-Hsieh seven-factor model with control for backfill bias. The Bayesian information

criterion suggests a mixture of four skill groups that are labeled as Excellent, Good, Neutral, and Bad,

respectively. The parameters 𝜇̂𝑗, 𝜎̂𝑗 and 𝜋̂𝑗 are the mean, variability, and fraction of each skill group.

Alpha is in percent per month. Bootstrap standard errors are reported in parentheses. The sample period is

from January 1994 to December 2011.

                  𝜇̂𝑗         𝜎̂𝑗         𝜋̂𝑗
Excellent       0.722      0.362      0.093
               (0.218)    (0.128)    (0.029)
Good            0.348      0.253      0.384
               (0.132)    (0.119)    (0.036)
Neutral         0.000      0.177      0.430
               ( — )      (0.072)    (0.033)
Bad            −0.804      0.596      0.093
               (0.221)    (0.240)    (0.026)

Table 4

Performance persistence

This table reports the result of performance persistence. We present the out-of-sample alpha for equal-

weighted portfolios consisting of hedge funds in different skill groups. For each fund in each month from

January 1996 through December 2011, we compute the fund’s conditional probability of coming from

each of the four skill groups (Excellent, Good, Neutral, and Bad), given the estimates from the previous

24 months. Then, the fund is assigned into a skill group depending on its conditional probabilities. Next,

we form equal-weighted portfolios of funds in the four skill groups. The portfolios are rebalanced

monthly and held for different holding periods. The out-of-sample alpha is estimated using the Fung-

Hsieh factor model. Alpha is in percent per month.

3 mo. 6 mo. 9 mo. 12 mo. 24 mo. 36 mo.

Excellent alpha 0.555 0.531 0.496 0.456 0.400 0.380

t-stat 7.407 7.472 7.004 6.471 5.882 5.794

Good alpha 0.287 0.271 0.263 0.254 0.243 0.239

t-stat 4.903 4.659 4.549 4.419 4.263 4.252

Neutral alpha 0.031 0.051 0.065 0.077 0.090 0.088

t-stat 0.509 0.853 1.088 1.281 1.491 1.446

Bad alpha −0.151 −0.099 −0.061 −0.027 0.014 0.041

t-stat −1.983 −1.335 −0.824 −0.372 0.190 0.566

Excellent–Good alpha 0.268 0.260 0.232 0.202 0.157 0.141

t-stat 6.921 7.575 7.022 6.234 5.215 4.899

Excellent–Neutral alpha 0.525 0.480 0.430 0.379 0.310 0.292

t-stat 7.254 7.045 6.387 5.740 4.940 4.830

Excellent–Bad alpha 0.707 0.629 0.556 0.483 0.387 0.339

t-stat 7.314 7.061 6.349 5.692 4.749 4.275


Table 5

Out-of-sample performance comparison

This table reports the out-of-sample alpha for three equal-weighted portfolios consisting of top 20 hedge

funds ranked by the 𝑃𝐽 measure, estimated alpha, and t-statistic, respectively, from January 1996 through

December 2011. 𝑃𝐽 is a fund’s conditional probability of being Excellent given the parameter estimates

from the previous 24 months. The portfolios are rebalanced monthly and held for different holding

periods. The out-of-sample alpha is estimated using the Fung-Hsieh factor model. Similarly, we form two

equal-weighted portfolios of top 20 funds ranked by estimated alpha and its t-statistic from the previous

24 months. Panel A presents the results using top 20 funds, and Panel B presents the results using only

top 20 funds whose lockup periods are shorter than 3 months.

Panel A

3 mo. 6 mo. 9 mo. 12 mo. 24 mo. 36 mo.

𝑃𝐽 alpha 0.820 0.716 0.652 0.607 0.552 0.491

t-stat 8.02 7.36 6.75 6.33 5.87 5.34

Est. alpha alpha 0.430 0.446 0.395 0.361 0.340 0.377

t-stat 2.25 2.52 2.36 2.24 2.21 2.55

t-statistic alpha 0.360 0.352 0.341 0.342 0.357 0.359

t-stat 5.79 6.03 6.29 6.58 7.00 7.05

𝑃𝐽– Est. alpha alpha 0.389 0.270 0.256 0.246 0.212 0.113

t-stat 2.56 1.95 2.03 2.09 1.97 1.12

𝑃𝐽– t-statistic alpha 0.459 0.364 0.310 0.266 0.195 0.132

t-stat 4.42 3.77 3.33 2.93 2.24 1.59

Panel B

3 mo. 6 mo. 9 mo. 12 mo. 24 mo. 36 mo.

𝑃𝐽 alpha 0.806 0.711 0.666 0.626 0.555 0.497

t-stat 7.58 6.98 6.69 6.36 5.85 5.37

Est. alpha alpha 0.434 0.439 0.453 0.425 0.348 0.362

t-stat 2.23 2.43 2.65 2.57 2.25 2.43

t-statistic alpha 0.365 0.346 0.33 0.326 0.324 0.313

t-stat 5.83 5.57 5.46 5.55 5.60 5.53

𝑃𝐽– Est. alpha alpha 0.372 0.272 0.213 0.200 0.207 0.136

t-stat 2.58 2.05 1.81 1.80 2.05 1.45

𝑃𝐽– t-statistic alpha 0.441 0.365 0.336 0.299 0.231 0.184

t-stat 4.26 3.70 3.50 3.20 2.64 2.18


Table 6

Transition probabilities

This table reports transition probabilities across the four skill groups from the current month to the next 3,

6 and 12 months. In each month from January 1996 through December 2011, we use a rolling window of

the previous 24 months to evaluate fund skill and form four groups based on funds’ conditional

probabilities of being Excellent, Good, Neutral, and Bad. Then, for each skill group we report the portion

of its funds that are Excellent, Good, Neutral, or Bad in the next 3, 6, and 12 months.

Excellent Good Neutral Bad

Panel A: Next 3 months

Excellent 58.64% 38.54% 2.59% 0.23%

Good 9.92% 70.77% 18.60% 0.70%

Neutral 0.81% 18.63% 69.23% 11.33%

Bad 0.33% 3.29% 37.20% 59.18%

Panel B: Next 6 months

Excellent 45.29% 47.32% 6.65% 0.73%

Good 11.70% 60.45% 25.62% 2.23%

Neutral 2.13% 25.29% 58.89% 13.69%

Bad 0.85% 8.99% 45.17% 44.99%

Panel C: Next 12 months

Excellent 30.23% 50.84% 16.60% 2.34%

Good 12.12% 50.32% 32.10% 5.45%

Neutral 4.90% 31.88% 49.06% 14.16%

Bad 2.88% 19.75% 49.37% 28.00%


Table 7

Fund skill and investor flows

This table reports the relation between hedge fund skill and investor flows. We use a 24-month period to

evaluate fund skill and form four portfolios based on funds’ conditional probability of being Excellent,

Good, Neutral, and Bad. Then, the average fund flows prior and subsequent to the evaluation period are

reported for each skill portfolio. For example, when the evaluation period is January 2000–December

2001, “−1” denotes December 1999, “−12” denotes the year of 1999 from January to December, “+1”

denotes January 2002, and “+12” denotes the year of 2002 from January to December, respectively. Fund

flows are in percent (of fund total assets) per month. Newey-West t-statistics are reported in parentheses.

Prior flows (%/mo.) Subsequent flows (%/mo.)

–12 mo. –6 mo. –3 mo. –1 mo. +1 mo. +3 mo. +6 mo. +12 mo.

Excellent 0.236 0.363 0.420 0.480 1.216 1.179 0.976 0.693

Good 0.254 0.305 0.325 0.380 0.336 0.317 0.223 0.075

Neutral 0.163 0.189 0.221 0.329 −0.692 −0.712 −0.715 −0.701

Bad 0.205 0.262 0.306 0.403 −1.179 −1.162 −1.099 −1.061

Excel–Bad 0.031 0.101 0.114 0.076 2.395 2.341 2.075 1.755

(0.23) (0.93) (1.24) (0.89) (18.00) (17.22) (14.46) (16.51)


Table 8

Skill type and fund characteristics

This table reports the results from the panel regressions of the conditional probability (in percentage) of

being Excellent, Good, Neutral, and Bad on hedge fund characteristics. Management fee and Incentive fee

are the management fees and incentive fees, respectively. High-water mark dummy is an indicator

variable equal to one if the fund uses high water mark and zero otherwise. Lockup period is the capital

lockup period. Notice period is the advance notice period required for money redemption. Fund Age is the

age of the fund. Fund Asset is the monthly log asset. t-statistics, reported in parentheses, are calculated

using standard errors clustered at both fund and monthly levels.

Excellent Good Neutral Bad

Management fee (%) 0.91 1.07 −1.96 −0.02

(4.52) (3.15) (−4.98) (−0.14)

Incentive fee (%) 0.09 0.11 −0.15 −0.06

(3.56) (2.22) (−2.64) (−3.18)

High-water mark dummy −0.42 0.30 0.23 −0.11

(−1.41) (0.67) (0.46) (−0.59)

Lockup period (year) 0.80 0.78 −1.40 −0.18

(2.98) (2.75) (−3.76) (−1.67)

Notice period (year) 1.54 1.92 −2.85 −0.62

(3.04) (3.00) (−3.42) (−3.24)

Fund age 0.03 0.01 −0.13 0.09

(0.63) (0.10) (−1.58) (3.26)

Fund asset 0.58 1.49 −1.43 −0.64

(10.30) (14.15) (−12.62) (−15.19)

Fund flow 0.36 0.67 −0.74 −0.30

(12.23) (11.46) (−12.67) (−9.75)

Strategy effect Yes Yes Yes Yes

Adj. R2(%) 3.25 5.94 5.57 4.81


Table 9

Distribution parameters by investment style

This table reports parameter estimates for different skill groups across hedge fund investment styles:

directional trade (DT), security selection (SS), relative value (RV), and multiple strategy (MS). Estimated

alphas are from the Fung-Hsieh seven-factor model with control for backfill bias. The parameter 𝜇𝑗

denotes mean of alpha for each skill group, and 𝜋𝑗 is the fraction of each group. Alpha is in percent per

month.

DT SS RV MS

𝜇̂𝐸 0.967 0.618 0.923 0.890

𝜇̂𝐺 0.215 0.246 0.290 0.397

𝜇̂𝑁 0.000 0.000 0.000 0.000

𝜇̂𝐵 −0.204 −0.447 −0.554 −0.394

𝜋̂𝐸 0.143 0.065 0.057 0.056

𝜋̂𝐺 0.358 0.456 0.443 0.459

𝜋̂𝑁 0.353 0.388 0.414 0.374

𝜋̂𝐵 0.147 0.091 0.086 0.111

N 1,844 3,744 994 1,634


Figure 1. This figure shows three performance groups (solid lines) and the composite distribution (dashed

line). The distribution parameters are set as 𝜋𝐺 = 0.2, 𝜋𝑁 = 0.7, 𝜋𝐵 = 0.1; 𝜇𝐺 = 2%, 𝜇𝑁 = 0, 𝜇𝐵 =

−2%, and 𝜎𝑗 = 0.7% for all groups. The distributions are based on true alphas.


Figure 2. This figure shows the effects of two sources of variability in fund skill—within-group skill

variability 𝜔𝑖 and estimation error 𝑒𝑖. A fund with estimated alpha equal to 𝛼̂ could be from the zero-skill

group (solid line) if the combined effect 𝜔0 + 𝑒0 is positive, or from the positive mean alpha group

(dashed line) if the combined effect 𝜔1 + 𝑒1 is negative.


Appendix

In this appendix, we first describe our estimation procedure, and then we present

simulation results that confirm the validity of our approach. Finally, we provide some additional

evidence on the potential investment value of our method.

A.1 The estimation method

We index the estimated alphas for the N funds as 𝛼̂1, 𝛼̂2, …, 𝛼̂𝑁 and their corresponding standard errors as 𝑠1, 𝑠2, …, 𝑠𝑁. As described in Equation (4), the combined variance is (𝜎𝑖,𝑗)² = (𝑠𝑖)² + (𝜎𝑗)². Further, we use an indicator variable 𝑧𝑗 ∈ {0,1} for membership in group j, so that the unconditional probability that any fund belongs to group j is 𝑃(𝑧𝑗 = 1) = 𝜋𝑗. Next, we rewrite the marginal distribution for each 𝛼̂𝑖 as:

$$
f(\hat{\alpha}_i) = \sum_{j=1}^{J} \pi_j\,\phi(\hat{\alpha}_i;\mu_j,\sigma_{i,j}) = \sum_{j=1}^{J} P(z_j=1)\,P(\hat{\alpha}_i \mid z_j=1). \tag{A1}
$$

We use an Expectation-Maximization (EM) iteration, similar in spirit to Dempster, Laird, and Rubin (1977), to maximize the likelihood. The expectation step in the iteration computes the conditional probability that 𝛼̂𝑖 comes from group j, given the estimates of {𝜇𝑗, 𝜎𝑗, 𝜋𝑗} from the previous step:

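$$
z_{ij} = P(z_j=1 \mid \hat{\alpha}_i, s_i) = \frac{P(z_j=1)\,P(\hat{\alpha}_i \mid z_j=1)}{\sum_{k=1}^{J} P(z_k=1)\,P(\hat{\alpha}_i \mid z_k=1)} = \frac{\pi_j\,\phi(\hat{\alpha}_i;\mu_j,\sigma_{i,j})}{\sum_{k=1}^{J}\pi_k\,\phi(\hat{\alpha}_i;\mu_k,\sigma_{i,k})}. \tag{A2}
$$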
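As an illustration only, the expectation step in Equation (A2) can be sketched in Python with NumPy. The function names `normal_pdf` and `e_step` are hypothetical and are not taken from the paper; the sketch simply evaluates the ratio in (A2) for every fund and group, using the combined variance from Equation (4).

```python
import numpy as np

def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) evaluated at x (broadcasts over arrays)."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (np.sqrt(2.0 * np.pi) * sd)

def e_step(alpha_hat, s, mu, sigma, pi):
    """Return the N x J matrix of conditional probabilities z_ij from (A2).

    alpha_hat, s  : length-N estimated alphas and their standard errors
    mu, sigma, pi : length-J group means, variabilities, and fractions
    (All names are illustrative assumptions, not the authors' code.)
    """
    a = np.asarray(alpha_hat, dtype=float)[:, None]          # shape (N, 1)
    s = np.asarray(s, dtype=float)[:, None]
    mu, sigma, pi = (np.asarray(v, dtype=float)[None, :] for v in (mu, sigma, pi))
    sigma_ij = np.sqrt(s ** 2 + sigma ** 2)                  # combined std dev, Equation (4)
    weighted = pi * normal_pdf(a, mu, sigma_ij)              # numerator of (A2)
    return weighted / weighted.sum(axis=1, keepdims=True)    # divide by f(alpha_hat_i)
```

Calling `e_step` with the Table 3 estimates as `mu`, `sigma`, and `pi` would give, for each fund, four probabilities that sum to one; the column for the Excellent group corresponds to the 𝑃𝐽 measure used in Table 5.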

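The maximization step follows from the first-order conditions of the log-likelihood function. The log-likelihood of these N data points, augmented with a Lagrange multiplier 𝜆 that enforces the constraint that the fractions 𝜋𝑗 sum to one, is:

$$
L_N = \sum_{i=1}^{N} \ln f(\hat{\alpha}_i) + \lambda\Big(\sum_{j=1}^{J}\pi_j - 1\Big) = \sum_{i=1}^{N} \ln\Big[\sum_{j=1}^{J}\pi_j\,\phi(\hat{\alpha}_i;\mu_j,\sigma_{i,j})\Big] + \lambda\Big(\sum_{j=1}^{J}\pi_j - 1\Big). \tag{A3}
$$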


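Following the EM algorithm, the first-order conditions for 𝜇𝑗, 𝜎𝑗, and 𝜋𝑗 are:

$$
\begin{aligned}
\sum_{i=1}^{N} z_{ij}\,\frac{\mu_j-\hat{\alpha}_i}{(\sigma_{i,j})^2} &= 0,\\
\sum_{i=1}^{N} z_{ij}\left[\frac{(\mu_j-\hat{\alpha}_i)^2}{(\sigma_{i,j})^4}-\frac{1}{(\sigma_{i,j})^2}\right] &= 0,\\
\frac{1}{N}\sum_{i=1}^{N} z_{ij} &= \pi_j.
\end{aligned}
\tag{A4}
$$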

In the equations above, the mean 𝜇𝑗 is a weighted average of the funds' estimated alphas, where each fund is weighted by the precision of its estimate and by the probability that the fund comes from group j. The weight 𝑧𝑖𝑗 places more emphasis on funds that are deemed more likely to come from that group. The parameter 𝜋𝑗 is the average of all the conditional probabilities that the funds come from group 𝑗.
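To make the weighting concrete, solving the first condition in Equation (A4) for 𝜇𝑗 (holding 𝜎𝑖,𝑗 and 𝑧𝑖𝑗 fixed) gives a precision-weighted average, and the third condition gives 𝜋𝑗 directly. The sketch below is a minimal illustration under those assumptions; the function name `update_mu_pi` is hypothetical, and the sketch does not impose the paper's restrictions (a zero mean for the Neutral group, minimum spacing between group means, or a minimum group fraction).

```python
import numpy as np

def update_mu_pi(alpha_hat, s, sigma, z):
    """Illustrative M-step updates for mu_j and pi_j implied by (A4).

    alpha_hat, s : length-N estimated alphas and standard errors
    sigma        : length-J current within-group variabilities
    z            : N x J matrix of conditional probabilities z_ij
    """
    a = np.asarray(alpha_hat, dtype=float)[:, None]
    s = np.asarray(s, dtype=float)[:, None]
    sigma = np.asarray(sigma, dtype=float)[None, :]
    z = np.asarray(z, dtype=float)
    w = z / (s ** 2 + sigma ** 2)              # z_ij / sigma_ij^2: membership times precision
    mu = (w * a).sum(axis=0) / w.sum(axis=0)   # solves sum_i z_ij (mu_j - alpha_hat_i) / sigma_ij^2 = 0
    pi = z.mean(axis=0)                        # (1/N) sum_i z_ij
    return mu, pi
```

A fund with a large standard error 𝑠𝑖 receives a small weight in `w`, so a noisy alpha estimate moves the group mean very little even when the fund is likely to belong to that group.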

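However, the variance 𝜎𝑗², as part of (𝜎𝑖,𝑗)², is difficult to estimate directly, because it does not have a closed-form solution, unlike 𝜇𝑗 and 𝜋𝑗 from Equation (A4). To address this difficulty, we use the first-order condition from Equation (A3) and obtain

$$
\begin{aligned}
0 = \frac{dL_N}{d\sigma_j}
&= \frac{d\sum_{i=1}^{N}\ln f(\hat{\alpha}_i)}{d\sigma_j}
= \sum_{i=1}^{N}\frac{df(\hat{\alpha}_i)/d\sigma_j}{f(\hat{\alpha}_i)} \\
&= \sum_{i=1}^{N}\frac{d\Big(\sum_{k=1}^{J}\pi_k\int_{-\infty}^{+\infty}\phi(\hat{\alpha}_i;\alpha_i,s_i)\,\phi(\alpha_i;\mu_k,\sigma_k)\,d\alpha_i\Big)\Big/d\sigma_j}{f(\hat{\alpha}_i)} \\
&= \sum_{i=1}^{N}\frac{\pi_j\,d\Big(\int_{-\infty}^{+\infty}\phi(\hat{\alpha}_i;\alpha_i,s_i)\,\phi(\alpha_i;\mu_j,\sigma_j)\,d\alpha_i\Big)\Big/d\sigma_j}{f(\hat{\alpha}_i)} \\
&= \sum_{i=1}^{N}\frac{\pi_j\Big(-\frac{1}{\sigma_j}\int_{-\infty}^{+\infty}\phi(\hat{\alpha}_i;\alpha_i,s_i)\,\phi(\alpha_i;\mu_j,\sigma_j)\,d\alpha_i+\int_{-\infty}^{+\infty}\phi(\hat{\alpha}_i;\alpha_i,s_i)\,\phi(\alpha_i;\mu_j,\sigma_j)\,\frac{(\alpha_i-\mu_j)^2}{\sigma_j^3}\,d\alpha_i\Big)}{f(\hat{\alpha}_i)}.
\end{aligned}
\tag{A5}
$$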


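Further, we recognize that

$$
\begin{aligned}
&\int_{-\infty}^{+\infty}\phi(\hat{\alpha}_i;\alpha_i,s_i)\,\phi(\alpha_i;\mu_j,\sigma_j)\,d\alpha_i \\
&\quad= \int_{-\infty}^{+\infty}\frac{1}{\sqrt{2\pi}\,s_i}\,e^{-\frac{(\hat{\alpha}_i-\alpha_i)^2}{2s_i^2}}\,\frac{1}{\sqrt{2\pi}\,\sigma_j}\,e^{-\frac{(\alpha_i-\mu_j)^2}{2\sigma_j^2}}\,d\alpha_i \\
&\quad= \int_{-\infty}^{+\infty}\frac{1}{2\pi s_i\sigma_j}\,e^{-\frac{(\hat{\alpha}_i-\alpha_i)^2}{2s_i^2}-\frac{(\alpha_i-\mu_j)^2}{2\sigma_j^2}}\,d\alpha_i \\
&\quad= \int_{-\infty}^{+\infty}\frac{1}{2\pi s_i\sigma_j}\,e^{-\alpha_i^2\left(\frac{1}{2s_i^2}+\frac{1}{2\sigma_j^2}\right)+\alpha_i\left(\frac{\hat{\alpha}_i}{s_i^2}+\frac{\mu_j}{\sigma_j^2}\right)-\frac{(\hat{\alpha}_i)^2}{2s_i^2}-\frac{(\mu_j)^2}{2\sigma_j^2}}\,d\alpha_i \\
&\quad= \frac{e^{\frac{1}{2}\left(M^2-\frac{(\hat{\alpha}_i)^2}{s_i^2}-\frac{(\mu_j)^2}{\sigma_j^2}\right)}}{\sqrt{2\pi}\,\sigma_{i,j}}\int_{-\infty}^{+\infty}\frac{1}{\sqrt{2\pi}}\,e^{-\frac{(\beta_i-M)^2}{2}}\,d\beta_i \\
&\quad= \frac{1}{\sqrt{2\pi}\,\sigma_{i,j}}\,e^{-\frac{(\hat{\alpha}_i-\mu_j)^2}{2(s_i^2+\sigma_j^2)}} \\
&\quad= \phi(\hat{\alpha}_i;\mu_j,\sigma_{i,j}),
\end{aligned}
\tag{A6}
$$

where $\beta_i = \dfrac{\alpha_i\,\sigma_{i,j}}{s_i\,\sigma_j}$ and $M = \dfrac{\hat{\alpha}_i\,\sigma_j}{s_i\,\sigma_{i,j}} + \dfrac{\mu_j\,s_i}{\sigma_j\,\sigma_{i,j}}$. Using Equation (A6), we rewrite (A5) as: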

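$$
\sigma_j^2\sum_{i=1}^{N}\frac{\pi_j\,\phi(\hat{\alpha}_i;\mu_j,\sigma_{i,j})}{f(\hat{\alpha}_i)} = \sum_{i=1}^{N}\frac{\pi_j\int_{-\infty}^{+\infty}\phi(\hat{\alpha}_i;\alpha_i,s_i)\,\phi(\alpha_i;\mu_j,\sigma_j)\,(\alpha_i-\mu_j)^2\,d\alpha_i}{f(\hat{\alpha}_i)}. \tag{A7}
$$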

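We can further rewrite the left-hand side using Equations (A1) and (A2):

$$
\begin{aligned}
\sigma_j^2\sum_{i=1}^{N}\frac{\pi_j\,\phi(\hat{\alpha}_i;\mu_j,\sigma_{i,j})}{f(\hat{\alpha}_i)}
&= \sigma_j^2\sum_{i=1}^{N}\frac{\pi_j\,\phi(\hat{\alpha}_i;\mu_j,\sigma_{i,j})}{\sum_{k=1}^{J}\pi_k\,\phi(\hat{\alpha}_i;\mu_k,\sigma_{i,k})}
= \sum_{i=1}^{N}\sigma_j^2\,z_{ij} \\
&= \sum_{i=1}^{N}\frac{\pi_j\int_{-\infty}^{+\infty}\phi(\hat{\alpha}_i;\alpha_i,s_i)\,\phi(\alpha_i;\mu_j,\sigma_j)\,(\alpha_i-\mu_j)^2\,d\alpha_i}{f(\hat{\alpha}_i)}.
\end{aligned}
\tag{A8}
$$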

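Finally, using the expression of 𝜋𝑗 in Equation (A4), we obtain:

$$
\sigma_j^2 = \frac{1}{N}\sum_{i=1}^{N}\frac{\int_{-\infty}^{+\infty}\phi(\hat{\alpha}_i;\alpha_i,s_i)\,\phi(\alpha_i;\mu_j,\sigma_j)\,(\alpha_i-\mu_j)^2\,d\alpha_i}{f(\hat{\alpha}_i)}. \tag{A9}
$$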

From this equation, we use the 𝜎𝑗 from the previous step to evaluate the right-hand-side expression,

and use this result as our new estimate of 𝜎𝑗. We iterate to solve for 𝜎𝑗 recursively until

convergence. To avoid local maxima, we use different initial values for the iterations and select

the set of parameters with the maximum likelihood. To ensure economic meaningfulness, we

restrict the difference in mean alpha 𝜇𝑗 between two skill groups to be at least 0.17% per month

(i.e., 2% per year) and 𝜋𝑗 to be at least 5%. Finally, for each fund, we obtain 𝑠𝑖 as the estimated

standard error of alpha from the factor model.
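The fixed-point update in Equation (A9) can be sketched as follows. This is an illustration rather than the authors' implementation: the paper does not say how the integral is evaluated, so the sketch assumes a closed form obtained from standard Gaussian algebra (the product of the two normal densities is proportional to a normal density in 𝛼𝑖 with variance 𝑠𝑖²𝜎𝑗²/(𝜎𝑖,𝑗)² and mean 𝜇𝑗 + 𝜎𝑗²(𝛼̂𝑖 − 𝜇𝑗)/(𝜎𝑖,𝑗)²). The function names are hypothetical, and the multiple starting values and parameter restrictions described above are omitted.

```python
import numpy as np

def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) evaluated at x (broadcasts over arrays)."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (np.sqrt(2.0 * np.pi) * sd)

def second_moment_integral(alpha_hat, s, mu, sigma):
    """Assumed closed form of the integral in (A9):
    int phi(alpha_hat; a, s) * phi(a; mu, sigma) * (a - mu)^2 da.
    """
    var_ij = s ** 2 + sigma ** 2
    post_var = s ** 2 * sigma ** 2 / var_ij               # Var(alpha | alpha_hat, group j)
    post_shift = sigma ** 2 * (alpha_hat - mu) / var_ij   # E[alpha | alpha_hat, group j] - mu_j
    return normal_pdf(alpha_hat, mu, np.sqrt(var_ij)) * (post_var + post_shift ** 2)

def update_sigma(alpha_hat, s, mu, sigma, pi):
    """One pass of (A9): evaluate the right-hand side at the previous sigma_j."""
    a = np.asarray(alpha_hat, dtype=float)[:, None]
    s = np.asarray(s, dtype=float)[:, None]
    mu, sigma, pi = (np.asarray(v, dtype=float)[None, :] for v in (mu, sigma, pi))
    phi_ij = normal_pdf(a, mu, np.sqrt(s ** 2 + sigma ** 2))
    f_i = (pi * phi_ij).sum(axis=1, keepdims=True)        # mixture density, Equation (A1)
    rhs = (second_moment_integral(a, s, mu, sigma) / f_i).mean(axis=0)
    return np.sqrt(rhs)                                   # new estimate of sigma_j
```

In use, `update_sigma` would be called repeatedly, feeding each result back in as `sigma`, until successive values stop changing.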

When estimating the parameters in actual data, we do not directly observe the number of

the underlying distributions. Hence, we compare estimates from different numbers of

distributions and select the one that has the lowest Bayesian Information Criterion (BIC) as

proposed by Schwarz (1978). The BIC, which measures model fit, is

$$
BIC = -2\ln(L) + k\ln(N), \tag{A10}
$$

where 𝐿 is the maximized likelihood function, 𝑘 is the number of free parameters, and 𝑁 is the number of funds.
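As a small illustration of Equation (A10), the sketch below computes the BIC from a maximized log-likelihood and picks the candidate with the lowest value. The helper names `bic` and `select_by_bic` are hypothetical, and the number of free parameters must be supplied by the caller, because the paper does not spell out how it is counted for each number of groups.

```python
import math

def bic(log_likelihood, n_params, n_funds):
    """Equation (A10): BIC = -2 ln(L) + k ln(N); a lower value is preferred."""
    return -2.0 * log_likelihood + n_params * math.log(n_funds)

def select_by_bic(candidates, n_funds):
    """Pick the candidate model with the lowest BIC.

    candidates maps a label (e.g. the number of groups) to a tuple
    (maximized log-likelihood, number of free parameters).
    """
    scores = {label: bic(ll, k, n_funds) for label, (ll, k) in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores
```

Because the penalty term grows with ln(N), adding a skill group is only favored when it raises the log-likelihood by more than half of the extra parameters times ln(N).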

A.2 The simulations

To check the performance of our estimation procedure, we use simulations to compare

our estimates with true values of the parameters. The simulations proceed as outlined below.

Step 1: We select J sets of group parameters {𝜇𝑗, 𝜎𝑗 , 𝜋𝑗} as true values.

Step 2: Based on {𝜇𝑗, 𝜎𝑗, 𝜋𝑗}, the composite distribution of true alphas (𝑓(𝛼𝑖)) is known.

We randomly draw N samples {𝛼1, 𝛼2, … … , 𝛼𝑁} from the composite distribution, representing

the N managers’ true skill. We set N equal to 9,000, which roughly matches the number of funds

in our sample.

Step 3: We then add random noise to each manager’s skill using the empirical estimation errors from the actual data. Specifically, suppose the empirical standard errors are {𝑠1, 𝑠2, …, 𝑠𝑁}; then the managers’ skill after adding noise is {𝛼1 + 𝜀1, 𝛼2 + 𝜀2, …, 𝛼𝑁 + 𝜀𝑁}, where 𝜀𝑖 ~ 𝑁(0, 𝑠𝑖²). Denote this vector of skill, after adding noise, as {𝛼̂1, 𝛼̂2, …, 𝛼̂𝑁}, which serves as the “observed” alpha in our simulation.

Step 4: We use {𝛼̂1, 𝛼̂2, …, 𝛼̂𝑁} as the inputs to our estimation procedures and start the

expectation and maximization iteration. Since it is possible that the iterations are sensitive to the

initial values, for each iteration we generate 1,000 random initial values of {𝜇𝑗, 𝜎𝑗, 𝜋𝑗} where

𝜇𝑗 ∈ U[−1.5%, 1.5%] per month, 𝜎𝑗 ∈ U[0.05,1] and 𝜋𝑗 ∈ U[5%, 50%] where U[a,b] is the

uniform distribution on interval [a,b]. We use the same procedure to generate initial values when

we estimate our parameters from the actual data.
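Steps 1 through 3 above can be sketched as a short data-generating routine. This is a minimal illustration with a hypothetical function name; the paper uses the empirical standard errors from the actual data, which are not reproduced here, so the caller must supply a vector `s` (any placeholder values used for it are an assumption).

```python
import numpy as np

def simulate_observed_alphas(mu, sigma, pi, s, rng):
    """Draw true alphas from the mixture (Step 2) and add estimation noise (Step 3).

    mu, sigma, pi : length-J true group parameters chosen in Step 1
    s             : length-N standard errors, one per simulated fund
    rng           : a numpy.random.Generator
    Returns the latent group labels, the true alphas, and the "observed" alphas.
    """
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    s = np.asarray(s, dtype=float)
    n = s.size
    group = rng.choice(len(pi), size=n, p=pi)           # group membership with probabilities pi_j
    alpha_true = rng.normal(mu[group], sigma[group])    # alpha_i ~ N(mu_j, sigma_j^2)
    alpha_hat = alpha_true + rng.normal(0.0, s)         # epsilon_i ~ N(0, s_i^2)
    return group, alpha_true, alpha_hat
```

Keeping the latent labels alongside the simulated alphas is what later makes it possible to check whether a fund flagged as Excellent truly belongs to that group.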


Table A1 presents the results of four skill groups with one of them having zero mean

alpha. (The results using three and five groups are similar.) To check the sensitivity of our

estimation to different parameter values, we select three sets of true values for the parameters

that are close to the estimates from the actual data (as reported in Table 3). For each set of true

values, we perform 1,000 simulations from which we obtain means and standard deviations of

the estimates. Overall, our algorithm produces estimates that, while imperfect, are largely

accurate in characterizing different skill groups among funds.

In Table A2, we check how well the 𝑃𝐽 measure identifies Excellent managers, since we

know whether or not the managers come from the Excellent group in the simulations. In each

simulation, funds are ranked by 𝑃𝐽 and the top 9.3% of the funds are selected. We report the

identification rate of 𝑃𝐽, defined as the percentage of the selected funds that are truly Excellent.

Similarly, we report the identification rates for the top 9.3% of funds when ranked by estimated

alpha and t-statistic. The results show that the 𝑃𝐽 measure is better able to identify skilled funds

than the alternative measures.
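The identification rate described above amounts to ranking funds by a score and checking how many of the top-ranked funds are truly Excellent. A minimal sketch, with a hypothetical function name and the 9.3% cutoff passed as a parameter:

```python
import numpy as np

def identification_rate(score, is_excellent, top_fraction=0.093):
    """Share of the top-ranked funds (by score) that are truly Excellent.

    score        : length-N ranking measure (e.g. P_J, estimated alpha, or t-statistic)
    is_excellent : length-N booleans, known only in simulations
    top_fraction : fraction of funds selected (9.3% in Table A2)
    """
    score = np.asarray(score, dtype=float)
    is_excellent = np.asarray(is_excellent, dtype=bool)
    n_top = int(round(top_fraction * score.size))   # number of funds selected
    top = np.argsort(score)[::-1][:n_top]           # indices of the highest scores
    return is_excellent[top].mean()
```

Running the same function with each of the three ranking measures on one simulated sample gives a like-for-like comparison of the kind reported in Table A2.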

In addition, the average alpha for the group of funds identified by 𝑃𝐽 is higher than that of the groups based on the alternative measures. Note that the top 9.3% contains more than 800 funds, given that the simulations use 9,000 funds. When holding so many funds, the effects of good luck and bad luck tend to cancel out, which attenuates the benefit of our approach to some extent. In practice, however, an investor (such as a fund of hedge funds) typically invests in a much smaller number of hedge funds (e.g., Brown, Gregoriou, and Pascalau, 2012). In Section 4.3, with actual hedge fund data, we evaluate the performance difference for a portfolio that starts by holding 20 hedge funds, and the results are reported in Table 5.


Table A1

Comparing parameter estimates with true values

This table presents the estimates of distribution parameters for a set of simulated hedge funds for which

the true parameters are known. Excellent, Good, Neutral, and Bad are the skill groups, and True stands

for true values. The parameters 𝜇̂𝑗, 𝜎̂𝑗 and 𝜋̂𝑗 are the mean, variability, and fraction of the skill groups.

Standard errors estimated from 1,000 simulations are reported in parentheses. Panels A, B, and C use

three different sets of true values for the parameters that are close to the actual estimates from our sample.

Panel A

𝜇̂𝑗 𝜎̂𝑗 𝜋̂𝑗

Excellent 0.930 0.657 0.103

(0.291) (0.099) (0.028)

True 1.000 0.700 0.100

Good 0.283 0.764 0.384

(0.108) (0.144) (0.052)

True 0.300 0.700 0.400

Neutral 0.000 0.688 0.412

( — ) (0.204) (0.112)

True 0.000 0.700 0.400

Bad −0.884 0.624 0.101

(0.325) (0.251) (0.029)

True −1.000 0.700 0.100

Panel B

𝜇̂𝑗 𝜎̂𝑗 𝜋̂𝑗

Excellent 0.710 0.936 0.096

(0.251) (0.144) (0.033)

True 0.600 1.000 0.100

Good 0.301 0.298 0.402

(0.056) (0.022) (0.082)

True 0.300 0.300 0.400

Neutral 0.000 0.302 0.406

( — ) (0.170) (0.042)

True 0.000 0.300 0.400

Bad −0.703 0.926 0.097

(0.197) (0.275) (0.037)

True −0.600 1.000 0.100


Table A1, continued

Panel C

𝜇̂𝑗 𝜎̂𝑗 𝜋̂𝑗

Excellent 0.580 0.910 0.066

(0.182) (0.258) (0.023)

True 0.600 1.000 0.060

Good 0.293 0.293 0.409

(0.067) (0.101) (0.099)

True 0.300 0.300 0.440

Neutral 0.000 0.295 0.460

( — ) (0.095) (0.124)

True 0.000 0.300 0.440

Bad −0.329 0.953 0.065

(0.096) (0.317) (0.022)

True −0.400 1.000 0.060


Table A2

Identification rate of different methods

This table reports the identification rate (i.e., the ability to identify skilled funds) using three different

measures—the 𝑃𝐽 measure, estimated alpha, and t-statistic. The true alphas are generated using the

parameter estimates in Table 3. We add noise to each true alpha using estimation error 𝑠𝑖, and the noise is

assigned randomly to each fund. Then, funds ranked as top 9.3% by each measure are selected.

Identification rate reports the percentage of the selected funds that are truly Excellent. Performance is the

average alpha (in percent per month) of the selected funds identified by each measure.

Top 𝑃𝐽 Top est. alpha Top t-stat

Identification rate 36.18% 28.51% 31.27%

Performance (%/mo.) 0.70 0.53 0.64
