

JOURNAL OF FINANCIAL AND QUANTITATIVE ANALYSIS VOL. 35, NO. 3, SEPTEMBER 2000
COPYRIGHT 2000, SCHOOL OF BUSINESS ADMINISTRATION, UNIVERSITY OF WASHINGTON, SEATTLE, WA 98195

Morningstar Ratings and Mutual Fund Performance

Christopher R. Blake and Matthew R. Morey*

Abstract

This study examines the Morningstar rating system as a predictor of mutual fund performance for U.S. domestic equity funds. We also compare the predictive abilities of the Morningstar rating system with those of alternative predictors. The results indicate findings that are robust across different samples, ages and styles of funds, and performance measures. First, low ratings from Morningstar generally indicate relatively poor future performance. Second, there is little statistical evidence that Morningstar's highest-rated funds outperform the next-to-highest and median-rated funds. Third, Morningstar ratings, at best, do only slightly better than the alternative predictors in forecasting future fund performance.

I. Introduction

In recent years, increasing attention has been paid to the persistence of mutual fund performance in the finance literature.¹ Yet, to date, there has been considerably less attention devoted to the predictive abilities of the Morningstar five-star mutual fund rating service that many investors use as a guide in their mutual fund selections. This study attempts to fill that void by examining the ability of the Morningstar ratings to predict both unadjusted and risk-adjusted returns in comparison with performance metrics common in the performance literature.

The question of whether Morningstar ratings predict out-of-sample performance is an important one, given that several studies in the performance literature have documented that new cash flows from investors are related to past performance ratings (see, e.g., Sirri and Tufano (1998) and Gruber (1996)). In fact, there is evidence that high-rated funds experience cash inflows that are far greater in size than the cash outflows experienced by low-rated funds (see, e.g., Sirri and Tufano (1998) and Goetzmann and Peles (1997)). Hence, examining performance across funds grouped by Morningstar rankings will indicate if these cash flows are justified by subsequent relative performance.

* Blake, Graduate School of Business Administration, Fordham University, 113 West 60th St., New York, NY 10023; Morey, Lubin School of Business, Pace University, 1 Pace Plaza, New York, NY 10038. Morey acknowledges financial support from the Economic and Pension Research Department of TIAA-CREF. We thank Will Goetzmann and Charles Trzcinka for data and Stephen Brown (the editor), Edwin Elton, Steve Foerster, Doug Fore, Martin Gruber, Mark Hulbert, Zoran Ivković (the referee), Richard C. Morey, Derrick Reagle, Emily Rosenbaum, H. D. Vinod, Mark Warshawsky, and seminar participants at the Securities and Exchange Commission and the 1999 European Finance Association Meetings (Helsinki) for helpful comments and suggestions.

¹For example, see Hendricks, Patel, and Zeckhauser (1993), Goetzmann and Ibbotson (1994), Malkiel (1995), Brown and Goetzmann (1995), Elton, Gruber, and Blake (1996a), and Carhart (1997).

As evidence of the importance of the Morningstar five-star rating service (where a five-star rating is the best and a one-star rating is the worst), consider a recent study, reported in both the Boston Globe and The Wall Street Journal,² which found that 97% of the money flowing into no-load equity funds between January and August 1995 was invested into funds that were rated as five- or four-star funds by Morningstar, while funds with less than three stars suffered a net outflow of funds during the same period. Moreover, the heavy use of Morningstar ratings in mutual fund advertising suggests that mutual fund companies believe that investors care about Morningstar ratings. Indeed, in some cases, the only mention of return performance in the mutual fund advertisement is the Morningstar rating. Finally, the importance of Morningstar ratings has been underscored by some recent high-profile publications (e.g., Blume (1998) and Sharpe (1998)) that have investigated the underlying properties of the Morningstar rating system.

Despite its importance, there is, to our knowledge, only one extant academic study on the predictive abilities of the Morningstar ratings service. Khorana and Nelling (1998) examine the question of persistence of the Morningstar ratings and compare the Morningstar ratings from a group of funds in December 1992 to the June 1995 ratings of those same funds. Although they find evidence of persistence, in that high-rated funds are still high rated and low-rated funds are still low rated, there are a number of problems with the study. First, there is a survivorship bias problem, since the funds were selected at the end of the sample period rather than at the beginning. Hence, any fund that had merged, liquidated, or changed its name between the beginning and ending of the sample period was not included. Second, because Morningstar uses a 10-year risk-adjusted return as a major component of its ratings, and because there are only two and one-half years of data between the beginning and end of the Khorana and Nelling sample, the ratings are based on overlapping data. Consequently, the findings of persistence in the ratings are endemic to the data. Finally, Khorana and Nelling only examine performance persistence as measured by Morningstar ratings; they do not examine how well Morningstar ratings predict more standard measures of performance.³

In this paper, we examine whether the Morningstar five-star system has any predictive power for the future performance of funds. Our data and methodology are sensitive to the following key issues in mutual fund research.

²Charles Jaffe, "Rating the Raters: Flaws Found in Each Service," Boston Globe, Aug. 27, 1995, p. 78. The same survey was also reported by Karen Damato, "Morningstar Edges Toward One-Year Ratings," The Wall Street Journal, April 5, 1996.

³It should be noted that Morningstar reports an in-house study, conducted by Laura Lallos in 1997, in which 45% of the five-star funds in 1987 receive five stars in 1997. However, no other comparisons are provided and few details of the study are reported.


i) We use a mutual fund data set generated at the time the funds were actually rated by Morningstar and then follow the out-of-sample performance of these funds. This methodology allows us to circumvent the well-known survivorship bias problem described by Brown, Goetzmann, Ibbotson, and Ross (1992), Elton, Gruber, and Blake (1996b), and others.

ii) We adjust returns for front-end and deferred loads, because the Morningstar rating system also adjusts for loads.

iii) We compare the predictive abilities of the Morningstar ratings with those of alternative predictors: in-sample historical average monthly returns, one- and four-index in-sample alphas, and in-sample Sharpe (1966) ratios.

iv) We examine out-of-sample one-, three-, and five-year horizons, so that we can give both short- and long-term analyses of the predictive abilities of Morningstar ratings and the alternative predictors. Moreover, these time horizons are consistent with the historical returns that prospective investors are often provided when considering a mutual fund.

v) We examine the predictive abilities of the Morningstar ratings and the alternative predictors at different times to observe how well they predict in up and down markets.

vi) We address the issue of performance predictability by separating domestic equity funds according to investment style (i.e., aggressive growth, equity-income, growth, growth-income, and small company funds) at the time they were rated (see Brown (1999), Brown and Goetzmann (1997), Elton, Gruber, Das, and Hlavka (1993), and Goetzmann and Ibbotson (1994)).

vii) We explore whether the age of a fund affects performance predictability by separating funds into young (three to five years), middle-aged (five to 10 years), and seasoned (10 years or more) groups.

viii) We measure out-of-sample performance using several well-known performance metrics, including the Sharpe ratio, mean monthly excess returns, a modified version of Jensen's alpha (1968), and a four-index alpha (Elton, Gruber, and Blake (1996a)).

ix) We analyze the results using parametric and non-parametric tests.

The paper is organized as follows. Section II describes the data and relates how the funds were chosen, how Morningstar calculates its ratings, and how the returns data were collected and calculated. Section III describes the methodology of the paper. Section IV presents the Morningstar rating results. Section V presents the alternative predictor results, and Section VI provides the conclusion.

II. Data

A. Sample Groups and Fund Selection Criteria

We examine two broad sample groups in this study. For simplicity, we term these samples seasoned funds 1992-1997 and complete funds 1993.


1. Seasoned Funds 1992-1997

For the first sample group, we use the beginning-of-the-year Morningstar On-Disk or Principia programs from 1992 to 1997 to select mutual funds.⁴ We use the beginning-of-the-year disks to simplify the data so that we always examine calendar years. Moreover, we start at the beginning of the year 1992 since this corresponds to the first beginning-of-the-year On-Disk program.⁵

By using the actual Morningstar disks, we know all the funds available to investors selecting funds based on Morningstar ratings at the time of the Morningstar evaluation, thus circumventing any possible survivorship bias problems. Data prior to the beginning of the On-Disk program are available from Morningstar on a proprietary basis; however, these data include only the surviving funds. Funds that were rated at the time of the Morningstar rating that were merged or liquidated at some later date are not available.⁶ Since the use of such data would introduce a severe survivorship bias, they are not used in our study.

From the beginning-of-the-year disks we then select funds based on three criteria. First, we select only "domestic equity" funds as identified by Morningstar's "Investment Class." From the domestic equity funds, we then select all funds within each of the following five Morningstar "investment objectives" (styles): aggressive growth, equity-income, growth, growth and income, and small company. This allows us to examine whether there is a "style effect" on fund performance predictability. It is important to note that the designation for the "investment objective" is determined by Morningstar, usually based on the wording in the fund's prospectus. However, in some cases, Morningstar may give a fund an investment objective different from that implied by the fund's name or by the fund's prospectus if Morningstar determines that the fund invests in a way that is inconsistent with the wording in its prospectus.

Since we examine the out-of-sample performance, we also examine if the funds retain their classifications by Morningstar in the out-of-sample periods. In every sample examined, at least 85% of the funds retain their style classification at the end of the sample period.⁷ Hence, according to Morningstar, the vast majority of funds do not change their style of management.

Second, each fund had to have at least 10 years of returns at the time it was rated by Morningstar; e.g., funds rated by Morningstar in January 1993 had to have returns data starting from, at the latest, January 1983. We use the 10-year cutoff because the 10-year in-sample period utilizes Morningstar's base-line rating system. As stated earlier, Morningstar provides each mutual fund with a one- to five-star summary rating. To obtain this summary rating, Morningstar takes a weighted average of the three-, five-, and 10-year risk-adjusted returns, where the weights are 20% on the three-year return, 30% on the five-year return, and 50% on the 10-year return. Due to its importance in Morningstar's rankings, we used the 10-year time period as a criterion in selecting funds.

⁴These correspond to the January 1992 On-Disk, January 1993 On-Disk, January 1994 On-Disk, January 1995 On-Disk, January 1996 On-Disk, and the January 1997 Principia. (In October 1996, On-Disk changed to Principia.)

⁵The On-Disks begin in October 1991.

⁶We thank Peter Carrillo of Morningstar for this point.

⁷To obtain this percentage, we examined only the funds in the sample that did not merge or liquidate during the out-of-sample period.

Third, funds had to be open at the time they were rated by Morningstar. Any fund that was closed to new investors at the time of the rating was excluded from our analysis to maintain a sample of funds that could actually be invested in at the time of the ratings.⁸

2. Complete 1993 Sample

The above sample group excludes many Morningstar-rated funds simply because of our criterion that funds must have 10 years of in-sample data at the time they are rated. While their base-line rating system uses a combination of the three-, five-, and 10-year returns, Morningstar still rates funds with less than 10 years of returns. So long as a fund has at least three years of returns history, it can receive a summary star rating from Morningstar. Funds with three to five years of returns history are rated using a system that puts a 100% weighting on their three-year past performance; funds with five to 10 years of returns history are rated using a system that puts a 40% weighting on the three-year return and a 60% weighting on the five-year return.
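The age-dependent weighting just described can be sketched in a few lines of Python. This is an illustrative sketch only, not Morningstar's implementation: the function names `horizon_weights` and `weighted_score` are hypothetical, and the treatment of the boundaries (exactly five or exactly 10 years of history) is an assumption, since the text gives only the ranges.

```python
def horizon_weights(years_of_history: float) -> dict:
    """Map each rating horizon (in years) to its weight, per the text.

    Boundary handling (exactly 5 or 10 years) is an assumption.
    """
    if years_of_history < 3:
        raise ValueError("a fund needs at least three years of returns to be rated")
    if years_of_history < 5:
        return {3: 1.0}                      # young funds
    if years_of_history < 10:
        return {3: 0.4, 5: 0.6}              # middle-aged funds
    return {3: 0.2, 5: 0.3, 10: 0.5}         # seasoned funds


def weighted_score(score_by_horizon: dict, years_of_history: float) -> float:
    """Weighted average of the per-horizon risk-adjusted figures."""
    weights = horizon_weights(years_of_history)
    return sum(w * score_by_horizon[h] for h, w in weights.items())
```

For example, a seasoned fund with hypothetical per-horizon figures of 1.0, 2.0, and 3.0 would receive 0.2 × 1.0 + 0.3 × 2.0 + 0.5 × 3.0 = 2.3, whereas a young fund is judged on its three-year figure alone.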

By excluding younger funds, we miss out on another interesting aspect of the Morningstar rating system. Since younger funds are rated only on short-term returns (i.e., the three-year return) whereas older funds are rated on a combination of the three-, five-, and 10-year returns, the younger fund ratings are particularly sensitive to the overall performance of the market. In a bull market (as in the late 1990s), young equity funds could receive higher ratings not because of better short-term performance, but rather because the rating system only evaluates them during a time when the market is doing exceptionally well.⁹ This could alter the predictive ability of the ratings.

The problem with including younger funds in our first sample group is that the number of funds to examine is onerous. As a tradeoff, we create another sample in which we include virtually all open aggressive growth, equity-income, growth, growth-income, and small company funds that were rated by Morningstar in January 1993.¹⁰ By using the 1993 data, we only have to examine the out-of-sample performance of 635 funds as opposed to well over 2000 funds if we were to use the 1996 or 1997 On-Disk/Principia programs. Furthermore, by using January 1993 rated funds, we are still able to follow out-of-sample performance out to five years.

In summary, the complete funds 1993 sample includes all open funds rated by Morningstar and listed as aggressive growth, equity-income, growth, growth-income, or small company. Hence, this includes young funds (between three and five years of return history at the time they were rated), middle-aged funds (between five and 10 years of historical returns at the time they were rated), and seasoned funds (10 or more years of returns at the time they were rated).

⁸The number of closed funds that met our other criteria was as follows: January 1992 sample: 11 funds; January 1993 sample: 11 funds; January 1994 sample: 19 funds; January 1995 sample: 19 funds; January 1996 sample: 28 funds; January 1997 sample: 37 funds.

⁹Blume (1998), in a study utilizing only 1996 data, provides some evidence that there is a relatively high percentage of young funds that are classified as five-star or one-star funds.

¹⁰Twenty-four funds met our other criteria, yet were listed as closed funds in January 1993; we excluded them from the sample.

As with the seasoned funds samples, we also examine the number of funds that change their investment objective in the out-of-sample periods. Similar to the seasoned funds sample, at least 85% of the funds do not change their Morningstar investment objective over the course of the out-of-sample period.¹¹

B. Problem Funds

To examine the out-of-sample forecast ability of Morningstar ratings, we obtain the out-of-sample monthly returns of these funds. For a majority of the funds, obtaining the out-of-sample returns is simply a matter of following the previously rated fund. However, because a minority of funds have either gone through a name change, a merger, a combination of both, or because they have liquidated, identifying out-of-sample returns for those funds is more complicated. We describe how we handle these problematic funds.

We use the Morningstar data¹² and The Wall Street Journal to identify the name changes. We then simply use the newly named fund's returns as the out-of-sample returns.

For the merged funds, we used the Morningstar data and The Wall Street Journal to ascertain the month of the fund merger. However, when these two sources did not provide the necessary information, we called the individual mutual fund companies. Once the merger month was identified, we then collected the out-of-sample returns by the following procedure. First, until the fund merges, we simply use the out-of-sample returns of the fund in question. After the fund merges into its partner fund, we assume the investor randomly reinvests into one of the other surviving funds with the same investment objective as the merged fund in our sample. Hence, the out-of-sample returns from the merger month onward are equally-weighted monthly averages of the returns of all the other surviving funds in our sample with the same investment objective as the merged fund. The assumption of random reinvestment into any surviving fund regardless of its ranking may at first blush seem irrational, given that investors should prefer superior funds. But we are examining Morningstar predictability, not only for superior, but also for inferior performance. Forcing random reinvestment into only high-ranked funds could bias the predictability results. Furthermore, an investor may be interested in using Morningstar rankings not only for investment in superior funds, but also for avoiding investment in inferior funds. (Nevertheless, we did examine the results obtained by assuming an investor randomly chose a surviving fund from those rated only three stars or better; the results were virtually identical.)¹³
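A minimal Python sketch of this splicing rule follows, assuming monthly returns are held in plain lists aligned on the same out-of-sample calendar. The function name `spliced_returns` and its arguments are hypothetical; the rule itself (own returns before the merger month, the equally weighted average of surviving same-objective funds from the merger month onward) is taken from the paragraph above.

```python
from statistics import fmean


def spliced_returns(own_returns, merger_index, peer_returns):
    """Out-of-sample return series for a fund that merges at `merger_index`.

    own_returns  : the fund's monthly returns before the merger month
    merger_index : position (0-based) of the merger month in the horizon
    peer_returns : one full-horizon list of monthly returns per surviving
                   fund with the same investment objective
    """
    if not peer_returns:
        raise ValueError("need at least one surviving peer fund")
    n_months = len(peer_returns[0])
    if any(len(p) != n_months for p in peer_returns):
        raise ValueError("peer series must cover the same months")
    if len(own_returns) < merger_index:
        raise ValueError("own returns must cover every pre-merger month")

    series = list(own_returns[:merger_index])
    for month in range(merger_index, n_months):
        # Expected return of reinvesting at random in one surviving peer.
        series.append(fmean(p[month] for p in peer_returns))
    return series
```

The equally weighted average is the expected return of picking one surviving fund at random, which is why it stands in for the random-reinvestment assumption.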

¹¹As with our seasoned funds sample, we examined only the funds in the sample that did not merge or liquidate during the out-of-sample period to obtain this percentage.

¹²The Morningstar On-Disk and Principia disks (after 1993) each provide a list of funds that have recently undergone name changes, mergers, and liquidations.

¹³An additional issue in using our "random reinvestment" assumption is that an investor may incur capital gains or losses when closing out an investment in a merged fund and then investing in another fund, inducing a tax effect. However, even for surviving funds, potential tax effects exist because funds must by law distribute all income distributions and internally generated capital gains to their investors, and, even if an investor chooses to automatically reinvest a distribution back into the fund, it is a taxable event (unless the fund is held in a tax-sheltered plan such as an IRA or Keogh plan). This study (and most other extant studies on fund performance) analyzes pre-tax returns, so any tax effects of our reinvestment assumption do not affect our results. An alternative approach that does not have the potential tax effect of our reinvestment approach for merged funds is the "follow-the-money" approach introduced in Elton, Gruber, and Blake (1996b), where a merged fund's returns are spliced to its "merge partner" fund's returns to form a complete time series. (The investor essentially stays invested without having to close out and then reinvest.) But because of the way we calculate our out-of-sample performance alphas for disappearing funds (see Section III for details), we would require a complete in-sample time series of returns for the "merge partner" fund, and in some cases the partner fund did not exist long enough to obtain such a series.

For the liquidated funds, we first determined when the fund was liquidated. Again, this information was obtained from Morningstar or The Wall Street Journal. As with the merged funds, from the month of liquidation and onward, we assume the investor randomly re-invests in the current sample of funds with the same investment objective as the merged fund.

C. Morningstar Ratings

To calculate its ratings, Morningstar first classifies funds into one of four categories: domestic equity, foreign equity, municipal bond, and taxable bond.¹⁴ The ratings are then based upon an aggregation of the three-, five-, and 10-year risk-adjusted return for funds with 10 years or more of return history, three- and five-year risk-adjusted returns for funds with five to less than 10 years of return data, and three-year risk-adjusted returns for funds with three to less than five years of return data. To calculate the risk-adjusted return, Morningstar first calculates a load-adjusted return for the fund by adjusting the returns for expenses such as 12b-1 fees, management fees, and other costs automatically taken out of the fund, and then by adjusting for front-end and deferred loads.¹⁵ Next, Morningstar calculates a "Morningstar return" in which the expense- and load-adjusted excess return is divided by the higher of two variables: the excess average return of the fund category (domestic stock, international stock, taxable bond, or municipal bond) or the average 90-day U.S. T-bill rate,

$$\text{Morningstar Return} = \frac{\text{Expense- and Load-Adjusted Return on the Fund} - \text{T-Bill}}{\max\left[(\text{Average Category Return} - \text{T-Bill}),\ \text{T-Bill}\right]} \tag{1}$$

Morningstar divides through by one of these two variables to prevent distortions caused by having low or negative average excess returns in the denominator of equation (1). Such a situation might occur in a protracted down market.¹⁶
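Equation (1) can be illustrated with a short Python sketch. The function name `morningstar_return` is hypothetical, the inputs are assumed to be returns over the same horizon expressed as decimals, and the fund return is assumed to be already adjusted for expenses and loads.

```python
def morningstar_return(adjusted_fund_return, category_return, tbill_return):
    """Equation (1): excess fund return scaled by the larger of the
    category's excess return and the T-bill return itself."""
    numerator = adjusted_fund_return - tbill_return
    denominator = max(category_return - tbill_return, tbill_return)
    if denominator == 0:
        raise ValueError("denominator of equation (1) is zero")
    return numerator / denominator


# Normal market: the category's excess return (0.06) is the divisor.
up = morningstar_return(0.12, 0.10, 0.04)

# Down market: the category's excess return is negative (-0.02),
# so the T-bill return (0.04) is used as the divisor instead.
down = morningstar_return(0.03, 0.02, 0.04)
```

The second call shows the purpose of the `max`: without it, dividing by a negative category excess return would flip the sign of a fund that underperformed T-bills.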

¹⁴Originally Morningstar used only three categories: domestic equity, municipal bond, and taxable bond. The foreign equity funds were placed in the domestic equity category. The foreign equity category was started in 1996. The category definitions we describe in this paper are from Morningstar's 1998 manual.

¹⁵Blume (1998), pp. 4-5, provides an excellent description of how Morningstar accounts for loads in the Morningstar returns. The load adjustment process is the following. Assume L is the load adjustment. If there is no load of any type, then L is equal to 1. If there is a load, L is less than one; e.g., a 4% front-end load would make L equal to 0.96. The load-adjusted return is then equal to the return of the fund times L. Note that the front-end load is always assumed to be the maximum possible load. The deferred load adjustment is reduced as the holding period is increased. In Section II, we explain in more detail how we adjust the return data for loads.

¹⁶Principia Manual, p. 97.


Morningstar then calculates a "Morningstar risk" measure, which is calculated differently from traditional risk measures, such as beta and standard deviation, that both see greater-than and less-than-expected returns as added volatility. Morningstar believes that most investors' greatest fear is losing money, which Morningstar defines as underperforming the risk-free rate of return an investor can earn from the 90-day Treasury bill. Hence, their risk measure only focuses on downside risk.¹⁷ To calculate risk, Morningstar plots monthly returns in relation to T-bill returns, adds up the amounts by which the fund trails the T-bill return each month, and then divides that total by the time horizon's total number of months. This number, the average monthly underperformance statistic, is then compared with those of other funds in the same broad investment category to assign the risk scores. The resultant Morningstar risk score expresses how risky the fund is relative to the average fund in its category.¹⁸
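The average monthly underperformance statistic can be sketched as follows. The first function follows the description above directly; the second, which divides a fund's statistic by the category average, is only an assumed way of expressing risk "relative to the average fund," since the exact normalization is not given here. Both function names are hypothetical.

```python
from statistics import fmean


def average_monthly_underperformance(fund_returns, tbill_returns):
    """Sum of monthly shortfalls versus T-bills, divided by ALL months."""
    if not fund_returns or len(fund_returns) != len(tbill_returns):
        raise ValueError("need two non-empty series of equal length")
    shortfall = sum(max(t - f, 0.0) for f, t in zip(fund_returns, tbill_returns))
    return shortfall / len(fund_returns)


def relative_risk(fund_stat, category_stats):
    """ASSUMPTION: risk score as the ratio to the category average."""
    return fund_stat / fmean(category_stats)
```

Months in which the fund beats the T-bill contribute zero to the sum but still count in the denominator, so upside volatility is not penalized.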

To calculate a fund's summary star rating, Morningstar calculates the three-, five-, and 10-year Morningstar return and risk. For each time horizon, the Morningstar risk scores are then subtracted from the Morningstar return scores. The three numbers (one for each time horizon) are then given subjective weights.¹⁹ The three-year number receives a 20% weighting, the five-year a 30% weighting, and the 10-year a 50% weighting. As stated above, in the case of young funds (funds with three to less than five years of return data), the three-year number receives a 100% weighting; in the case of middle-aged funds (funds with five to less than 10 years of return data), the three-year number receives a 40% weighting and the five-year number receives a 60% weighting. With these weights, Morningstar calculates the weighted average of the numbers. The resulting number is then plotted along a bell curve to determine the fund's star rating. If the fund scores in the top 10% of its broad investment category, it receives a rating of five stars; if the fund falls in the next 22.5%, it receives four stars; if it falls in the middle 35%, it receives three stars; if it lies in the next 22.5%, the fund receives two stars, and if it is in the bottom 10%, it receives one star. Morningstar, with a few minor exceptions, has used this same summary rating system throughout its history.²⁰
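The percentile cutoffs can be illustrated with a small Python sketch that takes each fund's weighted score (return minus risk, already averaged across horizons) and assigns stars by rank within the category. The function name `assign_stars` is hypothetical, and ranking by position with ties broken arbitrarily is a simplifying assumption rather than a description of Morningstar's procedure.

```python
def assign_stars(scores):
    """Map {fund: weighted score} to {fund: stars} using the 10 / 22.5 /
    35 / 22.5 / 10 percent bands described in the text."""
    # Cumulative share of funds, counted from the best score downward.
    bands = [(0.10, 5), (0.325, 4), (0.675, 3), (0.90, 2), (1.0, 1)]
    ranked = sorted(scores, key=scores.get, reverse=True)
    n = len(ranked)
    stars = {}
    for position, fund in enumerate(ranked):
        share = (position + 1) / n  # share of funds ranked at or above this one
        for cutoff, rating in bands:
            if share <= cutoff + 1e-12:
                stars[fund] = rating
                break
    return stars
```

With 40 funds in a category, this rule yields 4 five-star, 9 four-star, 14 three-star, 9 two-star, and 4 one-star funds; with category sizes that do not divide evenly, the band sizes are only approximate.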

Several features of the data should be noted.²¹

i) For the seasoned fund subsamples, the number of funds in each sample grows. This is not surprising, since with each year the number of funds that meet the criteria grows.

¹⁷Focusing only on downside risk is neither unique to Morningstar nor new; it was explored by Markowitz (1959) and incorporated into an asset-pricing model by Bawa and Lindenberg (1977).

¹⁸Principia Manual, p. 98.

¹⁹Morey and Morey (1999) present a methodology that endogenously determines these weights.

²⁰The Morningstar technical staff verified this point. See Blume (1998), p. 3, for more on this issue.

²¹Detailed tables describing the breakdowns and averages of star ratings, loads, styles, and ages for all of our data samples are available from the authors upon request.


ii) There are more five-star funds than one-star funds and the average star rating of each sample is above three. This skewness in the ratings of the sample indicates that aggressive growth, equity-income, growth, growth-income, and small company funds with 10 years or more of returns performed slightly better than other funds in the Morningstar domestic equity category.²²

iii) The standard deviation of the ratings is about the same in each sample, indicating that the distribution of the ratings does not differ much from one sample to another.

iv) For the load funds, most have front-end loads and relatively few have deferred loads.

v) Most of the funds are grouped within the growth and growth-income investment objectives.

When we examined the average star ratings by style, we found that, for most of our sample years, aggressive growth and small company funds have fewer funds and lower averages than the other styles. Moreover, in many samples, the aggressive growth, equity-income, and small company styles have few, if any, funds in the lowest or highest star categories. Also, the standard deviations are about the same in each subsample within each investment style, with the notable exception being the 1993 aggressive growth subsample.

For the complete fund 1993 sample, as with the seasoned funds sample, the average star rating is above three and there are relatively more five-star funds than there are one-star funds. The average rating exceeding three stars is a result of other investment objectives being grouped into the domestic equity category (see footnote 22). Other interesting features of the complete fund 1993 sample follow.

i) Most of the funds are in the seasoned and middle-aged categories; only 14% of the funds are young funds.

ii) More than one-half of the funds are load funds. Again, loads seem to be an important factor to consider.

iii) As with the seasoned funds sample, most of the funds are clustered in the growth and growth-income styles.

iv) Aggressive growth funds fare worse than the other investment objectives in terms of star ratings.

v) There are no substantial differences in the average star ratings of young, middle-aged, and seasoned funds, yet the respective distributions are quite different. In fact, there are no one-star young funds. As stated above, young funds receive their stars based upon the past three years of returns, so it is possible that the three years prior to January 1993 did not drive any new funds into the bottom rating category.

^22 The higher average star ratings could be due to seasoned funds performing slightly better, or it could be a result of other investment objectives (styles), besides those used in this study, being grouped into the domestic equity category. These other investment objectives include domestic hybrid funds, convertible bond funds, funds termed by Morningstar to be miscellaneous funds, and even international funds up until 1996. Blume (1998) has documented that these other investment objective funds generally have lower performance and are rated lower than the aggressive growth, equity-income, growth, growth-income, and small company funds.


D. Morningstar Scores

Since January 1994, Morningstar has provided the three-, five-, and 10-year Morningstar return and risk numbers for all the mutual funds it evaluates. With that information, plus the subjective weights (20%, 30%, and 50% for the three-, five-, and 10-year horizons), we can calculate the resultant scores and numerically rank the funds evaluated here. These scores then allow us to conduct non-parametric rank correlation tests. (Since the data are not provided before 1994, we do not conduct these tests either for the seasoned 1992 and 1993 samples, or for the complete funds 1993 sample.)
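As an illustration of the score calculation, the following is a minimal sketch. It assumes that the score for each horizon is the Morningstar return minus the Morningstar risk for that horizon; this section does not restate that definition, so it is an assumption here. The function names and the input values are hypothetical; only the 20%, 30%, and 50% weights come from the text.

```python
# Horizon weights (years -> weight) stated in the text for seasoned funds.
WEIGHTS = {3: 0.20, 5: 0.30, 10: 0.50}

def morningstar_score(ms_return, ms_risk, weights=WEIGHTS):
    """Weighted score across horizons.

    Assumption: each horizon's score is Morningstar return minus
    Morningstar risk for that horizon.
    """
    return sum(w * (ms_return[h] - ms_risk[h]) for h, w in weights.items())

def rank_funds(scores):
    """Return fund names ordered from the highest score to the lowest."""
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical return and risk numbers, for illustration only.
funds = {
    "Fund A": ({3: 1.10, 5: 1.20, 10: 1.30}, {3: 0.90, 5: 0.80, 10: 0.85}),
    "Fund B": ({3: 0.95, 5: 1.00, 10: 0.90}, {3: 1.05, 5: 1.10, 10: 1.00}),
}
scores = {name: morningstar_score(ret, risk) for name, (ret, risk) in funds.items()}
ranking = rank_funds(scores)
```

The resulting numerical ranking is what the rank correlation tests described later take as the in-sample ordering.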

E. Alternative Predictors

We compare the Morningstar rankings and scores with those of four alternative predictors. Each alternative predictor is calculated during the in-sample period just prior to fund selection, either during the 10-year period prior to the out-of-sample evaluation periods when examining the seasoned funds 1992-1997 sample, or during the three-year period prior to the out-of-sample evaluation periods when examining the complete funds 1993 sample only. For a naive predictor, we use the fund's average monthly in-sample return. The second alternative predictor we use is the in-sample Sharpe ratio.

(2) $\text{Sharpe}_i = \dfrac{\bar{R}_i - \bar{R}_F}{\sigma_i}$,

where $\bar{R}_i - \bar{R}_F$ is the mean excess (net of the 30-day T-bill rate) monthly return for the ith mutual fund during the in-sample period, and $\sigma_i$ is the standard deviation of the excess monthly returns for the ith mutual fund during the in-sample period.
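A minimal sketch of the Sharpe ratio calculation in equation (2) follows. The function name is hypothetical, and the use of the sample (n - 1) standard deviation is an assumption, since the text does not say which estimator is used.

```python
from statistics import mean, stdev

def sharpe_ratio(fund_returns, tbill_returns):
    """Mean monthly excess return divided by the standard deviation of
    the monthly excess returns.

    Assumption: the sample (n - 1) standard deviation is used.
    """
    excess = [r - rf for r, rf in zip(fund_returns, tbill_returns)]
    return mean(excess) / stdev(excess)

# Hypothetical monthly returns, for illustration only.
example = sharpe_ratio([0.02, -0.01, 0.03, 0.00], [0.004] * 4)
```

The same function applies to the out-of-sample period when the evaluation-period returns are passed in instead.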

For two additional alternative predictors, we use Jensen single-index and four-index alphas, obtained from the following time-series regression model.

(3) $R_{it} = \alpha_i + \sum_{k=1}^{K} \beta_{ik} I_{kt} + \varepsilon_{it}$,

where

$R_{it}$ = the excess total return (net of the 30-day T-bill return) for fund i in in-sample month t;

$\alpha_i$ = the alpha for fund i, used as a performance predictor;

$\beta_{ik}$ = the sensitivity of fund i's excess return to index k;

$I_{kt}$ = the return for index k in in-sample month t; and

$\varepsilon_{it}$ = the random error for fund i in in-sample month t.

For Jensen alphas, $K = 1$ and $I_{1t}$ = the excess total return of the S&P 500 in month t. For the four-index alphas, $K = 4$, $I_{1t}$ = the excess total return of the S&P 500 in month t, $I_{2t}$ = the excess total return of the Lehman Aggregate Bond Index in month t, $I_{3t}$ = the difference in return between a small-cap and large-cap stock portfolio based on Prudential Bache indexes in month t, and $I_{4t}$ = the difference in return between a growth and value stock portfolio based on Prudential Bache indexes in month t.^23 We utilize the four-index model because, as Elton, Gruber, and Blake (1996a) show, this model provides better risk adjustment for mutual funds than the single-index model.
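The regression in equation (3) can be sketched as follows. This is an illustrative implementation only: it solves the normal equations directly in pure Python as a stand-in for a statistics package, and the function names are hypothetical. It handles both the single-index case (K = 1) and the four-index case (K = 4).

```python
def ols(y, X):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    y: list of T observations. X: list of T rows of regressors
    (include a leading 1.0 in each row for the intercept).
    """
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yt for row, yt in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                m = A[r][c] / A[c][c]
                A[r] = [a - m * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

def index_model_alpha(fund_excess, index_returns):
    """Return (alpha, betas) from regressing fund excess returns on K indexes.

    index_returns: list of T rows, each holding the K index returns
    for one month.
    """
    X = [[1.0] + list(row) for row in index_returns]
    coefs = ols(fund_excess, X)
    return coefs[0], coefs[1:]
```

With a single index per row the intercept is the Jensen alpha; with four indexes per row it is the four-index alpha.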

F. Out-of-Sample Evaluation Periods

When evaluating performance, investors are typically presented with the one-, three-, five-, and (if available) 10-year past performance windows. Similarly, we use one-, three-, and five-year periods to examine the out-of-sample forecasting ability of Morningstar's ratings. (The 10-year window is outside the bounds of our sample.) This provides 12 subsamples for performance evaluation for our seasoned funds 1992-1997 sample and three additional subsamples for our complete funds 1993 sample.^24

G. Returns Data and Load Adjustments

For the out-of-sample and in-sample returns used in the alternative predictors, the data consist of monthly returns from the Morningstar On-Disk and Principia programs. These returns data are adjusted for management, administrative, and 12b-1 fees and for other costs automatically taken out of fund assets. However, unlike the Morningstar risk-adjusted ratings, the monthly return data do not adjust for sales charges such as front-end and deferred loads.^25 Consequently, if we use the monthly return data for the out-of-sample returns, the returns on load funds will be overstated.

Very little attention in the mutual fund performance literature is given to the treatment of loads in return data. Although some authors (e.g., Gruber (1996)) have presented results separately for load and no-load funds, most studies (e.g., Hendricks, Patel, and Zeckhauser (1993), Elton, Gruber, and Blake (1996a), Malkiel (1995), and Carhart (1997)) provide no direct adjustment for loads in their returns data. Loads may be important, especially in this paper since the Morningstar ratings encompass load-adjusted returns, but the question is how to deal with them. Should one use front-end loads, deferred loads, or both? When and for how long should one apply the load? What if the mutual fund has reduced its load over time (especially the deferred load)? Should one use an average load adjustment for each month or an annualized load? If one decides to use an annualized load, what interest rate should one use to discount the load factor?

^23 See Elton, Gruber, and Blake (1996a) for a detailed description of the Prudential Bache portfolios used in the four-index model.

^24 A detailed table showing the number of funds, number of merged and liquidated funds, and number of funds that changed their Morningstar styles for all out-of-sample periods and both sample groups is available from the authors upon request.

^25 Principia Manual (1998), p. 107.

In light of these issues, we adjust the monthly returns of each mutual fund using an approach similar to Rea and Reid (1998). For both front-end and deferred loads, we consider an investor who buys and holds the load shares for a fixed number of months, i.e., 12 months (one year), 36 months (three years), or 60 months (five years). For front-end loads, the investor buying the fund pays the load in a lump sum at the time of purchase. To spread the front-end load across the period that the shares are held, we use Rea and Reid's assumption that the investor borrows the amount necessary to pay the load up front and then repays the loan as an annuity in equal, monthly installments during the holding period. Hence, the monthly load adjustment reflects the amount that was borrowed and the interest on the loan. Mathematically, our front-end load adjustment process is

(4) $f = \sum_{j=1}^{h} \dfrac{f^{m}}{(1+r)^{j}}$,

where r is the monthly interest rate (the monthly geometric average of the one-, three-, or five-year Treasury yield over the holding period), f is the front-end load (expressed as a percent), h is the number of months the fund is held, and $f^{m}$ is the monthly front-end load adjustment. Hence, the front-end load adjusted returns are

(5) $R_{it}^{FL} = R_{it} - f^{m}$,

where $R_{it}$ is the monthly return of fund i in month t and $R_{it}^{FL}$ is the monthly front-end-load-adjusted return of fund i in month t.

As an example of the adjustment, consider a one-year investment in Fidelity's Magellan fund starting in January 1992 when Magellan had a front-end load of 3%, and the one-year Treasury yield was 3.84%, giving a monthly average rate of 0.31%. Therefore, for the one-year holding (out-of-sample) period, f = 3%, r = 0.0031, and h = 12, giving $f^{m}$ = 0.255%. We then subtract 0.255% from each of the Magellan fund's 12 monthly returns during 1992 to obtain the load-adjusted returns.
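The Magellan example can be checked with a short sketch. It treats the front-end load as the present value of h equal month-end payments, which is the standard annuity reading of the borrowing assumption described above. The function names are hypothetical.

```python
def monthly_rate(annual_yield):
    """Monthly geometric-average rate implied by an annual yield."""
    return (1.0 + annual_yield) ** (1.0 / 12.0) - 1.0

def front_end_adjustment(load_pct, r, months):
    """Monthly front-end load adjustment, in percent.

    The load equals the present value of `months` equal month-end
    payments discounted at the monthly rate r.
    """
    annuity_factor = sum((1.0 + r) ** -j for j in range(1, months + 1))
    return load_pct / annuity_factor

# Inputs from the Magellan example in the text.
r_1992 = monthly_rate(0.0384)                    # about 0.0031
f_m = front_end_adjustment(3.0, 0.0031, 12)      # about 0.255 (percent)
adjusted = [ret - f_m for ret in [1.2, -0.4]]    # hypothetical monthly returns, in percent
```

Because the adjustment is a constant subtracted from every monthly return, it lowers the mean return over the holding period but leaves the standard deviation of returns unchanged.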

For the deferred-load adjustment, the process is slightly different in that the payment of the deferred load does not occur until the end of the holding period. To convert the deferred load into a monthly payment, the investor is assumed to have prepaid the load in equal monthly installments. The amount of the monthly prepayment reflects the deferred load less the interest earned on the prepayments. Thus, the equation for the monthly deferred-load adjustment is

(6) $d = \sum_{j=1}^{h} d^{m} (1+r)^{h-j}$,

where d is the deferred load (expressed as a percent) and $d^{m}$ is the monthly deferred-load adjustment. Hence, the deferred-load-adjusted returns are

(7) $R_{it}^{DL} = R_{it} - d^{m}$,

where $R_{it}$ is the monthly return of fund i in month t and $R_{it}^{DL}$ is the monthly deferred-load-adjusted return of fund i in month t.

As with the front-end loads, we use the monthly geometric average of the one-, three-, or five-year Treasury yield over the holding period for the interest rate. However, in contrast to the front-end load adjustment, we reduce the amount of the deferred load as the holding period, h, increases. We do this because Morningstar also reduces the deferred load as the holding period increases. Hence, for a holding period of 12 months, the full amount of the deferred load is imposed. For the 36-month holding period, we apply only half of the original deferred load and in the 60-month holding period, the deferred load completely disappears.
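A minimal sketch of the deferred-load adjustment follows. The schedule (full load at 12 months, half at 36 months, none at 60 months) comes from the text. The timing of the prepayments is an assumption: the sketch supposes that each installment is paid at month end, so that the installment paid in month j earns interest for the remaining h - j months. The names are hypothetical.

```python
# Fraction of the original deferred load applied for each holding
# period (in months), as described in the text.
DEFERRED_SCHEDULE = {12: 1.0, 36: 0.5, 60: 0.0}

def deferred_adjustment(load_pct, r, months):
    """Monthly deferred-load adjustment, in percent.

    Assumption: installments are paid at each month end, so the
    payment made in month j accumulates interest for (months - j)
    months before the load falls due.
    """
    effective_load = load_pct * DEFERRED_SCHEDULE[months]
    accumulation = sum((1.0 + r) ** (months - j) for j in range(1, months + 1))
    return effective_load / accumulation

# Hypothetical 4% deferred load and 0.3% monthly rate.
d_12 = deferred_adjustment(4.0, 0.003, 12)
d_36 = deferred_adjustment(4.0, 0.003, 36)
d_60 = deferred_adjustment(4.0, 0.003, 60)
```

Under this timing assumption, interest earned on the prepayments makes each installment slightly smaller than the effective load divided by the number of months.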

III. Methodology

To measure out-of-sample performance we use four performance metrics. To examine out-of-sample performance of the Morningstar ratings and the alternative predictors, we use two methods: dummy variable regression analysis and the non-parametric Spearman-Rho rank correlation test.

A. Out-of-Sample Performance Measurement

We use four performance metrics from the existing performance literature to measure out-of-sample performance: the Sharpe (1966) ratio, the mean monthly excess return, a modified version of Jensen's (1968) alpha, and a modified version of a four-index alpha (Elton, Gruber, and Blake (1996a)). For each performance metric, we examine both non-load-adjusted and load-adjusted versions. We report results only for the load-adjusted Sharpe ratio, the load-adjusted mean monthly excess return, the non-load-adjusted modified Jensen alpha, and the non-load-adjusted modified four-index alpha.^26

The load-adjusted Sharpe ratio for fund i is

(8) $\text{Sharpe}_i = \dfrac{\bar{R}_i^{LA} - \bar{R}_F}{\sigma_i}$,

where $\bar{R}_i^{LA} - \bar{R}_F$ is the mean excess (net of the 30-day T-bill rate) load-adjusted monthly return for the ith mutual fund during the evaluation (out-of-sample) period and $\sigma_i$ is the standard deviation of the excess load-adjusted monthly returns for the ith mutual fund during the evaluation period. The non-load-adjusted Sharpe ratio is essentially the same as equation (2), except that it uses the out-of-sample period. The load-adjusted mean monthly excess return is $\bar{R}_i^{LA} - \bar{R}_F$. The non-load-adjusted mean monthly excess return is $\bar{R}_i - \bar{R}_F$.

The non-load-adjusted modified Jensen and four-index alphas are calculated using a methodology similar to Elton, Gruber, and Blake (1996a). Specifically, for each seasoned funds subsample, we utilize a time-series period of monthly non-load-adjusted returns going back 10 years from the selection date and forward to the end of the out-of-sample evaluation period to obtain an estimate of the intercept from either the single-index or four-index model regression (equation (3)). For our complete funds 1993 sample group, we utilize a time-series period of monthly non-load-adjusted returns going back three years from the selection date and forward to the end of the out-of-sample evaluation period to obtain an estimate of the intercept from either the single-index or four-index model regression (equation (3)).

^26 The results for the metrics that are not reported in this paper, i.e., those for the non-load-adjusted Sharpe ratio, the non-load-adjusted mean monthly excess return, the load-adjusted modified Jensen alpha, and the load-adjusted modified four-index alpha, are essentially the same as their load/non-load counterparts. These results are available from the authors upon request.

To obtain the alphas, we add the average monthly residual during the evaluation period to the intercept. For example, to obtain a modified Jensen alpha for a seasoned fund's one-year out-of-sample performance measure in the 1992 subsample, we run the one-index model on monthly returns starting in January 1982 and ending in December 1992 (11 years) to obtain an estimate of the intercept. We then add the average of the fund's residuals during the one year after the selection date (the evaluation period) to the estimated intercept to obtain the fund's modified Jensen alpha.
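The modified Jensen alpha described above can be sketched for the single-index case as follows. The function names are hypothetical, and the sketch assumes the evaluation period consists of the last `n_eval` months of the regression window, as in the 1982-1992 example.

```python
from statistics import mean

def single_index_fit(y, x):
    """OLS intercept and slope for y = a + b * x + e."""
    xbar, ybar = mean(x), mean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sxy / sxx
    return ybar - b * xbar, b

def modified_alpha(fund_excess, index_excess, n_eval):
    """Full-window intercept plus the mean residual over the last
    n_eval months (the evaluation period)."""
    a, b = single_index_fit(fund_excess, index_excess)
    residuals = [y - (a + b * x) for y, x in zip(fund_excess, index_excess)]
    return a + mean(residuals[-n_eval:])
```

Algebraically, the result equals the fund's mean excess return in the evaluation period minus the full-window beta times the index's mean excess return in the evaluation period, so the measure reflects evaluation-period performance while using the longer window to estimate the risk exposure.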

To obtain alphas for funds that merged or liquidated during the evaluation period, we first run two regressions: i) a regression using the fund's returns going back either 10 or three years from the selection date and ending in the month prior to the fund's disappearance, and ii) a regression run over the entire regression period using the returns on an equally-weighted portfolio formed each month from the existing funds in the sample. We then form a weighted average of: i) the fund's estimated intercept plus the fund's average residual during the time it survived in the evaluation period, and ii) the estimated intercept plus the average residual during the remaining time in the evaluation period of the equally-weighted portfolio, where the fund's weight is the fraction of the evaluation period it survived and the equally-weighted portfolio's weight is the remaining fraction. This provides a performance measure for an investor who buys a remaining fund in the sample at random if the original fund merges or liquidates.
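The weighting step for merged or liquidated funds can be sketched as below. The function name and the input values are hypothetical; the two alpha inputs are assumed to have been computed already, each as an estimated intercept plus the relevant average residual.

```python
def merged_fund_alpha(fund_alpha, portfolio_alpha, months_survived, eval_months):
    """Blend the fund's alpha with the equally-weighted portfolio's alpha.

    fund_alpha: fund intercept plus its average residual while it survived.
    portfolio_alpha: portfolio intercept plus its average residual over
    the remaining months of the evaluation period.
    """
    w = months_survived / eval_months
    return w * fund_alpha + (1.0 - w) * portfolio_alpha

# Hypothetical fund that survived 9 months of a 12-month evaluation period.
blended = merged_fund_alpha(0.002, -0.001, 9, 12)
```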

For the load-adjusted modified Jensen and four-index alphas, we do not use load-adjusted returns, since we use both out-of-sample and in-sample data for these measures. We could apply loads to the in-sample data; however, doing so would raise a number of problems. First, the loads during the in-sample period may be quite different from those in the out-of-sample period. Second, and more importantly, it is not clear how we should deal with loads before an investor owns a fund. Again, our assumption in this paper is that the investor selects the funds at the time they are rated by Morningstar. Moreover, our load adjustment depends upon how long the investor holds the fund. If we were to assume that the investor already owned the fund before the out-of-sample period started, and paid loads during the in-sample period, it would be difficult to determine the correct load to assess for the out-of-sample period.

As an alternative, we adjust the single-index and four-index alphas for loads by using an added (0,1) dummy variable in equation (9), where 1 = load funds and 0 = no-load funds.

B. Dummy Variable Regression Analysis

The first method we use to examine out-of-sample predictive performance is a cross-sectional dummy variable regression analysis that allows us to examine the Morningstar star ranking group differences in performance predictability. To make the results for the alternative predictors comparable to those for the Morningstar star groups, we divide the funds into five subgroups after ranking them in descending order by each of their alternative predictors. These five alternative predictor subgroups are not quintiles, since we wanted to preserve the same number of funds in each alternative predictor subgroup as we have in each of the five Morningstar star groups. As an example, consider our January 1992 seasoned fund subsamples. The same 263 funds are in each of these subsamples: 18 five-star funds, 93 four-star funds, 111 three-star funds, 33 two-star funds, and 8 one-star funds. Therefore, for our 1992 seasoned fund subsamples, for any one of our alternative predictors, group five has 18 funds with the highest alternative predictor, group four has the next highest 93 funds, etc.
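The grouping rule can be sketched as follows, using the January 1992 star-group counts given above. The function name and the predictor values are hypothetical.

```python
def assign_groups(predictor, group_sizes):
    """Assign funds to predictor groups that mirror the star-group sizes.

    predictor: {fund: in-sample predictor value}.
    group_sizes: {group number: number of funds}, e.g. {5: 18, 4: 93, ...}.
    Funds are ranked in descending order; the top block goes to group
    five, the next block to group four, and so on.
    """
    ranked = sorted(predictor, key=predictor.get, reverse=True)
    groups, start = {}, 0
    for g in sorted(group_sizes, reverse=True):
        for fund in ranked[start:start + group_sizes[g]]:
            groups[fund] = g
        start += group_sizes[g]
    return groups

# Star-group counts for the January 1992 seasoned fund subsamples.
SIZES_1992 = {5: 18, 4: 93, 3: 111, 2: 33, 1: 8}
# Hypothetical predictor values for 263 funds.
predictor = {f"fund{i}": float(i) for i in range(263)}
groups = assign_groups(predictor, SIZES_1992)
```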

We estimate the following equation for each of the 12 subsamples for the seasoned funds and for each of the three subsamples for the 1993 complete set,

(9) $S_i = \gamma_0 + \gamma_1 D4_i + \gamma_2 D3_i + \gamma_3 D2_i + \gamma_4 D1_i + u_i$,

where:

$S_i$ = out-of-sample performance metric for fund i, i.e., the load-adjusted Sharpe ratio, the load-adjusted mean monthly return, the non-load-adjusted single-index alpha, or the non-load-adjusted four-index alpha;

D4 = 1 if a four-star fund or if in alternative predictor group four, 0 if not;

D3 = 1 if a three-star fund or if in alternative predictor group three, 0 if not;

D2 = 1 if a two-star fund or if in alternative predictor group two, 0 if not;

D1 = 1 if a one-star fund or if in alternative predictor group one, 0 if not;

i = 1 through N, where N is the total number of funds in the subsample.

In equation (9), the five-star fund group or the alternative predictor group five is the reference group for the dummy variable regression.^27 Hence, when using the load-adjusted Sharpe ratio as the out-of-sample performance measure, the coefficient $\gamma_0$ represents the expected load-adjusted Sharpe ratio when all the dummy variables are equal to 0, and coefficients $\gamma_1$ through $\gamma_4$ represent the differences between each dummy group and the reference group. The t-statistics on the coefficients provide a test of the significance of the difference between an individual dummy group and the reference group.

We use the five-star funds or alternative predictor group five as a reference group because they provide a ceiling from which we can compare the performance of the lower group funds. If the star ratings or alternative predictors accurately forecast out-of-sample performance, we should see increasingly negative (and significant) coefficients as we move from $\gamma_1$ to $\gamma_4$.
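The coefficient estimates in equation (9) can be sketched without a regression routine. Because the regressors are an intercept plus mutually exclusive group dummies, the OLS estimates reduce to group means: the constant is the mean of the reference group, and each dummy coefficient is a lower group's mean minus that constant. The sketch below uses this property; it does not compute t-statistics, it assumes every group contains at least one fund, and its names and data are hypothetical.

```python
from statistics import mean

def dummy_regression(metric, group):
    """OLS coefficients of equation (9) with group five as reference.

    metric: {fund: out-of-sample performance measure}.
    group: {fund: star rating or predictor group, 1 through 5}.
    """
    by_group = {}
    for fund, g in group.items():
        by_group.setdefault(g, []).append(metric[fund])
    gamma0 = mean(by_group[5])
    coefs = {"gamma0": gamma0}
    # gamma1..gamma4 correspond to groups four, three, two, and one.
    for k, g in enumerate((4, 3, 2, 1), start=1):
        coefs[f"gamma{k}"] = mean(by_group[g]) - gamma0
    return coefs

# Hypothetical Sharpe ratios and star ratings.
metric = {"a": 0.5, "b": 0.3, "c": 0.4, "d": 0.2, "e": 0.1, "f": -0.2}
stars = {"a": 5, "b": 5, "c": 4, "d": 3, "e": 2, "f": 1}
coefs = dummy_regression(metric, stars)
```

In this hypothetical case the coefficients become increasingly negative from the four-star to the one-star group, which is the pattern expected if the ratings forecast performance accurately.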

C. Spearman-Rho Rank Correlation Test

As a final test, we use the two-tailed Spearman-Rho rank correlation test to examine the rank correlations of both the Morningstar scores and the alternative predictors with the out-of-sample performance measures. Since Morningstar provides the data to rank the funds beginning in 1994, we only examine this test for samples that begin in 1994 or later. The Spearman-Rho has a null hypothesis of no correlation between the two rankings and is a non-parametric test.

^27 We also performed all of the dummy variable regressions using the three-star funds or the alternative predictor group 3 as the reference group. The results, which did not change when using this reference group, are available from the authors.

For this test, we follow the methodology of Elton, Gruber, and Blake (1996a). For each fund in the sample, we examine the four different out-of-sample measures: the (load-adjusted) Sharpe ratios, the (load-adjusted) mean monthly excess returns, the Jensen alphas, and the four-index alphas. We first sort all the funds in descending order by either their in-sample Morningstar scores or, in the case of the alternative predictors, by their in-sample predictor's performance. Next, we organize the data into deciles and compute the average for each decile. Our goal is then to examine whether the decile ranking given by either the Morningstar scores or by the alternative predictors corresponds to the decile rankings of the four out-of-sample performance measures. If the Morningstar system or the alternative predictors forecast well out-of-sample, then there should be close correlation between the in-sample and the out-of-sample rankings.
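The decile procedure can be sketched as follows. The way funds are split into deciles when the sample size is not a multiple of 10, and the use of average ranks for ties, are assumptions; the function names are hypothetical. The sketch returns the correlation coefficient only and does not compute significance levels.

```python
from statistics import mean

def decile_averages(values, n_groups=10):
    """Split an already-ordered list into n_groups nearly equal bins
    and return each bin's mean (requires len(values) >= n_groups)."""
    n = len(values)
    bounds = [i * n // n_groups for i in range(n_groups + 1)]
    return [mean(values[bounds[i]:bounds[i + 1]]) for i in range(n_groups)]

def ranks(xs):
    """Ranks with 1 = smallest; tied values share their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return r

def spearman_rho(a, b):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    ra, rb = ranks(a), ranks(b)
    ma, mb = mean(ra), mean(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    var_a = sum((x - ma) ** 2 for x in ra)
    var_b = sum((y - mb) ** 2 for y in rb)
    return cov / (var_a * var_b) ** 0.5

def decile_rank_correlation(in_sample, out_sample):
    """Sort funds by the in-sample predictor (descending), average both
    measures within deciles, and correlate the decile rankings."""
    funds = sorted(in_sample, key=in_sample.get, reverse=True)
    pred = decile_averages([in_sample[f] for f in funds])
    perf = decile_averages([out_sample[f] for f in funds])
    return spearman_rho(pred, perf)
```

Applying `spearman_rho` to the first five and the last five decile averages separately gives the top-five and bottom-five correlations.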

IV. Morningstar Rating Results

We present the predictive ability of the Morningstar ratings in two broad sections. First, we report the results using the 1992-1997 seasoned funds subsamples. In Section IV.A, we discuss the dummy variable results for the overall samples, the dummy variable results for the samples organized by style groups, the Spearman-Rho rank correlation results for the overall samples, and the Spearman-Rho rank correlation test for the samples organized by style groups. In Section IV.B, we report the results of the complete funds 1993 sample. All the regressions in Section IV were tested for heteroskedasticity using the White (1980) test. None of the regression residuals exhibited evidence of heteroskedasticity at the 10% level.

A. 1992-1997 Seasoned Funds Sample Results

In this subsection, we discuss the dummy variable regression results on our seasoned funds groups, both for the overall samples and for the samples broken further down into style subgroups. For the overall samples, we do not examine the mean monthly return results, since funds with different styles will likely have different mean monthly returns.

1. Dummy Variable Regression Analysis on Overall Seasoned Fund Samples

The Load-Adjusted Sharpe Ratio. Table 1 presents the dummy variable regression analysis in which we examine how well the Morningstar stars predict out-of-sample fund performance, as measured by the load-adjusted Sharpe ratio, for the overall seasoned fund samples. First, the Sharpe ratio results in Table 1 show that the $\gamma_0$ coefficients, the constants in the dummy variable regressions, differ from sample to sample. The 1992 constant is close to zero and insignificant, the 1994 constant is well below zero and significant, and the 1993 and 1995-1997 constants are all positive and significant. These results indicate that the reference group (the five-star funds) performs quite differently in different years. The up-and-down performance of the five-star Sharpe ratios is consistent with the performance of the S&P 500 index's mean monthly excess returns. Second, the results show that the four- and three-star funds do not diverge from the five-star funds in terms of out-of-sample performance. Only three of the 24 coefficients ($\gamma_1$ and $\gamma_2$ for the 12 samples) are significant, indicating no significant difference in out-of-sample performance of median- and top-rated funds. In many cases, even the signs on the coefficients are the opposite of what one would expect. Third, there is some evidence that the Morningstar ratings predict the low-performing funds. The $\gamma_3$ and $\gamma_4$ coefficients are generally negative and significant (12 of the 24 $\gamma_3$ and $\gamma_4$ coefficients), indicating that the performance of one- and two-star funds is significantly worse than that of the five-star funds. Fourth, the $R^2$ and F-statistic values for the samples differ dramatically: the 1992 one-year sample has an $R^2$ of 0.02 while the 1997 one-year sample has an $R^2$ of 0.17.

TABLE 1

Dummy Variable Regressions Using Morningstar Stars

| Sample | $\gamma_0$ (constant) | $\gamma_1$ (4-star) | $\gamma_2$ (3-star) | $\gamma_3$ (2-star) | $\gamma_4$ (1-star) | $R^2$ | F-Stat. |
|---|---|---|---|---|---|---|---|
| One-Year 1992 | 0.06 (1.09) | 0.11 (1.76) | 0.07 (1.12) | 0.06 (0.78) | 0.01 (0.10) | 0.02 | 1.13 |
| One-Year 1993 | 0.26* (4.13) | -0.03 (0.42) | -0.05 (0.78) | -0.11 (1.51) | -0.21 (1.40) | 0.02 | 1.27 |
| One-Year 1994 | -0.18* (3.92) | -0.05 (1.02) | -0.05 (1.02) | -0.09 (1.69) | -0.32* (3.74) | 0.05 | 4.11* |
| One-Year 1995 | 0.69* (9.29) | 0.14* (1.77) | 0.10 (1.37) | -0.03 (0.35) | -0.42* (3.51) | 0.11 | 10.31* |
| One-Year 1996 | 0.25* (8.19) | 0.05 (1.51) | 0.02 (0.55) | -0.03 (0.92) | -0.15* (2.67) | 0.06 | 5.82* |
| One-Year 1997 | 0.39* (12.69) | -0.03 (0.91) | -0.07* (2.23) | -0.15* (4.36) | -0.35* (6.72) | 0.17 | 20.28* |
| Three-Year 1992 | 0.01 (0.07) | 0.05 (1.50) | 0.04 (1.36) | 0.06 (1.84) | 0.01 (0.21) | 0.02 | 1.01 |
| Three-Year 1993 | 0.28* (8.99) | -0.03 (1.02) | -0.06 (1.70) | -0.14* (3.66) | -0.12 (1.57) | 0.08 | 5.51* |
| Three-Year 1994 | 0.26* (8.87) | -0.01 (0.01) | -0.02 (0.61) | -0.08* (2.23) | -0.32* (5.68) | 0.15 | 12.88* |
| Three-Year 1995 | 0.42* (12.61) | 0.03 (0.74) | 0.01 (0.34) | -0.04 (1.10) | -0.26* (4.79) | 0.12 | 11.66* |
| Five-Year 1992 | 0.24* (9.91) | 0.01 (0.18) | -0.01 (0.32) | -0.01 (0.31) | -0.11* (2.50) | 0.03 | 2.30 |
| Five-Year 1993 | 0.34* (12.65) | -0.04 (1.31) | -0.06* (2.06) | -0.13* (4.03) | -0.08 (1.25) | 0.08 | 6.09* |

Sample: Funds with 10 years or more of in-sample returns (Seasoned Funds 1992-1997 Sample Group). Out-of-Sample Performance Measure: Load-Adjusted Sharpe Ratio. t-statistics are in parentheses. *indicates significance at the 5% level.

The Modified Jensen Alpha and Four-Index Alpha. Results for the modified Jensen and four-index alphas continue to demonstrate the same patterns as the Sharpe ratio: little, if any, significant difference between the five-, four-, and three-star rated funds (with the 1993 five-year sample providing the only evidence of significance in the right direction), some evidence of negative and significant differences between the low-rated and the five-star funds, and wide swings in the constant and $R^2$ values. In addition, the one- and four-index models show that, in most cases, the five-star funds have negative (and sometimes significant) alphas (the $\gamma_0$ coefficient).^28

2. Dummy Variable Regression Analysis on Samples Organized by Style Groups

We also examined the ability of the Morningstar stars to predict out-of-sample performance when the samples are broken into the five style groups. The results are very similar across out-of-sample performance measures and also to the dummy variable analysis on the unbroken sample.^29

First, there is very little ability to predict significant negative differences between the five-, four-, and three-star funds. In fact, for the out-of-sample load-adjusted mean monthly return, 33 of the 60 coefficients for $\gamma_2$ (the three-star fund) are positive. Second, the growth and growth-income styles ratings show some ability to predict low-performing funds. However, this result does not extend to the other styles: the low ratings of aggressive growth, equity-income, and small company funds show relatively little ability to detect significant differences in out-of-sample performance. Third, there are vast differences in the constant term across styles and samples. For example, using the 1994 one-year sample, the small company five-star funds post a solid gain, while every other style shows a negative value for the constant.

The results for the aggressive growth and equity income styles, and to a lesser extent for small company funds, should be interpreted carefully since there are relatively few of these funds in the seasoned funds samples. In fact, for a number of samples, the equity-income style does not have a single one-star fund. The small sample may be the reason that the stars for growth and growth-income funds predict low future performance better than the other styles.

3. Spearman-Rho Rank Correlation Tests for Overall Seasoned Funds Samples

Table 2 displays the Spearman-Rho rank correlation test results for the overall seasoned fund samples. For each of the six samples in which we have available Morningstar scores, Table 2 shows the Spearman-Rho rank correlations of the in-sample Morningstar scores with the out-of-sample decile averages of the performance measures across all 10 deciles, and across both the top five deciles and the bottom five deciles. The results show the same basic pattern found in the dummy regression analysis on the overall sample: the low scores predict poor future performance and the high scores have, at best, only mixed ability to predict future performance. In examining the rank correlation coefficients on all 10 deciles, several performance measures are relatively well correlated with the in-sample Morningstar scores. In fact, in four of the six samples for the Sharpe ratio, one of the six samples for the Jensen single-index alpha, and two of the six samples for the four-index alpha, we can reject the null hypothesis of no correlation in the rankings at the 95% confidence level. However, examining the correlation coefficients of the top five and bottom five deciles, we see that overall rank correlation results are largely based on the ability of the low scores to predict poor future performance. In most cases, the correlation coefficients for the bottom five deciles are much larger than those for the top five deciles. Generally, the rank correlation coefficients for the top five deciles are actually negative, indicating that high scores do not accurately predict superior future performance.

^28 Tables on the results for the modified Jensen and four-index alphas are available from the authors. Also, since the samples are not divided by investment objective, we do not report the load-adjusted mean monthly return results in this section. Funds with different styles will likely have different mean monthly returns. In the sections in which we organize samples into their respective style groups, we use the load-adjusted mean monthly return as one of the out-of-sample measures.

^29 Detailed tables of the results for the out-of-sample performance measures when the samples are broken into style groups are available upon request.

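TABLE 2

Spearman-Rho Rank Correlations Using Decile Averages of In-Sample Morningstar Scores with Decile Averages of Out-of-Sample Performance Measures

| Sample | Deciles | Load-Adjusted Sharpe Ratio | Non-Load-Adjusted Jensen Alpha | Non-Load-Adjusted 4-Index Alpha |
|---|---|---|---|---|
| One-Year 1994 | All | 0.806* | 0.600 | 0.806* |
| One-Year 1994 | Top 5 | -0.200 | -0.700 | -0.400 |
| One-Year 1994 | Bottom 5 | 0.600 | 0.600 | 0.800 |
| One-Year 1995 | All | 0.430 | 0.467 | 0.648* |
| One-Year 1995 | Top 5 | -0.700 | -0.500 | 0.600 |
| One-Year 1995 | Bottom 5 | 1.000* | 0.700 | 0.300 |
| One-Year 1996 | All | 0.758* | 0.370 | 0.079 |
| One-Year 1996 | Top 5 | -0.400 | -0.900* | -0.900* |
| One-Year 1996 | Bottom 5 | 0.900* | 0.700 | 0.300 |
| One-Year 1997 | All | 0.927* | 0.648* | -0.297 |
| One-Year 1997 | Top 5 | 0.500 | -0.700 | -0.900* |
| One-Year 1997 | Bottom 5 | 0.900* | 1.000* | 0.700 |
| Three-Year 1994 | All | 0.685* | 0.673 | 0.442 |
| Three-Year 1994 | Top 5 | -0.100 | -0.200 | -0.700 |
| Three-Year 1994 | Bottom 5 | 0.700 | 0.700 | 0.600 |
| Three-Year 1995 | All | 0.491 | 0.382 | 0.261 |
| Three-Year 1995 | Top 5 | -1.000* | -0.600 | -0.100 |
| Three-Year 1995 | Bottom 5 | 1.000* | 1.000* | 0.400 |

The entries are rank correlations. All funds have 10 or more years of in-sample returns (seasoned funds 1992-1997 group). *indicates significance at the 5% level.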

4. Spearman-Rho Rank Correlation Tests for Samples Organized by Style Groups

We also examined the results of the Spearman-Rho rank correlation tests for the samples when broken into their respective style groups.^30 As with the dummy variable results for the samples broken into style groups, we examined the out-of-sample load-adjusted mean monthly return, the load-adjusted Sharpe ratio, the non-load-adjusted single-index alpha, and the non-load-adjusted four-index alpha. The results mirror the dummy variable results when the samples are organized by style. For the aggressive growth, equity-income, and, to a lesser extent, the small company sample groups, we do not see much positive correlation between the Morningstar scores and the out-of-sample metrics. This is true for the rank tests of the overall 10 deciles, the top five deciles, and the bottom five deciles. Again, the reason for this may be that the sample sizes are not large.^31 However, with the growth and growth-income samples, we see the pattern suggested by the overall Spearman-Rho rank correlation tests in Table 2: low correlations using the top five deciles and higher correlations using the bottom five deciles. In fact, in the growth fund sample, every bottom five decile rank correlation is higher in value than the top five decile rank correlation. The results again suggest that the Morningstar scores are weak in terms of predicting high future performance and yet have some ability to predict underperforming funds.

^30 A detailed table containing the results of that examination is available from the authors.

B. Complete Funds 1993 Sample Results

1. Dummy Variable Regression Analysis on the Overall Complete Funds 1993 Samples

Table 3 presents the results from the dummy variable regressions for our overall complete funds 1993 sample and shows the same patterns detected in the seasoned funds overall sample results. First, there is a relatively strong ability to predict low-performing funds, especially in the longer out-of-sample terms. Of the 18 coefficients for γ₃ (two-star) and γ₄ (one-star), 15 are negative and significant, indicating that low-rated funds do perform significantly worse in terms of risk-adjusted performance. Second, there is only weak ability to predict high-performing funds. Only two of the nine coefficients for γ₂ (three-star) and zero of the nine coefficients for γ₁ (four-star) are negative and significant. In fact, only five of the nine γ₁ (four-star) coefficients have the "correct" negative sign. Third, the Morningstar stars do a slightly better job of predicting out-of-sample performance when using the four-index alpha.
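The mechanics behind a table such as Table 3 can be illustrated with a minimal sketch. It assumes that equation (9) regresses the out-of-sample performance measure on four rating dummies with five-star funds as the omitted group, which is consistent with the column labels of Table 3 but is not restated in this section. The function name `star_dummy_regression`, the toy data, and the use of plain OLS standard errors are illustrative assumptions, not the authors' code.

```python
import numpy as np

def star_dummy_regression(stars, performance):
    """OLS of out-of-sample performance on star dummies (5-star omitted).

    Returns (coefficients, t_statistics) ordered as
    [gamma0 (constant), gamma1 (4-star), gamma2 (3-star),
     gamma3 (2-star), gamma4 (1-star)].
    Assumes every rating from 1 to 5 appears at least once.
    """
    stars = np.asarray(stars)
    y = np.asarray(performance, dtype=float)
    # One dummy column per rating below five stars.
    dummies = [(stars == s).astype(float) for s in (4, 3, 2, 1)]
    X = np.column_stack([np.ones_like(y)] + dummies)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    dof = len(y) - X.shape[1]
    sigma2 = resid @ resid / dof              # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)     # classical OLS covariance
    t_stats = coef / np.sqrt(np.diag(cov))
    return coef, t_stats

# Hypothetical toy data: four funds per rating, group means chosen by hand.
group_means = {5: 0.30, 4: 0.28, 3: 0.27, 2: 0.15, 1: 0.05}
stars = np.repeat([5, 4, 3, 2, 1], 4)
noise = np.tile([0.01, -0.01], 10)
performance = np.array([group_means[int(s)] for s in stars]) + noise
coef, t_stats = star_dummy_regression(stars, performance)
```

In this setup each slope coefficient equals the difference between that rating group's mean performance and the five-star mean, so a negative, significant coefficient indicates that the group underperforms five-star funds.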

2. Dummy Variable Regressions for Samples Organized by Age

Panels A-C of Table 4 present the results for the dummy variable regressions in which we use samples organized by age. Table 4, Panel A reports the results for young funds (three to less than five years of in-sample returns); Panel B reports the middle-aged funds (five to less than 10 years of in-sample returns); Panel C reports the seasoned funds (10 or more years of in-sample returns). There is evidence of an ability to predict poor future performance, especially among seasoned and middle-aged funds. Using the four-index alpha for the middle-aged funds shows that the Morningstar stars have a strong ability to predict weak performance, as most of the coefficients for the lower rated funds are negative and strongly significant. Among the young funds, we do not see much evidence of ability to predict weak performance, but this is probably because there are no one-star funds in the young funds sample group and, in general, relatively few young funds in the sample.

In predicting high-performing funds, the Morningstar stars are, at best, mildly successful. In the middle-aged and seasoned fund subsamples, only five of the 18 coefficients for γ₂ (three-star) are significant and negative, yet two of the 18 are

¹¹For the equity-income sample, we did not perform the test over many samples since there were not enough observations to create the deciles. We required that there be at least 20 observations so that each decile would have at least two observations.


TABLE 3

Dummy Variable Regressions Using Morningstar Stars

Sample    γ₀ (constant)    γ₁ (4-star)     γ₂ (3-star)     γ₃ (2-star)     γ₄ (1-star)     R²      F-Stat.

Out-of-Sample Performance Measure: Load-Adjusted Sharpe Ratio
1 year    0.25* (6.98)     0.01 (0.36)     -0.03 (0.66)    -0.08 (1.72)    -0.07 (0.70)    0.01    2.14
3 year    0.26* (16.48)    -0.01 (0.25)    -0.02 (1.14)    -0.08* (3.79)   -0.13* (2.72)   0.05    7.90*
5 year    0.28* (20.86)    0.02 (1.10)     0.01 (0.49)     -0.05* (2.66)   -0.08 (1.94)    0.08    6.00*

Out-of-Sample Performance Measure: Non-Load-Adjusted Jensen Index Alpha
1 year    0.26* (3.37)     -0.01 (0.01)    -0.05 (0.57)    -0.10 (1.04)    0.01 (0.01)     0.01    0.59
3 year    -0.10* (2.30)    0.02 (0.35)     -0.04 (0.76)    -0.16* (2.97)   -0.34* (2.67)   0.04    6.72*
5 year    -0.21* (5.45)    0.07 (1.70)     0.03 (0.65)     -0.10* (1.94)   -0.26* (2.29)   0.04    7.31*

Out-of-Sample Performance Measure: Non-Load-Adjusted 4-Index Alpha
1 year    0.13* (2.00)     -0.02 (0.23)    -0.04 (0.59)    -0.20* (2.36)   -0.58* (2.96)   0.07    4.47*
3 year    0.05 (1.18)      -0.06 (1.20)    -0.10* (2.22)   -0.20* (3.67)   -0.41* (3.26)   0.04    6.20*
5 year    0.06* (1.99)     -0.04 (1.25)    -0.09* (2.64)   -0.15* (3.88)   -0.31* (3.35)   0.04    7.05*

Sample: All funds from the complete funds 1993 sample group (635 funds). t-statistics are in parentheses. * indicates significance at the 5% level.

significant and positive, indicating that the three-star funds perform better out-of-sample than the five-star funds. Among the young funds, many of the coefficients have the predicted negative signs, yet there is little ability to detect significantly different performance between median and high-rated funds.

3. Dummy Variable Regressions for Samples Organized by Style

We also examined the results from the dummy variable regressions when the samples are organized by style.¹² The results for low-performing funds are very similar to the seasoned fund samples' results when the samples are organized by style. Only in the growth and growth-income styles is there a relatively strong ability to predict low performance, as many of the coefficients for γ₃ (two-star) and γ₄ (one-star) are negative and significant, particularly in the longer out-of-sample periods.

In predicting high-performing funds, the small company, aggressive growth, and equity-income funds do not demonstrate much ability. However, for the growth and growth-income funds, there is evidence of ability to predict winning funds. For both the growth and growth-income subsamples, almost all coefficients show the postulated negative sign, and many of the coefficients are significant for the growth-income subsample. Of course, the success of these subsamples may be largely related to the sample period. In our earlier analysis of the seasoned funds samples, the 1993 subsamples provide some of the strongest support (albeit

¹²A detailed table of those results is available from the authors.


TABLE 4

Dummy Variable Regressions Using Morningstar Stars Organized by Age

Sample/Age    γ₀ (constant)    γ₁ (4-star)     γ₂ (3-star)     γ₃ (2-star)     γ₄ (1-star)     R²      F-Stat.

Panel A. Young Funds (less than 5 years of in-sample returns)

Out-of-Sample Performance Measure: Load-Adjusted Sharpe Ratio
One-year      0.33* (3.33)     -0.05 (0.44)    -0.04 (0.40)    -0.25* (1.99)   NA              0.07    2.06
Three-year    0.28* (7.27)     -0.08 (1.31)    0.01 (0.02)     -0.06 (1.31)    NA              0.06    1.97
Five-year     0.29* (8.39)     -0.01 (0.15)    0.03 (0.88)     -0.01 (0.06)    NA              0.03    0.89

Out-of-Sample Performance Measure: Non-Load-Adjusted Jensen Alpha
One-year      0.42* (1.99)     -0.16 (0.61)    -0.18 (0.75)    -0.31 (1.14)    NA              0.02    0.44
Three-year    -0.04 (0.46)     -0.10 (0.92)    -0.01 (0.01)    -0.07 (0.63)    NA              0.02    0.72
Five-year     -0.15 (1.79)     -0.01 (0.03)    0.07 (0.74)     0.07 (0.64)     NA              0.02    0.50

Out-of-Sample Performance Measure: Non-Load-Adjusted 4-Index Alpha
One-year      0.24 (1.43)      -0.13 (0.64)    -0.11 (0.60)    -0.35 (1.63)    NA              0.04    1.10
Three-year    0.09 (0.99)      -0.14 (1.30)    -0.11 (1.12)    -0.17 (1.52)    NA              0.03    0.83
Five-year     0.12 (1.75)      -0.09 (1.10)    -0.14 (1.89)    -0.13 (1.56)    NA              0.04    1.34

Panel B. Middle-Aged Funds (greater than 5 and less than 10 years of in-sample returns)

Out-of-Sample Performance Measure: Load-Adjusted Sharpe Ratio
One-year      0.21* (4.45)     0.08 (1.42)     0.01 (0.05)     0.01 (0.22)     0.05 (0.33)     0.02    1.24
Three-year    0.24* (12.38)    0.03 (1.44)     -0.01 (0.01)    -0.04 (1.46)    -0.14* (2.49)   0.07    5.09*
Five-year     0.25* (15.14)    0.05* (2.74)    0.04* (2.06)    -0.01 (0.52)    -0.09 (1.77)    0.08    6.25*

Out-of-Sample Performance Measure: Non-Load-Adjusted Jensen Alpha
One-year      0.23* (2.23)     0.07 (0.57)     -0.03 (0.23)    -0.09 (0.66)    0.10 (0.34)     0.01    0.69
Three-year    -0.13* (2.39)    0.07 (1.08)     -0.01 (0.24)    -0.11 (1.53)    -0.40* (2.53)   0.06    4.42*
Five-year     -0.30* (5.73)    0.17* (2.86)    0.13* (2.25)    -0.04 (0.62)    -0.27 (1.79)    0.09    7.03*

Out-of-Sample Performance Measure: Non-Load-Adjusted 4-Index Alpha
One-year      0.13 (1.36)      0.02 (0.15)     -0.08 (0.73)    -0.24 (1.90)    -0.49 (1.82)    0.04    2.53*
Three-year    0.08 (1.42)      -0.04 (0.65)    -0.14* (2.25)   -0.20* (2.77)   -0.47* (2.98)   0.07    4.86*
Five-year     0.07 (1.77)      -0.01 (0.12)    -0.10* (2.37)   -0.16* (3.10)   -0.35* (3.08)   0.09    7.11*

not that strong) for Morningstar stars predicting high-performing funds, but it is questionable whether these results would carry over to other sample periods.¹³

C. Load/No Load Counterparts for Out-of-Sample Data

All results in Section IV were also calculated using the load/no-load counterparts of the out-of-sample performance measures, i.e., non-load-adjusted Sharpe ratios, non-load-adjusted mean monthly excess returns, and load-adjusted (using a

¹³The 1993 seasoned fund samples show more predictability than most other samples, whether using the Morningstar stars or the alternative predictors (see Section V).


TABLE 4 (continued)

Dummy Variable Regressions Using Morningstar Stars Organized by Age

Sample/Age    γ₀ (constant)    γ₁ (4-star)     γ₂ (3-star)     γ₃ (2-star)     γ₄ (1-star)     R²      F-Stat.

Panel C. Seasoned Funds (10 years or more of in-sample returns)

Out-of-Sample Performance Measure: Load-Adjusted Sharpe Ratio
One-year      0.26* (4.13)     -0.03 (0.42)    -0.05 (0.78)    -0.11 (1.48)    -0.19 (1.28)    0.02    1.15
Three-year    0.28* (8.97)     -0.03 (1.02)    -0.06 (1.69)    -0.14* (3.62)   -0.11 (1.38)    0.07    5.28*
Five-year     0.34* (12.61)    -0.04 (1.30)    -0.06* (2.04)   -0.13* (4.01)   -0.08 (1.22)    0.08    6.00*

Out-of-Sample Performance Measure: Non-Load-Adjusted Jensen Index Alpha
One-year      0.20 (1.45)      0.01 (0.10)     0.01 (0.04)     -0.01 (0.01)    -0.04 (0.12)    0.01    0.01
Three-year    -0.08 (0.86)     -0.01 (0.09)    -0.09 (0.96)    -0.28* (2.61)   -0.25 (1.13)    0.06    4.18*
Five-year     -0.09 (1.18)     -0.06 (0.71)    -0.15 (1.85)    -0.29* (3.22)   -0.27 (1.46)    0.07    5.03*

Out-of-Sample Performance Measure: Non-Load-Adjusted 4-Index Alpha
One-year      0.07 (0.58)      0.02 (0.16)     0.05 (0.37)     -0.06 (0.44)    -0.62* (2.17)   0.03    1.88
Three-year    -0.03 (0.32)     -0.01 (0.11)    -0.03 (0.29)    -0.18 (1.71)    -0.28 (1.32)    0.03    2.08
Five-year     0.01 (0.13)      -0.04 (0.55)    -0.03 (0.46)    -0.14 (1.77)    -0.21 (1.35)    0.03    1.70

All from complete funds 1993 sample group. t-statistics are in parentheses. * indicates significance at the 5% level.

dummy variable for loads in equation (9)) modified Jensen and four-index alphas. The results were generally the same as those reported above and are available upon request.

V. Alternative Predictor Results

The results so far indicate that Morningstar ratings do not generally predict superior fund performance but do have some predictive power for poor-performing funds. Can an investor do as well by choosing funds based on alternative predictors?¹⁴ To answer this question, we examine a naive predictor that uses in-sample mean monthly returns, an in-sample Sharpe ratio, an in-sample single-index alpha, and an in-sample four-index alpha.

As with the Morningstar star and score tests, we use the alternative predictors on both the seasoned funds 1992-1997 sample group and the complete funds 1993 sample group. Hence, again we have two different sets of results. Since presenting all of the results for each alternative predictor would result in an unwieldy number of additional tables, we summarize our results in Tables 5-8 for the seasoned funds 1992-1997 sample and Tables 9-12 for the complete funds 1993 sample. All the regression results reported in Section V were tested for heteroskedasticity using the White (1980) test. None of the regression residuals exhibited evidence of heteroskedasticity at the 10% level.¹⁵

¹⁴We thank Stephen Brown for suggesting an examination of that question.

¹⁵Section V's results are primarily for the overall samples. Except in Table 12, we do not report the results for the samples that are organized by style or age. The alternative predictor results for the


A. 1992-1997 Seasoned Funds Sample Results

1. Dummy Variable Regression Results

We rank the funds based on the in-sample alternative predictor and then put them into five groups that match the number of funds in each of the five Morningstar star groups. This way we can construct the same dummy variable regression analysis we used for the Morningstar stars.

Although there are nominally four alternative predictors, we actually have five since we use two variants for the naive predictor. The first variant allocates the rankings on the basis of the in-sample mean monthly returns: if there were 15 five-star funds and 25 four-star funds, the highest 15 funds according to their in-sample mean monthly return would receive fives and the next 25 would receive fours. In the second variant, we first examine how many Morningstar stars are given within each style group and then rank order the funds by their in-sample mean monthly return within their various style groups. For example, in the 1992 sample there are 24 aggressive growth funds, of which zero funds received five stars, five funds received four stars, eight funds received three stars, six funds received two stars, and five funds received one star. Hence, we would rank the 24 aggressive growth funds by their in-sample mean monthly return and then give the top five aggressive growth funds fours, the next eight aggressive growth funds threes, etc. This way we use mean monthly returns as an alternative predictor and yet can still be sensitive to style differences.
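The two allocation variants described above can be sketched as follows. This is only an illustration under stated assumptions: the function names `allocate_stars` and `allocate_stars_by_style` are hypothetical, ties are broken by input order, and the scores are made-up numbers standing in for in-sample mean monthly returns. The star counts reproduce the 1992 aggressive growth example from the text.

```python
from collections import Counter

def allocate_stars(scores, morningstar_stars):
    """Give the best-scoring funds the highest stars, reusing the
    Morningstar star frequencies so both rankings have equal group sizes."""
    counts = Counter(morningstar_stars)
    # Fund indices ordered from highest to lowest in-sample score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    allocated = [None] * len(scores)
    pos = 0
    for star in (5, 4, 3, 2, 1):
        n = counts.get(star, 0)
        for i in order[pos:pos + n]:
            allocated[i] = star
        pos += n
    return allocated

def allocate_stars_by_style(scores, morningstar_stars, styles):
    """Second variant: repeat the allocation separately inside each style."""
    allocated = [None] * len(scores)
    for style in set(styles):
        idx = [i for i, s in enumerate(styles) if s == style]
        group = allocate_stars([scores[i] for i in idx],
                               [morningstar_stars[i] for i in idx])
        for i, star in zip(idx, group):
            allocated[i] = star
    return allocated

# 24 aggressive growth funds: 0 five-star, 5 four-star, 8 three-star,
# 6 two-star, and 5 one-star funds (counts taken from the text).
ms_stars = [4] * 5 + [3] * 8 + [2] * 6 + [1] * 5
scores = [float(i) for i in range(24)]   # hypothetical in-sample returns
alt_stars = allocate_stars(scores, ms_stars)
alt_by_style = allocate_stars_by_style(scores, ms_stars, ["AG"] * 24)
```

Because the group sizes are copied from the Morningstar allocation, any difference in predictive ability between the two sets of stars comes from the ranking criterion rather than from the number of funds in each group.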

For each of the five alternative predictors, equation (9) is then estimated for the 12 samples and for the three different out-of-sample performance measures. Hence, for each alternative predictor we calculate results that are similar in form to those in Table 1.¹⁶

Table 5 summarizes the significance level results. The left-hand column reports the number of times out of 144 coefficients (four coefficients, 12 samples, and three out-of-sample performance measures) that the predictor produces a significantly negative coefficient for γ₁, γ₂, γ₃, or γ₄. The next column reports the number of times that the predictor produces a significantly positive coefficient for γ₁, γ₂, γ₃, or γ₄. Hence, high numbers in the first column indicate a considerable amount of predictive ability for the predictor and high numbers in the second column indicate that the predictor is not very successful. The other columns indicate which of the coefficients, γ₁, γ₂, γ₃, or γ₄, are significantly negative.
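The counting behind the first two columns of Table 5 can be sketched as below. The sketch assumes signed t-statistics and a two-sided 10% cutoff of 1.645 (a large-sample normal approximation); neither detail is stated in this section, and the function name `tally_significant` and the single example equation are hypothetical.

```python
def tally_significant(regressions, critical_t=1.645):
    """Count significantly negative and positive slope coefficients.

    `regressions` holds one dict per estimated equation, mapping a
    coefficient name to an (estimate, t_statistic) pair. The cutoff
    1.645 is an assumed two-sided 10% normal critical value.
    """
    negative = positive = 0
    for coefs in regressions:
        for name in ("gamma1", "gamma2", "gamma3", "gamma4"):
            estimate, t_stat = coefs[name]
            if abs(t_stat) >= critical_t:
                if estimate < 0:
                    negative += 1
                else:
                    positive += 1
    return negative, positive

# 12 samples x 3 performance measures = 36 equations, 4 slopes each.
n_equations = 12 * 3
n_coefficients = n_equations * 4

# One made-up equation, for illustration only.
example = [{"gamma1": (0.01, 0.36), "gamma2": (-0.03, -0.66),
            "gamma3": (-0.08, -1.72), "gamma4": (-0.13, -2.72)}]
counts = tally_significant(example)
```

Running the tally over all 36 equations for one predictor would give that predictor's entries in the first two columns, with the per-coefficient columns obtained by restricting the inner loop to a single name.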

Table 5's results show several interesting findings. First, the Morningstar stars are in the middle in terms of predicting future performance. The naive predictor that uses the styles and the single-index alpha predictor have very similar predictive performance to the Morningstar stars. The naive predictor, in which no adjustment is made for styles, and the four-index alpha generally do worse, and the Sharpe ratio does considerably better than the Morningstar stars. Second, for every predictor, including the Morningstar stars, the ability to predict high-performing funds is quite weak, yet the ability to predict low-performing funds is quite high. This result is consistent with those found in some other studies

samples organized by style and age were generally similar to those presented in Section IV. They are available from the authors.

¹⁶We summarize these results in tables available from the authors.


TABLE 5

Summary of the Ability of the Alternative Predictor Stars and Morningstar Stars to Forecast Out-of-Sample Performance: Significance Levels

Columns:
(1) # of times (out of 144) the predictor produces a significantly* negative coefficient for γ₁, γ₂, γ₃, or γ₄
(2) # of times (out of 144) the predictor produces a significantly* positive coefficient for γ₁, γ₂, γ₃, or γ₄
(3) # of cases (out of 36) the coefficient γ₁ (4-star) is significant and negative
(4) # of cases (out of 36) the coefficient γ₂ (3-star) is significant and negative
(5) # of cases (out of 36) the coefficient γ₃ (2-star) is significant and negative
(6) # of cases (out of 36) the coefficient γ₄ (1-star) is significant and negative

Predictor                                           (1)    (2)    (3)    (4)    (5)    (6)
10-year mean returns; stars allocated by style       35     14      1      2      8     24
10-year mean returns; no adjustment for styles       39     38      5      5      7     22
10-year Sharpe ratio                                 49      1      3      8      9     29
10-year single-index alpha                           36      7      0      3      6     27
10-year 4-index alpha                                27     48      1      2      5     19
Morningstar                                          39     10      0      4     13     22

Samples Examined: The 12 Seasoned Fund Samples (not broken up into style categories), i.e., 1992 (1-, 3-, and 5-year), 1993 (1-, 3-, and 5-year), 1994 (1- and 3-year), 1995 (1- and 3-year), 1996 (1-year), and 1997 (1-year).

Out-of-Sample Metrics Examined: Load-Adjusted Sharpe Ratio, Non-Load-Adjusted Single-Index Alpha, Non-Load-Adjusted 4-Index Alpha.

Test Examined: The alternative predictor is used to allocate the stars. The frequency of the stars is the same as with the Morningstar stars except that they are allocated on the basis of the alternative predictor rather than the Morningstar method. With these stars, we then examine equation (9). Hence, there are 36 equations estimated (12 samples, 3 out-of-sample performance metrics), of which each equation has 4 coefficients, γ₁, γ₂, γ₃, γ₄ (not including the constant). Hence, there are 144 coefficients examined for each alternative predictor.

*Significant at the 10% level.

on performance predictability (see, e.g., Carhart (1997)) that show it is generally possible to predict losers (but not winners) in terms of mutual fund performance.

Tables 6 and 7 complement Table 5. Table 6 provides information on where the negative and significant cases are located with respect to the out-of-sample measures. In general, the results are spread out relatively evenly among the three out-of-sample performance measures.

Table 7 examines the relative coefficient signs instead of the significance levels, and specifically reports the number of times that the coefficient for highly rated funds is greater than that for funds rated two levels lower. That is, it examines the number of cases (out of a total of 36) where

γ₀ (5-star) > γ₂ (3-star), γ₁ (4-star) > γ₃ (2-star), or γ₂ (3-star) > γ₄ (1-star).

The results in Tables 6 and 7 are similar to those presented in the rest of the paper. First, on the basis of these coefficient signs, the Morningstar stars do not illustrate significantly better predictive ability than the other predictors. The


TABLE 6

Summary of the Ability of the Alternative Predictor Stars and Morningstar Stars to Forecast Out-of-Sample Performance: Significance Levels Organized by Out-of-Sample Performance Measure

Columns: # of times out of 48 (12 samples; 4 coefficients) the predictor produces a significantly* negative coefficient for γ₁, γ₂, γ₃, or γ₄, using as the out-of-sample performance metric
(1) the load-adjusted Sharpe ratio
(2) the non-load-adjusted single-index alpha
(3) the non-load-adjusted 4-index alpha

Predictor                                           (1)    (2)    (3)
10-year mean returns; stars allocated by style        9     11     15
10-year mean returns; no adjustment for styles        7      9     23
10-year Sharpe ratio                                 17     21     11
10-year single-index alpha                           11     15     10
10-year 4-index alpha                                 5      5     17
Morningstar                                          15     13     11

Samples Examined: Same as Table 5.
Out-of-Sample Metrics Examined: Same as Table 5.
Test Examined: Same as Table 5.

*Significant at the 10% level.

TABLE 7

Summary of the Ability of the Alternative Predictor Stars and Morningstar Stars to Forecast Out-of-Sample Performance: Coefficient Signs

Columns: # of cases out of 36 (12 samples; 3 out-of-sample performance metrics) in which the coefficient signs are such that
(1) γ₀ (5-star) > γ₂ (3-star)
(2) γ₁ (4-star) > γ₃ (2-star)
(3) γ₂ (3-star) > γ₄ (1-star)

Predictor                                           (1)        (2)        (3)
10-year mean returns; stars allocated by style      14 (2)     28 (8)     34 (27)
10-year mean returns; no adjustment for styles      14 (5)     21 (7)     33 (28)
10-year Sharpe ratio                                30 (9)     33 (18)    35 (28)
10-year single-index alpha                          20 (3)     35 (16)    33 (30)
10-year 4-index alpha                               10 (2)     20 (13)    35 (29)
Morningstar                                         18 (4)     29 (19)    36 (24)

Samples Examined: Same as Table 5.
Out-of-Sample Metrics Examined: Same as Table 5.
Test Examined: Same as Table 5.

Parentheses indicate the number of the cases in which the coefficients were as indicated and the difference in the coefficients was significant at the 10% level using a Wald Test.

Morningstar stars are again in the middle in terms of their success at predicting future performance. Second, all the predictors, regardless of what type, have more ability to predict low-performing funds. In at least 90% of the cases, the γ₂ (3-star) > γ₄ (1-star) condition is satisfied for every predictor. Third, all predictors, with the notable exception of the Sharpe ratio, have problems in predicting high-performing funds. For most predictors, the γ₀ (5-star) > γ₂ (3-star) condition is satisfied 50% of the time or less.


2. Spearman-Rho Rank Correlation Results

Table 8 summarizes the Spearman-Rho rank correlation results for the alternative predictors.¹⁷ The Spearman-Rho rank correlation tests are the same as those in Section IV.A.3, except that we use the alternative predictors to rank the funds instead of the Morningstar scores. Again we use the decile averages described in Section III.D. There are six samples and three out-of-sample performance metrics. Table 8's results show essentially the same findings as the dummy variable results for the alternative predictors. The Morningstar scores are similar in predictive ability to other alternative predictors. All the predictors, with the exception of the Sharpe ratio, have much higher Spearman-Rho rank correlations in the bottom five deciles than in the top five deciles, indicating that the predictors forecast low-performing funds better than high-performing funds.
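A rank correlation test of this kind can be sketched in a few lines. The sketch assumes that the decile averages of Section III.D are formed by sorting funds on the predictor and averaging within tenths, which is not restated in this section. The helper names, the tie-free Spearman formula, and the toy data are illustrative assumptions.

```python
import numpy as np

def decile_averages(predictor, outcome):
    """Sort funds by the predictor (best first), split them into 10 groups,
    and return the mean predictor and mean outcome of each group."""
    predictor = np.asarray(predictor, dtype=float)
    outcome = np.asarray(outcome, dtype=float)
    order = np.argsort(-predictor, kind="stable")
    groups = np.array_split(order, 10)
    pred_means = np.array([predictor[g].mean() for g in groups])
    out_means = np.array([outcome[g].mean() for g in groups])
    return pred_means, out_means

def spearman_rho(x, y):
    """Spearman rank correlation, valid when there are no ties."""
    rx = np.argsort(np.argsort(x))
    ry = np.argsort(np.argsort(y))
    d = rx - ry
    n = len(x)
    return 1.0 - 6.0 * np.sum(d ** 2) / (n * (n ** 2 - 1))

# Toy data: 40 funds. The predictor orders the bottom half correctly
# but orders the top half in reverse.
predictor = np.arange(40.0)
outcome = np.where(predictor >= 20, 60.0 - predictor, predictor)

pm, om = decile_averages(predictor, outcome)
rho_all = spearman_rho(pm, om)             # all 10 deciles
rho_top = spearman_rho(pm[:5], om[:5])     # top 5 deciles
rho_bottom = spearman_rho(pm[5:], om[5:])  # bottom 5 deciles
```

In the toy data the correlation across all 10 deciles is clearly positive even though the top five deciles are ranked in exactly the wrong order, which is why the top and bottom halves are examined separately.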

TABLE 8

Spearman-Rho Summary Results for Alternative Predictors and Morningstar Scores

Columns: # of cases (out of 18) in which the Spearman-Rho rank test is greater than 0.5 across
(1) all 10 deciles
(2) the top 5 deciles
(3) the bottom 5 deciles

Predictor                                                 (1)    (2)    (3)
10-year mean monthly returns; no adjustment for styles      2      1      5
10-year Sharpe ratio                                       13     10      9
10-year single-index alpha                                  9      4     15
10-year 4-index alpha                                       5      0      5
Morningstar score                                           9      1     15

Samples Examined: Post 1993 Seasoned Fund Samples: 1994 (1- and 3-year), 1995 (1- and 3-year), 1996 (1-year), 1997 (1-year).

Out-of-Sample Metrics Examined: Same as Table 5.

Test Examined: Spearman-Rho tests based on decile averages. We test all 10 deciles, the top 5 deciles, and the bottom 5 deciles. Since there are 6 samples with 3 out-of-sample performance metrics, there are 18 total tests for each predictor.

B. Complete Funds 1993 Sample Results

Dummy Variable Regression Analysis

For the complete funds 1993 sample, the alternative predictors are the same as those used above except that we use three years of in-sample data, because the young and middle-aged funds do not have the necessary 10 years of in-sample data and all the funds must have a minimum of three years of historical returns to be rated by Morningstar. The results for the alternative predictors using the complete funds 1993 sample are presented in Tables 9-12, which provide the same kind of information that Tables 5-7 provide for the 1992-1997 seasoned funds sample. However, the number of cases is much smaller since we only have three samples (rather than 12). The results show that, unlike those from the seasoned funds sample, the Morningstar star method does significantly better than the alternative predictors at predicting future performance. Table 9 reports that, even though the alternative predictors have roughly the same number of significantly negative coefficients as the Morningstar stars, they generally produce many more significantly positive coefficients. The Morningstar method may be superior, since it does not produce nearly as many prediction errors.

¹⁷We do not use the predictor in which we allocate rankings using mean monthly returns by their style because it is impossible to rank order all the funds using this method.

TABLE 9

Summary of the Ability of the Alternative Predictor Stars and Morningstar Stars to Forecast Out-of-Sample Performance: Significance Levels

Columns:
(1) # of times (out of 36) where the predictor produces a significantly* negative coefficient for γ₁, γ₂, γ₃, or γ₄
(2) # of times (out of 36) where the predictor produces a significantly* positive coefficient for γ₁, γ₂, γ₃, or γ₄
(3) # of cases (out of 9) where the coefficient γ₁ (4-star) is significant and negative
(4) # of cases (out of 9) where the coefficient γ₂ (3-star) is significant and negative
(5) # of cases (out of 9) where the coefficient γ₃ (2-star) is significant and negative
(6) # of cases (out of 9) where the coefficient γ₄ (1-star) is significant and negative

Predictor                                           (1)    (2)    (3)    (4)    (5)    (6)
3-year mean returns; stars allocated by style        16      6      5      3      3      5
3-year mean returns; no adjustment for styles        19      7      4      4      4      7
3-year Sharpe ratio                                  18      3      5      3      3      7
3-year single-index alpha                            19      6      4      4      4      7
3-year 4-index alpha                                 20      4      4      4      4      8
Morningstar                                          17      1      0      2      8      7

*Significant at the 10% level.

Samples Examined: The 3 Complete Funds 1993 Samples (all ages included), i.e., 1993 (1-, 3-, and 5-year).

Out-of-Sample Metrics Examined: Load-Adjusted Sharpe Ratio, Non-Load-Adjusted Single-Index Alpha, Non-Load-Adjusted Four-Index Alpha.

Test Examined: The alternative predictor is used to allocate the stars. The frequency of the stars is the same as with the Morningstar stars except that they are allocated on the basis of the alternative predictor rather than the Morningstar method. With these stars, we then examine equation (9). Hence, there are 9 equations estimated (3 samples, 3 out-of-sample performance metrics), of which each equation has 4 coefficients, γ₁, γ₂, γ₃, γ₄ (not including the constant). Hence, there are 36 coefficients examined for each alternative predictor.

Moreover, Table 10 shows the significant and negative coefficients generated by the alternative predictors tend to be clustered when using the non-load-adjusted four-index out-of-sample performance metric. The Morningstar stars, by contrast, have significantly negative coefficients spread more evenly across the three out-of-sample measures.

Table 11 further demonstrates the apparent success of the Morningstar stars system. The Morningstar stars system produces coefficient signs that are in line with what one would expect if they had predictive ability. On the other hand, the alternative predictors do not have such strong results, particularly in the γ₀ (5-star) > γ₂ (3-star) and γ₁ (4-star) > γ₃ (2-star) cases.

A natural question arises at this stage: why does the Morningstar method fare better against the alternative predictors in the complete funds 1993 sample, when


TABLE 10

Summary of the Ability of the Alternative Predictor Stars and Morningstar Stars to Forecast Out-of-Sample Performance: Significance Levels Organized by Out-of-Sample Performance Metric

Predictor

10-year mean returns; stars allocated by style
10-year mean returns; no adjustment for styles
10-year Sharpe ratio
10-year single-index alpha
10-year 4-index alpha
Morningstar

# of times out of 12 (3 samples; 4 coefficients) the predictor produces a significantly* negative coefficient for γ₁, γ₂, γ₃, or γ₄, using the load-adjusted Sharpe ratio as the out-of-sample performance metric

3
2
2
2
5

# of times out of 12 (3 samples; 4 coefficients) the predictor produces a significantly* negative coefficient for γ₁, γ₂, γ₃, or γ₄, using the non-load-adjusted single-index alpha as the out-of-sample performance metric

2
4
3
5
6
4

# of times out of 12 (3 samples; 4 coefficients) the predictor produces a significantly* negative coefficient for γ₁, γ₂, γ₃, or γ₄, using the non-load-adjusted 4-index alpha as the out-of-sample performance metric

11
12
12
12
12
8

Samples Examined: Same as Table 9.
Out-of-Sample Metrics Examined: Same as Table 9.
Test Examined: Same as Table 9.

*Significant at the 10% level.

TABLE 11

Summary of the Ability of the Alternative Predictor Stars and Morningstar Stars to Forecast Out-of-Sample Performance: Coefficient Signs

Columns: # of cases out of 9 (3 samples; 3 out-of-sample performance metrics) in which the coefficient signs are such that
(1) γ₀ (5-star) > γ₂ (3-star)
(2) γ₁ (4-star) > γ₃ (2-star)
(3) γ₂ (3-star) > γ₄ (1-star)

Predictor                                           (1)      (2)      (3)
3-year mean returns; stars allocated by style       4 (3)    3 (0)    8 (6)
3-year mean returns; no adjustment for styles       4 (4)    3 (1)    9 (8)
3-year Sharpe ratio                                 5 (3)    3 (2)    9 (8)
3-year single-index alpha                           4 (4)    3 (0)    9 (7)
3-year 4-index alpha                                4 (4)    4 (1)    9 (7)
Morningstar                                         7 (2)    9 (8)    8 (7)

Samples Examined: Same as Table 9.
Out-of-Sample Metrics Examined: Same as Table 9.
Test Examined: Same as Table 9.

Parentheses indicate the number of the cases in which the coefficients were as indicated and the difference in the coefficients was significant at the 10% level using a Wald Test.

its predictive abilities are very similar to the alternative predictors in the 1992-1997 seasoned funds sample? A possible answer is that the Morningstar stars are based on up to 10 years of return data: a fund that has 10 years of data or more will be judged not only on its three-year returns, but also on its five- and 10-year returns; a fund with more than five years of return data will be judged on the three- and five-year returns. However, our alternative predictors in the complete funds 1993 sample are all based on just three years of return data. Hence, for the majority of the funds (545 out of 635), Morningstar uses more information to allocate its stars than our alternative predictors.

TABLE 12

Comparison of Morningstar Stars Against an Alternative Predictor Organized by Age: Young and Seasoned Funds

Columns:
(1) # of times out of 36 (3 samples; 3 out-of-sample performance metrics; 4 coefficients) where the predictor produces a significantly* negative coefficient for γ₁, γ₂, γ₃, or γ₄
(2) # of times out of 36 (3 samples; 3 out-of-sample performance metrics; 4 coefficients) where the predictor produces a significantly* positive coefficient for γ₁, γ₂, γ₃, or γ₄
(3) # of cases out of 9 (3 samples; 3 out-of-sample performance metrics) where the coefficient signs are such that γ₀ (5-star) > γ₂ (3-star)
(4) # of cases out of 9 (3 samples; 3 out-of-sample performance metrics) where the coefficient signs are such that γ₁ (4-star) > γ₃ (2-star)
(5) # of cases out of 9 (3 samples; 3 out-of-sample performance metrics) where the coefficient signs are such that γ₂ (3-star) > γ₄ (1-star)

Panel A. Young Funds (n = 90) (Funds with less than 5 years of returns as of Jan. 1993)

3-year mean returns; stars allocated by styleᵃ    2    0    7(1)    6(0)
Morningstar Stars                                 2    0    6(1)    5(1)

Panel B. Seasoned Funds (n = 269) (Funds with 10 or more years of returns as of Jan. 1993)

3-year mean returns; stars allocated by styleᵃ

10-year mean returns; stars allocated by styleᵇ

Morningstar 10

2(0)
8(2)
7(3)
1(0)
7(6)
7(6)
NA
9(9)
9(5)
9(1)

*Significant at the 10% level.

Samples Examined: Same as Table 9.
Out-of-Sample Metrics Examined: Same as Table 9.
Test Examined: Same as Table 9.

ᵃStar allocation based on 635 funds, of which only the young or seasoned funds are tested.

ᵇStar allocation based on 269 funds (seasoned fund sample).

Parentheses indicate the number of the cases in which the coefficients were as indicated and the difference in the coefficients was significant at the 10% level using a Wald Test.

To explore this issue further, we constructed Table 12. In Panel A (young funds), we examine the ability of the Morningstar stars and the three-year mean return predictor that utilizes style differences to produce significantly negative coefficients. When examining only young funds, the Morningstar stars do not have an informational advantage since they use the same three years of return data history. Panel A, Table 12 shows clearly that there is very little difference in predictive ability between the Morningstar stars and the three-year mean return predictor when examining only the young funds.
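As a rough illustration of what "stars allocated by style" could mean for the three-year mean return predictor, the following Python sketch ranks funds within each style group and maps rank percentiles to one through five stars. The percentile breakpoints (10%, 22.5%, 35%, 22.5%, 10%), the function names, and the sample inputs are assumptions made for illustration; this section does not state the exact allocation rule.

```python
from collections import defaultdict

# ASSUMPTION: cumulative percentile cutoffs (bottom 10% = 1 star, next 22.5% = 2,
# middle 35% = 3, next 22.5% = 4, top 10% = 5). Illustrative, not from this section.
CUTOFFS = [(0.10, 1), (0.325, 2), (0.675, 3), (0.90, 4), (1.00, 5)]


def stars_from_rank(rank, n):
    """Map a 0-based ascending rank among n funds to a star rating."""
    pct = (rank + 1) / n  # fraction of funds at or below this fund
    for upper, stars in CUTOFFS:
        if pct <= upper + 1e-12:  # small tolerance for floating-point ties
            return stars
    return 5


def allocate_stars_by_style(funds):
    """Allocate stars within each style group.

    funds: list of (fund_id, style, mean_return) tuples (hypothetical layout).
    Returns a dict mapping fund_id to a star rating from 1 to 5.
    """
    by_style = defaultdict(list)
    for fund_id, style, mean_ret in funds:
        by_style[style].append((mean_ret, fund_id))

    ratings = {}
    for members in by_style.values():
        members.sort()  # ascending: lowest mean return first
        n = len(members)
        for rank, (_, fund_id) in enumerate(members):
            ratings[fund_id] = stars_from_rank(rank, n)
    return ratings
```

Because the ranking is done separately for each style, a fund is compared only with funds of the same style, so its rating does not depend on how its style as a whole performed over the in-sample period.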

In Panel B (seasoned funds), we examine the predictive ability of the Morningstar stars, the three-year mean return predictor (in which we use three years of in-sample return history), and the 10-year mean return predictor (in which we use 10 years of in-sample return history). The results illustrate and support our hypothesis. The alternative predictor that uses just three years of in-sample return data fares quite poorly relative to the Morningstar stars at predicting future performance. In nine of the 36 cases, it produces a significantly positive coefficient as compared to none for the Morningstar stars. However, when we compare the Morningstar stars to the alternative predictor that utilizes 10 years of in-sample return data, the results are quite similar. Hence, it appears that the superior ability of the Morningstar stars reported in Tables 9-11 is related to the fact that the Morningstar stars use more information than the alternative predictors based on three years of in-sample data.
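The coefficient comparisons summarized in Table 12 can be illustrated with a small dummy-variable regression. The sketch below assumes a specification in which out-of-sample performance is regressed on an intercept and dummies for four-, three-, two-, and one-star funds, with five-star funds as the omitted group. That specification, the function names, and the input layout are assumptions, and no significance or Wald test is included.

```python
def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    Solved with Gauss-Jordan elimination and partial pivoting; adequate for
    the handful of regressors used here.
    """
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def star_dummy_regression(stars, performance):
    """Return [g0, g1, g2, g3, g4] for the ASSUMED specification.

    g0 is the intercept (five-star funds are the omitted group); g1..g4 are
    the coefficients on dummies for 4-, 3-, 2-, and 1-star funds.
    """
    X = [
        [1.0] + [1.0 if s == level else 0.0 for level in (4, 3, 2, 1)]
        for s in stars
    ]
    return ols(X, performance)
```

Under this assumed specification, the intercept equals the mean performance of the five-star group and each remaining coefficient equals the gap between that star group's mean and the five-star mean, so a negative coefficient indicates a group that underperformed the five-star funds.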

C. Load/No Load Counterparts for Out-of-Sample Data

As mentioned in Section III.A, all results in Section V were also calculated using the load/no load counterparts of the out-of-sample performance measures, i.e., non-load-adjusted Sharpe ratios, non-load-adjusted mean monthly excess returns, and load-adjusted (using a dummy variable for loads in equation (9)) modified Jensen and four-index alphas. The results were generally the same as those reported above and are available upon request.

VI. Conclusions

This paper investigates the degree to which the well-known Morningstar five-star rating system is a predictor of out-of-sample mutual fund performance. This is an important issue because several studies (e.g., Sirri and Tufano (1998) and Goetzmann and Peles (1997)) have shown that highly ranked funds attract the greatest investor cash inflow. We use a data set, based on domestic equity mutual funds, that is free from survivorship bias, adjusted for load fees, and which allows us to examine the predictive abilities of the rating system over different time horizons, periods, fund investment styles, fund ages, and with different out-of-sample performance metrics. We also compare the predictive abilities of the Morningstar rating system with those of alternative predictors: a naive predictor of in-sample historical average monthly returns, one- and four-index in-sample alphas, and in-sample Sharpe ratios.

Our investigation results in several main findings. First, Morningstar is able to "predict" low-performing funds. Funds with fewer than three stars generally have much worse future performance than other groups. This result is relatively robust over different samples, ages of funds, styles of funds, out-of-sample performance measures, and whether load- or non-load-adjusted returns are used for the out-of-sample returns. Second, there is only weak statistical evidence that the five-star (highest-rated) funds outperform the four- and three-star funds (next-to-highest and median-rated funds). Again, these results are robust over different samples, ages, out-of-sample performance measures, load assumptions, and styles. Third, the Morningstar ratings, at best, do only slightly better than alternative predictors in foretelling future fund performance. These alternative predictors include ones that are relatively naive, such as those that use mean monthly returns, as well as Sharpe ratios, and Jensen and four-index alphas. These results suggest that other approaches to developing predictors, such as the "style" approach (e.g., Brown and Goetzmann (1997) and Sharpe (1992)), may be more informative.

Our first two results are broadly consistent with much of the mutual fund performance persistence literature: while it is relatively easy to predict poor performance, it is much more difficult to predict superior performance. Our results also suggest that investors should be very cautious about associating a highly rated fund with superior future performance. Although previous studies have shown that highly rated funds attract the bulk of investor cash inflows, our results suggest that those cash inflows are not necessarily justified by subsequent performance.

Finally, our results do not refute the Morningstar rating system. In almost all of their publications, Morningstar states that the star ratings are not predictors of future performance, but rather "achievement" marks. Many investors and mutual funds nevertheless use the ratings as indicators of future performance. Studies show that high Morningstar ratings are strongly related to large capital inflows and are widely used in marketing mutual funds to the public. In summary, this research answers an important question that investors should ask: Do the star ratings actually predict out-of-sample performance?

References

Bawa, V. S., and E. B. Lindenberg. "Capital Market Equilibrium in a Mean-Lower Partial Moment Framework." Journal of Financial Economics, 5 (1977), 189-200.

Blume, M. "An Anatomy of Morningstar Ratings." Financial Analysts Journal (March/April 1998), 19-27.

Brown, S. J. "Mutual Fund Styles." Conference Proceedings from Computational Finance Conference, New York Univ. (Jan. 6-8, 1999).

Brown, S. J., and W. Goetzmann. "Performance Persistence." Journal of Finance, 50 (1995), 679-698.

________. "Mutual Fund Styles." Journal of Financial Economics, 43 (1997), 373-399.

Brown, S. J.; W. M. Goetzmann; R. G. Ibbotson; and S. Ross. "Survivorship Bias in Performance Studies." Review of Financial Studies, 5 (1992), 553-580.

Carhart, M. M. "On The Persistence Of Mutual Fund Performance." Journal of Finance, 52 (1997), 57-82.

Damato, K. "Morningstar Edges Toward One-Year Ratings." Wall Street Journal (April 5, 1996), C1.

Elton, E. J.; M. J. Gruber; and C. R. Blake. "The Persistence of Risk-Adjusted Mutual Fund Performance." Journal of Business, 69 (1996a), 133-157.

________. "Survivorship Bias and Mutual Fund Performance." Review of Financial Studies, 9 (1996b), 1097-1120.

Elton, E. J.; M. J. Gruber; S. Das; and M. Hlavka. "Efficiency with Costly Information: A Reinterpretation of Evidence from Managed Portfolios." Review of Financial Studies, 6 (1993), 1-22.

Goetzmann, W. N., and R. G. Ibbotson. "Do Winners Repeat?" Journal of Portfolio Management (Winter 1994), 9-18.

Goetzmann, W. N., and N. Peles. "Cognitive Dissonance and Mutual Fund Investors." Journal of Financial Research, 20 (1997), 145-158.

Gruber, M. J. "Another Puzzle: The Growth in Actively Managed Mutual Funds." Journal of Finance, 51 (1996), 783-810.

Hendricks, D.; J. Patel; and R. Zeckhauser. "Hot Hands in Mutual Funds: Short-Run Persistence of Relative Performance, 1974-1988." Journal of Finance, 48 (1993), 93-130.

Jaffe, C. "Rating the Raters: Flaws Found in Each Service." Boston Globe (Aug. 27, 1995), 78.

Jensen, M. "The Performance of Mutual Funds in the Period 1945-1964." Journal of Finance, 23 (1968), 389-416.

Khorana, A., and E. Nelling. "The Determinants and Predictive Ability of Mutual Fund Ratings." The Journal of Investing (Nov. 1998), 61-66.

Lallos, L. Morningstar Newsletter. Chicago, IL: Morningstar Inc. (1997).

Malkiel, B. G. "Returns from Investing in Equity Mutual Funds 1971 to 1991." Journal of Finance, 50 (1995), 549-572.

Markowitz, H. M. Portfolio Selection: Efficient Diversification of Investments. Cambridge, MA: Basil Blackwell, Inc. (1959).

Morey, M. R., and R. C. Morey. "Mutual Fund Performance Appraisals: A Multi-Horizon Perspective with Endogenous Benchmarking." Omega: The International Journal of Management Science, 27 (1999), 241-258.

Morningstar Principia Manual. Chicago, IL: Morningstar Publications (1998).

Rea, J. D., and B. K. Reid. "Trends In The Ownership Cost of Equity Mutual Funds." Investment Company Institute Perspective, 4 (Nov. 1998), 1-15.

Sharpe, W. "Mutual Fund Performance." Journal of Business, 39 (1966), 119-138.

________. "Asset Allocation: Management Style and Performance Measurement." Journal of Portfolio Management (Winter 1992), 7-19.

________. "Morningstar Performance Measures." Financial Analysts Journal (July/Aug. 1998), 21-33.

Sirri, E. R., and P. Tufano. "Costly Search and Mutual Fund Flows." Journal of Finance, 53 (1998), 1589-1622.

White, H. "A Heteroskedasticity-Consistent Covariance Matrix and a Direct Test for Heteroskedasticity." Econometrica, 48 (1980), 817-838.
