34
American Economic Association The Redesign of the Matching Market for American Physicians: Some Engineering Aspects of Economic Design Author(s): Alvin E. Roth and Elliott Peranson Source: The American Economic Review, Vol. 89, No. 4 (Sep., 1999), pp. 748-780 Published by: American Economic Association Stable URL: http://www.jstor.org/stable/117158 Accessed: 11/08/2009 07:52 Your use of the JSTOR archive indicates your acceptance of JSTOR's Terms and Conditions of Use, available at http://www.jstor.org/page/info/about/policies/terms.jsp. JSTOR's Terms and Conditions of Use provides, in part, that unless you have obtained prior permission, you may not download an entire issue of a journal or multiple copies of articles, and you may use content in the JSTOR archive only for your personal, non-commercial use. Please contact the publisher regarding any further use of this work. Publisher contact information may be obtained at http://www.jstor.org/action/showPublisher?publisherCode=aea. Each copy of any part of a JSTOR transmission must contain the same copyright notice that appears on the screen or printed page of such transmission. JSTOR is a not-for-profit organization founded in 1995 to build trusted digital archives for scholarship. We work with the scholarly community to preserve their work and the materials they rely upon, and to build a common research platform that promotes the discovery and use of these resources. For more information about JSTOR, please contact [email protected]. American Economic Association is collaborating with JSTOR to digitize, preserve and extend access to The American Economic Review. http://www.jstor.org

American Economic Association - Computer Science...surprising properties of large labor markets emerged. The high transaction costs involved in interviewing place a practical limit

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: American Economic Association - Computer Science...surprising properties of large labor markets emerged. The high transaction costs involved in interviewing place a practical limit

American Economic Association

The Redesign of the Matching Market for American Physicians: Some Engineering Aspects ofEconomic DesignAuthor(s): Alvin E. Roth and Elliott PeransonSource: The American Economic Review, Vol. 89, No. 4 (Sep., 1999), pp. 748-780Published by: American Economic AssociationStable URL: http://www.jstor.org/stable/117158Accessed: 11/08/2009 07:52

Your use of the JSTOR archive indicates your acceptance of JSTOR's Terms and Conditions of Use, available athttp://www.jstor.org/page/info/about/policies/terms.jsp. JSTOR's Terms and Conditions of Use provides, in part, that unlessyou have obtained prior permission, you may not download an entire issue of a journal or multiple copies of articles, and youmay use content in the JSTOR archive only for your personal, non-commercial use.

Please contact the publisher regarding any further use of this work. Publisher contact information may be obtained athttp://www.jstor.org/action/showPublisher?publisherCode=aea.

Each copy of any part of a JSTOR transmission must contain the same copyright notice that appears on the screen or printedpage of such transmission.

JSTOR is a not-for-profit organization founded in 1995 to build trusted digital archives for scholarship. We work with thescholarly community to preserve their work and the materials they rely upon, and to build a common research platform thatpromotes the discovery and use of these resources. For more information about JSTOR, please contact [email protected].

American Economic Association is collaborating with JSTOR to digitize, preserve and extend access to TheAmerican Economic Review.

http://www.jstor.org

Page 2: American Economic Association - Computer Science...surprising properties of large labor markets emerged. The high transaction costs involved in interviewing place a practical limit

The Redesign of the Matching Market for American Physicians: Some Engineering Aspects of Economic Design

By ALVIN E. ROTH AND ELLIOTT PERANSON*

We report on the design of the new clearinghouse adopted by the National Resident Matching Program, which annually fills approximately 20,000 jobs for new physi- cians. Because the market has complementarities between applicants and between positions, the theory of simple matching markets does not apply directly. However, computational experiments show the theory provides good approximations. Fur- thermore, the set of stable matchings, and the opportunities for strategic manipu- lation, are surprisingly small. A new kind of "core convergence" result explains this; that each applicant interviews only a small fraction of available positions is important. We also describe engineering aspects of the design process. (JEL C78, B41, J44)

The entry-level labor market for new physi- cians in the United States is organized via a centralized clearinghouse called the National Resident Matching Program (NRMP). Each year, approximately 20,000 jobs are filled in a process in which graduating physicians and other applicants interview at residency pro- grams throughout the country and then compose and submit Rank Order Lists (ROLs) to the NRMP, each indicating an applicant's prefer- ence ordering among the positions for which she has interviewed. Similarly, the residency programs submit ROLs of the applicants they have interviewed, along with the number of positions they wish to fill. The NRMP processes these ROLs and capacities to produce a match- ing of applicants to residency programs.

The clearinghouse used in this market dates from the early 1950's. It replaced a decentral- ized process that suffered a market failure when residency programs and applicants started to seek each other out individually through infor- mal channels, earlier and earlier in advance of

employment, rather than waiting to participate in the larger market. (By the 1940's, contracts were typically being signed two years in ad- vance of employment.) Although the matching algorithm has been adapted over time to meet changes in the structure of medical employ- ment, roughly the same form of clearinghouse market mechanism has been used since 1951 (see Roth, 1984). The kind of market failure that gave rise to this clearinghouse has since been seen in many markets (Roth and Xiaolin Xing, 1994), a number of which have also organized clearinghouses in response.

In the mid 1990's, in an environment of rap- idly changing health-care financing with many implications for the medical labor market, the market began to suffer a crisis of confidence concerning whether the matching algorithm was unreasonably favorable to employers at the ex- pense of applicants, and whether applicants could "game the system" by strategically ma- nipulating the ROLs they submitted. The con- troversy was most clearly expressed in an exchange in Academic Medicine (Peranson and Richard R. Randlett, 1995a, b; Kevin J. Williams, 1995a, b). In reaction to this ex- change, groups such as the American Medical Student Association together with Ralph Nad- er's Public Citizen Health Research Group (1995), and the Medical Student Section of the American Medical Association (AMA-MSS, 1995) advocated that the matching algorithm be

* Roth: Department of Economics, and Graduate School of Business Administration, Harvard University, Cam- bridge, MA 02138 (e-mail: [email protected]); Peran- son: National Matching Services, Inc., 595 Bay Street, Suite 301, Box 29, Toronto, ON M5G 2C2, Canada. We thank Aljosa Feldin for able assistance with the theoretical com- putations reported in Section VI. Parts of this work were sponsored by the National Resident Matching Program, and parts by the National Science Foundation.

748

Page 3: American Economic Association - Computer Science...surprising properties of large labor markets emerged. The high transaction costs involved in interviewing place a practical limit

VOL. 89 NO. 4 ROTH AND PERANSON: MATCHING MARKET FOR PHYSICIANS 749

changed and/or that the description of the match be changed to give applicants more accurate advice about how to participate.'

Medical-school personnel responsible for ad- vising students about the job market began to report that many students believed the NRMP did not function in the best interest of students, and that students were discussing the possibility of different kinds of strategic behavior. Given the prior history of market failure due to lack of con- fidence in the market in this and other entry-level professional labor markets, these reports deserved and received the most serious attention.

In this atmosphere, in the fall of 1995 the Board of Directors of the NRMP commissioned the de- sign of a new algorithm for conducting the annual match, and a study comparing it to the existing NRMP algorithm. The present paper reports how the new algorithm was designed, how the two algorithms were compared, and what was learned about the market in the process. (In May 1997, the NRMP Board of Directors decided to switch to the new algorithm, and the first match using the new algorithm was successfully completed in March 1998.)

In the course of designing, testing, and eval- uating the new clearinghouse algorithm, some surprising properties of large labor markets emerged. The high transaction costs involved in interviewing place a practical limit on how many interviews are conducted, and one conse- quence of this is that the set of stable outcomes is very small, and there are very few opportu- nities for participants to engage in strategic ma- nipulation of their stated preferences when it comes to making and accepting offers. (Neither of these would be the case in the absence of transaction costs.)

Aside from describing these new facts, and presenting some theoretical computation to ex- plain them, we also describe in this paper the process by which the new clearinghouse algo- rithm was designed, evaluated, and compared with the existing algorithm. At each stage, this

process involved computational experiments. This process resembles engineering practice rather than theorem-proving or hypothesis-test- ing. But despite the fact that economists are increasingly called upon to design markets, there is little or no economic literature devoted to the engineering aspects of economic design and the practical problems of moving from the- ory about simple markets to workable institu- tions for complex markets. Yet if we fail to develop such an "engineering" literature, we will fail to profit from design experience in a cumulative way. The present paper then, in ad- dition to presenting some new results, is in- tended to take a step in the direction of an engineering literature as well, by describing how those facts were learned, and how they impacted design decisions.2

A rough analogy may be helpful for thinking about how the different parts of this paper hang together. Consider the design of suspension bridges. The Newtonian physics they embody is beautiful both in mathematics and in steel, and college students can be taught to derive the curves that describe the shape of the supporting cables. But no bridge could be built based only on this elegant theoretical treatment, in which the only force is gravity, and all beams are perfectly rigid. Real bridges are built of steel and rest on rock and soil and water, and so bridge design also concerns metal fatigue, soil mechanics, and the forces of waves and wind. Many design questions concerning these real- world complications cannot be answered ana- lytically but, instead, must be explored using physical or computational models. Often these involve estimating magnitudes of phenomena missing from the simple Newtonian model, some of which are small enough to be of little consequence, while others will cause the bridge to fall down if not adequately addressed. Just as no suspension bridges could be built without an

1 At around the same time, the Antitrust Division of the Department of Justice initiated a wide-ranging discovery process concerning these markets. This ultimately gave rise to a fairly narrowly focused consent decree involving the practices of the Association of Family Practice Residency Directors (U.S. District Court for the Western District of Missouri, 1996).

2 Some beginnings of such a literature can also be found in connection with the design of electricity markets (Robert B. Wilson, 1993) and the auction of the radio spectrum (see e.g., John McMillan, 1994, 1995; R. Preston McAfee and McMillan, 1996; Lawrence M. Ausubel et al., 1997; Peter Cramton, 1997; John 0. Ledyard et al., 1997; Paul Milgrom, 1997; Charles R. Plott, 1997; David J. Salant, 1997). There is, of course, already something of an engineering-oriented literature in finance; for an innovative example, see Robert J. Shiller (1993).

Page 4: American Economic Association - Computer Science...surprising properties of large labor markets emerged. The high transaction costs involved in interviewing place a practical limit

750 THE AMERICAN ECONOMIC REVIEW SEPTEMBER 1999

understanding of the underlying physics, neither could any be built without understanding many additional features, also physical in nature, but more varied and complex than addressed by the simple model. These additional features, and how they are related to and interact with that part of the physics captured by the simple model, are the concern of the scientific literature of engineering. Some of this is less elegant than the Newtonian model, but it is what makes bridges stand. Just as important, it allows bridges designed on the same basic Newtonian model to be built longer, stronger, and lighter over time, as the complexities and how to deal with them become better understood.

For the design of the medical labor-market clearinghouse, the underlying theory is the theory of two-sided matching. Simple models of two- sided matching markets have proved to be elegant and tractable, and very useful in understanding the organization and evolution of many markets. But the theory concentrates on simple models in which no worker needs more than one job, and there are no married couples or other connections between workers or between positions. There is a large body of theory relevant to design problems (see e.g., Roth and Marilda Sotomayor, 1990), but none of the theorems applies directly to the med- ical market, although many of the counterex- amples do. That is, many of the existing theorems rest on assumptions not met in the complex med- ical market, and many of the medical market's complexities are known to open the door to the possibility of serious design problems. But the counterexamples do not give any guidance to the magnitude of these problems, and for this we will have to rely on computational exploration, both of the data from the medical market itself, and of simpler models which will help explain what is going on in the complex market. In both cases, the computational explorations will be guided by the theory, which will make possible computational experiments that would be impossible to conduct by brute force on such large markets. It seems likely that, as game theory moves from simple conceptual problems to complex design problems, we will need to make more general use of this interaction among theory, computational investi- gation of market data, and theoretical computa- tion, and that this in tum will produce new problems arnd directions for traditional theory.

This paper is organized as follows. Section I

gives an overview of the medical market and the design problem, and it presents some necessary background by discussing stable matchings and why they are important, how complex markets differ from simple markets with respect to sta- ble matchings, and how the algorithm used by the NRMP prior to this study is structured. Section II presents statistics describing the mar- ket and previous match results. These demon- strate that three of the four match variations that make the NRMP a complex market are present in substantial numbers. Section III describes how the new algorithm was designed, including the role of computational experiments. Section IV compares the performance of the two algo- rithms on the data from recent matches, and Section V looks at the possibilities for strategic behavior when each of the two algorithms is employed. In studying the possibilities for stra- tegic behavior, we will first treat the ROL data as if they were the true preferences of the agents, and then (in Section VI) show why this is justified; we will also explain why the set of stable matchings turns out to be so small. Sec- tion VII presents some thoughts on the interplay among theory, computational experiments, and theoretical computation in the design of market mechanisms. The theory of simple markets framed the questions that needed to be answered in the course of this design and suggested how to construct and evaluate computational exper- iments on the complex system to answer these questions. The magnitudes determined by the computational experiments were then explained with theoretical computations on simple mar- kets, providing results which, with the aid of theory, could be unambiguously interpreted. This interplay was what gave the present design effort its "engineering" flavor, and we suspect that this will generalize to other design efforts. Section VIII contains concluding remarks.

I. Background to the Present Study

The considerable body of theory that has been developed for two-sided matching mar- kets, together with multiple opportunities to ob- serve empirically both the successful and unsuccessful clearinghouse organization of other entry-level labor markets, provided a gen- eral road map for both the design and evaluation of a new clearinghouse algorithm. Specifically,

Page 5: American Economic Association - Computer Science...surprising properties of large labor markets emerged. The high transaction costs involved in interviewing place a practical limit

VOL. 89 NO. 4 ROTH AND PERANSON: MATCHING MARKET FOR PHYSICIANS 751

there was strong empirical evidence that suc- cessful clearinghouses are generally those that produce matchings that are stable in the sense that they do not create "blocking pairs" of agents, not matched to one another, who would mutually prefer to be matched to one another than to accept the matching produced by the clearinghouse. The theory clearly shows that, in sufficiently simple markets (simple in a way that will shortly be made precise), systematic welfare comparisons can be made between dif- ferent stable matchings, with some being rela- tively favorable to firms and unfavorable to workers, and some the reverse. In addition, for sufficiently simple markets the theory allows strong conclusions to be drawn about the op- portunity and scope for strategic behavior. (For an overview of the theory, relevant parts of which will be reviewed below, see Roth and Sotomayor [1990].)

The goal of the design was to construct an algorithm that would produce stable matchings as favorable as possible to applicants, while meeting the specific constraints of the medical market. The comparisons between the new and existing algo- rithms were to focus both on how many applicants and residency programs could be expected to re- ceive more-preferred or less-preferred matches under the two algorithms and on how the different algorithms might influence the opportunity or need for strategic behavior by applicants and pro- grams. Closely related issues were what advice could be given to participants in the match when it is conducted with one or the other of the algo- rithms, and what kinds of changes in the behavior of match participants might be anticipated if the matching algorithm were changed.

These questions were at the heart of the con- troversy that spilled into the medical journals in 1995. Much of that discussion referred to results in the theoretical literature concerning simple two- sided matching markets. But, although the NRMP originated as a simple market, it has become mrrore complex particularly since the early 1980's, as it has developed complementarities and linkages be- tween positions and between applicants. These arise through four kinds of "match variations," introduced to accommodate the changing struc- ture of the medical labor market, namely:

(i) couples in the applicant pool who seek two positions close to one another;

(ii) applicants who seek second-year positions in the match and, if they are successful, have supplemental Rank Order Lists which must be consulted to match them to prerequisite first-year positions;

(iii) residency programs with positions that re- vert to other programs if they remain un- filled;3 and

(iv) programs that wish to fill an even number of positions if they cannot fill all their positions.

These linkages can be shown to allow situations in which many of the conclusions reached about simpler markets no longer apply.

It was therefore necessary, both in designing the new algorithm and in making comparisons to the existing algorithm, first to conduct computa- tional experiments to determine the extent to which the predictions of the theory of simple matching markets applied to the NRMP. These computational experiments, as well as those em- ployed to compare the two algorithms, were con- ducted on the ROLs submitted by all applicants and residency programs in the four most recent matches (1993, 1994, 1995, and 1996) and in the 1987 match. The recent matches were selected to have contemporary patterns of preferences among applicants and residency programs, and 1987 was selected for a comparison over a longer period, and specifically because it had the lowest rate of unmatched U.S. seniors in the available data set (6.0 percent, as opposed to the historically high rate of 7.5 percent for 1996).

A number of specialty matches are also run under the auspices of the NRMP, and these are largely free of the match variations which add complexity to the general resident match. The existing theory of simple matching markets therefore provides accurate predictions about the nature and direction of changes to be antic- ipated in these matches if the existing NRMP algorithm were replaced by the new algorithm.

' Typically these reversions arise when, for example, the director of a second-year postgraduate residency program arranges with the director of a prerequisite first-year pro- gram that his residents will spend their first year in that prerequisite program. However if the second-year program then fails to match with as many residents as were antici- pated, this leaves vacancies in the first-year program that can be filled by other applicants.

Page 6: American Economic Association - Computer Science...surprising properties of large labor markets emerged. The high transaction costs involved in interviewing place a practical limit

752 THE AMERICAN ECONOMIC REVIEW SEPTEMBER 1999

However the theory offers little guidance as to the magnitude of the changes to be expected, and for this purpose, computational experiments on the data of past matches were also needed. These were conducted for the Thoracic Surgery match, for the five years 1991-1994 and 1996.

The design of the new algorithm and the comparisons of the two algorithms will be dis- cussed in detail in the body of the paper. The general conclusions can be summarized by not- ing that, both for the NRMP and the specialty matches, the effects of changing from the exist- ing algorithm to the newly designed algorithm are in the directions predicted by the theory for simple markets, but the size of these changes is small, and the opportunities for profitable stra- tegic behavior are comparably small for both applicants and programs under either algorithm.

In the course of explaining why the differences are so small, we will present a new kind of "core convergence" result, which shows that the size of the set of stable matchings becomes small as the size of the market increases, even when prefer- ences are uncorrelated, provided that the number of positions for which an applicant can interview remains small (and not otherwise).

A. Stable Matchings in Simple and Complex Matching Markets

Centralized matching mechanisms often arise to solve market failures due to unraveling of ap- pointment dates. Perhaps the most important and least controversial empirical finding about central- ized matching algorithms is that they are most often successful if the matchings they produce are stable (Roth, 1984, 1990, 1991; Roth and Xing, 1994; John Kagel and Roth, 2000). In a simple matching market, a matching between applicants and residency programs is stable if there is no applicant or program matched to an unacceptable (unlisted) partner, and if there are no applicant- program pairs such that the applicant prefers the program to his/her current match, and the program also prefers the applicant to one of its current matches (or vacant position).4

Therefore, this study, and the controversy which preceded it, focused on choices among algorithms that produce stable matchings. The reason for the controversy is that there can be systematic differences among stable matchings. Appendix C gives formal definitions of stability in simple and complex matching markets, but the basic ideas can be conveyed by considering the "deferred acceptance algorithm" first for- mally studied by David Gale and Lloyd Shapley (1962).5 There are two basic versions of this algorithm, in each of which one side of the market (firms or workers) makes offers, which the other side can reject or hold to see whether any better offers are forthcoming.

In the worker-proposing version of the algo- rithm, each worker begins by applying for the position at the top of her preference list. Each firm rejects any unacceptable candidates, and if it has q positions it temporarily holds the (up to) q most-preferred applications it has so far re- ceived and rejects the rest. A candidate who is rejected at any step of the algorithm next applies to her next-highest-ranked position (if any re- main) among those not yet applied to. The al- gorithm stops at any step in which no new applications are made, at which point each worker is matched to the firm (if any) holding her application.

In a simple market the resulting matching must be stable (i.e., there are no firm-worker blocking pairs) since, if a worker w prefers firm f to her final match, she must have applied to firm f and been rejected, and hence firmf does

4 Among the programs and applicants who have inter- viewed one another, programs do not list applicants with whom they are unwilling to match, and applicants do not list programs with whom they are unwilling to match. (Un- matched programs and applicants can be matched in the

postmatch secondary market called the "scramble," which takes place primarily in the 24-hour period before the offi- cial public announcements of the match results.) Also, pro- grams and applicants generally do not list applicants or programs with whom they have not had interviews [and this is of course an equilibrium, since the clearinghouse pro- duces a stable matching, at which an applicant (program) cannot be matched to a program (applicant) without being listed, so there is no incentive to list programs or applicants with whom one has not had an interview]. There is also a charge to applicants who list more than 15 residency pro- grams, which may dissuade some applicants from listing some programs.

' Although Gale and Shapley (1962) discussed the algo- rithm in an abstract setting, it appears that, in various forms, equivalent algorithms have been developed in applied con- texts both before and since, with the initial NRMP algo- rithm, dating from 1951, being the first we know of (Roth, 1984).

Page 7: American Economic Association - Computer Science...surprising properties of large labor markets emerged. The high transaction costs involved in interviewing place a practical limit

VOL. 89 NO. 4 ROTH AND PERANSON: MATCHING MARKET FOR PHYSICIANS 753

not prefer her to any of the workers whose applications it held when the algorithm stopped. Furthermore, Gale and Shapley showed that, when preferences are strict, the particular stable matching produced by the worker-proposing version of the algorithm gives each worker her most-preferred position among those she can get at any stable matching. Even more striking, the firm-proposing version of the algorithm gives every firm that fills q positions its q most- preferred workers among those it can be matched to at any stable matching (Roth, 1985; Roth and Sotomayor, 1989). Much of the con- troversy about the organization of the NRMP focused on this difference between these two versions of the deferred-acceptance algorithm.

But a deferred-acceptance algorithm may fail to produce a stable matching in a market with some of the complexities of the NRMP, such as the presence of couples who submit Rank Order Lists of pairs of positions. The key to the stability of the outcome in simple markets is that (in the worker-proposing version of the algorithm) no firm ever regrets having rejected a worker's application, since it only does so when it has an application it prefers, and it will be matched to this preferred applicant unless it receives appli- cations it prefers even more. However, in a market containing couples, suppose that a firm f1 receives an application from a worker -w1, and rejects an application from a less-preferred worker w' in order to hold w I's application. Suppose further that wI is married to w2, whose application is being held by firmf2, because the pair (fl, f2) is high on the preference list sub- mitted by the couple c = (w1, w2). Finally, suppose that firmf2 now receives an application it prefers and rejects the application of w2. In order for the couple c now to apply to its next- choice pair of firms, (f3, f4), wI must be with- drawn from firm fl. Thus, firm f1 now regrets having rejected worker w', and there may be a potential instability involving f1 and w' (and, if w' is part of a couple, this instability may involve another firm as well; see Appendix C).

The differences between simple and complex markets involve more than the failure of the deferred-acceptance algorithm to produce stable matchings: they extend to the nonemptiness and structure of the set of stable matchings itself. Some of the important differences are summa- rized below, by noting theorems about simple

matching markets that do not hold when the market contains couples or other linkages that create complementarities between positions or applicants (see Roth and Sotomayor [1990] for a comprehensive treatment and more detailed references to the literature):

(i) In simple matching markets, firm and worker optimal stable matchings exist for all possible ROLs and are produced by the firm- and worker-proposing vari- ants of the deferred-acceptance algo- rithm (Gale and Shapley, 1962; Roth, 1985).

(i') In markets with complementarities, no stable matching may exist, and even when stable matchings exist there may be no optimal stable matchings for either side of the market (Roth, 1984; Brian Aldershof and Olivia Carducci, 1996).

(ii) In simple markets, the same applicants are matched at every stable matching, and the same positions are filled. (That is, any applicant who is unmatched at one stable matching is unmatched at ev- ery stable matching, and the positions that are unfilled are the same at every stable matching.) Furthermore, a firm that fills only some of its positions at a stable matching fills them with the same workers at every stable matching (Roth, 1986).

(ii') In markets with complementarities, dif- ferent stable matchings may have differ- ent applicants matched and different positions filled (Aldershof and Carducci, 1996).

(iii) In simple markets, when the applicant- proposing algorithm is used (but not when the program-proposing algorithm is used), it is a dominant strategy for applicants to submit ROLs correspond- ing to their true preferences. No parallel assertion can be made about residency programs that have more than one posi- tion (Roth, 1982, 1985).

(iii') In markets with complementarities, no algorithm exists that chooses a stable matching whenever one exists and makes it a dominant strategy for all agents to state their true preferences (Roth, 1985; Aljosa Feldin, 1999).

Page 8: American Economic Association - Computer Science...surprising properties of large labor markets emerged. The high transaction costs involved in interviewing place a practical limit

754 THE AMERICAN ECONOMIC REVIEW SEPTEMBER 1999

Therefore, a major focus of this study was to assess the extent to which these theoretical possi- bilities play a role in the actual NRMP matches. In the course of this report it will become clear that, while it has always been possible to find stable matchings in the previous years' NRMP matches (a stable matching has been found in every match at least since the mid 1970's), it appears that no stable matching is precisely program-optimal or applicant-optimal in any of the years we have examined. However, we will show that applicant- proposing and program-proposing algorithms continue to perform approximately as in the case of simple markets.

B. The Preexisting NRMP Algorithm

The preexisting NRMP algorithm (the one in use in 1995 when this study began, and used through the 1997 matches) is the result of incremental modifications over a period of years. It is primarily, but not entirely, a program-proposing algorithm and deals with match variations through a three-phase pro- cess. The first phase produces an initial match by ignoring most match variations, using the program-proposing deferred-acceptance algo- rithm, modified to let couples hold on to many offers until a late stage in the algorithm. The match produced in this way will in general not be stable (because of the way it handles couples, and because the other match varia- tions are ignored), so the second phase of the program identifies potential instabilities. The third phase of the program uses an algorithm to fix these instabilities one by one and pro- duces a stable match. The processing in this third phase does not always have residency programs proposing. Instead, couples propose in part of the algorithm designed to fix insta- bilities due to couples, and applicants also propose in part of the algorithm that fixes instabilities related to supplemental (first- year) matches. Thus the 1995 NRMP algo- rithm is a hybrid; program-proposing in its first phase (which performs the bulk of the matching), and applicant-proposing in some parts of its third phase.

The NRMP specialty matches like Thoracic Surgery are run using an algorithm that is tech- nically a little different from the original NRMP algorithm (it does not handle some of the

NRMP match variations such as the use of supplemental lists to form multiyear matches, and it is organized in a single phase). But when no match variations are present, the specialty- match algorithm and the 1995 NRMP algorithm are functionally equivalent to the program- proposing deferred-acceptance algorithm in that they all produce the program-optimal stable matching.

I. Descriptive Statistics and Original NRMP Match Results

A. The NRMP in the Years 1987 and 1993-1996

Table 1 gives the descriptive statistics of the NRMP match in the five years we consider. Notice that, in each year, a substantial number of the more than 20,000 applicants who partic- ipate do so in ways that utilize the match vari- ations that the NRMP allows: about 4 percent participate as couples, and 8-12 percent submit supplemental Rank Order Lists. In addition, in the 1990's about 7 percent of the 3,000-4,000 programs that participate in each year have po- sitions that could revert to other programs if they remain unfilled (accounting for almost 6 percent of the total quota of positions). Thus, the match variations are a substantial part of the match. Before investigating how these match variations change the properties of stable matches and of strategic behavior, the first task is the design of an applicant-proposing algo- rithm to produce stable matches that meet the match-variation requirements of thousands of participants.

Quotas include positions in active programs with no ROL returned. Changes during the match are caused primarily by reversions. In some cases, one position is reverted simulta- neously to two programs, causing a net increase in the number of positions offered. In addition, a few positions may be dropped from the match during processing to accommodate requests for even/odd matching.

B. Specialty Matches: Thoracic Surgery in the Years 1991-1994 and 1996

In contrast, the Thoracic Surgery match is a simple match, with no match variations. Its ba-

Page 9: American Economic Association - Computer Science...surprising properties of large labor markets emerged. The high transaction costs involved in interviewing place a practical limit

VOL. 89 NO. 4 ROTH AND PERANSON: MATCHING MARKET FOR PHYSICIANS 755

TABLE 1-DESCRIPTIVE STATISTICS AND ORIGINAL MATCH RESULTS FOR (A) THE NRMP AND (B) THORACIC SURGERY

A. NRMP

Category 1987 1993 1994 1995 1996

Applicants (Active, ROL returned): Primary ROLs 20,071 20,916 22,353 22,937 24,749 Applicants with supplemental ROLs 1,572 2,515 2,312 2,098 2,436 Results

Primary matches 16,117 17,209 17,725 18,170 18,316 Supplemental matches 577 1,294 1,152 990 725

Couples Applicants who are coupled 694 854 892 998 1,008 Coupled applicants who matched 646 794 817 899 912

Programs: Active programs 3,225 3,677 3,715 3,800 3,830 Active programs with ROL returned 3,170 3,622 3,662 3,745 3,758 Potential reversions of unfilled positions

Programs specifying reversion 69 247 276 285 282 Positions to be reverted if unfilled 225 1,329 1,467 1,291 1,272

Programs requesting even/odd matching 4 2 6 7 8

Quotas:a Total quota before match 19,973 22,737 22,801 22,806 22,578 Changes during match processing

Quota decreases Programs 22 120 143 124 130 Positions 45 357 357 327 336

Quota decreases Programs 23 127 142 128 138 Positions 46 338 338 303 326

Total quota after match (final quota) 19,972 22,756 22,820 22,830 22,588 Results:

Positions filled 16,694 18,503 18,877 19,160 19,041 Positions unfilled 3,278 4,253 3,943 3,670 3,547 Programs filled 2,100 2,309 2,440 2,599 2,589

B. Thoracic Surgery

Category 1991 1992 1993 1994 1996

Applicant ROLs 127 183 200 197 176 Active programs 67 89 91 93 92 Program ROLs 62 86 90 93 92 Total quota 93 132 141 146 143 Positions filled 79 123 136 140 132

a Quotas include positions in active programs with no ROL returned. Changes during the match are caused primarily by reversions. In some cases, one position is reverted simultaneously to two programs, causing a net increase in the number of positions offered. In addition, a few positions may be dropped from the match during processing to accommodate requests for even/odd matching.

sic descriptive statistics and match results are given in Table 1B.

III. Design of the Applicant-Proposing Algorithm

The process by which the applicant-proposing algorithm was designed is roughly as follows.

First, a conceptual design was formulated and circulated for comment (Roth, 1996a). This was based on an algorithm for simple markets, modi- fied to deal with the complexities of the NRMP. In order for the design to be coded into a working algorithm, a number of choices had to be made concerning the sequence in which proposals would be made. The sequencing of proposals can

Page 10: American Economic Association - Computer Science...surprising properties of large labor markets emerged. The high transaction costs involved in interviewing place a practical limit

756 THE AMERICAN ECONOMIC REVIEW SEPTEMBER 1999

be shown to have no effect on the outcome of simple matches, but it could potentially affect the outcome when the NRMP match variations are present. Thus, like the overall architecture of the algorithm, the sequencing of proposals is a design question about which the existing theory gives some general guidance that falls short of a com- plete engineering specification. Consequently, we performed computational experiments before making sequencing choices. In what follows, we first present the conceptual design (from Roth [1996a]) and then discuss the sequencing experi- ments and implementation decisions.

A. The Conceptual Design

The algorithm described here is based on the instability-chaining algorithm in Roth and John H. Vande Vate (1990) (which finds stable matchings by resolving applicant-program instabilities one at a time) and on the general design of phase 3 of the preexisting NRMP algorithm.

The object of the algorithm is to produce a stable matching, by assembling a set .A(k) of residency programs and applicants and a tentative matching M(k) with the property that there are no instabilities within the set A(k), and no applicant or program in A(k) is matched to anyone outside of A(k). When the set A(k) has grown to include all applicants and programs, the resulting match is stable, and the algorithm stops.

In the applicant-proposing algorithm, the ini- tial set, _A(0), consists of all positions offered in the match, and the initial tentative matching has all positions vacant. The algorithm begins by selecting an applicant S(1) from the set of ap- plicants in the match and adding S(1) to .A(0) to make the new set _A(1).

At any step k of the algorithm, at which a new applicant S(k) has just been added to form the set A(k), the new tentative matching M(k) is formed as follows. First, applicant S(k) [=S(k, 1)] pro- poses down his Rank Order List [of programs that also rank S(k)], from the top, until the first pro- gram is reached that either has a vacant position or prefers S(k) to its least-preferred current tentative match. In the latter case, this least-preferred appli- cant, S(k, 2) is now rejected by the program in question, and this applicant now proposes down her ROL in a similar way, and so on. Each appli- cant S(k, n) displaced in this way similarly pro- poses down his or her ROL.

At some point in this process, an applicant S(k, n) may be displaced who is a member of a couple, or who is displaced from a primary (second-year) position for which she also holds a supplemental (first-year) position. In either case, a second posi tion now potentially becomes vacant, as the spouse of S(k, n) is withdrawn from his tentative match, or as S(k, n) is withdrawn from her sup- plemental match. In either case, the program whose position is left vacant, P(k, n), is added to a "program stack" to be held for later processing. If S(k, n) is a couple, then both couple members [S(k, n, a) and S(k, n, b)] now propose down their joint ROL of pairs of programs, and they each may displace another applicant. Also, if any S(k, n) has a supplemental ROL associated with her new tentative match, she proposes down it as well, which may also result in the displacement of an- other applicant. Thus, both couples and supple- mental matches may simultaneously displace more than one applicant. One displaced applicant is processed immediately, and any others are added to an "applicant stack" for later processing.

Applicants propose down their ROLs in this way until the applicant stack is empty. (Applicants continue throughout to be able to propose to pro- grams which may be on the program stack.) A residency program is then selected from the pro- gram stack, and all of the applicants in A(k) with whom it might form instabilities [i.e., all of the applicants in A(k) who are preferred by the pro- gram to its least-preferred current tentative match and who prefer ffiis program to their cufrent match] are added to the applicant stack, which is processed as before, with applicants proposing down their ROLs from the top.

When both the applicant and program stacks are empty, the tentative matching thus produced is M(k): no instabilities for M(k) are contained in the set _A(k), and no applicant or program in JA(k) is matched by M(k) to anyone outside of _A(k). The algorithm is now ready to pick a new applicant S(k + 1), and start the process again, for the set _A(k + 1).

When all applicants have been included in the set ?A(k), even/odd requests and program rever- sions are adjusted, which causes additions to the applicant and program stacks, which are han- dled as above. When these stacks are empty, the algorithm stops, and the last tentative match becomes final.

In a match with no match variations, the

Page 11: American Economic Association - Computer Science...surprising properties of large labor markets emerged. The high transaction costs involved in interviewing place a practical limit

VOL. 89 NO. 4 ROTH AND PERANSON: MATCHING MARKET FOR PHYSICIANS 757

applicant and position stacks would always be- come empty, and the final match would be the applicant-optimal stable matching. When the match variations are present, there is a possibil- ity that at some stages of the algorithm the position stacks would never become empty (i.e., a cycle would occur, in which the same posi- tions reappeared on the stack). Therefore, "loop-detectors" need to be added to each stage k. Every loop must involve a position becoming unmatched and made vacant either because a couple or a supplemental assignment has been withdrawn from the position, or a position has been withdrawn from an applicant (e.g., in sat- isfying an even/odd constraint). Thus, a loop- detector can work by keeping a log of when positions become unmatched in these ways [i.e., recording which applicant is unmatched from which position, during the processing of some step SA(k)]. If the same pairs appear multiple times, a loop is in progress. How to proceed at this point may depend on the nature of the loop. (It is observed in Roth and Vande Vate [1990] that certain kinds of inessential loops can be rendered harmless by randomizing the order in which applicants and positions are processed from the stacks. Loops due to the nonexistence of a stable matching would be more serious, but the prior experience of the NRMP suggests that these may be rare.)

Thus, the existing theory suggests the general architecture for an applicant-proposing algo- rithm that can deal with instabilities one at a time as they are detected, and it provides guid- ance on how the algorithm may possibly fail to find a stable matching. But to determine how often it might fail to produce a stable matching we need some computational experiments. The experiments reported next, which will help de- termine the details of the algorithm design, will also show that failures are rare: we will not observe even a single failure when we explore different versions of the algorithm on previous years' ROL data.

B. Sequencing Questions and Implementation Decisions

In a simple match, without the NRMP match variations, the applicant-proposing algorithm just described always produces the applicant-optimal stable match, and the program-proposing algo-

rithm always produces the program-optimal stable match, regardless of the order in which proposals are processed within the algorithm. One conse- quence of the fact that these optimal stable matches do not exist in general when the match variations are present is that the order in which applicants and programs are processed may have an effect on the match produced. Thus the se- quence in which applicants and programs are pro- cessed at various points in the algorithm needs to be considered as part of the design of the applicant-proposing algorithm.

Two issues were considered in conducting and evaluating experiments related to the se- quencing of operations in the algorithms.

(i) Do sequencing differences cause substan- tial or predictable changes in the match result (e.g., do applicants or programs se- lected first do better or worse than their counterparts selected later)?6

(ii) Does the sequence of processing affect the likelihood that an algorithm will produce a stable matching? (In connection with this latter point, recall that instability-chaining algorithms can cycle-even when stable matchings exist, and certainly when they do not. Therefore, one objective was to consider how sequencing decisions might influence the frequency of "loops" occur- ring in the algorithm.)

Experiments to test the effect of sequencing were conducted using data from three NRMP matches: 1993, 1994, and 1995.

1. Sequencing Experiments on the Preexisting NRMP Algorithm.-We investigated the effect of different sequencing of operations in variants of the preexisting NRMP algorithm, in part to estab- lish a baseline against which to compare the algo- rithm to be designed. In the preexisting algorithm,

6Even in a simple matching market, the order in which proposals are made can matter in versions of the Roth and Vande Vate (1990) instability-chaining algorithm in which members of both sides of the market may be chosen to make the next proposal (in contrast to versions in which all proposals are made by one side of the market). Yosef Blum and Uriel G. Rothblum (1999) show that, in such a version of the algorithm, late proposers have an advantage over early proposers.

Page 12: American Economic Association - Computer Science...surprising properties of large labor markets emerged. The high transaction costs involved in interviewing place a practical limit

758 THE AMERICAN ECONOMIC REVIEW SEPTEMBER 1999

programs are processed in ascending sequence by six-digit program code number. To test the sensi- tivity of the results to this sequencing, computa- tional experiments were run on the ROL data in which this sequencing was reversed (i.e., pro- grams were processed in descending order by program code number). As expected, the results showed differences, but the differences were small: the largest difference was in 1994 when only four out of 3,662 programs that submitted ROLs received a different match under the alter- native ordering, as did four out of 22,353 appli- cants. Not only are these differences very small, they do not appear to be systematic.7 (A fuller account of the results of these experiments ap- pears in Appendix A.)

The preexisting NRMP algorithm was also investigated for its sensitivity to the sequence in which reversions are processed. Rather than simply changing the order in which reversions occurred, the experiments involved setting the input program quotas to be the final postmatch quotas produced by the preexisting NRMP al- gorithm. All further reversion processing was then eliminated. These experiments then pro- vided an indication of the differences caused, not only by changing the order of reversions, but also by altering the time when reversions enter into the match processing (i.e., all required reversions were assumed to take place simulta- neously, at the beginning of match processing). No more than two programs or applicants were observed to be affected by such changes in any of the three years 1993-1995 (see Appendix A).

Finally, it should be noted that no loops were detected in any of these experiments on the pre- existing NRMP algorithm. Consequently, despite the presence of match variations, sequencing does not appear to play a significant role in the opera- tion of the preexisting NRMP algorithm.

2. Sequencing Experiments on the Applicant- Proposing Algorithm.-Computational experi- ments were conducted to measure the impact of:

(i) the sequence in which applicants are ad- mitted to the algorithm for processing;

(ii) the sequence in which couples are pro- cessed relative to other applicants; and

(iii) the sequence in which applicants ranked by a program are processed when at- tempting to fill a program that has been selected from the program stack.

To understand the results of the computational experiments (which are tabulated in detail in Appendix A) it is useful to compare the out- comes from each experiment to those from a fixed baseline. We chose as a baseline an applicant-proposing algorithm in which appli- cants were processed in ascending order by their applicant codes, regardless of whether they were single or members of couples. (In all cases, when a member of a couple was pro- cessed, so was the other member. When appli- cants were processed in ascending code order, a couple was selected for processing based on the code number of the spouse with the lower ap- plicant code.) When a program was selected from the program stack, applicants were pro- cessed in ascending sequence by program rank number. All of these experiments were carried out on the ROL data from the NRMP matches in 1993, 1994, and 1995.

The experiments were conducted in a partial factorial design. The handling of couples had three treatments (couples intermixed with sin- gles, couples first, and couples last); the order of introducing applicants into the match had two treatments (ascending order by applicant code or descending order); and the order of process- ing applicants when a program is pulled from the stack had two treatments (ascending order by program rank or descending order). The re- sults are that none of these sequencing decisions had a large or a systematic effect on the match- ing produced. In two-thirds of the cases, the match was the same as in the baseline case. (In the majority of the remaining cases only two applicants received different matches, and the maximum number of applicants affected was 12 out of 22,937, which occurred when a couple received a worse match and initiated a chain of displacements. This happened in two of the 18 cases and involved the same 12 applicants in both cases).

However, there was an effect of sequencing on

7We use the term "very small" informally, but not merely to express an opinion of changes that affect on the order of 0.01 percent of applicants. These changes are also at least an order of magnitude smaller than the main effects we will find due to changes between program-proposing and applicant-proposing algorithms. Since the effects appear to be unsystematic, they do not appear to have any welfare implications, on average.

Page 13: American Economic Association - Computer Science...surprising properties of large labor markets emerged. The high transaction costs involved in interviewing place a practical limit

VOL. 89 NO. 4 ROTH AND PERANSON: MATCHING MARKET FOR PHYSICIANS 759

the internal processing of the algorithm. The num- ber of loops encountered was fewest when cou- ples were introduced to the match after single applicants. This is not too surprising in view of the fact that no loops would occur in the absence of match variations. The results indicate that loops are least likely to occur when the couples are introduced into the larger market with some ten- tative matches already assembled, as opposed to when couples enter first, so that the initial tentative matches involve only couples. Introducing cou- ples last reduces the numbers of loops (and hence the potential that in some future match it would be difficult to find a stable matching) without chang- ing the prospects of couples or single applicants in the match.

Finally, experiments related to the se- quence in which reversions are processed were performed on an applicant-proposing algorithm. These experiments were similar to those performed on the preexisting NRMP algo- rithm. Again, no substantial changes were induced by changing the order in which reversions were handled; no changes at all resulted in the 1993 match, and only two applicants and programs were affected in the 1994 and 1995 matches (see Appendix A). Thus, for both the preexisting NRMP algorithm and the applicant-proposing al- gorithm, there is almost no difference between the results obtained with reversion processing and the results obtained by setting the quotas to the final quotas after reversions and eliminating further re- version processing. (This point simplifies the de- sign of some of the experiments to compare the two algorithms, in connection with strategic be- havior by residency programs, to be discussed later in this article.)

Based on the sequencing experiments de- scribed above, it was decided to sequence all proposals by couples after proposals by single applicants, since this was the order that pro- duced the fewest internal loops.8

Note that we did not at any point choose to randomize the processing order (randomization was shown in Roth and Vande Vate [1990] to allow the algorithm to escape from certain kinds of loops). The reason is that loops do not appear to be a problem with the processing sequences selected, and it was felt that a desirable feature of the match is that it should be precisely re- producible from the ROL data.

IV. Differences in the Matches Produced by the Two Algorithms

A. The NRMP

The preexisting NRMP algorithm and the newly designed applicant-proposing algorithm were compared in terms of the matches that they produce for the ROLs submitted in 1987 and 1993-1996. Table 2 gives the results of these

8 The full details of the sequencing decisions are as follows:

(i) All single applicants are admitted to the algorithm for processing before any couples are admitted.

(ii) Single applicants are admitted for processing in ascending sequence by applicant code.

(iii) Couples are admitted for processing in ascending sequence by the lower of the two applicant codes of the couple.

(iv) When a program is selected from the program stack for processing, the applicants ranked by the program are processed in ascending order by program rank number.

(v) The processing of programs requesting even num- bers of matches or reversions of unfilled positions is deferred until all applicants have been admitted for processing.

(vi) Programs requesting even numbers of matches are processed in ascending sequence by program code. An applicant deleted from a program in order to leave an even number of matches in the program is placed on the applicant stack for processing.

(vii) Programs requesting reversions of unfilled positions are processed in ascending sequence by the program code of the program "donating" the unfilled posi- tion(s). A program that "receives" a reverted posi- tion is placed on the program stack for processing.

(viii) After all reversions have been processed, the re- quests for reversions are reprocessed, in case any new reversions of unfilled positions are required as a result of changes made in the processing of rever- sions that have been processed since the last time this reversion request was considered.

(ix) When no further processing is required to satisfy all reversions, requests for even numbers of matches are reprocessed as in point (vi) above, and if any changes are made, requests for rever- sions are reprocessed as in points (vii) and (viii) above. This iterative processing continues until no further changes are made by even processing or reversion processing. (The possible need for a reverted position to be "unreverted" is checked as part of the check for stability, by using original quotas for programs which have lost positions through reversions.)

Page 14: American Economic Association - Computer Science...surprising properties of large labor markets emerged. The high transaction costs involved in interviewing place a practical limit

760 THE AMERICAN ECONOMIC REVIEW SEPTEMBER 1999

TABLE 2-CoMPARISON OF RESULTS BETWEEN ORIGINAL NRMP ALGORITHM AND APPLICANT-PROPOSING ALGORITHM

Result 1987 1993 1994 1995 1996

Applicants:

Number of applicants affected 20 16 20 14 21 Applicant-proposing result preferred 12 16 11 14 12 Current NRMP result preferred 8 0 9 0 9

U.S. applicants affected 17 9 17 12 18 Independent applicants affected 3 7 3 2 3

Difference in result by rank number I rank 12 11 13 8 8 2 ranks 3 1 4 2 6 3 ranks 2 3 2 2 3 More than 3 ranks 2 1 1 2 3

(max 9) (max 4) (max 5) (max 6) (max 6)

New matched 0 0 0 0 1 New unmatched 1 0 0 0 0

Programs:

Number of programs affected 20 15 23 15 19 Applicant-proposing result preferred 8 0 12 1 10 Current NRMP result preferred 12 15 11 14 9

Difference in result by rank number 5 or fewer ranks 5 3 9 6 3 6-10 ranks 5 3 3 5 3 11-15 ranks 0 5 1 3 1 More than 15 ranks 9 4 6 0 11

(max 178) (max 36) (max 31) (max 191)

Programs with new position(s) filled 0 0 2 1 1 Programs with new unfilled position(s) 1 0 2 0 0

comparisons. The first half of the table concen- trates on the comparisons from the point of view of applicants; the second half is from the point of view of programs.

Only about 0.1 percent of applicants are af- fected by the change in algorithms, and of these, most prefer the match they receive under the applicant-proposing algorithm. Note that in two of the five years the number of applicants matched changed by one (one fewer in 1987, one more in 1996). Recall that in a simple match a change from one stable matching to another would never change the number of ap- plicants matched; so here is another case in which the match variations cause a difference, but a difference which turns out to be very small and unsystematic.

Equally few programs are affected by the change of algorithms-and these constitute about 0.5 percent of all programs. Most, but not all, of the programs prefer the match they

receive under the preexisting NRMP algo- rithm, but in 1994 and 1996 slightly more programs would even have preferred the applicant-proposing algorithm to the preexist- ing NRMP algorithm. Most programs that receive a different match have only one ap- plicant different between the matches pro- duced by the two algorithms. The majority of differences have to do with filling a position with a different applicant; only a small num- ber of positions move from being filled to unfilled or vice versa. Again, this is a conse- quence of the match variations; as already noted in the case of applicants, it turns out to be both very rare and unsystematic.

It may be helpful at this point to consider an example of precisely how the match variations can cause a deviation from the predictions of the the- ory for simple markets; for example, how it can be that a few applicants do worse with the applicant- proposing algorithm than with the program-

Page 15: American Economic Association - Computer Science...surprising properties of large labor markets emerged. The high transaction costs involved in interviewing place a practical limit

VOL. 89 NO. 4 ROTH AND PERANSON: MATCHING MARKET FOR PHYSICIANS 761

proposing algorithm. For example, if switching to the applicant-proposing algorithm causes appli- cant A to improve his match from his second to his first choice, it may be that the first choice now requires a supplemental match that was not re- quired before. If this new supplemental match displaces a previously matched but less-preferred applicant in a program, that displaced applicarnt is forced to go further down his or her list (i.e., does worse). Furthermore, matching that applicant may displace another applicant, who may displace an- other, and so on, causing a chain of applicants who do worse (even though, as expected of the applicant-proposing algorithm, this chain of events began with an applicant who did better than he would have if the program-proposing algo- rithm had been used).

It is worth noting that when we refer to "only 0.1 percent" of applicants, we are talking about a change whose small size we will explain in what follows. But this does not necessarily imply that the associated change in welfare is small. Indeed, in the debate that led to this study, and after our report was circulated to the interested parties, a great deal of discussion stemmed from the view that the difference in welfare was likely to be large for the affected applicants, and likely to be small for the affected programs. This contributed to the decision to adopt the applicant-proposing algo- rithm, a decision strongly lobbied for by the stu- dent organizations, and eventually unanimously adopted by the NRMP Board.9

B. Thoracic Surgery

Because there are no match variations in the Thoracic Surgery matches for the years we con- sider, they are simple matches and are well described by the existing theory. Consequently

we know that the applicants will all do as well as possible at the stable match produced by the applicant-proposing algorithm, and the pro- grams will all do as poorly as possible at that stable matching. What the theory does not tell us is how large this effect will be; for that we need to look at the data (see Table 3). As discussed in the introduction, the effect turns out to be minimal: in the five years we studied, only four applicants and four programs would have been affected by a change in algorithms; in three of the five years, the applicant-proposing algorithm would have produced the same match as the program-proposing algorithm, indicating that this was the only stable matching in those years. (August Colenbrander [1996] reports similarly small differences in the specialty matches he maintains.)

V. Differences in Sensitivity to Participant Behavior

The comparisons of match outcomes dis- cussed in the previous section are all based on Rank Order Lists that were submitted for matches made by the preexisting NRMP and specialty match algorithms. While the changes observed when the match was instead produced by the applicant-proposing algorithm were small, a comparison of the algorithms also re- quires us to consider whether participants might have reason to submit different kinds of ROLs if the new algorithm were to be substituted for the preexisting one. For this purpose, we consider whether participants could have favorably influ- enced the match, under either algorithm, by submitting different ROLs. The idea is to assess both how many participants could do so and how the number is different for the two algo- rithms. This will also allow us to determine what kinds of advice can be given to partici- pants about how to participate in the match, under either algorithm.

Once again, this is a subject about which the theory of simple matching markets tells us a great deal for markets without the match varia- tions found in the NRMP. To see how well the theory for simple markets approximates the NRMP matches, and also to assess the size of the effects to expect, again required computa- tional experiments on the data. A quick review of the theory will help organize the discussion.

9 The argument about the size, and relative size, of the welfare effects for applicants and programs can be para- phrased in part roughly as follows. Both programs and applicants have some uncertainty in their rankings. There may not be that much difference between a program's 7th and 17th ranked candidates. Similarly, applicants may not be able to judge clearly whether they will get a better educational experience at their first- or second-choice pro- grams. But applicants can clearly judge other factors in their preferences, such as whether they would prefer to live in Seattle or Miami, where these programs may be located. Therefore, a change of algorithms may have a big effect on the affected applicants, and only a small one on the affected programs.

Page 16: American Economic Association - Computer Science...surprising properties of large labor markets emerged. The high transaction costs involved in interviewing place a practical limit

762 THE AMERICAN ECONOMIC REVIEW SEPTEMBER 1999

TABLE 3-DIFFERENCE IN RESULT WHEN ALGORITHM

CHANGED FROM PREEXISTING SPECIALTY MATCH TO

APPLICANT-PROPOSING

Year Difference

1991 none 1992 2 applicants improve, 2 programs do worse

1993 2 applicants improve, 2 programs do worse

1994 none 1996 none

A. Strategic Behavior in Simple and Complex Matching Markets

In a simple matching market, without match variations, it has been shown (Roth, 1982) that there do not exist any stable matching algo- rithms that completely remove the possibility that some applicant or program can get a better match by submitting an ROL that differs from the applicant or program's straightforward pref- erences. However, we have already noted the following:

In simple markets, when the applicant- proposing algorithm is used, but not when the program-proposing algorithm is used, no applicant can possibly improve his match by submitting an ROL that is dif- ferent from his true preferences. (Recall also that no parallel assertion can be made about residency programs that have more than one position.)

Therefore, in simple markets, we would find strategic opportunities for applicants only when the program-proposing algorithm is used, and the theory tells us what these might be. Specif- ically, consider the ROL of some applicant, and define a truncation of that ROL to be a shorter ROL that is the same as the original ROL for as many programs as it ranks. We can then say the following:

In simple markets when the program- proposing algorithm is used, every appli- cant who can do better than to submit his true preferences as his ROL can do so by submitting a truncation of his true prefer- ences. That is, if (holding all other ROLs constant) an applicant would be matched to his kth choice if he submitted his true preferences, and his jth choice (with j <

k) if he submitted some other ROL, then he can be matched to his jth choice by submitting a truncation of his true prefer- ences at the jth choice. Furthermore, no part of his original ROL below the kth choice has any effect on the match (Roth and Vande Vate9 1991).

It can also be shown that truncations are the kind of manipulation that can potentially be identified with the least information about oth- ers' preferences (Roth and Rothblum, 1999).

In simple markets, the reason that all success- ful manipulations can (also) be accomplished by truncations is that, in a simple market, a deferred-acceptance algorithm never "back- tracks": no information in an agent's ROL is used beyond the point at which that agent is matched. Although we cannot apply this result directly to the complex market, we can do com- putational experiments to assess how good an approximation is provided by concentrating only on truncations in the investigation of pos- sible strategic manipulations in the NRMP. Spe- cifically, if we find that information about agents' preferences among options below the point at which they are matched has little effect on the match, then we can be confident that investigating truncations will give us a compa- rably good approximation for the magnitude of possible strategic manipulations in the complex NRMP market.10

For simple markets, the theory also tells us which applicants can potentially profit from ma- nipulation, and how much:

In simple markets, when the program- proposing algorithm is used, the only ap- plicants who can do better than to submit their true preferences are those who would have received a different match from the applicant-proposing algorithm. Furthermore, the best such applicants can do is to obtain the applicant-optimal match, and they can do this by submitting to the program-proposing algorithm the truncation of their true preferences that stops at the match they would have gotten from the applicant-proposing match (see

? This would free us from the computationally impos- sible task of investigating all possible manipulations by all participants.

Page 17: American Economic Association - Computer Science...surprising properties of large labor markets emerged. The high transaction costs involved in interviewing place a practical limit

VOL 89 NO. 4 ROTH AND PERANSON: MATCHING MARKET FOR PHYSICIANS 763

Gabrielle Demange et al., 1987; Roth and Sotomayor, 1990).

It is important to note that, even in the case of a simple match without match variations, an applicant generally would not have the informa- tion needed to submit such a truncation (and if he submitted a truncation that was one program too short he would become unmatched). But this result shows that, in a simple match, we can identify an upper bound on the number of ap- plicants who could possibly profit from manip- ulating their Rank Order Lists, by seeing how many applicants receive different matches at the two algorithms.

We cannot directly apply this upper bound to the NRMP, because it depends for its proof on the existence of optimal stable matchings for each side of the market, which we know (from the sequencing experiments) do not exist in the NRMP data. But the theory of simple matches allows us to use the compu- tational results reported in Table 2 as a nu- merical benchmark against which to compare the computational estimates we will make of the scope for possible manipulation. That is, we can compare the estimates we get of how many applicants can potentially profit from strategically stating their ROLs with the num- bers of applicants who were observed to get different matches from the two algorithms. If these numbers are close for the program-pro- posing algorithm (and close to zero for the applicant-proposing algorithm), then the the- ory of simple matches provides a comparably close approximation for the situation in the complex NRMP market.

The case of programs that have more than one position is not so simple, even in the case of simple matches. Programs may, at least in theory, profit both from truncating their ROLs, and from reducing the number of po- sitions they submit to the match (either by making early arrangements with some appli- cants or by withholding positions to be filled by unmatched applicants after the match). The temptation for this latter kind of manip- ulation can be shown to be larger with the program-proposing algorithm than with the applicant-proposing algorithm (see Tayfun Sonmez, 1997, 1999). Thus, in addition to experiments with truncations of ROLs, we

must also conduct computational experiments involving reductions in stated capacities.

B. Experiments to Determine Upper Bounds for Profitable Strategic Behavior

1. Preliminary Experiments: Truncation of ROLs at the Match Point.-As noted above, in a simple market, if an applicant is matched to his kth-choice program, or if the lowest- ranked applicant a program is matched to is its kth-choice applicant, truncation of the ROL at the kth entry would have no influence on the match. This is because, in a simple match, the applicant- or program-proposing algorithms never have to "backtrack" on an ROL. But in the NRMP, backtracking can occur, because of the match variations. Thus, before exploring what truncations, if any, could have a strategic effect on the match, it was first necessary to see whether truncations at the match point [i.e., deleting the k + 1 and higher (less-preferred) choices for a partici- pant who was matched to his kth choice] could influence the result of the match under either algorithm, and how much. The trunca- tions of applicant ROLs and program ROLs were investigated separately, for each algo- rithm, for the 1993, 1994, and 1995 matches. In the majority of cases no change was pro- duced when all ROLs were truncated at the match point; and in no case were more than three applicants affected by such truncations. (Over the more than 60,000 applicants in- volved in these experiments, only four were affected by truncations of applicants' ROLs; see Table B 1 in Appendix B for the detailed results.) Thus, truncations at the match point, while not entirely without effect, do not play a substantial role; they affect on the order of 0.01 percent of applicants, an order of mag- nitude smaller than the effects of changing algorithms.

Because we have now seen that information beyond the match point influences the outcome for only a tiny percentage of participants, con- centrating on truncations will give us a compa- rably good approximation for the numbers of participants who could potentially profit from any kind of strategic manipulation of ROLs. The computational experiments which follow,

Page 18: American Economic Association - Computer Science...surprising properties of large labor markets emerged. The high transaction costs involved in interviewing place a practical limit

764 THE AMERICAN ECONOMIC REVIEW SEPTEMBER 1999

therefore, will concentrate on identifying an up- per bound on the number of participants who (if they had the necessary information) could po- tentially profit from strategic behavior involv- ing truncations of their ROLs above the match point, and (for programs) reductions in the num- ber of positions they offer in the match below the number of applicants to which they were matched.

2. Experiments to Determine Upper Bounds. -As discussed above, the kinds of strategic manipulation to be considered involve trunca- tion of ROLs by applicants or programs, and reductions in stated numbers of positions (quo- tas) by programs. Since we want to know how often a single agent can profitably manipulate the stated ROL, we could in principle conduct a separate experiment for each participant, but this would be computationally infeasible. Con- sequently we need to design an efficient exper- iment that will let us tightly bound the number of individuals who can potentially profit from manipulating their ROLs.

The manipulations involving program quotas raise the question of how to handle reversions of positions when quotas are to be different from those in the data. Similar questions arise when truncating program ROLs, as this may increase unfilled positions. All of the experiments con- cerning strategic behavior of programs handle reversions by fixing quotas at the final quotas observed after the match with the original match data. None of the results are likely to be sensi- tive to this simplification, as shown by the re- sults of the sequencing experiments discussed in Section III and detailed in Appendix A.

For each of the strategic manipulations whose potential magnitude is to be assessed, the chief difficulty in designing the experiments is that a change in a single ROL or quota has two kinds of effects: it may potentially change the match of the applicant or residency program whose ROL or quota is changed, but it may also potentially change the match of other applicants and residency programs. To see why, suppose that the ROL of some applicant is truncated above his current match point, and that the match under one of the algorithms is rerun after making (only) this change. Then the applicant whose ROL was changed may do better (by being matched to a more-preferred choice) or

worse (by being unmatched instead of matched). At the same time, other applicants may do better (and other residency programs may do worse) because of the availability of the position previously held by the applicant whose list was truncated.

This means that, if we truncate a group of applicant ROLs, for example, and see how many of the applicants in this group receive a better match as a consequence of this change, we will be looking at an overestimate of the number of ap- plicants in the group who could have benefited from truncating their own ROL; many of them will have instead profited from someone else's truncation (even if that person himself became unmatched as a result of his own truncation). Thus, the number obtained in this way would be an upper bound on the number that would have benefited by truncating their own ROL, while holding all others' constant. But we do not have to settle for this upper estimate; we can refine it iteratively, by now continuing to truncate the ROLs only of those applicants whose match im- proved as a result of the previous (collective) truncations. This will allow us to further eliminate from the set of truncations those who profited from the truncations of applicants who were them- selves harmed by their own truncation. Proceed- ing in this way, we can continue until no more reductions in the sample are achieved. This final number will still be an upper bound, of course, since even in a group of truncators who all do better when they all truncate their preferences, some may be profiting from the truncated ROLs of the others, not from their own truncation.

Experiments were conducted separately for applicants and for programs, and separately for each of the two algorithms. A computational experiment for applicants in a given year started by truncating all ROLs just above the (lowest) match point (i.e., every applicant's primary ROL was truncated just above the match he received when no ROLs were truncated using the algorithm in question). For example, if an applicant originally matched to rank 3 on his primary ROL, the truncated ROL contained only his first two choices. Of course, many applicants were left unmatched by this trunca- tion, while others received preferred matches (these were the only two possibilities at this stage). Then at the next step, the ROLs of all those who had truncated their lists but did not

Page 19: American Economic Association - Computer Science...surprising properties of large labor markets emerged. The high transaction costs involved in interviewing place a practical limit

VOL. 89 NO. 4 ROTH AND PERANSON: MATCHING MARKET FOR PHYSICIANS 765

improve were restored to their original length, and the process was repeated with the smaller number of truncations that remained. This pro- cess was repeated until it converged. Computa- tional experiments for programs were structured similarly; starting with every program's ROL truncated just above the lowest-ranked match it received. The full results for the NRMP for 1987 and 1993-1996 are given in Appendix B. (Table B2 reports the results of the truncation of applicant ROLs for each algorithm; Table B3 reports the results for programs.) The results can be summarized by looking at the final upper bounds of the number of applicants and the number of programs that could possibly benefit from truncating their ROLs.

The results are reported and analyzed below, first for the NRMP matches, and then for Tho- racic Surgery.

a. Results for the NRMP.-The truncation experiments with applicants' ROLs yield the upper bounds shown in Table 4 for the two algorithms in the years studied. As expected, more applicants can benefit from list truncation under the preexisting NRMP algorithm than under the applicant-proposing algorithm. Note that the number of applicants who could even potentially benefit from truncating their ROLs under the preexisting NRMP algorithm is in each year almost exactly equal to the number of applicants who received a preferred match un- der the applicant-proposing match (line 2 of Table 2). We will return to this point in a moment, but note that it suggests that this upper bound is very close to the precise number that would be predicted in the absence of match variations.

The truncation experiments with programs' ROLs yield the upper bounds shown in Table 5. As expected, some programs can benefit from list truncation under either algorithm. However, consistently more programs benefit from list truncation under the applicant- proposing algorithm than under the preexist- ing NRMP algorithm. Note that, although the numbers of programs in these upper bounds remain small, they are in many cases about twice as large as the number of programs that received a preferred match at the stable matching produced by the algorithm other than the one being manipulated. (That is, re-

TABLE 4-UPPER LIMIT OF THE NUMBER OF APPLICANTS

WHO COULD BENEFIT BY TRUNCATING THEIR LISTS AT ONE

ABOVE THEIR ORIGINAL MATCH POINT

Upper limit

Preexisting NRMP Applicant-proposing Year algorithm algorithm

1987 12 0 1993 22 0 1994 13 2 1995 16 2 1996 11 9

TABLE 5-UPPER LIMIT OF THE NUMBER OF PROGRAMS

THAT COULD BENEFIT BY TRUNCATING THEIR LISTS AT

ONE ABOVE THE ORIGINAL MATCH POINT

Preexisting NRMP Applicant-proposing Year algorithm algorithm

1987 15 27 1993 12 28 1994 15 27 1995 23 36 1996 14 18

ferring back to Table 2, we see, for example, that in 1995 only 14 programs preferred the matching produced by the preexisting NRMP algorithm to the one produced by the appli- cant-proposing algorithm, but we now find 36 programs in our upper bound of programs that could potentially profit from a manipula- tion of the applicant-proposing algorithm.) It therefore seemed worthwhile to examine these upper bounds further and see if they were overestimates.

For each algorithm, this was first done by taking a 50-percent sample of the programs contained in the upper bound for 1995 and restarting the truncation experiment with only these programs having truncated ROLs. The idea is that, if each of these programs can in fact benefit from its own truncation, the experiment would stop after the first iteration, with no fur- ther reductions in the upper bound. But if in fact the upper bound is an overestimate, and some of the programs in it are benefiting not from their own truncated ROLs, but from the truncation of one of the other ROLs in the upper bound, then on average half of such "false positives" in our 50-percent sample would have been benefiting from the truncation by one of the programs in

Page 20: American Economic Association - Computer Science...surprising properties of large labor markets emerged. The high transaction costs involved in interviewing place a practical limit

766 THE AMERICAN ECONOMIC REVIEW SEPTEMBER 1999

TABLE 6-REFINED ESTIMATE OF THE UPPER LIMIT OF THE

NUMBER OF PROGRAMS THAT COULD IMPROVE THEIR

RESULTS BY TRUNCATING THEIR OWN ROLs IN 1995

Preexisting NRMP Applicant-proposing Estimate algorithm algorithm

Original results 23 36 Current estimate

(still an upper limit) 12 22

the other 50 percent, which are no longer trun- cated. In this case we would iterate until the number of truncators who improved their out- come again stabilized at a new, lower upper bound. This is in fact what happened; the new estimates for 1995 (equal to twice the number obtained from the 50-percent sample) are com- pared to the old ones in Table 6. These results confirm that the number of programs that can benefit from the ROL truncations stated earlier are indeed overestimates.

A further analysis was undertaken for each of the five years, to compare the specific individual programs and applicants who appear in these upper bounds as potentially benefiting from ROL truncations with the programs and appli- cants whose results changed when the algorithm changed. This analysis indicated that those who could benefit from ROL truncations were, for the most part, those who did differently (gener- ally worse) when the algorithm was changed from their side proposing to the other side pro- posing (without ROL truncations). For exam- ple, the applicants who can benefit from ROL truncations when the program-proposing algo- rithm is used are very largely the same as those who benefit when the algorithm is changed to an applicant-proposing algorithm with no ROL truncations. Thus, in this respect also, it appears that the theory for simple markets provides a good approximation of the situation in the NRMP match.

We next turn to the question of capacity manipulation by programs. Recall that in an actual match this could be considered by a pro- gram in the context of either an early agreement (for example with an independent applicant) or in anticipation that some positions would be filled postmatch.

An initial experiment was run setting all program quotas to the number of positions filled with the algorithm in question and the original data. (This is analogous to the initial experiment involving truncations of the ROLs at the match point, rather than above it.) In a simple match without NRMP match variations, this would be expected to have no impact on the results. How- ever, with NRMP match data some differences were observed, as seen in Table 7. With the applicant-proposing algorithm, the differences are negligible. However, more differences were observed with the preexisting NRMP algorithm, and the results obtained by setting the quotas to the original positions filled tended to produce better results for the programs.

In order to identify programs that could im- prove their remaining matches by further reduc- ing their quotas, an iterative technique was employed similar to that used to investigate the effects of ROL truncations. After several itera- tions revised downward the upper bounds ob- tained in this way, the resulting upper bounds on the number of programs that could poten- tially profit from stating lower quotas was as shown in Table 8. Again, these numbers are still estimates of the upper bound; further refinement is still possible. However, given the size of these numbers, it seems clear that only a very small number of programs (less than 1 percent) could improve their remaining matches by re- ducing their quotas. This does not appear to be an advisable strategy for programs to follow with either algorithm.

b. Results for Thoracic Surgery.-Because the Thoracic Surgery match does not have match variations, the theory tells us precisely which applicants and programs could improve their match by an optimal manipulation. As a check on our computational procedures, we confirmed these predictions by running the same computational experiments on ROL trun- cations as described for the NRMP matches. The results, summarized in Table 9, are as ex- pected. Thus, in Thoracic Surgery as in the larger and more complex NRMP match, the opportunities for strategic manipulation are es- sentially nonexistent under either algorithm. (Colenbrander [1996] reaches essentially the same conclusions about the specialty matches he maintains.)

Page 21: American Economic Association - Computer Science...surprising properties of large labor markets emerged. The high transaction costs involved in interviewing place a practical limit

VOL. 89 NO. 4 ROTH AND PERANSON: MATCHING MARKET FOR PHYSICIANS 767

TABLE 7-RESULTS WITH INPUT QUOTAS SET TO POSITIONS FILLED, COMPARED TO ORIGINAL RESULTS

1993 1994 1995

Preexisting Applicant- Preexisting Applicant- Preexisting Applicant- NRMP proposing NRMP proposing NRMP proposing

Result algorithm algorithm algorithm algorithm algorithm algorithm

Programs Improve 12 2 9 none 25 none Do worse none none 3 2 9 2

Applicants Improve none none 3 2 6 2 Do worse 12 2 9 none 27 none

TABLE 8-REVISED ESTIMATE OF THE UPPER BOUND OF THE NUMBER OF PROGRAMS THAT COULD IMPROVE THEIR

REMAINING MATCHES BY REDUCING QUOTAS

Preexisting NRMP Applicant-proposing Year algorithm algorithm

1987 28 8 1993 16 24 1994 32 16 1995 8 16 1996 44 32

VI. Why the Differences Are Small: Insights from the Theory of Simple Markets

All the results to this point can be character- ized by noting that the theory of simple matches, without match variations, gives a good prediction of the direction of each of the com- parisons, and, in addition, the size of all the changes has been very small. This section ex- plores what insights we can get from simple markets to help explain why these differences are so small. The results in this section are based on computational comparisons similar to those discussed earlier, but now concerning hy- pothetical markets without any match vaiia- tions.

The small differences between algorithms we have been seeing reflects that, in each of the years studied, the set of stable matchings has been small, as measured by the number of par- ticipants who receive different matches from the program-proposing and applicant-proposing al- gorithms. 1 1 It is therefore of interest to consider

how the set of stable matchings looks in com- parably large markets when we concentrate on simple matches. For this purpose, we consider the very simple matching markets with n firms (each with one position) and n applicants, as n approaches the size of the markets we are study- ing, namely, the specialty markets like Thoracic Surgery and the general NRMP match.

One factor that strongly influences the size of the set of stable matchings (which coincides with the core in this simple model) is the cor- relation of preferences among programs and among applicants. When preferences are highly correlated (i.e., when similar programs tend to agree which are the most desirable applicants, and applicants tend to agree which are the most desirable programs), the set of stable matchings is small. (When preferences are perfectly corre- lated, then there is a unique stable matching, so both algorithms would produce the same match- ing.) However, as the correlation of preferences goes down, the size of the set of stable match- ings grows, and more and more participants would be matched differently by the two algo- rithms. This is true independently of the size of the market.

" This is the natural measure for the size of the set of stable matchings in the present context, since the concern is with how many market participants will be affected by a

change in algorithms. Note, however, that it is different from the more common measure of the size of the set of stable matchings, the number of distinct stable matchings. If 20 applicants receive different assignments at different sta- ble matchings, there could be as many as 2'0 = 1,024 different stable matchings, in case the 20 applicants can be resolved into 10 independent pairwise interchanges of po- sitions, or there could be as few as two stable matchings, if all 20 applicants are involved in a single irreducible cycle. In either case, if there are 20,000 jobs being filled, we have been focusing on the approximately 20 applicants who receive different assignments when we conclude that the set of stable matchings is small.

Page 22: American Economic Association - Computer Science...surprising properties of large labor markets emerged. The high transaction costs involved in interviewing place a practical limit

768 THE AMERICAN ECONOMIC REVIEW SEPTEMBER 1999

TABLE 9-THORACIC SURGERY: (A) NUMBERS OF APPLICANTS WHO COULD IMPROVE MATCHES BY TRUNCATING THEIR

ROLs; (B) NUMBERS OF PROGRAMS THAT COULD IMPROVE MATCHES BY TRUNCATING THEIR ROLs

Algorithm 1991 1992 1993 1994 1996

A. Applicants Who Could Improve by Truncating Their ROLs:

Preexisting NRMP none 2 applicants improve 2 applicants improve none none

(same ones who did (same ones who did

better when the better when the

algorithm changed) algorithm changed)

Applicant-proposing none none none none none

B. Programs That Could Improve by Truncating Their ROLs:

Preexisting NRMP none none none none none

Applicant-proposing none 2 programs improve 2 programs improve none none

(same programs that (same programs that

did worse when the did worse when the

algorithm changed) algorithm changed)

It turns out, however, that the size of the market also plays a critical role, in an interest- ing way. Consider the case in which preferences are uncorrelated (so the set of stable matchings is large). If every applicant could somehow interview and be interviewed for all of the po- sitions, then the set of stable matchings would grow larger and larger (even as a percentage of the number of applicants who could get differ- ent stable matchings) as the number of appli- cants and positions grew. Figure 1 shows that this percentage grows to over 90 percent by the time n reaches 1,000.

Of course, in a real market there is a limit to how many interviews an applicant can have, or a program can conduct. When we take this into account, we see that the set of stable matchings quickly becomes very small as the market be- comes large.

Specifically, let k equal the number of inter- views a candidate can have, and let n equal the number of applicants and positions in the mar- ket. Then, even when preferences are com- pletely uncorrelated, as k/n becomes small, the set of stable matchings becomes small. For ex- ample, if k = 15 (not an unreasonable approx- imation for the NRMP) and n > 10,000, fewer than 0.1I percent of applicants would receive a different match from the two algorithms. 12 That is, even with completely uncorrelated prefer-

ences, we see in this simple market the same one-in-a-thousand order of magnitude that we see in the NRMP. And for simple markets the size of the specialty matches like Thoracic Sur- gery, with n on the order of 100 positions, if we suppose that applicants interview at no more than k = 10 programs we find only about 2 percent of applicants receiving different matches from the two algorithms. Figure 2 graphs the curves for fixed k, as n goes from 10 to 10,000.

Especially in view of the fact that preferences are not uncorrelated in the medical matches, this means that the orders of magnitude of the ef- fects studied in the actual matches are very comparable to what we should expect of simple matches with similar values of k and n. Thus (once we look at both k and n) these simple markets turn out to provide a good approxima- tion not only for the direction of the effects we are seeing, but also for their size.

The reason this is important for the present study of the NRMP and specialty matches is that, in the theoretical study of simple mar- kets, we can look at what would happen when we know agents' true preferences, not just the ROLs they submit to the match (whereas in the study of real matches we have been using as data the submitted ROLs). One theoretical possibility for why we find such small poten- tial for strategic manipulation is that our data has been collected after such manipulation has already taken place. That is, one counte- rhypothesis might explain our results by pos-

12 The variance (based on 1,000 randomly generated simple markets) is well under 0.001 percent.

Page 23: American Economic Association - Computer Science...surprising properties of large labor markets emerged. The high transaction costs involved in interviewing place a practical limit

VOL. 89 NO. 4 ROTH AND PERANSON: MATCHING MARKET FOR PHYSICIANS 769

1.0 .

0. e ? - ---__- --;

____ ______ . -m ean

Cn) 0.l I--- variance

0.4 -. ?

0.7

0.0 ,--_

0 200 400 600 800 1,000

n vad~~~~~~~ 0.4-~~~~~~~

FIGURE 1. SIZE OF THE SET OF STABLE MATCHINGS AS A FRACTION OF n, WHEN k - n (UNCORRELATED PREFERENCES)

Note: C(n) is the number of applicants who get different stable matches, when the market size is n.

iting that there are substantial opportunities for strategic manipulation but that these have been exhausted by the time we look at the ROLs submitted to the match, because the participants have already behaved strategi- cally in an optimal way. Another counterhy- pothesis could be that the hybrid nature of the preexisting NRMP algorithm in fact produces matches that are far from the worst possible stable matching for applicants, and that the set of stable matchings is therefore substan- tially larger than we detect. The results dis- cussed in this section show that these hypotheses are implausible, because when we looked at similarly sized artificial matches, in which we can examine the hypothetical participants' true preferences, we find that the set of stable matchings is close to the size we have computed from the ROL data. Thus the study of simple markets provides an explana- tion of not only the direction of the effects we have been examining, but also their small size. 13

VII. Theory and Computation in Economic Design: Some Methodological Reflections

Perhaps the first rule of any design effort is that "details matter." The details determine what outcomes are even feasible, and so they matter in the most basic aspects of design; and they have implications for all of the market's properties, so they matter for the subtlest as- pects of the design's consequences. Thus, every design effort will be different. But if we are to develop a body of knowledge about design practice in economics, we need to think about the methodological issues that may be common to many design efforts. This section is an at- tempt to put the methodological issues encoun- tered in the NRMP design and evaluation into a context that may be useful for other design efforts. Specifically, this design effort involved the continual interplay among various aspects of simple theory, computational experiments, and theoretical computation. The simple theory guided the design of computational experiments on the complex system, which provided unpre- dicted results that were then explained by the- oretical computation.

The reason why there are gaps between the- ory and design is that, just as design is detailed, theoretical models must often be sparse, to be useful for organizing and directing work in a variety of applications whose connections may become apparent only with the benefit of the-

13 It remains an open problem to develop analytical results that explain why the core of this simple market shrinks as the market grows when the number of interviews an applicant can go on remains constant. The fact that every worker who does get a different job at different stable matchings is involved in certain sorts of preference cycles may provide an avenue for obtaining such results.

Page 24: American Economic Association - Computer Science...surprising properties of large labor markets emerged. The high transaction costs involved in interviewing place a practical limit

770 THE AMERICAN ECONOMIC REVIEW SEPTEMBER 1999

0.8 .

7080 ~100 0.7 t I6 ii

450S 354 ~ ' "' '" ' o t- '---t|--' 1-; * \ '> - - ! --- - -~~~~~~+ - -' - - - - - - ---

20 \ i

C(n) 0-5 --- ----- fl 0 3.,. \.' '.. .. . . . n I. ..........R59 - .: .-.

0.3 - ._ Vt F I 1 i ... . .. . .. .. .L. . ... . ..

0.2 0.2 >b 1?

0 __ t

1 0 100 19000 10000

n

FIGURE 2. SIZE OF THE SET OF STABLE MATCHINGS AS A FRACTION OF n FOR DIFFERENT VALUES OF k (UNCORRELATED PREFERENCES)

Notes: C(n) is the number of applicants who get different stable matches, when the market size is n; k is the number of programs on an applicant's ROL.

ory. Much of this paper has therefore been concerned with filling the gaps between simple abstract markets and complex real ones. But before we discuss the filling of gaps, it is useful to recall the essential role played by the theory of simple matching markets. This role ranged from suggesting the basic design of the clear- inghouse algorithm and the comparisons of the algorithms, to directing attention to aspects of the market in which problems might be antici- pated, and to offering insights into how these might be overcome.

It was the existing simple theory, and the empirical studies it permitted to be conducted on field data, that pointed to the importance of stable matchings. Although counterexamples showed that stable matchings might not exist in the complex American medical market (Roth, 1984), the theory of simple markets suggested a general architecture for an algorithm to find stable matchings. Furthermore, it showed that algorithms in which proposals were issued by

applicants could be expected to produce stable matchings as favorable as possible to appli- cants. In short, the body of theory that existed prior to the start of this design (e.g., as summa- rized in Roth and Sotomayor [1990]) already constituted a rough road map for the mechanism design and evaluation reported here.

At the same time, the existing body of theory, through counterexamples designed to explore its limits (inspired by empirical studies of ex- isting markets), pointed to questions that needed to be answered. These included the role of se- quencing in design of the algorithm, the fre- quency with which the algorithm might fail to find a stable matching, and the frequency with which opportunities for strategic manipulation might arise. These all required estimations of magnitudes, which in turn required computa- tional experiments on the data. Some of these computational experiments were straightfor- ward to conduct. But for estimating how often strategic opportunities might arise, the theory

Page 25: American Economic Association - Computer Science...surprising properties of large labor markets emerged. The high transaction costs involved in interviewing place a practical limit

VOL. 89 NO. 4 ROTH AND PERANSON: MATCHING MARKET FOR PHYSICIANS 771

played an essential role in the design of the computational experiments.

Specifically, although the main conclusions about strategic behavior do not carry over from the simple to the complex market, the theory of the simple case gives us not only final conclusions, but also insight into the way that strategic behavior works. In the case of misrepresentation of ROLs, the way an applicant might gain an advantage, in either the simple or complex markets, is to state an ROL that causes him, at some point in the algorithm, to make a rejection that would not have been made if he had submitted his true preferences. This rejection causes a residency program to have a vacancy and hence make an offer to another applicant, who in turn may make a different rejection than he would have if the original applicant had stated his true preferences. It is the propagation of this "va- cancy chain" through the market that raises the possibility that an applicant could do bet- ter than to state his true preferences.14 The fact that the potential advantage comes from rejections being made implies in the simple model that the possibility of profitable strate- gic misrepresentations of ROLs can be inves- tigated by looking at only the small subset of misrepresentations that consist of truncations. To see if this was approximately true for the complex market required a computational ex- periment, and (when this proved to be the case) it became computationally feasible to investigate the strategic properties of the complex market, through an experiment con- centrating on truncations. Thus, the theory allowed us to see what computational exper- iments would give us the answer to a question that the theory alone could not answer.

While computational experiments on the data allow us to get answers that may not be avail- able from simple theory, they do not necessarily let us understand why the answers are what they are. In addition, results obtained from exploring a large and complex data set with a large and new piece of software (the new algorithm) need to be checked in some way, to make sure that

the results are not due to some unanticipated artifact of the way the algorithm deals with the complexities of the data.15 That is, although properly constructed computational experi- ments on the data offer us answers to questions we cannot answer with theory alone, we need both to check and to understand these answers before we can have the confidence in them that we would like to have before recommending that the new algorithm be considered for use in the market.

We addressed these issues in two ways: by computational experiments on the data from the Thoracic Surgery matches and by theoret- ical computation to determine how the size of the set of stable matches behaved in large simple markets. The first of these allowed us to exercise the software on a medical market free of the match variations present in the general medical market. The theory therefore permitted us to interpret the comparisons be- tween the two algorithms as unambiguously measuring the size of the set of stable match- ings. The small size of this set therefore had no possibility of resulting from some aspect of how the algorithm deals with match vari- ations.

The computations on thousands of ran- domly generated simple markets with fixed length of ROLs and varying numbers of par- ticipants allowed us to see how the size of the set of stable matchings shrinks as the market grows, which establishes a new kind of core convergence result. This shows that the match variations in the medical market do not sub- stantially contribute to the size of the set of stable matchings, since the results on the mar- ket data are entirely consistent with the re- sults for similarly sized simple markets. Given the theoretical results on strategic mis- representation, this core convergence result also shows that it is always a best reply for all

14 The propagation of vacancy chains as such in simple markets is a topic that was explored in the course of this design effort, and is reported in Yossi Blum et al. (1997).

15 Of course, it was necessary to check directly that the program worked, and in fact it was easy to confirm that the matchings it produced were stable as well as feasible with regard to all the match variations. The question we are referring to here is not whether the program does what it was designed to do, but rather whether the apparent small size of the set of stable matchings might have to do with some aspect of how the program handles the match varia- tions.

Page 26: American Economic Association - Computer Science...surprising properties of large labor markets emerged. The high transaction costs involved in interviewing place a practical limit

772 THE AMERICAN ECONOMIC REVIEW SEPTEMBER 1999

but a tiny percentage of participants in large simple markets to state their true preferences.

Note that we distinguish between what we call the "computational experiments" on the actual NRMP data and the "theoretical com- putation" on the randomly generated simple markets. This has to do with our view that "theory" resides in the simplicity of the model and systematic nature of the conclusions, rather than the body of mathematical tech- nique traditionally associated with theory. The theoretical computations tell us how the difference between the applicant- and firm- optimal stable matches varies with the size of a simple market. This new computational re- sult, combined with existing theory, allows us to interpret this as precisely measuring the size of the core of the market, and to deter- mine the implications this has for the possi- bility of profitable strategic manipulation. The theorems explaining why the core must converge as it does will surely follow (see Feldin [1999] for some progress in this direction).

In summary, the design process discussed here involved interplay among various as- pects of simple theory, computational exper- iments, and theoretical computation. 16 We suspect that, as we build a body of engineer- ing practice in economics, this will prove to be a general pattern.

VIII. Concluding Remarks

The crisis of confidence that threatened to undermine participation in the NRMP was serious precisely because the kind of market failure which the NRMP was initially devel- oped to correct arose when residency pro- grams and applicants lost confidence in the existing market. But by the time of this mod- ern crisis, the historical market failure and how it was corrected by the NRMP were understood; and so was the fact that similar market failures in British medical markets had occurred and been corrected with stable matching mechanisms, while unstable mech-

anisms had failed (Roth, 1990, 1991). In ad- dition, the general class of market failures due to unraveling of appointment dates had been identified in many markets (Roth and Xing, 1994). Therefore, although physicians who had participated in the unraveling of the American medical market and in the forma- tion of the NRMP were no longer active, it was not difficult to communicate to the par- ticipants in the modern market why it was desirable to focus on changes in the market that would not reignite the unraveling of ap- pointment dates (Roth, 1996b).27 Thus, al- though what we knew about two-sided matching markets did not provide an imme- diate solution to the design of a new market for physicians, it provided clear guidelines and suggested clear approaches.

It was nevertheless troubling to us at the outset of this design effort that not only did none of the standard theorems about simple matching markets apply directly to the med- ical market, but counterexamples to the con- clusions of many of them were known to exist when the complications of the actual market were present. These counterexamples had the potential to be of great importance, as in the possibility that different stable matchings might yield different levels of employment (a possibility that does not arise in simple mar- kets). Indeed, our results show that in this market this possibility is real and so cannot be ruled out with better theory. But of the more

16 Laboratory experiments also have a role to play, al- though not one we will discuss here (but see Kagel and Roth [2000]).

17 Indeed the initial study proposal (Roth, 1995) quoted Hippocrates's famous dictum that, when preparing to treat a disease,

The physician must be able to tell the antecedents, know the present, and foretell the future, must me- diate these things, and have two special objects in view with regard to disease, namely, to do good or to do no harm.

In this connection it is worth mentioning that, partic- ularly because confidence in the market was the key issue, the study was conducted in an unusually public way, with progress reports posted regularly on the inter- net (see (http://www.economics.harvard.edu/-aroth/ nrmp.html)) and widely distributed to interested organizations of physicians and medical students. A final report, briefly summarizing the overall results as in Ta- bles 1 and 2, was presented to the medical community at large in Roth and Peranson (1997).

Page 27: American Economic Association - Computer Science...surprising properties of large labor markets emerged. The high transaction costs involved in interviewing place a practical limit

VOL 89 NO. 4 ROTH AND PERANSON: MATCHING MARKET FOR PHYSICIANS 773

than 100,000 applicants in the years we stud- ied in detail, only two applicants (one in 1987 and one in 1996) would have changed from employed to unemployed or vice versa at the different stable matchings we consider (see Table 2). Because this difference was both tiny and unsystematic, it did not play a role in the market design.

This and the related results about the small number of applicants who receive different matches at the different stable matchings point to a need to develop theory in ways that will tell us not only about the possibility of different effects, but also about their proba- bility and likely magnitudes. It seems to us that questions about magnitudes of the sort we encountered in the course of this design will often arise in efforts to employ economic theory in the design of institutions for com- plex markets. Theoretical computation can be a big help in this effort, as it was in the present case in clarifying the unexpected con- sequences of the simple fact that applicants can interview at only a small fraction of the available positions.

More generally, just as there is a chemical- engineering literature (and not just literature about theoretical and laboratory chemistry) and a medical literature (and not just a biol- ogy literature), economists need to develop a scientific literature concerned with practical problems of design. An engineering-oriented design literature, and the theory that supports it, will be different from the basic science on which it depends, both in emphasis and in method. If we do not develop such a litera- ture, the practical problems of design will be relegated to the arena of "just consulting," and we will fail to benefit from the accumu- lation of knowledge which is so evident in other kinds of engineering.

APPENDIX A: RESULTS OF THE COMPUTATIONAL EXPERIMENTS CONCERNED WITH SEQUENCING

Table Al presents results from the sequencing experiments on the preexisting NRMP algorithm. Table A2 summarizes results of the experiments related to sequencing in the applicant-proposing algorithm. Table A3 compares the results when input quotas are set to final quotas and reversion processing is eliminated to the initial results.

TABLE Al-EFFECTS OF SEQUENCE IN WHICH PROGRAMS

ARE PROCESSED

A. Results with Programs Processed in Descending Code Order Compared to Original Results with Preexisting NRMP Algorithm

Result 1993 1994 1995

Programs Improve none 2 2 Do worse 2 2 none

Applicants Improve 2 2 none Do worse none 2 2

B. Sequencing of Reversions: Results with Input Quotas Set to Final Quotas and Reversion Processing Eliminated, Compared to Original Results with Preexisting NRMP Algorithm

Result 1993 1994 1995a

Programs Improve 2 none none Do worse none 2 2

Applicants Improve none 2 2 Do worse 2 none none

Notes: In 1994, when some programs and applicants did better while others did worse, there was no correlation between the change in result and the code numbers of the applicants and programs.

a Subsequently, the years 1987 and 1996 were also ex- amined with similar results: no applicants were affected in 1987, two were affected in 1996.

Page 28: American Economic Association - Computer Science...surprising properties of large labor markets emerged. The high transaction costs involved in interviewing place a practical limit

774 THE AMERICAN ECONOMIC REVIEW SEPTEMBER 1999

TABLE A2-SUMMARY OF RESULTS OF EXPERIMENTS RELATED TO SEQUENCING IN THE APPLICANT-PROPOSING ALGORITHM

Sequence of processing 1993 1994 1995

A. Baseline Results (When Program Selected from Stack, Applicants Processed in Ascending Program Rank Number Sequence)

Applicants ascending; singles and couples intermixed Match resulta Loops detected 3 6 4

B. Applicant and Couples Processing Sequence (When Program Selected from Stack, Applicants Processed in Ascending Program Rank Number Sequence)

Applicants descending; couples last Match result same same same Loops detected 0 0 0

Applicants ascending; couples last Match result same same same Loops detected 2 0 0

Applicants ascending; couples first Match result 2 applicants worse 2 applicants worse same Loops detected 3 6 1

Applicants descending; couples first Match result 2 applicants worse 2 applicants worse same Loops detected 1 59 3

C. Sequence of Processing Applicants Ranked by Program Selected from Program Stack (When Program Selected from Stack, Applicants Processed in Descending Program Rank Number Sequence)

Applicants ascending; singles and couples intermixed Match result same 9 applicants improved, same

3 applicants worseb Loops detected 17 148 62

Applicants ascending; couples last Match result same 9 applicants improved, same

3 applicants worseb Loops detected 2 0 0

a This is the base result to which others are compared. b In part C, the results for the two experiments for 1994 (couples intermixed and couples last) were the same. In both cases,

the differences in the results in part C as compared to the baseline results in part A were caused by chains resulting from two applicants doing worse in part C when compared to part A.

TABLE A3-RESULTS WITH INPUT QUOTAS SET TO FINAL

QUOTAS AND REVERSION PROCESSING ELIMINATED,

COMPARED TO INITIAL RESULTS WITH APPLICANT

PROPOSING ALGORITHM

Result 1993 1994 1995a

Programs

Improve none none none

Do worse none 2 2

Applicants

Improve none 2 2

Do worse none none none

a Subsequently, 1987 and 1996 were also examined, with

no applicants affected in 1987 and a single chain of nine

affected in 1996.

APPENDIX B: RESULTS OF THE COMPUTATIONAL

EXPERIMENTS CONCERNED WITH TRUNCATION OF

ROLS AND CAPACITY REDUCTIONS

The results of truncation at the match point are reported in Table B 1. Table B2 shows the results for iterative truncations of applicant ROLs, while Table B3 shows the corresponding results for it- erative truncations of program ROLs.

Page 29: American Economic Association - Computer Science...surprising properties of large labor markets emerged. The high transaction costs involved in interviewing place a practical limit

VOL. 89 NO. 4 ROTH AND PERANSON: MATCHING MARKET FOR PHYSICIANS 775

TABLE B1-TRUNCATIONS AT THE MATCH POINT

1993 1994 1995

Difference in Result for Both the Preexisting NRMP Algorithm and the Applicant-Proposing Algorithm When Applicant ROLs Are Truncated at the Match Point:

none 2 applicants improve, same positions filled 2 applicants improve, same positions filled

Difference in Result for the Preexisting NRMP Algorithm When Program ROLs Are Truncated at the Match Point: none none 2 applicants do worse, same positions filled

Difference in Result for the Applicant-Proposing Algorithm When Program ROLs Are Truncated at the Match Point: none 3 applicants do worse, same number of positions filled, but not same positions (3 none

programs filled I less position; I program filled I more position; I program filled 2 more positions; I additional position was reverted from one program to another)

TABLE B2-RESULTS FOR ITERATIVE TRUNCATIONS OF APPLICANT ROLs

1987 1993 1994 1995 1996

Original Original Original Original Original NRMP Applicant- NRMP Applicant- NRMP Applicant- NRMP Applicant- NRMP Applicant-

algorithm proposing algorithm proposing algorithm proposing algorithm proposing algorithm proposing

Run T T&I T T&I T T&I T T&I T T&I T T&I T T&I T T&I T T&I T T&I

1 16,117 4,324 16,116 4,317 17,209 4,546 17,209 4,536 17,725 4,935 17,725 4,934 18,170 5,763 18,170 5,758 18,316 5,805 18,317 5,806 2 4,324 1,894 4,317 1,887 4,546 2,093 4,536 2,082 4,935 2,361 4,934 2,359 5,763 2,907 5,758 2,899 5,805 2,915 5,806 2,917 3 1,894 898 1,887 891 2,093 1,036 2,082 1,023 2,361 1,185 2,359 1,183 2,907 1,572 2,899 1,559 2,915 1,569 2,917 1,571 4 898 437 891 429 1,036 514 1,023 498 1,185 602 1,183 598 1,572 857 1,559 844 1,569 861 1,571 864 5 437 203 429 194 514 258 498 241 602 292 598 287 857 473 844 460 861 481 864 482 6 203 93 194 84 258 135 241 116 292 151 287 143 473 251 460 238 481 271 482 271 7 93 41 84 31 135 73 116 52 151 75 143 66 251 136 238 124 271 157 271 155 8 41 24 31 13 73 48 52 25 75 40 66 31 136 79 124 67 157 89 155 87 9 24 18 13 6 48 34 25 12 40 27 31 17 79 45 67 31 89 57 87 55

10 18 14 6 2 34 27 12 5 27 18 17 7 45 31 31 17 57 36 55 33 11 14 12 2 0 27 24 5 2 18 14 7 3 31 22 17 8 36 24 33 21 12 12 12 24 22 2 0 14 13 3 2 22 18 8 4 24 19 21 15 13 - - 22 22 13 13 2 2 18 16 4 2 19 15 15 13 14 - - - - - -- 16 16 2 2 15 14 13 12 15 - - ? ? ? ? ? ? ? ? ? ? ? ?14 13 12 11 16 - - ? ? ? ? ? ? ? ? ? ? ? ?13 12 11 10 17 ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? -- 12 11 10 9 18 - - ? ? ? ? ? ? ? ? ? ? ? ?11 11 9 9

Notes: Columns labeled 'T' report the number of matches involving truncated ROLs. Columns labeled "T & I" report the number of matches involving truncated ROLs and "improved" matches.

TABLE B3-RESULTS FOR ITERATIVE TRUNCATIONS OF PROGRAM ROLs

1987 1993 1994 1995 1996

Original Original Original Original Original NRMP Applicant- NRMP Applicant- NRMP Applicant- NRMP Applicant- NRMP Applicant-

algorithm proposing algorithm proposing algorithm proposing algorithm proposing algorithm proposing

Run T T&I T T&I T T&I T T&I T T&I T T&I T T&I T T&I T T&I T T&I

1 2,967 1,345 2,967 1,349 3,342 1,457 3,342 1,462 3,369 1,514 3,369 1,517 3,444 1,538 3,444 1,541 3,410 1,445 3,410 1,444 2 1,345 670 1,349 675 1,457 740 1,462 748 1,514 809 1,517 813 1,538 783 1,541 790 1,445 727 1,444 725 3 670 347 675 353 740 382 748 394 809 441 813 444 783 420 790 431 727 384 725 384 4 347 186 353 194 382 201 394 216 441 249 444 255 420 237 431 248 384 213 384 212 5 186 100 194 110 201 107 216 122 249 138 255 145 237 130 248 141 213 114 212 115 6 100 55 110 66 107 64 122 79 138 79 145 86 130 77 141 89 114 71 115 72 7 55 33 66 44 64 37 79 52 79 44 86 52 77 50 89 62 71 50 72 52 8 33 21 44 33 37 22 52 37 44 31 52 39 50 35 62 47 50 35 52 39 9 21 17 33 29 22 15 37 30 31 23 39 32 35 29 47 41 35 26 39 30

10 17 15 29 27 15 13 30 28 23 20 32 30 29 26 41 38 26 21 30 25 11 15 15 27 27 13 12 28 28 20 18 30 29 26 24 38 36 21 19 25 23 12 - - - - 12 12 - - 18 16 29 28 24 23 36 36 19 18 23 22 13 - - - - - - - - 16 15 28 27 23 23 - - 18 17 22 21 14 - - - - - - - - 15 15 27 27 - - - - 17 16 21 20 15 - - - - - - - - - - - - - - - - 16 15 20 19 16 - - - - - - - - - - - - - - - 15 14 19 18 17 -? ? ? ? ?? _ - - - - - - 14 14 18 18

Notes: Columns labeled "T" report the number of matches involving truncated ROL's. Columns labeled "T & I" report the number of matches involving truncated ROLs and "improved" matches.

Page 30: American Economic Association - Computer Science...surprising properties of large labor markets emerged. The high transaction costs involved in interviewing place a practical limit

776 THE AMERICAN ECONOMIC REVIEW SEPTEMBER 1999

APPENDIX C: FORMAL DEFINITIONS OF STABILITY

Simple Matching Markets

For markets without linkages between positions we use the "college admissions" model as refor- mulated in Roth (1985) and Roth and Sotomayor (1990 Ch. 5). There are two finite and disjoint sets, F= { fl, . * fn I and W = { w , ..., wm} , of firns and workers. For each firm f in f, there is a positive integer qf, which indicates the number of (identical) positions f has to offer.

An outcome is a matching of workers to firms, such that each worker is matched to at most one firm, and each firm is matched to at most its quota of workers. It will be convenient to denote a firm that has some number of un- filled positions as matched to itself in each of those positions, and similarly an unmatched worker will be matched to herself. To give a formal definition, define for any set X an unor- dered family of elements of X to be a collection of elements, not necessarily distinct, in which the order is immaterial.

A matching ,u is a function from the set a U nW into the set of unordered families of ele- ments of F U W such that:

(i) I ,u(w)I = 1 for every worker w and ,u(w) = w if ,u(w) 0 I;

(ii) I,(f)| = qf for every firm f, and if the number of workers in ,L(f), say r, is less than qf, then ,L(f) contains qf- r copies off;

(iii) ,u(w) = f if and only if w is in pL(f).

Each worker has preferences over the firms (and the possibility of remaining unmatched in the market), and each firm has preferences over the workers (and the possibility of leaving a position unfilled). All preferences are transitive and strict (recall that, in the markets we con- sider, participants are obliged to submit rank orders which are necessarily strict). We will write fi >w fj to indicate that worker w prefers fi to fj. Similarly, wi >f wj represents firm fs preferences P(f) over individual workers. Firn f is acceptable to worker w iff >W w, and worker w is acceptable to firmf if w >ff (i.e., an acceptable firmn is one that is preferable to being unmatched, and an acceptable worker is one which the firm prefers to leaving a position unfilled).

Each worker's preferences over alternative matchings correspond exactly to her preferences over her own assigmnents at the two matchings. Things are not quite so simple for firms, because even though we have described firms' preferences over workers, each firm with a quota greater than 1 must be able to compare groups of workers in order to compare alternative matchings. It will be sufficient for our purposes to assume merely that a firm's preferences over groups of employees it could be matched with (i.e., over groups of not more than qf workers) are such that, for any two assignments that differ in only one worker, it prefers the assignment containing the more- preferred worker (and is indifferent between them if it is indifferent between the workers). Any pref- erences of this sort are called responsive to the firm's preferences over individual workers (Roth, 1985).

A matching ,u is individually irrational if ,u(w) = f for some worker w and firm f such that either the worker is unacceptable to the firm or the firm is unacceptable to the worker. Such a match- ing will also be said to be blocked by the unhappy agent. A firmf and worker w will be said together to block a matching , if they are not matched to one another at ,u but would both prefer to be matched to one another than to (one of) their present assignments. That is, p. is blocked by the firm-worker pair (f, w) if p(w) 7 f and if f >w p.(w) and w >f u for some o in .(f ). (Note that o- may equal either some worker w' in .(f) or, if one of firm f s positions is unfilled at .(f), a may equalf.) Matchings blocked in this way by an individual or by a pair of agents are unstable in the sense that there are agents with both the in- centive (because preferences are responsive) and the power (under rules that allow any firm and worker to conclude an agreement with each other) to disrupt such matchings. We can now define a matching p. to be stable if it is not blocked by any individual or any firm-nworker pair. 18

Complex Matches

In the medical markets served by the NRMP, the employers are residency programs, and the

18 This definition of stability appears to account only for coalitions of size 1 or 2, but in fact it accounts for coalitions of any size (i.e., stable matchings are in the core; see Roth and Sotomayor [1990]).

Page 31: American Economic Association - Computer Science...surprising properties of large labor markets emerged. The high transaction costs involved in interviewing place a practical limit

VOL. 89 NO. 4 ROTH AND PERANSON: MATCHING MARKET FOR PHYSICIANS 777

workers are physicians applying to those pro- grams. The simple model of the previous sec- tion does not allow for the variety of matching requirements observed in the medical market, for which purpose we will have to distinguish between different kinds of applicants and dif- ferent kinds of residency programs.

Let the set of applicants be -A = -A1 U A9 U C, where A1 is the set of (single) applicants who seek no more than one position, .AZ is the set of applicants who may want two jobs, and who submit supplemental lists of first-year jobs in connection with any second-year position on their ROLs that requires a complementary first- year position (and does not come with one au- tomatically), and C is the set of couples, who submit a single ROL listing pairs of positions. A member of Cis a couple { a , aj ) such that ai is in the set _43 (of husbands) and a* is in the set A4, and the sets -A1, GA2, .A3, and .A4 are sets of applicants, who together make up the entire population of individual applicants, which will be denoted SA' = .A, U SAQ U gA3 U gA4. (The

A.A may not be disjoint, since members of a couple may also submit supplemental lists.) The reason for denoting the set of applicants both as A and as A' is that, from the point of view of a potential employer, the members of a couple C = { ai, aj } are two distinct applicants who seek distinct positions (typically in different residency programs), while from the point of view of the couple they are one agent with preferences over pairs of positions.

The set of residency programs is R- {r1, ... , r,, }, and associated with each program r is a positive integer qr indicating how many positions it seeks to fill. However, for some programs r, qr may not be a constant at every point in the matching process. There are two reasons why qr may change. A residency pro- gram r may have an agreement with another residency program r' (typically within the same hospital) that if r can only fill k < qr positions, the remaining qr - k positions will be added to the capacity of r'. In such a situation, the algo- rithm will change qr to k and q, to q,. + (q, -

k). (It can also happen that the qi - k unfilled positions revert to more than one other resi- dency program, and so the total number of positions need not remain constant, and differ- ent positions from a given program may revert to different programs.) The other reason why

quotas may vary is that some residency pro- grams wish to have an even number of resi- dents, so a residency program r with quota q may have its quota reduced to q' = q - 2 in the event that it can only be matched to a maximum of q - 1 residents. (These quota adjustments take place after an initial attempt to make a stable match, and they cause the match- ing algorithm to continue from the current match; in what follows, discussion of stability will refer to the current quota of a program r at any point in the algorithm, except as indicated.)

Applicants in the set -A1 submit ROLs over residency programs and hence have preferences just like the workers in the simple model dis- cussed earlier. Applicants in the set gA2 have on their ROLs at least one second-year program that requires (but does not supply) first-year training as well, and these applicants submit a supplemental ROL for each such position, indi- cating their preferences for first-year positions, conditional on being matched to a given second- year position. Each couple c = { ai, aj } in the set C submits, as a single ROL, a ranked list of ordered pairs of positions [i.e., an ordered list of elements of R X R whose first element is some (ri, rj) which is the couple's first-choice pair of positions for ai and aj, respectively, and so forth]. Each residency program submits as its ROL an ordered list of members of A' (i.e., of individual applicants, whether or not they are members of a couple).

Having thus defined the form in which dif- ferent kinds of agents state their preferences, we can now define stable matchings. A matching ,u with range R U A' is defined as in the simple market, except that for an applicant a in the set SA2 it may be that I,ui(a)I = 1 or 2 if ,u(a) matches a to a program for which it has sub- mitted a supplemental ROL. In case I,u(a) I = 2 we will write ,u(a) = (r1, r2), where rI is the (second-year) residency program on a's pri- mary ROL, and r2 is the (first-year) residency program on a's supplemental ROL when a is matched with r . (When I|,u(a)I = 1 it must be that r - ,u(a) is on a's primary ROL.)

As in the case of the simple market considered earlier, we will say that a matching is stable if it is not blocked by any individual agent or by a pair of agents consisting of an individual and a residency programn, or by a couple together with one or two residency programs.

Page 32: American Economic Association - Computer Science...surprising properties of large labor markets emerged. The high transaction costs involved in interviewing place a practical limit

778 THE AMERICAN ECONOMIC REVIEW SEPTEMBER 1999

A matching ,u is blocked by an individual ap- plicant (in the set Al or A2), or by a residency program, if p matches that agent to some individ- ual or residency program not on its ROL, pre- cisely as in the simple model. A matching is blocked by an individual couple {ai, a)} if they are matched to a pair (ri, rj) not on their ROL. Of course no individual or couple blocks a matching at which the individual or couple is unmatched.

A residency program r and an applicant a in the set A1 together block a matching ,t pre- cisely as in the simple market, if they are not matched to one another and would both prefer to be. A residency program r and an applicant a in the set A2 together block a matching ,u if r prefers a to one of its matches under ,t [i.e., ajr > o for some o- in ,t(r)], and if either r >a r, E ,u(a) where the preferences >a

correspond to a's primary ROL, or r>a r2 E ,u(a) where >a corresponds to a's supplemen- tal ROL for the position r, E ,u(a).

A couple c - {a,, a2} and residency programs r and r' block a matching ,t if (r, r') > ,u(c) and if either:

(i) al I ,t(r), a1 >r f for some o- E ,u(r) and either a2 E ,u(r') or a2 >r' o-' for some o'E ,u (r'); or

(ii) a2 ? pA(r'), a2 >r u' for some o' E ,(r') and either a1 E ,t(r) ora1 >a r for some o- E ,u(r).

REFERENCES

Aldershof, Brian and Carducci, Olivia M. "Stable Matchings with Couples." Discrete Applied Mathematics, August 1996, 68(1-2), pp. 203-07.

AMA-MSS. "Resolutions for the 1995 Interim Meeting." [Online: (http://www.bcm.tnc.edu/ ama-mss/i95res.htm#1 1)], 1995.

American Medical Students' Association and Public Citizen Health Research Group. "Re- port on Hospital Bias in the NRMP." [Online: (http://pubweb.acns.nwu.edu/-alan/nrmp2. html)], 1995.

Ausubel, Lawrence M.; Cramton, Peter; McAfee, R. Preston and McMillan, John. "Synergies in Wireless Telephony: Evi- dence from the Broadband PCS Auctions." Journal of Economics and Management

Strategy (Special Issue: Market Design and the Spectrum Auctions), Fall 1997, 6(3), pp. 497-527.

Blum, Yossi; Roth, Alvin E. and Rothblum, Uriel G. "Vacancy Chains and Equilibration in Senior-Level Labor Markets." Journal of Economic Theory, October 1997, 76(2), pp. 362-411.

Blum, Yosef and Rothblum, Uriel G. " 'Timing Is Everything' and Marital Bliss." Journal of Economic Theory, 1999 (forthcoming).

Colenbrander, August. "Match Algorithms Re- visited." Academic Medicine, May 1996, 71(5), pp. 414-15.

Cramton, Peter. "The FCC Spectrum Auctions: An Early Assessment." Journal of Economics and Management Strategy (Special Issue: Market Design and the Spectrum Auctions), Fall 1997, 6(3), pp. 431-95.

Demange, Gabrielle; Gale, David and Sotomayor, Marilda. "A Further Note on the Stable Matching Problem." Discrete Applied Math- ematics, 1987, 16, pp. 217-22.

Feldin, Aljosa. "Core Convergence in Two- Sided Matching Markets: Some Theoretical Considerations." Mimeo, University of Pitts- burgh, 1999.

Gale, David and Shapley, Lloyd. "College Ad- missions and the Stability of Marriage." American Mathematical Monthly, January 1962, 69(1), pp. 9-15.

Kagel, John H. and Roth, Alvin E. "The Dynam- ics of Reorganization in Matching Markets: A Laboratory Experiment Motivated by a Natural Experiment." Quarterly Journal of Economics, 2000 (forthcoming).

Ledyard, John O.; Porter, David and Rangel, An- tonio. "Experiments Testing Multiobject Al- location Mechanisms." Journal of Economics and Management Strategy (Special Issue: Market Design and the Spectrum Auctions), Fall 1997, 6(3), pp. 639-75.

McAfee, R. Preston and McMillan, John. "Ana- lyzing the Airwaves Auction." Journal of Economic Perspectives, Winter 1996, 10(1), pp. 159-75.

McMillan, John. "Selling Spectrum Rights." Journal of Economic Perspectives, Summer 1994, 8(3), pp. 145-62.

. "Why Auction the Spectrum?" Tele- communications Policy, April 1995, 19, pp. 191-99.

Page 33: American Economic Association - Computer Science...surprising properties of large labor markets emerged. The high transaction costs involved in interviewing place a practical limit

VOL 89 NO. 4 ROTH AND PERANSON: MATCHING MARKET FOR PHYSICIANS 779

Milgrom, Paul. "Auction Theory in Practice: The Simultaneous Ascending Auction." Mimeo, Stanford University, 1997.

Peranson, E. and Randlett, R. R. "The NRMP Matching Algorithm Revisited: Theory ver- sus Practice." Academic Medicine, June 1995a, 70(6), pp. 477-84.

. "Comments on Williams' 'A Reexam- ination of the NRMP Matching Algorithm'." Academic Medicine, June 1995b, 70(6), pp. 490-94.

Plott, Charles R. "Laboratory Experimental Testbeds: Application to the PCS Auction." Journal of Economics and Management Strategy (Special Issue: Market Design and the Spectrum Auctions), Fall 1997, 6(3), pp. 605-38.

Roth, Alvin E. "The Economics of Matching: Stability and Incentives." Mathematics of Operations Research, 1982, 7, pp. 617-28.

. "The Evolution of the Labor Market for Medical Interns and Residents: A Case Study in Game Theory." Journal of Political Economy, December 1984, 92(6), pp. 991- 1016.

. "The College Admissions Problem Is Not Equivalent to the Marriage Problem." Journal of Economic Theory, August 1985, 36(2), pp. 277-88.

. "On the Allocation of Residents to Rural Hospitals: A General Property of Two- Sided Matching Markets." Econometrica, March 1986, 54(2), pp. 425-27.

. "New Physicians: A Natural Experi- ment in Market Organization." Science, De- cember 14, 1990, 250, pp. 1524-28.

. "A Natural Experiment in the Organi- zation of Entry-Level Labor Markets: Re- gional Markets for New Physicians and Surgeons in the United Kingdom." American Economic Review, June 1991, 81(3), pp. 415-40.

."Proposed Research Program: Evalua- tion of Changes to Be Considered in the NRMP Algorithm." Consultant's report to the National Resident Matching Program [Online: (http://www.economics.harvard.edu/ -alroth/nrmp.html)], 1995.

.__ "Interim Report No. 1: Evaluation of the Current NRMP Algorithm, and Preliminary Design of an Applicant-Proposing Algorithm." Consultant's report to the National Resident

Matching Program [Online: (http://www. economics.harvard.edu/-alroth/nrmp.html)], March 1996a.

. "The NRMP as a Labor Market." Journal of the American Medical Associa- tion, April 3, 1996b, 275(13), pp. 1054-56.

Roth, Alvin E. and Peranson, Elliott. "The Effects of the Change in the NRMP Matching Algo- rithm." Journal of the American Medical As- sociation, September 3, 1997, 278(9), pp. 729-32.

Roth, Alvin E. and Rothblum, Uriel G. 'Truncation Strategies in Matching Markets: In Search of Practical Advice for Participants." Economet- rica, January 1999, 67(1), pp. 21-43.

Roth, Alvin E. and Sotomayor, Marilda. "The College Admissions Problem Revisited." Econometrica, May 1989, 57(3), pp. 559-70.

. Two-sided matching: A study in game- theoretic modeling and analysis. Economet- ric Society Monograph Series. Cambridge: Cambridge University Press, 1990.

Roth, Alvin E. and Vande Vate, John H. "Ran- dom Paths to Stability in Two-Sided Match- ing." Econometrica, November 1990, 58(6), pp. 1475-80.

. "Incentives in Two-Sided Matching with Random Stable Mechanisms." Economic Theory, January 1991, 1(1), pp. 31-44.

Roth, Alvin E. and Xing, Xiaolin. "Jumping the Gun: Imperfections and Institutions Related to the Timing of Market Transactions." American Economic Review, September 1994, 84(4), pp. 992-1044.

Salant, David J. "Up in the Air: GTE's Experi- ence in the MTA Auction for Personal Com- munication Services Licenses." Journal of Economics and Management Strategy (Spe- cial Issue: Market Design and the Spectrum Auctions), Fall 1997, 6(3), pp. 549-72.

Shiller, Robert J. Macro markets: Creating in- stitutions for managing society's largest eco- nomic risks. Clarendon Lectures in Economics. Oxford: Oxford University Press, 1993.

Sonmez, Tayfun. "Manipulation via Capacities in Two-Sided Matching Markets." Journal of Economic Theory, November 1997, 77(1), pp. 197-204.

. "Can Pre-arranged Matches Be Avoided in Two-Sided Matching Markets?" Journal of Economic Theory, May 1999,

Page 34: American Economic Association - Computer Science...surprising properties of large labor markets emerged. The high transaction costs involved in interviewing place a practical limit

780 THE AMERICAN ECONOMIC REVIEW SEPTEMBER 1999

86(1), pp. 148-56. U.S. District Court for the Western District of

Missouri, Western Division. United States of America v. Association of Family Practice Residency Directors, Final judgment. May 28, 1996.

Williams, K. J. "A Reexamination of the NRMP Matching Algorithm." Academic Medicine,

June 1995a, 70(6), pp. 470-76. _____ "'Comments on Peranson and

Randlett' s 'The NRMP Matching Algo- rithm Revisited: Theory versus Practice'.." Academic Medicine, June 1995b, 70(6), pp. 485-89.

Wilson, Robert B. Nonlinear pricing. Oxford: Oxford University Press, 1993.