
Is There a Correlation Between Law Professor Publication Counts, Law Review Citation Counts, and Teaching Evaluations? An Empirical Study

Benjamin Barton*

This empirical study attempts to answer an age-old debate in legal academia: whether scholarly productivity helps or hurts teaching. The study is of an unprecedented size and scope. It covers every tenured or tenure-track faculty member at 19 U.S. law schools, a total of 623 professors. The study gathers four years of teaching evaluation data (calendar years 2000–2003) and tests for associations between the teaching data and five different measures of research productivity/scholarly influence. The results are counterintuitive: there is either no correlation or a slight positive correlation between teaching effectiveness and any of the five measures of research productivity. Given the breadth of the study, this finding is quite robust. These findings should help inform debates about teaching and scholarship among law school and other faculties and likely require some soul-searching about the interaction between the two most important functions of U.S. law schools.

*University of Tennessee College of Law, 1505 W. Cumberland Ave., Knoxville, TN 37996; email: [email protected].

The author specially thanks Indya Kincannon, Glenn Reynolds, Eugene Volokh, James Lindgren, Brian Leiter, James Maule, Richard E. Redding, Brannon Denning, Tom Galligan, Joan Heminway, Mae Quinn, Greg Stein, Jennifer Hendricks, George Kuney, Jeff Hirsch, Chris Sagers, the participants of faculty forums at the Villanova University School of Law, the Cumberland School of Law, Samford University, the University of Tennessee College of Law, the Southeastern Association of Law Schools Panel on Empirical Research, the University of Tennessee College of Law for generous research support, and the Honorable Diana Gribbon Motz. The author thanks the faculty and administration (and particularly the deans and associate deans) of all the participating schools: the University of Colorado School of Law, the University of Connecticut School of Law, the Cumberland School of Law, Samford University, the Levin College of Law at the University of Florida, the University of Iowa College of Law, the Lewis & Clark Law School, the University of Michigan Law School, the University of North Dakota Law School, the Northwestern University School of Law, the Moritz College of Law at the Ohio State University, the Penn State Dickinson School of Law, the Southwestern Law School, St. John’s University School of Law, the University of Tennessee College of Law, the Texas Tech University School of Law, the University of Toledo Law School, the UCLA School of Law, the Villanova University School of Law, and the Wayne State University Law School.

Journal of Empirical Legal Studies, Volume 5, Issue 3, 619–644, September 2008

© 2008, Copyright the Author. Journal compilation © 2008, Cornell Law School and Wiley Periodicals, Inc.

Anyone who has spent any time in legal academia has heard some version of the scholarship versus teaching debate. The debate breaks down into two camps that I call “pro-teaching” and “pro-scholarship” for brevity’s sake, although I recognize that either camp may object to these labels as too simplistic. The pro-teaching folks bemoan how current legal academia places excessive emphasis on scholarly pursuits, and argue that we are inevitably shortchanging our students. The pro-scholarship group retorts that our best scholars are naturally our best and most up-to-date teachers. Thus, it is the faculty who neglect scholarship who are actually harming students. This debate has also been echoed in various law review articles (Korobkin 1998 (pro-scholarship); Scordato 1990 (pro-teaching)).

On the one hand, it seems likely that working hard on scholarship should have a positive effect on teaching. Productive scholars do tend to stay on top of their research areas, and are also very engaged with the material about which they write. On the other hand, it also makes sense that if law professors are spending more and more of their time on scholarship they must be shortchanging their teaching.

The question itself is likely impossible to answer definitively, especially since there is little agreement about how to measure either the quality of teaching or the quality/impact of legal scholarship. This study does not try to answer these questions once and for all, but does the best it can with the available data. I gathered three different types of data from the tenured or tenure-track faculties at 19 U.S. law schools. I gathered four years of teaching evaluations. I also did a publication count across those same four years. Lastly, I gathered law review citation data. I used the publication count and citation data to generate a total of five different measures of scholarly output, and correlated each measure against teaching evaluations. The study found that there is either no correlation or a slight positive correlation between teaching evaluations and publication count or citation counts.

I. Previous Studies

Over the last 50 years there have been a number of studies of the correlation between teaching effectiveness and research productivity in higher education. There are three excellent overviews of these studies: a meta-analysis by John Hattie and Herbert Marsh (Hattie & Marsh 1996), John Braxton’s synthesis (Braxton 1996), and Kenneth Feldman’s earlier collection of studies (Feldman 1987). Overall, these three overviews establish that there is either no correlation or a slight positive correlation between research productivity and teaching.

As a matter of methodology, the various studies underlying Hattie and Marsh, Braxton, or Feldman are something of a mixed bag. Many rely on self-reporting for either teaching effectiveness or scholarly productivity, and most focus on a single institution or department. Because many are self-reported, they rely on the portion of the total studied population that chose to respond to the study. The timeframes studied are also typically much shorter than four years.

Hattie and Marsh sought to avoid many of these problems in their more recent study of 182 Australian university professors (2002). Their thorough study used multiple measures of research productivity, and also made a strong finding of no correlation. I followed many of the methodological suggestions of Hattie and Marsh in designing this study. Notably, I studied every tenured or tenure-track faculty member at 19 different institutions over four calendar years, and none of the data are self-reported.

There are two prior law school studies of the correlation between research productivity and teaching effectiveness. Deborah Jones Merritt studied whether there was any correlation between law school teaching awards and scholarly productivity, and concluded that there was no statistically significant correlation (Merritt 1998). Merritt’s excellent study is far-reaching and remarkably broad (it includes 832 law professors), and covers many subjects outside the teaching/scholarship correlation. Nevertheless, Merritt herself recognizes that her teaching award data involved self-reporting and might have been unreliable. Furthermore, using teaching awards as a proxy for teaching effectiveness is somewhat problematic because these awards are handled differently at every institution.

James Lindgren and Allison Nagelberg conducted a second study (Lindgren & Nagelberg 1998). It looked at 48 professors (16 professors each from three law schools—the University of Chicago, the University of Colorado, and Boston University). The 16 faculty members selected were the eight most-cited faculty members and the eight least-cited faculty members at each school from a separate citation study by Theodore Eisenberg and Martin T. Wells (Eisenberg & Wells 1998). Teaching evaluations were gathered for each of these faculty members, and Lindgren and Nagelberg did a correlation study. They found a statistically significant correlation of 0.20, and also that the most highly cited professors were more likely to have higher teaching evaluations than the least-cited professors.

The Lindgren and Nagelberg study is well done; however, it is on a smaller scale than this study and tracks only three relatively nondiverse institutions. Further, the choice to eliminate the middle of the faculty in terms of citations makes the conclusions less concrete than a study that covers the entire faculties of 19 different law schools. Lindgren and Nagelberg chose to study the extremes on each faculty because of the uncertainty of citation studies as a measure of scholarly influence (Lindgren & Nagelberg 1998). Nevertheless, if there is a concern over the accuracy of a single measure of scholarly influence, it is better to try multiple measures than to eliminate a substantial chunk of the studied faculties.

II. Methodology

In 2003, I set out to study whether there is a correlation between teaching evaluations and scholarly productivity in U.S. law schools. I planned to gather teaching evaluation data for a four-year period (2000–2003) from 20 U.S. law schools and then correlate these data with a study of faculty productivity over the same period of time. I wrote a prospectus describing the project and mailed it and a cover letter to the deans of 40 U.S. law schools.1

The response rate was significantly below the 50 percent I had hoped for. I ended up writing every dean and associate dean of every ABA or AALS accredited law school in the United States, which resulted in only 13 schools agreeing to participate. I then turned to State Freedom of Information Act requests, and eventually gathered data from a total of 19 law schools. Because the process of gathering the teaching evaluations was so difficult, and reaching my goal of 20 law schools would have likely required a public information lawsuit, I decided to go forward with 19 rather than 20 law schools.

1. A copy of the original prospectus can be found at ⟨http://www.law.utk.edu/FACULTY/BartonProp.PDF⟩.


A. Who Was Studied

The 19 law schools studied are:

1. The University of Colorado School of Law;
2. The University of Connecticut School of Law;
3. The Cumberland School of Law, Samford University;
4. The Levin College of Law at the University of Florida;
5. The University of Iowa College of Law;
6. The Lewis & Clark Law School;
7. The University of Michigan Law School;
8. The University of North Dakota Law School;
9. The Northwestern University School of Law;
10. The Moritz College of Law at the Ohio State University;
11. The Penn State Dickinson School of Law;
12. The Southwestern Law School;
13. St. John’s University School of Law;
14. The University of Tennessee College of Law;
15. The Texas Tech University School of Law;
16. The University of Toledo Law School;
17. The UCLA School of Law;
18. The Villanova University School of Law; and
19. The Wayne State University Law School.

These 19 law schools are a reasonably representative sample. The sample includes schools from many regions of the country, from varying levels of academic reputation, of varying size, and a balance of public and private institutions. Given that there are only 193 ABA accredited law schools (ABA 2006b), gathering four years of data on every tenured and tenure-track faculty member from almost 10 percent of the total number of U.S. law schools offers a reasonable sample.

A total of 623 faculty members were studied from these 19 schools. I generated a list of the tenured or tenure-track faculty for each of the schools during the study period. I included only tenured and tenure-track faculty because at many schools there is no requirement of, or support for, scholarly activity by the library, clinical, writing, or adjunct faculty. Because these faculty members are not required or encouraged to publish, many do not. Including those faculty members therefore could have skewed the analysis.


There were, of course, faculty coming and going during a four-year period, so I included any faculty member who had teaching evaluations from at least four different classes during the time period. This ensured that each professor had a fair number of responses, and also allowed the study to cover as many faculty members as possible.

B. What Was Studied—Teaching Evaluation Data

For each school, I chose the question on the teaching evaluation sheet that most closely measured teaching effectiveness. For example, the University of Tennessee College of Law’s form asks the students to rank the professor from 1–5 (with 5 being the highest ranking) on the “Instructor’s effectiveness in teaching material.” The results can be found on a publicly accessible website (University of Tennessee 2006). Of the 19 schools, 13 schools asked a somewhat similar question and ranked the professor from 1–5. Two of the other schools ranked from 5–1 (with 1 being the best ranking), one ranked from 4–1 (again with 1 as the best), and one each ranked from 1–4, 1–7, and 1–9, with 1 being the lowest. I then took the teaching evaluation data for each professor and averaged the data over the four-year period.2

In making the individual, school-level correlations, I used the teaching evaluation data as found and then correlated them against the various measures of productivity and influence. To make a mass correlation amongst all the schools, I made two transformations.

The first was a linear transformation of each school’s teaching evaluations. I converted each professor’s average teaching evaluation score into a teaching evaluation index that ranged from 0–100. The index was calculated as follows. For the 13 schools that ranked professors from 1–5, I used the following equation:

(Teaching evaluation − 1) / 4 × 100.

2. Wherever possible, I used a per-student response average. Four of the schools supplied class-level averages only, and one provided only averages by professor. I recognize that averaging the responses discards the richness of the student-level data I have for some of the other schools. These data will be utilized in the later multivariate analysis.


By definition, this score will range from 0–100.3
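The transformation above can be sketched in code. This is an illustrative reconstruction, not the author's actual calculation: the function name and the reverse-scale handling are my own additions, but the arithmetic for the common 1–5 scale matches the equation and note 3.

```python
def evaluation_index(score, low, high, reverse=False):
    """Linearly rescale a raw evaluation average onto a 0-100 index.

    low and high are the endpoints of a school's scale (e.g. 1 and 5);
    reverse=True covers schools where 1 is the best ranking.
    """
    if reverse:
        score = high + low - score  # flip so that higher always means better
    return (score - low) / (high - low) * 100

# The paper's own examples on the 1-5 scale (note 3):
round(evaluation_index(4.8, 1, 5), 1)  # -> 95.0
round(evaluation_index(3.2, 1, 5), 1)  # -> 55.0
```

By construction the index is 0 at the bottom of any scale and 100 at the top, so averages from differently bounded forms land on a common range.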

Second, I generated a z score for each professor within his or her school’s teaching evaluations. The z score calculated the distance from each school’s mean teaching evaluation scores in standard deviation units. These z scores were then amalgamated and used for various mass correlations.4
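A minimal sketch of this within-school standardization follows. The original analysis used Stata 9.1's zscore command (note 4); this hypothetical Python equivalent uses the sample standard deviation, as Stata does.

```python
from statistics import mean, stdev

def school_z_scores(scores_by_school):
    """Standardize each professor's average evaluation within his or her
    own school, then pool the z scores across schools."""
    pooled = []
    for school, scores in scores_by_school.items():
        m, sd = mean(scores), stdev(scores)  # sample standard deviation
        pooled.extend((s - m) / sd for s in scores)
    return pooled

school_z_scores({"School A": [3.0, 4.0, 5.0]})  # -> [-1.0, 0.0, 1.0]
```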

There are weaknesses in amalgamating the data in either of these ways, and I recognize that some readers may discount the mass correlations because of those weaknesses. In particular, there are the different questions considered across the schools, and the psychological differences associated with the different numerical scales (i.e., students likely react differently to a form that places 1 as the highest score, rather than 5, and also probably treat a 7-point scale differently from a 4-point scale). Moreover, most of the students at any particular school will have had no exposure to the teaching of professors at any other school considered, so the rankings are likely to be relative within each law school.

Nevertheless, generating the two indexes allowed for mass correlations that cover the entire tenured and tenure-track faculty at 19 schools, a substantial advantage over the 19 separate correlations. Further, some of the concerns about these transformations are blunted by the uniformity of the eventual results: none of the individual schools showed a strong correlation between teaching evaluations and any of the measures of scholarship, and the mass correlations reached a similar result.
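For readers who want the mechanics, the correlation itself is the ordinary Pearson coefficient over the pooled scores. The sketch below is my own, not the paper's code; any standard statistics package computes the same quantity.

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences,
    e.g. pooled teaching z scores and a productivity measure."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

pearson_r([1, 2, 3, 4], [2, 4, 6, 8])  # -> 1.0 (perfectly correlated)
```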

I also am aware that using teaching evaluations at all is somewhat controversial. There are studies, both within law schools and higher education in general, that claim to establish that teaching evaluations have biases, including biases based on race (Smith 1999), gender (Farley 1996), and even physical attractiveness (O’Reilly 1987). The methodologies of these, and other studies finding bias, are open to criticism, however (Marsh 2007).

Many law faculty members have argued to me that teaching evaluations are little more than a popularity contest. Some even have argued that teaching effectiveness is inversely correlated with teaching evaluations, since students tend to highly rank “easy professors” of little substance, while ranking those professors who challenge them comparatively lower.

3. I subtracted 1 from both the numerator and denominator to match the evaluation scale, which bottoms at 1, not 0. So, a very high evaluation average of 4.8 would result in an index score of 95 ((4.8 − 1)/4 × 100), and a moderate to low average of 3.2 results in an index of 55 ((3.2 − 1)/4 × 100). The other calculations were variations on this theme and always resulted in a number from 0–100.

4. For a description of what z scores are and how they are calculated, please see Newton and Rudestam (1999). I used a program called “zscore” in Stata 9.1 to calculate the z scores.

I have four responses to these objections that I offer in ascending order of stridence. First, teacher evaluations are the only viable way to even attempt to measure teaching effectiveness for this type of study.5

Second, even the most vociferous critic of teaching evaluations has to admit they measure something. Notably, they measure what the students who responded on a particular day thought of the professor’s performance in that class over the course of the semester. Even if critics contend that teaching evaluations utterly fail to capture teaching effectiveness, they can limit themselves to the question of whether teaching evaluations (whatever they measure) correlate with scholarly productivity.

Third, studies have shown that student teaching evaluations are valid and positively correlated with other measures of teaching effectiveness, including peer reviews. These studies suggest that, at a minimum, teaching evaluations track other evaluations of teacher effectiveness (Bok 2003; Marsh & Roche 2000; Marsh 1984, 1987). Marsh’s recent overview of the validity and multidimensionality of student evaluations offers a particularly persuasive and robust defense (Marsh 2007).

Lastly, the critics of teaching evaluations seem to assume that law students are incapable of accurately measuring the teaching they receive in law school. As a relatively recent graduate of law school, I find that argument patronizing at best and insulting at worst. Law students have sat through a minimum of 16 years of organized instruction before they rate their first law school class, and it defies common sense to say that they have learned so little about discerning good teaching from bad that they cannot accurately rank a professor’s teaching effectiveness on a 5-point scale.

C. What Was Studied—Scholarly Productivity

I also decided to measure the scholarly productivity of the same group of professors over the same four-year period, 2000–2003. After consultations with various law professors, I settled on three similar measures, one that simply counted each professor’s publications during 2000–2003, one that emphasized purely “scholarly” activity (i.e., scholarly books and articles), and one that emphasized publications more directly focused on the practice of law (i.e., treatises, casebooks, and practitioner articles in bar journals).6

5. My other choices were exceedingly unpalatable: (1) attempt to gather peer-evaluation data, which is rarely, if ever, expressed numerically, and would also almost certainly not be provided by the host institutions; or (2) use some type of personal subjective measure of teaching effectiveness, potentially requiring me to personally visit classes and make my own determination on teaching effectiveness. Moreover, I note that almost all the nonlaw studies of teaching effectiveness have used student teaching evaluations (see, e.g., Hattie & Marsh 1996).

For all the measures, I gathered publication data for each individual professor.7 From these three sources I generated a list of publications dated 2000–2003 for each professor. I created a raw count of publications, with the only mediation being dividing any co-authored publication by the number of co-authors. This created a raw publication count for 2000–2003.
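The raw count reduces to a one-liner. This sketch assumes each publication record carries its number of co-authors; the field name is hypothetical.

```python
def raw_publication_count(publications):
    """Raw 2000-2003 publication count for one professor, crediting
    1/n of each publication that has n co-authors."""
    return sum(1 / pub["n_authors"] for pub in publications)

# Three publications: solo-authored, two co-authors, four co-authors.
raw_publication_count([{"n_authors": 1}, {"n_authors": 2}, {"n_authors": 4}])  # -> 1.75
```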

I then separated the publications into different categories and assigned each category two different point values: the relative value for scholarly purposes and the relative value for practical lawyering purposes (see Table 1).

I generated these categories and the points I assigned from an amalgam of sources and received wisdom. In creating the scholarly rankings, I consulted a number of articles and essays offering scholarship advice to new law professors (see, e.g., Slomanson 2000), considered conversations with other professors, and used my understanding of the tenure process at Tennessee and other U.S. law schools. I generated the practitioner-oriented rankings as a counterpoint that emphasized publications aimed at practicing lawyers or law students.8

6. Depending on your point of view, either or both of these labels (“scholarly publishing” or “practice-oriented publishing”) may seem pejorative or insulting. No value judgment is intended; I use these labels only for ease of reference.

7. I followed the same procedure for each professor. I started with the law school’s website. Most of the 19 schools had at least some listing of each professor’s recent publications, and I used those listings as a baseline. Some schools even included a full CV for each faculty member, which was ideal. To make sure that each school and each professor was measured identically, I also checked each professor for publications on Westlaw and Amazon.com.

On Westlaw I checked the JLR database with the following search: “au (first name /2 last name).” For example, “au (benjamin /2 barton).” The “/2” was used to account for middle initials, and a /3 or /4 was used when middle names or initials made a larger search appropriate. If that search failed to produce any results, I would check again for possible other names for publishing, like “au (ben! /2 barton)”. For Amazon.com, I usually used as much of the professor’s name as possible to limit false positives.

If there were any sources listed on a professor’s resume or the law school’s website that were unavailable on Amazon or Westlaw, I generally used a Google search to try to locate the publications. In short, I made every effort to find and verify each publication for all 623 professors.

There are obvious problems with any ranking of scholarly productivity, and mine is no exception. On the one hand, it would be preferable to make as few subjective determinations as possible, and assigning any weight to faculty publications carries a number of value judgments. In this vein, some of the non-law-school studies measure the raw number of publications or simply count the total number of published pages. I decided not to rely solely on this approach because there is a greater variation among law school publications than in many other disciplines and I thought such a basic count on its own would not accurately reflect professorial productivity. Nevertheless, the raw count is included as a bulwark against criticism of either weighted measure: if anyone objects to the points assigned, he or she can always refer to the raw count.

Table 1: Productivity Points

Category                                                  Scholarly Points   Practice-Oriented Points
Scholarly book                                                   15                    5
Chapter in a scholarly book                                       6                    3
Top-20 or peer-reviewed law review article (40+ pages)a           6                    3
Law review article (40+ pages)                                    5                    3
Top-20 or peer-reviewed law review essay (10–39 pages)            4                    2
Law review essay (10–39 pages)                                    3                    2
Treatise or casebook                                              3                   15
Practitioner article or chapter                                   1                    5
Top-20 or peer-reviewed publication under 10 pages                2                    1
Law review publication under 10 pages                             1                    1

a. As my measure of a top-20 law review, I used the combined top-20 list in Cullen and Kalberg (1995). I defined a publication as “peer reviewed” if the editorial staff and the selection process were run by faculty members (whether faculty in law or another discipline). Some commentators have worried that this study focused too narrowly on law review articles. To the contrary, scholarly publications in nonlaw academic journals were counted, and were almost invariably granted the bonus points for peer review, making nonlaw publications at least as valuable, and often slightly more valuable, than law review publications.

8. There were, naturally, several additional wrinkles. If there were two or more authors for any publication, I divided the points by the number of authors. If a professor served as an editor for any of the above publications, he or she received half credit for that type of publication.

Another problem was how to count treatises or casebooks that were updated, but not originally published, during the period (2000–2003). I decided to count any update for the full amount of points. I did this for two reasons. First, the sheer difficulty of following an area of law sufficiently to update a treatise or a casebook is hard to distinguish from the effort necessary to draft the first version. Second, the main argument of the treatise writers for separate, practice-oriented productivity rankings was the potential positive teaching effect of being fully up-to-date in an area of law. Since an update within the period should reflect a professor who is up-to-date, I wanted to count updates fully.

It is worth noting the items I chose not to count. I did not count op-ed pieces or newspaper articles. I did not include briefs or other papers filed in a lawsuit. I chose not to count these items because they would be impossible to find independently, and I did not want to only include them depending on whether a faculty member or school decided to include them on its website or CV. I also did not include faculty blogs because of the impossibility of measuring blog activity in 2000–2003 and the great imbalance among different blog types and activity levels.
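The weighted measures apply Table 1 mechanically. The sketch below is my own reconstruction, not the author's code: the category keys are hypothetical shorthand for the table's rows, co-author division follows the raw-count rule, and the half credit for editors follows note 8.

```python
# (scholarly points, practice-oriented points) per category, per Table 1.
POINTS = {
    "scholarly_book": (15, 5),
    "scholarly_book_chapter": (6, 3),
    "top20_article": (6, 3),
    "article": (5, 3),
    "top20_essay": (4, 2),
    "essay": (3, 2),
    "treatise_or_casebook": (3, 15),
    "practitioner_piece": (1, 5),
    "top20_short": (2, 1),
    "short": (1, 1),
}

def productivity_scores(publications):
    """Weighted scholarly and practice-oriented totals for one professor.

    publications is a list of (category, n_authors, is_editor) tuples;
    points are split among co-authors and halved for editors.
    """
    scholarly = practical = 0.0
    for category, n_authors, is_editor in publications:
        s, p = POINTS[category]
        share = (0.5 if is_editor else 1.0) / n_authors
        scholarly += s * share
        practical += p * share
    return scholarly, practical

# A solo law review article plus a co-authored casebook:
productivity_scores([("article", 1, False), ("treatise_or_casebook", 2, False)])
# -> (6.5, 10.5)
```

Note how the casebook dominates the practice-oriented total while contributing little to the scholarly one, which is the point of keeping two parallel rankings.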

In theory, I or my research assistants also could have read all the relevant publications and assigned points based on our own criteria/opinion. Instead, I decided to divide the publications into categories and assign points as a compromise. This approach sufficiently captures the differences in the types of legal publications, but is not so subjective as to call the entire process into question.

D. What Was Studied—Scholarly Influence

I also decided to do a citation study of the same 623 professors. I did this for several reasons. First, I wanted to match the legal scholarship that has been based in citation studies, not studies of raw productivity (Eisenberg & Wells 2000; Leonard 1990). This addition also answers the criticism that a correlation study that measured only productivity misses the entire point: the proper study is between scholarly influence and teaching, since scholarly influence is a much truer measure of publication quality. Second, a citation measure covers a lifetime of work, so it should catch any scholars who were less productive during 2000–2003 for whatever reason.

I decided to do two measures of scholarly influence: one a lifetime total of “citations” in the Westlaw JLR database, and one per-year listing of citations, since a lifetime citation study may unfairly benefit longevity. I gathered these two measures by following Brian Leiter’s well-known citation study methodology (Leiter 2000). I describe the methodology in more detail in the footnotes, but I basically used Westlaw’s JLR database to generate a raw list of citations for each professor and then I created a “cites-per-year” measure by dividing the total number of cites by the years in full-time teaching.9

9The full methodology is as follows: First, I did a search in Westlaw’s JLR database for the professor’s name. I searched for “first name /2 last name”. For example, my search was “Benjamin /2 Barton”. I again expanded the search for multiple middle initials or names, and

Publication and Citation Counts and Teaching Evaluations 629

Page 12: Is There a Correlation Between Law Professor Publication Counts, Law Review Citation Counts, and Teaching Evaluations? An Empirical Study

There are a few notable problems with this measure. First, as Brian Leiter has also recognized, it actually captures more than just citations. It captures “star cite” references and self-citations, both of which might be disallowed from a true “citation-only” study. Because I am measuring “influence,” and my rough eyeballing of the references shows that few law professors have an unusual ratio of citations to star cite references, I chose not to weed through the raw numbers to eliminate those references.10 Further, insofar as I am attempting to measure “influence,” a star cite reference possibly shows that the named professor had at least some influence on the publication at issue.

Second, I did not do a corresponding study of judicial citations to professors. I did not do so because court citations are much harder to find on Westlaw and much rarer, and I worried about my ability to accurately count

considered alternate searches if a first search resulted in few or no cites. So, if a “Benjamin /2 Barton” search came up with no citations, I would try “Ben! /2 Barton” to catch nicknames. Because this information is time sensitive, I did the entire study of scholarly influence in a short timeframe (March 23–27, 2006). If any additional cites were added during these few days, it should not have been enough to make a substantial difference.

I then counted the raw number of references returned. If the professor had a common name (like mine), I would scan the first 20 references for false positives. If there was a false positive, I would extend the search to 40 total cites. To remove the false positives, I would then divide the number of false positives by 40, and multiply that percentage by the entire number of citations. If the false positive count overwhelmed the true references, I would try another search, rather than risk an inaccurate count. For example, a search for “John /2 White” brings back so many different John Whites that the entire count could be inaccurate, regardless of the false or true references in the first 40. In that case, I would try “John A. White”, adding the middle initial to narrow the search down.

I also counted the number of years each faculty member had been in full-time tenure-track law teaching. I then divided the total number of references by the number of years to get a measure of references per year. I created the references-per-year number so that longevity in teaching was not ignored in the citation study and so that there was a measure that captured “influence per year,” rather than just a career’s worth of citations.
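The arithmetic in the two preceding paragraphs is simple enough to sketch. The following is a hypothetical illustration, not the author’s actual tooling (the searching and sample-checking were done by hand in Westlaw); it assumes the estimated false positives are subtracted from the raw count, as the text implies:

```python
def adjusted_citation_count(total_refs, false_positives, sample_size=40):
    """Scale a raw Westlaw reference count down by the false-positive
    rate observed in a hand-checked sample (the study checked 40)."""
    false_rate = false_positives / sample_size
    return total_refs * (1 - false_rate)

def cites_per_year(total_refs, years_teaching):
    """Career references divided by years in full-time law teaching."""
    return total_refs / years_teaching
```

So a professor with 500 raw references and 4 false positives among the 40 checked would be credited with 450 references; over 15 years of teaching, that is 30 cites per year.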

Please note that this Westlaw search counts each professor’s scholarly influence through March 2006. I decided to stretch the timeframe for scholarly influence beyond 2003 because measuring only citations in 2000–2003 would fail to capture the influence of the publications being written during the timeframe of the study because of the lag between writing, publishing, and eventual citation.

10This also would have added an entire layer of potential errors and bias in deciding what was a “real” citation and what was not. It would have also been impossible to accomplish over a short period of time (since some professors had more than 1,000 references in Westlaw), and that would have introduced the need to adjust for time in the study, adding another possible source of error.

630 Barton


those citations.11 The practice-oriented side of this measure is thus missing, and I recognize the weakness of this asymmetry.

Lastly, Westlaw does not include many nonlaw publications, so professors who publish primarily in the journals of other disciplines or are cited primarily in those journals will be undercounted by this citation measure. This weakness is blunted by the earlier productivity measure, which explicitly includes nonlaw scholarly publications found via the home school website, Amazon, and Google. Moreover, to my knowledge there is not a nonlaw equivalent to Westlaw, so a broader citation count would be impossible. Finally, whatever argument there is for a connection between scholarly productivity and teaching, it is hard to see how citation and influence in nonlaw journals would have much effect on law school teaching.

III. Results

I gathered all the numbers on a single spreadsheet and used Stata 9.1 to calculate three different sets of correlation coefficients. I used the raw numbers to create a Spearman rank-order coefficient, as well as p values to measure statistical significance. I used a Spearman correlation first because many of the data were not normally distributed, so a nonparametric correlation measure like the Spearman was more appropriate.

Next, I adjusted as many of the data as I could to a normal distribution and then ran a series of Pearson product moment correlations on the adjusted numbers. The numbers were adjusted because the Pearson correlation is parametric and requires a normal distribution. Lastly, I ran a Pearson correlation on the nonnormal, raw data as a companion to, and check on, the other two sets of correlations. All three of the correlation sets were largely consistent, which supports the overall findings and conclusions.
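The study ran these calculations in Stata 9.1. Purely as an illustration of what the two coefficients measure (p values omitted), a minimal pure-Python sketch might look like this; the function names are mine:

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation (parametric)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(v):
    """Rank values from 1..n, averaging the ranks of ties."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average rank for the tie group
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman rank-order correlation: Pearson on the ranks."""
    return pearson(ranks(x), ranks(y))
```

On x = [1, 2, 3, 4, 5] and y = [2, 1, 4, 3, 5] both coefficients equal 0.8; because those values are already their own ranks, Spearman and Pearson coincide there, and they diverge only when the gaps between values matter.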

Both the Spearman and Pearson correlations generate a correlation coefficient that measures how strongly two variables correlate.12 The p value

11Many law professors have relatively common names that occur frequently as names of plaintiffs or defendants, creating a massive false positive problem. Further, the courts are not as regular as law reviews in their citation style, and many times do not provide the professor’s first name, making it difficult to gather an accurate citation count.

12A perfect positive correlation coefficient would be +1. A perfect negative correlation is -1. A negative correlation coefficient means that as one number grows larger, the other grows smaller; the two variables move in opposite directions. A positive coefficient means that as one


measures the likelihood that the null hypothesis is correct; that is, the p value helps avoid Type I errors.13 When the p value rises above either 0.05 or 0.10, the coefficient results are generally not considered statistically significant, because the null hypothesis of no correlation is too likely for any firm conclusions to be drawn from a finding of correlation.

A second concern is Type II errors.14 To avoid Type II errors, many researchers look to the power of their correlation. If the power is below 0.8, then the risk of a Type II error is too high for the null hypothesis to be definitively accepted. The power of a correlation is calculated from the sample size and the relative value of the correlation coefficient sought. As correlations become stronger, it is possible to detect them in smaller samples. As the correlations sought become weaker, the sample size must be bigger to reliably detect a correlation. Similarly, if you require a p value of 0.05 or lower to find a correlation, the sample size required is larger than if a p value of 0.1 (or higher) is required. In short, the sample size required depends on the size of the correlation the researcher is seeking, as well as the stringency of the p values required.
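The published sample sizes came from the SISA calculator, but they are closely tracked (to within a couple of observations) by the standard Fisher z sample-size approximation. The sketch below assumes a one-sided test; that assumption is my inference from matching the published figures, not something the article states:

```python
import math
from statistics import NormalDist

def required_n(r, alpha, power=0.8):
    """Approximate sample size needed to detect a correlation of size r
    at one-sided significance alpha with the given power, using the
    standard Fisher z approximation."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(power)
    c = math.atanh(r)  # Fisher z-transform of the target coefficient
    return round(((z_alpha + z_beta) / c) ** 2 + 3)
```

For example, required_n(0.5, 0.1) gives 18 and required_n(0.3, 0.1) gives 50, matching the table below; the weakest-correlation rows (r = 0.1) differ from the SISA figures by one or two observations, presumably from differing rounding conventions.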

Table 2 shows the sample size necessary in order to surpass a power of 0.8, broken down by the size of the correlation coefficient sought and the p value required.15

The entire sample of professors (N = 623) is large enough to meet the 0.8 power requirement, even when searching for correlations as weak as 0.1. Nevertheless, the sample sizes for the individual schools (which range from 10 to 58) are often too small to detect low levels of correlation. Even so, I do break out the scores by school because it helps explicate and support the overall numbers. Readers should note that the number of professors at any

grows larger the other does, too; the two variables move in the same direction. As a general rule of thumb, there are five categories of correlation coefficients: strong positive coefficients (+0.7 to 1), strong negative coefficients (-0.7 to -1), weak positive coefficients (+0.4 to 0.7), weak negative coefficients (-0.4 to -0.7), and no correlation, or very weak correlations (-0.3 to 0.3) (Cope 2005). Some social scientists consider any finding of correlation above 0.1 as a positive correlation. A coefficient of 0 is a perfect finding of no correlation.

13A Type I error is when a researcher finds a correlation when in fact the null hypothesis of no correlation is true.

14Type II errors occur when a researcher erroneously discards a finding of correlation in favor of the null hypothesis of no correlation.

15These numbers were calculated using the SISA sample-size calculation (SISA 2007).


individual school (which are fixed and cannot be enlarged) may often be too small to fit a 0.8 power restriction, and read the results accordingly.

The results will be reported in the following order. First, I will report the mass Spearman rank-order correlations for all 623 law professors. I will then show the Spearman correlation coefficients broken out by school. Next, I will report the mass and school-by-school Pearson correlations with all nonnormal data transformed to normality wherever possible. Lastly, I will produce the unadjusted Pearson correlations as a comparison point to the earlier Spearman and transformed Pearson correlations.

A. Spearman Correlations

Table 3 reports the Spearman rank-order correlation coefficients between the teaching evaluation index and the z scores and each of the five scholarship measures.

All the coefficients but one are under 0.1, and many of the small correlations that are found are not statistically significant. The most that could be said of these coefficients is that the z scores had a slight positive relation to the raw publication, scholarly productivity, and practice-oriented

Table 2: Sample Size and Power

Correlation Coefficient   p Value   Sample Size Required
0.5                       0.1        18
0.5                       0.05       23
0.3                       0.1        50
0.3                       0.05       67
0.1                       0.1       453
0.1                       0.05      619

Table 3: Spearman Rank-Order Correlation Coefficients Between Teaching and Five Scholarship Measures, Aggregate Level

                          Index Spearman   p Value   z Score Spearman   p Value
Scholarly productivity       0.0627        0.1180        0.0866         0.0306
Practice oriented            0.0730        0.0687        0.1301         0.0011
Raw publications count       0.0607        0.1303        0.0903         0.0242
Total citations              0.0114        0.7761        0.0343         0.3932
Citations per year           0.0468        0.2433        0.0588         0.1429

N = 623.


publication counts. The correlation coefficients of the index version of the teaching evaluations are basically indistinguishable from zero, and are not statistically significant at 0.05, although the small positive correlation found for practice-oriented scholarship is statistically significant at 0.10. Overall, these findings support the conclusion of no correlation or, at most, a very small correlation with scholarship produced.

A review of the individual Spearman rank-order correlation coefficients and p values for each participating school further underscores this finding (see Table 4).

There are 95 total correlation coefficients and only 12 are statistically significant. Each of those is positive and above 0.25. These positive correlation coefficients tend to cluster around four schools (St. John’s has four of them, Ohio State three, Penn State and Villanova two apiece). This suggests that at these schools the most prolific and well-cited authors tended to be the highest-rated teachers. It is also interesting to note that the “total citations” measure has no statistically significant correlations, suggesting that it is the least likely determinant of high teaching evaluations.

B. Pearson Correlations

The Spearman rank-order correlation places each of the values in rank order and tests whether two different sets of rank-ordered values correlate. The Spearman rank-order correlation does not take the gaps between the values into account, so it is considered a less powerful form of correlation coefficient than the Pearson correlation, which considers the relative values.

The Pearson correlation calculation, however, is parametric, which means that both sets of values considered must fit the assumption of normality. To use nonnormal data, those data must be transformed into normal data. Many of the data considered here are nonnormal, so I transformed the data I could in order to use the Pearson correlation. If I was unable to find a transformation to normalize the data, I note it in the text, and then generally used the transformation that came closest to creating normal data.

I adjusted the data using the following procedure. First, I tested all data for normality using the Shapiro-Wilk test.16 Any data that were nonnormal I

16I did this with the “swilk” command in Stata.


Table 4: Spearman Correlation Coefficients Between Teaching and Scholarship Measures, School Level

School                   Scholarly  p Value    Practice  p Value    Total Pubs  p Value    Cites Total  p Value    Cites per Year  p Value
Colorado (N = 30)          0.1828   0.3336      0.2401   0.2012       0.2194    0.2440       0.2005     0.2880        0.2652       0.1567
Connecticut (N = 32)      -0.0310   0.8661      0.0565   0.7586       0.0215    0.9068       0.1568     0.3914        0.1419       0.4384
Cumberland (N = 25)        0.2850   0.1673      0.1807   0.3874       0.2438    0.2403       0.0943     0.6538       -0.0092       0.9651
Florida (N = 39)          -0.1061   0.5205     -0.0931   0.5731      -0.1173    0.4769       0.0539     0.7444        0.0878       0.5952
Iowa (N = 38)             -0.1074   0.5211      0.0732   0.6622      -0.0553    0.7418      -0.0959     0.5669       -0.0279       0.8679
Lewis & Clark (N = 29)     0.0679   0.7264      0.1536   0.4262       0.1703    0.3772       0.2002     0.2978        0.1859       0.3344
Michigan (N = 35)          0.2681   0.1194      0.1943   0.2635       0.2295    0.1848       0.1522     0.3826        0.1758       0.3123
UND (N = 10)              -0.1506   0.6778     -0.0490   0.8932      -0.1077    0.7671       0.3211     0.3656        0.3761       0.2840
Northwestern (N = 41)     -0.0069   0.9659     -0.0889   0.5804      -0.0232    0.8855      -0.1708     0.2856       -0.1100       0.4937
Ohio State (N = 35)        0.3166   0.0639+     0.3401   0.0456*      0.3053    0.0745+      0.0780     0.6559        0.1143       0.5132
Penn State (N = 24)        0.3856   0.0627+     0.4018   0.0517*      0.2339    0.2713       0.0661     0.7589        0.2022       0.3433
Southwestern (N = 38)     -0.2173   0.1901     -0.2320   0.1610      -0.2635    0.1100      -0.2610     0.1135       -0.0830       0.6205
St. John's (N = 37)        0.3618   0.0278*     0.2751   0.0994+      0.3644    0.0266*      0.1332     0.4319        0.2958       0.0754+
Tennessee (N = 35)         0.0133   0.9394      0.2803   0.1029       0.0936    0.5927       0.0912     0.6025        0.0433       0.8050
Texas Tech (N = 33)       -0.1047   0.5619     -0.0568   0.7534      -0.0780    0.6662      -0.0425     0.8145       -0.1915       0.2856
Toledo (N = 24)           -0.0538   0.8030     -0.1885   0.3777      -0.0865    0.6877       0.1588     0.4586        0.1388       0.5178
UCLA (N = 58)              0.1779   0.1815      0.3607   0.0054*      0.1791    0.1786      -0.0201     0.8812       -0.0515       0.7009
Villanova (N = 34)         0.3226   0.0627+     0.1762   0.3188       0.2685    0.1246       0.1580     0.3721        0.4028       0.0182*
Wayne State (N = 26)      -0.1198   0.5600     -0.2704   0.1815      -0.1619    0.4293      -0.2357     0.2464       -0.1341       0.5138

+Significant at 0.10. *Significant at 0.05.


attempted to transform. Wherever possible I used what Stata calls a lognormal zero skew transformation.17

Tables 5–9 note what transformations were used and if the data could not be transformed to normality. Table 5 reports the mass Pearson correlations.

The index correlation coefficients are all so close to zero that they are a clear finding of no correlation. The sample size is large enough to find a correlation as small as 0.1 at a power of 0.8. The z scores, however, show a statistically significant and slightly positive correlation across four of the five measures. This is consistent with the earlier Spearman correlations. The transformed z scores are still not normal, so a reader may want to consider those coefficients accordingly. The coefficients are basically consistent with the earlier Spearman coefficients, however, so there is an argument to be made for a small positive correlation between scholarship produced and total citations and the student evaluation z scores. Interestingly, the small correlation between practice-oriented scholarship and teaching evaluation z scores is the most pronounced across both sets of correlations, and citation totals is the least likely measure of scholarship to correlate to teaching evaluations.

17The Stata command for this transformation is “lnskew0”. The lognormal zero skew transformation creates a new variable that equals log(±exp - k), choosing k and the sign of exp so that the skewness of the new variable is zero.
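As an illustration of the zero-skewness idea only (Stata’s lnskew0 uses its own root-finder and the log(±exp - k) form; the bisection sketch below is mine and handles only positively skewed data):

```python
import math

def skewness(v):
    """Sample skewness: third central moment over variance**1.5."""
    n = len(v)
    m = sum(v) / n
    m2 = sum((x - m) ** 2 for x in v) / n
    m3 = sum((x - m) ** 3 for x in v) / n
    return m3 / m2 ** 1.5

def lnskew0(v):
    """Choose k so that log(x - k) has near-zero skewness -- the idea
    behind Stata's lnskew0, for positively skewed data."""
    span = max(v) - min(v)
    lo = min(v) - 1e6 * span   # k far below the data: skew stays positive
    hi = min(v) - 1e-9 * span  # k just below the data: skew turns negative
    for _ in range(200):       # bisect on k
        k = (lo + hi) / 2
        if skewness([math.log(x - k) for x in v]) > 0:
            lo = k
        else:
            hi = k
    return [math.log(x - k) for x in v]
```

For right-skewed data such as exponentiated values, the search drives the sample skewness of log(x - k) to numerically zero.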

Table 5: Pearson Correlation Coefficients Between Teaching and Five Scholarship Measures, Aggregate Level

                             Index Pearson(a)   p Value   z Score Pearson(b)   p Value
Scholarly productivity(c)       0.0502          0.2112        0.0942           0.0187
Practice oriented               0.0648          0.1060        0.1087           0.0066
Raw publications count          0.0525          0.1903        0.0918           0.0220
Total citations                 0.0070          0.8618        0.0427           0.2869
Citations per year              0.0446          0.2664        0.0802           0.0453

(a) The index data were not normal and were adjusted using a lognormal zero skew transformation.
(b) I could not normalize the z scores. A zero skew lognormal transformation most closely approximated a normal distribution and was used for these correlations.
(c) None of the data for scholarly productivity, practice oriented, raw publications, total citations, or citations per year were normal. A lognormal zero skew transformation was used, although the transformed scholarly productivity, practice oriented, raw publications, and citations per year remain nonnormal even after the transformation. The transformed total citations data are normally distributed.
N = 623.


Table 6: Pearson Correlation Coefficients Between Teaching and Scholarship Measures, School Level

School                      Scholarly  p Value    Practice  p Value    Total Pubs  p Value    Cites Total  p Value    Cites per Year  p Value
Colorado (N = 30)            0.1883    0.3190      0.2073   0.2718       0.1829    0.3333       0.2120     0.2608        0.2559       0.1723
Connecticut (N = 32)         0.0409    0.8240      0.0313   0.8651       0.0138+   0.9401       0.1407     0.4423        0.1304       0.4769
Cumberland (N = 25)          0.2696    0.1925      0.1898   0.3635       0.2256    0.2783       0.0659     0.7544        0.0013       0.9952
Florida (N = 39)            -0.0524    0.7515     -0.1091   0.5087      -0.0864    0.6010      -0.0321     0.8461        0.0907       0.5829
Iowa (N = 38)               -0.1643    0.3244      0.0027   0.9871      -0.1189    0.4769      -0.0240     0.8864        0.0071       0.9663
Lewis & Clark (N = 29)       0.0242    0.9009      0.0919   0.6352       0.1519    0.4315       0.2067     0.2820        0.1807       0.3483
Michigan (N = 35)            0.2194    0.2055      0.1856   0.2858       0.1985    0.2530       0.1889     0.2771        0.1894       0.2759
UND (N = 10)                -0.1498*   0.6796      0.0024*  0.9947      -0.047*    0.8965       0.3319+    0.3487        0.3956       0.2579
Northwestern(a) (N = 41)     0.0036    0.9822     -0.0222   0.8904      -0.0030    0.9849      -0.1868     0.2422       -0.0987       0.5394
Ohio State (N = 35)          0.3204    0.0606      0.3562   0.0357       0.2982#   0.0819       0.0970     0.5795        0.1511       0.3863
Penn State(b) (N = 24)       0.4396    0.0316      0.3902   0.0594       0.3048    0.1476       0.1142     0.5952        0.2526       0.2337
Southwestern (N = 38)       -0.1650#   0.3223     -0.1502#  0.3680      -0.1967#   0.2366      -0.1764     0.2894       -0.0033       0.9843
St. John's(c) (N = 37)       0.3361    0.0420      0.2790   0.0944       0.3176    0.0554       0.1476     0.3833        0.2957       0.0756
Tennessee(d) (N = 35)       -0.0048    0.9780      0.3168   0.0637       0.1136    0.5159       0.1156     0.5083        0.0670       0.7022
Texas Tech (N = 33)         -0.1060#   0.5570     -0.0992   0.5827      -0.0767    0.6715      -0.0119     0.9476       -0.1322       0.4635
Toledo (N = 24)             -0.1099#   0.6092     -0.1777#  0.4062      -0.1201#   0.5763       0.2121     0.3197        0.1999       0.3491
UCLA(e) (N = 58)             0.1668    0.2108      0.3430   0.0084       0.1680    0.2076      -0.0059     0.9651       -0.0677       0.6134
Villanova (N = 34)           0.3533    0.0404      0.1845   0.2962       0.2963    0.0889       0.1946     0.2701        0.4154#      0.0146
Wayne State (N = 26)        -0.0993+   0.6293     -0.3559   0.0743      -0.1710+   0.4035      -0.2321     0.2539       -0.1751       0.3923

(a) Northwestern's evaluation data were not normally distributed. None of the transformations resulted in normal distributions. The lognormal zero skewness transformation most closely approximated a normal distribution, so I used those data.
(b) Penn State's evaluation data were not normally distributed, so a Box-Cox transformation was used. The Box-Cox transformation creates a new variable that equals (exp^a - 1)/a. The Box-Cox transformation was not used for other data because exp must be strictly positive to use the Box-Cox transformation and the other data either contained zeros or negative values. The Stata command for a Box-Cox zero skew transformation is "bcskew0".
(c) St. John's evaluation data were not normally distributed, so a Box-Cox transformation was used.
(d) Tennessee's evaluation data were not normally distributed, so a Box-Cox transformation was used.
(e) UCLA's evaluation data were not normally distributed, so a Box-Cox transformation was used.
+These data were normal, and were not transformed.
*Log(x + 1) transformation was used to achieve normality. This means I calculated a new variable that equals log(exp + 1).
#Lognormal zero skew transformation used, but data remained nonnormal.


The school-by-school breakdown of the adjusted data, reported in Table 6, shows a similar pattern. Most of the evaluation data were normal, unless marked. As noted above, I tried to transform the data using the Stata lognormal zero skew transformation wherever possible, so if the various scholarship data are not separately marked, the data were transformed to normality with the lognormal zero skew transformation. The correlation coefficients that are significant at 0.1 are shown by a p value in bold and italics.

There are 14 correlations out of 95 that are statistically significant at a 0.1 p value. Again, the statistically significant correlations clustered around Ohio State, St. John’s, and Villanova, with three each, and Penn State with two. Wayne State has our first statistically significant negative value. Again, there are no statistically significant correlations associated with total citations.

Nevertheless, despite a finding of correlation at four of 19 schools, the overall picture supports the amalgamated numbers: there is either no correlation between teaching evaluations and these measures of scholarly output, or a very slight positive correlation. This is especially so in light of the findings of the large-scale correlations and the general uniformity of the school-level correlations. The Pearson correlations of adjusted data support a finding of no correlation or slight positive correlation.

Lastly, I include the unadjusted Pearson correlations. Most of these data are not normal, so these correlation coefficients should be used appropriately. Nevertheless, I include them to show their overall consistency with the earlier coefficients. Table 7 reports the aggregate, unadjusted correlations for the index and z scores and Table 8 reports the unadjusted Pearson coefficients for each school.

Out of the 95 correlations, there are only eight that are significant at 0.10. Similarly, the amalgamated numbers show correlation coefficients very

Table 7: Unadjusted Pearson Correlation Coefficients Between Teaching and Five Scholarship Measures, Aggregate Level

                          Index Pearson   p Value   z Score Pearson   p Value
Scholarly productivity       0.0029       0.9417        0.0383        0.3403
Practice oriented            0.0366       0.3614        0.0650        0.1053
Raw publications count       0.0229       0.5677        0.0528        0.1885
Total citations             -0.0139       0.7298        0.0194        0.6293
Citations per year           0.0224       0.5770        0.0487        0.2251

N = 623.


Table 8: Unadjusted Pearson Correlation Coefficients Between Teaching and Scholarship Measures, School Level

School            Scholarly  p Value    Practice  p Value    Total Pubs  p Value    Cites Total  p Value    Cites per Year  p Value
Colorado           0.0594    0.7551      0.1288   0.4976       0.0657    0.7303       0.1065     0.5753        0.1375       0.4687
Connecticut        0.078     0.6714     -0.0214   0.9076       0.01      0.9566       0.1538     0.4006        0.1094       0.5511
Cumberland         0.2507    0.2267      0.1315   0.531        0.1693    0.4186       0.16       0.445         0.1199       0.5682
Florida            0.059     0.721      -0.1197   0.4679       0.0351    0.8318      -0.087      0.5982        0.0806       0.6258
Iowa              -0.1392    0.4046      0.042    0.8022      -0.0574    0.732        0.1136     0.4971        0.1289       0.4405
Lewis & Clark     -0.1311    0.498       0.0709   0.7149       0.0531    0.7843       0.1247     0.5192        0.0868       0.6543
Michigan          -0.0258    0.883       0.0538   0.7587       0.0118    0.9463       0.1525     0.3819        0.1867       0.2827
UND               -0.1184    0.7446      0.1268   0.7271       0.0056    0.9878       0.3319     0.3487        0.1834       0.6121
Northwestern       0.003     0.9851     -0.0023   0.9885      -0.005     0.9752      -0.0843     0.6004       -0.1391       0.3859
Ohio State         0.3232    0.0582+     0.3494   0.0396*      0.3087    0.0712+      0.082      0.6394        0.1339       0.4433
Penn State         0.2914    0.1671      0.2928   0.165        0.2213    0.2988       0.1025     0.6336        0.3435       0.1003+
Southwestern      -0.1409    0.3987     -0.1282   0.4429      -0.1987    0.2317      -0.1735     0.2975       -0.0482       0.7738
St. John's         0.2993    0.072+      0.251    0.1339       0.2605    0.1194       0.0851     0.6165        0.2716       0.1039
Tennessee         -0.1515    0.3848      0.219    0.2063      -0.0079    0.9639       0.0103     0.953         0.0656       0.7082
Texas Tech        -0.0371    0.8378      0.0144   0.9364      -0.0427    0.8137       0.0754     0.6767       -0.1535       0.3937
Toledo             0.103     0.632      -0.238    0.2628       0.0492    0.8195       0.2086     0.328         0.1719       0.4218
UCLA               0.0877    0.5127      0.2583   0.0503*      0.1291    0.3342      -0.0772     0.5646       -0.0167       0.9009
Villanova          0.1432    0.4191     -0.086    0.6285       0.1115    0.5302       0.1264     0.4761        0.3912       0.0222*
Wayne State       -0.0993    0.6293     -0.3523   0.0775+     -0.171     0.4035      -0.1436     0.4841       -0.1819       0.3737

+Significant at 0.10. *Significant at 0.05.


close to zero and generally high p values, indicating that even the small correlation coefficients found are not statistically significant.

In addition to the mass and individual correlations, I also created scatterplots for each data set and reviewed them in case there was an undetected curvilinear relationship not revealed by the linear correlations. These scatterplots are available online.18 None of the scatterplots show a strong curvilinear relationship.

Because the earlier work of Lindgren and Nagelberg found a small relationship between citations and teaching evaluations for the most- and least-cited professors at four schools, and also that the more-cited professors were more likely to receive good teaching evaluations, I considered whether the top 20 percent and the bottom 20 percent of the total citations count had a stronger or weaker relationship than the entire cohort. The results showed, if anything, less of a relationship (see Table 9).
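The subgroup selection just described can be sketched as follows. This is a hypothetical helper, not the author's code; the published subgroups each contain N = 125 professors, i.e., roughly 20 percent of 623:

```python
def subgroup(pairs, frac=0.2, top=True):
    """Return the top (or bottom) fraction of (citations, evaluation)
    pairs, ordered by citation count."""
    ordered = sorted(pairs, key=lambda p: p[0])
    k = round(len(ordered) * frac)  # ~125 of the study's 623 professors
    return ordered[-k:] if top else ordered[:k]
```

The correlations reported in Table 9 are then run separately on each subgroup rather than on the full cohort.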

In sum, across the different calculations of scholarly productivity and citation, and using both a linear index score and a z score to combine the teaching evaluation data, the findings show either no correlation or a slight positive correlation. In light of the slightness of the few correlations that were found, the safest read across these data is that teaching evaluations do not correlate strongly with either scholarly productivity (whether measured as a raw publication count or weighted) or a citation count.

It is interesting to note that this finding of no or slight correlation is basically repeated for most schools on the various school-by-school charts. Except for a few schools (which are dissimilar enough from each other that they cannot be easily grouped or generalized), the correlation coefficient is similar (and very weak) regardless of whether the schools are public or private, larger or smaller, stronger or weaker in academic reputation, stronger or weaker in overall scholarly productivity, and regardless of whether the school is in the East, West, South, or North of the United States. A finding of no correlation or a weak positive correlation is also consistent with the nonlaw studies that made similar findings.

C. The Counterintuitive Nature of This Finding

In describing this study to others, I found two general predictions as to my results. A slight majority predicted that scholarly productivity and teaching

18The scatterplots are available with the online version of this article at the JELS website or at ⟨http://www.law.utk.edu/FACULTY/BartonScatter.pdf⟩.


effectiveness have a strong positive correlation. These people reason that the process of researching and writing scholarship should have multiple benefits for classroom teaching: the professor is more likely to be up-to-date in his or her knowledge of a particular subject area, and the professor’s thinking on the topic should be polished and honed in the process of drafting scholarship. The proponents of practice-oriented scholarship were particularly vociferous on this point, and it intuitively makes sense that the professors who focus on writing casebooks, treatises, and practitioner articles would have many advantages in teaching those same materials. In short, many thought that excellence in teaching and scholarship would be positively correlated.

There was a second, smaller group, which postulated a strong negative correlation. Since faculty members have limited time for teaching and scholarship, some guessed that the extra time spent on producing scholarship would lessen teaching effectiveness. In this zero-sum projection, the professors who spent the bulk of their energy on publishing would naturally suffer as teachers.

Table 9: Spearman Rank-Order Correlation Coefficients Between Teaching and Five Scholarship Measures, Aggregate Level, for High and Low Scholarship Subgroups

                                   Index                  z Score
                             Spearman   p Value     Spearman   p Value
A. Bottom 20% Total Cites
  Scholarly productivity       0.0163    0.8564       0.0591    0.5124
  Practice oriented            0.0002    0.9979       0.0401    0.6567
  Raw publications count       0.0180    0.8419       0.0475    0.5986
  Total citations              0.0877    0.3310       0.1058    0.2404
  Citations per year           0.0620    0.4920       0.0959    0.2873
B. Top 20% Total Cites
  Scholarly productivity       0.0383    0.6713       0.0513    0.5701
  Practice oriented            0.1209    0.1792       0.1157    0.1988
  Raw publications count       0.0283    0.7543       0.0140    0.8767
  Total citations             -0.0796    0.3776      -0.0643    0.4762
  Citations per year           0.0316    0.7263       0.0020    0.9820

N = 125.
Note: I also considered the effect of outliers on these findings. The use of the Spearman correlation coefficient and the use of the transformed data for the Pearson correlations, however, make the existence of outliers much less worrisome. This is because the Spearman correlation places each data point in rank order rather than considering the absolute value, and using the transformed data tends to bring the outliers closer to the pack for the Pearson correlations. Nevertheless, I did run some of the correlations without an outlier or two and found little change in the results.
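The outlier point in the note to Table 9 is easy to demonstrate. The toy numbers below are invented for illustration, not taken from the study: one extreme citation count drags the Pearson coefficient on raw values well away from 1, while the Spearman coefficient, which sees only ranks, is unaffected because the outlier does not change the rank ordering. (For brevity this Spearman implementation assumes no tied values.)

```python
# Toy illustration of Spearman's robustness to an outlier; invented numbers.

def mean(xs):
    return sum(xs) / len(xs)

def pearson(xs, ys):
    mx, my = mean(xs), mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs) *
           sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den

def spearman(xs, ys):
    def ranks(v):
        # 1-based ranks; assumes no ties for brevity.
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0.0] * len(v)
        for rank, i in enumerate(order, start=1):
            r[i] = float(rank)
        return r
    return pearson(ranks(xs), ranks(ys))

teaching = [3.2, 3.5, 3.8, 4.0, 4.3, 4.5]   # hypothetical evaluation means
cites = [10, 25, 40, 55, 70, 85]            # citation counts, same ordering
cites_out = cites[:-1] + [5000]             # one extreme outlier; ranks unchanged

print(pearson(teaching, cites))       # close to 1
print(pearson(teaching, cites_out))   # pulled well away from 1 by the outlier
print(spearman(teaching, cites))      # 1.0
print(spearman(teaching, cites_out))  # still 1.0: the ranks did not move
```

This is the same reason the note gives for why rank-order statistics make a stray highly cited article or professor "much less worrisome" in the aggregate results.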

Unless a person had read the literature outside of law schools that shows a general finding of no correlation in other disciplines, few guessed that there would simply be no correlation between the numbers. Statistically speaking, it is unwise to conflate correlation and causation, and sometimes even correlation numbers can be misleading if there is a factor outside of those studied that is driving the numbers. Nevertheless, a finding of no or very slight correlation makes it unlikely that teaching and scholarship are having a large effect on each other one way or another.

IV. Conclusion

As a matter of law school policy, this finding is both helpful and puzzling. In fact, throughout the process of gathering and analyzing the data I have found it to be something of a Rorschach test—people tend to read in their own preferences. For example, some faculty members who think that there is too much emphasis on publishing in legal academia responded: “Good. At last we’ll learn that hiring and promoting solely on the basis of scholarship does nothing to help/teach our students.” Some faculty members who are more pro-scholarship said: “Excellent. At last the myth that scholarship and teaching are at odds has been debunked.”

Without joining either side of this debate, I will add the following to those reactions. First, if it is true that with the advent of rankings and increased faculty competition law schools have begun to focus more on scholarship in hiring and promoting, that practice may not do much to further a school's teaching mission.

Second, while hiring and promoting on the basis of scholarship may not do much to further the school's teaching mission, the study also suggests that scholarly productivity is not at odds with teaching. To the contrary, at least in these data, scholarship seems to have little effect, or a slightly positive effect, on teaching. So, insofar as law schools are pushing faculty to increase scholarly productivity, it does not appear to be having a deleterious effect on teaching.

Lastly, it is worth thinking about why the AALS and ABA accreditation standards basically require every law school in the United States to have a sizable teaching faculty that also engages in scholarship (ABA 2006a; AALS 2006) if scholarship and teaching effectiveness are not strongly related. There are other reasons why the AALS and ABA might require scholarly productivity. It might be considered a service to the bench, bar, or society at large, or it might help the students or the school as a whole in a way uncaptured by teaching evaluations. Nevertheless, if these other justifications underlie those rules, the ABA and AALS should say so.19

In sum, this study provides the most comprehensive evidence to date on a very basic question: the relationship between scholarship and teaching at U.S. law schools. However, many fundamental questions about the nature and mission of law schools and the legal professorate remain unanswered.

19 Note that I am not suggesting that scholarship is not separately valuable, or that law schools should not have faculties full of productive scholars. I am only saying that if that is a (or the) primary goal of U.S. law schools, it should be defended aside from the effect on the teaching of students, since this study shows that there is no demonstrable effect.

References

AALS (2006) AALS By-Law Section 6-4(a). Washington, DC: AALS.

—— (2006) Member Schools. Available at ⟨http://www.aals.org/about_membershcolls.php⟩.

ABA (2006a) ABA Standards for Approval of Law Schools, Standard 404. Chicago, IL: ABA.

ABA, Section of Legal Education and Admission to the Bar (2006b) ABA-Approved Law Schools. Available at ⟨http://www.abanet.org/legaled/approvedlawschools/approved.html⟩.

Bok, Derek (2003) Universities in the Marketplace. Princeton, NJ: Princeton Univ. Press.

Braxton, John M. (1996) “Contrasting Perspectives on the Relationship Between Teaching and Research,” in John M. Braxton, ed., Faculty Teaching and Research: Is There a Conflict, pp. 5–14. San Francisco, CA: Jossey-Bass Publishers.

Cope, David (2005) Fundamentals of Statistical Analysis. New York: Foundation Press.

Cullen, Colleen M., & Randall S. Kalberg (1995) “Chicago-Kent Law Review Faculty Scholarship Survey,” 70 Chicago-Kent Law Rev. 1445.

Eisenberg, Theodore, & Martin T. Wells (1998) “Ranking and Explaining the Scholarly Impact of Law Schools,” 27 J. of Legal Studies 373.

—— (2000) “Inbreeding in Law School Hiring: Assessing the Performance of Faculty Hired from Within,” 29 J. of Legal Studies 369.

Farley, Christene H. (1996) “Confronting Expectations: Women in the Legal Academy,” 8 Yale J. of Law & Feminism 333.

Feldman, Kenneth A. (1987) “Research Productivity and Scholarly Accomplishment of College Teachers as Related to Their Instructional Effectiveness: A Review and Exploration,” 26 Research in Higher Education 227.



Hattie, John A. C., & Herbert W. Marsh (1996) “The Relationship Between Research and Teaching—A Meta-Analysis,” 66 Rev. of Educational Research 507.

—— (2002) “The Relation Between Research Productivity and Teaching Effectiveness—Complementary, Antagonistic, or Independent Constructs?” 73 J. of Higher Education 603.

Korobkin, Russell (1998) “In Praise of Law School Rankings: Solutions to Coordination and Collective Action Problems,” 77 Texas Law Rev. 403.

Leiter, Brian (2000) “Measuring the Academic Distinction of Law Faculties,” 29 J. of Legal Studies 451.

Leonard, James (1990) “Seein’ the Cites: A Guided Tour of Citation Patterns in Recent American Law Review Articles,” 34 Saint Louis Urban Law J. 181.

Lindgren, James, & Allison Nagelberg (1998) “Are Scholars Better Teachers?” 73 Chicago-Kent Law Rev. 823.

Marsh, H. W. (1984) “Students’ Evaluations of Teaching: Dimensionality, Reliability, Validity, Potential Biases and Utility,” 76 J. of Educational Psychology 707.

—— (1987) “Students’ Evaluations of University Teaching: Research Findings, Methodological Issues, and Directions for Further Research,” 11 International J. of Educational Research 253.

—— (2007) “Students’ Evaluations of University Teaching: A Multidimensional Perspective,” in R. P. Perry & J. C. Smart, eds., The Scholarship of Teaching and Learning in Higher Education: An Evidence-Based Perspective, pp. 319–84. New York: Springer.

Marsh, H. W., & L. A. Roche (2000) “Effects of Grading Leniency and Low Workload on Students’ Evaluations of Teaching: Popular Myth, Bias, Validity, or Innocent Bystanders?” 92 J. of Educational Psychology 202.

Merritt, Deborah J. (1998) “Research and Teaching on Law Faculties: An Empirical Exploration,” 73 Chicago-Kent Law Rev. 765.

Newton, Rae R., & Kjell E. Rudestam (1999) Your Statistical Consultant. London, UK: Sage Publications.

O’Reilly, Mark T. (1987) “Relationship of Physical Attractiveness to Students’ Ratings of Teaching Effectiveness,” 51 J. of Dental Education 600.

Scordato, Marin R. (1990) “The Dualist Model of Legal Teaching and Scholarship,” 40 American Univ. Law Rev. 367.

SISA (2007) Sample Size Calculation. Available at ⟨http://home.clara.net/sisa/correl.htm⟩.

Slomanson, William R. (2000) “Legal Scholarship Blueprint,” 50 J. of Legal Education 431.

Smith, Pamela J. (1999) “Teaching the Retrenchment Generation: When Sapphire Meets Socrates at the Intersection of Race, Gender, and Authority,” 6 William & Mary J. of Women & the Law 53.

University of Tennessee (2006) TN101. Available at ⟨http://tn101.utk.edu/default.asp⟩.
