Mapping Our Genes—Genome Projects: How Big? How Fast? April 1988 NTIS order #PB88-212402


Recommended Citation: U.S. Congress, Office of Technology Assessment, Mapping Our Genes-The Genome Projects: How Big, How Fast? OTA-BA-373 (Washington, DC: U.S. Government Printing Office, April 1988).

Library of Congress Catalog Card Number 87-619898

For sale by the Superintendent of Documents, U.S. Government Printing Office, Washington, DC 20402-9325

(order form can be found in the back of this report)


Foreword

For the past 2 years, scientific and technical journals in biology and medicine have extensively covered a debate about whether and how to determine the function and order of human genes on human chromosomes and when to determine the sequence of molecular building blocks that comprise DNA in those chromosomes. In 1987, these issues rose to become part of the public agenda. The debate involves science, technology, and politics. Congress is responsible for “writing the rules” of what various Federal agencies do and for funding their work. This report surveys the points made so far in the debate, focusing on those that most directly influence the policy options facing the U.S. Congress.

The House Committee on Energy and Commerce requested that OTA undertake the project. The House Committee on Science, Space, and Technology, the Senate Committee on Labor and Human Resources, and the Senate Committee on Energy and Natural Resources also asked OTA to address specific points of concern to them. Congressional interest focused on several issues:

● how to assess the rationales for conducting human genome projects,
● how to fund human genome projects (at what level and through which mechanisms),
● how to coordinate the scientific and technical programs of the several Federal agencies and private interests already supporting various genome projects, and
● how to strike a balance regarding the impact of genome projects on international scientific cooperation and international economic competition in biotechnology.

OTA prepared this report with the assistance of several hundred experts throughout the world. Their help included interviews with OTA staff, comments on drafts of the report, and sending information to OTA. We want to thank those reviewers and many others who have contributed to making the report more accurate, balanced, and useful.

This report is one of many OTA reports related to biotechnology and genetics. Recent reports on related topics are Technologies for Detecting Heritable Mutations in Human Beings, New Developments in Biotechnology: 1) Ownership of Human Tissues and Cells, 2) Public Perceptions of Biotechnology, 4) U.S. Investment in Biotechnology, and Human Gene Therapy.

JOHN H. GIBBONS
Director


OTA Advisory Panel on Mapping the Human Genome

LeRoy Walters, Ph.D., Chairman
Director, Center for Bioethics
Kennedy Institute of Ethics, Georgetown University

George F. Cahill, M.D.
Vice President for Scientific Training and Development
Howard Hughes Medical Institute

Susan E. Cozzens, Ph.D.
Assistant Professor
Department of Science and Technology Studies
Rensselaer Polytechnic Institute of Technology

Tamara J. Erickson, Ph.D.
Vice President
Health Industries
Arthur D. Little, Inc.

Joseph G. Gall, Ph.D.
Department of Embryology
Carnegie Institution of Washington

Walter B. Goad, Ph.D.
Theoretical Biology Group
Los Alamos National Laboratory

Leroy Hood, Ph.D., M.D.
Chairman
Division of Biology
California Institute of Technology

Horace Freeland Judson
Henry R. Luce Professor
The Writing Seminars
The Johns Hopkins University

William W. Lowrance, Ph.D.
Director, Life Sciences and Public Policy Program
The Rockefeller University

Norman R. Pace, Ph.D.
Professor of Biology
Department of Biology
Indiana University

Mark L. Pearson, Ph.D.
Director of Molecular Biology
E.I. du Pont de Nemours & Co.

George Rose, Ph.D.
Professor
Department of Biological Chemistry
Hershey Medical Center
Pennsylvania State University

Margery W. Shaw, M.D., J.D.
Professor of Health Law
University of Texas Health Science Center

Dieter Soll, Ph.D.
Professor
Department of Molecular Biophysics and Biochemistry
Yale University

Nancy S. Wexler, Ph.D.
President
Hereditary Disease Foundation and
Associate Professor of Clinical Neuropsychology
Departments of Neurology and Psychiatry
Columbia University

Raymond L. White, Ph.D.
Investigator
Howard Hughes Medical Institute, and
Professor
Department of Human Genetics
University of Utah School of Medicine

NOTE: OTA appreciates and is grateful for the valuable assistance and thoughtful critiques provided by the advisory panel members. The panel does not, however, necessarily approve, disapprove, or endorse this report. OTA assumes full responsibility for the report and the accuracy of its contents.


OTA Human Genome Project Staff

Roger Herdman, Assistant Director, OTA
Health and Life Sciences Division

Gretchen Kolsrud, Biological Applications Program Manager

Robert Mullan Cook-Deegan, Project Director
Patricia Hoben, Analyst
Jacqueline Courteau, Research Assistant
Gladys White, Analyst
David Guston, OTA Summer Intern

OTA Support Staff

Sharon K. Oatman, Administrative Assistant
Linda S. Rayford, Secretary/Word Processing Specialist

Barbara V. Ketchum, Clerical Assistant

Editor

Blair Burns Potter

Graphics

MedSciArtCo, Washington, DC

Contractors

(for list of topics and availability of reports, see app. A)

Computer Horizons, Inc.
Theodore Friedmann, University of California, San Diego
Jonathan Glover, New College, Oxford University
Allen Hammond, Consultant, Washington, DC
John Heilbron, University of California, Berkeley, and Daniel Kevles, California Institute of Technology
Horace Freeland Judson, The Johns Hopkins University
Marc Lappe, University of Illinois at Chicago
Steve Mount, Yale University
Teresa Myers, Consultant, Washington, DC
Richard Myers, University of California, San Francisco
Peter Newmark, London
Susan Rosenfeld, Science and the Law Committee, Association of the Bar of the City of New York
David Weatherall, Oxford University
Akihiro Yoshikawa, Berkeley Roundtable on the International Economy, University of California, Berkeley

OTA Staff Reviewers

L. Val Giddings
Kathi Hanna
Lisa Heinz
Kevin O’Connor
Gary Ellis
Mark Schaefer


Contents

Chapter 1: Summary
Chapter 2: Technologies for Mapping DNA
Chapter 3: Applications to Research in Biology and Medicine
Chapter 4: Social and Ethical Considerations
Chapter 5: Agencies and Organizations in the United States
Chapter 6: Organization of Projects
Chapter 7: International Efforts
Chapter 8: Technology Transfer
Appendix A: Topics of OTA Contract Reports
Appendix B: Participants in OTA Workshops
Appendix C: Estimated Costs of Human Genome Projects
Appendix D: Databases, Repositories, and Informatics
Appendix E: Bibliometric Analysis of Human Genome Research
Appendix F: Acknowledgments
Appendix G: Glossary
Index


Chapter 1

Summary


CONTENTS

Debates About Mapping the Human Genome
The Focus of Genome Projects
Misplaced Controversy About “The Human Genome Project”
The Core Issue: Resource Allocation for Research Infrastructure
Organization of This Report
The Role of Congress
Options for Action by Congress

Appropriations to Federal Agencies
Access to Information and Materials
Organization of Genome Projects
Technology Transfer
Questions for Congressional Oversight

Figure
1-1. Comparative Scale of Mapping

Table
1-1. Principal Organizations Involved in Genome Projects


“We want the maximum good per person; but what is good? To one person it is wilderness, to another it is ski lodges for thousands. To one it is estuaries to nourish ducks for hunters to shoot at; to another it is factory land. Comparing one good with another is, we usually say, impossible because goods are incommensurable. Incommensurables cannot be compared.

Theoretically this may be true; but in real life, incommensurables are commensurable. All that is needed is a criterion of judgment and a system of weighing.”

Garrett Hardin, “The Tragedy of the Commons,” Science 162:1243-1248, 1968.

“Congress is the place where we make impossible choices between apples and oranges. We do it every year in preparing the largest budget on the planet.”

Congressional staff member, 1988.

“All legislative powers granted shall be vested in a Congress of the United States . . . . No money shall be drawn from the Treasury, but in consequence of appropriations made by law. . . .”

Article 1, U.S. Constitution.

The mysteries of inheritance are surrendering to modern biology. Over a century ago, Austrian monk Gregor Mendel demonstrated that the inheritance of traits could be most simply explained if it were controlled by factors passed from one generation to the next. These units of inheritance came to be called genes. The complete set of genes from an organism is called its genome. Some traits are best explained by inheritance of single genes (e.g., many genetic diseases, colorblindness), but most, including many nongenetic diseases, involve combinations of multiple genes with environmental factors.

Scientists discovered in the 1940s that genes consisted of DNA (deoxyribonucleic acid), and in the 1950s they further elucidated the mechanisms of inheritance. In 1953, Watson and Crick described the structure of DNA, the double helix, which provides at once an explanation of how genetic material is inherited and how genes direct cellular function. DNA encodes the blueprint for every living thing; it is packed into chromosomes which can be seen under a light microscope. The genome of an organism can thus be defined as the DNA comprising its chromosomes. Each human cell has 46 chromosomes in 23 pairs. One chromosome of each pair is inherited from each parent. DNA consists of long chains of chemicals called nucleotide bases. There are four such bases, represented most simply as A, C, T, and G. The order of bases making up DNA is called its sequence. The DNA sequence contains the instructions that specify the production of molecules, usually proteins, that provide cellular structure and perform biochemical functions in the cell.

Our understanding of genetics has advanced remarkably in the last three decades as new methods of manipulating and analyzing DNA have been developed. Recombinant DNA technology enables scientists to insert DNA from one organism directly into that of another, thereby allowing them to study how genes function in relatively controlled conditions. New methods to detect and purify small amounts of DNA, new techniques to handle and analyze DNA that is millions of bases long, and novel scientific instruments have augmented the tools scientists use to understand heredity. These powerful and rapidly evolving technologies have provoked debate in recent years about whether and how to mount a concerted research program to map the human genome and to determine its DNA sequence.

To date, the combined efforts of government agencies, university researchers, and private supporters of biomedical research have produced rough but extremely useful maps of DNA markers covering most regions of the human chromosomes. Chromosomal locations of over 1,215 human genes are now known (of the 50,000 to 150,000 estimated to exist), including those causing all 20 of the most common genetic diseases. Sequencing of DNA from human beings has increased sharply in recent years, yet far fewer than 1 percent of the more than 3 billion bases comprising the human genome have been sequenced (see figure 1-1). The function of only a few hundred human genes is known. Some genetic disorders are understood at the molecular level (e.g., sickle cell disease and Tay-Sachs disease), but the mechanisms underlying most genetic diseases remain unknown. Genetic factors underlying other diseases are known only in barest outline.
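The proportions quoted in the preceding paragraph can be worked out directly. The following is a minimal sketch in Python, assuming the report's round figures as inputs; the text gives them as bounds ("over 1,215", "more than 3 billion", "far fewer than 1 percent"), so the results are rough limits, not measurements. All names in the code are illustrative and do not come from the report.

```python
# Rough arithmetic on the figures quoted in the text (1988 estimates).
# The constants are the report's round numbers; the names are illustrative.
GENES_MAPPED = 1_215             # "over 1,215" genes with known chromosomal locations
GENE_ESTIMATE_LOW = 50_000       # lower estimate of the number of human genes
GENE_ESTIMATE_HIGH = 150_000     # upper estimate of the number of human genes
GENOME_BASES = 3_000_000_000     # "more than 3 billion" bases
SEQUENCED_PERCENT_CEILING = 1    # "far fewer than 1 percent" sequenced


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


def mapped_gene_share():
    """Return (low, high) percent of genes mapped across the two gene-count estimates."""
    low = percent(GENES_MAPPED, GENE_ESTIMATE_HIGH)
    high = percent(GENES_MAPPED, GENE_ESTIMATE_LOW)
    return low, high


def sequenced_bases_ceiling():
    """Upper bound on bases sequenced, using integer arithmetic to stay exact."""
    return GENOME_BASES * SEQUENCED_PERCENT_CEILING // 100


if __name__ == "__main__":
    low, high = mapped_gene_share()
    print(f"Genes mapped: roughly {low:.2f}% to {high:.2f}% of the estimated total")
    print(f"Bases sequenced: fewer than {sequenced_bases_ceiling():,}")
```

On these inputs, the mapped genes amount to between about 0.8 and 2.4 percent of the estimated total, and the sequenced portion to fewer than 30 million bases.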

The growing power and speed of research in molecular biology have led to proposals to apply novel molecular biological methods to the genetics of entire organisms. Research and technology efforts aimed at mapping and sequencing large portions or entire genomes are called genome projects. These proposals would build on experience already gained from mapping lower organisms (e.g., yeast, nematodes, and bacteria) and sequencing some virus genomes and regions of other organisms, yet they would be more ambitious in scale and complexity. More specifically, a public debate began in 1985 about the feasibility of mapping, and perhaps sequencing, the human genome and that of certain other organisms. The debate has often been cast as an on-off decision about whether there should be a concerted Federal effort, yet this is an oversimplification. There are many component projects at different stages of completion: Systematically making maps of human chromosomes is a continuation of ongoing efforts, for example. Databases for genetic information and repositories for research materials are essential whether or not there are other special efforts. Developing new technologies is widely agreed to be important and will require focused research programs. The most contentious issue is whether the DNA sequence of all human chromosomes should be determined. There is little doubt that large regions of human chromosomes will be sequenced eventually, but there is vigorous debate about whether a massive, concerted sequencing effort is warranted. This remains an open question that is likely to be resolved only after pilot projects to determine the sequence of other organisms, small human chromosomes, or chromosomal regions of special interest have been performed. Pilot projects can demonstrate the technologies and should also determine whether dedicated sequencing efforts are efficient and scientifically sensible.

Two scientific advisory groups (one reporting to the Department of Energy (DOE) and the other convened by the National Research Council (NRC) of the National Academy of Sciences) recommended augmented funding of $200 million per year for genome projects. An Office of Technology Assessment (OTA) workshop attempted to estimate the costs of major component projects. Projections fell into the range of $45 to $50 million per year initially, increasing to $200 to $250 million per year over 5 years. Funding recommendations made by the scientific advisory committees would cover most but not all costs estimated by OTA.

DEBATES ABOUT MAPPING THE HUMAN GENOME

The debate about mapping the human genome can be traced through several phases. Until the 1960s, techniques for locating human genes were rudimentary, and human genetics was based primarily on analysis of inheritance patterns of diseases and other observable traits through family trees. In the late 1960s and through the 1970s, scientists developed the first maps of human genes, based on direct observation of chromosomes. In successful cases, the location of a gene could be specified within several million bases of DNA.

In the late 1970s and early 1980s, scientists took the first steps toward maps of human chromosomes based on direct biochemical analysis of DNA. DNA fragments of unknown function but known location were used to study inheritance of traits far more precisely than before. Calculations suggested that DNA markers, which signify the presence or absence of particular stretches of DNA, could be identified for regions of all the human chromosomes.

Figure 1-1. Comparative Scale of Mapping

[Figure labels: country, county, chromosome, nucleotide base pairs]

The number of base pairs of DNA in human cells is roughly comparable to the number of people on Earth. The scale of genetic mapping efforts can be compared to population maps, with chromosomes (50 to 250 million base pairs) analogous to nations, and genes (thousands to millions of base pairs) to towns.

Markers can be used to trace which pieces of DNA, and therefore which parts of chromosomes, are inherited from which parent. When a genetic trait is caused by a single gene and that gene is close to a marker, the marker can be used to ascertain roughly where the gene is located because the two are inherited together.

The U.S. Government and research agencies abroad fund most research that uses DNA markers to study diseases and physiological functions and most university groups searching for new markers in chromosomal regions of particular interest. In the United States, the National Institutes of Health (NIH) are the largest funding sources for biomedical research on genetics.

Construction of maps of DNA markers was undertaken in the early 1980s. The two largest collections of markers were developed by the Howard Hughes Medical Institute (HHMI), a private philanthropy, and Collaborative Research, Inc., a private corporation. Dozens of university researchers and other private firms also contributed to this kind of genetic map.

In 1985, DOE began planning the Human Genome Initiative to develop research tools for molecular genetics. Events leading up to the initiative included a workshop convened by the University of California at Santa Cruz and internal planning by DOE administrators. DOE considered the initiative an extension of its ongoing work in molecular biology (largely focused on detecting mutations and other biological effects of radiation and energy production) that would take advantage of research staff and instruments located at the national laboratories, which are funded by DOE. DOE held several public meetings to discuss the technical possibilities. The first of these was a workshop held in March 1986 in Santa Fe, New Mexico.

Discussion at that workshop of whether to establish a reference sequence for the entire human genome touched off a controversy that has persisted ever since. Arguments about the usefulness of extensive sequence information reached a high pitch at a conference at Cold Spring Harbor Laboratories in June 1986. Many scientists perceived a major sequencing effort as a threat to the conduct of basic research in molecular biology because of its projected cost and potential drain on research talent. Estimates of the cost of sequencing alone (without accounting for mapping or preparation of DNA to be sequenced) ran to billions of dollars. Calls for central management of such a prodigious undertaking further heightened tension because of the strong tradition of decentralized, small-group research in molecular biology. Debate over the appropriate strategy for deciding which regions to sequence first added to the din and spilled over into the scientific press. Major newspapers and magazines have covered the debate since, giving the Human Genome Initiative a high public profile.

The Cold Spring Harbor discussion was followed by a series of meetings held by HHMI, NIH, DOE, NRC, OTA, and others. Plans for special research initiatives by NIH, DOE, and HHMI have resulted from these and other discussions. A few private corporations have also been established (or are being established) to perform DNA sequencing and to develop research resources.

This report deals with various projects that have been proposed by Federal agencies to construct maps of human and other chromosomes, to improve relevant databases and repositories, and to improve research methods and instruments. There is no single human genome project, but instead many projects. For 1988, there are specific line items in appropriations for DOE and NIH, and the bulk of the discussion in this report refers to these new research programs. For purposes of this report, genome projects refers to the research programs of NIH, DOE, and HHMI, as well as parallel programs in the private sector or other nations.


THE FOCUS OF GENOME PROJECTS

Genome projects have several objectives:

● to establish, maintain, and enhance databases containing information about DNA sequences, location of DNA markers and genes, function of identified genes, and other related information;
● to create maps of human chromosomes consisting of DNA markers that would permit scientists to locate genes quickly;
● to create repositories of research materials, including ordered sets of DNA fragments that fully represent DNA in the human chromosomes;
● to develop new instruments for analyzing DNA;
● to develop new ways to analyze DNA, including biochemical and physical techniques and computational methods;
● to develop similar resources for other organisms that would facilitate biomedical research; and possibly
● to determine the DNA sequence of a large fraction of the human genome and that of other organisms.

Genome projects underway or planned by DOE, NIH, the National Science Foundation (NSF), HHMI, and other organizations are different but overlapping. They share two features: They would put new methods and instruments into the tool kit of molecular biology, and they would build a research infrastructure for geneticists (see table 1-1).

DOE’s Human Genome Initiative began in late 1986 and consists of several projects. One is to create an ordered set of DNA segments from known chromosomal locations; this set, if widely available, could save the tedious steps involved in isolating DNA for study once a gene’s approximate location is known. It should also reduce needless duplication of effort by different groups studying genes in the same chromosomal region. A second project is to develop new computational methods to enhance analysis of genetic map and DNA sequence data. Another project is to develop new techniques and instruments for detecting and analyzing DNA, including automation and robotics. For these projects, DOE expended $4.2 million in 1987 and plans $12 million for 1988. It also planned to support an additional $7 million in 1987 for related research and infrastructure. DOE has requested $18.5 million for direct support of its Human Genome Initiative in fiscal year 1989.

Table 1-1. Principal Organizations Involved in Genome Projects

Organization                          Mission                        Funding ($000,000s) (a)

National Institutes of Health         Biomedical research            Life sciences: 6,170
(Department of Health                                                Related research: 313
and Human Services)                                                  Genome projects: 17.2
                                                                     NLM biotechnology databases: 3.83

Department of Energy                  Biological effects of          Life sciences: 230
(Office of Health and                 energy production and          Related research: 7
Environmental Research,               radiation; use of national     Genome projects: 12
Office of Energy Research)            laboratory resources

National Science Foundation           Basic scientific research      Life sciences: 206
(Directorate of Biological,                                          Related research: 32.7
Behavioral, and Social Sciences)                                     Genome projects: 0.2

Howard Hughes Medical Institute       Biomedical research            Life sciences: 240
                                                                     Genetics: 40
                                                                     Genetic marker maps: 2 to 4
                                                                     Databases: 2

(a) “Life sciences” figures are estimates for fiscal year 1987; these are total budgets for NIH and HHMI and estimates of relevant programs for NSF and DOE. Figures for “related research” include basic research projects that involve mapping or sequencing, and research infrastructure such as databases and repositories. “Related research” figures are estimates for fiscal year 1987. “Genome projects” figures are estimates for fiscal year 1988, based on appropriations under the December 1987 continuing resolution.

SOURCES: NIH: Rachel Levinson, personal communications, October, November, December 1987, January 1988; DOE: David Smith, personal communications, June, October 1987, January 1988; NSF: David Kingsbury, personal communications, June and November 1987; HHMI: George Cahill, personal communication, January 1988.
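To put the table 1-1 figures in proportion, the short Python sketch below computes each Federal agency's genome-project line as a share of its life sciences budget, along with the combined Federal genome-project total. It is only an illustration under stated assumptions: the numbers are copied from table 1-1 (in millions of dollars), the two columns come from different budget estimates, the NLM database item and HHMI (a private philanthropy with no separate "genome projects" line) are left out, and all names in the code are hypothetical.

```python
# Figures from table 1-1, in millions of dollars. The life sciences and
# genome-project columns are separate budget estimates, so the shares
# below are rough proportions only.
FUNDING = {
    "NIH": {"life_sciences": 6170.0, "genome_projects": 17.2},
    "DOE": {"life_sciences": 230.0, "genome_projects": 12.0},
    "NSF": {"life_sciences": 206.0, "genome_projects": 0.2},
}


def genome_share_percent(org):
    """Genome-project funding as a percent of the agency's life sciences budget."""
    row = FUNDING[org]
    return 100.0 * row["genome_projects"] / row["life_sciences"]


def federal_genome_total():
    """Sum of the genome-project lines for the three Federal agencies."""
    return sum(row["genome_projects"] for row in FUNDING.values())


if __name__ == "__main__":
    for org in FUNDING:
        print(f"{org}: {genome_share_percent(org):.2f}% of life sciences budget")
    print(f"Federal genome projects, combined: ${federal_genome_total():.1f} million")
```

On these figures, the genome-project lines are a small fraction of each agency's life sciences spending: about 5 percent for DOE and well under 1 percent for NIH and NSF.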

NIH has supported special genome projects since 1987, with two objectives: to improve methods for analyzing the genome of human beings and other complex organisms and to enhance computational methods. NIH also supports most of the relevant databases and repositories. It spent an estimated $313 million on projects that involved mapping and sequencing in 1987, and several million more on infrastructure. NIH plans somewhat higher spending for related research in 1988 and will have two items in its budget: an additional $17.2 million for genome projects and $3.83 million for increased database support at the National Library of Medicine. The fiscal year 1989 budget request for the National Institute of General Medical Sciences of NIH includes $28 million for genome projects.

HHMI has two genome initiatives: one to support key databases containing information about the genetics of human and other organisms, and the other to support biomedical research on basic genetic mechanisms and genetic disease. HHMI’s budget estimates from 1987 included $40 million for genetics (including $2 to $4 million for genetic mapping) and $2 million to support genome databases.

The NSF plans to increase the number of biology centers it supports, in order to develop new scientific instrumentation and encourage sharing of expensive equipment. These and other NSF programs are not genome projects per se, but they are likely to be integrated with programs of other agencies in some locations. Instrumentation developed through the biology centers will probably be directly relevant to genome projects. NSF budget estimates for 1987 were $206 million for life sciences, of which $32.7 million went to research related to genome projects and $200,000 went directly to genome projects.

Mechanisms for interagency coordination of genome projects have evolved over the past 2 years. Initially, there was informal communication among DOE, NIH, NSF, and HHMI. The Federal agencies then formed a working group under the Domestic Policy Council (DPC), a cabinet-level group in the White House. A committee to replace the DPC group is now being organized by the White House Office of Science and Technology Policy (OSTP), but its exact composition and function have not yet been determined.

International efforts are concentrated in developed nations with strong research traditions. Mapping genes, both human and nonhuman, has been an international effort since its inception. International agreements for databases (particularly those containing DNA sequence data) and collaborations on gene mapping (notably, the Center for the Study of Human Polymorphism in Paris) have been in operation for several years. No foreign government has made a commitment yet to mapping and sequencing the human genome, although several governments support related projects through their usual mechanisms of research funding. The United Kingdom has supported one of the pioneering efforts to map the genome of a nonhuman organism and additional work to develop new mapping and sequencing technologies. Italy has the most specific commitment to the human genome: It funded several pilot projects (up to $1 million per year for 2 years) to map and perhaps sequence at least one small human chromosome, with the intent of increasing that budget five- to ten-fold if the projects are promising. France, the Federal Republic of Germany, and other Western European nations have substantial commitments to genetics research and are also discussing international cooperation. Canada’s medical research planning board is considering special efforts for genome projects. The European Molecular Biology Laboratory and European Molecular Biology Organization have expressed interest in an international collaboration to map and sequence the genomes of human and nonhuman organisms.

Eastern European and Asian nations have expressed interest in using the resulting data, but they have relatively limited programs for genetics research. Australia is one possible exception; it has consistently increased its share of publications related to genetics over the last decade, and it would logically be included in any international planning. Japan is another exception. Its Science and Technology Agency has expended $3.8 million to support automation of DNA sequencing technologies, the Ministry of Education supports a grants program in genetics, and the Ministry of International Trade and Industry has devoted several million dollars to study the feasibility of an expanded international effort called the Human Frontiers Science Program, which could include genome research projects.

MISPLACED CONTROVERSY ABOUT “THE HUMAN GENOME PROJECT”

Over the past several years, the debate about genome projects has been vigorous, sometimes acrimonious. Many articles have appeared in the scientific press and the general press about “the genome controversy.” The most conspicuous disagreements, however, have concentrated on issues that are not central to the conduct of genome projects. Disputes among the executive agencies have been played up, belying the generally close cooperation among DOE, NIH, HHMI, private firms, and other groups in conducting their respective projects. International cooperation among gene mappers and database managers has been successful but has attracted little attention. Private corporations are already involved in many of the projects that are furthest along. One firm has developed an extensive map of human genetic markers, and others have developed instrumentation useful in research relevant to the mapping and sequencing of DNA. These companies have offered few complaints about barriers to technology transfer. Dissent has focused on the importance of and strategy for sequencing DNA of the entire genome, yet no agency has made a commitment to massive sequencing. The current commitment is to develop technologies that would make it faster and less costly and to improve databases to collect and disseminate the resulting information. DOE has expressed interest in a concerted sequencing program, but only when technological development reduces its cost to tens of millions of dollars, in several years at the earliest.

Some of the debate can be attributed to the title that has often been applied to genome projects—the Human Genome Project. The term is a useful way to link research initiatives and to distinguish them from ongoing programs for budget planning. It highlights the ultimate objective (understanding human biology by developing a new set of research resources) and captures political support and broad public interest. It has had the effect, however, of generating rancorous debate which has inhibited the development of consensus on how to improve the research infrastructure. The importance of maps, databases, and repositories has been obscured by the controversy over massive DNA sequencing.

The title has had several other untoward effects. The Human Genome Project centers attention exclusively on human genetics, but understanding human genes will necessarily involve the study of other organisms. Many of the resources—particularly maps of human chromosomes—will be focused on human beings; but to interpret human genetic information, similar resources must be developed for other organisms. New instruments and methods will be applicable to all DNA.

The Human Genome Project invites confusion by implying that the human genome will be understood when the project is over. The immediate goal of genome projects is not complete understanding, but creating tools to bring about such understanding in the 21st century. Understanding encompasses all biomedical research; it does not distinguish genome projects from others. The most ambitious possible goal of genome projects would be to complete the most detailed map: a reference sequence of the entire human genome. Even if this were agreed to and developed, it would not yield immediate understanding of how that DNA sequence is translated to make a human being. It would not explain how nerve cells become connected in the immensely complex anatomy of the brain. It would not even provide complete answers to how individuals differ or how they have evolved. Sequence data, like other genetic information, is meaningful only when compared among individuals and correlated with biological function.


There is no single, monolithic Human Genome Project. In fact, there are several distinct components at various stages of development. Some instruments and many databases already exist; some genetic maps are more than half complete; repositories for DNA used in research have only been organized in the past few years; and other projects are planned but not yet begun. Whether there will ever be large and expensive research facilities for component specific genome projects is an open question that can be answered only as the technologies evolve.

The Human Genome Project conjures up images of large-scale projects such as the Manhattan Project to build the first atomic bomb, the Apollo Project for a manned Moon landing, the space station, or the superconducting supercollider. Genome projects do not belong in this category. Component genome projects will not require budgets as large as such megaprojects, nor are the technical ends as focused. Genome projects must be distinguished from the sequencing of the entire human genome, which is but a component still in the planning phase. There will be no single event such as the Moon landing or the space shuttle launching, nor is there likely to be construction of a new multi-billion-dollar facility such as the superconducting supercollider. Genome projects do not now require such facilities. Some projects may require facilities to perform services for mapping or sequencing in the future, yet such facilities would not be larger than the molecular biology centers already established at a few major research universities. Mapping or sequencing facilities would differ only by being devoted to production work rather than pure science. The results of genome projects are not contingent on completion of large capital-intensive dedicated units, and the data and instruments will be integrated into biology and medicine as the projects progress. Genome projects are, in this respect, analogous to navigational charts or road maps, which are useful even as they are being updated. Some persons believe a shortage of trained scientific and technical personnel in the United States could prove troublesome for molecular biology, but the genome projects proposed thus far are not so large in scale, even in comparison to other areas of biology, as to cause shortages in other areas. Genome projects are relatively modest compared to other large science projects now under consideration by the Federal Government.

THE CORE ISSUE: RESOURCE ALLOCATION FOR RESEARCH INFRASTRUCTURE

Most issues that need to be addressed regarding genome projects are variations on the problem of the commons: how to create and maintain resources of use to all. It can be difficult to develop goods useful to all if each individual has no direct incentive to pay for them and only a few are adversely affected.

The core issue concerning genome projects is resource allocation. What priority should be given to funding databases, materials repositories, genetic map projects, and development of new technologies? Should genome projects have precedence over other projects important to biological and biomedical research? These projects will benefit the entire biomedical research community, and ultimately the Nation and the world, but their funding must be drawn from the same agencies that support basic research. Funding for genome projects will thus be taken from agencies that support research on neuroscience, cancer, immunology, and many other promising and rapidly moving fields.

The flow of information from molecular biology is overwhelming the resources devoted to handling it. Federal agencies, HHMI, and other interested groups are acting to manage the deluge. Research dedicated to improving databases, maps, repositories, and research methods promises to increase efficiency overall by doing once systematically what would otherwise be duplicated by many groups using more primitive technologies. Whether massive, concerted DNA sequencing is similarly efficient can only be demonstrated by trying it on a smaller scale.


ORGANIZATION OF THIS REPORT

The following sections describe options for congressional action. Subsequent chapters address the issues raised here in greater detail. Chapter 2 provides technical background and explains how genome projects might be conducted. Chapter 3 reviews how results might be used in biology and medicine. Chapter 4 outlines some long-term social and ethical issues surrounding human genome projects. Chapter 5 surveys agencies and organizations in the United States actively supporting human genome projects. Chapter 6 discusses how genome projects might be organized among these agencies and organizations. Chapter 7 briefly surveys activities in foreign countries, and chapter 8 presents issues involved in technology transfer. Appendixes contain background on material used to produce this report, databases, costs of projects, and mapping and sequencing publications.

THE ROLE OF CONGRESS

Genome projects have come to the attention of Congress for three reasons. First, they have become highly visible because of the extensive debate surrounding them. Second, they involve agencies in different executive departments; therefore, mechanisms for coordinating them are less clear than if they were all in a single department. Third, results of genome projects will lead to new scientific and medical instruments for analysis of DNA, development of new genetic tests for use in clinical diagnosis, and other products and services. Techniques developed to analyze DNA will expedite biological research and will provide data and technologies crucial to the development of many new products. In this sense, genome projects promise economic returns, although their form and magnitude are not predictable. Genome projects have thus been linked to international competitiveness in biotechnology and its economic implications for American commerce in coming decades.

Congress has three roles regarding genome projects:

1. annual appropriations to Federal agencies funding the projects;

2. authorization of actions by executive agencies to set up formal coordinating structures or of specific mandates of agencies; and

3. oversight of agencies’ conduct of their projects.

OPTIONS FOR ACTION BY CONGRESS

Options for congressional action discussed here build on the discussions above and those in chapters 4 through 6. Background material and details can be found in those chapters.

Appropriations to Federal Agencies

The pace of federally funded genome projects will be determined principally by the annual appropriations set by Congress and by the executive agencies’ commitment to the projects. Although agencies retain some authority to “reprogram” funds for activities that fall within their mandates, large efforts cannot be sustained without specific appropriations. Appropriations will set an upper limit on the size and number of projects that are federally supported; commitment by executive agencies, and their grantees and contractors, will determine the speed and scope of projects within those limits.

The critical judgment in appropriations is the importance of the work to be supported relative to other research and activities supported by the Federal Government. The two national scientific groups that have written reports on genome projects, a DOE advisory subcommittee and an NRC committee, have both recommended substantial additional funding for genome projects, eventually equaling $200 million per year. OTA independently projected costs of genome projects at a workshop and through subsequent interviews and letters. Appendix B summarizes cost estimates, including the history of those made by other groups, and reviews the process OTA used to make its estimates. The cost of funding all component projects was estimated as increasing from $47 million the first year to $228 million the fifth year. This would permit strengthening of databases and repositories, construction of several varieties of chromosomal maps, development of many new technologies, and initiation of pilot projects for DNA sequencing.

Access to Information and Materials

The information produced by genetics research has swamped existing management systems. Materials to facilitate molecular genetic research have also proliferated, straining the resources devoted to making them widely available. These management problems will intensify as new technologies further accelerate research. Several of the genome projects are intended to systematically archive information, collect and store research materials, and make information and materials widely available to the research community. Improving database and repository services is imperative whether or not other genome projects proceed. If genetic mapping and sequencing initiatives are pursued, then databases and repositories will be needed even more. Bills have been introduced to improve coordination of and access to molecular biology databases through the National Library of Medicine. Each major repository and database has its own advisory panel of outside scientists. NIH has appointed an internal committee to report on NIH-supported repositories. Two international meetings were held in 1987 to discuss management of databases that contain DNA sequence data. NIH and DOE cosponsored a meeting on databases and repositories in August 1987, and appropriations to DOE and NIH have been increased to support databases and repositories. Congress has the options of maintaining current funding levels or increasing funds for database and repository services through the current system of agency planning and congressional oversight. Seeking recommendations from an advisory committee on how to integrate the development of databases and repositories with genome projects is an additional option.

Organization of Genome Projects

Congress could pass legislation to organize human genome projects—in fact, bills on organization have dominated discussion in Congress. There are four principal choices: 1) to designate a single agency to coordinate the projects, 2) to establish an interagency task force, 3) to establish a national consortium, or 4) to rely on congressional oversight of interagency agreement and consultation.

Establishing an interagency task force through legislation or encouraging agencies to do so by oversight are the least problematic choices. Designating a lead agency would be politically troublesome and would risk interruption of ongoing research programs at one or more agencies. Devising a single national consortium to manage the many diverse genome projects is likely to prove impractical. See chapter 6 for a more detailed discussion of these options.

Designate a Lead Agency

Congress could choose to designate a lead agency to coordinate and provide principal funding for genome projects. The chief advantage of a lead agency is accountability through clear authority. The purpose of focusing authority would be to reduce duplication of effort, to enhance coordination, and to give Congress a single agency on which to concentrate oversight. The chief disadvantage is that the difficult political process of selecting a lead agency would delay progress and diminish overall funding. If line item funding for genome projects at the nonlead agency (NIH or DOE) were eliminated, then agreement would have to be reached to add funds for the lead agency. This is a difficult process because it involves a completely different set of congressional committees and subcommittees for each agency. The choice of a lead agency would likely precipitate a protracted battle among agencies and congressional committees, which could only serve to delay projects. Furthermore, activities of NIH, DOE, NSF, HHMI, and other organizations are complementary rather than competitive and duplicative. Appointing a lead agency could complicate planning for the other agencies. As an alternative, each agency could take the lead in projects best suited to its mandate and expertise. This would result in a task force or consultative arrangement, discussed below, rather than a single lead agency. Designating a lead agency would attempt to centralize authority, but it is not clear that this would improve efficiency, communication, or coordination.

Designation of a lead agency for genome projects could, paradoxically, diminish rather than enhance accountability to Congress. This follows from the organizational structure of congressional committees. Genome projects supported by NIH, DOE, and NSF are authorized by several committees and subcommittees in both the House of Representatives and the Senate. Currently, each committee or subcommittee has independent authority to oversee programs in agencies under its jurisdiction, and interest in human genome projects has been high. Designating a lead agency would limit most oversight responsibility to a single committee. Further, a lead agency could not fully centralize authority, because HHMI is a nongovernment organization. Picking a lead agency would be politically difficult and is unlikely to occur unless there is strong evidence of the advantages of centralized authority for Federal efforts. The evidence to date is quite to the contrary: Agencies are communicating, sharing personnel, using compatible peer review procedures, and jointly funding projects in overlapping areas.

Designating a lead agency might eliminate pluralism in Federal funding of genome projects. An investigator wishing to pursue a genome project can now apply to NIH and DOE, or NIH and NSF for funding (depending on the nature of the project). If there were a single lead agency controlling genome projects, the choices would be limited, diminishing the pluralistic funding that has been a mainstay of American biology. If the lead agency had only an administrative role and did not provide the greatest amount of funds, then there would be little point in calling it a lead agency.

Congress sets independent budgets for NSF, NIH, and DOE through different subcommittees in the House and Senate appropriations committees. With several subcommittees involved, projects have alternative sources of support in Congress. Designating a lead agency would reduce this flexibility. The danger of pluralism is that different agencies will duplicate each other’s work, will fail to cooperate, will fail to identify gaps in research, or will receive uncoordinated or inappropriate appropriations due to the absence of a clear authority structure. To date, such funding disarray has failed to materialize. There are checks and balances in the congressional budget process, through the Office of Management and Budget (OMB), and through the interagency consultation group in OSTP.

Arguments for a centralized and highly organized effort would be stronger if genome projects addressed a national health emergency, such as AIDS or polio, or if they were aimed at a single technical or scientific objective. But genome projects are many and diverse. Focused responsibility may nonetheless become necessary for some of them. Mapping, for example, might be more efficiently done at production centers as methods mature, and DNA sequencing might require dedicated facilities if the technology demands high capital investment or central management. If dedicated service centers are established, administration by a single agency or formal interagency agreement would be necessary to ensure standardization and efficiency. Such services would only be components of overall genome projects, however; integration of the various projects would still be needed.

If genome projects were neglected or inconspicuous elements in agencies’ programs, then the advantages of central oversight through a single agency would carry more weight. This has not been the case. Genome projects have been given high priority, first by DOE and more recently by NIH, and there has been extensive media attention to agencies’ management of them. There is thus little danger in the foreseeable future that genome projects will receive insufficient attention or that mismanagement will escape congressional scrutiny.

The agency most affected by genome projects will be the NIH. If Congress finds that the advantages of a lead agency outweigh the disadvantages, then NIH is the natural choice for lead agency. This is because biomedical research is NIH’s central mandate, whereas NSF’s and DOE’s research programs include physical as well as life sciences. NIH funds over 10 times more genetics research than any other government or nongovernment organization, and researchers funded by NIH are the most numerous of the intended beneficiaries of genome projects. Researchers supported by DOE, NSF, and other organizations have important contributions to make, however, and some projects fall outside the mainstream of research supported by NIH. Genome projects that involve expertise in physical science, engineering, and other fields outside biomedical research would benefit from participation in or leadership by NSF or DOE. DOE in particular has vigorously promoted a Federal program to develop new technologies and to create sets of ordered DNA fragments. Some DOE-supported projects are logical extensions of work at the national laboratories, and DOE is the natural agency to conduct these. If NIH were designated the lead agency, it would be important to recognize and plan for the ongoing efforts of DOE.

Establish an Interagency Task Force

The chief advantage of an interagency task force is that it builds on existing research programs and planning efforts in different agencies and does not require a single lead agency. A task force could monitor all genome projects, government and nongovernment, obtain scientific advice, foster communication, and make recommendations to Congress and the appropriate agencies. Discussion at an OTA workshop in August 1987 stressed that agencies should have outside scientific advice and that advice given to one agency should take into account activities supported by other agencies. No advisory body exists to carry out this task. The chief disadvantage of a task force is that no one agency is accountable for the conduct of genome projects.

Creating a task force entails decisions about who should be represented, how appointments are to be made, and where the task force would be located administratively. Legislation could specify that it represent government, academic, industrial, and other relevant expertise and could stipulate the terms of membership and the appointment process. The task force could be made part of a government agency (making it in effect the lead agency), administratively autonomous, or attached to an existing quasi-governmental institution such as the National Academy of Sciences. Several bills to establish such coordination and advisory groups have been introduced in the 100th Congress and are likely to be acted upon in 1988.

Create a National Consortium

A consortium would involve one or more agencies in concert with private partners to support genome projects. The chief advantages of a consortium are administrative flexibility, possible funding by private firms to reduce government funding, and direct involvement of industrial partners—which would presumably hasten technology transfer. Some potential disadvantages are unclear lines of authority (caused by competing needs of government and private partners) and statements by the private sector that genome projects should be funded exclusively by the Federal Government (e.g., a poll taken by the Industrial Biotechnology Association). Accountability would be complicated in two respects. First, there are many genome projects, and it is difficult to imagine a single consortium that could oversee them all. Second, the possible commingling of government and nongovernment funds could prove troublesome. Consortia might nonetheless be formed for specific tasks. Some genome projects in technology development will undoubtedly be of great interest to industry and might attract private funding. Such projects (e.g., developing automated DNA mapping instruments or DNA detection methods) are likely to be highly focused, however, and organized at the local rather than the national level. Accountability would not be as diffuse for local consortia focused on specific technical objectives as for a single national consortium with multiple objectives and dozens of projects to manage.

The Technology Transfer Act of 1986 (Public Law 99-502) grants government agencies authority to form consortia with private corporations and provides guidelines for doing so. President Reagan’s Executive Order 12591 (April 1987) further extends this authority and encourages federally owned laboratories to form consortia. Agencies thus have the requisite authority already. If Congress finds terms of the 1986 act inappropriate in some details—for example, regarding patent policies or royalty arrangements—then the statute could be amended or special measures relating to genome projects could be added as amendments to other bills.

One bill introduced early in the 100th Congress would have established a national consortium specifically to manage genome projects, but the bill has since been replaced by one that establishes a new advisory body (covered above as a task force). A national consortium is not the only, and perhaps not the most effective, way to obtain industrial input for genome projects and to facilitate technology transfer. Alternatives are to encourage agencies to participate in the formation of local consortia; to facilitate exchange of industrial and academic expertise through training exchange programs, symposiums, and other mechanisms; and to include industrial representation on any national advisory groups.

Rely on Congressional Oversight

If Congress takes no explicit action, several outcomes are possible. Federal agencies could continue planning processes similar to those followed in 1986 and 1987, consisting of informal communication and coordination through an interagency group with members from NIH, DOE, NSF, OSTP, OMB, and other agencies. To date, NIH, DOE, and NSF have sought outside advice from various standing advisory committees, a practice that has resulted in conflicting recommendations. This problem could be remedied without legislation: The agencies could establish a single interagency advisory committee of outside experts appointed by the agencies or by a third party, such as the National Academy of Sciences or a private philanthropy. The advisory committee could report to the agencies directly.

A Committee on Life Sciences is forming in OSTP. The interagency nature and conspicuousness of genome projects make them a natural topic for this committee. OSTP is considering the creation of a special subcommittee on genome projects.

Whether OSTP’s efforts meet the objectives desired by Congress will depend on effective coordination and an appropriate balance among government, university, industrial, philanthropic, legal, bioethical, and other representatives on the subcommittee. If OSTP’s subcommittee is composed exclusively of government representatives, then its primary function will be interagency communication. The main stumbling block to interagency planning to date has been conflicting advice from outside advisory bodies, not lack of interagency communication. Pluralism in funding is usually a virtue, but making conflicting recommendations to different agencies is not. Any national coordinating group should take a global view of activities in all agencies and harmonize the advice given them.

The chief advantage of relying solely on congressional oversight is that it requires no new legislation. One disadvantage is that interagency agreement on appointments and operating budgets for a coordinating body might prove difficult without a congressional mandate and might not initially include an appropriate range of nongovernment experts. Another potential disadvantage is that initiatives undertaken by an administration in the absence of legislation could crumble under the weight of later interagency disagreements or neglect by a subsequent administration. Flexibility is beneficial if projects are short-lived, but genome projects are not. Long-term stability is essential to the efficient conduct of genome projects because they will require sustained support over many years. Oversight of agency action could nonetheless be all that is required. Deficiencies of a task force set up by agencies could later be remedied indirectly through congressional oversight or threat of legislation.

Technology Transfer

Congress appropriates funds to support scientific research for several reasons, the principal one for biomedical research being to improve health. Increasingly, however, biomedical research is being regarded as a national investment, and policies to facilitate economically fruitful applications of new knowledge are receiving attention in Congress. The process of exploiting new knowledge for practical purposes is called technology transfer. Some persons favor increased funding for genome projects because they believe the projects will lead to marketable products (instruments, research materials) or will accelerate research in areas that will later yield marketable products. Technology transfer can be improved through patent policies, exchange of industrial and academic personnel, symposiums for industrial and academic scientists, formation of consortia to develop specific technologies or services, and engaging industry in planning genome projects. Programs for exchanging personnel and sponsoring symposiums will fall to agencies through normal policy paths and can be monitored by Congress. Consortium formation and industry representation on planning bodies have been discussed above. The remaining policy area is patent and copyright law.

Patent policies of Federal agencies have changed dramatically during the past decade. The Patent and Trademark Amendments of 1980 (Public Law 96-517), as amended in 1984 (Public Law 98-620), were devised to facilitate commercialization of federally sponsored research. President Reagan issued directives to Federal agencies in February 1983 and April 1987 to this same end. And Congress passed the Technology Transfer Act of 1986 (Public Law 99-502), which contains patent licensing and joint venture provisions with authority to form consortia with private interests. These patent policies, following outlines of policies pioneered by NIH and NSF in the late 1970s, encourage institutions receiving Federal grants or contracts to patent products and processes resulting from federally funded work. A 1987 General Accounting Office report judged that the policies have increased patenting of research results.

Aside from a possible change regarding DOE policies (see ch. 8), genome projects raise no new questions of patent or copyright law. Genome projects would be subject to the same statutes and executive orders as other scientific efforts. There is a clear role for congressional oversight, however, in ensuring that data are shared promptly and fully.

In mid-1987, proposals to form private corporations to map and sequence the human genome stirred a controversy. Scientists expressed concern that scientific exchange would be impeded by such efforts and that information would be sequestered through copyrights and patents. If private corporations do form to develop map and sequence data and research materials, they will operate at private expense. If they are successful, scientists will have new information, services, and materials available for a price. If they fail, scientists should be no worse off, unless the government fails to support work it would otherwise have funded. To date, government agencies have not dropped plans for genome projects because of corporate efforts.

Corporate efforts need not entail restricted access to information. Corporations can provide services not appropriately performed by laboratories conducting basic scientific research (e.g., mapping, sequencing, or database management). Universities and large corporations can manage research facilities, such as the national laboratories, under contract. Corporations could also participate in consortia focused on specific technical objectives. Private firms could be given grants to develop new methods under the Small Business Innovation Research program; they would retain title to inventions, but they would have the same obligation to share data and materials as universities or other grantees. The essential point is not whether a grantee or a contractor is a university or corporation, but whether the research results will be widely shared.

It is essential to ensure timely exchange of data and materials from federally sponsored projects. Maps, databases, and repositories will be useful only if they are accurate and complete; they will be complete only if all participants make prompt contributions. In most cases, patent requirements should not substantially delay disclosure of data. Many data will not be relevant to a patentable invention. When research results do include a patentable invention, advance planning for filing patent applications should minimize delays. The main option for Congress in this area is to oversee the conduct of genome projects. Changes in agency policies for data exchange could be made if problems emerge.

Congress could also direct agencies to make it easier for persons receiving Federal grants or contracts to understand patent policies in the United States and abroad. At present, many of the published guidelines and regulations for NIH, DOE, and NSF are out of date. Investigators contemplating genome projects will probably contact more than one Federal agency for research support; it would be helpful to have a document summarizing the practices of the different agencies. Such a document could also explain the benefits of filing patents early and outline procedures for patenting abroad.

Questions for Congressional Oversight

Congressional oversight will most often involve an informal exchange among congressional staff, executive agency personnel, and other interested parties. Oversight can be a potent incentive for cooperation among agencies and for good conduct of executive actions. Congress may wish to hold hearings from time to time to address such questions as: Are genome projects being efficiently administered? Are agencies duplicating efforts on genome projects? Are agencies communicating effectively? Are agencies ensuring that access to shared data is relatively easy and fair? Are databases receiving the information they need to be most useful (e.g., map and sequence data)? Are commercial opportunities being exploited? Are shared research resources being neglected? Are issues of special interest to Congress, such as social and ethical implications of genome projects, being adequately addressed? Do genome projects supported by Federal agencies reflect national needs and social priorities? Are foreign governments funding a proportionate share of genetics research and the research infrastructure? Are foreign governments sharing data and materials to the same degree as U.S. agencies?

Chapter 2

Technologies for Mapping DNA

CONTENTS

Organization and Function of Genetic Information
    What Is a Genome?
    How Are Genomes Organized?
    What Is the Genetic Code?
    How Big Is the Human Genome?
    How Does the Human Genome Compare to Other Genomes?
    Why Does Hereditary Information Change?
Genetic Linkage Maps
Linkage Maps of Restriction Fragment Length Polymorphisms
    RFLP Mapping
    When Is a Map of RFLP Markers Complete?
Low-Resolution Physical Mapping Technologies
    Somatic Cell Hybridization
    Chromosome Sorting
    Karyotyping
    Chromosome Banding
    In Situ Hybridization
    Other Methods for Mapping Genes
High-Resolution Physical Mapping Technologies
    Cloning Vectors as Mapping Tools
    Physical Mapping of Restriction Enzyme Sites
    Physical Mapping of Nonhuman Genomes
    Strategies for Physical Mapping of the Human Genome
    DNA Sequencing Technologies
    Automation and Robotics in Mapping and Sequencing
Chapter 2 References

Figures

2-1. The Structure of DNA
2-2. Replication of DNA
2-3. Gene Expression
2-4. Number of Human Gene Loci Identified From 1958 to 1987
2-5. Separation of Linked Genes by Crossing Over of Chromosomes During Meiosis
2-6. Detection of Restriction Fragment Length Polymorphisms Using Radioactively Labeled DNA Probes
2-7. Somatic Cell Hybridization
2-8. Chromosome Purification by the Flow Sorter
2-9. DNA Cloning in Plasmids
2-10. Separation of Intact Yeast Chromosomes by Pulsed Field Gel Electrophoresis
2-11. Constructing a Library of Clones Containing Overlapping Chromosomal Fragments
2-12. Making a Contig Map
2-13. DNA Sequencing by the Sanger Method
2-14. DNA Sequencing by the Maxam and Gilbert Method
2-15. Automated DNA Sequencing Using Fluorescently Labeled DNA

Tables

2-1. The Genetic Code
2-2. Haploid Amounts of DNA in Various Organisms
2-3. Amount of Genome Sequenced in Several Well-Studied Organisms

ORGANIZATION AND FUNCTION OF GENETIC INFORMATION

What Is a Genome?

The fundamental physical and functional unit of heredity is the gene. Genetics is the study of the patterns of inheritance of specific traits. The chemical bearer of genetic information is deoxyribonucleic acid (DNA). The DNA of multicellular organisms such as insects, animals, and human beings is associated with protein in highly condensed microscopic bodies called chromosomes. A single set, or haploid number, of chromosomes is present in the egg and sperm cells of animals and in the pollen cells of plants. All body cells, or somatic cells, carry a double set, or diploid number, of chromosomes, one originating from each parental set. The entire complement of genetic material in the set of chromosomes of a particular organism is defined as its genome.

How Are Genomes Organized?

Long before genetic material was identified as DNA, maps of genes on chromosomes were constructed, and many of the details of transmission of genes from generation to generation were elucidated [Judson, see app. A]. The gene for colorblindness, for example, was assigned to the human X chromosome in 1911 (80), about 40 years before the discovery of the structure of DNA. In fact, it has been known for nearly a century that the genetic material:

● has a structure that is maintained in stable form,
● is able to serve as a model for replicas of itself,
● has an information code that can be expressed, and
● is capable of change or variation.

Each of these features can be described in molecular terms based on the structure and function of DNA.

To know how DNA controls cell function, and ultimately the structure and function of an entire organism, it is necessary to understand its structure. In multicellular organisms, DNA is generally found as two linear strands wrapped around each other in the form of a double helix. A DNA strand is a polymeric chain made of nucleotides, each consisting of a nitrogenous base, a deoxyribose sugar, and a phosphate molecule (figure 2-1). The arrangement of nucleotides along the DNA backbone is called the DNA sequence. There are four nucleotides used in DNA sequences: adenosine (A), guanosine (G), cytidine (C), and thymidine (T). The two strands of DNA in the helix are held together by weak bonds between base pairs of nucleotides. In nature, base pairs form only between A’s and T’s and between G’s and C’s. The size of a genome is generally given as its total number of base pairs.
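Because A pairs only with T and G only with C, the sequence of one strand fully determines the sequence of its partner. The following is a minimal sketch of that rule, assuming a strand is represented as a Python string over the letters A, C, G, and T. The function names are illustrative and do not come from the report, and the sketch also assumes the usual convention (not spelled out in the text) that the partner strand is read in the opposite direction.

```python
# Base-pairing rule described in the text: A pairs with T, G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement_strand(strand: str) -> str:
    """Return the partner strand implied by base pairing.

    Assumption: the partner is written in the opposite direction,
    so the input is reversed before each base is swapped.
    """
    return "".join(PAIRS[base] for base in reversed(strand.upper()))


def size_in_base_pairs(strand: str) -> int:
    """A double helix has one base pair per position on either strand."""
    return len(strand)


example = "GATTACA"
print(complement_strand(example))   # TGTAATC
print(size_in_base_pairs(example))  # 7
```

Applying `complement_strand` twice returns the original strand, which is one way to see why each strand can serve as a template for the other during replication.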

A full genome of DNA is regenerated each time a cell undergoes division to yield two daughter cells. During cell division, the DNA double helix unwinds, the weak bonds between base pairs break, and the DNA strands separate. Free nucleotides are then matched up with their complementary bases on each of the separated chains, and two new complementary chains are made (figure 2-2). In humans and other higher organisms, DNA replication occurs in the nucleus of the cell. This DNA replication process was first proposed in 1953 by Francis H.C. Crick and James D. Watson (19,73,74).

What Is the Genetic Code?

Most genes carry an information code that specifies how to build proteins. Proteins are an essential class of large molecules that function in the formation and repair of an organism’s cells and tissues. Proteins can be components of essential structures within cells, or they can carry out more active roles in the overall function of a particular cell type. Included in this important class of molecules are hormones such as insulin, antibodies to fight cellular infections, and receptors on the cell’s surface for modulating interactions between a particular tissue and its surroundings (68). Enzymes are a specialized group of proteins that increase the rate of the biochemical processes that take place in metabolism.

Figure 2-1. The Structure of DNA

[Figure labels: phosphate molecule, deoxyribose sugar molecule, nitrogenous bases.]

Figure 2-2. Replication of DNA

When DNA replicates, the original strands unwind and serve as templates for the building of new, complementary strands. The daughter molecules are exact copies of the parent, each daughter having one of the parent strands.

SOURCE: Office of Technology Assessment, 1988.

Proteins are long chains of smaller molecules, called amino acids, that fold into the unique structures necessary for protein function. The information for generating proteins of specific amino acid sequences is found in the genetic code, a code based on sequences of nucleotides that are “read” in groups of three (table 2-1). Genetic information is transmitted from DNA sequences to protein via another large molecule called messenger ribonucleic acid (mRNA). The structure of ribonucleic acid (RNA) is very similar to that of DNA. Figure 2-3 illustrates the major steps in gene expression, namely:

● transcription of DNA into mRNA, and
● translation of mRNA into protein.

By these processes, the genetic code directs amino acids to be joined together in the order specified by the sequence of nucleotides in the messenger RNA, which was in turn determined by the sequence of nucleotides in the DNA.

Table 2-1. The Genetic Code

Codon  Amino Acid          Codon  Amino Acid   Codon  Amino Acid      Codon  Amino Acid
UUU    Phenylalanine       UCU    Serine       UAU    Tyrosine        UGU    Cysteine
UUC    Phenylalanine       UCC    Serine       UAC    Tyrosine        UGC    Cysteine
UUA    Leucine             UCA    Serine       UAA    stop            UGA    stop
UUG    Leucine             UCG    Serine       UAG    stop            UGG    Tryptophan
CUU    Leucine             CCU    Proline      CAU    Histidine       CGU    Arginine
CUC    Leucine             CCC    Proline      CAC    Histidine       CGC    Arginine
CUA    Leucine             CCA    Proline      CAA    Glutamine       CGA    Arginine
CUG    Leucine             CCG    Proline      CAG    Glutamine       CGG    Arginine
AUU    Isoleucine          ACU    Threonine    AAU    Asparagine      AGU    Serine
AUC    Isoleucine          ACC    Threonine    AAC    Asparagine      AGC    Serine
AUA    Isoleucine          ACA    Threonine    AAA    Lysine          AGA    Arginine
AUG    Methionine (start)  ACG    Threonine    AAG    Lysine          AGG    Arginine
GUU    Valine              GCU    Alanine      GAU    Aspartic acid   GGU    Glycine
GUC    Valine              GCC    Alanine      GAC    Aspartic acid   GGC    Glycine
GUA    Valine              GCA    Alanine      GAA    Glutamic acid   GGA    Glycine
GUG    Valine              GCG    Alanine      GAG    Glutamic acid   GGG    Glycine

Each codon, or triplet of nucleotides in RNA, codes for an amino acid. Twenty different amino acids are produced from a total of 64 different RNA codons, but some amino acids are specified by more than one codon (e.g., phenylalanine is specified by UUU and by UUC). In addition, one codon (AUG) specifies the start of a protein, and three codons (UAA, UAG, and UGA) specify the termination of a protein. Mutations in the nucleotide sequence can change the resulting protein structure if the mutation alters the amino acid specified by a codon or if it alters the reading frame by deleting or adding a nucleotide.

U = uracil (replaces thymine in RNA); A = adenine; C = cytosine; G = guanine

SOURCES: Office of Technology Assessment and National Institute of General Medical Sciences, 1988.

Figure 2-3. Gene Expression

In the first step of gene expression, messenger RNA (mRNA) is synthesized, or transcribed, from genes by a process somewhat similar to DNA replication. In higher organisms, this process takes place in the nucleus of a cell. In response to certain signals (e.g., association with a particular protein), sequences of DNA adjacent to, or sometimes within, genes control the synthesis of mRNA. Protein synthesis, or translation, is the second major step in gene expression. Messenger RNA molecules are known as such because they carry messages specific to each of the 20 different amino acids that make up proteins. Once synthesized, mRNAs leave the nucleus of the cell and go to another cellular compartment, the cytoplasm, where their messages are translated into the chains of amino acids that make up proteins. A single amino acid is coded by a sequence of three nucleotides in the mRNA, called a codon. The main component of the translation machinery is the ribosome, a structure composed of proteins and another class of RNAs, ribosomal RNAs. The ribosome reads the genetic code of the mRNA, while a third kind of RNA molecule, transfer RNA (tRNA), mediates protein synthesis by bringing amino acids to the ribosome for attachment to the growing amino acid chain. Transfer RNAs have three nucleotide bases that are complementary to the codons in the mRNA (see table 2-1).

SOURCE: Office of Technology Assessment, 1988.
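The two steps of gene expression and the codon assignments of table 2-1 can be illustrated with a short sketch. It assumes sequences are plain Python strings, that the DNA given is the strand whose sequence matches the mRNA (so transcription reduces to writing U for T), and that amino acids are written with the conventional one-letter codes, with `*` marking a stop codon. The function names and the example sequence are illustrative only.

```python
from itertools import product

BASES = "UCAG"
# One-letter amino acid codes for all 64 codons, ordered by first,
# second, then third base in the order U, C, A, G. '*' marks a stop.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna: str) -> str:
    """Transcription (simplified): same sequence with U in place of T."""
    return dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translation: read codons from the first AUG until a stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("ATGTTTGGCTAA")
print(mrna)             # AUGUUUGGCUAA
print(translate(mrna))  # MFG (methionine, phenylalanine, glycine)
```

Deleting one base from the example shifts every later codon, which is the reading-frame effect described in the note to table 2-1.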

In molecular terms, a gene is a region of a chromosome whose DNA sequence can be transcribed to produce a biologically active RNA molecule. Messenger RNAs constitute the major class of biologically active RNAs. Other RNAs may act as lattices to stabilize certain cell structures or may participate directly in important cellular processes such as protein synthesis.

How Big Is the Human Genome?

The diploid human genome consists of 46 chromosomes: 22 pairs of autosomes and 1 pair of sex chromosomes (two X chromosomes for females and an X and a Y chromosome for males). A single egg cell has 22 different autosomes and a single X chromosome, whereas sperm cells carry 22 different autosomes and either an X or a Y chromosome.

Scientists estimate the total number of human genes per haploid genome at 50,000 to 100,000. The characterization of the structure of human genes on chromosomes was made possible recently through recombinant DNA technology (the use of molecular biology tools to combine DNA from one organism with that of another). It is now known that human genes can vary in size from fewer than 10,000 base pairs to more than 2 million. The entire haploid genome is approximately 3 billion base pairs. So far, researchers are far from having determined where each human gene is located on the 24 chromosomes. Victor McKusick of The Johns Hopkins University maintains Mendelian Inheritance in Man, an encyclopedia of expressed genes [see app. D]. According to the October 1, 1987, count, 4,257 genes were represented in the encyclopedia; of those, at least 1,200 had been mapped to specific chromosomes or regions of chromosomes (51). Figure 2-4 illustrates the years of effort invested thus far in identifying even this small fraction of the total number of human genes.

How Does the Human Genome Compare to Other Genomes?

Before much was known about the DNA sequences that make up genomes, it was thought that the amount of DNA per haploid genome would increase in proportion to the biological complexity of the organism. Since chromosomes can vary in size, the total amount of DNA in a haploid cell is a better indicator of actual genome size than the number of chromosomes. Table 2-2 shows that higher plants and animals do have much more DNA than lower organisms. There are some notable exceptions, however, to the correlation between overall genome size and complexity of the organism. A good example is the salamander, which has a haploid DNA content more than 30 times greater than that of humans, even though it is obviously a smaller, less complex organism. Similarly, the cells of some species of plants have a greater DNA content than human cells (72).

This inconsistency between DNA content and the apparent complexity of an organism is known to geneticists as the C-value paradox (C-value refers to the haploid genome size). A great deal of research has been devoted to determining the scientific basis for the C-value paradox. Variations in genome size usually arise from increases in the amount of DNA per chromosome, not from increases in numbers of chromosomes. The genomes of all higher organisms contain sequences of DNA that occur as large numbers of repeated units, either clustered in one chromosomal region or in regions dispersed throughout the entire genome. These repeated sequences contribute to wide variations in total DNA content among what are often closely related species.

Figure 2-4. Number of Human Gene Loci Identified From 1958 to 1987

[Bar chart: number of gene loci (horizontal axis, 0 to 5,000) for the years 1958, 1966, 1968, 1971, 1975, 1978, 1983, 1986, and 1987.]

SOURCE: Victor McKusick, The Johns Hopkins University Medical School, Baltimore, MD.

Table 2-2. Haploid Amounts of DNA in Various Organisms

Organism       Number of base pairs (millions)
Bacterium      4.7
Yeast          15
Nematode       80
Fruit fly      155
Chicken        1,000
Human          2,800
Mouse          3,000
Corn           15,000
Salamander     90,000
Lily           90,000

SOURCES:
B. Alberts, D. Bray, J. Lewis, et al., Molecular Biology of the Cell (New York, NY: Garland Publishing, 1983).
C. Burks, GenBank, Los Alamos National Laboratory, Los Alamos, NM, personal communication, March 1988.
T. Cavalier-Smith (ed.), The Evolution of Genome Size (New York, NY: Wiley & Sons, 1985).
J. Darnell, H. Lodish, and D. Baltimore, Molecular Cell Biology (New York, NY: Scientific American, 1988).
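The scale of the C-value paradox is easier to see as ratios. The sketch below takes the haploid genome sizes listed in table 2-2 (in millions of base pairs) and expresses each relative to the human value; the dictionary and function names are illustrative assumptions, not part of the report.

```python
# Haploid genome sizes from table 2-2, in millions of base pairs.
HAPLOID_MBP = {
    "Bacterium": 4.7,
    "Yeast": 15,
    "Nematode": 80,
    "Fruit fly": 155,
    "Chicken": 1_000,
    "Human": 2_800,
    "Mouse": 3_000,
    "Corn": 15_000,
    "Salamander": 90_000,
    "Lily": 90_000,
}


def fold_relative_to_human(organism: str) -> float:
    """How many times larger (or smaller) a genome is than the human one."""
    return HAPLOID_MBP[organism] / HAPLOID_MBP["Human"]


for name in HAPLOID_MBP:
    print(f"{name:<11}{fold_relative_to_human(name):8.3f}x human")
```

With these values the salamander and lily genomes come out at roughly 32 times the human genome, consistent with the "more than 30 times" figure given in the text.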

In large genomes such as the human genome, intron sequences also contribute to size. Introns are DNA sequences occurring within the coding region of a gene. They are transcribed into mRNA, but are cut (spliced) out of the message before it is translated into protein. Introns can increase the number of base pairs in a gene by more than tenfold. Many genes also have long regions at their ends that are transcribed into mRNA but are not translated into protein. In addition, some protein-coding genes have given rise to gene families that make several closely related protein products. Other gene families consist of hundreds or thousands of closely related genes (72).

The untranslated sequences within or at the ends of genes, gene families, and moderately or frequently repeated DNA sequences between genes still do not account for all of the DNA in the genomes of higher organisms, nor for the variations in genome size among these organisms. Many scientists interpret these facts to mean that some fraction of DNA in the human genome is expendable; although there is little agreement on the size of this fraction, some believe it to be more than 90 percent of the genome (27,54). The implication of the C-value paradox, that much of human DNA is expendable, is one reason that some esteemed scientists do not favor a major effort to obtain a complete nucleotide sequence of the human genome. They believe time would be better spent identifying and understanding the function of gene products that contribute to the cellular processes leading to the development of an organism as complex as man (1). On the other hand, some scientists consider the C-value paradox to be one of the many mysteries that might be unraveled once entire genomes have been analyzed in greater detail.

Why Does Hereditary Information Change?

Hereditary variation is the result of changes occurring by mutation, a change in the sequence or number of nucleotides, which occurs during DNA replication. Mutations formed in sex cells are inherited by offspring, whereas those that occur in somatic cells remain only in the affected organism. Some diseases, such as certain human cancers, arise from factors in both of these categories. Mutations are also acquired by artificial means, such as exposure to chemicals or certain forms of radiation.¹ Such factors can cause a change in a single DNA base pair that may modify or inactivate a protein, if one is encoded in that region of the chromosome.

More extreme mutations, involving changes in the structure of a single chromosome or changes in chromosome number, can occur; for example:

● deletion of a chromosome,
● duplication of a chromosome or a piece of a chromosome,
● translocation, or insertion of a chromosome fragment from one chromosome pair into an unmatched member of a different pair, and
● inversion, or the breakage of a chromosome fragment followed by its rejoining in the opposite orientation.

¹A 1986 OTA report, Technologies for Detecting Heritable Mutations in Human Beings, addresses the kinds and effects of mutations in human beings and new technologies for detecting mutations and measuring mutation rates.

Figure 2-5. Separation of Linked Genes by Crossing Over of Chromosomes During Meiosis

In diploid cells, there is a tendency for each DNA molecule to undergo some form of modification or rearrangement with each cell division. The progenitors of sex cells are a special class of diploid cells that undergo two rounds of cell duplication in a process called meiosis. Meiosis results in four instead of two daughter cells, each with a haploid set of chromosomes. Before the first meiotic cell division, each member of a chromosome pair is replicated, forming two sets of chromosome pairs. At this stage, the cell has two identical copies of chromosomes of maternal origin and two identical copies of chromosomes of paternal origin. Also at this time, the chromosome pair of maternal origin is in close association with that of paternal origin, and an event called crossing over can occur; that is, one maternal and one paternal chromosome can break, exchange corresponding sections of DNA, and then rejoin (figure 2-5). (This process is also referred to as recombination.) In this way, two of the four resulting sex cells have chromosomes with new combinations of genes, while the other two cells carry the parental (original) combinations of genes. Since chromosomes originating maternally or paternally can carry different forms of any given gene, new combinations of traits are created by such crossovers.

GENETIC LINKAGE MAPS

Because of recombination during meiosis, certain groups of traits originating on one chromosome are not always inherited together (figure 2-5). The closer, or more linked, genes are on a particular chromosome, the smaller the probability that they will be separated during meiosis. Each chromosome is inherited independently of all others, so only genes on the same chromosome can be linked.

Gene mapping, broadly defined, is the assignment of genes to chromosomes. A genetic linkage map permits investigators to ascertain the position of one genetic locus relative to another on the basis of how often they are inherited together. Strictly speaking, a genetic locus is an identifiable region, or marker, on a chromosome. The marker can be an expressed region of DNA (a gene) or some segment of DNA that has no known coding function but whose pattern of inheritance can be determined. Variation at genetic loci is essential to genetic linkage mapping. The markers that serve to identify chromosome locations must vary in order to be useful for linkage studies in families, because only when the parents have different forms at the marker locus can linkage to a gene be followed in their children. Alleles are the alternative forms of a particular genetic locus. For example, at the locus for eye color, there are blue and brown alleles. During meiosis, all of the genetic loci on a chromosome remain together unless they are separated by crossing over between chromosome pairs.

Distance on genetic maps is measured by how often a particular genetic locus is inherited separately from some marker. This measure of genetic distance is called recombination frequency. The amount of recombination is expressed in units called centimorgans. One centimorgan is equal to a 1 percent chance of a genetic locus being separated from a marker due to recombination in a single generation.

During the generation of sex cells in human beings, if a gene and a DNA marker are separated by recombination in 1 percent of the cases studied, then they are, on average, separated by 1 million base pairs. The relationship between genetic map distance (recombination frequencies) and physical map distance (measured in DNA base pairs) can vary, however, by five- or even tenfold. Recombination can vary from near zero, if genetic loci are very close, to 50 percent, between genetic loci that are far apart on the chromosome or on different chromosomes. Some chromosome regions are highly prone to recombination and exhibit high recombination frequencies, while other chromosome regions appear to be resistant to recombination. Interestingly, the rate of recombination in the same region of a particular chromosome typically differs between males and females, and it is often greater in females. The reasons for this have not been established. Double or multiple crossover events can also occur between two loci that are widely separated. Each of these variations in recombination frequencies complicates the relationship between genetic and physical maps. Nevertheless, if a genetic linkage map were constructed with a set of markers separated by an average of 1 centimorgan, then most genes could be located within a range of 100,000 to 10 million base pairs.
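The arithmetic behind these figures can be sketched as follows. The code assumes the averages stated in the text (1 centimorgan per 1 percent recombination, about 1 million base pairs per centimorgan, and up to tenfold variation in either direction). Equating centimorgans with percent recombination is only reasonable for closely linked loci, since observed recombination cannot exceed 50 percent. All names are illustrative.

```python
AVERAGE_BP_PER_CENTIMORGAN = 1_000_000  # human average given in the text


def centimorgans(recombinant_offspring: int, total_offspring: int) -> float:
    """Genetic distance as percent recombination (short distances only)."""
    if not 0 <= recombinant_offspring <= total_offspring or total_offspring == 0:
        raise ValueError("counts must satisfy 0 <= recombinants <= total, total > 0")
    return 100 * recombinant_offspring / total_offspring


def physical_range_bp(distance_cm: float, fold_variation: float = 10):
    """Plausible physical distance (low, high) in base pairs.

    Assumption: the true distance lies within `fold_variation` of the
    genome-wide average, as the text suggests.
    """
    average = distance_cm * AVERAGE_BP_PER_CENTIMORGAN
    return average / fold_variation, average * fold_variation


# A marker separated from a gene in 2 of 200 offspring is 1 centimorgan away.
distance = centimorgans(2, 200)
print(distance)                     # 1.0
print(physical_range_bp(distance))  # (100000.0, 10000000.0)
```

The printed range reproduces the 100,000 to 10 million base pair window quoted above for a 1-centimorgan map.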

Genetic linkage between two or more observable traits can be established with greater certainty in large populations. For this reason, large families are preferred for mapping studies. If two genetic loci are closely linked, then their separation by recombination during meiosis is unlikely and a large family must be studied to determine how close they are on the genetic map. As more loci are placed on the genetic map, it becomes possible to determine the location of a new trait on the basis of its inheritance pattern compared to two or three others already on the map. The frequency with which multiple traits are inherited together generally must be calculated for many individuals over many generations before genetic mapping results are statistically significant.

The X chromosome is particularly amenable to linkage analysis because male traits directly reflect the genes on the single X chromosome present. For this reason, the genetic linkage map of the X chromosome is the most nearly complete of all chromosome maps.

Mapping of genetic loci on autosomes, on the other hand, is not as easy, unless the gene is found to be linked to a marker that has already been mapped through the study of family inheritance patterns. The first assignment of a gene to a specific autosomal chromosome came in 1968, when researchers showed that the Duffy blood group, which can be identified in families by biochemical methods, is linked to a variation in chromosome 1 (23). About the same time, the feasibility of correlating specific genes with particular chromosomes or chromosome regions by a technology called somatic cell hybridization was demonstrated (75). This and other experimental methods developed in the 1970s radically advanced the study of human genetics, allowing investigators to locate autosomal genes on human genetic and physical maps [Judson, see app. A] (50,58).

LINKAGE MAPS OF RESTRICTION FRAGMENT LENGTH POLYMORPHISMS

The advent of recombinant DNA technology in the 1970s brought about a tremendously useful new way to create genetic linkage maps. Examination of DNA from any two individuals reveals that variations in DNA sequence occur at random about once in every 300 to 500 base pairs (37). These variations occur both within and outside of genes, and most do not lead to functional changes in the protein products of genes. Kan and Dozy (40) were the first to demonstrate this phenomenon experimentally, by showing that one particular DNA sequence, recognized by the restriction enzyme HpaI, was lost in certain individuals. (Restriction enzymes are proteins that recognize specific, short nucleotide sequences and cut the DNA at those sites.) This alteration in the DNA correlated with the inheritance of sickle cell disease.

This important discovery led researchers to propose that natural differences in DNA sequence (polymorphisms) might replace other chemical and morphological markers as a way to track chromosomes through a family (5,64). In addition to polymorphisms in restriction enzyme cutting sites, it is possible to detect differences among individuals in the number of copies of short DNA sequences repeated in tandem. Polymorphic sequences can occur within a restriction enzyme cutting site or between sites. In either case, the lengths of DNA fragments generated upon cutting the DNA with restriction enzymes will vary among individuals having different alleles at such locations. These polymorphic sequences are thus commonly referred to as restriction fragment length polymorphism (RFLP) markers.

In 1983, genetic linkage between a RFLP marker on chromosome 4 and Huntington’s disease (a neurological disease that usually strikes its victims by the age of 35) was discovered (31), paving the way for the general use of RFLPs as markers for genes responsible for inherited disorders. The more frequently a RFLP marker is inherited with the gene, the more likely it is to be physically close to the gene, and hence the more useful it is as a gene marker. The major limitations to the usefulness of RFLP markers are how polymorphic they are (how much they vary among individuals), how many other markers exist in the same region, and the extent to which DNA samples of large families are available for analysis (43,44).

RFLP Mapping

A RFLP map is a type of genetic linkage map, consisting of markers distributed throughout the genome. Construction of the map involves determining the linkages between RFLP markers, their arrangement along the chromosomes, and the genetic distances between them. RFLP markers are identified and mapped by comparing the sizes and numbers of restriction enzyme fragments generated from different individuals. Just as genetic loci representing expressed DNA segments have alternate, or allelic, forms, so may RFLPs. The value of any marker depends mostly on how many variants it displays. The more often the marker varies in a population, the more likely it is that an individual will inherit two different alleles at the marker location (one on each member of a matched pair of chromosomes), making it possible to detect recombination between markers in that individual’s offspring (76).
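The link between how variable a marker is and how often an individual carries two different alleles can be made concrete. The sketch below uses the standard population-genetics expression for expected heterozygosity, H = 1 - Σ pᵢ², where pᵢ is the frequency of allele i. This formula is not given in the report; it is an assumption brought in for illustration and holds under random mating. The allele frequencies are invented.

```python
def expected_heterozygosity(allele_frequencies):
    """Probability that an individual carries two different alleles.

    Assumes random mating, so the chance of inheriting the same allele
    twice is the sum of the squared allele frequencies.
    """
    total = sum(allele_frequencies)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in allele_frequencies)


# A two-allele marker with one rare allele is seldom informative...
print(round(expected_heterozygosity([0.9, 0.1]), 2))   # 0.18
# ...while a marker with ten equally common alleles usually is.
print(round(expected_heterozygosity([0.1] * 10), 2))   # 0.9
```

Under these assumptions, a marker with many common alleles is informative in most individuals, which is why highly polymorphic markers are the most valuable for linkage mapping.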

Figure 2-6.—Detection of Restriction Fragment Length Polymorphisms Using Radioactively Labeled DNA Probes

[Diagram panels: Genomic DNA From Three Blood Samples (A, B, C); Expose on X-Ray Film for Autoradiography; Film.]

Variations in DNA sequences at particular marker sites are observed as differences in numbers and sizes of DNA fragments among samples taken from different individuals (shown here as samples A, B, and C).

SOURCE: Office of Technology Assessment, 1988.

In RFLP mapping, DNA obtained from white blood cells (lymphocytes) or other tissues of several different individuals is first cut into fragments using restriction enzymes (figure 2-6). The fragments are then separated by size. This is accomplished by a procedure called electrophoresis, in which a mixture of DNA fragments of various sizes is placed in a polymeric gel (e.g., agarose) and then exposed to an electric field. Because the chemical makeup of DNA gives it a net negative charge, the DNA fragments will travel in an electric field toward a positive electrode. Large DNA fragments will move more slowly than small ones; thus the mixture is separated, or resolved, according to size. With very large pieces of DNA, the use of restriction enzymes yields numerous fragments along the entire length of the gel, making it necessary to identify RFLPs using radioactively labeled, single-strand segments of DNA called DNA probes (65). RFLP markers are identified by virtue of their ability to form base pairs (hybridize) with DNA probes that have complementary sequences of nucleotides. Some useful probes for RFLP mapping are fragments of genes; others are randomly isolated DNA segments that identify polymorphisms; still others are complementary to sequences with variable numbers of tandemly repeated, shorter sequences that occur frequently within the genome. A technique called autoradiography is used to show the image of a band on an X-ray film wherever the agarose gel held a restriction enzyme fragment that hybridized to the DNA probe. Where polymorphisms occur, different patterns will be observed among samples taken from different individuals [Myers, see app. A] (figure 2-6).
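The way a single sequence difference changes the fragment pattern can be sketched in a few lines. The example below is a toy illustration only: the two short sequences are invented, the function name is hypothetical, and the enzyme is assumed to be EcoRI, which recognizes GAATTC and cuts after the G. It ignores the probe hybridization step and simply lists the fragment lengths each sample would produce.

```python
def digest(sequence, site, cut_offset):
    """Cut a linear DNA sequence at every occurrence of a recognition
    site and return the fragment lengths, largest first (largest
    fragments travel the shortest distance in the gel)."""
    fragments = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset          # position of the cut within the site
        fragments.append(cut - start)
        start = cut
        pos = sequence.find(site, pos + 1)
    fragments.append(len(sequence) - start)
    return sorted(fragments, reverse=True)


# Invented sequences: sample B differs from sample A by one base that
# destroys the second recognition site (GAATTC -> GAGTTC).
sample_a = "ATCG" * 3 + "GAATTC" + "TTAGC" * 4 + "GAATTC" + "CCGTA" * 2
sample_b = "ATCG" * 3 + "GAATTC" + "TTAGC" * 4 + "GAGTTC" + "CCGTA" * 2

pattern_a = digest(sample_a, "GAATTC", 1)   # three fragments
pattern_b = digest(sample_b, "GAATTC", 1)   # two fragments, one longer
```

Sample A yields three fragments and sample B yields two, one of them longer; a difference of this kind in fragment number and size is what appears as a polymorphism on the autoradiograph.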

When Is a Map of RFLP Markers Complete?

Botstein and co-workers (5) predicted in 1980 that only 150 different markers would be needed to link all human genes to chromosomal regions containing RFLPs. In practice, however, it has been estimated that many more markers may have to be studied and evaluated in order to find the minimum number which would be randomly distributed over the genome (45). It now appears that hundreds of DNA probes for highly polymorphic sequences, scattered widely over the genome, will be required for a complete human linkage map (77).

With a 10-centimorgan map, for example, there is a greater than 90 percent chance of being able to determine the rough chromosomal location of any gene associated with an inherited disease. Raymond White and colleagues at the Howard Hughes Medical Institute at the University of Utah have taken advantage of one such tandemly repeated sequence, known as a VNTR (variable number of tandem repeats), to create a set of probes useful for making a complete RFLP map of the human genome (76). White’s RFLP map, with continuously linked landmarks separated on average by 10 centimorgans (about 10 to 20 million base pairs), is nearly complete. At the ninth international Human Gene Mapping Workshop, held in September 1987, he reported 475 markers covering 17 human chromosomes, based on the DNA from 59 different three-generation families. White’s group and other geneticists believe that a 1-centimorgan RFLP marker map, determined from normal families and consisting of thousands of markers spaced an average of 1 million base pairs apart, would be the ideal research tool (see ch. 3 for further discussion) (17,52).
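The marker counts quoted above follow from simple division. The sketch below assumes a haploid genome of 3 billion base pairs (the figure the report uses elsewhere) and perfectly even spacing; real markers are not evenly spaced, so these are lower bounds rather than project estimates. The function name is illustrative.

```python
GENOME_BP = 3_000_000_000  # haploid human genome size used in the report


def markers_needed(genome_bp, spacing_bp):
    """Smallest number of evenly spaced markers that leaves no interval
    longer than spacing_bp (ceiling division)."""
    return -(-genome_bp // spacing_bp)


# A 1-centimorgan map, taking 1 centimorgan as about 1 million base pairs.
one_cm_map = markers_needed(GENOME_BP, 1_000_000)

# A 10-centimorgan map, taking 10 centimorgans as 10 to 20 million base pairs.
ten_cm_map_dense = markers_needed(GENOME_BP, 10_000_000)
ten_cm_map_sparse = markers_needed(GENOME_BP, 20_000_000)
```

With these inputs the 1-centimorgan map needs about 3,000 markers, consistent with "thousands of markers," and the 10-centimorgan map needs roughly 150 to 300 if every marker were ideally placed.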

Another group, led by Helen Donis-Keller at Collaborative Research, Inc. (Waltham, MA), reported its own RFLP linkage map, consisting of 403 markers an average of 9 centimorgans apart. A new gene or marker on their map can be located relative to the existing markers 95 percent of the time (24).

As physical markers that can be followed genetically, RFLPs are the key to linking the genetic and physical maps of the human genome. RFLP linkage maps, as well as linkage maps of expressed genes, can be correlated with banding patterns and other identifiable regions of chromosomes by somatic cell hybridization and in situ hybridization. These and other relatively low resolution physical mapping technologies are described in the following sections.

LOW-RESOLUTION PHYSICAL MAPPING TECHNOLOGIES

A physical map is a representation of the locations of identifiable landmarks on DNA. For the human genome, the physical map of lowest resolution is found in the banding patterns on the 22 autosomes and the X and Y chromosomes observable under the light microscope. This map has at most 1,000 landmarks (i.e., visible bands) (57).

Another type of relatively low resolution physical map illustrates the positions of expressed segments of DNA relative to certain regions of the chromosome or to specific chromosome bands. Expressed genes include those that are transcribed into mRNA and then translated into protein, and another class of essential genes that are transcribed into RNA but not translated into proteins. Included in the latter class are transfer and ribosomal RNAs involved in protein synthesis, RNAs involved in the removal of intron sequences from mRNAs, and an RNA associated with the cellular protein secretion machinery. Procedures are available to make DNA copies, or complementary DNAs (cDNAs), of RNA transcripts. These cDNAs can in turn be mapped to genomic DNA sequences by somatic cell hybridization, in situ hybridization, and other low-resolution physical mapping methods. A physical map illustrating the location of expressed genes is often referred to as a cDNA map. As noted earlier, only 1,200 of the 50,000 to 100,000 human genes have been physically mapped to chromosomes.

A high-resolution physical map can be made by cutting up the entire human genome with restriction enzymes and ordering the resultant DNA segments as they were originally oriented on the chromosomes. This third type of physical map, a contig map, can be related to the maps of chromosome bands and expressed genes. The physical map of highest possible resolution, or greatest molecular detail, is the complete nucleotide sequence of the human genome. Thus there is a continuum of mapping techniques that ranges from low to high resolution (see table 2-2). These techniques are discussed in this section, on low-resolution physical mapping, and in the following one, on high-resolution physical mapping methods.

Somatic Cell Hybridization

The somatic cell hybridization technique for gene mapping typically employs human fibroblast and rodent tumor cells grown in culture. The human and mouse cells are fused (hybridized) together using certain chemicals, Sendai virus, or an electric field, as illustrated in figure 2-7 (58). The chromosomes of each of the fused cells become mixed, and many of the chromosomes are lost from the hybrid cell. Human chromosomes are preferentially lost over rodent chromosomes, but there is generally no preference for which human chromosomes are lost. The individual hybrid cells are then propagated in culture and maintained as cell lines. In practice, the hybrid cell lines resulting from cell fusions contain different subsets of between 8 and 12 human chromosomes in addition to rodent chromosomes (58).

Using a large set (panel) of somatic cell hybrids containing different chromosome combinations, it is possible to correlate the presence or absence of a particular chromosome with a particular gene. Assignment of a gene to a chromosome is made by detecting a protein produced by a hybrid cell line and associating it with the chromosome unique to that cell line. Alternatively, if the gene to be mapped has already been isolated by DNA cloning procedures, then the gene can be used directly to identify complementary nucleotide sequences in the DNA extracted from somatic cell hybrids.
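The correlation step can be illustrated with a small concordance check. This is a sketch under simplifying assumptions: the panel below is invented, each hybrid line is scored simply as positive or negative for the gene product, and the function name is hypothetical. A chromosome is a candidate only if it is present in every positive line and absent from every negative line.

```python
def concordant_chromosomes(panel, gene_product):
    """Return the human chromosomes whose presence or absence matches
    the presence or absence of the gene product in every hybrid line.

    panel: dict mapping cell line name -> set of retained human chromosomes
    gene_product: dict mapping cell line name -> True if product detected
    """
    all_chromosomes = set().union(*panel.values())
    matches = []
    for chrom in sorted(all_chromosomes):
        if all((chrom in panel[line]) == gene_product[line] for line in panel):
            matches.append(chrom)
    return matches


# Invented panel of five hybrid lines (real lines retain more chromosomes).
panel = {
    "line1": {"1", "7", "17"},
    "line2": {"7", "X"},
    "line3": {"1", "17", "X"},
    "line4": {"7", "19"},
    "line5": {"19", "X"},
}
gene_product = {
    "line1": True,
    "line2": True,
    "line3": False,
    "line4": True,
    "line5": False,
}

assignment = concordant_chromosomes(panel, gene_product)
```

In this invented panel only chromosome 7 is fully concordant with the gene product, so the gene would be assigned to it. With real data, a larger panel is needed so that every chromosome has a distinct pattern of presence and absence across the lines.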

Modifications of the somatic cell hybridization method have been devised to generate single chromosome hybrids; to date, hybrid cells containing single copies of human chromosomes 7, 16, 17, 19, X, and Y are available (58). Somatic cell hybrid lines carrying chromosomes with deletion or translocation mutations are also useful low-resolution mapping tools because they make it possible to infer the location of a particular gene.¹

Chromosome Sorting

Chromosome sorting offers an alternative to the screening of somatic cell hybrid panels for low-resolution gene mapping. In this approach, DNA hybridization is used to map genes to chromosomes that have been differentiated by flow cytometry and purified by flow sorting. Fluorescent markers that bind to chromosomes are used in flow cytometry as the basis of separating chromosomes from one another in a flow sorter (figure 2-8) (21,29,30,46). Because human chromosomes differ in the degree to which they bind the fluorescent markers, it is possible to use this approach to physically separate some chromosomes from others. The dual-laser chromosome sorter has been used successfully to separate all the human chromosomes except chromosomes 10 and 11. In addition, chromosomes from cell lines with translocations and deletions can be used to narrow the location of the gene to a certain chromosomal region (46).

¹ The Institute for Medical Research, in Camden, New Jersey, established a repository for somatic cell hybrids (SCHs) with chromosome rearrangements called the Human Mutant Cell Library. The availability of this centralized storage facility has accelerated the rate of mapping human genes to specific chromosomal locations.

Figure 2-7.—Somatic Cell Hybridization

[Diagram labels: Human fibroblast; Sendai virus; Mouse tumor cell.]

Somatic cell hybrids are generated by the process of cell fusion, an event that can be enhanced by adding Sendai virus. Initially, the hybrid cell contains complete sets of chromosomes from both parental cells, but hybrids of human and mouse cells are unstable and chromosomes from the human cells are preferentially lost. After a few generations in culture, a line of hybrid cells is established that contains both mouse and human chromosomes.

SOURCE: Office of Technology Assessment, 1988.

Photo credit: Larry Deaven, Los Alamos National Laboratory, Los Alamos, NM

Flow cytometry facility for chromosome sorting at Los Alamos National Laboratory.

To determine on which chromosome a gene lies, chromosomes are sorted onto different paper filters made of nitrocellulose. There the DNA is denatured and hybridized with a radioactively labeled DNA probe complementary to the gene to be mapped. (In general, the cDNA is available for use as a probe for the gene.) On whichever chromosome the gene appears, the two sequences will hybridize, and the hybridization can be observed using autoradiography.

Karyotyping

At a stage of cell division when chromosomes have duplicated but not yet separated from one another, they condense to form structures with features that can be observed under a light microscope. The structure of human chromosomes can be studied by chemically fixing white blood cells at the appropriate stage of cell division and then photographing the chromosome spreads as they appear on slides under the microscope. Individual chromosomes are identified in the photograph, cut out, and, in the case of autosomes, matched with their morphologically identical chromosome partner to generate a karyotype. Karyotyping has been most useful for correlating gross chromosomal abnormalities with the characteristics of specific diseases (e.g., Down’s syndrome and Turner’s syndrome).

Figure 2-8.—Chromosome Purification by the Flow Sorter

Chromosomes stained with a fluorescent dye are passed through a laser beam. Each time, the amount of fluorescence is measured and the chromosome deflected accordingly. The chromosomes are then collected as droplets.

SOURCE: Courtesy of Los Alamos National Laboratory, Los Alamos, NM


Photo credit: Roger Lebo, the University of California, San Francisco Medical Center. Reprinted with permission from Alan R. Liss, Inc.

Assignment of a gene and genes with related sequences to specific human chromosomes. Samples of the 21 different human chromosomes were sorted onto 11 circular filters and then hybridized to a radioactively labeled DNA probe from the aldolase gene (aldolase is an enzyme involved in the metabolism of sugars). Most of the radioactive signal in the autoradiograph appears on the filter with chromosome 9, indicating that the complementary DNA sequence is in that chromosome. The autoradiograph also shows some hybridization to chromosomes 17 and 10, indicating that aldolase genes with different, but similar, sequences are found on these chromosomes.

Chromosome Banding

Using fluorescent dyes as chromosome-specific stains, Caspersson and others (10-12) developed optical methods for observing the banding patterns on human chromosomes. These methods reveal more details of chromosome morphology than does simple karyotyping. The bands are chromosomal regions that appear as stripes on chromosome spreads when viewed under the light microscope. Each of the 24 different human chromosomes has a unique banding pattern; thus the bands can be used to identify individual chromosomes. Genes can be mapped to specific bands by identifying differences between the banding patterns on chromosomes from normal individuals and those on the chromosomes from an individual with a significant chromosomal alteration.

Nearly 1,000 distinct bands have been detected on the 24 human chromosomes by staining and light microscopy, and an average of 100 genes is represented in a single band (50). Chromosome banding is a useful procedure for finding the general location of a gene, but it does not offer sufficient resolution to identify the exact position of a gene relative to other genes mapped in the same region (58).

In Situ Hybridization

Family linkage and somatic cell hybridization are not direct mapping methods; they are based on the correlation between traits and the frequency of transmission of those traits in families. Karyotyping and analysis of chromosome banding allow a specific trait to be correlated with a particular chromosome or a large region of a chromosome. Advances in molecular biology have overcome the limitations of those techniques by providing means for more precise mapping of genetic markers. One such method is in situ hybridization of isolated genes or gene fragments to chromosomal DNA.

The in situ hybridization technique was originally developed by Mary Lou Pardue and Joseph Gall for detection of genes encoding ribosomal RNAs in chromosomes from Drosophila salivary glands (56). In the typical in situ hybridization experiment, the DNA corresponding to a particular gene or gene fragment is used to probe for complementary sequences in chromosomes (28). The chromosomes to be analyzed are fixed on a microscope slide, where the DNA strands are chemically treated and separated. Next, the radioactively labeled DNA probe is mixed with the chromosomes on the slide. Under proper conditions, the DNA probe hybridizes with the gene sequence wherever it is located on the prepared chromosomes.

Results of in situ hybridization can be seen by exposing the slides to a photographic emulsion for a long period, then analyzing the photographs under a microscope. Wherever the radioactively labeled DNA strands have paired with complementary chromosomal regions, tiny silver grains appear. The location of a specific gene can be found by counting the number of grains in each region and using computer methods to analyze the data (58). Although in situ hybridization has been a principal method for the mapping of human genes to autosomes, higher-resolution methods are necessary. The procedure is limited to a resolution of about 10 million base pairs, a substantial portion of the total length of most chromosomes. Since many genes could fit into such a region, the exact location of the gene of interest must still be determined precisely (58).

Other Methods for Mapping Genes

Several other techniques for mapping human genes are available, including gene dosage mapping and comparative mapping of species. In gene dosage mapping, a correlation is made between the amount of gene product and the presence of extra genes or the absence of a gene or chromosome fragment. Biochemical analysis of cellular contents isolated from an individual with a particular genetic disease, or from somatic cell hybrid lines derived from that individual’s cells, is performed to measure amounts of gene products. The structure of the altered chromosome (or chromosomes) is then characterized by one or more of the methods already described.

Comparative mapping of species can provide useful human gene mapping information. This is particularly true among mammals, where it has been demonstrated that different species have similar patterns of gene organization on certain chromosomes [Computer Horizons, Inc., see app. A]. For example, tabulations show that all of the human autosomes except chromosome 13 have at least two linked genes which are also linked in the mouse (35).

Comparisons of the banding patterns of chromosomes from different species have also proved useful in matching chromosomes between species, even though differences in total numbers of chromosomes exist. There is, for example, a striking resemblance between chimpanzee and human chromosomes (81). In recent years, DNA sequence comparisons between widely differing organisms have been used to isolate or confirm the identity of specific human genes (see ch. 3).

[Photograph of a karyotype.] A female with the extra chromosome 21 associated with Down’s syndrome.

HIGH-RESOLUTION PHYSICAL MAPPING TECHNOLOGIES

Construction of high-resolution physical maps of whole genomes involves cutting the component DNA with restriction enzymes, analyzing the chemical characteristics of each fragment, and then reconstructing the original order of the fragments in the genome. Generally, the DNA fragments to be ordered are isolated from chromosomes; united with carrier, or vector, DNA molecules originating from viruses, bacteria, or the cells of higher organisms; and introduced into suitable host cells, where the isolated DNA can be reproduced in large quantities. A fragment of DNA is said to be cloned when it is stably maintained as part of a DNA vector in a single line of cells. A set of clones representing overlapping segments of DNA encompassing an entire genome is called a genomic library. In order to make a physical map, the clones in the genomic library must be ordered in relation to one another’s position on the chromosome. The following sections describe in more detail the methods currently available for creating high-resolution physical maps and their application to the genomes of specific organisms.

Cloning Vectors as Mapping Tools

Any genome mapping project first requires the isolation, usually by cloning technologies, of fragments of chromosomal DNA. Several different types of cloning vectors have been developed using recombinant DNA technology:

Photo credit: Stephen O’Brien, The National Cancer Institute, Frederick, MD. Reprinted with permission from Nature 317:140-144, 1985.

A comparative alignment of chromosomes from the giant panda (AME) and the brown bear (UAR). The putative matches between the whole chromosomes, or chromosome segments, of each animal were based on the thickness of stained bands on the chromosomes and on the spacing and intensity of the bands. This type of molecular information has been used to establish the phylogeny of these bears and to demonstrate some of the problems in using the appearance of animals, instead of their chromosome structure, in studies of evolution.

• Plasmid vectors are circular DNA molecules of 1,000 to 10,000 base pairs that can carry additional DNA sequences in fragment inserts up to 12,000 base pairs (2,4). Plasmids exist as minichromosomes in bacterial cells (usually between 10 and 100 copies per cell) and are separate from the main bacterial chromosome.
• Phage lambda chromosomes are about 50,000 base pairs and can accept foreign DNA inserts up to about 23,000 base pairs (33,79). Just as viruses infect human cells, phage infect bacterial cells and generate hundreds of descendants.
• Cosmid vectors are plasmids that also contain specific sequences from the bacterial phage lambda. Cosmids are about 5,000 base pairs, but because they contain phage lambda sequences, they can carry DNA inserts up to about 45,000 base pairs (figure 2-9) (25,34,36).
• Yeast artificial chromosomes are plasmids containing portions of yeast chromosomal DNA that function in replication. These artificial chromosomes can accommodate foreign DNA fragment inserts nearly 1 million base pairs long (6).
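The practical weight of these insert capacities shows up when one asks how many clones a library would need. The sketch below is not from the report. It computes a bare minimum (genome size divided by insert size) and, as a labeled assumption, applies the commonly used Clarke and Carbon estimate N = ln(1 - P) / ln(1 - f), where f is the fraction of the genome in one insert and P is the desired chance that any given sequence is present. Both calculations assume a 3-billion-base-pair genome and inserts at the maximum capacity listed above.

```python
import math

GENOME_BP = 3_000_000_000  # haploid human genome size used in the report

# Maximum insert sizes from the list above, in base pairs.
INSERT_CAPACITY = {
    "plasmid": 12_000,
    "phage lambda": 23_000,
    "cosmid": 45_000,
    "yeast artificial chromosome": 1_000_000,
}


def minimum_clones(genome_bp, insert_bp):
    """Clones needed if inserts tiled the genome end to end with no overlap."""
    return -(-genome_bp // insert_bp)  # ceiling division


def clones_for_coverage(genome_bp, insert_bp, probability=0.99):
    """Clarke and Carbon estimate for randomly generated inserts
    (an assumption introduced here, not a figure from the report)."""
    fraction = insert_bp / genome_bp
    return math.ceil(math.log(1 - probability) / math.log(1 - fraction))


library_sizes = {
    name: (minimum_clones(GENOME_BP, size), clones_for_coverage(GENOME_BP, size))
    for name, size in INSERT_CAPACITY.items()
}
```

Under these assumptions a cosmid library needs at least about 67,000 clones end to end and several times that for 99 percent coverage, whereas yeast artificial chromosomes reduce the end-to-end minimum to about 3,000, which is why larger-capacity vectors matter for large genomes.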

Figure 2-9.—DNA Cloning in Plasmids


Most of the physical mapping work carried out to date has employed bacteriophage and cosmid cloning vectors because the yeast artificial chromosome vectors have only recently been developed [Myers, see app. A].

Physical Mapping of Restriction Enzyme Sites

With the exception of DNA sequencing, restriction enzyme mapping is the method that gives the highest-resolution picture of DNA as it is organized in a chromosome. Several basic steps are involved in the construction of this type of physical map for part or all of a genome:

• purifying chromosomal DNA,
• fragmenting DNA by restriction enzymes,
• inserting all the resulting DNA fragments into DNA vectors to establish a collection (library) of cloned fragments, and
• ordering the clones to reflect the original order of the DNA fragments on the chromosome.

Variations in any of these steps can affect the resolution of the physical map.

Purification of Chromosomal DNA

Whole chromosomes are the best source of DNA for genomic libraries. Mixtures of chromosomes can be extracted directly from cells, but for organisms with complex genomes, such as human beings, it might be desirable to first separate the different chromosomes and then create sets of clones from the individually purified chromosomes.

Mixtures of whole chromosomes extracted from human cells can be sorted by flow cytometry. Somatic cell hybrid lines carrying one or a few human chromosomes can also be used as a highly enriched source of particular chromosomes. The refinement of existing methods and the development of new technologies for obtaining large amounts of purified human chromosomes will be crucial in the early stages of human genome mapping projects.

Fragmentation of DNA

The availability of chromosome fragments of decreasing size allows mapping at higher resolution. A technology called pulsed field gel electrophoresis (PFGE) allows separation of DNA molecules ranging in size from 20,000 to 10 million or more base pairs (8,9,13,61) [Myers, see app. A].

During PFGE, large DNA fragments are subjected to an electric field that is switched back and forth across opposite directions for short pulses of time. This alteration in the direction of the electric field allows very large DNA molecules (up to tens of millions of base pairs) to migrate into the agarose gel and separate from one another, even though the normal size limit for electrophoretic separation of DNA molecules during conventional agarose gel electrophoresis is about 50,000 base pairs. This method is so powerful that it has been used successfully to separate all 14 of the yeast chromosomes from each other (figure 2-10) [Myers, see app. A]. Since intact human chromosomes have an average size of approximately 100 million base pairs, the PFGE technique is only useful for separating large fragments made from individual, purified human chromosomes.

The level of detail possible on a physical map depends on the restriction enzyme or enzymes used. There are a few restriction enzymes that cut DNA very infrequently, generating small numbers of large fragments (ranging from several thousand to a million base pairs). Most restriction enzymes cut DNA more frequently, generating large numbers of small fragments (ranging from fewer than a hundred to greater than a thousand base pairs). The relative order of a small set of large fragments is easier to determine than the order of a large set of short fragments, but it gives a lower-resolution physical map. The choice of enzyme thus depends on the purpose of the physical map. If the aim is to have fragments of a size amenable to DNA sequencing, then a mapped restriction site at least every 500 base pairs would be ideal, but a mapped site every 2,000 to 3,000 base pairs would also be practical. Given the technology currently available, sequencing the DNA of the 3-billion-base-pair haploid human genome might require the prior mapping of as many as 6 million restriction enzyme cutting sites (69) [Myers, see app. A].
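The figure of 6 million cutting sites follows directly from the spacings quoted above. A minimal sketch of the arithmetic, assuming evenly spaced sites across a 3-billion-base-pair genome (the function name is illustrative):

```python
GENOME_BP = 3_000_000_000  # haploid human genome, as stated in the text


def sites_to_map(genome_bp, spacing_bp):
    """Number of mapped restriction sites needed for one site every
    spacing_bp base pairs, assuming even spacing."""
    return genome_bp // spacing_bp


ideal = sites_to_map(GENOME_BP, 500)        # one site every 500 base pairs
practical_high = sites_to_map(GENOME_BP, 2_000)
practical_low = sites_to_map(GENOME_BP, 3_000)
```

A site every 500 base pairs gives the 6 million sites cited in the text; relaxing the spacing to 2,000 or 3,000 base pairs reduces the requirement to 1.5 million or 1 million sites.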

Construction of Libraries of DNA Fragments

For physical mapping projects, it is important to have as much DNA as needed. The use of cloned DNA fragments offers this advantage. Fragments of DNA from whole chromosomes are generally cloned into vectors such as plasmids, cosmids, DNA phage chromosomes, and artificial yeast chromosomes. These vectors can be stably maintained in host cells (bacteria or yeast) that multiply rapidly to provide the amounts of DNA necessary for restriction enzyme mapping and DNA sequencing. DNA fragments are usually cloned by cutting the vector of choice with a restriction enzyme and then connecting the newly generated ends of the vector to the ends of the DNA fragments with the enzyme DNA ligase. The resulting collection of clones is called a library. There is no obvious order to the library, and the relationship between the components can only be established by physical mapping.

Figure 2-10.—Separation of Intact Yeast Chromosomes by Pulsed Field Gel Electrophoresis

[Gel image: chromosome bands labeled by chromosome and by band size in thousands of base pairs.]

SOURCE: Chris Traver and Ronald Davis, California Institute of Technology, Pasadena, CA.

Figure 2-11.—Constructing a Library of Clones Containing Overlapping Chromosomal Fragments

SOURCE: Office of Technology Assessment, 1988.

In order to establish that any two clones represent chromosomal segments that normally occur next to one another in the genome, it is necessary to have collections of clones representing partially overlapping regions of chromosomal DNA (figure 2-11). To create libraries of overlapping clones, the chromosomal DNA is treated with a frequent-cutting restriction enzyme, one that cuts every 500 base pairs or so, but conditions are controlled so that the enzyme is not allowed to cut the DNA at all the possible restriction enzyme sites. Instead, by lowering the amount of restriction enzyme used, only partial cutting is allowed. The experimental conditions for partial cutting are adjusted so that DNA fragments are generated with an average size equal to the vector’s capacity (usually 20,000 to 50,000 base pairs). In theory, no one of the cutting sites will be recognized by the restriction enzyme more frequently than another, so a population of overlapping segments representing all possible cutting sites in the original DNA sample should be generated. These fragments are then cloned in the appropriate vector.

Determination of the Order of Clones

Figure 2-12.—Making a Contig Map

[Diagram: restriction fragments from clones A, B, C, and D are separated by size on a gel (larger fragments toward one end); the data are analyzed for restriction enzyme fragment overlaps; and the order of the genomic clones is established.]

SOURCE: Office of Technology Assessment, 1988.

The clones in a library are ordered by subdividing the chromosomal DNA inserts into even smaller fragments and identifying which clones have some common subfragments. Figure 2-12 illustrates how this is done. A particular DNA clone (vector plus the chromosomal DNA insert) is cleaved with one or more restriction enzymes (other than that used to make the clones) under conditions in which all sites are recognized and cut. The resulting fragments are then run on a gel made of agarose. After electrophoresis, a pattern of fragments is observed along the length of the gel. If the DNA fragments are present in sufficient amounts, they can be seen under ultraviolet light after staining the gel with the dye ethidium bromide; otherwise, the phosphates at the ends of the DNA fragments are labeled with a radioactive isotope and viewed after autoradiography. A unique pattern of bands appears (corresponding to DNA fragments) for any given clone because of the unique arrangement of restriction enzyme sites in the region of the chromosome from which that clone was derived. If two clones contain overlapping segments of DNA, then a portion of the banding pattern for each will be identical. For example, if the clone order is A-B-C-D, then the restriction enzyme fragments from clone A will partially overlap with those from clone B, clone B fragments with clone C, and so on (figure 2-12).
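The overlap comparison described above can be sketched as a small program. This is a toy illustration, not the procedure used by any of the mapping groups: the fragment sizes are invented, sizes are matched exactly (real gels require a tolerance for measurement error and can produce coincidental matches), and the function names are hypothetical. Two clones are treated as neighbors when they share at least a minimum number of fragment sizes, and the chain of neighbors is then walked from one end.

```python
def shared_fragments(fingerprint_a, fingerprint_b):
    """Count fragment sizes that appear in both clones' banding patterns."""
    return len(set(fingerprint_a) & set(fingerprint_b))


def order_clones(fingerprints, min_shared=2):
    """Order clones into a single unbranched contig by walking from one
    end clone through neighbors that share enough fragments."""
    names = list(fingerprints)
    neighbors = {
        name: [
            other for other in names
            if other != name
            and shared_fragments(fingerprints[name], fingerprints[other]) >= min_shared
        ]
        for name in names
    }
    # End clones overlap exactly one other clone.
    ends = [name for name in names if len(neighbors[name]) == 1]
    if len(ends) != 2:
        raise ValueError("clones do not form a single unbranched contig")
    order = [ends[0]]
    while len(order) < len(names):
        candidates = [n for n in neighbors[order[-1]] if n not in order]
        if not candidates:
            raise ValueError("contig is broken by a gap")
        order.append(candidates[0])
    return order


# Invented fragment sizes (base pairs), listed in scrambled clone order.
fingerprints = {
    "C": [2700, 600, 3800, 1100],
    "A": [4200, 3100, 1500, 900],
    "D": [3800, 1100, 5000, 450],
    "B": [1500, 900, 2700, 600],
}

clone_order = order_clones(fingerprints)
```

For these invented fingerprints the walk recovers the order A-B-C-D. The orientation is arbitrary: a contig read from the other end is the same map.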

GENOMIC MAP OF BACTERIOPHAGE T4

Genomic map of bacteriophage T4. The genome of bacteriophage T4 contains about 166,000 base pairs. Shown are maps illustrating the names of mapped genes (outer circle), genetic map distances (second largest circle), which are measured in minutes in bacteria and their phage, and positions of DNA cutting sites for a variety of restriction enzymes (inner circles).

Groupings of clones representing overlapping, or contiguous, regions of the genome are known as contigs (18,66). On an incomplete physical map, contigs are separated by gaps where not enough clones have been mapped to allow the connection of neighboring contigs. Of all the steps in physical mapping, the connection of all the contigs is the one that faces the greatest number of technical problems. Therefore, the time required to achieve a complete physical map of any genome is a function of the time required to connect neighboring contigs.
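The relationship between clones, contigs, and gaps can be pictured with interval arithmetic. The sketch below is purely conceptual: it assumes each clone's start and end position on the chromosome is already known, which is exactly what a real mapping project has to work out, and the coordinates are invented. Overlapping clones merge into contigs, and the stretches between contigs are the gaps that remain to be closed.

```python
def merge_into_contigs(clones):
    """Merge overlapping (start, end) clone intervals into contigs."""
    contigs = []
    for start, end in sorted(clones):
        if contigs and start <= contigs[-1][1]:
            # Overlaps (or touches) the current contig: extend it.
            contigs[-1] = (contigs[-1][0], max(contigs[-1][1], end))
        else:
            contigs.append((start, end))
    return contigs


def gaps_between(contigs):
    """Return the uncovered (start, end) stretches between adjacent contigs."""
    return [
        (left_end, right_start)
        for (_, left_end), (right_start, _) in zip(contigs, contigs[1:])
    ]


# Invented clone positions in base pairs along one chromosome region.
clones = [
    (0, 40_000),
    (30_000, 70_000),
    (90_000, 130_000),
    (120_000, 160_000),
    (200_000, 240_000),
]

contigs = merge_into_contigs(clones)
gaps = gaps_between(contigs)
```

Here five clones collapse into three contigs separated by two gaps. Each additional clone that happens to span a gap joins two contigs into one, which is why the final connections dominate the time needed to finish a map.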

Physical Mapping of Nonhuman Genomes

So far, high-resolution mapping of entire genomes has focused on nonhuman organisms. Most of the technologies applicable to human genetic and physical mapping, therefore, have been developed from work on other organisms. Mapping of complete genomes is well underway for several species of bacteria and yeast, for the nematode, and is beginning for the fruit fly. These organisms have long served as excellent model systems for what are sometimes found to be universal genetic and biochemical mechanisms governing cell physiology. The technologies employed in these high-resolution genome mapping projects range from making contig maps to fine mapping by DNA sequencing.

The Bacterial Genome

For many years, bacteria (mainly Escherichia coli) and phage (viruses that infect bacteria) have been principal research subjects of molecular biologists, molecular geneticists, and biochemists. Because of the relative ease of studying gene function in E. coli, it is the organism whose genetics and biochemistry are closest to being completely understood. The DNA of this bacterium is contained in a single circular chromosome of 4.7 million base pairs (63). The genetic map of E. coli is quite extensive, with about 1,200 of the 5,000 or so known genes already cloned (3). In addition, the nucleotide sequence of over 20 percent of this bacterial genome is known (26).

Progress on the physical map of the E. coli genome is good. Cassandra Smith and co-workers at Columbia University made a complete physical map of this genome using a restriction enzyme called Not I, which cuts DNA only infrequently (63). Not I recognizes a sequence eight nucleotides long that is expected to occur by chance once every 34,000 base pairs. Only 22 Not I sites were found in the E. coli genome (63).

A higher-resolution physical map of E. coli was generated by Kohara and colleagues (42) at Nagoya University in Japan. These researchers devised an innovative, rapid mass-analysis mapping approach involving eight different restriction enzymes. In a period of time equivalent to only one-half of a person-year, this group generated a high-resolution physical map covering 99 percent of the E. coli genome, leaving only seven gaps. An independent effort to generate a high-resolution map of the cutting sites for three different, frequent-cutting restriction enzymes is also near completion at the University of Wisconsin, Madison (20).

The work of the researchers in Wisconsin and Japan is important because it generates an ordered set of clones. A map indicating the order of a library of genomic clones is immediately useful to anyone wishing to examine DNA corresponding to a gene whose position on the map is known. The physical map is correlated with the genetic map at many sites in E. coli, primarily as a result of including in the analysis clones containing known genes. Kohara and co-workers demonstrated that the use of large fragments for connecting groups of clones is not necessary for E. coli. Because of the computational limitations on connecting great numbers of small fragments, however, large-fragment maps, analogous to the Not I map of E. coli, will no doubt play a significant role in mapping large genomes, such as the human genome.

The Yeast Genome

An ongoing project to map the 15 million base pairs in the Saccharomyces cerevisiae (baker’s yeast) genome has been described by Olson and colleagues (55) at Washington University. These researchers initiated the mapping project to facilitate the organization of the vast amount of information already available on this organism. As Olson writes:

Just as conventional cartography provides an indispensable framework for organizing data in fields as diverse as demography and geophysics, it is reasonable to suppose that “DNA cartography” will prove equally useful in organizing the vast quantities of molecular genetic data that may be expected to accumulate in the coming decades (55).


A large fraction of the S. cerevisiae genome (about 95 percent) is available in clones that have been joined together in over 400 contiguously mapped stretches. These contigs are being correlated with a complete large-fragment restriction map for the yeast genome. These combined maps make it possible to construct or identify a mapped region 30,000 to 100,000 base pairs in length around virtually any starting point, typically a cloned gene [Mount, see app. A].

The Nematode Genome

The nematode Caenorhabditis elegans is a popular organism among developmental biologists because the origin and function of all 958 cells in the adult animal are known, offering researchers the opportunity to study the basis of organismal development. With its 3-day generation time, C. elegans is also suited to genetic studies. Molecular biologists, interested in the molecular basis of development, would find an ordered set of clones from the nematode genome particularly useful for their work [Mount, see app. A].

Coulson and Sulston at the Medical Research Council in England initiated a C. elegans mapping project to provide such tools and to establish communications among the laboratories working on this organism. Like the S. cerevisiae genome mapping project, this resulted in a set of clones that covers most of the genome (18). One difference is that the C. elegans clones are put into order by the fingerprinting method: Distances from each cleavage site for one enzyme to the nearest site for a second enzyme were measured, and clones sharing a number of such distances (measured as lengths of restriction fragments observed on polyacrylamide gels) were considered to overlap. This process makes identification of overlapping regions somewhat easier (because the information is denser), at the expense of more precise physical map information. A second difference is that cosmid clones were used in the nematode project, while phage clones were used in the yeast project. Cosmid clones can accommodate larger DNA inserts than phage clones, but they can also be less stable, with portions of the inserts becoming deleted more often (17). At present, over 700 contigs, ranging from 35,000 to 350,000 base pairs in length and representing 90 percent of the C. elegans genome, have been characterized (71).
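The fingerprinting idea described above, in which clones that share several restriction fragment lengths are taken to overlap, can be illustrated with a small sketch. This is not the software used in the nematode project: the clone names, the fragment lengths, the matching tolerance, and the minimum number of shared fragments are all hypothetical values chosen only for illustration, and grouping clones by connected overlaps is one simple way to form contigs.

```python
from itertools import combinations

def shared_fragments(a, b, tolerance=2):
    """Count fragment lengths in `a` that pair with a distinct length in `b`.

    Two lengths are treated as the same band if they differ by at most
    `tolerance` (a stand-in for gel measurement error; assumed value).
    """
    remaining = sorted(b)
    count = 0
    for length in sorted(a):
        for i, other in enumerate(remaining):
            if abs(length - other) <= tolerance:
                count += 1
                del remaining[i]  # each band in b may be matched only once
                break
    return count

def find_overlaps(fingerprints, min_shared=3, tolerance=2):
    """Return pairs of clones that share at least `min_shared` fragment lengths."""
    pairs = []
    for (name_a, a), (name_b, b) in combinations(sorted(fingerprints.items()), 2):
        if shared_fragments(a, b, tolerance) >= min_shared:
            pairs.append((name_a, name_b))
    return pairs

def group_into_contigs(names, overlaps):
    """Group clones into contigs: clones connected by a chain of overlaps."""
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b in overlaps:
        parent[find(a)] = find(b)
    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())

# Hypothetical fingerprints: fragment lengths per clone, in arbitrary units.
clones = {
    "cloneA": [120, 185, 240, 310, 455],
    "cloneB": [186, 241, 309, 520, 610, 702],
    "cloneC": [521, 611, 700, 815],
    "cloneD": [95, 150, 333, 402],
}

overlaps = find_overlaps(clones)
contigs = group_into_contigs(clones, overlaps)
print(overlaps)  # [('cloneA', 'cloneB'), ('cloneB', 'cloneC')]
print(contigs)   # [['cloneA', 'cloneB', 'cloneC'], ['cloneD']]
```

The sketch reports that two clones overlap without saying where, which mirrors the trade-off noted above: overlaps are easier to detect, but less precise physical map information is obtained.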

The Fruit Fly Genome

The genetics of the common fruit fly, Drosophila melanogaster, are the best characterized of any multicellular organism. One reason for studying fruit flies is that it is possible to carry out a saturating screen to detect mutations of a particular type. In a saturating screen, every gene that could mutate to produce the defect being studied is identified. (This accomplishment is crucial to a complete understanding of many cellular processes.) The saturating screen technique allows for a comprehensive genetic analysis because the entire genome can be examined for the presence of genes that are involved in a particular process. The most celebrated example is an exhaustive study of mutations that are lethal to the fly in its larval stage (39,53,78) [Mount, see app. A].

Until recently, the physical mapping of the 165 million base pairs in the D. melanogaster genome had not been undertaken by any one laboratory. Roughly 500 to 1,000 genomic clones have been obtained in various laboratories in various vectors; all of these clones have been localized to a chromosomal map position by in situ hybridization to polytene chromosomes (a multicopy set of D. melanogaster chromosomes unique to its salivary gland). A listing of these clones is maintained by John Merriam and colleagues at the University of California, Los Angeles, and the clones are made available to all researchers [Mount, see app. A].

Work by Michael Ashburner and co-investigators at Cambridge University on a comprehensive map of overlapping cosmid clones of the D. melanogaster genome was approved for funding by the European Economic Community in late 1987. This project is expected to follow the fingerprinting strategy of the nematode project, with the important difference that cytological maps (maps of banding patterns derived from microscopic analysis of stained chromosomes) of D. melanogaster chromosomes will be exploited. First, the technique of microdissection cloning (whereby DNA is excised from precise regions of the salivary gland polytene chromosomes and cloned) will be used to generate region-specific genomic clones. These microdissection clones are not of sufficient quality to be used directly, but they can be used to correlate cosmids in a standard genomic library with specific chromosomal regions. This step makes it easier to assemble the contiguous clones into groups. Finally, the position of all contigs with respect to the cytological map will be confirmed by in situ hybridization, whereby cosmid clones from the various contigs would be hybridized to salivary gland chromosomes [Mount, see app. A].

Strategies for Physical Mapping of the Human Genome

It is likely that making contig maps of large genomes, such as the human genome, will require a combination of bottom-up mapping and top-down mapping (55). Bottom-up mapping starts by making genomic clones, then fragmenting these clones further to decipher the overlaps necessary for connecting clones into contigs. Top-down mapping (e.g., Smith’s E. coli map) is of lower resolution because it is derived from minimal fragmentation of source DNA. The critical distinction between the two methods is the size of the genomic DNA fragments used. Bottom-up mapping starts with relatively small genomic clones, while top-down mapping starts with large genomic clones. The advantage of top-down mapping is that it offers more continuity (fewer gaps), while the bottom-up method has higher resolution (more detail). In formulating strategies for mapping the human genome, it will be necessary to decide what level of molecular detail is necessary to begin a human genome mapping project. Will information-rich strategies like those used to develop high-resolution E. coli restriction enzyme maps (20,42) or the DNA signposts offered by a RFLP map be the best first-generation human genome maps?

Contig Mapping

Scientists in the fields of molecular biology and human genetics who reviewed an OTA contract report on possible strategies for making contig maps of the human genome [Myers, see app. A] favored the following strategy: to map the genome one chromosome at a time, dividing and subdividing each chromosome into smaller and smaller segments before beginning restriction enzyme mapping and ordering of clones. After subdivision, restriction maps of these smaller segments would be determined and the information linked together to form continuous maps of whole chromosomes. In principle, this strategy could be broken down into five consecutive steps:

1. isolation of each human chromosome,
2. division of each chromosome into a collection of overlapping DNA fragments 0.5 to 5 million base pairs in length,
3. subdivision and isolation of each of these chromosomal fragments into overlapping DNA fragments about 40,000 base pairs in length,
4. determination of the order of the 40,000-base-pair DNA fragments as they appear in the chromosomes and determination of the positions of cutting sites for a restriction enzyme within each of these fragments, and
5. use of the mapping information gained in step 4 to link together each of the overlapping 0.5- to 5-million-base-pair fragments isolated in step 2 [Myers, see app. A].
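To give a sense of scale for the fragment sizes named in steps 2 and 3, the following sketch computes lower bounds on the number of fragments involved. It assumes the figure of roughly 3 billion base pairs for the haploid human genome cited in this chapter and, as a simplification, treats the fragments as tiling the genome with no overlap. Because the strategy calls for overlapping fragments, real collections would need to be larger; the function name is illustrative only.

```python
def min_fragments(region_bp, fragment_bp):
    """Fewest fragments of a given size that can tile a region with no overlap."""
    return -(-region_bp // fragment_bp)  # integer ceiling division

GENOME_BP = 3_000_000_000                 # approximate haploid human genome size
LARGE_FRAGMENT_BP = (500_000, 5_000_000)  # step 2 size range
SMALL_FRAGMENT_BP = 40_000                # step 3 fragment size

# Larger fragments mean fewer pieces, so the upper size gives the lower count.
large_fragment_range = (
    min_fragments(GENOME_BP, LARGE_FRAGMENT_BP[1]),
    min_fragments(GENOME_BP, LARGE_FRAGMENT_BP[0]),
)
small_fragments = min_fragments(GENOME_BP, SMALL_FRAGMENT_BP)

print(large_fragment_range)  # (600, 6000)
print(small_fragments)       # 75000
```

Under these assumptions, the whole genome corresponds to at least 600 to 6,000 large fragments and at least 75,000 fragments of 40,000 base pairs.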

The substantial progress made so far on contig maps of nonhuman genomes implies that technologies already exist to begin construction of a global physical map of the human genome. The haploid human genome (approximately 3 billion base pairs) is at least 30 times larger than that of the nematode, the largest genome for which comprehensive physical mapping has been attempted. Sulston predicted that the mapping work he and his co-workers have done over the past 4 years could be repeated within 2 person-years, because much of their time was spent devising computer methods for data analysis (17). If the size of a genome were linearly related to the time required to physically map it, then the human genome could be mapped to the same degree of completion as the nematode genome (90 percent) in about 60 person-years. Such calculations are simplistic, however, because features of the human genome other than its size make it potentially more difficult to map. For example, some DNA sequences are repeated frequently throughout the human genome, in contrast to the nematode genome, and these are likely to interfere with the physical mapping process.
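The linear extrapolation in the preceding paragraph can be written out explicitly. The sketch below reproduces the 60 person-year figure from the "at least 30 times larger" ratio. As a cross-check, it repeats the calculation with the genome sizes listed in table 2-3 (80 million base pairs for the nematode), which gives a somewhat larger ratio. The function name is illustrative, and the calculation inherits the simplifying assumption that effort scales only with genome size.

```python
def scaled_person_years(reference_person_years, size_ratio):
    """Scale a reference mapping effort linearly by a genome-size ratio."""
    return reference_person_years * size_ratio

NEMATODE_PERSON_YEARS = 2      # Sulston's estimate for repeating the nematode map
RATIO_FROM_TEXT = 30           # "at least 30 times larger"
HUMAN_BP = 3_000_000_000       # approximate haploid human genome size
NEMATODE_BP = 80_000_000       # nematode genome size from table 2-3

estimate_from_text = scaled_person_years(NEMATODE_PERSON_YEARS, RATIO_FROM_TEXT)
ratio_from_table = HUMAN_BP / NEMATODE_BP
estimate_from_table = scaled_person_years(NEMATODE_PERSON_YEARS, ratio_from_table)

print(estimate_from_text)   # 60
print(ratio_from_table)     # 37.5
print(estimate_from_table)  # 75.0
```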

Techniques for isolating large chromosomal fragments should offer solutions to some of the physical mapping problems expected to arise from the occurrence of repetitive sequences in the human genome. The two most promising methods developed to date are the PFGE technology (8,9,13,61) and the yeast artificial chromosome cloning technology (6).

A National Research Council advisory panel on mapping and sequencing the human genome recommended improvements in technologies for the following to facilitate the construction of physical maps of large genomes:

separating intact human chromosomes;
separating and immortalizing identified fragments of human chromosomes;
cloning the cDNAs representing expressed genes, especially those that represent rare cell-, tissue-, and development-specific mRNAs;
cloning very large DNA fragments;
purifying very large DNA fragments, including higher-resolution methods for separating such fragments;
ordering the adjacent DNA fragments in a DNA clone collection; and
automating the various steps in DNA mapping, including DNA purification and hybridization analysis, and developing novel methods that allow simultaneous handling of many DNA samples (52).

DNA Sequencing

Strategies for sequencing the entire human genome are much more controversial than those for generating contig maps. Some scientists favor sequencing only expressed genes, identified with a cDNA map (17). Others propose that sequencing should continue to be targeted at specific regions of interest, as is currently done. Still others hold the view that the whole genome should be sequenced because it could reveal sequences with important functions that would otherwise go unidentified (see ch. 3). The National Research Council panel proposed first that pilot programs be conducted with a goal of sequencing approximately 1 million continuous nucleotides (which is about five times as large as the largest continuous stretch of DNA sequenced to date) (52). Second, improvements in existing DNA sequencing technologies would be vigorously encouraged. Finally, extensive sequencing of other genomes, including the mouse, fruit fly, nematode, yeast, and bacterial genomes, was recommended for purposes of comparison (52).

The potential uses of human genome maps and sequences will likely dominate strategic decisions on which of the possible methods should be used to construct them (ch. 3). The strategy currently favored, preparing physical maps of individual chromosomes, requires that decisions be made on which chromosomes should be mapped first. Mapping smaller chromosomes first in pilot projects (e.g., chromosomes 21 and 22) would be the logical strategy from a technical perspective. Alternatively, selecting chromosomes linked to the largest numbers of markers for human genetic diseases (e.g., chromosome 7 and the X chromosome) might make the impact of genome mapping on clinical medicine more immediate. Efforts are already underway in a number of U.S. and foreign laboratories (ch. 8) to physically map (at various levels of resolution) human chromosomes known to be of general clinical significance or to carry genes of specific interest to the researchers involved. Scientists at Los Alamos and Livermore National Laboratories have begun mapping chromosomes 16 and 19, respectively. These chromosomes were chosen for their relatively small sizes and number of clinically relevant genetic markers. Researchers at Columbia University have begun work on a physical map of chromosome 21 for similar reasons.

DNA Sequencing Technologies

Two methods for sequencing DNA are standard in laboratories today. One technique, developed by Fred Sanger and Alan Coulson at the Medical Research Council in England (60), uses enzymes (figure 2-13), while the other, developed by Alan Maxam and Walter Gilbert at Harvard University, involves chemicals that degrade DNA (figure 2-14) (48,49). The two methods differ in the means by which the DNA fragments are produced; they are similar in that sets of radioactively labeled DNA fragments, all with a common origin but terminating in a different nucleotide, are produced in the DNA sequencing reactions.
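The principle shared by both methods, fragments with a common origin that end at each occurrence of a given base, can be shown with a toy model. The sketch below is an idealization rather than a simulation of either chemistry. It assumes that each of four base-specific reactions yields exactly one fragment for every position where its base occurs, measures fragment length from the origin, and ignores primer length, gel resolution limits, and reading errors. The example sequence and function names are arbitrary.

```python
def termination_lengths(strand, base):
    """Lengths of fragments that end at each occurrence of `base`."""
    return [i + 1 for i, b in enumerate(strand) if b == base]

def run_reactions(strand):
    """One base-specific reaction per nucleotide, as in four gel lanes."""
    return {base: termination_lengths(strand, base) for base in "ACGT"}

def read_ladder(lanes):
    """Read the four lanes together, from shortest fragment to longest."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)

strand = "GATTACAGG"  # arbitrary example sequence
lanes = run_reactions(strand)

print(lanes["G"])          # [1, 8, 9]
print(read_ladder(lanes))  # GATTACAGG
```

Because every fragment length appears in exactly one lane, sorting the bands by length across the four lanes recovers the order of bases, which is the same logic as reading a ladder of bands on a sequencing gel.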

George Church at Harvard Biological Laboratories has adapted the Maxam and Gilbert DNA sequencing method in an innovative technology, called multiplex sequencing, that enables a researcher to analyze a large set of cloned DNA fragments as a mixture throughout most of the DNA sequencing steps. Mixtures of clones are operated on in the same way as a single sample in traditional sequencing. This is accomplished by tagging each DNA clone in the mixture with short, unique sequences of DNA in the first step and then deciphering the nucleotide sequence of each cloned fragment in the final step. Multiplex sequencing allows the simultaneous analysis of about 40 clones on a single DNA sequencing gel, increasing the efficiency of the standard procedure by more than a factor of 10 (14). Church and co-workers have been applying the multiplex sequencing strategy to determine the complete nucleotide sequences of two species of bacteria, E. coli and Salmonella typhimurium (14).

Figure 2-13. DNA Sequencing by the Sanger Method

[Diagram labels: DNA polymerase I; dATP, dGTP, dCTP, dTTP; reaction mixtures; gel electrophoresis; autoradiography to detect radioactive bands; larger fragments; smaller fragments]

In the Sanger method, a cloned DNA fragment is mixed with a short piece of synthetic DNA complementary to only one end (the origin) of the cloned fragment. An enzyme called DNA polymerase is then used to catalyze the synthesis of a complementary strand. During the polymerization reaction, a modified nucleotide, a dideoxynucleotide, is included with a mixture of the four naturally occurring nucleotides (A, G, T, and C), one of which is labeled with a radioactive phosphorus or sulfur atom, causing growth of the DNA chain to stop whenever the modified nucleotide is inserted. Four separate reactions, each containing all four normal nucleotides but a different dideoxynucleotide, can be carried out. A series of radioactively labeled DNA strands will be made, the lengths of which depend on the distance from the origin to the nucleotide position where the chain was terminated. For example, if a short DNA template has four G’s, conditions are set up such that some molecules will be made with no G dideoxynucleotide analogs, some will terminate at the fourth G position, some at the third G position, and so on. Similarly, the other three dideoxynucleotides will insert infrequently and randomly at the appropriate positions in the other three nucleotide-specific reactions. The series of labeled DNA strands is subsequently analyzed by polyacrylamide gel electrophoresis. Radioactively labeled DNA is electrophoresed through a vertical slab of polyacrylamide gel (polyacrylamide is a polymeric resin in which DNA molecules from 1 to 400 bases long can be separated from one another), an X-ray film is then placed over the gel and exposed, and the resulting autoradiograph shows a ladderlike pattern of bands. The sequencing reactions corresponding to each of the four different bases are run as four adjacent lanes on the polyacrylamide gel, and the resulting ladders of bands are read alternately to give the sequence of the DNA.

SOURCE: Office of Technology Assessment, 1988.

The major problem with current DNA sequencing technology is the large number of DNA sequences that remains to be determined. Multiplex is only one of several new sequencing protocols that could be of great value to large genome sequencing projects. Church and Gilbert devised a method related to multiplex sequencing that allows sequencing directly from genomic DNA (15). Another method, developed by researchers at Cetus (Emeryville, CA), involves the selective amplification of specific DNA sequences without prior cloning (59). Each of these methods could potentially eliminate the steps of cloning and DNA preparation in sequence analysis (41).

Finally, DNA sequencing methods that do not involve either gel electrophoresis or chemical or enzymatic reactions have also been proposed. At the Los Alamos National Laboratory, researchers are investigating ways to use enhanced fluorescence detection methods in flow cytometry as an alternative to gel techniques for DNA sequencing. Others have suggested scanning tunneling electron microscopes to read bases directly on a strand of DNA (57,62).

Figure 2-14. DNA Sequencing by the Maxam and Gilbert Method

[Diagram not reproduced]

In the Maxam and Gilbert procedure, chemical reactions specific to each of the four bases are used to modify DNA fragments at carefully controlled frequencies. One end of one strand in a double-stranded DNA fragment is radioactively labeled, and the labeled DNA is used in each of four separate reactions and treated with a chemical that specifically nicks one or two of the four bases in the DNA. When these DNA molecules are treated with another chemical, the DNA fragments are broken where the base was nicked and are destroyed. Just as in the Sanger sequencing method, the products of the Maxam and Gilbert sequencing procedure are fragments of varying lengths, each ending at the G, C, T, or A where the chemical reaction took place. By limiting the amount of chemicals used in each of the base-specific reactions so they will react only a few times per molecule, it is possible to obtain all possible double-stranded DNA fragments equal in length to the distance from the radioactively labeled origin to each of the bases. For any given DNA fragment sequenced, each of the four reactions is electrophoresed separately, as described in figure 2-13, and the sequencing patterns determined from the autoradiograph.

SOURCE: Office of Technology Assessment, 1988.

AUTOMATION AND ROBOTICS IN MAPPING AND SEQUENCING

The longest single stretch of DNA sequence determined to date, the genome of the Epstein-Barr virus, contains fewer than 200,000 base pairs. The total number of nucleotides sequenced to date using both chemical and enzymatic sequencing technologies is about 16 million base pairs [Computer Horizons, Inc., see app. A]. This is the current size of GenBank®, the U.S. repository of DNA sequence data [app. D]. Since GenBank® includes only reported data, 16 million base pairs represents a low estimate of the total number of base pairs sequenced. Reported DNA sequences range from those of small viruses to those of animals and plants (table 2-3). So far, less than one-tenth of 1 percent (1.9 million base pairs) of the nearly 3 billion base pairs in the haploid human genome has been sequenced and reported (7). The current DNA sequencing rate is estimated to generate only about 2 million base pairs per year of sequence information (7), a powerful incentive for devising methods of automating the procedures involved in preparing for and carrying out DNA sequencing. Some recent reviews (16,38,41,47,57) provide detailed accounts of the robotic and automated systems currently available and describe the kinds of systems being developed or planned for genome mapping and sequencing.

Table 2-3. Amount of Genome Sequenced in Several Well-Studied Organisms

Organism                              Genome size (base pairs)   Percent sequenced
Escherichia coli (bacterium)          4.7 million                16
Saccharomyces cerevisiae (yeast)      15 million                 4
Caenorhabditis elegans (nematode)     80 million                 0.06
Drosophila melanogaster (fruit fly)   155 million                0.26
Mus musculus (mouse)                  3 billion                  0.04
Homo sapiens (human)                  2.8 billion                0.08

SOURCES: C. Burks, GenBank®, Los Alamos National Laboratory, Los Alamos, NM, personal communication, March 1988. GenBank® Release No. 54, December 1987.
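The incentive for automation can be made concrete with the figures quoted above: nearly 3 billion base pairs in the haploid human genome, 1.9 million already sequenced and reported, and a current rate of about 2 million base pairs per year. The sketch below assumes, purely for illustration, that the entire current rate were devoted to human DNA and that no sequence had to be determined more than once. The 15-year target is a hypothetical value, not a figure from this report.

```python
HUMAN_GENOME_BP = 3_000_000_000   # "nearly 3 billion" base pairs
SEQUENCED_BP = 1_900_000          # human sequence reported so far
CURRENT_RATE_BP_PER_YEAR = 2_000_000

def years_to_finish(total_bp, done_bp, rate_bp_per_year):
    """Years needed to sequence the remainder at a constant rate."""
    return (total_bp - done_bp) / rate_bp_per_year

def required_speedup(total_bp, done_bp, rate_bp_per_year, target_years):
    """Factor by which the rate must rise to finish within `target_years`."""
    needed_rate = (total_bp - done_bp) / target_years
    return needed_rate / rate_bp_per_year

years = years_to_finish(HUMAN_GENOME_BP, SEQUENCED_BP, CURRENT_RATE_BP_PER_YEAR)
speedup = required_speedup(
    HUMAN_GENOME_BP, SEQUENCED_BP, CURRENT_RATE_BP_PER_YEAR, target_years=15
)

print(round(years))       # 1499
print(round(speedup, 1))  # 99.9
```

Under these assumptions, the remaining sequence would take roughly 1,500 years at the current rate, and finishing within the hypothetical 15-year target would require about a hundredfold increase in throughput.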

Any degree of automation will help lower the overall costs of genome projects, both in time and in dollars. The primary objective in the use of automation is standardization, driven by the need for repetitive, highly accurate determinations (41). Some of the existing automated devices are designed for repetitive DNA cloning steps, such as the preparation and restriction enzyme cutting of cloned DNA samples. Similarly, efforts are being made to automate the pouring, loading, and running of gels for separating DNA and for sequencing DNA. Many of the steps in physical mapping could be adapted to automation. Cloning procedures, DNA probe synthesis, and DNA hybridizations are only a few of those being explored for application to genome projects. A system that automates some steps in growing DNA clones, to be used, for example, as gene probes or for DNA sequencing, was recently introduced by Perkin-Elmer Cetus Instruments (Norwalk, CT) (67).

The area of automation that has received the most attention is DNA sequencing. An international workshop on automation of DNA sequencing technologies was held in 1987 in Okayama, Japan, and the proceedings give an extensive account of the state of the art from an international perspective (32). There are five steps in the task of DNA sequence analysis:

1. cloning or otherwise isolating the DNA,
2. preparing the DNA for sequence analysis,
3. performing the chemical (Maxam and Gilbert) or enzymatic (Sanger) sequencing reactions,
4. running the sequencing gels, and
5. reading the DNA sequence from the gel.

Steps 3 through 5 are the functions most often performed by the instruments developed as of early 1988 (16,57,70); however, none of the companies involved has yet commercialized an integrated system that performs all of the functions.

In 1986, Applied Biosystems, Inc. (Foster City, CA) introduced the first commercial, automated DNA sequencer (16). This instrument was made using technology developed by Leroy Hood and co-workers at the California Institute of Technology. This and similar machines perform steps 4 and 5. The Applied Biosystems, Inc. system is based on the Sanger sequencing reaction, with modifications to use different fluorescent dyes instead of radioactive chemicals to label the primers. Because the sequencing reaction primers are individually labeled with different dyes, each of the four enzymatic reactions can be run together in a single lane on the polyacrylamide gel. A laser activates the dyes, and fluorescence detectors read the DNA sequence at the bottom of the gel as each fragment appears. The sequence is determined directly by a computer (figure 2-15). E.I. du Pont de Nemours & Co. (Wilmington, DE) introduced in 1987 an automated system that slightly modifies the technology used by Hood and Applied Biosystems, Inc.; this system can potentially reduce the number of artifacts read by the fluorescence detectors. Hitachi, Ltd. (Tokyo, Japan) is also expected to market an instrument that automates steps 4 and 5, but it too is based on the fluorescence technology developed by Hood and colleagues. In early 1988, another U.S. company, EG&G Biomolecular (Wellesley, MA), began marketing a machine that automates the same DNA sequencing and gel-reading methods used manually in most laboratories. Bio-Rad Laboratories (Richmond, CA) marketed an instrument that scans autoradiographs of DNA sequencing gels and analyzes the data.

Figure 2-15. Automated DNA Sequencing Using Fluorescently Labeled DNA

[Diagram labels: light; scanning laser excites fluorescent dye]

SOURCE: Leroy Hood, California Institute of Technology, Pasadena, CA.
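The single-lane, four-dye readout described above can be reduced to a simple rule for illustration: as each fragment passes the detector, the dye with the strongest signal identifies the terminal base. The sketch below is a toy version of that rule and does not describe any vendor's actual algorithm. The mapping of detector channels to bases and the intensity values are hypothetical, and real instruments must also locate peaks and correct for overlapping dye spectra, which are not modeled here.

```python
# Hypothetical assignment of the four detector channels to bases.
CHANNEL_TO_BASE = ("A", "C", "G", "T")

def call_bases(peaks):
    """Call one base per fragment from four-channel fluorescence readings.

    `peaks` holds one 4-tuple of intensities per fragment, in the order
    the fragments reach the detector (shortest fragment first).
    """
    sequence = []
    for intensities in peaks:
        strongest = max(range(4), key=lambda i: intensities[i])
        sequence.append(CHANNEL_TO_BASE[strongest])
    return "".join(sequence)

# Made-up readings for four successive fragments.
peaks = [
    (0.90, 0.10, 0.05, 0.00),
    (0.10, 0.05, 0.80, 0.10),
    (0.00, 0.10, 0.10, 0.95),
    (0.05, 0.85, 0.10, 0.00),
]

print(call_bases(peaks))  # AGTC
```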

Most of these DNA sequencing systems are based on manual enzymatic sequencing reactions, while the gel running and reading are automated. Only one commercial enterprise, Seiko of Japan, has reported automating the chemical or enzymatic steps (step 3) in the DNA sequencing protocol (70). In addition, the University of Manchester Institute of Science (Manchester, England) has built an automatic reagent manipulating system to carry out the Sanger sequencing reactions (47).

Robotics are used to give automation flexibility, to extend its capabilities to complex operations typically performed by highly skilled laboratory workers. Conceivably, laboratory robots would allow programmable devices to do physical work as well as to process data (41). Several robotic devices have been designed and used successfully by companies involved in the commercialization of recombinant DNA products or processes. Genetics Institute’s (Cambridge, MA) Autoprep Plasmid Isolation System provides small quantities of plasmid DNA and vector DNA for DNA sequencing (22). Researchers at the same company also developed a robot to purify and isolate synthetic oligonucleotides for use as probes in cloning and DNA sequencing (38).

Technical advances are occurring rapidly and simultaneously in biology, robotics, and computer science, so it is difficult to predict what the future will bring in the development of automated technology. Some yet-to-be-developed technology could make many of the current physical mapping procedures obsolete.

CHAPTER 2 REFERENCES

1. Ayala, F.J., “Two Frontiers of Human Biology: What the Sequence Won’t Tell Us,” Issues in Science and Technology, (Spring):51-56, 1987.

2. Bernard, H.U., and Helinski, D.R., “Bacterial Plasmid Cloning Vehicles,” in Genetic Engineering, vol. 2, J.K. Setlow and A. Hollaender, eds., (New York: Plenum Press, 1980).

3. Blattner, F., University of Wisconsin, Madison, personal communication, October 1987.

4. Bolivar, F., and Backman, K., “Plasmids of E. coli as Cloning Vectors,” Methods in Enzymology 68:245-267, 1979.

5. Botstein, D., White, R.L., Skolnick, M., et al., “Construction of a Genetic Linkage Map in Man Using Restriction Fragment Length Polymorphisms,” American Journal of Human Genetics 32:314-331, 1980.

6. Burke, D.T., Carle, G.F., and Olson, M.V., “Cloning of Large Segments of Exogenous DNA Into Yeast Using Artificial-Chromosome Vectors,” Science 236:806-812, 1987.

7. Burks, C., Los Alamos National Laboratory, Los Alamos, NM, personal communication, February 1988.

8. Carle, G.F., Frank, F., and Olson, M.V., “Electrophoretic Separations of Large DNA Molecules by Periodic Inversion of the Electric Field,” Science 232:65-68, 1986.

9. Carle, G.F., and Olson, M.V., “An Electrophoretic Karyotype for Yeast,” Proceedings of the National Academy of Sciences USA 82:3756-3760, 1985.

10. Caspersson, T., Lomakka, C., and Zech, L., “Fluorescent Banding,” Hereditas 67:89-102, 1971.

11. Caspersson, T., Zech, L., and Johansson, C., “Differential Banding of Alkylating Fluorochromes in Human Chromosomes,” Experimental Cell Research 60:315-319, 1970.

12. Caspersson, T., Zech, L., Johansson, C., et al., “Quinocrine Mustard Fluorescent Banding,” Chromosome 30:215-227, 1970.

13. Chu, G., Vollrath, D., and Davis, R.W., “Separation of Large DNA Molecules by Contour-Clamped Homogeneous Electric Fields,” Science 234:1582-1585, 1986.

14. Church, G.M., “Genome Sequence Comparisons,” grant proposal, Feb. 10, 1987.

15. Church, G.M., and Gilbert, W., “Genomic Sequencing,” Proceedings of the National Academy of Sciences USA 81:1991-1995, 1984.

16. Connell, C., Fung, C., Heiner, J., et al., “Automated DNA Sequence Analysis,” BioTechniques 5:342-348, 1987.

17. Costs of Human Genome Projects, OTA, workshop, Aug. 7, 1987.

18. Coulson, A., Sulston, J., Brenner, S., et al., “Toward a Physical Map of the Genome of the Nematode Caenorhabditis elegans,” Proceedings of the National Academy of Sciences USA 83:7821-7825, 1986.

19. Crick, F.H.C., and Watson, J.D., “The Complementary Structure of Deoxyribonucleic Acid,” Proceedings of the Royal Society (A) 223:80-96, 1954.

20. Daniels, D.L., and Blattner, F.R., “Mapping Using Gene Encyclopedias,” Nature 325:831-832, 1987.

21. Deaven, L.L., Van Dilla, M.A., Bartholdi, M.F., et al., “Construction of Human Chromosome-Specific DNA Libraries From Flow-Sorted Chromosomes,” Cold Spring Harbor Symposia on Quantitative Biology 51:159-167, 1986.

22. DeBonville, D. A., and Riedle, G. E., “A Robotic Workstation for the Isolation of Recombinant DNA,” in Advances in Laboratory Automation-Robotics, vol. 3, p. 353.

23. Donahue, R. P., Bias, W. B., Renwick, J. H., et al., “Probable Assignment of the Duffy Blood Group Locus to Chromosome 1 in Man,” Proceedings of the National Academy of Sciences USA 61:949-955, 1968.

24. Donis-Keller, H., Green, P., Helms, C., et al., “A Genetic Linkage Map of the Human Genome,” Cell 51:319-337, 1987.

25. Evans, G. A., and Wahl, G. M., “Cosmid Vectors for Genomic Walking and Restriction Mapping,” in Methods in Enzymology: A Guide to Molecular Cloning, vol. 152 (in press).

26. Foley, B., Nelson, D., Smith, M. T., et al., “Cross-Sections of the Genbank Database,” Trends in Genetics 2:233-236, 1986.

27. Gall, J. G., letter to the editor, Science 233:1367-1368, 1986.

28. Gerhard, D. S., Kawasaki, E. S., Bancroft, F. C., et al., “Localization of a Unique Gene by Direct Hybridization,” Proceedings of the National Academy of Sciences USA 78:3755-3759, 1981.

29. Gray, J. W., Dean, P. N., Fuscoe, J. C., et al., “High-Speed Chromosome Sorting,” Science 238:323-329, 1987.

30. Gray, J. W., Langlois, R. G., Carrano, A. V., et al., “High Resolution Chromosome Analysis: One and Two Parameter Flow Cytometry,” Chromosoma 73:9-27, 1979.

31. Gusella, J. F., Wexler, N. S., Conneally, P. M., et al., “A Polymorphic DNA Marker Genetically Linked to Huntington’s Disease,” Nature 306:234-238, 1983.

32. Hayashibara International Workshop on Automatic and High Speed DNA-Base Sequencing, Hayashibara Biochemical Laboratory, Okayama, Japan, July 7-9, 1987.

33. Hendrix, R. W., Roberts, J. W., Stahl, F. W., et al., Lambda II (Cold Spring Harbor, NY: Cold Spring Harbor Press, 1982).

34. Hohn, B., and Collins, J., “A Small Cosmid for Efficient Cloning of Large DNA Fragments,” Gene 11:291-298, 1980.

35. “Human Gene Mapping 8,” Cytogenetics and Cell Genetics 40:1-4, 1985.

36. Ish-Horowicz, D., and Burke, J. F., “Rapid and Efficient Cosmid Cloning,” Nucleic Acids Research 9:2989-2998, 1981.

37. Jeffries, A. J., “DNA Sequence Variants in the Gγ-, Aγ-, δ-, and β-Globin Genes of Man,” Cell 1:1-10, 1979.

38. Jones, S. S., Brown, J. E., Vanstone, D. A., et al., “Automating the Purification and Isolation of Synthetic DNA,” Biotechnology 5:67-70, 1987.

39. Jürgens, G., Wieschaus, E., Nüsslein-Volhard, C., et al., “Mutations Affecting the Pattern of the Larval Cuticle in Drosophila melanogaster II: Zygotic Loci on the Third Chromosome,” Roux’s Archives of Developmental Biology 193:283-295, 1984.

40. Kan, Y. W., and Dozy, A. M., “Polymorphism of DNA Sequence Adjacent to Human Beta-Globin Structural Gene: Relationship to Sickle Mutation,” Proceedings of the National Academy of Sciences USA 75:5631-5635, 1978.

41. Knobeloch, D. W., Hildebrand, C. E., Moyzis, R. K., et al., “Robotics in the Human Genome Project,” Biotechnology 5:1284-1287, 1987.

42. Kohara, Y., Akiyama, K., and Isono, K., “The Physical Map of the Whole E. coli Chromosome: Application of a New Strategy for Rapid Analysis and Sorting of a Large Genomic Library,” Cell 50:495-508, 1987.

43. Lander, E. S., and Botstein, D., “Mapping Complex Genetic Traits in Humans: New Strategies Using a Complete RFLP Linkage Map,” Cold Spring Harbor Symposia on Quantitative Biology 51:49-62, 1986.

44. Lander, E. S., and Botstein, D., “Strategies for Studying Heterogeneous Genetic Traits in Humans by Using a Linkage Map of Restriction Fragment Length Polymorphisms,” Proceedings of the National Academy of Sciences USA 83:7353-7357, 1986.

45. Lange, K., and Boehnke, M., “How Many Polymorphic Genes Will It Take To Span the Human Genome?” American Journal of Human Genetics 34:842-845, 1982.

46. Lebo, R. V., Anderson, L. A., Lau, Y.-F. C., et al., “Flow-Sorting Analysis of Normal and Abnormal Human Genomes,” Cold Spring Harbor Symposia on Quantitative Biology 51:169-176, 1986.

47. Martin, W. J., and Davies, W. R., “Automated DNA Sequencing: Progress and Prospects,” BioTechnology 4:890-895, 1986.

48. Maxam, A. M., and Gilbert, W., “A New Method for Sequencing DNA,” Proceedings of the National Academy of Sciences USA 74:560-564, 1977.

49. Maxam, A. M., and Gilbert, W., “Sequencing End-Labeled DNA with Base-Specific Chemical Cleavage,” Methods in Enzymology 65:499-560, 1980.

50. McKusick, V. A., “The Morbid Anatomy of the Human Genome: A Review of Gene Mapping in Clinical Medicine,” Medicine 65:1-33, 1986.

51. McKusick, V. A., and Ruddle, F. H., “Toward a Complete Map of the Human Genome,” Genomics 1:103-106, 1987.

52. National Research Council, Mapping and Sequencing the Human Genome (Washington, DC: National Academy Press, 1988).

53. Nüsslein-Volhard, C., Wieschaus, E., and Kluding, H., “Mutations Affecting the Pattern of Larval Cuticle in Drosophila melanogaster I: Zygotic Loci on the Second Chromosome,” Roux’s Archives of Developmental Biology 193:267-282, 1984.

54. Ohno, S., “An Argument for the Genetic Simplicity of Man and Other Mammals,” Journal of Human Evolution 1:651-662, 1972.

55. Olson, M. V., Dutchik, J. E., and Graham, M. Y., “Random-Clone Strategy for Genomic Restriction Mapping in Yeast,” Proceedings of the National Academy of Sciences USA 83:7826-7830, 1986.

56. Pardue, M. L., and Gall, J. G., “Chromosomal Location of Mouse Satellite DNA,” Science 168:1356-1358, 1970.

57. Rotman, D., “Sequencing the Entire Human Genome,” Industrial Chemist (December):18-21, 1987.

58. Ruddle, F., Bentley, K. L., and Ferguson-Smith, A., “Physical Mapping Review,” contract report to the Office of Technology Assessment, 1987.

59. Saiki, R. K., Sharf, S., Faloona, F., et al., Science 230:1350-1354, 1985.

60. Sanger, F., Nicklen, S., and Coulson, A. R., “DNA Sequencing With Chain-Terminating Inhibitors,” Proceedings of the National Academy of Sciences USA 74:5463-5468, 1977.

61. Schwartz, D. C., and Cantor, C. R., “Separation of Yeast Chromosome-Sized DNAs by Pulsed Field Gel Electrophoresis,” Cell 37:67-75, 1984.

62. Shera, B., Lawrence Livermore National Laboratory, Livermore, CA, personal communication, February 1988.

63. Smith, C. L., Econome, J. G., Schutt, S., et al., “A Physical Map of the Escherichia coli Genome,” Science 236:1448-1453, 1987.

64. Solomon, E., and Bodmer, W. F., “Evolution of Sickle Variant Gene,” The Lancet, April 28, 1979, p. 923.

65. Southern, E. M., “Detection of Specific Sequences Among DNA Fragments Separated by Gel Electrophoresis,” Journal of Molecular Biology 98:503-517, 1975.

66. Staden, R., “A New Method for Storage and Manipulation of DNA Gel Reading Data,” Nucleic Acids Research 8:3673-3694, 1980.

67. Stinson, S., “System Automates DNA Amplification,” Chemical and Engineering News, Dec. 21, 1987, p. 24.

68. U.S. Congress, Office of Technology Assessment, New Developments in Biotechnology, 4: U.S. Investment in Biotechnology (Washington, DC: U.S. Government Printing Office, in press).

69. Vogel, F., and Motulsky, A. G., Human Genetics: Problems and Approaches (New York: Springer-Verlag, 1986), pp. 369-370.

70. Wada, A., “Automated High-Speed DNA Sequencing,” Nature 325:771-772, 1987.

71. Waterson, R., Medical Research Council, Cambridge, England, personal communication, October 1987.

72. Watson, J. D., Hopkins, N. H., and Roberts, J. W., Molecular Biology of the Gene (Menlo Park: The Benjamin/Cummings Publishing Co., 1987).

73. Watson, J. D., and Crick, F. H. C., “Genetic Implications of the Structure of Deoxyribonucleic Acid,” Nature 171:964-967, 1953.

74. Watson, J. D., and Crick, F. H. C., “Molecular Structure of Nucleic Acids: A Structure for Deoxyribose Nucleic Acid,” Nature 171:737-738, 1953.

75. Weiss, M., and Green, H., “Human-Mouse Hybrid Cell Lines Containing Partial Complements of Human Chromosomes and Functioning Human Genes,” Proceedings of the National Academy of Sciences USA 58:1104-1111, 1967.

76. White, R., and Lalouel, J.-M., “Chromosome Mapping With DNA Markers,” Scientific American 258:40-48, 1988.

77. White, R., Lippert, M., Bishop, D. T., et al., “Construction of Linkage Maps with DNA Markers for Human Chromosomes,” Nature 313:101-105, 1985.

78. Wieschaus, E. C., Nüsslein-Volhard, C., and Jürgens, G., “Mutations Affecting the Pattern of the Larval Cuticle in Drosophila melanogaster III: Zygotic Loci on the X-Chromosome and Fourth Chromosome,” Roux’s Archives of Developmental Biology 193:296-307, 1984.

79. Williams, B. G., and Blattner, F. R., “Bacteriophage Lambda Vectors for DNA Cloning,” in Genetic Engineering, vol. 2, J. K. Setlow and A. Hollaender (eds.) (New York, NY: Plenum Press, 1980).

80. Wilson, E. B., “The Sex Chromosomes,” Arch. Mikrosk. Anat. Entwicklungsmech. 77:249, 1911.

81. Yunis, J. J., Sawyer, J. R., and Dunham, K., “The Striking Resemblance of High-Resolution G-Banded Chromosomes of Man and Chimpanzee,” Science 208:1125-1148, 1980.

Chapter 3

Applications to Research in Biology and Medicine

CONTENTS

Introduction
Applications in Medicine
    Developing Diagnostic Tools
    Isolating Genes Associated With Disease
    Developing Human Therapeutics
    Prospects for Human Gene Therapy
Applications in Human Physiology and Development
    Identification of Protein-Coding Sequences
    Approaches to Understanding Gene Function
Applications in Molecular Evolution
Applications in Population Biology
Chapter 3 References

Boxes
3-A. Why Sequence Entire Genomes?
3-B. Duchenne and Becker’s Muscular Dystrophies
3-C. From Gene Structure to Protein Structure: The Protein-Folding Problem
3-D. Constructing the Evolutionary Tree: Morphology v. Molecular Genetics in the Search for Human Origins
3-E. The Origin of Human Beings: Clues From the Mitochondrial Genome
3-F. Molecular Anthropology
3-G. Implications of Genome Mapping for Agriculture

Figures
3-1. Mapping at Different Levels of Resolution
3-2. The Use of Synthetic DNA Probes To Clone Genes

Tables
3-1. Examples of Single-Gene Diseases
3-2. Some Companies Developing DNA Probes for Diagnosis of Genetic Diseases
3-3. The Size of Human Genes
3-4. Some Human Gene Products With Potential as Therapeutic Agents
3-5. Classification of Human Proteins by Invention Period


“[physical and genetic maps] will certainly be very useful [but] you have to interpret that sequence, and that’s going to be a lot of work. It will be like having a whole history of the world written in a language you can’t read.”

Joseph Gall, American Scientist 76:17-18, February 1988

INTRODUCTION

Research efforts aimed at creating genetic linkage and physical maps of chromosomes or entire genomes are collectively referred to in this report as genome projects (see chs. 1 and 2). The goals of genome projects are to develop technologies and tools for mapping and sequencing DNA and to complete maps of human and other genomes. Proponents expect that the products and processes generated from genome projects will enable researchers to answer important questions in biology and medicine. Meeting this objective, however, will depend on the success of concurrent projects aimed at analyzing the information generated from mapping genomes. Interpreting genome maps will require the combined efforts of individuals with expertise in structural biology, cell biology, population biology, biochemistry, genetics, computer science, and other fields.

Biology and medicine have already benefited from efforts to map and sequence specific genes from human and other organisms. Some questions might be addressed sooner or better, however, if more extensive genetic linkage maps, cDNA maps, contig maps, and DNA sequences were available (figure 3-1). (See ch. 2 for detailed discussion of the types of genetic linkage and physical maps.) Research on inherited and nongenetic diseases, the physiology and development of organisms, the molecular basis of evolution, and other fundamental problems in biology could all be facilitated in the long run by genome projects.

Scientists continue to debate which applications depend on information from maps of entire genomes and which require only maps of specific regions. The value of a complete DNA sequence of a reference human genome is the most hotly contended scientific issue (see box 3-A). Focused research has been the mode of molecular genetics to date: Scientists have targeted specific regions of genomes for intensive study. Many of the potential applications of genome mapping summarized in this chapter have already been and will continue to be achieved by targeted research projects. Wherever possible, therefore, this chapter attempts to differentiate between the uses for which extensive maps will be necessary and those for which partial maps are adequate.


Figure 3-1. Mapping at Different Levels of Resolution

[Figure: three levels of mapping resolution.
MAPPING GENES TO WHOLE CHROMOSOMES: Chromosome Banding; DNA Hybridization to Somatic Cell Hybrids or Sorted Chromosomes; In Situ Hybridization.
PHYSICAL MAPPING OF LARGE DNA FRAGMENTS: Yeast Artificial Chromosome Cloning (Up to 1 Million Base Pairs); Pulsed Field Gel Electrophoresis (Up to 10 Million Base Pairs).
PHYSICAL MAPPING OF SMALL DNA FRAGMENTS: Cosmid Cloning (Up to 45,000 Base Pairs); Plasmid and Phage Cloning (Up to 23,000 Base Pairs).]

APPLICATIONS IN MEDICINE

Genome projects have accelerated the production of new technologies, research tools, and basic knowledge. At current or perhaps increased levels of effort, they may eventually make possible control of many human diseases: first through more effective methods of detecting disease, then, in some cases, through development of effective therapies based on improved understanding of disease mechanisms. Advances in human genetics and molecular biology have already provided insight into the origins of such diseases as hemophilia, sickle cell disease, and hypercholesterolemia.

The new technologies for genetics research will also help in the assessment of public health needs. Techniques for sequencing DNA rapidly, for example, should permit the detection of mutations following exposure to radiation or environmental agents. Susceptibilities to environmental and workplace toxins might be identified as more detailed genetic linkage maps are developed, and special methods of surveillance can be used to monitor individuals at risk. By providing tools for determining the presence or absence of pathogens (e.g., bacteria and viruses) in large numbers of individuals as well as identifying genetic factors that render some human beings more susceptible to infection than others, genome projects might also yield methods for tracking epidemics through populations.

Developing Diagnostic Tools

The use of DNA hybridization probes for detecting changes, such as restriction fragment length polymorphisms (RFLPs), in the DNA of individuals with genetic diseases is described in detail in chapter 2. Such methods of DNA analysis offer several advantages over traditional approaches to the study of human disease. Knowing the organization of genes on chromosomes and their DNA sequences could enable clinicians to detect mutant genes before a disease manifests itself in the form of damaged cells or tissues and will eventually lead to a more complete understanding of the pathogenesis of human disease [Friedmann, see app. A].

Box 3-A. Why Sequence Entire Genomes?

Rapid advances in technology have made it feasible to sequence the entire genome of an organism, at least a small one such as a bacterium or yeast. Researchers do not yet agree, however, on the value of a complete DNA sequence of a genome the size of the human genome. Several types of arguments have been made in favor of sequencing entire genomes:

• The information in a genome is the fundamental description of a living system (it is what the cell uses to construct a copy of itself) and so is of fundamental concern to biologists.

• Genome sequences provide a conceptual framework within which much future research in biology will be structured. Questions concerning control of gene expression (signals for control of gene expression, genome replication, development mechanisms, and so on) ultimately depend on knowing genome sequences.

• The genomes of some higher organisms, including those of human beings, have repeated DNA sequences, sequences of unknown function, and some sequences which are likely to have no function, comprising nearly 90 percent of the total DNA content. Without the complete DNA sequence of several genomes, it will be impossible to determine whether such sequences have meaning or are ancestral “junk” sequences.

• Genome sequences are important for addressing questions concerning evolutionary biology. The reconstruction of the history of life on this planet, the definition of gene families (also critical to other areas of biology), and the search for a universal ancestor all require an understanding of the organization of genomes.

• Genomes are natural information storage and processing systems; unraveling them may be of general interest to computer and physical scientists.

Other scientists would argue that these possible applications can be derived from sequences of single genes or larger regions of chromosomes. They believe it is a waste of time and money to sequence the entire human genome, particularly because some regions have no known or essential function. Many of these researchers favor sequencing only those regions believed to be clinically or scientifically important, including expressed sequences and sequences involved in the control of gene expression, and putting the others off indefinitely.

SOURCES: National Research Council, Mapping and Sequencing the Human Genome (Washington, DC: National Academy Press, 1988). C. Woese, University of Illinois, Urbana, personal communication, June 1987.

Table 3-1. Examples of Single-Gene Diseases

Disease | Description | Genetic marker identified | Gene cloned | Protein identified
Duchenne muscular dystrophy | Progressive muscle deterioration | Yes | Yes | Yes
Cystic fibrosis | Lung and gastrointestinal degeneration | Yes | No | No
Huntington’s disease | Late-onset disorder with progressive physical and mental deterioration | Yes | No | No
Sickle cell anemia | Deformed red blood cells block blood flow | Yes | Yes | Yes
Hemophilia | Defect in clotting factor VIII causes uncontrolled bleeding | Yes | Yes | Yes
Beta-thalassemia | Failure to produce sufficient hemoglobin | Yes | Yes | Yes
Chronic granulomatous disease | Frequent bacterial and fungal infections involving lungs, liver, and other organs | Yes | Yes | Yes (tentative)
Phenylketonuria | Enzyme deficiency that causes brain damage and mental retardation | Yes | Yes | Yes
Polycystic kidney disease | Pain, hypertension, kidney failure in half of victims | Yes | No | No
Retinoblastoma | Cancer of the eye | Yes | Yes | Yes

SOURCE: Office of Technology Assessment, 1988.

Photo credit: Ray White, The University of Utah Medical Center, Salt Lake City, UT. Reprinted with permission from American Journal of Human Genetics 39:300-306, 1986.

Identification of a genetic marker showing linkage between high levels of low-density lipoprotein (LDL) cholesterol and the genetic locus for the LDL receptor gene using restriction fragment length polymorphism (RFLP) analysis of the LDL receptor genes from a multigenerational family with inherited hypercholesterolemia. A radioactively labeled DNA fragment from the cloned LDL receptor gene was used as a probe to observe differences among affected and unaffected individuals in the numbers of electrophoretically separated DNA fragments after cutting the DNA with a restriction enzyme. Individuals without the polymorphism are represented as unfilled squares (males) or circles (females) and show only one DNA fragment. Half-filled symbols represent individuals with one allele for the defective gene and one for the normal gene and show two DNA fragments. The lane marked “M” is a set of DNA fragment size markers.

The study of randomly selected RFLP markers in human families has revealed linkages to a number of genetic diseases (table 3-1) (1,3,6,10,16,17,23,25,29,32,37,38,41,42,46,52,55,56). As the chromosomal locations of more disease-causing genes are identified, more probes for diagnosing genetic diseases will become available. A genetic linkage map saturated with RFLP markers (or one with other polymorphic markers) is viewed by many molecular geneticists as crucial to the development of diagnostic reagents for the remaining human genetic diseases [Friedmann, see app. A]. (See table 3-2 for a list of companies developing diagnostic probes for such diseases.)

It is important to recognize that DNA probes for RFLP markers are not always reliable tools for diagnosing genetic diseases before the onset of symptoms. Without enough data from relatives of potential disease carriers, it may not be possible to confirm the linkage between a particular RFLP marker and a genetic disease. The main limitation to reliable diagnosis of most genetic diseases is the lack of an adequate number of DNA samples from several generations of affected and unaffected individuals.

Many available RFLP markers can be used only in a few families, and the RFLP marker map is a cumulative one that aggregates the data from many families. The largest standard data set is derived from the Center for the Study of Human Polymorphism (CEPH) in Paris (see ch. 7 on international efforts in genome mapping). The data collected by CEPH are taken from 40 families around the world, most of which do not have any known genetic disease. Materials from these families are used to locate RFLP and other polymorphic markers. Once markers have been identified, they can be tested for linkage to a particular genetic disease in families known to have that disease. The CEPH families are large, selected to enable scientists to trace DNA markers through at least three generations.

Table 3-2. Some Companies Developing DNA Probes for Diagnosis of Genetic Diseases

Company | Probes under development
California Biotechnology (Mountain View, CA) | Susceptibility to heart disease
Cetus Corporation (Emeryville, CA) | Sickle cell anemia
Collaborative Research (Bedford, MA) | Cystic fibrosis; Duchenne muscular dystrophy; Polycystic kidney disease
Integrated Genetics (Framingham, MA) | Cystic fibrosis; Hemophilia B; Huntington’s disease; Polycystic kidney disease; Sickle cell anemia
Lifecodes (Elmsford, NY) | Cystic fibrosis; Down’s syndrome; Polycystic kidney disease; Sickle cell anemia

SOURCE: Office of Technology Assessment, 1988.

Photo credit: The Bettmann Archive, New York, NY

A large New England family of the early 1900s spanning three generations. Samples of genomic DNA from members of such families are very useful for constructing genetic linkage maps, such as an RFLP map.

Isolating Genes Associated With Disease

Some inherited human diseases arise from or cause differences in detectable proteins that circulate in the blood, such as human growth hormone and insulin. A research scheme called forward genetics has been used to isolate the genes encoding these proteins. In this strategy, a gene is cloned after the altered protein product has been characterized. Other genetic diseases, such as retinoblastoma, chronic granulomatous disease, and Duchenne muscular dystrophy, involve protein products that were not identified before the corresponding gene was cloned. An experimental approach called reverse genetics was used to find these genes. First the gene containing the mutation responsible for the disease is linked to an RFLP or other polymorphic marker, then the gene and its protein product are isolated and characterized [Friedmann, see app. A].

Forward Genetics

Until recently, most methods for cloning disease-associated genes required prior characterization of the biochemical defect responsible for the disease. Using the forward genetics approach, researchers identify the mutant gene product (a protein), then isolate a clone of the gene from a library of cDNA clones (clones made from DNA copies of the mRNA transcripts of genes; see ch. 2). If the protein has been purified, antibodies can be made and used to select for clones of cells expressing the product. Alternatively, if part of the protein’s amino acid sequence is known, synthetic DNA probes complementary to the exons, or protein-coding sequences, of the gene can be designed, based on the genetic code (figure 3-2). Once the cDNA clones are isolated, they can be used as DNA probes to pick out clones from genomic libraries.
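The probe-design step can be illustrated with a short sketch. The Python below is not part of the report: the names `probe_for_mrna` and `in_triplets` are illustrative assumptions, and the only data taken from the source is the predicted mRNA sequence shown in figure 3-2. The sketch simply writes out, base by base, the DNA strand that pairs with a predicted mRNA.

```python
# Watson-Crick pairing between an mRNA base and the DNA base opposite it.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def probe_for_mrna(mrna: str) -> str:
    """Return the DNA bases that pair with an mRNA, in the same left-to-right order.

    If the mRNA is written 5'->3', the returned strand therefore reads 3'->5'.
    """
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in mrna.upper())


def in_triplets(sequence: str) -> str:
    """Group a sequence into space-separated triplets for easier reading."""
    return " ".join(sequence[i:i + 3] for i in range(0, len(sequence), 3))


# Predicted mRNA sequence from figure 3-2, written without spaces.
predicted_mrna = "AUGUACAGGAUGCAACUGUCUUGC"
probe = probe_for_mrna(predicted_mrna)
print(in_triplets(probe))
```

Because several codons can encode the same amino acid, a real probe design would first have to choose among them, which is why the figure refers to codon usage frequencies. The sketch assumes that choice has already been made.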

Figure 3-2. The Use of Synthetic DNA Probes To Clone Genes

[Figure: An mRNA sequence predicted from amino acid codon usage frequencies (AUG UAC AGG AUG CAA CUG UCU UGC) and a DNA probe sequence complementary to the predicted mRNA sequence (TAC ATG TCC TAC GTT GAC AGA ACG). Other labeled elements: culture plate with colonies of bacteria carrying cosmids with cDNA inserts; filter.]

The difference between cDNA copies of genes and genes on chromosomes is that the latter have both exons and introns (noncoding sequences interrupting protein-coding sequences). The genes in the human genome range from fewer than 1,000 base pairs to more than 2 million base pairs in size and are thus typically too large to be contained on standard cloning vectors (table 3-3). The cDNA clones, which are smaller because they contain only exons, are useful because they can be introduced into bacteria, yeast, or mammalian tissue culture cells and transcribed and translated into protein. The resulting proteins can be used in studies of the physiology of diseases or in some cases as human therapeutics.
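The size comparison can be made concrete with a small sketch. The Python below is an illustration only: the function name `smallest_vector_for` is hypothetical, the vector capacities are the upper limits listed in figure 3-1, and the gene sizes are a few values converted from table 3-3. It treats a gene as clonable in one piece when its full genomic length does not exceed the listed capacity, which is a simplifying assumption.

```python
from typing import Optional

# Upper insert sizes in base pairs, as listed in figure 3-1.
VECTOR_CAPACITY_BP = {
    "plasmid or phage": 23_000,
    "cosmid": 45_000,
    "yeast artificial chromosome": 1_000_000,
}

# Gene sizes in base pairs, converted from table 3-3 (thousands of nucleotides).
GENE_SIZE_BP = {
    "insulin": 1_700,
    "albumin": 25_000,
    "factor VIII": 186_000,
    "Duchenne muscular dystrophy": 2_000_000,
}


def smallest_vector_for(gene_size_bp: int) -> Optional[str]:
    """Return the smallest listed vector whose capacity covers the gene, or None."""
    for name, capacity in sorted(VECTOR_CAPACITY_BP.items(), key=lambda item: item[1]):
        if gene_size_bp <= capacity:
            return name
    return None


for gene, size in GENE_SIZE_BP.items():
    vector = smallest_vector_for(size)
    print(f"{gene}: {size:,} bp -> {vector or 'no single listed vector'}")
```

Under these assumptions only the smallest genes fit in a plasmid or phage vector, while the largest entry exceeds every cloning vector listed, which is the situation in which the shorter cDNA copies become useful.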

The utility of the various types of physical maps in the forward genetics strategy depends on the purpose of isolating the gene. If only cDNA copies of a particular gene are needed for making large quantities of the protein product, then extensive genomic maps would not be necessary. If the cDNA copy of the gene is to be used as a DNA probe for isolating the whole gene from a collection of genomic DNA clones, or for studying the organization of the genome in the region of interest, then a contig map illustrating the order of DNA segments from the relevant portion of the genome would be very useful.

Reverse Genetics

Reverse genetics has made it possible to isolate genes associated with inherited diseases for which no specific biochemical defect has been established. To do this, the genetic disease is usually linked first to a particular chromosome by studying inheritance patterns at the DNA level. The general region of the gene on the chromosome is identified using DNA probes for RFLP markers on that chromosome. Samples of DNA from families of individuals afflicted with the genetic disease are tested with a set of DNA probes which hybridize to markers spaced throughout the chromosome until a linkage between the mutant gene that causes the disease and the RFLP is detected. The location of the gene is then identified more precisely by using additional probes for closely spaced markers that cover the region of interest.

Table 3-3. The Size of Human Genes

Gene | Gene size (in thousands of nucleotides) | mRNA size (in thousands of nucleotides) | Number of introns
Small: | | |
Alpha-globin | 0.8 | 0.5 | 2
Beta-globin | 1.5 | 0.6 | 2
Insulin | 1.7 | 0.4 | 2
Apolipoprotein E | 3.6 | 1.2 | 3
Parathyroid | 4.2 | 1.0 | 2
Protein kinase C | 11.0 | 1.4 | 7
Medium: | | |
Collagen I, Pro-alpha-1(I) | 18.0 | 5.0 | 50
Collagen I, Pro-alpha-2(I) | 38.0 | 5.0 | 50
Albumin | 25.0 | 2.1 | 14
HMG CoA reductase | 25.0 | 4.2 | 19
Adenosine deaminase | 32.0 | 1.5 | 11
Factor IX | 34.0 | 2.8 | 7
Catalase | 34.0 | 1.6 | 12
Low-density-lipoprotein receptor | 45.0 | 5.5 | 17
Large: | | |
Phenylalanine hydroxylase | 90.0 | 2.4 | [illegible]
Factor VIII | 186.0 | 9.0 | [illegible]
Thyroglobulin | 300.0 | 8.7 | [illegible]
Very large: | | |
Duchenne muscular dystrophy | 2,000.0 | 17.0 | 50

SOURCE: Victor McKusick, The Johns Hopkins University, Baltimore, MD.

In order to distinguish the gene locus that actually causes the disease from nearby, but unrelated, genes, it is generally necessary to demonstrate that the identified gene is expressed abnormally in tissues from patients with the disease. A genomic region 1 million base pairs in length, for example, could contain as many as 100 genes. In such cases, it is necessary to use biochemical methods to identify the gene that is responsible for the disease. Techniques for detecting messenger RNA transcripts or proteins can be used to search for differences in amounts of gene product in the tissues of affected and unaffected individuals; these differences can then be correlated with an alteration in a particular gene. Retinal cells were analyzed in this way as part of the search for the retinoblastoma gene (18,28), as were muscle cells in individuals with and without Duchenne muscular dystrophy (see box 3-B). Once the gene product has been identified, it is possible to study the physiology of a particular disease with the aim of identifying a therapy or preventive treatment.

Although reverse genetics is generic in concept, the amount of effort involved in isolating and characterizing genes using genetic and physical map data varies. Over 100 person-years have been spent searching for the gene that causes cystic fibrosis, an effort that has led to localizing the gene on a small region of chromosome 7 but not to finding the gene itself or determining how it causes the disease (17). On the other hand, researchers identified and isolated the gene for chronic granulomatous disease in far fewer person-years (38). The existence of DNA probes for RFLP markers has also made possible the identification of the genes for Duchenne muscular dystrophy (32) and retinoblastoma (18,28) [Friedmann, see app. A].

The technical difficulty involved in locating the gene responsible for a particular disease by reverse genetics usually depends on the physical map distance between the nearest RFLP marker and the linked gene. Existing RFLP maps of the human genome have a resolution of only about 10 centimorgans (approximately 10 million base pairs). A map with markers spaced every 1 centimorgan would make it much less time-consuming to locate the genes by reverse genetics (7,33). Such a map would be constructed using a pool of several thousand DNA probes that detect RFLP markers spaced about every 1 million base pairs throughout the genome. A library of clones made from overlapping segments of the genome and a contig map illustrating the relative position of each clone with respect to its neighbors would also be useful in reverse genetics. These tools would spare researchers the labor-intensive step of isolating and characterizing all of the genomic clones between the marker and the gene of interest; the only work remaining would be to associate the characteristics of the disease with the correct clone or clones.
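The arithmetic behind these map resolutions can be sketched as follows. The Python below is a rough illustration, not a method from the report: it assumes a haploid human genome of about 3 billion base pairs (a figure not stated in this passage), treats 1 centimorgan as about 1 million base pairs as the text does, and uses the 45,000 base pair cosmid limit from figure 3-1. All function names are hypothetical.

```python
# Assumptions (not stated in this passage): a haploid human genome of about
# 3 billion base pairs; 1 centimorgan treated as about 1 million base pairs,
# the rough equivalence used in the text.
GENOME_SIZE_BP = 3_000_000_000
BP_PER_CENTIMORGAN = 1_000_000
COSMID_INSERT_BP = 45_000  # upper cosmid insert size from figure 3-1


def markers_needed(spacing_cm: float) -> int:
    """Number of evenly spaced markers needed to cover the genome."""
    return round(GENOME_SIZE_BP / (spacing_cm * BP_PER_CENTIMORGAN))


def worst_case_gap_bp(spacing_cm: float) -> int:
    """Largest possible distance from a gene to its nearest marker."""
    return round(spacing_cm * BP_PER_CENTIMORGAN / 2)


def cosmids_to_span(distance_bp: int) -> int:
    """Minimum number of end-to-end cosmid inserts covering a distance."""
    return -(-distance_bp // COSMID_INSERT_BP)  # ceiling division


for spacing in (10, 1):
    gap = worst_case_gap_bp(spacing)
    print(
        f"{spacing} cM map: {markers_needed(spacing):,} markers, "
        f"worst-case gap {gap:,} bp, at least {cosmids_to_span(gap)} cosmids"
    )
```

Under these assumptions a 1-centimorgan map calls for about 3,000 markers, consistent with the "several thousand" probes mentioned above, and cuts the stretch of DNA that might have to be cloned between marker and gene by a factor of ten. The cosmid counts are lower bounds, since real clones must overlap.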

Identification of Genes Involved in Polygenic Disorders

Genetic linkage maps of the human genome are also useful for characterizing inherited diseases caused by more than one factor, often referred to as polygenic disorders. Among the diseases for which more than one gene is likely to be responsible are certain cancers, diabetes, and coronary heart disease (27). For example, in a complex disorder such as coronary heart disease, blood plasma lipoproteins, the coagulation system, and elements of the arterial walls all play a role, so the number of genes involved can be very large (40). Some scientists argue that the RFLP maps currently available, with markers spaced an average of 10 centimorgans apart, are a sufficient starting point for studies of polygenic diseases (11). Higher-resolution RFLP maps, such as a 1-centimorgan map, would no doubt simplify the job of identifying the genes responsible for polygenic disorders.

Developing Human Therapeutics

Forward genetics has yielded important results in the area of drug development. As stated earlier, the ability to use cDNA clones has been crucial to the development of commercial products such as human growth hormone and insulin and to potential human therapeutics such as tumor necrosis factor and interleukin-2, therapeutics that would not otherwise be available in the quantity or quality necessary for effective use (table 3-4)


BOX 3-B.— Duchenne and Becker’s Muscular Dystrophies

Duchenne muscular dystrophy (DMD) is a genetic disease that affects 1 in 3,000 to 1 in 3,500 male infants born. Becker’s muscular dystrophy is a similar but milder disorder with much lower incidence. Both diseases begin in childhood and lead to muscle wasting. DMD typically results in death before age 20. The search for the gene causing these diseases and the protein encoded by that gene has been an exciting story of molecular biology in the 1980s. The effort in many ways typifies modern genetics, with extensive international collaboration, study of nonhuman species, and creative use of molecular methods.

The gene causing these diseases had been known for some time to be on the X chromosome because of inheritance patterns. Duchenne and Becker’s muscular dystrophies affect primarily boys, who have only one X chromosome, inherited from their mothers. Girls have two X chromosomes and therefore must, as a rule, receive abnormal genes from both parents in order to develop Duchenne or Becker’s muscular dystrophy, a much less likely occurrence.

The search for the gene started with studies of families. DNA from persons with DMD, including several girls and one boy, was collected in an effort to find a common area of the X chromosome that had been lost or altered. Once the correct region of the X chromosome had been identified (its absence was found to cause DMD), DNA from that region was obtained and cloned. The clones were used as DNA probes for complementary mRNA sequences in muscle tissue from affected and unaffected individuals. The purpose was to identify the mRNA gene transcript that was present in unaffected individuals but altered in persons with DMD. The mRNA was located and subsequently shown to encode a large protein called dystrophin found in muscle cells.

The DMD search has uncovered some extraordinary facts. Duchenne and Becker’s muscular dystrophies are caused by different changes in the same gene. That gene is the largest found to date, spanning over 2 million base pairs (table 3-3). It is broken into at least 60 exons.

The scientific collaboration that led to the discovery of dystrophin was notably efficient. One paper alone listed 77 authors from 24 research institutions in 8 countries. Molecular probes, clones, and materials from affected patients were openly exchanged, hastening researchers in their quest for the culprit gene.

SOURCES:
K.H. Fischbeck, A.W. Ritter, D.L. Tirschwell, et al., “Recombination With pERT87 (DXS164) in Families With X-linked Muscular Dystrophy,” Lancet 2(July):104, 1986.
E.P. Hoffman, A.P. Monaco, C.C. Feener, et al., “Conservation of the Duchenne Muscular Dystrophy Gene in Mice and Humans,” Science 238:347-350, 1987.
E.P. Hoffman, R.H. Brown, and L.M. Kunkel, “Dystrophin: The Protein Product of the Duchenne Muscular Dystrophy Locus,” Cell 51:919-928, 1987.
M. Koenig, E.P. Hoffman, C.J. Bertelson, et al., “Complete Cloning of the Duchenne Muscular Dystrophy (DMD) cDNA and Preliminary Genomic Organization of the DMD Gene in Normal and Affected Individuals,” Cell 50:509-517, 1987.
L.M. Kunkel et al., “Analysis of Deletions From Patients With Becker and Duchenne Muscular Dystrophy,” Nature 322:73-77, 1986.
A.P. Monaco, R.L. Neve, C. Colletti-Feener, et al., “Isolation of Candidate cDNAs for Portions of the Duchenne Muscular Dystrophy Gene,” Nature 323:646-650, 1986.
A.P. Monaco, C.J. Bertelson, C. Colletti-Feener, et al., “Localization and Cloning of Xp21 Deletion Breakpoints Involved in Muscular Dystrophy,” Human Genetics 75:221-227, 1987.
G.J. van Ommen, J.M. Verkerk, M.H. Hofker, et al., “A Physical Map of 4 Million Base Pairs Around the Duchenne Muscular Dystrophy Gene on the X Chromosome,” Cell 47:499-504, 1986.

(53). The cDNA clones isolated by forward genetics could be used to make a cDNA map that illustrates the chromosomal locations of expressed regions of DNA. This cDNA map, plus a library of previously ordered clones of genomic DNA, would be valuable tools for studying the role of certain genes in the manifestation of disease. Knowledge of the mechanisms directing normal cellular functions will probably lead to important sources of new therapies for human diseases: natural human proteins made from isolated human genes, engineered proteins, and conventionally synthesized drugs designed from a knowledge of the structure of the proteins they target. Advances in the development of human therapeutic products will be made more rapidly if research in the areas of protein engineering, the relationship of protein structure to function, rational drug design, and others parallels genome mapping efforts.


Photo credit: Nancy Wexler, Columbia University, New York, NY

A Venezuelan man with Huntington’s disease, a rare, late-onset genetic disease that causes degeneration of nerve cells in the brain.

Prospects for Human Gene Therapy

Clinical use of human genetic linkage and physical maps, now largely restricted to diagnosis, may eventually include the insertion of normal DNA directly into human cells to correct a particular genetic defect (54). This practice is called human gene therapy. Advances in gene therapy will depend on development of ways to insert DNA into cells safely and to ensure that the inserted DNA corrects the defect (54). Gene mapping will not improve gene therapy directly, and for most diseases the ability to make a diagnosis will precede the availability of an effective treatment. The knowledge gained through use of gene maps will, however, enhance knowledge about the function of genes and thus indirectly improve the prospects for gene therapy [Friedmann, see app. A].

Table 3-4.—Some Human Gene Products With Potential as Therapeutic Agents

Gene product
● Actual or potential therapeutic application

Atrial Natriuretic Factor
● Possible applications in treatment of hypertension and other blood pressure disorders, and for some kidney diseases affecting excretion of salts and water.

Alpha Interferon (a)
● Approved for treatment of hairy cell leukemia; possible broader applications in other cancers.

Beta Interferon
● Inhibits viral infections and may be useful as an anticancer treatment.

Epidermal Growth Factor
● Expected to have applications in wound healing, including burns, and for cataract surgery.

Erythropoietin
● Anticipated treatment use for anemia resulting from chronic kidney disease.

Factor VIII:C (b)
● Prevents bleeding in patients with hemophilia A after injury.

Fibroblast Growth Factor
● Possible use in wound healing and treating burns.

Gamma Interferon
● Possible treatment for scleroderma and arthritis.

Granulocyte Colony Stimulating Factor
● Possible treatment for Acquired Immune Deficiency Syndrome (AIDS) and leukemia.

Human Growth Hormone (a)
● Approved as a treatment for childhood dwarfism; expected to have broader therapeutic potential in treatment for short stature resulting from Turner’s syndrome and for wound healing.

Insulin (a)
● Approved for treatment of diabetes.

Interleukin-2
● Possible treatment for various cancers.

Macrophage Colony Stimulating Factor
● Potential applications are for treatment of infectious diseases, primarily parasites; possible cancer therapy.

Superoxide Dismutase
● Possible preventive treatment for damage caused by oxygen-rich blood entry into oxygen-deprived tissues (e.g., during organ transplants).

Tissue Plasminogen Activator
● Approved as treatment for dissolving blood clots associated with heart attacks.

Tumor Necrosis Factor
● Possible anti-tumor therapy.

(a) Approved for commercial sale in the United States by the Food and Drug Administration.
(b) Non-recombinant DNA version has been approved for sale, but the cloned gene product has not.
SOURCE: Office of Technology Assessment, 1988.


APPLICATIONS IN HUMAN PHYSIOLOGY AND DEVELOPMENT

Studies aimed at understanding the molecular basis of inherited diseases may yield information that can be generalized to other physiological processes. Knowledge of the structure and function of genes associated with Alzheimer’s disease, for example, might give important clues to the cellular mechanisms regulating aging of brain tissue.

The organization of genes in genomes is another fundamental issue in biology. Is it important for genes to exist on a particular chromosome in a particular order? Comparisons of physical maps of the chromosomes of higher organisms could shed some light on the extent to which gene organization is associated with gene expression and gene function.

The nucleotide sequences of human genes have been and will continue to be important research tools for understanding the basic cellular processes underlying physiology and development. Nevertheless, knowing the DNA sequences of genes and how they translate into the amino acid sequences of protein products is not sufficient to establish how such genes are controlled or how the gene products function in a particular cell or in the organism as a whole. The genetic code that guides the translation of DNA sequence into protein sequence offers only the first step in unraveling the mysteries of the human genome. Understanding the relationship between protein structure and function is the crucial next step, but it faces the greatest number of technical bottlenecks (see box 3-C).

Identification of Protein-Coding Sequences

Individual efforts to clone particular genes will not be eliminated by the availability of genetic linkage and physical maps; rather they will be redirected toward localizing a particular gene within a region of a chromosome or within the DNA sequence of that region. Because human genes are more often interrupted by introns, the identification of the exon and regulatory sequences in and around genes has proved more difficult in human beings than in lower organisms. The most reliable method of identifying exons is to know the amino acid sequence of the protein product and, using the genetic code, find the corresponding DNA sequence by inspecting the whole gene sequence. DNA sequences can be determined at a faster rate than proteins can be isolated and sequenced, however, so computer-assisted methods offer a more practical approach.
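The approach of working back from a known amino acid sequence to the DNA that encodes it can be sketched in a few lines. The example below is a minimal illustration, not a description of any software cited in this report. It uses the standard genetic code, translates a DNA strand in each of its three forward reading frames, and reports where a known peptide is encoded. The function names and the toy sequence are hypothetical, and the sketch only finds a peptide that lies entirely within one uninterrupted coding stretch (a single exon); it does not search the reverse strand.

```python
from itertools import product

# Standard genetic code, listed in T, C, A, G order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate DNA (coding strand, A/C/G/T only) codon by codon from
    position 0. Stop codons appear as '*'; a trailing partial codon is ignored."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3)
    )


def find_peptide(dna: str, peptide: str):
    """Return (start, end) DNA coordinates (0-based, end exclusive) of the
    first stretch that encodes `peptide` in any forward frame, or None."""
    for frame in range(3):
        index = translate(dna[frame:]).find(peptide)
        if index >= 0:
            start = frame + 3 * index
            return start, start + 3 * len(peptide)
    return None


if __name__ == "__main__":
    toy_dna = "CCATGAAAGTTTAA"  # hypothetical sequence for illustration
    print(find_peptide(toy_dna, "MKV"))
```

A peptide that spans an exon boundary would have to be searched for in pieces, which is one reason the purely computational prediction methods discussed next are attractive.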

There is a variety of computer software available for predicting exon sequences, some of it more reliable than others (12,14,48) [Mount, see app. A]. As more DNA sequences become available, methods for predicting exons can continue to be refined. Computer scientists argue that the analysis phase of whole genome sequencing projects will progress efficiently only if the development of new computational and other theory-based predictive methods that can accommodate large sequences is emphasized.

Photo credit: Shirley Tilghman, Princeton University, Princeton, NJ

Electron micrograph revealing an intron sequence interrupting the protein-coding sequences of the mouse beta-globin gene. DNA containing the gene (including intron sequences) was allowed to hybridize (base pair) with beta-globin mRNA that had been isolated from cells in its mature form with no intron sequences. A loop appears in the region of the intron where no complementary sequences exist between the two molecules (see arrow).


Box 3-C.— From Gene Structure to Protein Structure: The Protein-Folding Problem

“Protein-folding is the genetic code expressed in three dimensions,” according to Fetrow and co-workers. How does the linear sequence of amino acids code for a protein’s structure? How does the three-dimensional conformation of a protein drive its function? Sometimes the amino acid sequence of a protein with an unknown function is similar to that of a protein with a known function; in many such cases, the similarity is a valid indicator of comparable jobs. In other cases, the three-dimensional structure of a protein (the amino acid sequence folded into the actual structure of the protein) gives more reliable clues about function. It is therefore important to develop experimental and theoretical means for determining the three-dimensional structures of proteins. Because proteins are so large, often consisting of multiple domains (discrete portions) with different functions, this generally involves analysis of how each part of a protein contributes to its overall structure.

There is experimental evidence that certain structural domains can serve similar functions in a number of different proteins. It is the combination of domains that gives a protein its unique overall function. A stretch of amino acids in one protein can be nearly identical in sequence to that in another protein, but if the surrounding amino acid sequences are different, then the sequences might fold into domains with quite different three-dimensional structures. At present, scientists cannot predict with certainty how the linear sequence of amino acids in a protein will fold into the protein’s three-dimensional structure; this is the protein-folding problem. As genome mapping projects make more gene sequences available, the problem will take on even greater significance. The National Academy of Sciences in a recent report called protein folding “the most fundamental problem at the chemistry-biology interface, and its solution has the highest long-range priority.”

Most predictions of three-dimensional structure are based on theories of the behavior of amino acids in certain chemical and physical environments and on information gleaned from viewing the atomic structures of proteins through X-ray diffraction. (X-ray diffraction of protein crystals is an important tool in structural biology, the field dedicated to the study of proteins and other macromolecular structures. It is the most important technique for determining the three-dimensional structure of large proteins at the atomic level.) Existing methods for predicting structure are not reliable for all proteins or protein domains, because structural data are available for only about 200 proteins and for even fewer classes of proteins. There are few membrane proteins in the structure database, for example, and thus little experimental basis for testing predictions about how such important proteins will fold. More structures of proteins need to be determined, using X-ray crystallographic and other biophysical technologies, in order to provide a solid foundation for protein-folding theories. Once the protein-folding problem is solved, the road from gene sequence to gene function will be considerably shortened and will lead, in some cases, toward the development of promising new human therapeutic products.

SOURCES:
T. Blundell, B.L. Sibanda, M.J.E. Sternberg, and J.M. Thornton, “Knowledge-Based Prediction of Protein Structures and the Design of Novel Molecules,” Nature 326:347-352, 1987.
J.S. Fetrow, M.H. Zehfus, and G.D. Rose, “Protein Folding: New Twists,” Bio/Technology 6:167-171, 1988.
T. Koetzle, Brookhaven National Laboratory, Upton, NY, personal communication, March 1987.
National Academy of Sciences, Research Briefings (Washington, DC: National Academy Press, 1986).

Approaches to Understanding Gene Function

Isolating a gene is not nearly as difficult as determining how the gene and its products function in the cell. The following are some experimental approaches to solving this problem:

● to modify or inactivate the normal function of a gene by replacing it with a modified version,

● to inhibit the function of a gene’s mRNA or protein product by using antibodies to the protein or an RNA complementary to the mRNA, and

● to compare the DNA sequence of a cloned gene of unknown function with those of genes whose functions are known.

Using the first two methods, scientists have studied the function of gene products by identifying alterations in the biochemical or physical characteristics of the affected cell or organism (2,15,24,26,30,36,39,44,50,51). The third strategy is theoretical, using sequence data accumulated from previously characterized gene products to predict a function for a newly identified gene. Such predictions can then be tested experimentally.

Probably the most widely used first step in determining the role of a gene is to find similarities between its DNA sequence and those of genes from other organisms. Yeast, for example, shares with animal cells many of the molecules and processes that are being studied intensively in modern cell biology, including the factors modulating cell structure and dynamics, the components of the machinery that modulates protein secretion from cells, the constituents of basic chemical pathways, and analogs of several mammalian oncogenes (genes involved in controlling the rate of cell growth). Many of these factors and processes were first identified or characterized, or both, in higher organisms, but the application of them to yeast genetics has provided new insights (57).
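A similarity search of the kind described above can be sketched very simply. The example below is a toy, under stated assumptions: it slides a short query along each known sequence without allowing gaps and scores the best placement by the fraction of matching bases. Practical tools use gapped alignment and statistical scoring, which this sketch does not attempt. The gene names and sequences are hypothetical.

```python
def best_ungapped_identity(query: str, target: str) -> float:
    """Highest fraction of matching bases over every ungapped placement of
    `query` along `target`. Returns 0.0 if the query is empty or longer
    than the target."""
    if not query or len(query) > len(target):
        return 0.0
    best = 0
    for offset in range(len(target) - len(query) + 1):
        window = target[offset:offset + len(query)]
        matches = sum(q == t for q, t in zip(query, window))
        best = max(best, matches)
    return best / len(query)


def rank_known_genes(query: str, known: dict) -> list:
    """Return (name, score) pairs for the known genes, most similar first."""
    scores = {
        name: best_ungapped_identity(query, seq) for name, seq in known.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical sequences, chosen only to illustrate the ranking.
    known_genes = {
        "yeast_gene_x": "TTTTATGGCCAAGTTT",
        "yeast_gene_y": "GGGGGGGGGGGGGGGG",
    }
    for name, score in rank_known_genes("ATGGCCAAGA", known_genes):
        print(name, score)
```

A high score suggests, but does not prove, a related function; as the text notes, such predictions must still be tested experimentally.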

A recent study reported the use of yeast cells to isolate a human gene that can substitute for a yeast gene in regulating the yeast cell’s life cycle (34). Plasmid vectors carrying cDNA were introduced into yeast cells to find a human gene product that was similar enough to a yeast gene to replace it in the regulation of the yeast cell’s life cycle. This was accomplished by mutating the yeast’s copy of the gene and then finding cells that, upon introduction of the appropriate human gene, appeared to regain their normal function. The human cDNA clone identified by this technique is thus a candidate for a protein that regulates life cycles in human cells. Studies of the genomic version of the human gene will be necessary to definitively establish the role of this product in human cell cycle regulation. This example illustrates how yeast genetics and biochemistry can be used to identify human genes with important functions (57).

Photo credit: Donald Riddle, University of Missouri, Columbia, MO

Micrograph comparing the appearance of a short, fat nematode mutant called “dumpy” (above) with that of a normal nematode (below). The mutant grows to only two-thirds of the normal body length because of a mutation in a gene for a type of collagen (a protein) needed for normal development (magnified 80 times).

In fruit flies, genetic research and the tools of recombinant DNA have made it clear that certain DNA sequences are involved in regulating the development of the organism. Different sets of genes appear to be expressed at different times in the course of development, causing the patterns observed in the developing embryo. How gene expression is regulated to create developmental patterns is a central question in biological studies of many organisms. Fruit flies are easy to dissect and to manipulate genetically, and much is known about their development; they have therefore proven to be a very useful model. One DNA sequence, called the homeo box, was first identified in the genes of fruit flies and later in those of higher organisms, including human beings (31). There is substantial evidence that the homeo box, a short stretch of nucleotides of nearly identical sequence in the genes that contain it, determines when the expression of particular groups of genes is turned on and off during development of the fruit fly (35). As more gene sequences from fruit flies, human beings, and other organisms are determined, more knowledge about the signals governing developmentally expressed genes is likely to be acquired [Mount, see app. A].


Photo credit: John Postlethwait, University of Oregon, Eugene

Mutations in the gene Antennapedia, a homeotic gene, cause the fruit fly to develop an extra pair of antennas. Pictured at left is the normal fruit fly, and at right a fly with the mutation. Homeotic genes have counterparts in humans and other vertebrates; each gene has a characteristic DNA sequence within its protein-coding sequences called the homeo box.

APPLICATIONS IN MOLECULAR EVOLUTION

The disciplines of population biology, genetics, molecular biology, and cellular biology merge in the study of how species evolve, constituting the field of molecular evolution. The construction of a physical map of the human genome will permit molecular analysis of several questions fundamental to evolution, including how genomes change and what factors cause them to change, as well as how small-scale changes relate to the overall evolution of the organism (45).

Species with different degrees of relatedness can be usefully compared because their genes, and thus the proteins encoded by those genes, will have differing rates of sequence divergence. The course of human evolution can be read in the sequences of proteins (14). Comparisons of human and mouse DNA sequences are probably the most useful in the identification of genes unique to higher organisms because mouse genes are more homologous to human genes than are the genes of any other well-characterized organism. Comparisons of human DNA sequences with those of lower organisms such as the fruit fly or nematode are most useful in the identification of genes encoding proteins that are essential to all multicellular organisms. Finally, since yeasts are single-celled eukaryotes (cells whose chromosomes are contained in nuclei), their sequences are most useful in the identification of genes that make proteins whose functions are essential to the life of all eukaryotic cells because such proteins would be least likely to have undergone major changes in the course of their evolution [Mount, see app. A]. Table 3-5 shows how human proteins can be classified by their period of invention: from ancient, to middle-age, to modern (14).


Table 3-5.—Classification of Human Proteins by Invention Period

I. Ancient proteins
   A. First editions. Direct-line descendancy to human and contemporary prokaryotes. Mostly enzymes involved in metabolism.
   B. Second editions. Homologous sequences in human and prokaryotic proteins, but apparently different functions.

II. Middle-age proteins
   Proteins found in most eukaryotes but prokaryotic counterparts are as yet unknown.

III. Modern proteins
   C. Recent vintage. Proteins found in animals or plants but not both. Not found in prokaryotes.
   D. Very recent inventions. Proteins found in vertebrate animals but not elsewhere.
   E. Recent mosaics. Modern proteins clearly the result of shuffling exons.

SOURCE: Adapted from Doolittle, R.F., Feng, D.F., Johnson, M.S., and McClure, M.A., “Relationships of Human Protein Sequences to Those of Other Organisms,” Cold Spring Harbor Symposia on Quantitative Biology 51:447-456, 1986.

Physical map and sequence data accumulated from many species over the past 10 years have led scientists to recognize patterns of genome change quite different from those proposed earlier. Now, molecular evolutionists are beginning to understand such patterns as the duplication and acquisition of new genes and their corresponding functions, differences in the use of the genetic code among different organisms (48), and differences in the occurrence of gene families in different species (45).

Important questions in molecular evolution arise from the fact that the genes of prokaryotes (organisms without nuclei, e.g., bacteria), as well as many genes in yeast and some multicellular organisms, are not interrupted by introns:

● Are the introns found in genes today descendants of extra or unused DNA from bacteria and eukaryotes such as yeast?

● Did prokaryotes rid themselves of intron sequences, or did they never have them?

● How did intron sequences get into the genes that code for modern proteins (14)?

By sequencing similar genes from many species, scientists have found that some introns have been in place for very long evolutionary periods and that the positions of introns within genes divide the genes in ways that correspond to the distinct functional domains of the proteins’ structures (5,22,49). These observations have led to new models of molecular evolution (8,13,19-21,47). The availability of more gene sequence data should facilitate the assessment of theories about the evolution of genes and gene structures.

Sequences of more human genes, high-level understanding of variations in genomic organization among individuals, and analyses of differences between human beings and other organisms should aid in the evaluation of molecular evolutionary theories on how species originate (see boxes 3-D, 3-E, and 3-F). Recognition of differences in rates of nucleotide substitution, recombination, and other mechanisms responsible for variation in the human genome will lead to a better understanding of the molecular basis of these processes and of the constraints on each. Evaluation of proposed models for the propagation and evolution of multigene families, such as certain classes of cell surface receptors, requires a detailed knowledge not only of the relatedness of the DNA sequences in these genes, but also of their locations in the genome and the DNA sequences of the regions surrounding them (45).

Box 3-D.—Constructing the Evolutionary Tree: Morphology v. Molecular Genetics in the Search for Human Origins

Ever since Linnaeus, biologists have classified animals according to similarities and differences in form and structure. When the concept of evolution took root, these morphological features were used to establish phylogenies (trees or lineages that indicate the evolutionary relationships among species). New and sophisticated methods of genetic analysis have challenged morphology as the prime determinant of family trees. Recent debates about human origins have revealed the potential power of genetic techniques for evolutionary studies.

For the past two decades or so, anthropologists and biologists studying the problem of primate evolution have agreed that chimpanzees and gorillas are closely related enough to be classified in the same family, while humans stand alone in a separate, more distant family. Morphological evidence favors this view. Both chimps and gorillas, for example, walk on their knuckles; humans do not, and the fossils of their most direct ancestors show no features associated with knuckle walking. Chimps and gorillas also share similarities in the thickness and structure of their tooth enamel which suggest a common ancestry separate from humans.

Analyses of the DNA of chimps, apes, and human beings contradict this view. Scientists recently examined comparable segments of DNA in the region of the beta-globin gene from human beings, chimpanzees, gorillas, and orangutans. They sequenced 4,900 base pairs of DNA from this region in each organism, then appended data for nearby regions for which sequences had already been published. In all, they compared a 7,100-base-pair region and concluded that chimpanzee and human gene sequences were the least divergent. The most parsimonious explanation of the data was that human and chimpanzee are more closely related to each other than either is to the gorilla.

The beta-globin study, while it strongly suggests that chimpanzees are the closest cousins of human beings, does not conclusively end the search for human origins. Contradictions remain in the evidence gathered from comparative anatomy and from genetic analysis; studies of other gene loci will be necessary to settle the matter.

SOURCES:
R.L. Cann, “In Search of Eve,” The Sciences (September/October):30-37, 1987.
R. Lewin, “My Close Cousin the Chimpanzee,” Science 239:273-275, 1987.
M.M. Miyamoto et al., “Phylogenetic Relations of Humans and African Apes From DNA Sequences in the Globin Region,” Science 239:369-373, 1987.
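The comparison described in box 3-D rests on counting differences between aligned sequences and asking which pair of species differs least. The sketch below illustrates that calculation only. The four short sequences are invented for the example and are not data from the beta-globin study, and the function names are hypothetical. The actual study also used parsimony analysis of the whole tree, which this sketch does not attempt.

```python
from itertools import combinations


def divergence(a: str, b: str) -> float:
    """Fraction of aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def least_divergent_pair(aligned: dict) -> tuple:
    """Return the pair of names whose aligned sequences differ least."""
    return min(
        combinations(sorted(aligned), 2),
        key=lambda pair: divergence(aligned[pair[0]], aligned[pair[1]]),
    )


# Hypothetical 20-base alignment, invented purely for illustration.
TOY_ALIGNMENT = {
    "human":     "ACGTACGTACGTACGTACGT",
    "chimp":     "ACGTACGTACGTACGTACGA",
    "gorilla":   "ACGTACGAACGTACGTACCA",
    "orangutan": "TCGTACGAACGTTCGTACCA",
}

if __name__ == "__main__":
    for first, second in combinations(sorted(TOY_ALIGNMENT), 2):
        d = divergence(TOY_ALIGNMENT[first], TOY_ALIGNMENT[second])
        print(first, second, d)
    print("least divergent:", least_divergent_pair(TOY_ALIGNMENT))
```

With real data the region compared would be thousands of base pairs long, as in the 7,100-base-pair comparison described in the box, so that small differences in divergence are not swamped by chance.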


BOX 3-E. —The Origin of Human Beings: Clues From the Mitochondrial Genome

For more than a century, archaeologists, anthropologists, and biologists have been digging through layers of dirt and rock, sieving fossils and artifacts, in an attempt to figure out when, where, and how human beings differentiated from other primates to become a unique species. These scientists have relied on a variety of tools, everything from the picks and axes used to dig up fossils to sophisticated techniques for determining the age of the bones they have unearthed. Unfortunately, archaeological digs do not always yield perfect clues: Even well-preserved fossil remains are generally incomplete, and there are still missing links, cases in which fossils that could hint at the genealogy of several precursor species have not been found. Thus, it has been difficult to determine exactly when human beings diverged from prehistoric ancestors to become the species now known as Homo sapiens.

The development of molecular genetic techniques for analyzing DNA offers a new source of evidence in the ongoing debate about human origins. Techniques for mapping and sequencing DNA allow researchers to compare different species and different individuals from the same species at the most basic level. These comparisons can aid evolutionary studies.

One promising approach is the study of the DNA sequences of mitochondria, small structures that are found in the cells of all multicellular organisms. Mitochondria are the power plants of eukaryotic cells. They produce energy for life processes by providing a site for the combination of oxygen and food molecules. Without them, cells would depend on less efficient processes of energy production and could not survive in an environment containing oxygen. Mitochondria have much in common with bacteria: They are similar in size and shape, they both contain DNA, and they each reproduce by dividing in two.

The DNA in mitochondria can be more useful for some evolutionary studies than the DNA in cell nuclei, for several reasons. First, since it lies outside the cell nucleus and sexual recombination occurs only within the nuclei of sperm and egg cells, mitochondrial DNA is not recombined during sexual reproduction. It is inherited only from the mother. Consequently, changes in the nucleotide sequences are due only to mutation and not to the natural shuffling of DNA that occurs during reproduction. Second, DNA in the mitochondria is not protected as well as DNA in the nucleus, nor does it have the same kinds of mechanisms for repair. Thus, mitochondrial DNA mutates about 10 times as fast as the chromosomal DNA in the cell’s nucleus, which means that the mitochondrial genome has evolved more rapidly than the chromosomal genome. Finally, mitochondria are relatively small: They contain approximately 16,000 base pairs, considerably fewer than the 3 billion base pairs in the entire set of human chromosomes, making them easier to analyze.

These three characteristics of mitochondrial DNA (absence of sexual recombination, a high natural mutation rate, and small size) have helped scientists construct a “molecular clock” that can be used to help establish the approximate time and place of human origins. By calculating the rate at which mitochondrial DNA changes and then comparing the DNA sequences of mitochondria from many individuals, researchers have begun to formulate genealogical trees. For example, scientists sequenced samples of mitochondrial DNA from 140 people around the world and used the information to propose that the first Homo sapiens lived 200,000 years ago on the African continent. Prior to these findings, anthropologists speculated that human beings originated nearly 1 million years ago. Debate continues among scientists about the validity and proper application of mitochondrial DNA sequences in evolutionary studies, but it is clear that molecular genetics will play a growing role in this area.

SOURCES:
B. Alberts, D. Bray, J. Lewis, et al., “The Evolution of the Cell,” in Molecular Biology of the Cell (New York, NY: Garland Publishing, 1983).
R.L. Cann, “In Search of Eve,” The Sciences (September/October):30-37, 1987.
R.L. Cann, M. Stoneking, and A.C. Wilson, “Mitochondrial DNA and Human Evolution,” Nature 325:31-36, 1986.
R. Lewin, “Molecular Clocks Turn a Quarter Century,” Science 239:561-563, 1988.
J. Tierney, L. Wright, and K. Springen, “The Search for Adam and Eve,” Newsweek, Jan. 11, 1988, pp. 46-52.
J. Wainscoat, “Out of the Garden of Eden,” Nature 325:13, 1986.
J.D. Watson, N.H. Hopkins, J.W. Roberts, et al., Molecular Biology of the Gene (Menlo Park, NJ: Benjamin/Cummings Publishing, 1987).
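
The molecular clock reasoning in the box above can be illustrated with a minimal sketch. It assumes the simplest possible clock: a constant substitution rate and no correction for repeated changes at the same site. The sequences, the rate, and the function names are hypothetical and chosen only for illustration; this is not the analysis used in the studies cited.

```python
def pairwise_divergence(seq_a: str, seq_b: str) -> float:
    """Return the fraction of aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must be non-empty")
    mismatches = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return mismatches / len(seq_a)


def years_since_common_ancestor(divergence: float, rate_per_site_per_year: float) -> float:
    """Estimate time back to a common ancestor under a constant-rate clock.

    Changes accumulate independently along both lineages, so the observed
    divergence is divided by twice the per-lineage substitution rate.
    """
    if rate_per_site_per_year <= 0:
        raise ValueError("rate must be positive")
    return divergence / (2.0 * rate_per_site_per_year)


# Hypothetical toy data: two aligned 20-base fragments differing at one site.
seq_a = "ACGTACGTACGTACGTACGT"
seq_b = "ACGTACGAACGTACGTACGT"

# Hypothetical rate, in substitutions per site per year (illustration only).
assumed_rate = 1e-8

divergence = pairwise_divergence(seq_a, seq_b)
estimated_years = years_since_common_ancestor(divergence, assumed_rate)
print(f"divergence = {divergence:.3f}, estimated years = {estimated_years:,.0f}")
```

The estimate scales inversely with the assumed rate, which is one reason the calibration of the clock remains a point of debate.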


Box 3-F.—Molecular Anthropology

Anthropologists working in a central Florida bog recently discovered 8,000-year-old human skeletons with well-preserved brains, some of which have provided the oldest available samples of human DNA. Before this discovery, samples of DNA had been available only from the dried tissue remains of archaeological specimens from more arid regions. The fact that DNA can be preserved in other than excessively dry conditions greatly increases the number of archaeological sites at which more ancient DNA samples may be discovered. DNA fragments have also been prepared from Egyptian mummies, from an extinct animal called a quagga, and from a 35,000-year-old bison from Alaska. Biologists have been trying to clone these DNA fragments for use in studies of evolution. The sample from the extinct bison is likely to be old enough for comparison with modern buffalo DNA; this comparison may provide clues to the mechanisms of genome evolution. The human DNA samples, although important discoveries, are too recent to be particularly informative in studies of molecular evolution. As methods for working with the DNA extracted from these ancient species are improved, and as more specimens are uncovered, the application of gene mapping and sequencing technologies to anthropology and archaeology will be more feasible.

SOURCES:
B. Bower, “Human DNA Intact After 8,000 Years,” Science News, Nov. 8, 1986, p. 293.
G.H. Doran, D.N. Dickel, W.E. Ballinger, et al., “Anatomical, Cellular and Molecular Analysis of 8,000-Year-Old Brain Tissue From the Windover Archaeological Site,” Nature 323:803-806, 1986.

APPLICATIONS IN POPULATION BIOLOGY

Population biologists study populations by analyzing many individuals. They are interested in similarities and differences among individuals, among groups, among varieties, and among species. To address such questions as how geography and environment affect inheritance patterns of certain traits, a physical map and a complete sequence of a single reference genome are not particularly valuable. It would be more useful to have corresponding sequence information from widely diverse geographical areas, from various religious and ethnic subgroups, and from all races (9).

Population geneticists studying human beings, plants, or animals make great use of molecular markers (RFLPs and, increasingly, sequences of specific regions) to assess the extent of genetic variability (see box 3-G). Information on the same small chromosomal region (e.g., a gene or a region important for gene expression) from many individuals might be more useful than information on larger chromosomal regions from a few persons (43). Genes for rare diseases are not all found in a single human genome: Sickle cell hemoglobin, for instance, might not have been discovered if only Northern Europeans had been studied (4,9).

Problems in population genetics that bear on public health involve finding means for estimating human mutation rates,¹ for studying susceptibility to pathogens such as the virus responsible for acquired immune deficiency syndrome (AIDS), and for assessing possible environmental influences on these phenomena. The mechanisms generating physical variability among human beings are by no means well understood and involve not only genetic factors, but, among other things, a complex set of environmental factors. DNA sequences from representative portions of many human genomes would also be of more immediate use than whole genome sequences for monitoring the effects of specific environmental factors on the structure of the human genome (9).

¹ An OTA report assesses these scientific issues: U.S. Congress, Office of Technology Assessment, Technologies for Detecting Heritable Mutations in Human Beings, OTA-H-298 (Washington, DC: U.S. Government Printing Office, September 1986).


BOX 3-G.—Implications of Genome Mapping for Agriculture

Since the dawn of agriculture, people have manipulated plants to enhance desired traits simply by observing the results of breeding, with no true understanding of the genetic principles involved. Many scientists working in the field of plant molecular biology believe that genome projects will have important implications for agriculture, by increasing knowledge about the genes that control or influence yield, time to maturation, nutritional content, resistance to disease, insects, and drought, and other factors in the production of crops.

The first gene maps ever constructed were assembled as a result of a series of painstakingly detailed crosses of pea plants and statistical analyses of data carried out by Austrian monk Gregor Mendel. Mendel was the first to recognize that some traits could be transmitted according to regular hereditary patterns. All modern genetics and much of modern biology build upon the foundation laid by Mendel.

Construction of RFLP marker maps has begun for corn, tomatoes, cabbage, and other crop plants. Such genetic maps give plant breeders the ability to use gene structure rather than observable characteristics to develop new varieties of plants. This ability should facilitate the development of intricate strategies for manipulating complex traits controlled by multiple, interacting genes.

The availability of RFLP maps makes it possible to select for several unrelated traits simultaneously or to manipulate traits controlled by clusters of genes that interact in complex ways. Researchers have mapped three genes that control efficiency of water use (drought tolerance) and five genes that have a major impact on flavor and soluble solids in tomatoes. Three genes that make a major contribution to insect resistance in tomatoes have also been mapped. A group of genes that influences yield has been found in corn. RFLP maps of genes influencing equally important traits are being developed for alfalfa, azaleas, cucumbers, onions, roses, sugar beets, and grasses.

Recently, there has been renewed interest in a small flowering plant called Arabidopsis thaliana, a weed in the mustard family. Although this plant has no obvious economic or nutritional value, it is a valuable research tool for plant molecular biologists. The Arabidopsis genome, at about 70 million base pairs, is about 10 percent or less the size of some of the major crop plant genomes, such as cotton, tobacco, or wheat. The small size of this plant makes it an important model system for studying general mechanisms of gene regulation that may be directly applicable to economically important but genetically less tractable plants. For these reasons, work has already begun on making complete genetic linkage and contig maps of the Arabidopsis genome.

SOURCES:
H. Bollinger, Native Plants Incorporated, personal communication, Jan. 19, 1988.
Judson, see app. A.
Mount, see app. A.
S.J. O'Brien, Genetic Maps 1987 (Cold Spring Harbor, NY: Cold Spring Harbor Laboratory, 1987).
P.P. Pang and E.M. Meyerowitz, “Arabidopsis Thaliana: A Model System for Plant Molecular Biology,” Biotechnology 5:1177-1181, 1987.
M. Walton and T. Helentjaris, “Application of Restriction Fragment Length Polymorphism (RFLP) Technology to Maize Breeding,” unpublished manuscript, 1988.
J.D. Watson, N.H. Hopkins, J.W. Roberts, et al., Molecular Biology of the Gene, vol. I (Menlo Park, NJ: Benjamin/Cummings Publishing, 1987).

CHAPTER 3 REFERENCES

1. Barker, D., Wright, E., Nguyen, K., et al., “Gene for von Recklinghausen Neurofibromatosis Is in the Pericentromeric Region of Chromosome 17,” Science 236:1100-1102, 1987.

2. Bass, B.L., and Weintraub, H., “A Developmentally Regulated Activity That Unwinds RNA Duplexes,” Cell 48:607-613, 1987.

3. Bodmer, W.F., Bailey, C.J., Bodmer, J., et al., “Localization of the Gene for Familial Adenomatous Polyposis in Chromosome 5,” Nature 328:614-616, 1987.

4. Bowman, J.E., Department of Pathology, The University of Chicago, personal communication, December 1987.

5. Branden, C.-I., “Anatomy of a/b Proteins,” Current Communications in Molecular Biology: Computer Graphics and Molecular Modeling, R. Fletterick and M. Zoller (eds.) (Cold Spring Harbor, NY: Cold Spring Harbor Laboratory, 1986), pp. 45-51.

6. Cavenee, W.K., Hansen, M.F., Nordenskjold, M., et al., “Genetic Origin of Mutations Predisposing to Retinoblastoma,” Science 228:501-503, 1985.

7. Costs of Human Genome Projects, OTA workshop, Aug. 7, 1987.

8. Craik, C.S., Rutter, W.J., and Fletterick, R., “Splice Junctions: Association With Variation in Protein Structure,” Science 204:264-271, 1983.

9. Crow, J.F., Laboratory of Genetics, University of Wisconsin, Madison, personal communication, May 1987.

10. Davies, K.E., Pearson, P.L., Harper, P.S., et al., “Linkage Analysis of Two Cloned Sequences Flanking the Duchenne Muscular Dystrophy Locus on the Short Arm of the Human X Chromosome,” Nucleic Acids Research 11:2302-2312, 1983.

11. Donis-Keller, H., Green, P., Helms, C., et al., “A Genetic Linkage Map of the Human Genome,” Cell 51:319-337, 1987.

12. Doolittle, R.F., Of URFs and ORFs: A Primer on How To Analyze Derived Amino Acid Sequences (Mill Valley, CA: University Science Books, 1987).

13. Doolittle, R.F., “Genes in Pieces: Were They Ever Together?” Nature 272:581-582, 1978.

14. Doolittle, R.F., Feng, D.F., Johnson, M.S., et al., “Relationships of Human Protein Sequences to Those of Other Organisms,” Molecular Biology of Homo Sapiens: Cold Spring Harbor Symposium on Quantitative Biology 51(part 1):447-455, 1986.

15. Ecker, J.R., and Davis, R.W., “Inhibition of Gene Expression in Plant Cells by Expression of Antisense RNA,” Proceedings of the National Academy of Sciences USA 83:5372-5376, 1986.

16. Egeland, J., Gerhard, D., Pauls, D., et al., “Bipolar Affective Disorders Linked to DNA Markers on Chromosome 11,” Nature 325:783-787, 1987.

17. Estivill, X., Farrall, M., Scambler, P., et al., “A Candidate for the Cystic Fibrosis Locus Isolated by Selection for Methylation-Free Island,” Nature 326:840-845, 1987.

18. Friend, S.H., Bernards, R., Rogelj, S., et al., “A Human DNA Segment With Properties of the Gene That Predisposes to Retinoblastoma and Osteosarcoma,” Nature 323:643-646, 1987.

19. Gilbert, W., “Genes in Pieces Revisited,” Science 228:823-824, 1985.

20. Gilbert, W., “Why Genes in Pieces?” Nature 271:501, 1978.

21. Gilbert, W., Marchionni, M., and McKnight, G., “On the Antiquity of Introns,” Cell 46:151-154, 1986.

22. Go, M., “Correlation of DNA Exonic Regions With Protein Structural Units in Hemoglobin,” Nature 271:90-92, 1981.

23. Gusella, J., Wexler, N., Conneally, P., et al., “A Polymorphic DNA Marker Genetically Linked to Huntington's Disease,” Nature 306:234-238, 1983.

24. Herskowitz, I., “Functional Inactivation of Genes by Dominant Negative Mutations,” Nature 329:219-222, 1987.

25. Knowlton, R.G., Cohen-Haguenauer, O., Van Cong, N., et al., “A Polymorphic DNA Marker Linked to Cystic Fibrosis Is Located on Chromosome 7,” Nature 318:381-382, 1985.

26. Kucherlapati, R., “Gene Replacement by Homologous Recombination in Mammalian Cells,” Somatic Cell and Molecular Genetics 13:447-449, 1987.

27. Lander, E.S., and Green, P., “Construction of Multilocus Genetic Linkage Maps in Humans,” Proceedings of the National Academy of Sciences USA 84:2363-2367, 1987.

28. Lee, W.-H., Bookstein, R., Hong, F., et al., “Human Retinoblastoma Gene: Cloning, Identification, and Sequence,” Science 235:1394-1399, 1987.

29. Mathew, C.G.P., Chin, K.S., Easton, D.F., et al., “A Linked Genetic Marker for Multiple Endocrine Neoplasia Type 2a on Chromosome 10,” Nature 328:527-528, 1987.

30. McGarry, T.J., and Lindquist, S., “Inhibition of Heat Shock Protein Synthesis by Heat-Inducible Antisense RNA,” Proceedings of the National Academy of Sciences USA 83:399-403, 1986.

31. McGinnis, W., Garber, R.L., Wirz, J., et al., “A Homologous Protein-Coding Sequence in Drosophila Homeotic Genes and Its Conservation in Other Metazoans,” Cell 37:403-408, 1984.

32. Monaco, A., Neve, R., Colletti-Feener, C., et al., “Isolation of Candidate cDNAs for Portions of the Duchenne Muscular Dystrophy Gene,” Nature 323:646-650, 1986.

33. National Research Council, Mapping and Sequencing the Human Genome (Washington, DC: National Academy Press, 1988).

34. Nurse, P., and Lee, M.G., “Complementation Used To Clone a Human Homologue of the Fission Yeast Cell Cycle Control Gene cdc2,” Nature 327:31-35, 1987.

35. Patrusky, B., “Homeoboxes: A Biological Rosetta Stone,” Mosaic 18:26-35, 1987.

36. Rebagliati, M.R., and Melton, D.A., “Antisense RNA Injections in Fertilized Frog Eggs Reveal an RNA Duplex Unwinding Activity,” Cell 48:599-605, 1987.

37. Reeders, S.T., Breuning, M.H., Davies, K.E., et al., “A Highly Polymorphic DNA Marker Linked to Adult Polycystic Kidney Disease on Chromosome 16,” Nature 317:542-544, 1985.

38. Royer-Pokora, B., Kunkel, L.M., Monaco, A.P., et al., “Cloning the Gene for an Inherited Human Disorder -- Chronic Granulomatous Disease -- on the Basis of Its Chromosomal Location,” Nature 322:32-38, 1986.

39. Salmons, B., Groner, B., Friis, R., et al., “Expression of Antisense mRNA in h-ras Transfected NIH-3T3 Cells Does Not Suppress the Transformed Phenotype,” Gene 45:215-220, 1986.

40. Scott, J., “Molecular Genetics of Common Diseases,” British Medical Journal 295:769-771, 1987.

41. Seizinger, B.R., Rouleau, G.A., Ozelius, L.J., et al., “Genetic Linkage of von Recklinghausen Neurofibromatosis to the Nerve Growth Factor Receptor Gene,” Gene 49:589-594, 1987.

42. Simpson, N.E., Kidd, K.K., Goodfellow, P.J., et al., “Assignment of Multiple Endocrine Neoplasia Type 2a to Chromosome 10 by Linkage,” Nature 328:528-530, 1987.

43. Siniscalco, M., “On the Strategies and Priorities for Sequencing the Human Genome: A Personal View,” Trends In Genetics 3:182-184, 1987.

44. Smithies, O., Gregg, R.G., Boggs, S.S., et al., Nature 317:230-234, 1985.

45. Stephens, J.C., Human Gene Mapping Library, Howard Hughes Medical Institute, New Haven, CT, personal communication, 1987.

46. St. George-Hyslop, P.H., Tanzi, R.E., and Polinsky, R.J., “The Genetic Defect Causing Familial Alzheimer's Disease Maps on Chromosome 21,” Science 235:885-890, 1987.

47. Stein, J.P., Catterall, J.F., Kristo, P., et al., “Ovomucoid Intervening Sequences Specify Functional Domains and Generate Protein Polymorphism,” Cell 21:681-687, 1980.

48. Stormo, G., “Identifying Coding Sequences,” in Nucleic Acid and Protein Sequence Analysis: A Practical Approach, M.J. Bishop and C.J. Rawlings (eds.) (Oxford: IRL Press, 1987).

49. Sudhof, T.C., Russell, D.W., Goldstein, J.L., et al., “Cassette of Eight Exons Shared by Genes for LDL Receptor and EGF Precursor,” Science 228:893-895, 1985.

50. Thomas, K.R., and Capecchi, M.R., Nature 324:34-38, 1986.

51. Thomas, K.R., Folger, K.R., and Capecchi, M.R., Cell 44:419-428, 1986.

52. Tsui, L.-C., Buchwald, M., Barker, D., et al., “Cystic Fibrosis Locus Defined by a Genetically Linked Polymorphic DNA Marker,” Science 230:1054-1057, 1985.

53. U.S. Congress, Office of Technology Assessment, New Developments in Biotechnology, 4: U.S. Investment in Biotechnology, OTA-BA-360 (Washington, DC: U.S. Government Printing Office, in press).

54. U.S. Congress, Office of Technology Assessment, Human Gene Therapy, OTA-BP-BA-32 (Washington, DC: U.S. Government Printing Office, December 1984).

55. Wainwright, B.J., Scambler, P.J., Schmidtke, J., et al., “Localization of Cystic Fibrosis Locus to Human Chromosome 7cen-q22,” Nature 318:384-386, 1985.

56. White, R., Woodward, S., Leppert, M., et al., “A Closely Linked Genetic Marker for Cystic Fibrosis,” Nature 318:382-384, 1985.

57. Wise, J.A., Department of Biochemistry, University of Illinois, Urbana, personal communication, June 1987.


Chapter 4

Social and Ethical Considerations


CONTENTS

Introduction
Basic Research
Levels of Resolution
Access and Ownership
Commercialization
Diagnostic/Therapeutic Gap
Physician Practice
Reproductive Choices
Eugenic Implications
  Positive Eugenics
  Negative Eugenics
  Eugenics of Normalcy
Attitudes
Role of Government
Duties Beyond Borders
Conclusion
Chapter 4 References

Boxes

A. DNA Fingerprints
B. Determinism and the Human Genome


Chapter 4

Social and Ethical Considerations

“Science is a match that man has just got alight. He thought he was in a room--in moments of devotion, a temple--and that his light would be reflected from and display walls inscribed with wonderful secrets and pillars carved with philosophical systems wrought into harmony. It is a curious sensation, now that the preliminary sputter is over and the flame burns up clear, to see his hands lit and just a glimpse of himself and the patch he stands on visible, and around him, in place of all that human comfort and beauty he anticipated--darkness still.”

H.G. Wells, 1891

“The moral significance of humankind is no more threatened by peeking at the underlying musical notation, the base sequences, than is reading the score of Beethoven's last symphony diminishing to that piece of work.”

Thomas H. Murray, Case Western Reserve University, 1987

INTRODUCTION

As projects to map and sequence the human genome are undertaken, their long-range social and ethical implications need to be considered as part of policy analysis, yet further knowledge is needed before many of these implications emerge. Some will arise in the course of deciding what priority to give genome projects and what level of resolution (coarse genetic linkage map, complete DNA sequence) is most appropriate. More profound ethical questions are posed by possible applications of genetic data for altering the basis of human disease, human talents, and social behavior. Questions about personal freedom, privacy, and societal versus individual rights of access to genetic information are among the most important. A full picture of the human genome will of necessity raise questions about the desirability of using genetic information to control and shape the future of human society. The complexity and urgency of these issues will increase in proportion to advances in mapping and sequencing.

Part of the reason for studying genomes is to see how variations in genes account for differences among people. Some of the issues raised in this chapter relate specifically to these variations: What will be the impact of discovering that, in their genetic endowment, human beings are either more equal or more unequal than we now suppose? Other problems do not concern genetic differences, but rather the impact of discovering the extent to which genes do or do not limit the options of human beings in general. One commentator has argued that scientists bear a responsibility for using “moral imagination” to anticipate the full range of uses and consequences of their work, especially when that work is in the basic sciences (2).

The social considerations raised by genome projects include ethical issues. Ethical issues often arise in the context of debates about values, principles, or human actions that have had particular merit in the past. Such debates about what ought to be done often cannot be resolved by empirical inquiry. Specific genetic information such as the location of a gene along a chromosome or the sequence of nucleotide bases composing a specific gene is value-neutral and as such is not ethically troublesome. However, questions about private investment versus the allocation of Federal resources or about the proper use and availability of genetic information are ethical questions because they involve choices among actions based upon competing notions about what is good, right, or desirable.


Competing ideas about the desirable course of human action are developed from considerations about the greater good, personal freedom, benefiting others, avoiding harm, and fairness and equality. It is important to note that the ethical issues surrounding the use of and access to genetic information are not unique to the enterprise of mapping and sequencing the human genome (10) (see box 4-A). The existing uses of genetic screening, which in most cases are based on incomplete information about the location of a specific gene, already raise ethical questions. In addition, some general ethical questions are moot because of contemporary realities, for example, the question of whether there should be any human genome mapping and sequencing activities at all. This question is moot because mapping and sequencing projects have been underway for over a decade and

BOX 4-A.—DNA Fingerprints

DNA fingerprints are derived from traces of human biological material such as blood, semen, hair, or other tissue. Recombinant DNA technology is applied to these samples to identify patterns of genetic sequence that are unique to each human being. Matched DNA fingerprints can establish the identity of a given individual with near certainty. DNA fingerprints, therefore, have great practical use in establishing the identity of criminals, family members, or bodily remains.

Genetic fingerprinting raises ethical issues such as the maintenance of personal autonomy when tissue samples are requested for identification purposes and the maintenance of confidentiality of individual genetic profiles. Even after tissue specimens have been discarded, there is considerable fear that genetic records will be retained in spite of the wishes of the human source of the tissue. California requires convicted sex offenders to give blood and saliva samples before their release from prison. The provision of such samples also makes it possible to discover information that may be incidental to past criminal records (e.g., XYY chromosome, drug use) but that could be used against the present or former inmates.

In the United States to date, practical applications of DNA fingerprinting have involved tests of specific suspects or known criminals. There are plans in California to store this information in the world's first computerized data bank of DNA fingerprints. In Great Britain, however, a DNA analysis of blood samples from all men and boys between the ages of 13 and 30 in Leicester County was conducted in an attempt to identify the person who raped and murdered two teenage girls. A 17-year-old boy originally charged with the crimes was released when his genetic profile did not match that derived from the semen left in the victims. More conventional investigative methods were later used to catch a suspect, a local baker who had avoided the test. The mass screening effort left investigators with a genetic profile on every young man in the county, information they later destroyed.

DNA fingerprinting has also been used as proof of paternity for immigration purposes. In 1986, Britain's Home Office received 12,000 immigration applications from the wives and children of Bangladeshi and Pakistani men residing in the United Kingdom. The burden of proof is on the applicant, but establishing the family identity can be difficult because of sketchy documentary evidence. Blood tests can also be inconclusive, but DNA fingerprinting results are accepted as proof of paternity by the Home Office.

Testing of extended families has been used in Argentina to identify the children of at least 9,000 Argentineans who disappeared between 1975 and 1983, abducted by special units of the ruling military and police. Many of the children born to the disappeared adults were kidnapped and adopted by military “parents,” who claimed to be their biological parents. Once genetic testing of the extended family revealed the true identity of the child in question, the child was placed back in the home of its biological relatives. It was initially feared that transferring a child from its military “parents,” who were kidnappers but who had nevertheless reared the child for years, would be agonizing. In practice, the transferred children became integrated into their biological families with minimal trauma.

SOURCES: Office of Technology Assessment, based on:
R. Herman, “British Police Embrace ‘DNA’ Fingerprints,” The Washington Post, Nov. 24, 1987.
A.J. Jeffreys, J.F.Y. Brookfield, and R. Semeonoff, “Positive Identification of an Immigration Test-Case Using Human DNA Fingerprints,” Nature 317:818-819, 1985.
J.M. Diamond, “Abducted Orphans Identified by Grandpaternity Testing,” Nature 327:552-553, 1987.


there has been no concerted effort to prohibit them. The more immediate questions, therefore, are how these projects should best proceed from now on and what use should be made of new genetic information.

Each of the following sections begins with a list of important social and ethical questions, followed by a short general discussion establishing the context of these issues and, in some cases, outlining opposing arguments. Decisions about mapping and sequencing rest in part on arguments about appropriate allocation of resources. Arguments about access to versus control of knowledge turn on debates about the relative importance of ethical principles such as autonomy (that is, self-determination or personal freedom of action) and beneficence (the duty to act in ways that benefit and do not inflict harm on others). There is general concern about the ways in which personal freedom of action might be either enhanced or diminished by increased knowledge about human genetics. Finally, there is significant concern about the possibility of eugenics, that is, that new and existing information will be used in attempts to improve hereditary qualities. The social and ethical arguments relevant to mapping and sequencing the human genome reveal the tension between an attempt to arrive at some clear insight about duties and obligations and an attempt to weigh benefits versus harms. The purpose of this chapter is to describe and clarify important points of social and ethical controversy, not to resolve them.

BASIC RESEARCH

● How should the conduct of research in the basic sciences, such as genome mapping and sequencing, be influenced by a concern for the social good?
● What are the considerations when basic research in the biological sciences seems to take resources away from areas of research that might have more immediate social benefit?

A genetic linkage map of the human genome already exists and progress has been made in the development of a physical map. Practical debate, therefore, centers on questions about the most efficient and effective way to develop the complete physical map, that is, whether the whole human genome should be sequenced in a systematic way and how new genetic information should be applied.

How these questions are answered depends upon the values attached to scientific progress and the relationship between scientific progress and human good. There is a strong argument that basic scientific research is valuable in and of itself and should be pursued for its own sake. Coordinated, systematic mapping of the human genome is consistent with this view, and proponents argue for resources and against constraints in the name of conducting good science. Others argue that scientists need to be responsive to and sometimes even constrained by the public interest (7).

LEVELS OF RESOLUTION

● What level of resolution of the physical map is really needed, and for what purposes?

While even a rough genetic map, permitting the identification of markers linked with major diseases, might prove useful to insurers or others bent on identifying high-risk individuals, it would have less value for basic researchers than a more precise map. From an ethical standpoint, the key arguments about levels of resolution, or molecular detail, are based on the distribution of costs and benefits involved. If the public is asked to pay an appreciable portion of the cost, then it deserves to participate in the political debate about embarking on an expensive, full-scale project. Scientific and technical factors being equal, chromosomal regions in which greater clarity would benefit many people (e.g., those associated with prevalent genetic diseases) might be addressed first. If the largest share of the costs is borne by the private sector, then few, if any, questions of priority will be posed, other than those chosen by the persons investing in the projects.


ACCESS AND OWNERSHIP

● What are the ethical considerations pertaining to control of knowledge and access to information generated by mapping and sequencing efforts?
● Who should have access to map and sequence information in data banks?
● Do scientists have a duty to share information; what are the practical extent and limits of such an obligation?
● Who owns genetic information?
● Do property rights to individuals’ genetic identities adhere to them or to the human species (14)?
● Is genetic information merely a more detailed account of an individual’s vital statistics, or should this information be treated as intrinsically private, not to be sought or disclosed without the individual’s express consent (10)?

There is a method in scientific research that allows investigators to pursue their hunches, test their hypotheses, replicate their results, and publish their findings in roughly that order. Careful adherence to this process ensures accuracy and the orderly development of knowledge. The time lag between discovery of new information and communication of it, however, has caused some commentators to question whether scientists have the right to withhold information about genetic markers that might be of great interest to the public at large.

From an ethical perspective, it may be argued that genetic information is by definition in the public domain: The human genome is a collective property that should be held in common among all persons of human heritage (8). An opposing argument is that, since gene sequences are not commonly knowable and understanding them requires the use of expensive and often patentable machinery, discovery of sequences and the fruits that derive from them belong to the person who uncovered them. By this reasoning, it does not matter whether the sequences are unique or how they might be used; it is the labor and inventiveness associated with the discovery of them that makes them valid intellectual property. Current patent law takes the latter tack but limits patentability by preventing the patenting of a person or an idea.

One prominent scientist has acknowledged the public’s special claim to the genome but argues that a public enterprise may not be the best way to satisfy this claim and that delay on so urgent a project serves no one (5). A significant portion of the value of the genetic information gathered through human genome projects will not be fully realized until some decades after the projects are completed, but there is little doubt that it will help elucidate the function and physical location of genes that cause or predispose to illness and disease. For this reason alone, the sequences will have substantial commercial value.

COMMERCIALIZATION

● What facets, if any, of human genome mapping and sequencing activities should be commercialized?

The commercial value of genome sequences has already been recognized by companies that have applied for patents on a number of specific materials and techniques. At least one company has argued that it has the right to copyright and control the materials and maps that it develops (5).

The selective forces of the marketplace have generated a database network, some portions of which are in the public domain and others of which are held by individual companies. The ethical issues of privatization of this knowledge turn on the importance of sequences lost to others by academic communities or corporations which have restricted the use of them. On one level, the problem is largely academic, since the data needed for a complete map and sequence could be assembled by the public sector, with duplication or purchase of the data held by private parties. On another level, however, the potential loss of critical data, the duplication of effort, and the control of knowledge raise serious questions about a combined scheme of public versus proprietary holding of fundamental knowledge. There is a strong argument that parts of research that are funded publicly should yield public information, while allowing scientists and others to retain the benefits of commercial exploitation of inventions.

DIAGNOSTIC/THERAPEUTIC GAP

● What are the ethical implications of the growing gap between diagnostic and therapeutic capabilities?
● Should diagnostic information about genetic disorders for which there is no therapeutic remedy be handled differently from that about disorders for which there are therapeutic interventions?

There is no doubt that continuing scientific advances in mapping and sequencing the human genome accelerate diagnostic applications. One philosopher has noted that the ability to map the human genome yields information about susceptibility that is more precise, more certain, and potentially more threatening to individual freedom and privacy than earlier methods of presymptomatic diagnosis and vague hypotheses about familial traits (10). A related issue is the need to protect information that may be available to or sought by third parties such as insurance companies or employers. Progress to date indicates that the ability to diagnose a genetic abnormality precedes the development of therapeutic interventions and that this gap may be growing. This is true for many genetic diseases, an important example being Huntington’s disease (see box 7-A in ch. 7).

PHYSICIAN PRACTICE

● Do physicians and other health care providers face a conflict between an increasingly reductive approach to medical science and a focus on holistic patient care (17)?

Increased information about human genetics changes attitudes and alters the knowledge that serves as a basis for health care interventions. Physicians and other health care providers must constantly alter their views and understanding of human behavior, health, and disease. There are many examples of diseases that were once thought to be amenable to preventive health care that are now known to have a genetic component or cause. On a practical level this presents obvious difficulties, as health care providers are increasingly uncertain whether they are dealing with patterns of health and illness in individuals that can be ameliorated by changes in life style and medical treatment or if such patterns are in large part a matter of genetic destiny. In addition, the ethical principle of respect for persons indicates that individuals must be treated with care, compassion, and hope because they are persons and not merely the embodiments of a genetic formula or code.

REPRODUCTIVE CHOICES

● What ethical considerations arise from the increased ability of parents to determine the genetic endowment of their children (through such practices as selective termination of pregnancy, selective discarding of human embryos created in vitro, or selection of X- or Y-bearing sperm to determine the sex of the child)?

The ethical question of one generation’s duties and obligations to another becomes more evident as genome mapping generates data pointing to the serious consequences of certain cultural practices or mating patterns. For example, it has been demonstrated that, if it were possible to choose the sex of their children, many individuals and couples would prefer that their firstborn be male (18). It has also been demonstrated that firstborn children benefit from their early period of exclusive parental attention. If firstborn boys became the norm, it might further compromise equality of opportunity between men and women (16). In such circumstances, the conflicts among values and ethical principles such as autonomy, justice, and beneficence will be strong. Human mating that proceeds without the use of genetic data about the risks of transmitting diseases will produce greater mortality and medical costs than if carriers of potentially deleterious genes are alerted to their status and encouraged to mate with noncarriers or to use artificial insemination or other reproductive strategies (3).

On a practical level, the availability of information that couples might use to select embryos created in vitro has been hampered by an absence of federally funded research concerning many aspects of human fertilization. There has been a de facto moratorium on such research since 1980 (13).

EUGENIC IMPLICATIONS

● What ethical concerns arise from possible eugenic applications of mapping and sequencing data?

The possibility of mastery and control over human DNA once again raises the highly charged issue of genetic selection. One major difference between current and previous attempts at eugenic manipulation is that any potential eugenicist will have substantially more powerful techniques to effect desired ends and more data with which to muster support. With even the modest knowledge achieved in their first century, genetic techniques have become sophisticated enough to permit the use of selective breeding to produce animals with desired qualities.

When Francis Galton defined eugenics in 1883 as the science of improving the “stock,” he intended the concept to extend to any techniques that might serve to increase the representation of those with “good genes.” Thus, he indicated that eugenics was “by no means confined to questions of judicious mating, but takes cognizance of all the influences that tend, in however remote a degree, to give the more suitable races or strains of blood a better chance of prevailing speedily over the less suitable than they otherwise would have had” (4). Prior to the development of recombinant DNA technology, eugenic aims were primarily achieved by attempting to control social practices such as marriage. New technologies for identifying traits and altering genes make it possible for eugenic goals to be achieved through technological as opposed to social control.

Knowledge of human genetics will amplify the power to intervene in the diagnosis and treatment of disease. Each time a person who would otherwise have died of a disease caused or influenced by a gene is treated successfully by genetic or nongenetic means, the frequency of that gene in the population increases [Lappe, see app. A]. Human genome projects will intensify and accelerate the already difficult debates about who should have access to one’s genetic information by providing faster and cheaper methods of testing for genetic variations, by making much more information available, and by increasing the specificity of genetic information (15). The ethical debate about eugenic applications more properly focuses on how to use new information rather than on whether to discover it. Eugenic programs are offensive because they single out particular people and therefore can be socially coercive and threatening to the ideas that human beings have dignity and are free agents.

Positive Eugenics

Beginning with Plato, philosophers have recognized that eugenic ends could be achieved through subtle or direct incentives to bring together presumptively fit human beings. Positive eugenics is defined here as the achievement of systematic or planned genetic changes in individuals or their offspring that improve overall human life and health and that can be achieved by programs that do not require direct manipulation of genetic material.

Most commentators have rejected or cast doubt on any uses of genetic engineering to enhance or directly improve the human condition. The President’s Commission for the Study of Ethical Problems in Medicine and Biomedical and Behavioral Research declared that efforts to improve or enhance normal people, as opposed to ameliorating the deleterious effects of genes, are at best problematic (11).

It may well be that the problem with positive eugenics has more to do with the means than with the ends. The basic objective of improving the human condition is generally supported, although debates about just what constitutes such improvement continue. Many concerns about eugenic policies in the past focused on the methods used to attain them, such as sterilization, rather than on the ends themselves.

Negative Eugenics

Negative eugenics refers to policies and programs that are intended to reduce the occurrence of genetically determined disease. It implies the selective elimination of gametes (ova or sperm) and fetuses that carry deleterious genes, as well as the discouraging of carriers of markers for genetic disease from procreation. There are few technical obstacles to karyotyping human beings for eugenic reasons. Verbal genetic histories of sperm donors, for example, are designed to exclude donors carrying some genetic diseases. Such a screening process, accompanied by a physical examination and laboratory tests, has already been recommended by the Ethics Committee of the American Fertility Society (1). The development of specific genetic tests could make gamete screening easier and more specific and will also expand existing capabilities to conduct prenatal tests.

Eugenics of Normalcy

The third eugenic use of genetic information would be to ensure not merely that a person lacks severe incapacitating genetic conditions, but that each individual has at least a modicum of normal genes. One commentator has argued that individuals have a paramount right to be born with a normal, adequate hereditary endowment (6). This argument is based on the idea that there can be some consensus about the nature of a normal genetic endowment for different groups of the human species. The idea of genetic normalcy, once far-fetched, is drawing closer with the development of a full genetic map and sequence; however, concepts of what is normal will always be influenced by cultural variations and subject to considerable debate.

ATTITUDES

● How will a complete map and sequence of the human genome transform attitudes and perceptions of ourselves and others?

One of the strongest arguments for supporting human genome projects is that they will provide knowledge about the determinants of the human condition. One group of scientists has urged support of human genome projects because sequencing the human genome will provide one of the most powerful tools humankind has ever had for deciphering the mysteries of its own existence (12).

The relevance of this proposition will depend on the degree to which complex human behaviors are determined by understandable genetic factors. It will also depend on how important human genome projects are to understanding genetic factors for complex traits. Whether higher human attributes are reducible to molecular constructs is a topic of considerable debate in the philosophy of biology, and human genome projects would doubtless enlarge and intensify this debate. A reasonable hypothesis is that, while little information of direct or immediate value regarding complex behaviors is likely to result from human genome projects, insights into the possible construction of control regions for the development of the human embryo, the genetic basis for organizing neuronal pathways, and the genetic control of sexual differentiation will all be significantly enhanced. In the long run, knowledge of human genetics will make scientific understanding of human life more sophisticated.

A greatly increased understanding of how genes shape characteristics could influence human beings’ attitudes toward themselves and others [Glover, see app. A]. Such increased understanding might highlight the degree to which genetic factors are equal or unequal for traits that confer social advantage. This information might reveal that human beings have fewer options than they suppose and could thereby encourage a determinist view of human choices (see box 4-B), or it could reveal just the opposite. A general increase in genetic information might also alter social customs based on erroneous scientific assumptions.

Many individuals have general beliefs about their genetic potential for achievement in certain spheres of activity, about the limits of possible improvement through effort or environmental change. These intuitive beliefs are often vague and inaccurate. Often, it is only in regard to a few skills or characteristics that individuals have pushed against the limits of their potential. When science makes it possible to trace the actual limits of individuals, intuitive perceptions may turn out to be wrong. This has the potential of both enhancing and limiting personal liberty.

BOX 4-B.— Determinism and the Human Genome

Determinism in biology is the general thesis that, for every action taken, there are causal mechanisms that preclude any other action. Mapping and sequencing the human genome will not alone impose a determinist view of human nature. Seeing where genes are located, or knowing the order of bases in the DNA, will not alone make behavior predictable.

But mapping and sequencing together with tracing the pathways between genes and behavior will start to paint a determinist picture. Scientists are now starting to work out these pathways. Take, for example, the pattern of behavior classified by psychiatrists as sensation seeking, which involves a disposition toward gambling and alcoholism. This behavior is correlated with low levels of activity of the platelet monoamine oxidase. These levels of activity have been shown by studies of twins to be largely under genetic control.

In a determinist model, human actions can be explained in terms of causal mechanisms, even though those mechanisms may be very complex. If this model is right, it seems that what human beings do, just as much as what billiard balls do, is the product of a set of laws operating in particular circumstances.

This view of human nature is disturbing. It suggests that a Godlike scientist, with complete knowledge of all the relevant causal laws and of the circumstances in which they operate, could successfully predict human action. In two different ways, determinism is at least an apparent threat to our attitudes. First, the elimination of genuine choice would leave no room for the belief that we can partly create, actualize, or modify ourselves. Second, undermining choice may also undermine many emotional reactions to others. The determinist picture may not leave room for justifiable resentment of what people do or for justifiable feelings of blame or guilt.

There are alternative views within determinism. Hard determinism is the view that individual choice is entirely ruled out, along with the emotional responses linked to holding people responsible for what they do. Soft determinism asserts that free choice and responsibility are compatible with determinism.

The issue is whether the soft determinist can resist the hard determinist’s argument against freedom and the reactive attitudes. There are two strategies for resisting: 1) to point out that determinism is not the same as fatalism, that even in a deterministic world what human beings do influences the future; and 2) to disagree that determinism eliminates genuine choice, attempting to work out a model of free action that is compatible with determinism.

SOURCES: Office of Technology Assessment, 1988; Glover, see app. A.


ROLE OF GOVERNMENT

● What is the proper role of government in mapping and sequencing the human genome? Specifically, does the government have a role in deciding what data should be collected in gene mapping and sequencing? How should this information be disseminated and guarded from abuse?

The lines of power, coercion, and authority in the public and private scientific sectors are blurred because the first genetic maps are being made in corporations (e.g., Collaborative Research, Inc.) and in private philanthropies based in universities (e.g., the Howard Hughes Medical Institute at the University of Utah).

The ethical arguments for involving the Federal Government in the process of genome mapping, whether by shaping, constraining, blocking, or doing nothing, center on the public interest in making resources available in ways that are consistent with the considerations of beneficence, justice, and autonomy. These issues encompass academic freedom or freedom of scientific inquiry because the projects have universal and lasting implications. Once the human genome is mapped and sequenced, the resulting data will have widespread implications for generations to come [Lappe, see app. A].

The precise boundary between basic and applied science is hard to draw, but there is enough understanding of where it lies to be able to use it as a basis for policy. A case might very well be made for a government policy that would leave basic research unrestricted but that would place some stringent controls on applied research and technological applications, for example, by ensuring that genetic testing is voluntary and access to data is controlled.

All research carries with it the likelihood of changing one’s conception of the world and so of changing one’s attitudes. For these reasons, there is a strong case against government intervention to stop research. There are four main arguments:

1. Stopping research might be opting for comfortable ignorance or illusion rather than uncomfortable truth. The growth of science has rested on the preference for uncomfortable truth. Those who view science as one of mankind’s finest creations will be dismayed at any wholesale repudiation of this preference.

2. It is unlikely that existing world views, beliefs, and attitudes can be protected by shutting down basic research. The knowledge that such protection was needed might itself start to undermine existing views.

3. As a practical matter, it may be that government cannot stop basic research. It is not easy to monitor what goes on in laboratories, and what is stopped in one country may take place in another [Glover, see app. A].

4. Stopping research blocks both possible benefits and risks. The belief that research can be performed to permit benefits while coping with and occasionally avoiding risks is a matter of historical precedent.

DUTIES BEYOND BORDERS

● What, if any, ethical issues are raised when considerations of international competitiveness influence basic scientific research?
● What, if any, are the duties and obligations of the United States to disseminate mapping and sequencing information abroad?
● What are the implications of shared information for international competitiveness?
● What are the international implications of sharing technological applications of mapping and sequencing information?
● What issues are involved when applications of genetic information or biotechnology that are of great use to Third World countries are not developed or fully exploited because they are less profitable for industrialized countries?

The United States has recently proposed an international framework of rules for science. The purpose of this framework is to see that all nations do their fair share of basic research and that all the results of such research be made public, except for those with strategic implications (9). The increased protection of intellectual property and patent rights for technological innovations formed the basis of this proposal; these rights were also central to recent international trade talks. There is some sentiment that barriers to the transfer of technology would continue even if there were no reward for intellectual property. One commentator has noted that, unless products are protected by a set of principles now, basic scientific results could become increasingly restricted; some nations might do less basic research and instead emphasize applying other nations’ results (9).

The most common single-gene defects, disorders of the hemoglobin molecules that carry oxygen in red blood cells, are highly prevalent in many nations in Southern Europe, Africa, the Middle East, and Asia. Such nations would benefit most if research tools became widely available as they were developed and if priorities for which chromosomal regions are mapped first took world prevalence of disorders into account. Use of map and sequence information by developing nations may also require special attention to devising screening tests that are cheap and simple, and might entail access to services (e.g., sequencing or mapping) located in developed nations [Weatherall, see app. A].

CONCLUSION

All human beings have a vital interest in the social and ethical implications of mapping and sequencing the human genome. It is not surprising, therefore, that there are debates about how genome projects should proceed. These extend beyond considerations of scientific efficacy and involve the interests of patients, research subjects, physicians, academicians, lawyers, entrepreneurs, and politicians. Mapping the human genome accelerates our rate of understanding, and the distance between increased understanding and direct intervention to alter the human genome is shrinking. Add to this the development of scientific tools such as gene probes, and immediate practical questions are posed: How should basic research be conducted? What level of resolution in mapping is necessary? Who should have access to and ownership of data banks and clone repositories? How should thorny questions surrounding commercialization be handled? Long-range questions about eugenics, reproductive choices, the role of government, and possible duties and obligations beyond national borders also arise. These questions are complex and are not likely to be resolved in the near future. It will therefore be necessary to ensure that some means for explicitly addressing ethical issues attends scientific work.

CHAPTER 4 REFERENCES

1. American Fertility Society, Ethics Committee, “Ethical Considerations of the New Reproductive Technologies,” Fertility and Sterility 46:1S-94S, 1986.

2. Callahan, D., “Ethical Responsibility in Science in the Face of Uncertain Consequences,” Annals of the New York Academy of Sciences 265:1-12, 1976.

3. Campbell, R. B., “The Effects of Genetic Screening and Assortative Mating on Lethal Recessive-Allele Frequencies and Homozygote Incidence,” American Journal of Human Genetics 41:671-677, 1987.

4. Galton, F., Inquiries Into Human Faculty (London: Macmillan, 1883).

5. Gilbert, W., quoted in Roberts, L., “Who Owns the Human Genome?” Science 237:358-361, 1987.

6. Glass, B., “Ethical Problems Raised by Genetics,” in Genetics and the Quality of Life, Charles Birch and Paul Abrecht (eds.) (New York, NY: Pergamon Press, 1974), pp. 51-57.

7. Institute of Medicine, Responding to Health Needs and Scientific Opportunity: The Organizational Structure of the National Institutes of Health (Washington, DC: National Academy Press, 1984).

8. Issues of Collaboration for Human Genome Projects, OTA workshop, June 26, 1987.

9. MacKenzie, D., “US Advocates a ‘World Code’ for Science,” New Scientist 5 (November): 245, 1987.

10. Macklin, R., “Mapping the Human Genome: Problems of Privacy and Free Choice,” Genetics and the Law III, A. Milunsky and G.J. Annas (eds.) (New York, NY: Plenum Press, 1985), pp. 107-114.

11. President’s Commission for the Study of Ethical Problems in Medicine and Biomedical and Behavioral Research, Screening and Counseling for Genetic Conditions: A Report on the Ethical, Social, and Legal Implications for Genetic Screening, Counseling and Education Programs (Washington, DC: U.S. Government Printing Office, 1983).

12. Smith, L., and Hood, L., “Mapping and Sequencing the Human Genome: How To Proceed,” Biotechnology 5:933-939, 1987.

13. U.S. Congress, Office of Technology Assessment, Infertility: Medical and Social Choices, OTA-BA-358 (Washington, DC: U.S. Government Printing Office, May 1988).

14. U.S. Congress, Office of Technology Assessment, New Developments in Biotechnology: Ownership of Human Tissues and Cells, OTA-BA-337 (Washington, DC: U.S. Government Printing Office, March 1987).

15. U.S. Congress, Office of Technology Assessment, Human Gene Therapy, OTA-BP-BA-32 (Washington, DC: U.S. Government Printing Office, December 1984), app. B.

16. Walters, L., “Is Sex Selection a Frivolous Practice? Yes,” Washington Post, Apr. 7, 1987, p. 11.

17. Weatherall, D., “Molecules and Man,” Nature 328:771-772, 1987.

18. Westoff, C. F., “Sex Preselection in the United States, Some Implications,” Science 184:633-636, 1974.


Chapter 5

Agencies and Organizations in the United States


CONTENTS

National Institutes of Health
    The Institutes
    Funding Mechanisms
    Research Infrastructure
    Peer Review
Department of Energy
    The Office of Health and Environmental Research
    Health and Environmental Research Advisory Committee Report
National Science Foundation
National Bureau of Standards
Centers for Disease Control
Department of Defense
Office of Science and Technology Policy
Domestic Policy Council
Office of Management and Budget
Howard Hughes Medical Institute
National Research Council
Private Corporations
Private Foundations
Summary
Chapter 5 References

Figures

5-1. Organization of NIH
5-2. Organization of DOE
5-3. Howard Hughes Medical Institute Genomics Resources

Tables

5-1. NIH Support for Mapping and Sequencing, Fiscal Year 1986
5-2. Division of Research Resources Activities Related to Molecular Genetics
5-3. Budget for Proposed DOE Human Genome Initiative



“Science has been a formative factor in making both the Federal Government and the American mind what they are today. The relation of the government to science has been a meeting point of American political practice and the nation’s intellectual life.”

A. Hunter Dupree, Science and the Federal Government (Baltimore: Johns Hopkins University Press, 1986), p. 2.

Projects involving or related to mapping or sequencing the human genome can be found in several Federal agencies and nongovernment organizations in the United States. This chapter briefly reviews activities at the principal agencies and organizations. They include:

● Federal agencies:
   - the National Institutes of Health,
   - the Department of Energy,
   - the National Science Foundation,
   - the National Bureau of Standards,
   - the Centers for Disease Control,
   - the Department of Defense,
   - the Office of Science and Technology Policy,
   - the Office of Management and Budget,
   - the Domestic Policy Council;

● Nongovernment organizations:
   - the Howard Hughes Medical Institute,
   - the National Research Council,
   - private corporations,
   - private biomedical research foundations and other philanthropies.

NATIONAL INSTITUTES OF HEALTH

The National Institutes of Health (NIH) form one branch of the Public Health Service of the U.S. Department of Health and Human Services. They are administered by a director, currently James Wyngaarden. NIH is a highly decentralized confederation of institutes, divisions, bureaus, and the National Library of Medicine (see figure 5-1). The principal mission of NIH is to conduct and support biomedical research to improve human health.

NIH was established in 1887. Since World War II, it has become “the foremost biomedical research facility not only in the United States but in the world” and “a brilliant jewel in the crown” of the Federal Government, according to Wilbur Cohen, former Secretary of the Department of Health, Education, and Welfare (10).

The institutes with the largest budgets for genetic mapping and DNA sequencing are the National Institute of General Medical Sciences, the National Cancer Institute, the National Institute of Allergy and Infectious Diseases, the National Institute of Child Health and Human Development, and the National Institute of Neurological and Communicative Disorders and Stroke (see table 5-1). In fiscal year 1986, NIH supported approximately 3,000 projects that involved mapping or sequencing, with a combined budget of $294 million (out of a total budget of $5.26 billion, of which all but 5 percent went for research activities) (18). NIH estimated it spent $313 million for such projects in fiscal year 1987 (18).

Planning at NIH is decentralized. The Office of the Director has responsibility for overall direction, but most programmatic decisions are made in the institutes, which are autonomous and largely control their own budgets. A 1984 report by the Institute of Medicine remarked on the “absence of the trappings of bureaucratic authority; hence the Director manages largely on the basis of persuasion, consensus, and knowledge” (11).


Figure 5-1.–Organization of NIH

Table 5-1. NIH Support for Mapping and Sequencing, Fiscal Year 1986 (millions of dollars)

Institute                             Human research   Nonhuman research    Total
NIGMS                                           12.4                99.6    112.0
NCI                                             18.3                24.2     42.5
NIAID                                            6.0                28.0     34.0
NICHD                                           11.8                18.2     30.0
NINCDS                                          10.7                10.6     21.3
Other institutes and divisions
  and the NLM                                   31.9                22.1     54.0
Total                                           91.1               203.0    294.0

Abbreviations: NIGMS = National Institute of General Medical Sciences; NCI = National Cancer Institute; NIAID = National Institute of Allergy and Infectious Diseases; NICHD = National Institute of Child Health and Human Development; NINCDS = National Institute of Neurological and Communicative Disorders and Stroke; NLM = National Library of Medicine.

SOURCE: Office of the Director, National Institutes of Health, May 1987, as modified by the Office of Technology Assessment.

Coordination of the various institutes is accomplished largely by the Office of the Director. In October 1986, the Advisory Committee to the Director held a meeting at NIH entitled “The Human Genome,” at which many experts presented views about setting NIH policy. Statements both in favor of and against special initiatives were aired (22). A working group of NIH administrators was formed subsequent to that meeting. The working group is chaired by the director; other members represent several of the institutes and divisions most directly involved.¹ This working group is responsible for setting overall policies for NIH in connection with human genome projects, and it initiated two program announcements in 1987 (17). Included in NIH’s related research figures are several grants to produce physical maps of other organisms or parts of human chromosomes, to develop cloning or DNA detection techniques, and to develop other relevant technologies. The new genome programs are to develop new methods for analysis of complex genomes and to improve computer representation and analysis of information derived from molecular biology (12). These solicitations for proposals were not associated with any new or additional funding in 1987, but Congress has set aside $17.2 million for them in 1988. The budget request for fiscal year 1989 is $28 million. To review complex genome and informatics proposals, NIH will convene new peer review committees.

¹Other members of the NIH working group on the human genome are Ruth Kirschstein (Director of the National Institute of General Medical Sciences), Betty Pickett (Director of the Division of Research Resources), Duane Alexander (Director of the National Institute of Child Health and Human Development), Donald A.B. Lindberg (Director of the National Library of Medicine), Jay Moskowitz (Associate Director for Program Planning and Evaluation), and George Palade (Yale University). Rachel Levinson (Office of the Director) is executive secretary.

The NIH also plans to seek advice from outside scientists and to keep congressional staff abreast of genome projects through a series of workshops, the first of which was held February 29 and March 1, 1988.

The Institutes

The National Institute of General Medical Sciences (NIGMS) supports research and training in the basic biomedical sciences fundamental to understanding health and disease. Its primary function is to support research projects conducted by scientists throughout the nation and the world that can serve as the bases for the more disease-oriented research undertaken by the other NIH institutes. NIGMS will administer the funds set aside for characterization of complex genomes. Unlike most of the other NIH institutes, NIGMS has no intramural research program; its funding is for work done by non-NIH scientists.

NIGMS supports a major share of basic research in genetics, including research on nonhuman species. Such work is concentrated at NIGMS because the institute is responsible for research related to fundamental biology or a broad array of disorders rather than to a disease group, developmental stage, or organ system. Genetics underlies many physiological processes and can explain many disease states, but most fundamental genetics research is not designed to elucidate a single disease; rather, it elucidates general mechanisms or illuminates how human diseases might occur by showing how other organisms function. Understanding other organisms is often the first and most important step in understanding human health and disease, but the details of how knowledge about bacteria, yeast, or animals will relate to human biology are rarely known in advance. These are some of the reasons that NIGMS supports such a large share of the work on genetics of nonhuman organisms.

Each NIH institute other than NIGMS has as its mission the support of research on a range of diseases. The range of diseases may be defined by organ system, developmental stage, explicitly named disease group, or other criteria (see figure 5-1). The distinction between the kinds of work supported by NIGMS and by the other institutes is not hard and fast; in fact, support extends over a broad range of scientific projects that could come under the aegis of NIGMS or one of the other institutes. The National Institute of Child Health and Human Development (NICHD), for example, has a program that investigates the basic molecular biology of development. In connection with this, NICHD convened in May 1987 a meeting of scientists working on human chromosome 21. Chromosome 21 is of special interest to persons doing research on Down’s syndrome, Alzheimer’s disease, and several other diseases; it is also of interest because it contains the genes underlying several important and well-characterized biochemical processes.

All of the institutes support genetic research (in fact, other institutes support more of it in the aggregate than NIGMS), but this research is often directed at finding the location of a particular disease-associated gene. (For example, study of the familial form of Alzheimer’s disease is supported by the National Institute of Neurological and Communicative Disorders and Stroke, the National Institute on Aging, and the National Institute of Mental Health.) Institutes develop preventive diagnostic tests and therapies for genetic diseases.

Funding Mechanisms

Spending at NIH is predominantly for investigator-initiated, basic, undirected research. Most projects are related to human diseases, animal models of human disease, or fundamental research on biological questions that might illuminate human biology in health and disease. NIH’s primary funding mechanism is the investigator-initiated scientific grant (classified as R01 by the NIH bureaucracy and widely known by that term) awarded to a single investigator or small group. The typical R01 grant (and there are now more than 6,100 of them) is given to a research scientist at a university or other research center in response to a proposal submitted by that scientist. The proposal outlines the research question addressed, the approach to the question, the people who would work on the project, and the budget for the project. The average grant amount for projects that involved mapping or sequencing was $130,000 in 1986 (5).

Some efforts, those with a specific purpose, are more amenable to funding by contract. The GenBank® database of nucleic acid sequences, for example, is supported by this mechanism under a $17.2 million 5-year contract with Intelligenetics Corp. of Mountain View, California (with a subcontract to the Los Alamos National Laboratory, where GenBank® is housed). NIGMS administers the GenBank® contract and is the principal funding unit, with contributions from other NIH institutes and divisions, the Department of Energy, and the National Science Foundation. NIGMS also maintains the Human Mutant Cell Repository under contract. This is a resource for persons attempting to use genetic techniques to understand diseases or physiological processes. In these instances and others, NIH can contract with a provider to deliver a service.

Each NIH institute other than NIGMS administers a program of intramural research, in most cases located on the NIH campus. Investigators are employed directly by NIH. The intramural research programs of NIH collectively constitute the largest biomedical research facility in the world. NIH’s intramural research complements its extramural support of university and research center scientists. Components of human genome projects that require direct management by NIH or that would be best integrated into existing programs could be added to the intramural research programs.

NIH is not often associated with large, centrally administered programs, but it does support many. The National Cancer Institute, for example, supports a number of centers that bring research, training, information dissemination, and clinical application under one roof or administrative arrangement in order to accelerate the communication of ideas among normally disparate groups. Program and center grants are typically larger than investigator-initiated grants and can include funds for training as well as for equipment and research materials. A concerted research program has recently begun to combat AIDS (acquired immunodeficiency syndrome). NIH has the capacity to direct a research program that requires coordination and some central planning.

Research Infrastructure

A small but important fraction of NIH funding goes to support a research infrastructure: resources used by a wide array of scientists and clinical investigators to facilitate their research. Much of the support for a research infrastructure comes from the Division of Research Resources (DRR) at NIH. Databases for genetic information, funding for repositories (e.g., for human cell lines, DNA clones, and probes), and support of the National Library of Medicine are also important components of the research infrastructure for mapping and sequencing.

The DRR supports regional and national centers with various purposes. It is divided into five programs, several of which support projects relevant to mapping and sequencing. One of the purposes of DRR-supported resources is to provide scientists and clinicians with access to advanced research technologies. This involves support of several databases, materials repositories, computer resource centers, and grants to generate and analyze biomedical research data (see table 5-2). DRR cofunds with NICHD the Repository of Human DNA Probes and Libraries. This repository facilitates exchange of research materials crucial to genome projects. DRR also supports a grants program to apply artificial intelligence and other sophisticated approaches of information science to understanding sequence data and managing large masses of biological information. Several databases, repositories, and activities supported by DRR are cofunded by other NIH institutes or agencies of the Federal Government. DRR funds its resources through grants and contracts, primarily to nongovernment scientists. It has helped fund two workshops directly related to human genome projects: one with the Department of Energy (DOE) on materials repositories and databases, the other with DOE, the National Library of Medicine, and the Sloan Foundation on applying information management systems to analysis of complex biological problems.

Table 5-2. Division of Research Resources Activities Related to Molecular Genetics

Resource                                Function                                Location

Protein Identification Resource         Database for protein sequences and      Georgetown University,
                                        software                                Washington, DC

Sequences of Proteins of                Annotated protein sequence file         Bolt, Beranek, & Newman, Inc.,
Immunological Interest                                                          Boston, MA

BIONET                                  Network, database linkage, and          Intelligenetics,
                                        software for use in molecular           Mountain View, CA
                                        biology

Dana Farber Cancer Institute and        DNA sequence analysis software          Boston, MA, and Houston, TX,
Baylor College of Medicine (with        and other computer resources            respectively
other NIH institutes)

National Flow Cytometry Resource        Chromosome and cell sorting             Los Alamos National Laboratory,
(with DOE, other NIH institutes)                                                Los Alamos, NM

DNA Segment Library                     Distribution center for cloned          American Type Culture Collection,
(with NICHD, DOE)                       human DNA made by Los Alamos            Rockville, MD
                                        and Lawrence Livermore national
                                        laboratories

Cell Line Two-Dimensional Gel           Cell line analysis by protein           Cold Spring Harbor Laboratory,
Electrophoresis Database                electrophoresis in two dimensions       Cold Spring Harbor, NY

SOURCE: Office of Technology Assessment, 1988.

The National Library of Medicine (NLM) is the largest and most comprehensive collection of medical information in the world. The library also supports an extensive medical bibliographic resource: the published Index Medicus and MEDLARS/MEDLINE, the most widely used on-line computer reference service for medicine and biomedical research. The NLM has been called “the foremost biomedical communications center in the world” (10) and the “central nervous system of American medical thought and research” (16).

The library was started in 1818 as a few books in the office of the Surgeon General of the Army, Joseph Lovell. Its great flowering occurred under John Shaw Billings in the period after the Civil War, when it became an internationally recognized medical library. The library was transferred from the military to the civilian sector in 1956, and its name was changed to the National Library of Medicine through legislation sponsored by Senators Lister Hill and John Kennedy. A new building for the collection was constructed on the NIH campus in 1962, and in 1980 the 10-story Lister Hill Center was dedicated. The library became part of NIH in 1968 (16).

The NLM’s expertise lies in managing clinical and biomedical research information. This includes not only storage of books and journals, but the publication of reference works that list the extensive international biomedical literature and the maintenance of computer databases that make access to the medical information more efficient. In recent years, the Board of Regents of the NLM has pointed to biotechnology databases as an area of expected future growth and has encouraged library staff to provide improved access to databases relevant to genetics, molecular biology, and other aspects of the “new biology.”

Late in the 99th Congress, Representative Claude Pepper introduced a bill, the National Center for Biotechnology Information Act of 1986, that would give NLM responsibility to “develop new communications tools and serve as a repository and as a center for the distribution of molecular biology information” (H.R. 99-5271). The bill was reintroduced early in the 100th Congress with minor modifications (H.R. 100-393), and a companion measure with very similar provisions (S. 100-1354) was introduced in the Senate by Lawton Chiles. The bill was further amended and introduced as S. 100-1966 jointly by Senators Chiles, Kennedy, Domenici, Leahy, Graham, and Wilson in December 1987. These bills would make the NLM responsible for improving access to the numerous databases used in molecular biology and clinical genetics, with funding authorized at $10 million per year for fiscal years 1988 through 1992. Appropriations for fiscal year 1988 included $3.83 million for these purposes (13).

The NLM has been conducting research on how to make human genetic information available to the medical community for several years. It has made Victor McKusick’s Mendelian Inheritance in Man (15), the pivotal catalog of human genetic loci identified by analysis of pedigrees, available on-line through its Information Retrieval Experiment program, and it has linked the data in this volume to information available in GenBank® and the Protein Identification Resource databank. The library has also begun an experimental program to link molecular biology databases, using researchers on the NIH campus in Bethesda to test the system. It plans to make DNA sequence and protein database analysis possible through a computer link to the National Cancer Institute’s supercomputer center in Frederick, Maryland. The Howard Hughes Medical Institute and the NLM have been discussing ways to link access to the various databases supported by NIH and the institute.

Peer Review

The National Cancer Institute became the first American institution to routinely employ peer review when it established the National Cancer Advisory Council in 1937 (26). Since then, peer review has become an essential element in allocating funds for research grants at NIH. The review system is two-tiered: The initial review is done by study sections of scientific experts; the second tier involves recommendations for funding made by an institute advisory council.

Review of the typical grant involves several steps. A grant application is received by the Division of Research Grants at NIH from an investigator (or from a program or center) under sponsorship of an institution. The application is then assigned to a study section, a group of scientists from a particular discipline appointed by the Director of NIH. These groups meet three times a year to review grant applications. They assess applications for their scientific merit (including originality, feasibility, and importance), the competence of the investigators to do the work, and the appropriateness of the proposed budget (4). The study section votes to approve, disapprove, or defer consideration of an application. For grant applications that are not deferred, a priority score ranging from 100 (best) to 500 is assigned, based on the rankings of the individual members of the study section. This priority score is then included in a summary statement for each application that briefly states reviewers’ opinions. The summary (and where necessary the full documentation) is then passed on to the appropriate advisory council.
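The text says only that the priority score runs from 100 (best) to 500 and is based on the rankings of individual study section members. As a rough illustration, the sketch below assumes each member rates an application on a 1.0 (best) to 5.0 scale and that the score is the mean rating multiplied by 100. That averaging rule, the function names, and the sample ratings are assumptions for illustration and are not taken from the report.

```python
from statistics import mean


def priority_score(member_ratings):
    """Return a 100 (best) to 500 score from individual member ratings.

    Assumption (not stated in the report): each member rates an
    application from 1.0 (best) to 5.0, and the score is the mean
    rating multiplied by 100.
    """
    if not member_ratings:
        raise ValueError("at least one rating is required")
    if any(r < 1.0 or r > 5.0 for r in member_ratings):
        raise ValueError("ratings must lie between 1.0 and 5.0")
    return round(mean(member_ratings) * 100)


def rank_applications(ratings_by_application):
    """Order applications from best (lowest score) to worst."""
    scores = {
        name: priority_score(ratings)
        for name, ratings in ratings_by_application.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical applications and ratings, for illustration only.
example = {
    "application A": [2.0, 3.0],
    "application B": [1.0, 1.0],
}
ranked = rank_applications(example)
for name, score in ranked:
    print(name, score)
```

Under this assumed rule a lower score is better, which matches the report's statement that 100 is the best possible score.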

Each institute at NIH has an advisory council, composed of eminent scientists and informed lay members, that recommends applications for funding. Advisory councils monitor the quality and fairness of review by the study sections and assess special relevance to important national health needs and the mission of the institute. Members of advisory councils are appointed by the Secretary of Health and Human Services, except those on the National Cancer Advisory Board, who are appointed by the President. In most cases, the advisory council approves the actions of the study sections. Fewer than 10 percent of grant applications are singled out for special discussion or action by the advisory councils (26).

Staff of the NIH institutes then rank the approved proposals. Priority scores are the main, but not the sole, determinants of funding: An estimated 1 to 2 percent of proposals are funded because of their particular relevance to a pressing health need, the need to start research in areas of future importance, a desire for balance in the portfolio of grants supported by an institute, ethical considerations, or importance to NIH program needs (26). Roughly one in five grant applications is referred to more than one institute by the study section (11). A small proportion of applications is funded by more than one institute; typically, however, they are funded by one institute or are not awarded.

Contracts and special programs also receive peer review, usually through program review committees organized by the institutes or divisions. The intramural research programs are reviewed by non-NIH scientists who serve on boards administered by the institutes. Special review committees are also constituted by the institutes to review center or program grants.

In its 1984 report on the organization of NIH, the Institute of Medicine noted that “the genius of the institution in shaping scientific excellence to health needs is found in the interplay between the categorical research institutes and the disciplinary study sections” (11). This statement refers to the fact that except for NIGMS, the NIH institutes focus on a category of diseases or organ systems. Study sections, in contrast, are composed of scientists from a particular discipline or area of expertise (e.g., genetics, pharmacology, pathology). These may overlap, but they often do not. Institutes consider applications from different study sections, sometimes from as many as 18 (11). In 1986, there were 2,700 scientists and lay representatives serving on 155 review committees at NIH (4,26).

DEPARTMENT OF ENERGY

Much of the attention devoted to mapping and sequencing the human genome can be traced to activities in the Department of Energy. DOE has already begun a program of targeted research on the human genome, the Human Genome Initiative, to construct physical maps of several human chromosomes and to develop relevant technologies. The part of DOE responsible for the Human Genome Initiative is the Office of Health and Environmental Research in the Office of Energy Research (see figure 5-2).

The Office of Health and Environmental Research

The history of the Office of Health and Environmental Research (OHER) goes back to the Manhattan Project of World War II, which was organized to develop fission bombs. OHER began as the Health Division, started in 1942 by Nobel laureate Arthur Holly Compton, a physicist at the University of Chicago. The division focused on protecting people from the effects of radiation and on the use of radioactive chemicals in medicine and biomedical research. The research base was broadened to include fossil fuels and renewable energy sources by the Energy Reorganization Act of 1974. These functions were retained when the Energy Research and Development Administration became the Department of Energy in 1977 and OHER was established (21). The primary mission of OHER has been to study sources of radiation, pollution, and other environmental toxins (particularly those related to the generation of energy), to trace them through the environment, and to determine their effects. Another mission is to exploit the resources of DOE-administered national laboratories to the maximum benefit of the nation.

Figure 5-2. Organization of DOE
[Organization chart; surviving labels: Secretary, Department of Energy; Biochemistry; Coordination; Management; Genetics; Molecular Biology]

Research at OHER is conducted largely through the system of national laboratories. There are eight general-purpose laboratories that conduct OHER research as well as research in physical sciences and mathematics, and there are nine dedicated OHER laboratories located near national laboratories or universities. In addition, OHER supports research at 100 universities and research centers.

OHER’s involvement in the human genome debate is traced by its former director, Charles DeLisi, to an idea that occurred to him late in 1985 when he was reading a draft of the OTA report Technologies for Detecting Heritable Mutations in Human Beings (23,27). He realized the importance of having a reference human sequence for OHER’s work. Subsequent discussions disclosed that researchers at the Lawrence Livermore and Los Alamos National Laboratories were thinking about ordering DNA clones to make a physical map as an extension of ongoing work. Robert Sinsheimer, chancellor of the University of California at Santa Cruz, had hosted a workshop on the feasibility of sequencing the human genome the previous year and was very interested. During this period, Nobel laureate Renato Dulbecco published a brief article in Science urging that the human genome be sequenced (8).

DOE sponsored the Human Sequencing Workshop in Santa Fe, New Mexico, in March 1986, and DeLisi outlined a three-point strategy in a May 1986 memo: 1) to produce a set of overlapping DNA clones related to a physical map of human chromosomes (see ch. 2), 2) to develop high-speed automated sequencing methods, and 3) to improve methods for computer analysis of map and sequence information. DOE also funded a second workshop, Exploring the Role of Robotics and Automation in the Decoding of the Human Genome, in January 1987. Funding for the DOE initiative, based on the three-pronged attack, began in fiscal year 1987, with $4.2 million going to 10 projects at three national laboratories and at Harvard and Columbia Universities (6). DOE plans to spend $12 million on human genome projects in fiscal year 1988 and has requested $18.5 million for 1989. A special appropriation of $12.7 million was added to the Office of Energy Research budget to construct a building for an Institute of Human Genomic Studies at Mount Sinai Medical Center in New York. This resulted from a congressional initiative, and operations of that institute are not part of human genome projects sponsored by DOE. The reason for the name of the institute is unclear, but the institute will apparently house clinical genetic services for its region (24).

OHER projects include assembling an ordered set of overlapping DNA clones spanning human chromosome 19 at the Lawrence Livermore National Laboratory, making a similar clone set for chromosome 16 at the Los Alamos National Laboratory (using somewhat different techniques), and constructing a physical map of chromosome 21 and chromosome X at Columbia University. Efforts in 1988 will expand to include more university groups and the Lawrence Berkeley National Laboratory and other national laboratories. An effort to sequence the genome of the bacterium Escherichia coli by a new method is being supported at Harvard University. Other projects include construction of DNA clones covering the full set of human chromosomes (not cataloged in order) and development of new technologies for sequencing, detecting, and analyzing DNA.

Early enthusiasm for mapping and sequencing at the national laboratories stemmed largely from existing OHER projects. In one set of projects, laser-activated cell sorting was used to separate individual chromosomes. Cell sorting began naturally in the national laboratories because of easy access to high-technology instrumentation and a multidisciplinary blend of scientists, including biologists, chemists, physicists, computer scientists, engineers, and mathematicians. The first fluorescence-activated cell sorter was developed at the Los Alamos National Laboratory. This instrument was used at the Lawrence Livermore and Los Alamos National Laboratories to sort human chromosomes, and these chromosomes were used to produce sets of DNA clones. The effort was divided into two phases.

The first phase was to make sets of small fragments of cloned DNA (up to several thousand base pairs) in lambda phage. This phase has been completed, and the clone sets have been turned over to the American Type Culture Collection in Rockville, Maryland. (Preparation of the clones is funded by DOE; storage and distribution of the clone sets are funded jointly by NICHD and DRR. DRR also supports the cell-sorter facility at the Los Alamos National Laboratory.) The second phase is to develop clone sets of up to 45,000 base pairs using cosmids and other vectors (see ch. 2 for details). The next logical step is to order the clone sets.

In future years, DOE plans to expand its efforts substantially. A report recently written by a subcommittee of the Health and Environmental Research Advisory Committee is the main public planning document for DOE work on human genome projects.

Health and Environmental Research Advisory Committee Report

The Health and Environmental Research Advisory Committee (HERAC) is a group of scientists from universities, national laboratories, and private corporations that reports to the Director of the Office of Energy Research. Its main function is to advise the Director of OHER on the scientific program supported by OHER. In late 1986, HERAC formed a subcommittee on the human genome to make recommendations about DOE’s Human Genome Initiative. The subcommittee was chaired by Ignacio Tinoco and included members from the Howard Hughes Medical Institute, universities, biotechnology companies, and one scientist from a national laboratory. The subcommittee’s document was approved and was submitted by HERAC to Alvin Trivelpiece, then Director of the Office of Energy Research, in April 1987 (25).

The subcommittee report urges DOE to develop two important tools for research in molecular biology: a reference human DNA sequence and the means to interpret and use it. These would be created by a new research program divided into two phases. The first phase (5 to 7 years) would focus on:

● assembling ordered DNA clone sets of the human chromosomes;
● locating genes and other markers on a physical map based on these sets;
● producing sequences of selected clones and distributing that information;
● developing new techniques for mapping and sequencing;
● applying automation and robotics to mapping and sequencing;
● creating computational and other methods for identifying genes;
● finding new algorithms for analyzing DNA sequences; and
● establishing computer facilities, databases, materials repositories, networks, and other resources to promote use of the methods and resources produced by the projects.

Budget recommendations for the first phase are noted in table 5-3. The second phase would provide a complete sequence for each human chromosome and would make new technologies available for use in addressing the central questions of medicine and biology.

The subcommittee recommends that the work be widely distributed among national laboratories, universities, and companies because of the “highly creative nature” of the science needed to meet the objectives. The research program would include work by many small groups funded through investigator-initiated grants, as well as larger multidisciplinary centers or consortia. The report also recommends that DOE establish a two-tiered system of peer review: one or two initial review committees to assess technical merit and feasibility, and a policy committee to determine overall strategy, develop policy, and oversee scientific review.

Table 5-3.—Budget Proposed for DOE Human Genome Initiative (millions of dollars)

Fiscal year    Amount that year    Cumulative amount
1988                  20                    20
1989                  40                    60
1990                  80                   140
1991                 120                   260
1992                 160                   420
1993                 200                   620
1994                 200                   820
1995                 200                 1,020

SOURCE: Subcommittee on the Human Genome, Health and Environmental Research Advisory Committee, Report on the Human Genome Initiative, prepared for the Office of Health and Environmental Research, Office of Energy Research (Germantown, MD: Department of Energy, April 1987).
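
The cumulative column of table 5-3 is a running total of the annual amounts. A minimal Python sketch of that arithmetic follows; the variable names are illustrative and not from the report. Summed this way, the cumulative figure through fiscal year 1989 comes to 60 and the 8-year total to 1,020 (millions of dollars).

```python
from itertools import accumulate

# Annual amounts from table 5-3, in millions of dollars.
annual_millions = {
    1988: 20, 1989: 40, 1990: 80, 1991: 120,
    1992: 160, 1993: 200, 1994: 200, 1995: 200,
}

# Running total: each year's cumulative amount is the sum of all
# annual amounts up to and including that year.
cumulative_millions = dict(
    zip(annual_millions, accumulate(annual_millions.values()))
)

for year, total in cumulative_millions.items():
    print(year, annual_millions[year], total)
```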


The subcommittee urges DOE to ensure that the results of the projects be in the public domain and that the efforts be made in cooperation with those of other agencies in the United States and abroad, within the constraints of Federal law governing technology transfer and concern for national competitiveness in biotechnology.

A broad-based research program to foster development of technology is outlined, followed by the rationale for DOE involvement, namely: 1) the historical relation to ongoing work at the national laboratories; 2) DOE’s experience with directed research programs (as opposed to the much larger and more diverse human and animal research supported by NIH); 3) the relation to the mission of OHER (in assessing mutational damage from radiation and environmental exposure or developing new energy resources); and 4) access to multidisciplinary teams in the national laboratory system. The potential utility of DNA sequencing for monitoring exposure to radiation and toxic chemicals is noted as a principal reason for developing sequencing technologies.

The primary justification for the new initiative is its potential utility. The technologies and information deriving from it would make future research more efficient (less costly and more powerful), would directly improve human health, and would aid economic growth of industries dependent on biotechnology. A final section of the report warns that, although the program is of the highest priority, it should not be permitted to hinder worthwhile ongoing programs, including research on nonhuman organisms. Concern that a large new program at DOE would impede development in other fields is countered with the observation that large new sums of money have already been introduced into molecular biology: HHMI has increased its annual spending on biomedical research by over $150 million during the last decade, with primarily beneficial results. The subcommittee ends by stating its opposition to creating any large, inflexible organization to execute or supervise the work.

NATIONAL SCIENCE FOUNDATION

The blueprint for the National Science Foundation (NSF) grew from the report Science—The Endless Frontier, written by Vannevar Bush in 1945 (2). The original ideas for NSF, as propounded by Bush and Senator Harley Kilgore, were modified by postwar events and eventually led to legislation creating the foundation in 1950. The principal purpose of the NSF was to continue the Federal Government’s role in sponsoring basic research, a role that developed during World War II (9,14). Biology at NSF is supported through its Directorate of Biological, Behavioral, and Social Sciences. In fiscal year 1987, NSF spent an estimated $32.7 million on research related to gene mapping and sequencing. Of this amount, only $200,000 went for focused projects on gene mapping and sequencing of nonhuman organisms; the bulk was for basic research ($13.7 million) and for the research infrastructure, such as development of methods, new scientific instruments, databases, and repositories and support of instrumentation centers ($19 million). Planned spending for 1988 was $37.9 million. These figures are part of the $206 million spent by NSF in support of biological science in fiscal year 1987, out of the total NSF budget of $1.62 billion (12).

NSF supports primarily basic research in all sciences. Support of basic research grants is the largest single component of NSF funding related to human genome projects. In recent years, NSF has increased its emphasis on engineering and technology development (e.g., it partially supported development of the California Institute of Technology’s DNA sequenator). In 1987, NSF announced a Biological Centers Program intended to stimulate the growth of knowledge in biological research areas important to the continued development of biotechnology. Support for these centers, estimated at $12 million for fiscal years 1987 and 1988, constitutes the second largest component of NSF funding of genome-related activities. A center for bioprocess engineering has been functioning at the Massachusetts Institute of Technology for several years. (NIH has also supported this center, for research training.) Two types of centers are to be created under the Biological Centers Program: One will focus upon sharing capital-intensive instrumentation and developing new instruments; the other will host large-scale multidisciplinary research. Either could be used by groups mapping and sequencing various organisms. NSF also sponsors a program on biological instrumentation and funds individual grants for basic biological science. Although the NSF budget for biology is small relative to its support for other areas and to DOE and NIH support, it nonetheless supports mapping and sequencing through bioengineering, basic biology research, and the centers programs.

NATIONAL BUREAU OF STANDARDS

The National Bureau of Standards (NBS) was created in 1900 as the National Standardizing Bureau. It is part of the Department of Commerce, and its primary mission since its inception has been to develop standards in scientific and technical fields in order to facilitate industrial progress and to prevent incompatibilities that could hamper research or technological applications. NBS also has a program of research in methods and instrumentation that has grown naturally out of tracking diverse and rapidly advancing technologies. Its main technical expertise lies in the physical, chemical, and information sciences, but it is now developing expertise in biotechnology. It has joined with the Montgomery County Government and the University of Maryland, for example, in support of the Center for Advanced Research in Biotechnology in Gaithersburg, Maryland.

NBS has been suggested as a candidate agency for quality control and research on measurements for DNA mapping and sequencing. This would give it the function in biology that it has for physics and chemistry but would entail a considerable expansion of its expertise and resources devoted to molecular biology. Its role could include checking data for accuracy, assessing the accuracy of the machines used in the multicenter mapping and sequencing efforts, and setting standards for the reporting of results. NBS might also conceivably develop technical standards for automated machines and computers used in creating or analyzing data about DNA. If NBS undertakes a function in quality control and standard setting, it will need close collaboration with NIH and DOE, where the bulk of expertise currently lies.

CENTERS FOR DISEASE CONTROL

The Centers for Disease Control (CDC) are situated in the Public Health Service of the Department of Health and Human Services. The main offices are located in Atlanta, Georgia. CDC is the Nation’s primary resource for tracking the incidence and prevalence of diseases and for intervening to thwart the spread of infectious agents and preventable diseases. Related to this mission, CDC maintains databases, disseminates information, and provides materials widely used in clinical research.

CDC could have a role in quality control and in monitoring scientific activities involved in mapping and sequencing the human genome. It has performed this function in the past, through its Lipid Standardization Program. This program began more than 25 years ago to provide quantitative measurements for laboratories engaged in lipid research related to diseases of the heart and blood vessels. Since the program was initiated, over 500 national and international laboratories have received and analyzed reference materials provided by CDC (3). Quality control and standard setting may become important as map and sequence data become more plentiful and as more laboratories come to rely on a common set of data. If such measures prove necessary, CDC is a possible agency for determining or confirming the chromosomal location or origin of DNA fragments or for orienting new DNA fragments on the emerging physical maps. If this function were to be undertaken by CDC, close communication with NIH and DOE would be necessary.


DEPARTMENT OF DEFENSE

While biological research is not the main mission of the Department of Defense (DoD), some components of mapping and sequencing DNA might be shared with or conducted by various components of DoD. Each military service (particularly the Army and Navy) conducts some research in biology, primarily that related to the health needs of military personnel or to defenses against chemical and biological warfare. DoD reports that all such research is unclassified. Much of it is conducted at military facilities or contractor-administered laboratories, but some of it is conducted in universities as well. DoD supports some generally useful resources in biomedical research.

The Armed Forces Institute of Pathology (AFIP) is an international treasure house of tissue samples and microscopic slides spanning the full range of human disease. Its tissue collection is used by pathologists and biomedical researchers throughout the world. AFIP began as the Army Medical Museum in 1862. It became the AFIP in 1949, when the Navy and Air Force joined with the Army in support of it, and the role of the institute has expanded steadily since then. Today AFIP constitutes the largest organization of research and diagnostic pathologists in the world. The institute has received more than 2.2 million cases (tissue or slides from patients) from over 50,000 pathologists affiliated with more than 19,000 hospitals and clinical facilities. AFIP’s unique capabilities as a tissue repository have been expanded to include modern storage techniques. Through further expansion of its capabilities, the staff and facilities of AFIP could be used as a national tissue repository and assessment center for the full spectrum of human diseases. The institute could play a role in linking map and sequence data to human diseases. The availability of systematically classified human tissues could facilitate development and testing of medical products and diagnostic methods to probe the molecular basis of various diseases.

The military biomedical research community would have an interest in map and sequence data because investigations of the effects of chemical and biological weapons would include the study of genes that are particularly vulnerable to attack and the construction of vaccines or other defensive measures.

OFFICE OF SCIENCE AND TECHNOLOGY POLICY

The Office of Science and Technology Policy (OSTP) is headed by the President’s Science Advisor. OSTP’s primary responsibility is to advise the President on science policy, on matters where scientific or technical information is relevant to Federal policy decisions, and on national policies for technology development. OSTP can on occasion form coordinating councils under the Federal Coordinating Council for Science, Engineering and Technology. An example of OSTP coordination in life sciences is the Biotechnology Science Coordination Committee, which started as an OSTP initiative responsible primarily for devising guidelines for regulation of biotechnology products. Representatives of OSTP followed the human genome debate and spoke at several national meetings in 1986 and 1987.

OSTP recently announced plans to reorganize its oversight of life sciences. It plans to form a Committee on Life Sciences for interagency communication and coordination, and it tentatively plans to establish subcommittees on specific topics. Genome projects have been noted as likely to necessitate such a subcommittee, although its exact role and composition are not yet determined (7).


DOMESTIC POLICY COUNCIL

The Domestic Policy Council (DPC) is a cabinet-level group that reviews government activities. David Kingsbury of NSF, as acting chairman of the Biotechnology Working Group, gave a brief presentation on human gene mapping to the DPC in February 1987. An interagency subcommittee of this working group, chaired by NIH Director James Wyngaarden, was formed and met in May 1987 to exchange information on agency activities. NIH, DOE, NSF, the Food and Drug Administration, the U.S. Department of Agriculture, the Environmental Protection Agency, and the Office of Management and Budget were represented. The purpose of the subcommittee was to minimize duplication of effort among the agencies and to promote interagency communication. The subcommittee has subsequently been disbanded, to be replaced by the OSTP group noted above. The DPC will continue to keep abreast of developments on genome projects through the President’s Science Advisor, who will administer the OSTP group (1).

OFFICE OF MANAGEMENT AND BUDGET

The Office of Management and Budget (OMB) monitors and coordinates the annual budget process for executive agencies of the Federal Government and oversees management of the agencies. Each year, every Federal agency prepares a budget request that is reviewed within the agency and then submitted to OMB. OMB reviews the requests and develops a budget for the President; this budget is submitted to Congress in January for the fiscal year beginning that October (although the process is late for fiscal year 1989 because of delay in passing the 1988 budget). OMB’s budget-coordinating function places it in the position of arbiter among different agencies if there are conflicting priorities or potential duplications. By this mechanism, and by monitoring other activities in multiple departments, OMB can encourage communication and coordination of activities. OMB has one budget officer for NSF, another for NIH, and a third for DOE. These officers are responsible for other agencies as well, and the activities related to human mapping and sequencing constitute only a small fraction of their total budget responsibility. Two officers in the OMB science office have taken primary responsibility for tracking the human genome budget submissions of all agencies (20).

HOWARD HUGHES MEDICAL INSTITUTE

The Howard Hughes Medical Institute (HHMI) was created in 1953 by aviator-industrialist Howard Hughes. It is a medical research organization with an endowment of approximately $5 billion. HHMI has increased its research funding dramatically over the last decade, from roughly $15 million in 1977 to approximately $240 million in 1987.

HHMI operates three programs (see figure 5-3). The first and largest is scientific research in 27 laboratories located in hospitals, academic medical centers, and universities throughout the United States. The second program, which supports the first research program and is integrated with it, includes a genome resources project, a research training program for medical students (jointly with NIH), and sponsorship of HHMI meetings and reviews. A third program will provide $500 million over the next decade through grants and special programs to support education in the medical and biological sciences.

Under its first program, HHMI conducts research in five basic scientific areas: genetics, immunology, neuroscience, cell biology and regulation, and structural biology. Several HHMI investigators are involved in genetic mapping and related computational research, physical mapping, and medical genetics. The principal HHMI centers for genetics are located at the University of Utah (Salt Lake City, Utah), the Baylor College of Medicine (Houston, Texas), the University of Michigan (Ann Arbor, Michigan), the University of California (San Francisco, California), The Johns Hopkins University (Baltimore, Maryland), the University of Pennsylvania (Philadelphia, Pennsylvania), Brigham Hospital (Boston, Massachusetts), and Children’s Hospital (Boston, Massachusetts). HHMI estimates that it expended $40 million for genetics research in 1987, of which $2 million to $4 million were devoted to finding and using DNA markers and constructing genetic maps.

Figure 5-3.—Howard Hughes Medical Institute (figure not reproduced; surviving label: Genomics Resources)

The HHMI genome resources project has a current annual budget of over $2 million. Through that project, HHMI supports databases relating to human genetics, including the Human Gene Mapping Library (New Haven, Connecticut) and the On-Line Mendelian Inheritance in Man (Baltimore, Maryland) (see figure 5-2). HHMI also helps maintain a mouse genetics database at Jackson Laboratory (Bar Harbor, Maine) and collaborates with the Center for the Study of Human Polymorphism (CEPH) headquartered in Paris, France. CEPH is a critical collaborative institution that links several large groups working on construction of human genetic maps. HHMI has participated in a large number of meetings on the human genome, including one it sponsored directly in July 1986 at NIH; at that workshop, strategies and policies for mapping and sequencing were discussed. HHMI partially supported the ninth Human Gene Mapping Workshop, held in Paris in September 1987, which compiled data on international gene mapping activities since the previous meeting in 1985.

NATIONAL RESEARCH COUNCIL

The National Academy of Sciences was established by Congress in 1863. President Lincoln signed the law that brought it into existence. The National Research Council (NRC) was established in 1916 to provide advice to the Federal Government about issues involving science. The principal impetus was the increasing relevance of science to preparations for World War I. The National Research Council now conducts many studies on issues relating to science and technology. It is organized into several disciplinary groups.

In August 1986, a group of scientists interested in issues surrounding genome projects met in Woods Hole, Massachusetts. This group agreed to formulate a proposal for a study by the NRC, which was presented to and approved by the Basic Biology Board of the Commission on Life Sciences in September 1986. A committee of distinguished scientists, chaired by Bruce Alberts, was appointed to consider the scientific issues connected with genome projects. The committee on mapping and sequencing the human genome held several public meetings in 1987 and released its report in February 1988 (19).

The Alberts committee was composed of scientists of different backgrounds, with varying degrees of direct involvement in mapping and sequencing projects and with initially divergent views on genome projects. The committee reached consensus on several points during the course of its deliberations. The committee concluded that mapping, sequencing, and understanding the human genome merited a special effort funded and organized specifically for this purpose.

The committee’s report recommends that the projects should begin with “a diversified, sustained effort to improve our ability to analyze complex DNA molecules,” with a “focused effort that emphasizes pilot projects and technological development.” It lists the specific types of maps that would be useful as early genome projects, notes the importance of mapping and sequencing genomes of nonhuman organisms, and stresses the need for thorough peer review. The proposed projects differ from ongoing research by focusing on methods that would improve mapping, sequencing, analyzing, or interpreting the biological significance five- to ten-fold. The committee also notes the need for central databases, repositories, and quality control facilities.

Research projects that merit special support are explained in some detail. The committee favors development and refinement of techniques in the early years, with most support going to work on mapping large genomes. One specific goal in early years, for example, would be to enable sequencing of 1 million continuous base pairs. The need for technological progress is noted for several additional areas: to isolate chromosomes, to create cell lines, to clone substantial portions of DNA from genomes of whole organisms, to clone DNA in large fragments, to isolate large DNA fragments, to order DNA clones derived from genomes, to automate many steps involved in mapping and sequencing DNA, and to improve the collection, storage, dissemination, and analysis of information and materials. Administration of these centralized functions would be conducted by a scientific advisory board, including at least one full-time scientist appointed as chairman. This scientific advisory committee would also serve to advise the agencies and to act as the focal point for international cooperation.

The committee recommended that $200 million per year be appropriated specifically for genome projects, increasing to this level over the first 3 years. In the first 5 years, this might be spent to fund work at 10 medium-sized multidisciplinary centers and to support a program of grants to many more small research groups. An estimated 1,200 scientists would be involved, with roughly half located at the multidisciplinary centers. The research component would account for $120 million per year. The remainder of the budget would be used for construction ($55 million per year initially, decreasing in later years) and to pay for the repository, database, quality control, and administrative functions of the scientific advisory committee ($25 million per year). The funding for construction in early years would be reassigned to production of maps and sequence data as technologies matured.
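
The figures in this recommendation can be cross-checked with simple arithmetic. The Python sketch below is illustrative only (the variable names and the even split across centers are assumptions, not part of the committee’s report): it sums the three budget components and estimates the average staffing implied if about half of the 1,200 scientists were spread evenly over the 10 centers.

```python
# Budget components named in the recommendation, in millions of dollars per year.
components_millions = {
    "research": 120,
    "construction (initial years)": 55,
    "repository, database, quality control, administration": 25,
}
total_millions = sum(components_millions.values())

# Assumption for illustration: exactly half of the scientists work at the
# multidisciplinary centers, divided evenly among them.
scientists_total = 1200
centers = 10
scientists_per_center = (scientists_total / 2) / centers

print(total_millions)          # sum of the three components
print(scientists_per_center)   # average per center under the even-split assumption
```

The components add up to the recommended $200 million per year, and the even-split assumption gives an average of about 60 scientists per center.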

A majority of the committee recommended that a single agency be designated and given funding to lead the effort. Other options were also discussed, including an interagency structure much like the task force option discussed in the next chapter. A final option was to have an interagency body for planning and funding, but a single agency for administration.

PRIVATE CORPORATIONS

Private corporations in several fields have expertise relevant to mapping and sequencing the human genome. Many instruments first developed in academic or national laboratories are now produced commercially. Pharmaceutical and biotechnology companies could use map and sequence information to develop new products, and many are themselves developing research techniques. Companies that market scientific instruments are also keenly interested. The role of the private sector appears to be primarily to:

● advise in planning the mapping and sequencing research program,
● commercialize products that result from the research, and
● ensure that technology transfer from federally funded research projects to commercially exploitable products is smooth and rapid.

Private corporations view human genome projects, with a few exceptions, as long-term research that is best supported by the Federal Government. Corporations are unlikely to lend financial support to a national program to map and sequence the human genome, although they might well invest in particular projects that involve development of technology. Private firms could perform specified functions under contract from the Federal Government (e.g., genetic mapping, physical mapping, DNA preparation, or DNA sequencing) once the technologies are available.

Several American companies already produce DNA sequenators and other analytical instruments used in mapping and sequencing projects. The Lawrence Livermore National Laboratory group, which is constructing ordered sets of DNA clones under DOE sponsorship, is modifying an instrument initially designed for DNA sequencing by Applied Biosystems of Foster City, California. Private corporations have likewise participated in building the existing genetic map of human chromosomes. Collaborative Research and Integrated Genetics are two companies based in the Boston area that have contributed substantially to the effort to find new DNA markers and to link those markers to human diseases. A few biotechnology companies have been at the forefront in developing automated technologies for handling DNA. The Genetics Institute (Cambridge, Massachusetts), for example, developed a robotic system that extracts DNA from bacteria and cells.

At least two companies (the Genome Corp. in Boston and SeQ, Ltd., in Cohasset, Massachusetts) are being started specifically to map and sequence the human genome. These companies plan to construct a physical map and subsequently sequence the human genome over the next decade, using private funds. They would offer access to the materials and to the map and sequence information for a price. The process would be much like that used currently by researchers, who pay repositories for DNA clones, probes, and vectors or who pay companies for enzymes and other materials used in molecular biology. The argument behind this is that, while each laboratory could conceivably develop the information independently, it is cheaper and faster simply to buy it from a private firm that has developed it already. Those purchasing the information would be free to use it, but not to copy or sell it.

Private corporations could also play a role in the development of technology related to mapping and sequencing. This could include company access to government facilities, exchange of corporate and academic personnel, multicompany consortia, individual corporate agreements with universities or national laboratories, or some combination of these.

Members of the Industrial Biotechnology Association were recently polled regarding their support of Federal initiatives in mapping and sequencing the human genome. Those responding indicated that:

● The work should be funded entirely by the Federal Government and should not interfere with ongoing biomedical research.
● NIH, DOE, and NSF should all participate (NIH should take the lead).
● A national planning committee, composed of 50 percent university scientists, 30 percent government representatives, and 20 percent industry representatives, should be set up.
● Work should be carried out at dispersed university and federally supported laboratories, not a center created for the purpose.
● International cooperation should be encouraged if it does not entail delays.
● Physical mapping should precede sequencing.

Respondents clearly support a role for industry in planning and using the results of mapping and sequencing projects, while indicating that the Federal Government should pay the bill.


PRIVATE FOUNDATIONS

Several private foundations support research in human genetics. These include such disease-oriented foundations as the March of Dimes, the Hereditary Disease Foundation, the Muscular Dystrophy Association, and the Cystic Fibrosis Foundation. Other foundations support work on human genetics as part of a broader research program, among them the American Cancer Society, the American Heart Association, and the Alzheimer’s Disease and Related Disorders Association. These foundations, while relatively small in total funding, often act as catalysts in focusing research on a problem of particular interest. They are also highly effective at publicizing research results, educating the public about the consequences of disease, and generating public support for biomedical research.

SUMMARY

NIH, DOE, HHMI, and NSF have already made substantial commitments to projects related to the study of the human genome. Government activities are currently being coordinated by informal communication among the agencies. A previous coordinating group under the Domestic Policy Council will likely be replaced by one organized under the Office of Science and Technology Policy in the White House, with budget submissions coordinated by the Office of Management and Budget. Each research agency has its own means of funding research and providing peer review of programs. NIH and DOE have created special planning groups to review genome projects. NIH funding is over $313 million each year for human and nonhuman research involving mapping or sequencing. In 1987, NIH announced two new programs in methods development; it has budgeted $17.2 million for those projects in 1988 and requested $28 million for 1989.

In 1987, DOE allocated $4.2 million for 10 projects on physical mapping and technology development. It plans another $12 million in 1988 and has received recommendations from an outside scientific panel to ask for over $1 billion over the following 7 years. The former director of OHER stated that at least half the funds would be distributed to researchers at universities and research centers other than national laboratories (3) and that the work will be reviewed prospectively and retrospectively by peers. DOE officials have stated that their budget requests will be more modest than those recommended.

NSF spent over $32.7 million on research related to genome projects in 1987, although only $200,000 was considered to be for genome projects per se. NSF’s new Biological Centers Program is likely to be relevant to genome projects, particularly those involving new instrumentation. HHMI funded $40 million of genetics research in 1987, including several million for construction of genetic maps. HHMI also administers and funds a genomics resource program of $2 million annually to support databases and other elements of the research infrastructure.

To date, actions of the principal organizations can be described as cooperative. NIH and DOE have supported many joint efforts related to human genome projects. The GenBank® database has been administered by NIH and located at a DOE-supported national laboratory for over 5 years, and the two agencies also jointly support DNA clone and probe repositories, computer analysis methods, and flow-sorting facilities. NIH and DOE sponsored a meeting on database and repository needs of human genome projects in August 1987, and there has been an exchange of project officers and extensive informal cooperation among staff at NIH, DOE, NSF, and other executive agencies.

The strengths of NIH and DOE are more complementary than competitive. Each believes it could successfully mount and sustain the scientific and technical effort necessary for the contemplated mapping and sequencing projects. Both support relevant work already, although with different emphases. A decision to delegate the entire effort to one agency would require that current efforts in other agencies be shut down.

The mapping and sequencing effort will more likely continue to include NIH, DOE, NSF, HHMI, and other agencies and organizations covered in this chapter. The key question then becomes how much and which part each agency should perform. Such decisions will be made in a general sense by Congress, through authorization and appropriation, with more detailed planning left to executive agencies. There may be informal cooperation or some more formal means of coordinating the planning and execution of agency projects under OSTP. Joint nongovernment advisory groups could be formed to bring in expertise from academia and industry. There are many options for organizing an interagency effort and for incorporating outside advice into research planning. These issues of organization and advisory structure are discussed in chapter 6.

CHAPTER 5 REFERENCES

1. Bledsoe, R., White House Domestic Policy Council, personal communication, December 1987.

2. Bush, V., Science, the Endless Frontier: A Report to the President (Washington, DC: reprinted by the National Science Foundation, 1980).

3. Chen, A., Centers for Disease Control, personal communication, August 1987.

4. Committee Management Staff, National Institutes of Health, NIH Public Advisory Groups, NIH Pub. 87-10 (Bethesda, MD: NIH, October 1986).

5. DeLisi, C., comments at a meeting of the Human Genome Subcommittee, Health and Environmental Research Advisory Committee, Department of Energy, Denver, CO, December 1986.

6. Department of Energy, material submitted to the Domestic Policy Council, May 1987.

7. Dorigan, J., White House Office of Science and Technology Policy, personal communication, December 1987.

8. Dulbecco, R., “A Turning Point in Cancer Research: Sequencing the Human Genome,” Science 231:1055-1056, 1986.

9. England, J. M., A Patron for Pure Science: The National Science Foundation’s Formative Years, 1945-1957 (Washington, DC: NSF, 1982).

10. Harden, V. A., Inventing the NIH: Federal Biomedical Research Policy, 1887-1937 (Baltimore, MD: The Johns Hopkins University Press, 1986).

11. Institute of Medicine, Responding to Health Needs and Scientific Opportunity: The Organizational Structure of the National Institutes of Health (Washington, DC: National Academy Press, October 1984), p. 76.

12. Kingsbury, D., National Science Foundation, personal communication, October 1987.

13. Levinson, R., Director’s Office, National Institutes of Health, personal communication, January 1988.

14. Mazuzan, G. F., “National Science Foundation: A Brief History,” unpublished manuscript, September 1987.

15. McKusick, V. A., Mendelian Inheritance in Man, 7th ed. (Baltimore, MD: The Johns Hopkins University Press, 1986).

16. Miles, W. D., A History of the National Library of Medicine: The Nation’s Treasury of Medical Knowledge, NIH Pub. 82-1904 (Washington, DC: U.S. Government Printing Office, 1982).

17. National Institutes of Health, NIH Guide to Grants and Contracts, vol. 16, May 29, 1987, pp. 9-14; Oct. 16, 1987, pp. 6-10.

18. National Institutes of Health, Office of the Director, background information and information provided to the Domestic Policy Council, May 1987.

19. National Research Council, Mapping and Sequencing the Human Genome (Washington, DC: National Academy Press, 1988).

20. Noonan, N., White House Office of Management and Budget, personal communication, October 1987.

21. Office of Energy Research, Department of Energy, Health and Environmental Research: Summary of Accomplishments, vol. 1, DOE Pub. ER-0194, April 1984; vol. 2, DOE Pub. ER-0275, August 1986 (Washington, DC: DOE, 1984, 1986).

22. Office of Program Planning and Evaluation, National Institutes of Health, The Human Genome, proceedings of the 54th meeting of the Advisory Committee to the Director, Oct. 16-17, 1986 (Bethesda, MD: NIH, 1987).

23. Science and the Citizen, “Geneshot,” Scientific American, May 1987, pp. 58-58a.

24. Smith, D., Office of Health and Environmental Research, Department of Energy, personal communication, January 1988.

25. Subcommittee on the Human Genome, Health and Environmental Research Advisory Committee, Report on the Human Genome Initiative, prepared for the Office of Health and Environmental Research, Office of Energy Research, Department of Energy (Germantown, MD: DOE, April 1987).

26. U.S. Congress, General Accounting Office, University Funding: Information on the Role of Peer Review at NSF and NIH, GAO Pub. RCED-87-87FS (Washington, DC: U.S. Government Printing Office, March 1987).

27. U.S. Congress, Office of Technology Assessment, Technologies for Detecting Heritable Mutations in Human Beings, OTA-H-298 (Washington, DC: U.S. Government Printing Office, September 1986).


Chapter 6

Organization of Projects


CONTENTS

Administrative Structures
  Single-Agency Leadership
  Interagency Agreement and Consultation
  Interagency Task Force
  Consortium
  Discussion

Advisory Structure
  Responsibilities
  Composition
  Structure and Funding

Big Science v. Small-Group Science
  Displacement of Higher-Priority Science
  Style and Efficiency
  Politicization

Summary
Chapter 6 References

Boxes
  6-A. Acid Precipitation Task Force
  6-B. Midwest Plant Biotechnology Consortium
  6-C. Quotes on Genome Controversies

Figures
  6-1. Lead Agency
  6-2. Interagency Agreement and Consultation
  6-3. Interagency Task Force
  6-4. Consortium


“Organization is a means to an end rather than an end in itself. Such structure is a prerequisite to organizational health; but it is not health itself. The test of a healthy business is not the beauty, clarity, or perfection of its organization structure. It is the performance of people.”

Peter Drucker, Management: Tasks, Responsibilities, Practices (New York: Harper & Row, 1974), p. 602.

ADMINISTRATIVE STRUCTURES

Chapter 5 presented the history, current involvement, and future plans of the many government and nongovernment parties interested in genome research. This chapter assumes the continued interest and participation of the current actors, and it discusses the options for organizing those actors at the Federal level.

A properly designed administrative or organizational structure for genome projects is important. As form so perfectly matches function in DNA, so the organizational form of the project should match its function and goals. These include rapid accumulation of knowledge about the genome, efficient storage and distribution of that information, and conversion of this knowledge into productive theories, tools, reference materials, and medicines. The political consequences of a poorly administered group of projects are not only failure to achieve potential intellectual and economic contributions, but also negative impacts on the organization and funding of other scientific investigations. A genome project blueprint cannot be drawn without taking into consideration the abutting structures as well as the internal constraints.

Three major funding agencies must be included in any consideration of organizational design: the National Institutes of Health (NIH), the Department of Energy (DOE), and the National Science Foundation (NSF). Nongovernmental bodies such as the National Academy of Sciences (NAS) and the Howard Hughes Medical Institute (HHMI) are already participating in organizational and advisory roles, and commercial firms anxious for sequencing technology and data seek input as well.

There are at least five possible administrative structures a human genome project could develop:

  • One agency: a project performed exclusively by one of the expert agencies.
  • Single-agency leadership: a project in which Congress would designate one agency to coordinate and oversee the research.
  • Interagency agreement and consultation: a cooperative project among the agencies in which no additional authority structure would be created.
  • Interagency task force: a project in which a committee with the authority to direct research planning among the agencies would be chartered.
  • Consortium: a project in which the private sector as well as the Federal Government would plan research, with possible cofunding from the corporate partners.

The first alternative, a project organized and executed solely by one agency, may be dismissed as unnecessary and politically unworkable. A single-agency project could only result from cutting out others, and several agencies have already made substantial investments in genome research and related technologies. Further, the current genome infrastructure, including GenBank® and DNA clone repositories, is already interagency.

The other four proposals have unique strengths and weaknesses. For any of them to be successful, however, the administrative structure must at least organize communications at the scientific, interagency, and international levels. At most, it should be capable of planning a research program involving many partners and funding them accordingly. Congressional decisions on the organizational structure can be based on perceptions of the necessary patterns of authority, of quality and scope of experience in research and development, and of fiscal and economic priorities.

Single-Agency Leadership

One possible beginning for genome projects would be the designation by Congress of a lead agency to coordinate ongoing activities in various agencies (see figure 6-1). This option was the one favored by a majority of those on the National Research Council committee that issued a report on mapping and sequencing the human genome (18). The strengths of this organizational option derive from its clear designation of authority. Such leadership can be dynamic, and research would follow the theme established by the lead agency. A lead agency, and thus a lead administrator, focuses the project in all its aspects: It provides a communications link among researchers, domestic and foreign; a contact for media; and a target of criticism and politicking. Drawbacks to designating a lead agency are the possibility of incomplete commitment by the lead agency and the potential inability of the lead agency to command the resources of other agencies effectively. Choosing this option would necessarily entail choosing which agency should lead.

Among the three funding agencies (NIH, DOE, and NSF), NIH and DOE are the most appropriate candidates to lead a genome project. NSF is an unlikely leader because its mandate excludes the investigation of human health and disease, the ultimate focus of the projects. Further, a large-scale operation conducted by NSF would inevitably detract from other research of which NSF may be the sole patron. Such a loss would occur in each of the funding agencies involved, but NSF may be most sensitive because it funds much less biology than NIH, and the research it supports is more basic as a rule than that supported by DOE or NIH (30). NSF can contribute to genome projects by stimulating interest in automation and robotics (which it has done), in animal models of human disease (by gathering animal and microorganism sequence data for comparison), and in instrumentation (through its biology centers).

Figure 6-1. Lead Agency

Choosing between NIH and DOE is troublesome because these agencies have complementary strengths and weaknesses. The project would have a different face with different leadership.

Because of its mandate to support investigations to improve the Nation’s health, NIH dominates biomedical research. The institutes spent an estimated $313 million in 1987 for projects that involved mapping or sequencing, over $90 million of which funded projects to characterize the human genome (13,15). NIH has conducted, funded, and administered genetics research expertly for years, and the institutes would seem to be a natural home for genome projects. The theme of an NIH-led project would likely be a renewed commitment to the peer review system and to small, or cottage industry, science, with some added attention to the research infrastructure.

One great strength of NIH is its decentralized administration: Quality projects uninteresting to one institute may well be funded by another. This flexibility is achieved at a cost, however. Critics have scrutinized this process and concluded that, among other faults, it cannot support a large, directed project (28). NIH leadership can have difficulty imposing the priority decisions needed for a concerted effort. A distinction is often made between the operating styles of NIH and NASA (the National Aeronautics and Space Administration), with NASA having much greater central authority and NIH exemplifying a decentralized process for setting priorities. To manage some of the proposed genome projects, mechanisms beyond NIH’s standard researcher-originated format may be required. This could “require a change in NIH’s philosophical outlook and its approach,” according to George Cahill of HHMI (23).


NIH could conduct a directed research program. It has done so in the past for the study of particular diseases (e.g., polio and cancer) and is now doing so for AIDS. While NIH has not previously mounted a major project to develop a set of tools for biology (as mapping and sequencing projects are often characterized), it has the funding mechanisms and expertise necessary to do so. The mapping and sequencing projects have been described as a library of information awaiting translation, and NIH administers the National Library of Medicine (see ch. 5). To facilitate special genome projects, NIH has created new study sections to review grants that focus on methods; NIH could also convene a new scientific advisory body in an existing institute to direct a focused project, set aside funds for special projects in one or more institutes, and begin new centers or multidisciplinary programs analogous to existing ones. It already has a multi-institute coordinating body to develop special initiatives like those announced in May and October 1987 for analyzing complex genomes and for informatics in molecular biology. NIH is currently in the process of establishing a mechanism for obtaining outside advice.

The high capital needs in some areas, the diverse expertise (extending beyond biomedical research) needed on some research teams, and the standardized and repetitive work of mapping and sequencing may render small research groups unable or unwilling to do such work (8). Experience at the National Cancer Institute with the Special Virus Cancer Program has suggested that the standard grant mechanism is insufficient for such tasks as the production of standardized tools, the distribution of clinical materials, and the increased coordination of investigators (33). If the institutes were to assign high priority to genome projects, those projects could conflict with other major research efforts, for example, research on AIDS. Some persons, among them Ruth Kirschstein, Director of the National Institute of General Medical Sciences, have questioned whether “it would be appropriate to have a specifically targeted program that would compete with all the extraordinarily important programs NIH funds” (23). The danger is that a targeted program would become an instead-of program rather than an in-addition-to program, as was the case with the Special Virus Cancer Program (33).

As a lead agency, DOE would endow the genome project with different characteristics of organization and expertise. DOE has long supported research on human mutations and DNA damage and repair through the Office of Health and Environmental Research. The mission of OHER is to understand the effects of radiation and other means of energy generation on human health and the environment. OHER views ignorance of the genome and the inability to sequence and analyze DNA rapidly as major limitations on its research. As NIH might emphasize the human disease aspects of genome research, DOE would emphasize the investigation of mutagenesis and other areas closely related to OHER’s mission. Critics have characterized OHER’s rationale as “clearly impractical” (17) and “forced and . . . disingenuous” (30). But because of OHER’s interest in human genetic material, DOE already has established expertise in crucial technologies such as automated chromosome and cell sorting, and in the computer storage of genetic data. DOE believes that, through its national laboratory structure, it should develop methods and tools useful to the entire community of molecular biologists (27).

The strengths and weaknesses of DOE are largely complementary to those of NIH. DOE’s strength is its familiarity with the administration of focused research programs. It manages many large facilities for research in physics and chemistry, such as accelerators for high-energy physics, and the scientists whom DOE funds in these areas are among the best in the world. DOE also maintains excellent computing resources. Yet DOE does not have the same stature within the community of molecular biologists that NIH does (11,16,24,30).

The national laboratories have long provided services to the community of molecular biologists that are not provided by other agencies. The national laboratories have pioneered many high-technology instruments useful in biology: zonal centrifuges, high-pressure liquid chromatography, fluorescence-activated cell sorters, and chromosome sorting. Teams at national laboratories have prepared sets of DNA clones from individual human chromosomes, and current mapping projects are logical extensions of this work. Even though the national laboratories are not renowned for their expertise in molecular biology, some of the technology, analytical software, and new methods that need to be developed will not be in biological disciplines; they will involve engineering, physics, and mathematics, all areas of acknowledged national laboratory expertise.

DOE enjoys the reputation of being a proficient organizer of projects among government, university, and industry researchers. As an agency, it is experienced in managing large projects and disbursing large sums of money, extramurally to universities and research centers and intramurally to the national laboratories. Critics fear that, if DOE assumes leadership of genome projects, its bias toward central management will corrupt research and stifle the more traditional, perhaps more creative, cottage industry approach.

DOE’s review process for genome projects would involve prospective and retrospective peer review. The degree of scrutiny is not likely to differ substantially from that at NIH. Funding through DOE would be less likely to sap other biomedical funds, but other biological research programs at DOE could suffer. Designating DOE as the lead agency would give the organizational lead to an agency that supports only a small fraction of related research and thus only a small fraction of the user community. Having DOE administratively lead all genome projects could prove unmanageable in the long term.

The controversy over which agency should lead, DOE or NIH, may be misguided. Each agency has a role to play, and discussion should focus instead on how to encourage cooperation and to ensure that the research program of one agency does not inhibit that of the other. One observer has asserted that a major directed program at NIH alone would soon be politically incorporated into the overall NIH budget and would thereafter displace untargeted research. The corollary is that “DOE could find the leadership excellence more easily than NIH could provide the budgetary insulation” (14). Nonetheless, NIH is the logical choice for lead agency if Congress chooses to designate one: its mission is most directly affected, and the scientific community now supported by NIH is by far the largest of the intended beneficiaries of genome projects. If NIH leads, then the expertise and multidisciplinary research already supported by NSF and DOE should be explicitly taken into account in future planning. Difficulties in designating a lead agency are discussed under options for action by Congress in chapter 1.

Interagency Agreement and Consultation

The lack of a lead agency implies no favored research strategy or funding mechanism, but a balanced program to take advantage of NIH, DOE, NSF, and other agencies’ strengths. Agencies could be left to themselves to cooperate and communicate among themselves and with other interested organizations in the United States and abroad (see figure 6-2). A group of agency principals, agreed to by the agencies or under the Office of Science and Technology Policy, could meet to achieve these goals and exchange details of research directions and developments.

An interagency agreement and consultation framework eschews any formal creation of authority and relies on the good will of the participants to exchange information freely. Such an arrangement allows each agency autonomous, and presumably efficient, use of its resources and permits each agency to address those research topics most closely associated with its institutional interest. Interest may not always correspond to expertise, however, and it may conflict with or overlap other agencies’ programs. This would act against one ostensible goal of the cooperative effort: to streamline projects by eliminating unnecessary duplication of research. An informal or ad hoc framework may also be inappropriate for very expensive, long-term projects because evolving and potentially diverging priorities may diminish rapport among the agencies.

A communications and consultation committee could be responsible for these cooperative, communications, and streamlining functions, but the institutional focus would be scattered and the project would exist without clear leadership. Although in the best scenario such a committee would be completely abreast of all the domestic research, it might be too diffuse a body to support international organization of a project.

Figure 6-2. Interagency Agreement and Consultation

Decentralized authority is not without benefits, however, for pluralism of funding sources and flexible, decentralized organization are strengths of American science. Genome projects may be compelling enough to turn the organizational gears without creating a special bureaucracy for the task. A cooperative effort also permits a flexible mix of funding options, and each agency would retain control over its research planning.

The subcommittee on the human genome of the Biotechnology Working Group of the Domestic Policy Council acted as “a mechanism for exchanging information . . . [with] the right people at the right level,” according to David Kingsbury of NSF (23). This coordinating group will now be located under a life sciences committee at the Office of Science and Technology Policy. Such a group of government administrators facilitates interagency communication but may not address other needs. A purely government body is open to the criticism that scientists and not government administrators must provide direction or at least participate directly in planning (31). This conflict is similar to that found in creating an advisory body, which generally reflects the question of how much influence scientists should have on the science policy process (discussed below).

The merit of informal agreement and consultation is that each agency would have the flexibility to follow its own research agenda. Agreement and consultation would not require legislation by Congress and would make interagency cooperation a matter of congressional oversight. A disadvantage is that flexibility may be achieved at the cost of clear authority and accountability. Further, there might be no mechanism for resolving conflicts among agencies. The appropriateness of informal interagency cooperation turns on a judgment of which is more efficient: a directed and planned effort or a pluralistic and decentralized process.

Interagency Task Force

A genome initiative might require more active leadership than that described above. An interagency task force dedicated to pursuing the genome project and wielding some authority over funding and research might provide such leadership (see figure 6-3).

The task force could be constituted much like an interagency committee, with principals from the participating agencies; however, the task force would possess authority in certain areas, such as gathering of information from participating agencies, preparation of reports, formulation of recommendations, and interagency planning. It could design and direct a genome project, drawing on each of the participating organizations (see box 6-A).

A task force would be much like a lead agency in its ability to draw the attention of foreign researchers, the media, and domestic political interests. And like a lead agency, the task force would present a central character, its chairperson, who would act as spokesperson for the project. If the chairperson of the task force were selected from the agency representatives, however, the appointment would likely carry with it the same kind of political difficulties as selecting a lead agency.

Establishing a functional authority may requiresubstantial political investment, but the cost ofsubsequent decisions is negligible because theycan be immediate and final. With a committeeauthorized only to facilitate communication, adilemma in the assignment of a particular researchproject, for example, could be costly in any num-ber of ways, from the time it takes to reach a co-operative solution to the money required to dupli-cate the research should no equitable distribution

Figure 6-3.—interagency Task Force

Page 121: Mapping Our Genes - Genome Projects: How Big? How Fast?

120

Box 6-A.—Acid Precipitation Task ForceThe Acid Precipitation Act of 1980 (Public Law

96-294), Title VII of the Energy Security Act, estab-lished a lo-year program to reduce or eliminate thesources of acid precipitation. To implement this pro-gram, Congress mandated the formation of the AcidPrecipitation Task Force, composed of membersfrom the national energy laboratories, the agencies,and four presidential appointees, and chaired jointlyby representatives from the National Oceanic andAtmospheric Administration (NOAA), the U.S. De-partment of Agriculture (USDA), and the Environ-mental Protection Agency (EPA). The task force isthus a truly interagency body, drawing on a vari-ety of agency expertise for leadership.

The legislative history describes the task force asbeing charged with preparing a comprehensive re-search plan, to include individual research, eco-nomic assessment, Federal coordination, interna-tional cooperation, and management requirements.The comprehensive plan is implemented and man-aged by the task force. The Acid Precipitation TaskForce could thus serve as a model for an interagencytask force dedicated to genome projects.

In 1985, representatives from the various agen-cies signed a memorandum of understanding thatfixed the structure for administering the act. Thememorandum assigns authority and responsibilityto: 1) a Joint Chairs Council, consisting of principalsfrom USDA, DOE, EPA, NOAA, the Department ofthe Interior, and the Council for EnvironmentalQuality, and responsible for approving the annualresearch program and the corresponding portionsof the budgets of the participating agencies; 2) thetask force, to review the annual research programand budget and to provide advice and recommen-dations to the council; 3) the Director of Research(appointed by the Joint Chairs Council), to formu-late the research program and budget; 4) the Inter-agency Scientific Committee and the InteragencyPolicy Committee, consisting of senior scientific andpolicy executives, respectively, from the agencies,to advise and recommend; 5) an External ScientificReview Panel; 6) ‘the Office of the Director of Re-search, consisting of scientists and support staff;and 7) research task groups, each under the leadof a specific agency, to develop a research plan andbudget for a particular task.

Shortly after the 1985 reorganization, the Gen-eral Accounting Office reviewed the program at therequest of Congress, because management changesand delays in reporting had become constant. TheGeneral Accounting Office’s recommendations aremore functional than structural, and they relate tothe difficulty of issuing public reports under greatscientific uncertainty. The almost intractable natureof some of the acid precipitation problems is appar-ently issue-specific and not related to the organiza-tion of the project.

The authority under the new organization is sig-nificantly vested in the Joint Chairs Council, the Di-rector of Research, and divided between a scien-tific column and a policy column, The fine structureis fascinating: It attempts to permit each participant-ing agency to retain authority over its research ex-pertise by creating research task groups. For ex-ample, Interior is responsible for monitoringdeposition, NOAA for atmospheric processes, andDOE for emissions and control technology. In ge-nome projects, distribution according to expertisewould have NIH focus on mapping techniques andbiological technologies, and DOE focus on automa-tion and robotics and computation. This could beuseful as long as it did not assign tasks to the wrongagency and did not inhibit flexible interagency plan-ning for areas of legitimate overlap. The agenciesparticipating in the Acid Precipitation Task Forceare working on a scale similar in magnitude to thatof a genome project; from fiscal years 1982 to 1987,the agencies spent just over $300 million for acidprecipitation research.

The joint chair arrangement, among NIH, DOE, and NSF in a genome project, would represent a smooth distribution of authority. The appointment of a director of research might prove the only bone of contention, as the selection might imply what style of research (small-group science v. Big Science) is to be funded. The Acid Precipitation Task Force also balances the concerns of policy specialists with those of scientists and seeks the input of nonagency scientists as well (it does neglect nonagency policy specialists, however). This interagency task force approach attempts to combine the dynamic properties of an authoritative leader with the efficiency of agencies pursuing their own research expertise.

SOURCE: Office of Technology Assessment, 1987, based in part on U.S. Congress, General Accounting Office, Acid Rain: Delays and Management Changes in the Federal Research Program, GAO Pub. RCED-87-89 (Washington, DC: GAO, 1987).


be achieved. A task force or lead agency could eliminate some of this cost.

A task force may not be able to match efficiency in decision making with efficiency in administering the agencies’ resources. Its recommendations could be ignored by agencies, or it could prove an obstruction or source of delays. A task force is a bureaucratic solution, identifying a person or group with the goal of genome analysis and building upon the existing authority structure. Such authority may be necessary to direct research and provide a focus for international communications, but it adds another layer to the bureaucracy that separates the administrators of science from the investigators.

Consortium

Like the task force approach, a consortium would involve the creation of a new authoritative entity. Unlike the organizational structures discussed previously, however, this approach would require the active participation of private firms (see figure 6-4). The introduction of this new factor complicates the staging of a genome project.

The typical consortium is a close working association between a university research group and one or more private firms interested in the pursuit and economic development of that research. Government involvement in consortia is often limited to financial support during the initial stages of basic research, while industry waits to fund the development stage. State government is frequently more active than Federal, because the projects are perceived to be closely linked to local economic development (see box 6-B).

Figure 6-4. Consortium

A consortium of universities, businesses, and government is directed toward several mutually enriching goals: strengthening universities, stimulating (competitive) economic growth, engaging in basic research, creating generic technologies, and developing and delivering specific products (6). These goals correspond to those of some genome projects, which some persons hope will maintain America’s competitive position in biotechnology against challenges from Asia and Europe. Genome projects would, for example, seek both to create generic biotechnology tools (such as techniques for handling very large DNA fragments, detecting very small amounts of DNA, and designing software for analysis) and to develop specific products (such as vectors for cloning DNA or automated DNA sequencers). Such results would benefit university researchers and corporate investors alike.

The question of setting the research agenda (in other instances the responsibility of the individuals conducting the research and the agencies funding it or of the task force established to oversee it) is complicated by private firms’ need to emphasize technology development and not necessarily free inquiry. Profit-seeking firms often have shorter-term goals than are practical for the support of basic research and therefore focus on the development of short-term technologies over long-term ones. Not all industries have a short-term perspective, however: Pharmaceutical firms are accustomed to basic research and long-term payoffs, that is, investments requiring more than a decade to bear fruit.

A private emphasis on development is closely associated with the effective transfer of technology into the marketplace. A consortium would no doubt speed technology transfer to participating firms, but firms might suppress the spread of scientific information to protect their investment (5). The phenomenon of sitting on data is not restricted to industry, since academic scientists may delay dissemination of information in order to consolidate results for their own financial or reputational benefit (12), but proprietary interest will not help the free exchange of data. Thus, the question of proprietary rights versus information in the public domain is a sticky one for genome projects, where a naturally occurring DNA sequence can translate into a multimillion-dollar product. The presence of commercial firms in academia is therefore two-sided: Goals of technology transfer and economic development may be more easily reached, but the control exerted by industry over the planning of the research agenda and the dissemination of results might be too self-interested. Concern that the economic aspirations of private firms might corrupt the atmosphere of academia may be overstated now, since only a few businesses have shown any interest in the genome project; but the possibilities of reaping economic benefits, especially in the current environment of international competitiveness, are likely to attract more private sector involvement in the future.

Box 6-B. Midwest Plant Biotechnology Consortium

Representing a large number of university, industry, and government partners, the Midwest Plant Biotechnology Consortium is an experiment in basic research and technology transfer among agricultural sectors. Its purpose is to increase the competitiveness of American agriculture and agribusiness through the development of basic plant biotechnology research.

The idea for the consortium began at DOE’s Argonne National Laboratory (ANL), which has a historical research interest in photochemistry and photosynthesis. ANL determined that a coordinated program in plant science could contribute to biotechnology applications of interest to both industry and government agencies.

When ANL invited participation from universities and industry, it specified a number of principles that would guide the consortium. The continued importance of both industrial and scientific peer review processes was stressed, and intellectual property rights were established from the outset. ANL also emphasized the regional nature of the consortium, encouraging Midwest institutions to participate and to conduct research on behalf of Midwest agribusiness. Aside from these initial guidelines, the original organization of the consortium remained informal until recently, when it sought incorporation as a 501(c)(3) (tax-exempt) corporation and developed a more formal budget process.

Government interest in the consortium comes from those agencies involved in the genome discussion (DOE, NSF, and NIH), with the addition of USDA. A secretariat operates the consortium, determining policy and procedure with informal involvement of government officials. More formal arrangements may be possible in the future. An executive board of corporate and university officers oversees the technical and administrative operations. A number of research topic subgroups (e.g., plant growth, pesticides-herbicides) also exist, and the primary interaction for technology transfer occurs at this level.

The consortium solves the problem of research direction by a two-tiered system: The industrial partners first select proposals on the basis of commercial potential, and then a peer review system selects on the basis of technical merit. The consortium expects the Federal Government (with some State funds) to support research through the initial stages, that is, until the industrial partners can see around the development corner to a commercial application. Research proposals developed as part of the consortium will be subjected to normal competitive grant review at the Federal agencies. Industry would then fund the final steps.

Organizationally, the Midwest Plant Biotechnology Consortium offers a number of useful parallels to a genome project. The Federal agencies overlap similarly, and DOE similarly attempts to link the research program to related research at the national laboratories. Intellectual property rights are sensitive in both projects; the consortium provided from the outset that each research participant would retain rights according to institutional policy and that the industrial participants would have the right to first disclosure. The consortium retains the integrity of the peer review system while allowing industry to set some of its own research priorities based on commercial potential. The parallels break down where some of the short-term commercial interest in a genome project focuses on automated tools and machinery in addition to the results of biotechnical manipulation. Funding for the consortium is also considerably less than what is expected to be necessary for a genome project.

SOURCE: Office of Technology Assessment, 1987.


If consortia related to one or more genome projects are formed, several issues will have to be resolved. First, terms of participation must enable a broad spectrum of private firms to participate. Small firms with limited resources have had difficulty, for example, in paying entry fees to some biotechnology consortia (22). And nonprofit organizations, which must make their information available on a nondiscriminatory basis under U.S. tax laws, might have difficulty in participating if there are preferential terms for industrial partners.

Discussion

None of the four workable administrative structures (the lead agency approach, which requires a choice between DOE and NIH; the cooperative approach, which requires no new legislation; the task force, which creates a formal authority; or the consortium, which adds a dose of private sector assistance) is static. Administrative forms may overlap: For example, the consortium may require a lead agency, or the cooperative effort may create consortia or task forces to attain specific objectives. The administrative structure at the national level does require explicit choices, however. Congressional action will vary according to the option chosen. Interagency agreement and consultation would require no new legislation, only oversight. Designation of a lead agency, establishment of a task force, or creation of a single national consortium would require new legislation.

Administration of genome projects will require monitoring of some central services and facilities, some services and functions performed at centers, and many grants to small groups. This raises several concerns about communication among agencies and among the scientists whose work they support. The diffusion of research among a large number of groups complicates communication, but it permits the most flexible organization of research; the investigator may be as focused or interdisciplinary as the research demands. A reduction in the number of groups reduces the difficulty of communication but limits the number of people trying wholly new approaches to the scientific or technical objective. The pace of innovation may be directly proportional to the number of groups: A commitment to a single center or institute might fix the relevant technology prematurely. When innovation is less important than production, then specialized facilities are logical because they simplify the organization of work. Problems of communication for centrally administered projects are of a different variety. Often the most difficult problem is ensuring that services are appropriate and tailored to the needs of those using them. Different genome projects will have different modes of communication. Projects that rely on many small groups will need communication networks or frequent meetings of scientists; central services will require feedback from user communities.

ADVISORY STRUCTURE

Second to the administrative structure in organizational hierarchy, though not in importance, is the structure of an appropriate advisory body or bodies. Agencies supporting genome projects will benefit from tapping the academic and industrial sectors for the requisite expert wisdom. Similarly, academia and industry wish to ensure their input into the decision-making process and to exercise some control over the research that affects their livelihood. The responsibilities, composition, structure, and funding of advisory groups then become issues.

Responsibilities

The primary responsibility of the independent advisory board (or boards) would be to follow the research plan and budget envisioned by the agencies, task force, or consortium and to make recommendations where appropriate. Such recommendations might include identification of promising research initiatives in need of funding or oversight of standards necessary to ensure quality control. The board could be granted budget authority to enact these recommendations, or its role could be strictly advisory. Consideration of broad overarching issues, such as the ethical implications of using some newly developed technologies or the economic benefits of targeted technology development, could also be a function of the board.

The advisory board would naturally have a reporting duty: to the participating agencies, to Congress, to the public, and perhaps to the international community of scientists. The advisory board would be an organ of communication among the agencies, supplementing their informal direct contact. Congress would probably want to be kept abreast of research progress and could require periodic assessments in order to plan genome projects and other research initiatives. Annual or biannual reporting to Congress on progress and the distribution of funds could be fit into the budget process, for this will be one way in which genome projects are held accountable to the taxpayers. The executive branch could be kept up to date by the advisory board or through the Office of Science and Technology Policy. An advisory board not composed entirely of Federal officers would fall under the Federal Advisory Committee Act (Public Law 92-463). Pursuant to the act, the advisory board’s meetings and papers must be open to the public. The advisory board could also be the contact for international communication.

Composition

An advisory board would require members with varied backgrounds. Scientists with experience in the planning of mapping and sequencing work would be needed for technical advice. Scientists with database expertise would also be required, as the storage and dissemination of the project’s information is as central as the generation of it. Scientists could be chosen from universities, industry, and federally supported laboratories. Choosing the board involves the same issues as the consortium decision: how much influence development- and profit-minded industry experts should have on the project. One suggestion, from an industrial association, is to set up an advisory board with 50 percent university, 30 percent government, and 20 percent industry representatives (9). This would in fact be an extension of current practice, as university and industry representatives often work together productively. The selection of scientists from abroad to serve on the advisory board, perhaps as nonvoting members, would help it assume an international role.

Since the project’s impact would extend into general science policy, economic competitiveness, medical care delivery, and the like, experts from such fields might be included. The board might want, for example, to ensure that other areas of biomedical research do not suffer from a drain of funds or personnel, and policy experts and economists would be helpful in this. Lawyers might be necessary to address questions of intellectual property. Ethicists might be included to help the board address such issues as confidentiality of data on research subjects or whether to investigate the chromosome containing disease gene A before that containing disease gene B. Representatives of interested private philanthropies, particularly those supporting research in human genetics, might also be included. An advisory board would logically include at least a representative of the Howard Hughes Medical Institute, as it funds a substantial portion of genome projects.

Structure and Funding

Scientists and nonscientists could serve together on a single advisory body or on separate bodies. The choice will influence the method of research planning and science policy formation: In a single body, the procedure is multifaceted but essentially unitary; in separate bodies, the procedure is separated into scientific and policy components. Another possible division of advisors would be government representatives on one panel and private representatives, from academia, industry, and other backgrounds, on another.

Appointments to a policy board could be made by the President, with the advice and consent of the Senate. The choice of members could be assigned to a nongovernmental body, such as the National Academy of Sciences, to ensure the board’s independence and its technical competence. As an alternative, the task of selection could be delegated to the Office of Technology Assessment.


BIG SCIENCE V. SMALL-GROUP SCIENCE

The likelihood that Big Science will invade molecular biology has often been cited in opposition to a concerted government program of genome projects. Small science is largely conceived and executed by a principal investigator directing a small laboratory group funded by a grant. Big Science can refer to many things. It can mean large and expensive facilities. It can refer to large, multidisciplinary team efforts that entail cooperative planning and therefore require individual scientists to sacrifice some freedom in choosing goals and methods. Or it can refer to bureaucratic central management by government administrators. These different meanings have been intermingled in the emotionally charged debate about genome projects. (For further insight into that debate, see box 6-C.)

Three lines of argument have been made against conducting molecular biology research on one of these Big Science models: style, efficiency, and political interference.

Displacement of Higher-Priority Science

Some scientists worry that a major Federal program to map the human genome and sequence a significant portion of it would detract from the conduct of more important science (2,3,20). The argument is that special appropriations for human genome projects could well go to projects that do not present the most immediate obstacles to scientific progress and might supplant funds that would be allocated differently by the peer review processes of scientific agencies. If genome projects were not of the same scientific caliber as projects in other areas of science, agencies would nonetheless be precluded from reassigning those funds.

Other scientists argue that some genome projects do not lend themselves easily to current review procedures and merit a special effort (7,10,19,25,30). Genome projects will involve not only science, they say, but also technology development and production. Some aver that existing peer review committees give short shrift to projects intended to develop methodology (as opposed to answering a scientific question) and tend to underfund shared research resources. They believe that the value of genome projects warrants a special effort, including new peer review committees and increased resources for a research infrastructure.

A related issue concerns the details of funding mechanisms. Those who believe strongly in the superiority of investigator-initiated small-group research urge caution in supporting large projects that are administered by institutions rather than individuals. The agencies most directly involved, namely NIH and DOE, are adopting policies that answer both arguments by promising to use a system of peer review that gives the scientific community substantial power to direct genome projects but that differs from current peer review by adding new review groups to focus on component genome projects.

Style and Efficiency

Some scientists have objected to a Big Science approach to genome projects because it goes against the tradition of science as a cottage industry conducted by small, largely autonomous groups. The underlying assumption is that Big Science management would undercut the motivation and circumscribe the freedom of investigators by making them beholden to administrators in a scientific bureaucracy. Yet team effort is likely to be cheaper and faster in the long run for genome projects that focus on developing instruments or producing maps. It would be unwise and wasteful to shun all projects that do not conform to the small-group mode. One science administrator advised scientists that:

. . . insofar as what they do is part of the war against human suffering, their desires and tastes are not all that matter. Biomedical science is not done, or, more important, is not supported by the public, simply because it gives intense satisfaction to the dedicated and successful biomedical researcher (32).

Large and expensive projects must meet certain criteria, otherwise they could indeed supplant other research. They must meet needs that cannot be met by small-group research (e.g., production, service, or targeted technology development). They must not merely be useful, but fill critical resource gaps as well (4). These criteria are likely to be met by many databases, repositories, and mapping projects. They have not yet been met by proposals to sequence the entire human genome.

Box 6-C. Quotes on Genome Controversies

Proposals for genome projects, particularly sequencing the human genome, have provoked considerable controversy among luminaries in molecular biology and related disciplines. The following quotations illustrate the liveliness of the debate over the past 2 years.

“Sequencing the human genome is like pursuing the holy grail.” Walter Gilbert, Harvard University, at several national meetings, March 1986 to August 1987.

“[Sequencing the genome now] is like Lewis and Clark going to the Pacific one millimeter at a time. If they had done that, they would still be looking.” David Botstein, Whitehead Institute, Cold Spring Harbor Symposium on the Molecular Biology of Homo sapiens, June 1986.

“Humans deserve a genetic linkage map. It is part of the description of Homo sapiens.” Raymond White, Howard Hughes Medical Institute, University of Utah, in Science 233:158, 1986.

“The idea is gaining momentum. I shiver at the thought.” David Baltimore, Director, Whitehead Institute, in Science 232:1600, 1986.

“Of course we are interested in having the sequence, but the important question is the route we take to getting it.” Maxine Singer, Director, Carnegie Institution of Washington, in Science 232:1600, 1986.

“Sequencing the human genome would be about as useful as translating the complete works of Shakespeare into cuneiform, but not quite as feasible or as easy to interpret.” James Walsh, University of Arizona, and Jon Marks, University of California, Davis, in Nature 322:590, 1986.

“I believe such a conclusion [against special efforts to sequence the human genome] represents a failure of vision, an unwarranted fear of (not very) ‘big’ science.” Robert Sinsheimer, University of California, Santa Cruz, in Science 233:1246, 1986.

“My plea is simply that we think about this project in light of what we already know about eukaryotic genetics and not set in motion a scientifically ill-advised Juggernaut.” Joseph Gall, Carnegie Institution of Washington, in Science 233:1368, 1986.

“Too bad that it needs such fancy wrappings to attract public attention for an obvious good.” Joshua Lederberg, “The Gift Wrapped Gene,” in The Scientist, Nov. 17, 1986, p. 12.

“The sequence will give us a new window into human biology.” Renato Dulbecco, Salk Institute, interview with OTA staff member, January 1987.

“Of course, if you have the clones, you’re going to want to sequence them. The question is which ones to do first. I think it is scientifically arrogant to prejudge what will be important and what will not.” Paul Berg, Stanford University, interview with OTA staff member, January 1987.

“I’m surprised consenting adults have been caught in public talking about it [sequencing the genome] . . . it makes no sense.” Robert Weinberg, Whitehead Institute, in The New Scientist, Mar. 5, 1987, p. 35.

“The sequence of the human genome would be perhaps the most powerful tool ever developed to explore the mysteries of human development and disease.” Leroy Hood and Lloyd Smith, California Institute of Technology, in Issues in Science and Technology 3:37, 1987.

“The main reason that research in other species is so strongly supported by Congress is its applicability to human beings. Therefore, the obvious answer as to whether the human genome should be sequenced is ‘Yes. Why do you ask?’” Daniel Koshland, Editor, Science 236:505, 1987.

“The real problem that faces us is not the cost of the Human Genome Program, but how to get it going, seeing both that the right people are in charge and that they work under an administrative umbrella that will not tolerate uncritical thinking and so will never promise more than the facts warrant.” James D. Watson, Director’s Report, Cold Spring Harbor Laboratories, September 1987.

“We will see a new dawn of understanding about evolution and human origins, and totally new approaches to old scientific questions.” Allan Wilson, University of California, Berkeley, at a symposium for the director, National Institutes of Health, Nov. 3, 1987.

Some argue that, while it may appear that certain projects are best conducted by large, multidisciplinary teams, in the long run science progresses faster if large, targeted projects are not begun (20). That is, small-group science is so much more productive in the long run that attempts to direct science will inevitably go astray.

Similar debates preceded the approval of costly projects in other fields. Construction of cyclotrons and other particle accelerators was resisted by many physicists in the 1930s [Heilbron and Kevles, see app. A], and space-based instruments were opposed by many astronomers in the 1960s (26). Yet these facilities permitted scientific advances that would otherwise have been impossible, and they were (and are) most often used by small research groups. The issue is not that expensive facilities should not be built, but that they should address critical needs and be carefully planned.

Politicization

One way in which concerted projects are believed to drift into inefficiency is through political interference. This can be on a small scale (haggling that impedes progress among members of a research team) or a large scale (e.g., pork barrel science at the national level). One scientist has observed, “a megaproject like sequencing the human genome is certain to increase the political control over scientific decisionmaking” (3), and the American Society for Biochemistry and Molecular Biology warns against “the establishment of one or a few large centers that are designed to map and/or sequence the human genome” (1). Large research institutions can drift once their missions have been accomplished, and it can be difficult to close down unproductive efforts (32).

Molecular biology has been remarkably productive for three decades without the management style of Big Science. In the recent inventory of 275 Big Science facilities compiled by the House Committee on Science and Technology, none was biological (29). Yet some human genome projects, for example developing new instruments or pooling results from many different groups, will require multidisciplinary teams concentrating on a technical problem. This situation is analogous in many ways to the situations faced earlier by other sciences in their transition to Big Science [Heilbron and Kevles, see app. A] (32). It is difficult to imagine, for example, automating the steps in cloning DNA, sequencing it, or mapping it without combining optics, chemistry, physics, engineering, and electronics. If the end products of genome projects (materials and information) are to be reliable and used internationally, there must be quality control and standardization.

Clearly, some important functions require central coordination or multidisciplinary team research, although not necessarily centralized administration; some tasks cannot be forced into the mold of small-group science. Technological developments will determine the pace and extent to which Big Science becomes part of biological research. The question will be how to decide which projects merit special effort and which do not. Decisions of several types will be necessary in conducting genome projects. The advantages of decentralized planning must be balanced against the need for some centralized resources. The importance of mapping, sequencing, and technology development must be compared to other research and services. Such decisions will require an administrative structure to make them.

Biomedical investigations are now, and in the foreseeable future will continue to be, conducted primarily by small groups, although Big Science facilities and services can amplify and complement them. Small groups will remain the principal means of studying physiology and disease. When new institutions are created for elements of human genome projects, special attention must be paid to making results useful to small scientific groups. It would be ironic if genome projects starved small-group research efforts in order to create new tools.

The costs of database, repository, and map projects are not large relative to the costs of other biomedical research, so planned projects are unlikely to have any measurable adverse impact on other research. Moreover, genome projects intended to bolster the research infrastructure should free funds for new work by making research faster and less costly. If genome projects threaten the health of small-group biomedical research, then genome projects should take a back seat.

SUMMARY

The Howard Hughes Medical Institute recently issued a short report on efforts to map the human genome; it observed:

The sooner the entire genome is mapped and sequenced once and for all, the sooner scientists can get on with the real work of human biology: understanding what the genes do (21).

Databases and repositories must be centrally administered, although not necessarily centrally located, in order to be widely accessible. Technology will most likely determine whether and when large facilities and coordinated administration are necessary to conduct genome projects. If large facilities prove to be more efficient, this will not necessarily be incompatible with research by small groups; it could in fact enhance it. If, however, large facilities and centrally organized research programs threaten the lifeblood of biomedical research (investigator-initiated grants), then the projects should be reevaluated and, if necessary, cut back.

CHAPTER 6 REFERENCES

1. American Society for Biochemistry and Molecular Biology, policy statement, April 1987.

2. Ayala, F., “Two Frontiers of Human Biology: What the Sequence Won’t Tell Us,” Issues in Science and Technology 3:51-56, 1987.

3. Baltimore, D., “Genome Sequencing: A Small-Scale Approach,” Issues in Science and Technology 3:48-50, 1987.

4. Baltimore, D., Whitehead Institute, personal communication, December 1987.

5. Cahill, G., comments at Issues of Collaboration for Human Genome Projects, OTA workshop, June 26, 1987.

6. Dimancescu, D., and Botkin, J., The New Alliance: America’s R&D Consortia (Cambridge, MA: Ballinger, 1986).

7. Gilbert, W., comments at Costs of Human Genome Projects, OTA workshop, Aug. 7, 1987.

8. Gilbert, W., “Genome Sequencing: Creating a New Biology for the Twenty-First Century,” Issues in Science and Technology 3:26-35, 1987.

9. Godown, R. D., testimony on Title II of S.1480 before the Subcommittee on Energy Research and Development, Senate Energy and Natural Resources Committee, Sept. 17, 1987.

10. Hood, L., and Smith, L., “Genome Sequencing: How To Proceed,” Issues in Science and Technology 3:36-46, 1987.

11. Joyce, C., “The Race To Map the Human Genome,” New Scientist, Mar. 5, 1987, pp. 35-40.

12. Karny, G., comments at Issues of Collaboration for Human Genome Projects, OTA workshop, June 26, 1987.

13. Kirschstein, R. L., letter in Issues in Science and Technology, summer 1987, p. 5.

14. Koshland, D. E., “Sequencing the Human Genome,” Science 236:505, 1987.

15. Levinson, R., NIH, personal communication, June 1987.

16. Lewin, R., News and Comment, Science 235:1453, 1987.

17. Mitra, S., Oak Ridge National Laboratory, personal communication, March 1986.

18. National Research Council, Mapping and Sequencing the Human Genome (Washington, DC: National Academy Press, 1988).

19. Office of Technology Assessment interviews with R. Dulbecco, L. Smith, T. Friedmann, M. Bitensky, R. Davis, and P. Berg, January 1987; W. Gilbert, July 1987.

20. Office of Technology Assessment interviews with A. Kornberg and S. Cohen, January 1987; D. Baltimore, July 1987.

21. Pines, M., Mapping the Human Genome (Bethesda, MD: Howard Hughes Medical Institute, 1987).

22. Raines, L., Industrial Biotechnology Association, personal communication, November 1987.

23. Roberts, L., “Agencies Vie Over Human Genome Project,” Science 237:486-487, 1987.

24. Science and the Citizen, “Geneshot,” Scientific American, May 1987, pp. 58-58a.

25. Smith, L., and Hood, L., “Mapping and Sequencing the Human Genome: How To Proceed,” Bio/Technology 5:933-939, 1987.

26. Smith, R., The Johns Hopkins University, personal communication, December 1987.

27. Subcommittee on the Human Genome, Health and Environmental Research Advisory Committee, Report on the Human Genome Initiative, prepared for the Office of Health and Environmental Research, Office of Energy Research, Department of Energy (Germantown, MD: DOE, April 1987).

28. U.S. Congress, General Accounting Office, University Funding: Information on the Role of Peer Review at NSF and NIH, GAO Pub. RCED-87-87 FS (Washington, DC: U.S. Government Printing Office, March 1987).

29. U.S. Congress, House Committee on Science and Technology, World Inventory of “Big Science” Instruments and Facilities (Washington, DC: U.S. Government Printing Office, 1986), cited in Heilbron and Kevles, see app. A.

30. Watson, J. D., director’s report, Cold Spring Harbor Laboratories, in press.

31. Watson, J. D., comments at Costs of Human Genome Projects, OTA workshop, Aug. 7, 1976.

32. Weinberg, A. M., Reflections on Big Science (Cambridge, MA: MIT Press, 1967), esp. pp. 113-114.

33. Zinder, N. D., report of the ad hoc committee reviewing the Special Virus Cancer Program of the National Cancer Institute, March 1974.

Chapter 7

International Efforts

CONTENTS

Introduction
Japan
    Mapping and Sequencing Research
    Potentials for Cooperation and Conflict With the United States
Europe
    European Organizations
    National Research Efforts in Europe
Other International Efforts
    Australia
    Canada
    Latin America
    South Africa
    The Union of Soviet Socialist Republics and Eastern Europe
International Collaboration and Cooperation
    Precedents for International Scientific Programs
    Options for International Organization of Genome Research
    Existing Collaborative Frameworks
Chapter 7 References

Boxes
7-A. The Venezuelan Pedigree Project
7-B. The Center for the Study of Human Polymorphism (CEPH): An International Gene Mapping Center
7-C. The International Geophysical Year
7-D. Views on International Cooperation and Collaboration in Genome Research
7-E. Large Centers v. Networking

Figures
7-1. Distribution of Publications in Human Gene Mapping and Sequencing
7-2. Human Gene Mapping and Sequencing Articles Published Annually

Table
7-1. International Distribution of Human Genome Research


INTRODUCTION

The expected health benefits of genome projects, and their commercial potential, have attracted international as well as national attention. The United States is the clear leader in basic research, publishing more articles on mapping and sequencing than European or Asian nations (see figure 7-1, table 7-1). U.S. companies have also marketed more instruments for DNA research than any others (see ch. 2). Productivity in basic and applied research does not, however, guarantee the United States the lead in developing or producing commercial products and processes, nor does it ensure market competitiveness. Japan has also encouraged the commercial development of technologies associated with the mapping and sequencing of DNA. Countries such as Switzerland and West Germany are home base for multinational pharmaceutical and chemical companies that are poised to commercialize developing products. Some nations not supporting much basic genome research at present have strong biotechnology or high-technology resources and policies and might be well positioned to commercialize technologies that are developed for and spun off from human genome research.¹ The OTA has found that Government agencies in the United States are further along in developing policies for genome projects than are comparable agencies in other countries, although a number of other countries have well-established basic research efforts in mapping and sequencing human and nonhuman genomes, efforts that could either complement or compete with U.S. efforts.

Figure 7-1. Distribution of Publications in Human Gene Mapping and Sequencing

Compiled from a bibliometric analysis of literature on human gene mapping and sequencing conducted for the Office of Technology Assessment by Computer Horizons, Inc. [see apps. A and E]. The differences between the annual percentages displayed and the total annual research (100%) can be attributed either to countries not included in the listing or to the absence of sufficient bibliographic information to determine the country or region from which the publication originated.

SOURCE: Office of Technology Assessment, 1988

Table 7-1. International Distribution of Human Genome Research (percent of articles published annually on human gene maps or markers)

Year                            1977  1978  1979  1980  1981  1982  1983  1984  1985  1986
United States                    45%   42%   38%   43%   42%   40%   46%   44%   42%   43%
Japan                              2     2     3     3     4     5     4     4     4     5
Western Europe
  Denmark                          1     3     1     1     2     1     1     1     1     1
  Federal Republic of Germany      5     4     2     4     6     5     5     5     5     5
  Finland                         <1     1     1     1     1    <1    <1     1     1    <1
  France                           5     7     6     6     8     6     6     5     6     6
  Italy                            2     2     1     2     3     2     4     3     3     4
  Netherlands                      4     5     3     3     2     2     2     2     3     2
  United Kingdom                   8     9     9     6     9    10     9    10    11    10
  Other                            7     4     7     4     6     5     6     5     4     5
Other non-European countries
  Australia                       <1     1     2     2     2     2     2     2     1     2
  Canada                           3     3     3     4     2     3     2     3     4     4
  Eastern Europe and U.S.S.R.      5     3     3     5     5     4     3     4     4     3
  South Africa                     0     1     1    <1    <1    <1    <1     1    <1
  Other                            5     4   [?]     4     4     5     3     3     5     3

[South Africa: only nine of the ten annual values survive in the source, so their alignment to years is uncertain. Other non-European countries, 1979: illegible in the source.]

SOURCE: Office of Technology Assessment, 1988, compiled from a literature search and bibliometric analysis conducted for the OTA by Computer Horizons, Inc. The key words used in the search are described in app. E.

Gene mapping is perhaps the most common international research activity in human genetics, and it is likely to be an area to which many nations will contribute. Human genes are highly polymorphic, and populations from different regions exhibit considerable genetic variation. These regional differences will allow researchers to contribute to comparative studies, as well as to characterize and map genes of particular regional interest (e.g., the thalassemias in the Mediterranean and Oudtshoorn skin disease in the Afrikaner population in South Africa). The study of DNA from diverse peoples will shed light on the nature of polymorphisms and genetic disorders, even if it does not lead immediately to improved health care (8).

The large scope of genome projects invites international cooperation. Informal cooperation and collaboration are already underway through a variety of mechanisms. Formal collaboration could speed research and reduce the financial burden on each country. Maintenance of international databases and repositories is particularly important to provide timely access to information from research conducted around the world. Many scientists encourage international cooperation in genome research, but any effort to conduct genome mapping and sequencing projects on an international scale must be based on a realistic assessment of the capabilities and interests of the countries involved.

¹It is not within the scope of this assessment to provide a detailed analysis of biotechnological capabilities and industrial funding; suffice it to say that genome research is part of a much larger arena of Federal, university, and industrial research and development. A forthcoming OTA assessment, New Developments in Biotechnology, 4: U.S. Investment in Biotechnology (81), covers this in great detail for the United States. A previous OTA assessment, Commercial Biotechnology: An International Analysis (79), describes the state of biotechnology in Western Europe and Japan; the more recent Department of Commerce reports, Biotechnology in Western Europe (91) and Biotechnology in Japan (39), offer updated information on international efforts.

Countries that do not themselves carry out the kinds of research involved in mapping and sequencing can play an important role by collecting genetic material from families for comparative studies. One such project, a collection of genetic material from a group of Venezuelan families, was a key factor in the successful search for the gene that causes Huntington’s disease (see box 7-A). Similar pedigree collections are being established and maintained in Egypt and Denmark, as well as in isolated populations in the United States such as Mormon and Amish communities. These pedigrees provide valuable source material for the study of polymorphisms and genetic disease in human beings.

This chapter summarizes the state of DNA mapping and sequencing research in Japan, Western Europe, and elsewhere. Issues of international cooperation and competition and precedents for international cooperation in science are examined. Some organizational options for the international management of genome projects are proposed, specifying areas in which cooperation might best be achieved and describing cooperative frameworks already in existence. Chapter 8 outlines questions about international technology transfer that might emerge in collaborative or cooperative situations.

BOX 7-A. The Venezuelan Pedigree Project

In the small fishing villages that line the coast of Lake Maracaibo in Venezuela lives an unusual group of families. If you walk into any of these villages, you may be met by residents who do a characteristic dance down the streets: large, jerky motions, staggering and weaving from side to side. For many years the residents of these villages were ostracized, considered to be chronically drunk. But in the early 1970s a doctor from a nearby military base realized that the dance was not due to alcoholism but to Huntington’s disease, a rare, dominant genetic disease that causes degeneration of nerve cells in the brain. The onset of Huntington’s disease is generally late: In those who carry the gene, symptoms begin at age 35 or older. The disease leads to loss of control of the voluntary muscles, first causing twitches and jerks, then dementia, and finally death.

A preliminary study describing the case histories and pedigrees of approximately 100 patients from Lake Maracaibo families was presented at a meeting of the American Neurological Association in 1972. It was an interesting case: an interrelated set of families, along whose pedigree could be traced an extraordinarily high incidence of a genetic disease that is rare in the general population. At the time, however, no one knew what to do about it. The case remained an interesting anecdote in the memories of the researchers who attended the meeting.

One of those researchers was Nancy Wexler, a clinical psychologist. Wexler had both a professional and a personal interest in Huntington’s: her mother had died of the disease, so she and her sister each have a 50:50 chance of developing it.

Wexler and her colleagues remembered the case of the Venezuelan families 5 years later, when writing a report on Huntington’s disease for a congressional commission. One of their recommendations was to initiate a genetic study of the Venezuelan pedigree. Starting in 1979, the National Institutes of Health appointed Wexler to direct a program that would implement the recommendations and set aside funding for the Venezuelan genetic study. The first team of researchers went to Lake Maracaibo in 1981 to collect blood samples from which to extract DNA. At the same time, they compiled a careful record of the pedigrees of the volunteers from whom the blood was extracted. Research teams have gone every year since. The pedigree has grown to include over 7,000 family members; the diagram of it occupies a 100-foot-long section of a corridor near Wexler’s Columbia University office. DNA samples have been collected from nearly 1,500 family members, some with Huntington’s and some without.

At the same time the genetic study got underway, advances in recombinant DNA technology, specifically the elaboration of techniques for finding genetic markers using RFLPs (see ch. 2), increased the power of analytical methods that could be used on the collected family materials. In 1982, Jim Gusella and others began to screen the DNA from the Venezuelan collection for genetic markers linked to the gene for Huntington’s disease. They tested DNA from normal and affected members of the Venezuelan families, comparing the different patterns cut by restriction enzymes on the samples from different family members. The fact that the pedigree included large extended families was useful in locating informative markers. By 1983, the researchers had figured out which chromosome contains the Huntington’s gene and had identified a linked marker, paving the way for a diagnostic test, an extraordinary breakthrough, says Wexler, in such a short time. The search for the actual gene is not yet over, however, since locating more closely linked markers has presented unforeseen difficulties.

A cure is not in sight for the families of Lake Maracaibo, but they have made an extremely important contribution to the study of Huntington’s. Moreover, their genetic materials are a valuable resource for other genetic studies, including searches for other disease genes, as well as for the development of genetic maps. Indeed, some of the DNA has been contributed to the international mapping collaboration coordinated by CEPH (see box 7-B). Wexler suggests that “the pedigree is a big genetic playground; whatever idea you have, you could probably test it there.”

[Photo: Huntington’s patient being rowed across Lake Maracaibo. Photo credit: Nick Kelsh/Kelsh.Marr Studios]

[Photo: Nancy Wexler going over Huntington’s disease pedigrees. Photo credit: Frank Micelotta/Time magazine]

The Venezuelan pedigree project highlights an important role that developing countries can play in human genome projects, even if they do not yet have the capability to carry out human genome research on their own. A similar collection of genetic materials from patients with genetic diseases (primarily the anemias and thalassemias) and their families was started in Egypt in 1964 and has proceeded since in collaboration with scientists from NIH and several universities: Oxford, London, Harvard, Columbia, New York University, and the University of California, San Francisco. The scientists who manage this collection are eager to cooperate in international efforts to map and sequence the human genome. As Wexler points out:

In many cases, the countries are eager to collaborate, but they don’t know what they have to offer. The patient populations are a valuable resource. And once the working relationships are established between Third World countries with health problems and the high-tech labs in the developed countries, the connections are there for advice and assistance if those countries get to the point of starting their own labs.

SOURCES:

J. Gusella, “Gene Mapping and Disorders of the Nervous System,” lecture at American Association for the Advancement of Science annual meeting, Boston, Feb. 15, 1988.

N. Hashem, Ain Shams University Medical Center, Cairo, Egypt, personal communication, July 1987.

N. Wexler, Columbia University, personal communication, October 1987.

JAPAN

Japan’s efforts to develop automated DNA sequencing technologies have been highly publicized over the past year, causing concern that Japan will capture the market for sequencing technology and that it will realize most of the potential profits from genome projects. Japan does not, however, have well-defined government policies for human genome mapping. Instead, funding for mapping and sequencing research is under the jurisdiction of half-a-dozen government agencies that often compete for prestige rather than attempt to coordinate efforts.

Mapping and Sequencing Research

The general framework for science policy in Japan is formulated by a small group of bureaucrats in the various agencies and by an inner cabinet group, the Council on Science and Technology, chaired by the prime minister. Programs for human genome research have been divided among the Ministry of Education, Science, and Culture (MESC), the Science and Technology Agency (STA), and the Ministry of International Trade and Industry (MITI). The Ministry of Agriculture, Forestry, and Fisheries supports some research on nonhuman genomes, notably a $500,000 feasibility study on sequencing the entire genome of rice (77).

The Ministry of Education, Science, and Culture

Most mapping and sequencing research falls under the domain of MESC, the primary supporter of basic research in Japan. Like the National Institutes of Health in the United States, MESC supports research projects selected by peer review; it provides grants and funds for universities and university-based researchers and for several national research institutes. In addition, the ministry can encourage research in specific, targeted areas on the recommendation of its advisory committees.

The ministry does not yet have an official policy regarding genome research but has appointed an advisory committee to study the situation. Members of the committee visited the United States in early 1988 to gather information on U.S. policies on human genome research and to ascertain what the U.S. expects of Japan. The committee’s recommendations will be implemented beginning in fiscal year 1989 or 1990 (58).

Japan is often criticized for not doing enough basic research; many observers have questioned whether Japanese scientists have enough expertise in basic molecular biology to support a major gene mapping or sequencing effort [Yoshikawa, see app. A]. Bibliometric analysis [see app. E] indicates that while Japan’s research output in DNA mapping is far below that of the United States (figure 7-1, table 7-1), its proportion of research relative to other countries has generally increased over the last decade. Its share of publications on human gene mapping and sequencing rose from 2 percent in 1977 to 5 percent in 1986, compared to a U.S. share that varied from 40 to 46 percent during those years. In addition, MESC supported research that led to the publication of a complete genetic map of E. coli in the prestigious journal Cell in 1987 (53). U.S. researchers published a map of E. coli at about the same time, but the Japanese research was notable for the speed with which it was done and for the use of automated technologies.

The Science and Technology Agency

STA supports mostly mission-oriented basic research. It has played a leading role in the development of automated sequencing technology. Since 1981, STA’s Special Coordination Fund for the Promotion of Science and Technology has underwritten a program entitled Extraction, Analysis, and Synthesis of DNA, with a total funding of $3.8 million (40). The project, led by Akiyoshi Wada of the University of Tokyo, aims “to reduce the burden of time demanded of researchers working on the analysis of DNA base sequences by developing automatic machinery,” utilizing the knowledge and resources of companies with expertise in electronics, robotics, computers, and material science [Wada quoted in Yoshikawa, app. A]. The project scientists are adapting robotic techniques and mass production machines to automate the time-consuming steps in the Maxam and Gilbert sequencing process (see ch. 2) rather than developing new processes. The project has resulted in a prototype of a microchemical robot, made by Seiko, but it is not yet on the market. The goal of the project has been to increase the rate of DNA sequencing output in general, not to sequence the entire human genome. Wada has repeatedly emphasized the necessity for international cooperation in the project and would like to develop a supersequencing center to operate as a service facility for scientific groups around the world (84,87).

STA and a private foundation sponsored an international conference in Okayama in July 1987 to discuss the state of DNA sequencing technologies and possible strategies for genome sequencing in the future; the conference gave no clear indication of the pace or direction of future STA efforts. Some scientists expressed doubts about the STA project, noting that there has been no public discussion in Japan about whether or not to support Wada’s conception of the project and that the project is not actively supported by many other Japanese scientists [Yoshikawa, see app. A]. Still, a quiet consensus has emerged that sequencing technology should be developed regardless of whether a full-scale project to sequence the human genome is launched.

Oversight of the project has now shifted from the Special Coordination Fund to STA’s Council for Aeronautics, Electronics, and Other Advanced Technologies (CAEOAT); a decision on the status of future directions of the sequencing research should be made by spring 1988. The publicity and momentum of the project are undoubtedly attributable in part to the active role that ex-Prime Minister Nakasone played in advocating biotechnology and related projects [Yoshikawa, see app. A]. Whether the momentum will continue now, after Nakasone’s retirement, remains to be seen.

The Ministry of International Trade and Industry and the Human Frontiers Science Program

MITI coordinates applied research, linking university researchers with industry to encourage technology development and commercialization. It does not now play a major role in genome research, but its influence may increase if the Human Frontiers Science Program is fully funded. A human genome sequencing project may become a focal point for the program.

The Human Frontiers Science Program (HFSP) is a proposal for an international, cooperative program of research in basic biology and the development of related “key technologies.” The proposal originated in 1985 in MITI’s Agency of Industrial Science and Technology (AIST). The proposal came about partly in response to international criticism that Japan does little basic research itself, but capitalizes on the research of others (4,23), and partly to emphasize international cooperation in the face of persistent foreign trade frictions [Yoshikawa, see app. A]. The HFSP proposal met with a lukewarm reception during early outings and international conferences, however, and Nakasone’s mention of it at the June 1987 Economic Summit meeting roused little enthusiasm [Yoshikawa, see app. A] (46,76).

If implemented, HFSP would probably enhance Japan’s sequencing effort, since DNA sequencing technology has been identified as a key area for development. The program was granted an initial budget of 197 million yen (approximately $1.5 million) for fiscal year 1987, to conduct a feasibility study, but the amount to be spent on development of sequencing technologies is not yet clear. Some observers speculate that the proposal will be shelved now that Nakasone has retired. MITI officials contend, however, that the program is still viable (4,90). A December 1987 planning meeting again endorsed human genome sequencing as a focus for HFSP, but the Ministry of Finance probably will not decide on the program’s budget until 1989 (75).

Commercialization of Mapping and Sequencing Technologies

Potentially marketable technologies that are developed for genome projects have been supported by the several mechanisms through which the government aids industrial research in technology development. STA’s Special Coordination Fund, established in 1981, provides incentives for basic research for new technologies in accordance with the long-term goals for science and technology development set by its Policy Committee. STA’s Research Development Corp. promotes commercial uses of government-developed technologies that might not be used otherwise. The prototype of Seiko’s microchemical machine was developed with assistance from the Special Coordination Fund, while the Research Development Corp. has supported its commercialization. In addition, Hitachi, Fuji Photo Film, Toyo Soda, and Mitsui Knowledge Industries have all undertaken research into the automation of DNA sequencing, and some relevant products are being commercialized. DNA extractors developed by Toyo Soda are already on the market, as is a gel preparation by Fuji and autoradiograph readers by Seiko and Hitachi.

Potentials for Cooperation and Conflict With the United States

Many Japanese scientists are willing to cooperate in an international genome sequencing project, but collaboration will clearly be accompanied by economic tensions and competitive posturing both by the United States and by Japan.

The development of similar automated technologies by U.S. and Japanese companies may pose difficult trade issues. The Japanese concentration on sequencing hardware has drawn criticism from American companies, which fear that the Japanese could take the lead in developing technologies for the analysis of DNA (89). At present, however, U.S. manufacturers are clearly ahead in the development and manufacture of equipment for manipulating and analyzing DNA (see ch. 2). Japanese companies are not as far along in marketing relevant products as is often reported: while the Seiko machine has been touted in the Western press, few scientists in Japan have even heard of it (40). In addition, the machine’s economy has been overrated: One frequently quoted estimate for the sequencing systems is $0.17 per base pair, with a target of $0.01 or less, but Wada himself states that the system is still far from reaching even the $0.17 goal (86). (The present cost of sequencing is approximately $1.00 per base pair.) Finally, despite the customary preference of Japanese officials for buying Japanese machines, officials of U.S.-based Applied Biosystems, Inc. (ABI, Foster City, CA) in Japan have reported no difficulty in marketing their DNA sequencing machines and other instruments used by molecular biologists [Yoshikawa, see app. A]. To date, Japan is the largest market for ABI’s sequencing machine (47).

One frequently voiced fear is that Japanese companies are focusing on automating parts of the sequencing process that companies in the United States have not yet automated (although several U.S. firms have begun development). Thus far, however, the STA-sponsored technology development effort is based on automating machines that use conventional methodology rather than developing or using new molecular biology techniques. Scientists at some U.S. companies have commented that it may have been a mistake for Japan to invest so much in automating existing methodologies when there are new technologies emerging that may make the old methods obsolete.

Databases, which are generally considered useful and politically straightforward areas for cooperation on genome projects, present knotty problems of ownership of information. Despite support within the scientific community, the development of shared databases, even within Japan, is problematic. The Japanese Government has recognized that Japanese databases and repositories are insufficient to handle even its own research and development, and it is trying to establish the database infrastructure necessary for a

While Japan is often viewed as a prime competitor, many European countries have stronger research traditions in molecular genetics and the development of related technologies. There are notable genome mapping and sequencing activities in France, Italy, and the United Kingdom, and significant research in gene mapping and technology development in Denmark, the Federal Republic of Germany, and others. In addition, several supranational organizations in Europe have developed targeted programs to encourage biotechnology development; human genome projects can be and are being included. The following sections describe research activities underway in the European Community as a whole and in selected countries, in alphabetical order.2

2The information presented in the sections on selected countries is based on several sources. The OTA contracted a report on research efforts in key countries in Western Europe [Newmark, see app. A]. Some information was gleaned from scientific journals and international news sources. In late 1986 and throughout 1987, OTA conducted an informal survey of international efforts, contacting embassy officials, science attaches, and scientists from numerous countries to request information about the types and funding levels of genome mapping and sequencing research undertaken in those countries and asking whether any specific policies governed genome research. The information gathered from this effort varied considerably in focus, depth, and detail. The countries represented here, other than those with targeted or particularly well known research programs, are thus self-selected and self-reported. The result is a descriptive account rather than a comprehensive analysis.

European Organizations

Over the past two decades, many European nations have supported scientific collaboration in principle, but in practice funding has been a persistent problem:

Most European governments have become increasingly reluctant to invest large sums of public money in domestic and civilian R&D, and this is reflected at the European level. . . . As domestic science budgets in Europe have become hard-pressed for cash, governments are asking whether they are getting value for money from international projects. Scientists in some fields have also come to view such projects as unwelcome competitors for their domestic research budgets (29).

Nonetheless, several existing organizations in Europe either support genome research now or could do so in the future.

The European Economic Community

The founding treaties that established the institutions of the European Economic Community (EEC) made little explicit provision for research and development beyond that needed for Euratom (which dealt with nuclear energy, including radiation biology), the Coal and Steel Community, and some coordination of agricultural research under the Treaty of Rome founding the EEC. In January 1974, the Council of Ministers agreed on the general need for an EEC research and development policy, and in the mid-1970s, the EEC’s advisory commission began proposing programs, including a program of research and training in selected areas of genetics and enzymology (biomolecular engineering). It was not until 1981 that this proposal was approved, since Article 235 of the Treaty of Rome specifies that such programs can only be adopted by unanimous agreement of all member states (11).

Support of research and technological development has been enhanced by the adoption of the Single European Act, which took effect on July 1, 1987. This act modifies and extends the Treaty of Rome by adding provisions for precompetitive research to strengthen “the scientific and technology basis of European industry and to encourage it to become more competitive at the international level” (19). Once a multiyear framework program is unanimously agreed on by member states, the individual research and development programs within its agreed areas and financial limits can be approved by a qualified majority (member state votes are weighted roughly by size). The current framework program, an initiative to help create collaboration in targeted areas in science and technology, was adopted on September 28, 1987, and runs until 1991, with a global limit of 5,396 million ECUs (European currency units, which in recent years have had approximately the same value as the U.S. dollar). Framework programs must be proposed by the commission and approved by the governing Council of Ministers and the European Parliament (11).

Most relevant to genome research is a series of research programs in biotechnology: the Biomolecular Engineering Programme, 1981-85; the Biotechnology Action Programme (BAP), 1985-89; and Biotechnology Research and Innovation for Development and Growth in Europe (BRIDGE), 1990-93. A Concertation Unit for Biotechnology in Europe was established in 1984 to coordinate the various activities in biotechnology [Newmark, see app. A]. These programs have been designed to complement national research programs while promoting the development of European biotechnology (83).

The budget for BAP has been substantially reduced from the original proposal; as of spring 1987, it appeared that approximately $300 million of the proposed $6 billion budget would be earmarked for biotechnology research, with another $100 million for health, including some funds for human genome mapping and sequencing work, under the heading of “predictive medicine” [Newmark, see app. A]. “Within the biotechnology program(s), active consideration is being given to mapping and sequencing technology, and in particular with respect to the genome of yeast,” although “given the range of topics within the current biotechnology program, it would be surprising if genome work gained more than a small fraction of the total” (11). However, “Community research expenditures have a catalytic role that mobilizes other funds, and a political significance that enhances the coherence and consequent effectiveness with which national funds are deployed” (11). BAP encourages proposals that include at least one industrial partner in the research effort or that provide specific evidence of interest on the part of industry.

When BAP expires, it will be replaced by BRIDGE, which is likely to place even more emphasis on industrial participation. While not yet finalized, BRIDGE is likely to include a project to sequence the genome of yeast, which is more feasible than sequencing the human genome [Newmark, see app. A]. The tentative plan is to undertake a 2-year pilot project in which perhaps 15 laboratories will concentrate on sequencing one yeast chromosome; eventually, a large number of European yeast laboratories would be involved. The pilot project might be launched under BAP, but the full project would be part of BRIDGE and is provisionally estimated to cost $50 million. The project would also try to create a market for sequencing equipment [Newmark, see app. A]. Research on the project will begin soon at some participating laboratories in the United Kingdom [Mount, see app. A].

A subprogram of BAP, Contextual Measures for R&D in Biotechnology, aims to enhance EEC capabilities in bio-informatics (the use of computers and information science in biology), data capture techniques (including advanced instrumentation and automated reading), data banks, computer modeling, computer software, and the “collection of biotic materials” (repositories), along with the “development of information and communication techniques for enhancing the quality and usefulness of such collections” and “the development of techniques for the identification, characterization, conservation, and resuscitation of the materials held in such collections” (20). Development of a biotechnology infrastructure has obvious potential for researchers in human genetics.

Another EEC activity that aids genome research is the Task Force for Biotechnology Information. Created in 1982, the task force has produced discussion papers and has provided small sums of money, totaling $200,000, to support databases (including a contribution to software development at the database of nucleotide sequences run by the EMBL, discussed below and in app. D), and the launching of the European branch of the CODATA Hybridoma Databank, centered at the American Type Culture Collection in Rockville, Maryland. The task force work plan for 1987-90 maintains support for databases, communications, and computational research. The commission of the EEC also supported a series of workshops and studies (1984-86) investigating the interface between biotechnology and information technology in a planning exercise known as Bioinformatics: Collaborative European Programs and Strategy (BICEPS), which “aims to formulate a mid- to long-term strategy for Europe in bio- and medical informatics” and “overall, to improve the European competitive position in the rapidly developing world market for these technologies and applications” (18). Documents for BICEPS refer to the informatics requirements of human genome sequencing and have contributed to plans for bio-informatics in BAP and BRIDGE and to a proposal for a program of Advanced Informatics in Medicine (17). The proposed pilot phase, 1988-90, at 25 million ECUs, was presented by the commission to the European Parliament and the Council of Ministers in September 1987. It includes plans for the development of advanced sequencing instruments and related computational facilities required in genome and other areas of biochemical and protein engineering research. The European chemical industry trade association has endorsed some of the BICEPS proposals and has indicated a willingness to help support an infrastructure such as sequence databases (11,21).

Apart from biotechnology programs, EEC funds research and development in health. The commission’s original proposals for the framework program envisaged a Program of Predictive Medicine and Novel Therapy, which would seek “development of predictive medicine and novel therapy oriented towards better knowledge of the human genome, and genetic engineering processes aiming at the repair of DNA defects (e.g., in congenital diseases of genetic origin)” (11). The program was designed to support research in four areas from 1987 through 1991: study of the human genome (including mapping the genome as an aid in the diagnosis and prevention of genetic disease), nucleic acid probes, genetic therapy, and monoclonal antibodies. Funding for the program, originally proposed at $75 million, has been revised downwards to $25 million; both budget and content may be further revised before the program is approved.

The European Molecular Biology Organization

Funded by 17 European countries, EMBO serves primarily to strengthen the training of European molecular biologists. It supports fellowships, workshops and training courses, occasional scientific meetings, and a journal, but it does not directly support research. EMBO sponsored a meeting of Europeans with an interest in human genome research in spring 1987. Few of the scientists present expressed an interest in mounting a major European mapping or sequencing project; instead, most favored informal cooperation between individual laboratories. The group was pessimistic about whether public funds could be found for a large-scale project and raised the possibility of seeking private funds [Newmark, see app. A].

The European Molecular Biology Laboratory

Located in Heidelberg, West Germany, EMBL is financed by contributions from 10 of the 17 member nations of EMBO. It houses the administrative offices of EMBO, but the organizations have separate budgets and purposes. EMBL’s staff of about 250 scientists and technicians, drawn from member nations and from West Germany, work on a scientific program proposed by its director-general, at present Lennart Philipson, and subject to the approval of a council composed of representatives from contributing countries. The laboratory was founded with the notion that molecular biology would require facilities that would be too expensive for any national research program to support. For the most part, however, research in molecular biology has not required large centralized facilities, and member nations have tended to interact less with EMBL as they have become proficient at molecular biology in their own laboratories (28). Consequently, members have often been grudging in their support, which limits the projects that EMBL can undertake. EMBL’s annual budget is approximately 45 million deutschmarks (about $26.5 million), 25 to 30 percent of which is paid by West Germany (W.

EMBL sponsors research in instrumentation, biocomputing, and gene mapping and sequencing as well as other areas of biology. EMBL’s researchers have been active in technology development for mapping and sequencing and have produced prototypes of machines for automating some of the steps in DNA sequencing (see ch. 2).

EMBL also operates the major European database of nucleotide sequences, which works in cooperation with GenBank to gather and disseminate sequence data. For EMBL to undertake a major human genome project would require a considerable increase in budget (unlikely under current circumstances) and sustained enthusiasm from its members [Newmark, see app. A]. Director-General Philipson is eager to promote collaboration on a genome sequencing project, which he believes will increase the need for a centralized European data-handling facility. In the 1986 director’s report, Philipson encouraged the establishment of new support programs for a human genome project:

If the American plan to launch a programme on the human genome materializes, the EMBL may be a natural collaborative partner in this project. It might, therefore, be worthwhile to plan for at least one new Programme in one of those fields to be initiated in Heidelberg at the end of the proposed Scientific Programme (1990). To facilitate recruitment and the launching of this Programme, plans should be available by 1990 but we do not foresee any cost during the next 4 years (36).

The European Science Foundation

Headquartered in Strasbourg, France, the ESF is subscribed to by 49 research councils and equivalent bodies from 18 European countries (33). It supports projects on a special funding basis from a small central fund; in the past, the ESF has not sponsored much research in biology, although recently it has supported some protein engineering work. One of the foundation’s standing committees, the European Medical Research Council, enables the heads of national medical research bodies to meet once a year. The council has no budget, however, and little influence outside the ESF. At its 1987 meeting, the council decided not to attempt to coordinate European research on human genome mapping and sequencing [Newmark, see app. A].

The European Research Coordination Agency

A French-initiated response to the U.S. Strategic Defense Initiative, EUREKA was set up in 1985 to encourage development of advanced technologies in Western Europe. Participating in EUREKA are the 18 democracies of Western Europe: the 12 member states of the EEC (Belgium, Denmark, France, the Federal Republic of Germany, Greece, Ireland, Italy, Luxembourg, The Netherlands, Portugal, Spain, and the United Kingdom); the 5 member states of the European Free Trade Association (Austria, Finland, Norway, Sweden, and Switzerland); and Iceland.

EUREKA promotes industry-led technological collaboration among its members in several areas, including biotechnology and advanced information technology. It supplements EEC’s efforts by funding research beyond the precompetitive stage. A EUREKA project must involve at least two industrial laboratories in two different European countries. Governments vary in their financial support of EUREKA projects: some offer little more than token support and assistance in administering an international collaboration; others, such as France, pay up to 50 percent of a EUREKA project. EUREKA is coordinated by a small secretariat in Brussels, and its performance has impressed many observers. Still, maintaining consistent funding is difficult, since most of the governments supporting EUREKA have not created procedures for funding the program (34). There are no EUREKA projects for human genome mapping and sequencing yet, but the program might be used to link French researchers to industrial partners in Europe, particularly in the development of sequencing technologies [Newmark, see app. A].

National Research Efforts in Europe

Denmark

The National Health Authority, the primary funding agency for biomedical research, supports some gene mapping studies, although there is at present no centralized effort. Other funds for gene mapping and sequencing come from general allotments to universities and research institutes, from the government, and from research councils, notably the Danish Research Council. Special projects can be funded by applying to the appropriate research council. The Institute of Medical Genetics of the University of Copenhagen is the most prominent Danish effort in the field. It has the longest tradition and the greatest interest in gene mapping; sequencing is not yet a major concern, although it may be in the future. A University of Copenhagen scientist is the editor of the international journal Clinical Genetics, which publishes mapping studies and similar research. There are several ongoing projects at the institute on various genetic diseases, but there is no concerted effort or government policy on mapping and sequencing (70).

One project of interest is a family pedigree project that has been underway for more than 10 years. Like the Venezuelan pedigree project (box 7-A), this is a collection of genetic material from families with many children; the collection contains “samples of red cells, serum, plasma, thrombocytes [parts of the blood that help in clotting], lymphocytes [cells important in the immune system], as well as skin biopsies” (59). Unlike the Venezuelan material, the genetic material in the Danish project was collected from apparently normal families; over the years it has been tested by classical genetic markers to help establish polymorphic regions for genes of different blood groups, enzyme types, and so on. Extensive RFLP mapping (see ch. 2) of the material has not been done because of limited resources, but negotiations are underway to contribute material to the Center for the Study of Human Polymorphism (CEPH), an international gene mapping center located in Paris, for further mapping. There is as yet no clear policy in Denmark on whether to sequence large portions of the family material, especially because resources are limited, but the research group is exploring the possibility of collaborative arrangements within Denmark, with other countries, and with the United States. The goal is to establish a Danish center for human gene mapping, LINK, starting with the family material that has already been gathered and expanding the collection, as well as drawing in researchers from other institutes. LINK is envisioned as a Scandinavian counterpart to the French CEPH effort (59).

The Danish Government has established 10 new biotechnological centers and allocated D.kr. 500 million (about $80 million) for their operating expenses over the next 5 years (6,59); 410 million will be used to establish new research centers at technical universities and private firms (24). The biggest center, at Aarhus, is already supporting some gene mapping research in collaboration with CEPH.

Federal Republic of Germany

The emergence of the environmentally oriented Green party in West Germany, combined with a general wariness about research with possible eugenic applications, has made molecular genetics research a sensitive political issue.3 Nonetheless, research in molecular biology is well funded by federal, state, and private monies. There are four main sources of funds for basic research in molecular genetics. The Max Planck Society, which receives a substantial allotment from the federal government but is legally independent, supports the Max Planck Institutes, each of which is devoted to a particular area of research (72). The German Research Association (DFG) obtains approximately half of its funds from the federal government and half from state governments and supports research in the universities. The German Ministry for Research and Technology (BMFT) supports projects in universities as well as funding the Institute for Biotechnology Research and other research institutes. Individual states contribute to some science research through the universities. Another source of potential support for genome research is the prestigious Society for Biotechnological Research (GBF), a government-funded research center (78).

3One indication of this attitude is that a federally appointed commission of government and outside experts on genetic engineering recommended, in early 1987, that there be “tight limits drawn for analyses of human hereditary factors (genomic analysis) as well as for gene therapy” (2). The commission published an extensive report entitled Chances and Risks of Genetic Engineering after two years of study. An English translation of the foreword and recommendations of the report, entitled Gene Technology: Opportunities and Risks (16), has been made available by the EEC. The DFG criticized the recommendations of the commission in the case of genome analysis, arguing that the search for causes and cures for genetic defects is a scientific duty and serves the public interest (42).

At present, West Germany does not have a coordinated genome mapping or sequencing project. At a meeting in September 1987, representatives of the DFG decided not to endorse a concerted genome project, although the agency does support a research program targeting molecular methodology for studying the genome (52).

West Germans are strong supporters of international cooperation. They consistently contribute to EMBL, and several laboratories are carrying out research that could be extended at little expense and aligned with an international collaboration in genome research.

Biotechnology is being actively promoted by the federal and state governments in West Germany. The Federal Ministry of Research and Technology’s Biotechnology Research Program, initiated in 1985, includes as an objective the promotion of “research and development projects in public life care, including health, nutrition, and environmental protection”; one of its high-priority research areas is a program of “genetic engineering with a focus on the investigation of gene structures, research on gene functions, and on controlling of genetic processes” (68). The ministry has also encouraged the establishment of research centers in which university and industry would participate and has set up seven “gene centers” to study areas including gene expression and differentiation and the correlation between gene structure and function. Human genome mapping and sequencing are not explicitly included in either the Biotechnology Research Program or the genetic research centers, but both support related research and could provide an institutional infrastructure and funding framework for genome research.

Finland

In January 1987, scientists at the Finnish Academy proposed a 5-year plan to improve biotechnology and molecular biology research, in order to promote industry and increase industrial capabilities. The proposal included a request for the equivalent of $37 million per year for research, training, and equipment (48). Finland has established several genetic engineering research centers and has plans for half a dozen more; the institute associated with the University of Helsinki is perhaps the best known.

Human genome mapping in Finland is being done by about 10 large and small individual research groups in medicine and science. They are primarily funded by government sources, namely, university budgets and the Academy of Finland, which is the main funding source other than universities. The University of Helsinki hosted the eighth international Human Gene Mapping Workshop (HGM 8) (5). Finland has neither a concerted effort nor any specific policies on genome mapping; as in most countries, however, sequencing efforts have focused on particular genes. Finnish groups are involved in collaborative projects with groups in other countries, notably the United States, and have contributed to and received materials from international databases and repositories.

France

Since 1981, the French Government has sought to make France a world power in science and technology by increasing both funding and political interest in research and development. The Government has encouraged collaboration between university and industry researchers, both within the country and with the rest of Europe (e.g., the EUREKA program).


The French Ministry of Research is directly or indirectly in charge of nearly all government-funded research. Most is carried out within universities, often in units set up by the research organizations, the largest of which is the National Center of Scientific Research (CNRS). The CNRS and the much smaller National Institute of Health and Medical Research (INSERM) are the only two government organizations that support research related to human genome mapping and sequencing. The Pasteur Institute in Paris, a semi-autonomous institute that receives half its funds from the government, carries out related research. None of these organizations has announced a firm plan for human genome mapping or sequencing, but each is considering what part it might play [Newmark, see app. A].

An important focus of genome studies in France is the CEPH (see box 7-B). Organized in 1983 by Jean Dausset to “hasten the mapping of the human genome by linkage analysis with DNA polymorphisms,” CEPH is a privately funded center that collects and distributes genetic materials for use in mapping studies. It acts as an informal coordinator for approximately 40 investigators in Europe, North America, and Africa who use CEPH materials in exchange for reporting their data (25,26).

France has not initiated a coordinated genome project, but there is a strong undercurrent of opinion favoring a substantial program in human genome mapping and sequencing as long as it is not funded at the expense of other research. Genome researchers may try to work through EUREKA to involve other European companies with an interest in instrumentation or information technology. The French Government (usually through its Ministry of Industry) is prepared to provide 50 percent funding for EUREKA projects, and there are indications that it would consider CEPH’s human genome work eligible for EUREKA funding [Newmark, see app. A].

Italy

Recent administrations have given priority to improving Italy’s scientific performance in hopes of sparking a technology-led revitalization of the country’s ailing economy. Considerable extra funds for technology-related research have been made available in the past few years, with biotechnology as one focus. The Italian Government announced in April 1987 that it would allocate 209 billion lire (approximately $156 million) over a 5-year period for a national biotechnology project involving both public research centers and industry (64); the following month Italy’s National Research Council (CNR) announced a special research project in biotechnology for which it will spend 84 billion lire (about $63 million) over the 5-year period (51).

In May 1987, the CNR announced its decision to initiate a project devoted to human genome sequencing, to be run as a cooperative effort of all CNR institutes and laboratories working in biology (22). Nobel laureate Renato Dulbecco is coordinating the project, in which CNR has started investing 20 billion lire (about $15 million) and 75 to 100 person-years (51). A 2-year pilot project with a budget of $1 million per year will be undertaken first, to determine whether a large-scale project will be funded at around $10 million a year. (These sums are to cover only specific materials, machines, travel, meetings, and so on, not salaries and general overhead, since only the existing number of personnel will be involved.)

A key question in the pilot project is whether it is possible to isolate a single chromosome without damaging it so much that sequencing would be impossible. The ability to separate the chromosomes would offer a shortcut to sequencing, and researchers could begin sequencing with one of the smaller chromosomes (but one with genes of particular interest), probably chromosome 21, 22, or Y (73). Otherwise, researchers will consider continuing the project using conventional techniques. Research institutes and laboratories in Rome, Naples, Pavia, and Milan will participate in the project. Databases and information retrieval will be managed by research units in Rome, Turin, Milan, and Bari, with the aim of making the national databases compatible with and complementary to existing international ones (57,73).

The pilot human genome project is still exploratory, so no attempt is being made yet to coordinate work with researchers outside Italy. Project scientists anticipate that the final project would be complementary to, if not an integral part of, any international project that arises [Newmark, see app. A]. In the meantime, Italian scientists are enthusiastic about Italy’s role in genome mapping; “there is good reason to believe that, for once, this country will perhaps succeed in reaching the starting line ahead of other countries” (73). Italian scientists are not the only ones interested in chromosome 21, however; it is a popular target for research because it contains genes for Alzheimer’s disease and for Down’s syndrome, and it is likely to be an early focus of U.S. efforts.

Box 7-B. The Center for the Study of Human Polymorphism (CEPH): An International Gene Mapping Center

The Centre d’Etude du Polymorphisme Humain (CEPH) has become an important focus of international scientific cooperation in the drive to map the human genome. CEPH is a private research foundation established in 1983 by French Nobel laureate Jean Dausset with the bequest of an anonymous donor. Its aim is to “hasten the mapping of the human genome by linkage analysis with DNA polymorphisms.”

The basic premise behind CEPH’s activities is that a genetic linkage map (see ch. 2) will be more easily constructed if researchers study genetic material from a common group of families: a reference panel. The most useful family pedigrees consist of four living grandparents with many children and grandchildren so that the inheritance of DNA can be traced through three generations. CEPH maintains DNA from a panel of 40 families, each with 5 to 15 children; in most cases, all grandparents are living. The DNA from 29 of the 40 families in the CEPH collection was contributed by Ray White and his collaborators from the Howard Hughes Medical Institute (HHMI) in Utah. Dausset also solicited family materials gathered by other researchers in the United States and Europe, including some material from normal families identified in the Venezuelan pedigree project. In contrast to that project (see box 7-A), in which researchers collected material from families with Huntington’s disease in order to trace the gene responsible, CEPH maintains material from families with no known genetic diseases. The markers mapped to chromosomal locations in normal CEPH families can then be used to accelerate the search for disease genes in other families.

CEPH coordinates an international collaboration of researchers from laboratories in Europe, North America, and Africa. In order to obtain material from CEPH, collaborating investigators must first possess DNA probes that detect genetic markers, generally RFLPs. They must agree to use the probes to test the entire panel of 40 families and to provide CEPH with all of their data. There are no enforcement mechanisms, but so far researchers have cooperated.

Dausset’s work is supplemented by the efforts of Jean-Marc Lalouel, a mathematical geneticist at HHMI in Utah who has designed a variety of computer programs to record and analyze the data contributed by CEPH investigators. Lalouel and his collaborators have written programs that analyze genetic linkages and automatically sketch out gene maps from the results. These programs are sent out on disk with the CEPH DNA samples. Researchers can record and analyze their data using the programs on the disk, then send the disk back to CEPH for inclusion in a central database. HHMI supports a database station at CEPH that will be linked to its Utah station and may soon include interactions with other databases as well.

An important factor in CEPH’s success at fostering cooperative research is the two-tiered database it maintains. One database, available only to collaborators, contains all data that investigators produce. At the end of a year’s time or when the results have been published, whichever comes first, data from the collaborative database are moved into a public database, where they are accessible to any qualified researcher. This system of having both a private and a public database ensures the timely sharing of information while affording investigators some proprietary protection for their results. The fact that the collaboration requires sharing of data (but not the actual probes, which could prove to be patentable) reduces potential competitive tensions.

SOURCES:
H.M. Cann, “Centre d’Etude du Polymorphisme Humain (CEPH): Collaborative Mapping of the Human Genome,” paper submitted in preparation for the Oct. 16-17, 1986, meeting of the Advisory Committee to the Director, NIH.
H.M. Cann, CEPH, personal communication, December 1987.
Centre d’Etude du Polymorphisme Humain, unpublished report, 1986.
J. Dausset, H. Cann, and D. Cohen, “Centre d’Etude du Polymorphisme Humain (CEPH): Collaborative Mapping of the Human Genome,” unpublished manuscript, April 1987.
J.L. Marx, “Putting the Human Genome on the Map,” Science 229:150-151, 1985.
M. Pines, Mapping the Human Genome (Bethesda, MD: Howard Hughes Medical Institute, 1987).
R. White, M. Leppert, D.T. Bishop, et al., “Construction of Linkage Maps with DNA Markers for Human Chromosomes,” Nature 313:101-105, 1985.
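The release rule for the two-tiered database can be stated compactly. The following Python sketch is illustrative only and is not CEPH software: the `Submission` record, its field names, and the reading of “a year’s time” as 365 days counted from the date of submission are assumptions made for the example. It computes the date on which a collaborator’s data would pass from the collaborative database to the public one, namely the earlier of publication or the end of the one-year period.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Assumption: "a year's time" is read as 365 days from the submission date.
EMBARGO = timedelta(days=365)


@dataclass
class Submission:
    """Hypothetical record of one laboratory's data contribution."""
    lab: str
    submitted: date
    published: Optional[date] = None  # date the results appeared in print, if any


def release_date(s: Submission) -> date:
    """Date the record moves from the collaborative to the public database."""
    deadline = s.submitted + EMBARGO
    # Publication before the deadline triggers release early ("whichever comes first").
    if s.published is not None and s.published < deadline:
        return s.published
    return deadline


def is_public(s: Submission, today: date) -> bool:
    """True once the record is available to any qualified researcher."""
    return today >= release_date(s)
```

Under these assumptions, data that are never published still become public after one year, which is the feature that keeps the collaboration from turning into a permanent private archive.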


Industry is not playing a role in the pilot project, since few Italian companies have the technological interest or capability. But scientists involved in the research believe that “the automation required for the project will act as a major incentive for industry” and hope that industry would help finance the final project (73). At least one Italian pharmaceutical company has expressed a willingness to participate and contribute.

The United Kingdom

The United Kingdom has a strong research tradition in molecular biology and genetics, and it has done pioneering work in the mapping of nonhuman genomes and in the development of sequencing techniques. The United Kingdom has consistently ranked second to the United States in the number of articles on human gene mapping and sequencing published annually in international journals (see figure 7-1 and table 7-1). The United Kingdom also ranks high in the development of physical mapping techniques and of automated technologies for DNA manipulation and analysis. Thus the United Kingdom is well placed intellectually, if not financially, to contribute significantly to mapping the human genome.

Basic biomedical research is funded mostly by the government through the Department of Education and Science, although both the Department of Health and Social Security and the Department of Trade and Industry have funds available for contract research. The Department of Education and Science distributes research monies through universities and through five research councils. The research councils provide support for scientific programs carried out in universities; some councils also support research within their own institutes. Biotechnology is an area of overlap for the Science and Engineering Research Council (SERC) and the Medical Research Council (MRC), the two councils whose areas of interest are most closely related to human genome research. The SERC supports basic biological research outside the medical field, although it has supported some work on automated DNA sequencing through a biotechnology directorate established to link academic research to industrial needs. The MRC is undoubtedly the leading supporter of mapping and sequencing research. Its total expenditure for genome-related research for the 1985-1986 fiscal year, both direct and indirect, was approximately £4.2 million ($7.4 million) [Newmark, see app. A] (88).

The MRC is similar to the NIH in supporting high-quality, investigator-initiated proposals, although the council also establishes targeted programs in particular areas. It has a longstanding commitment to molecular biology and has the power to set up new units devoted to particular areas of research when a suitable director and sufficient funds are available. Although the MRC supports a good deal of relevant research and its various units and grant holders have the expertise and instrumentation necessary for the study of genetic disease, the MRC does not now plan a targeted program of research on human genome mapping or sequencing. At a 1987 meeting, however, the MRC did endorse the plan of an employee, well-known scientist Sydney Brenner, to map the human genome (largely with private funds) as long as the research proceeded at no extra cost to the research unit Brenner directs (66). At Brenner’s request, the MRC has also agreed to set up a committee that will consider questions such as who owns the clones produced in mapping efforts and how best to provide public access to them.4

Brenner’s project will be financed in part by a £300,000 (about $525,000) prize award he received from the Louis Jeantet Foundation; the MRC and other sources will provide another £200,000 to £250,000 (about $350,000 to $440,000) per year (56). The project will build on a mapping technique developed by Alan Coulson, John Sulston, and co-workers in the MRC research unit at Cambridge. They compiled a genetic linkage map of the nematode Caenorhabditis elegans

4 “It has been agreed [by the MRC] that the human genome work should constitute a separate project to be carried out as an extension of the work of the [Molecular Genetics] Unit [in Cambridge]. It was also considered that the longer term future of this work could not be tied to the finite tenure of a personal Unit. The project might evolve into a reference laboratory with a major service component and would then need a different funding structure. A central aim would be to ensure that the collection of clones and information remained in the public domain. It was therefore agreed that an Advisory Board be established to consider these and other policy matters” (66).


genome, the smallest genome known for any multicellular creature (it is estimated to be 80 million base pairs, compared with approximately 3 billion base pairs for the human genome; see ch. 2). Brenner expects that perhaps half of the genome could be mapped by a few people within 5 years. The project will include research on data-handling methods and parallel processors, since the mapping techniques require sophisticated computing capabilities.

The Imperial Cancer Research Fund (ICRF), a charitable organization financed solely by donations, has recently recruited scientists to work on the development of a different technique for human genome mapping, as well as related software and instrumentation [Newmark, see app. A]. The MRC and ICRF plan to explore the possibility of collaboration in areas of common interest.

Other efforts in the United Kingdom include technology development in automated systems for genome sequencing at the University of Manchester Institute of Science and Technology (UMIST) (1) and biocomputing research at the University of Edinburgh. The Edinburgh Biocomputing Research Unit has considerable experience in database searching and related problems and is undertaking a variety of studies into the informatics needed for analysis of map and sequence data (15).

The United Kingdom contributes to international research efforts such as EMBL, to which the MRC provided £2.72 million (about $4.7 million) in 1987. The MRC maintains a level contribution to EMBL in real terms, after supporting some growth of the organization in 1982, when the new director was appointed (66).

OTHER INTERNATIONAL EFFORTS

Australia

The largest research institution in Australia is the Commonwealth Scientific and Industrial Research Organization (CSIRO), which is conducting pertinent research through its Division of Molecular Biology. Biomedical research is primarily the province of the National Health and Medical Research Council, which at present funds a number of researchers working on gene mapping and sequencing. The Department of Human Genetics and the Medical Molecular Biology Unit at the Australian National University in Canberra are sites of some relevant research activity. In particular, chromosomes 6 and 9 are the foci of investigation because several genes have been localized to them (43,82). Researchers at the Cytogenetics Unit Department of the Adelaide Children’s Hospital in North Adelaide are constructing maps of chromosome 16 and part of the X chromosome. They have collaborated with scientists from the U.S. Department of Energy’s Lawrence Livermore and Los Alamos National Laboratories.

The Department of Industry, Technology and Commerce administers a system of research grants under its National Biotechnology Program, with priority areas including genetic engineering and cell manipulation and culture, which could provide support for genome research.

Canada

Canada does not yet have a national policy on genome sequencing. The National Research Council (NRC) is considering the creation of a task force to address this subject within its laboratories. A national network of biotechnology laboratories supported by the council has been set up, including the Biotechnology Research Institute in Montreal, the Plant Biotechnology Research Institute in Saskatoon, and the Division of Biological Sciences in Ottawa, which focuses on protein engineering.

In addition to the expertise that the government research institutes might lend to genome research, Canada has 15 to 25 university laboratories with the necessary skills and equipment to participate in a human genome project. To date, however, there has been little effort to coordinate the activities of these various groups. Canadian scientists and government officials are paying close attention to international developments in human genome sequencing and are hopeful that opportunities for international collaboration will develop (67).


Latin America

Relatively few laboratories are involved in human genome research; of those that are, the primary interest is generally mapping genes for diseases of particular national significance. As one observer pointed out, “Brazil has its share of good scientists, but they are hampered by lack of funding and difficulties importing equipment and materials”; presumably the same holds true in other Latin countries (13).

Many Latin American countries realize the commercial potential of biotechnology; Brazil and Argentina, among others, have initiated programs to encourage biotechnology research and development. Argentina has a biotechnology program under the aegis of its Secretariat of Science and Technology (60), and Brazil has a Biotechnology Secretariat in its Ministry of Science and Technology (13). Scattered throughout Latin America are individual laboratories doing relevant research.

In Mexico, “scientists are pushing the Mexican government to consider the development of genetic research a priority. They don’t want to fall behind on this kind of research, because the pathology index in the Mexican population is approaching that of developed countries. With epidemics and infections decreasing, greater attention can be paid to genetic problems” (69). Like Brazil, however, Mexico has a low research budget (less than 0.6 percent of the gross national product is spent on research) and can neither afford sophisticated equipment nor train enough scientists; both countries are interested in international cooperation. The Organization of American States reports that its Department of Scientific and Technological Affairs, which runs a Regional Program for the Development of Science and Technology in Latin America and the Caribbean, includes projects in plant and animal genetics but none in human genetics (65).

South Africa

Gene mapping and sequencing research is supported by the Medical Research Council (MRC), the Council for Scientific and Industrial Research (CSIR), and the National Cancer Association. None has initiated a formal or coordinated attempt to map or sequence the human genome, but there are a number of laboratories at work in the field of human genetics (30). Several researchers are active in the CEPH collaboration, screening the CEPH family materials and contributing their results. Researchers are examining genes for Huntington’s disease, cystic fibrosis, and neurofibromatosis in collaboration with laboratories in the United States and the United Kingdom (10). Research is also underway on several genes of particular interest in the region: those for Oudtshoorn skin disease and familial hypercholesterolemia (conditions prevalent in Afrikaners) and albinism, which is common in the Bantu population (50).

The Union of Soviet Socialist Republics and Eastern Europe

Although the Soviet Union has not been a major contributor to mapping and sequencing studies published in international journals, it has published some research on bacterial genomes (74) and the barley genome (3). Soviet scientists are also working on computational methods for analyzing DNA sequences (7). The Central Institute for Molecular Biology in East Berlin has undertaken a variety of studies in gene mapping and sequencing and has collaborated with researchers in the United Kingdom (45). Bibliometric analyses (see figure 7-1 and app. C) show that the Soviet Union and Eastern European countries have not published a significant number of research articles on human gene mapping and sequencing. These figures tend to select items from international journals, however, so internal publications are not as thoroughly cataloged and accounted for.


INTERNATIONAL COLLABORATION AND COOPERATION

The large size and humane mission of human genome projects make them ideal candidates for international collaboration. International databases have already been established and are being jointly maintained, which indicates some willingness to cooperate on gene mapping efforts, but it remains to be seen how far that cooperation will extend. The potential for commercial payoffs raises difficult questions but does not preclude successful collaboration as long as prior agreement on allocation of benefits is reached (32,49). The following sections recount some precedents for collaboration and cooperation in international science projects and the role the United States has played in them. Organizational options available for international human genome projects are examined, and some collaborative efforts already underway are described. The following chapter outlines the questions of international technology transfer that will undoubtedly arise in any coordinated international effort.

Precedents for International Scientific Programs

The biological sciences have been organized into international projects far less often than other sciences, but collaborations in the physical and space sciences can provide useful organizational insights. The International Geophysical Year (box 7-C) is an example.

Since the 1940s, research in particle and high-energy physics has relied on complex and expensive equipment (notably, the particle accelerator) that is beyond the ability of any individual investigator, or even any one institution, to construct and maintain. Consequently, a number of large, specialized laboratories have emerged nationally and internationally. In the United States, centralized facilities evolved into a network of national laboratories, now operated by DOE. These laboratories house cyclotrons, synchrotrons, and other advanced instruments and undertake research in a broad range of areas, cooperating in limited ways with researchers from abroad.

The European Center for Nuclear Research (CERN) was established in 1954 to advance knowledge in the field of particle physics. It is operated by 14 European nations and has provided a framework for collaboration in instrumentation. Its governing council consists of one technical advisor and one administrative advisor from each member nation. Participants contribute to CERN based on their gross national products, although no nation can contribute more than one-quarter of CERN’s annual operating budget. CERN has enabled European nations to conduct research beyond the capabilities of any single member nation and has been widely recognized for its success in the advancement of particle physics. It has restricted its efforts to basic research, however, and so has avoided the complications that arise in collaborative work on applied research (80).
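The contribution rule described above amounts to a capped proportional allocation. The Python sketch below is a hypothetical illustration, not CERN’s actual formula: the function name `capped_shares`, the GNP figures in the example, and in particular the way any excess above the one-quarter ceiling is redistributed among the remaining members in proportion to their GNP are all assumptions, since the text states only the proportionality and the ceiling.

```python
from typing import Dict


def capped_shares(gnp: Dict[str, float], cap: float = 0.25) -> Dict[str, float]:
    """Split a budget in proportion to GNP, holding every share at or below `cap`.

    Assumption (not from the source): a capped member's excess is spread
    over the uncapped members in proportion to their GNP.
    """
    if cap * len(gnp) < 1.0:
        raise ValueError("cap is too low for this many members to cover the budget")
    shares: Dict[str, float] = {}
    free = dict(gnp)   # members whose share is still proportional
    remaining = 1.0    # fraction of the budget not yet assigned
    while free:
        total = sum(free.values())
        over = [m for m, g in free.items() if remaining * g / total > cap]
        if not over:
            # Nobody exceeds the ceiling: finish with a plain proportional split.
            for m, g in free.items():
                shares[m] = remaining * g / total
            break
        for m in over:
            shares[m] = cap
            remaining -= cap
            del free[m]
    return shares


# Hypothetical GNP figures (arbitrary units) for five imaginary members.
example = capped_shares({"A": 60.0, "B": 20.0, "C": 10.0, "D": 5.0, "E": 5.0})
# A and B are held at 0.25; C receives 0.25; D and E receive 0.125 each.
```

The loop is needed because capping the largest member raises everyone else’s proportional share, which can push a second member over the ceiling, as happens to B in the example.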

The enormity of the endeavor to explore and study space spawned proportionately large agencies to manage the research. The founding legislation of the United States’ National Aeronautics and Space Administration (NASA) included international cooperation as a major theme, and NASA has carried out that mandate by negotiating and implementing hundreds of cooperative projects. Some NASA projects have established formal joint working groups on a bilateral basis with other national agencies. These groups meet several times a year to “discuss present and future projects of mutual interest, and to exchange information on scientific and management issues of concern” (61).

One of NASA’s major partners has been the European Space Agency (ESA), a collaboration of 13 European nations. The Hubble Space Telescope is an example of collaboration between the two agencies. In 1977, officials from NASA and ESA drew up an agreement to work together on the project, citing specific contributions and responsibilities (37). An article on data rights directed that scientific data from the telescope be reserved for analysis for one year, then turned over to public data centers. Results were to be made available to the scientific community through publication as soon as possible and appropriate. No specific provisions were made for patenting products or processes developed in the course of the project.


Box 7-C. The International Geophysical Year

The International Geophysical Year (IGY) was originally conceived as the third in a series of international polar years (earlier cooperative investigations into the phenomena of the Arctic and Antarctic took place in 1882-1883 and 1932-1933), but the scope was expanded to include the study of all aspects of the physical environment. Sydney Chapman, one of the organizers, described the enormous undertaking as it finally evolved:

The main aim is to learn more about the fluid envelope of our planet, the atmosphere and oceans, over all the earth and at all heights and depths. The atmosphere, especially at its upper levels, is much affected by disturbances on the Sun; hence this also will be observed more closely and continuously than hitherto. Weather, the ionosphere, the earth’s magnetism, the polar lights, cosmic rays, glaciers all over the world, the size and form of the earth, natural and man-made radioactivity in the air and the seas, earthquake waves in remote places, will be among the subjects studied. These researches demand widespread simultaneous observation.

To accomplish this, teams of scientists from 67 nations (60,000 in all) observed, measured, and recorded data in meteorology, geomagnetism, auroras and airglow, the ionosphere, solar activity, cosmic rays, oceanography, glaciology, gravity measurements, and other disciplines over a period of 18 months in 1957 and 1958.

The effort was coordinated by the Special Committee of the IGY (CSAGI) under the auspices of the International Council of Scientific Unions. Planning committees were appointed to organize research programs in 14 different disciplines. Participating nations generally had their own planning commissions or advisory boards as well.

An essential feature of IGY was the operation of world data centers. Participants agreed to send all of their data to three major centers, in the United States, the U.S.S.R., and Western Europe. Organizations or investigators from any country could obtain copies of the deposited materials free of charge (other than the price of reproduction and transmission). In addition, the data were summarized and presented in more than 30 volumes in the Annals of the International Geophysical Year, an information resource that provided the raw material for subsequent research in geology, meteorology, oceanography, and other fields.

SOURCES:
S. Chapman, Annals of the International Geophysical Year, foreword quoted in H. Newell, Beyond the Atmosphere: Early Years of Space Science (Washington, DC: NASA, 1980).
S. Chapman, IGY: Year of Discovery (Ann Arbor, MI: University of Michigan Press, 1959).
J.M. England, A Patron for Pure Science: The National Science Foundation’s Formative Years (Washington, DC: National Science Foundation, 1982), pp. 297-304.
H.E. Newell, Beyond the Atmosphere: Early Years of Space Science (Washington, DC: NASA, 1980).
W. Sullivan, Assault on the Unknown: The International Geophysical Year (New York, NY: McGraw-Hill, 1961).

NASA’s operating principles for international collaboration are a useful starting point for drawing up collaborative agreements.5 One key difference, however, between human genome projects and most space research is the commercial potential: “Astronomical data have no commercial value” (71). The gap between research in molecular genetics and the market has narrowed rapidly in recent years, making the boundary between basic and applied or development-oriented research nearly impossible to draw. Consequently, agreements similar to those negotiated by NASA and ESA regarding data rights and publication of results could prove insufficient for human genome projects. A second difference is that the instrumentation required for human genome projects is neither as large nor as expensive as that used in particle physics and space research.

In spite of a stated desire for international cooperation, the United States has generally acted as the primary partner in large science projects, defining them and then inviting other nations to join in, rather than planning, funding, and implementing projects jointly (54). In the present era of constrained funding, however, the United States may not always be able to carry out major research projects on its own.

5 NASA has never formally encoded its mechanisms for international collaboration, but it has developed an informal set of guidelines:
• Cooperation is on a project-by-project basis, not on a program or other open-ended agreement.
• Each project must be of mutual interest and have clear scientific value.
• Technical agreement is necessary before political commitment.
• Each side bears full financial responsibility for its share of the project.
• Each side must have the technical and managerial capabilities to carry out its share of the project; NASA does not provide substantial technical assistance to its partners, and little or no U.S. technology is transferred.
• Scientific results are made public (55).

Collaborative projects can offer significant savings for participating countries by splitting the financial burden (although some observers have pointed out that the costs of negotiating and the loss of jobs if a project is located outside the United States may reduce the savings). Collaboration creates a paradox, however: On the one hand, it might reduce the cost for each member, making the project more feasible; on the other, it might reduce each nation’s potential economic gain from the project. The world economic situation has led to an increasing desire for scientific research to produce commercially valuable products, thereby fostering a protective, nationalistic attitude toward research (see box 7-D).

Options for International Organization of Genome Research

A decision to pursue human genome projectson the international level, emphasizing coopera-tion and participation, will entail considerable or-ganizational effort. It will have the same organiza-tional goals as a domestic effort: to eliminateredundancy in research and to expedite the spreadof scientific and commercial knowledge of the ge-

BOX 7-D.— Views on International Cooperation and Collaboration in Genome Research

“Too many promising international research collaborations, from AIDS research to the sequencing ofthe human genome, languish for lack of a workable framework for tangible and short-term research. . . .The U.S. Department of Energy and the Japanese Science and Technology Agency have an interest in orga-nizing and supporting the @enomel project; each seems sensibly to have decided that two independentprojects would be a waste of resources and a source of confusion, but [they] differ sufficiently in theirobjectives as to impede agreement between themselves, let alone with others. ” Editorial in Nature 328:187,1987.

“There’s a task to be done here, and we need to get on with the task. If we try to take into accountevery country’s interest and concerns, we can only serve to delay it. ” J. McConnell, Johnson & Johnson,Science Writers’ Workshop, Brookhaven National Laboratories, Upton, NY, Sept. 14, 1987.

“An international DNA analysis center or centers equipped with super sequencing systems which areconnected to a worldwide data-network should be developed. ” A. Wada, “Many Small-Scale or a Few Large-Scale DNA Sequencers?” unpublished report, Japan, 1987.

“It is highly desirable that the U.S. continue to be the leader of the @enome mapping and sequencing]effort, but it must be consciously and effectively run as an international quest for knowledge having univer-sal importance. No single purse nor administrative center, in either the U.S. or the world, can or shouldbe created to fund or attempt to direct the task.” D. Fredrickson, National Institutes of Health, personalcommunication, December 1987.

“There is . . . a growing awareness in Europe that the first megaproject in biology is shortly beinglaunched. Europe ought to participate in it alongside the USA and Japan to ensure access to the informationand all that it implies for medical and biological science, as well as the technological spinoffs that will surelyarise. . . . There is now an opportunity to ensure that the project involves international collaboration fromits outset which should not be missed. ” L. Philipson and J. Tooze, “The Human Genome Project, ” Biofutur,June 1987, pp. 94-101.

“If they wished, either Western Europe or Japan could by themselves take on this project and it mustbe assumed that they will initiate their own efforts. So a new international body should soon be formedto ensure that collaboration, not competition, marks the relationship between these efforts in various partsof the world. In a real sense, the exact sequence of the human genome will be a resource that shouldbelong to all mankind. So it is a perfect project for us to pool our talents, as opposed to increasing stillfurther the competitive tensions between the major nations of the world.” J.D. Watson, director’s reportfor Cold Spring Harbor Laboratories, in press.

Page 153: Mapping Our Genes - Genome Projects: How Big? How Fast?

153

“The principle of ‘mutual self-interest’ . , . lies at the heart of successful cooperation. ” D. Dickson andC. Norman, “Science and Mutual Self-Interest,” Science 237:1102, 1987.

“If a sequencing factory can be built, Wada emphasizes that it would not be ‘Japan Incorporated’ againstthe rest of the world. He wants an international centre that would be open to scientists of all nationalitiesand intended for the benefit of all mankind. ” D. Swinbanks, “Human Genome: No Consensus on Sequence, ”Nature 322:397, 1986.

“This project is so vast that it necessarily requires international cooperation. Since there are 3 billionbases to be sequenced, the project will not create problems of competition.” P. Vezzoni, Consiglio Nazionaledi Richerche, Milan, quoted in A. Sommariva, “And Italy Will Study Chromosome 22,” Italia Oggi (Milan),May 22, 1987, p. 36.

“There’s considerable interest in the commercial spinoffs, and I expect each country would want tokeep those. I would hate to see U.S. tax dollars used to kill yet another U.S. industry. ” J. McConnell, ScienceWriters’ Workshop, 1987.

“On the one hand, the climate for international collaboration in science . . . is warmer than ever. Invirtually every major field, U.S. scientists can point to significant work being done in Europe, the SovietUnion, Japan, Canada, or Israel that needs to be read closely, argued about, and replicated as much asdoes work done in the United States. On the other hand, the new era is chillier, for governments andbusinesses here and abroad will continue to try to squeeze economic value out of every bit of science towin the international high-tech sweepstakes. ” D. Shapley and R. Roy, Lost at the Frontier: U.S. Scienceand Technology Policy Adrift (Philadelphia, PA: ISI Press, 1985), p. 116.

“The creation of a sequence database is the major goal of the project, whether it is done nationallyor internationally or privately. . . . I don’t think an international project as an organized scheme will emerge,. . . I expect a set of private ones will emerge, with some level of cooperation,” W. Gilbert, Harvard Univer-sity, Science Writers’ Workshop, Brookhaven National Laboratories, Upton, NY, Sept. 15, 1987.

“I am convinced that an international advisory body must be formed to oversee the data bases. . . .International cooperation [is] as important as interagency coordination in the U.S.A. But I do not thinkthat a special institution would be useful at the national and at the international level.” A. Lafontaine, Officeof the Secretary General, Brussels, Belgium, personal communication, June 1987,

“There is a strong belief here that practical collaborations on actual, welldefined projects are veryhelpful, and probably more meaningful than large-scale collaboration between governments. Cell banks,gene banks, and databases are very important in this regard.” A. de la Chapelle, University of Helsinki,personal communication, August 1987.

“International cooperation is not something that should be imposed by government agencies. . . . Realcooperation comes from individual scientists communicating with each other. ” C. DeLisi, Mount Sinai Schoolof Medicine, Science Writers’ Workshop, Brookhaven National Laboratories, Upton, NY, Sept. 15, 1987.

“I’m just concerned that if we focus on trying to set up an international effort, we will delay decisions of the United States in proceeding with this. I’d like to see a willingness to cooperate at the international level, but setting U.S. national priorities.” G. Cahill, Howard Hughes Medical Institute, comments at Issues of Collaboration for Human Genome Projects, OTA workshop, June 26, 1987.

“[T]he United States does not and cannot expect to monopolize information and innovation in this field. Moreover, the initiation of a human genome project in the United States will probably not deter work in other countries, but rather will stimulate it. Given this assumption, the importance of past traditions, and the magnitude of the task of mapping and sequencing the entire human genome, every effort should be made to enhance the existing contacts between the United States laboratories and those overseas, so as to speed the work. Indeed, we believe it will become necessary to have some major organized mechanism for international cooperation. In particular, its objective would be to collate data and ensure rapid accessibility to it, as well as to distribute materials, such as cloned DNA fragments.” National Research Council, Mapping and Sequencing the Human Genome (Washington, DC: National Academy Press, 1988), p. 85.


nome. Just as the issues in domestic organization revolve around distribution of authority and tasks among interested government agencies and private firms (described in ch. 5), the issues in international organization involve coordination of interested sovereign nations.

An international organization could be either passive or active. A passive organization would serve primarily as a clearinghouse of research information among participating nations. This task would require the formulation and oversight of standard nomenclature and the translation of research reports. The organization would need to keep track of research in progress and any technological innovations reported by individual laboratories, and it might be intimately associated with databases such as GenBank® and the EMBL data bank and with collaborative organizations such as CEPH. Although participation in this type of passive organization would have to be voluntary, all academic researchers would stand to benefit from the free flow of information. The proprietary interests of commercial researchers might limit their participation, but collaborative arrangements could be made (12,49). The success of a passive international organization depends primarily on the good will of the participants.

An active international organization along the lines of the interagency task force described in chapter 6 could plan and distribute genome research among participating countries. There are at least three ways in which the tasks of an international genome project may be distributed: 1) by physical units, such as chromosomes or genes, in which each country would analyze one unit or a group of units; 2) by project aspect, such as sequencing, informatics, or cloning, in which each country would focus on one aspect; and 3) by geography, in which each country or group of countries with similar resources would establish a genome center.

Distribution by physical units would require each participating nation to possess the entire spectrum of technical specialties associated with the project (mapping, sequencing, data management, and so on). This requirement would probably limit involvement to those nations that are already scientifically advanced, regardless of any interest among nations attempting to develop biotechnical capabilities. The requirement could, however, spur developing nations to acquire technologies, and it might provide an economic incentive for commercial firms to assist in the start-up efforts. Assignment by chromosome would most likely cause intense politicking among the top nations for the most “interesting” chromosomes. Certain countries or regions might be more interested in chromosomes known to contain genes that affect a large portion of their populations. Such a method of assignment would also identify a specific nation with a specific achievement, effectively placing flags on the map of the genome. The realization of this would inject an element of competition for national prestige into the context of an international science project. In effect, the cooperative partners would be establishing the arenas and ground rules for competition.

An international project divided by project aspect would require participating countries to adopt a specialty, which would accelerate development and commercial profit in that field but could preclude achievement in related fields. Japan, for example, might contribute a large share of DNA sequencing because of its interest in automating sequencing technologies. The component tasks of a genome project are not equivalent nor easily evaluated in terms of necessary resources, so distributing them may prove difficult. Further, some aspects of the project are more visible and economically valuable than others. To map or sequence an important gene is noteworthy and profitable; to create a database is to provide a common good but to receive little of value in return. An international division of labor is an attractive idea, but only clearly defined special talents among the nations would justify it.

The third possible distribution of international efforts is geographical: several genome centers could be established and supported by a nation or group of nations. The vocation of these centers might become a point of debate, however: Should each cover the full spectrum of genome technology, or should they specialize?⁶ If each center attempted to cover all technologies, a division of labor might evolve based on specialized innovation. This might keep the centers complementary and competitive, but not necessarily cooperative. Establishing specialized centers would predetermine each center’s scientific and economic success. Focusing all of them on a single aspect, for example sequencing, would siphon funds and attention from the other aspects. A center arrangement involving only countries with state-of-the-art research capabilities might lock out interested countries just beginning to develop biotechnology capabilities, unless the centers were amenable to taking on minor partners. Few scientists other than the two who have proposed the sequencing center idea seem to be enthusiastic about the prospect of establishing large centralized institutions (see box 7-D).

⁶ The idea of setting up large centers has been promoted by American scientist and entrepreneur Walter Gilbert (44) and by Japan’s Akiyoshi Wada (84,85,87). Both have referred specifically to sequencing rather than to genome research in general.

If an international project is to be pursued, issues of participation and underlying motivations should be recognized clearly and early. Without specific guidelines for initial and future participation, any organization is likely to become entrenched and inaccessible to latecomers. If the motivation for an international distribution of effort is purely economic, then participation might be restricted to nations already able to demonstrate their ability to contribute. Should an international effort be tied to political goals such as assisting the growth of biological research and biotechnology in the developing world, then widespread participation and an organization capable of coordinating both advanced and developing countries would be necessary. If political motives are acknowledged, then the international organization might seek to encourage the association of national goals and priorities with genome research. Political motivations are probably inherent in international projects, but they could be used to elicit widespread participation and continuing commitment. By using enticements such as distribution of physical units of the genome by political units of the participants, it may be possible to guide nationalistic forces into a workable international effort.

An important factor in any international collaborative or cooperative agreement will be the participants’ domestic organization of human genome projects. The agencies involved speak with many voices, depending on their respective missions. Formal collaboration would be difficult to negotiate without some domestic coordination (see chs. 5 and 6) to harmonize goals. Otherwise, less formal cooperative arrangements will probably prevail.

Even if there emerges no formal international organization that can satisfy national and proprietary goals, the United States could establish an international advisory board to solicit suggestions and recommendations from the international scientific community regarding human genome projects. Domestic advisory boards could include nonvoting members from Europe and Japan. An international advisory committee for database oversight already exists; it has two members from the United States, two from Japan, and several from Western Europe (14). Members of the committee issue recommendations that, although not binding, help coordinate the various national efforts.

Existing Collaborative Frameworks

Lack of an international organizational structure does not preclude informal collaboration or cooperation. Scientific laboratories exchange views, visits, and materials as a matter of daily practice; many scientists prefer informal networking to prescribed arrangements and institutions (see box 7-E). Policymakers in Europe are finding that increasing support for laboratory networks, rather than establishing centers, can be an effective way to conduct research on a limited budget. Many of the scientists involved in human genome research host visiting foreign scientists and graduate students regularly.

The United States already finances international collaboration in biomedical research to a certain extent through the normal funding mechanisms of the National Institutes of Health, which may award grants to U.S. investigators “whose work involves substantial collaboration with foreign institutions” (63). Researchers affiliated with foreign institutions are eligible for grants and contracts; in fiscal year 1984, NIH spent $35 million on foreign grants, roughly half the budget allotted to international activities. NIH also gives grants for foreign or international conferences and for international research fellowships.


Box 7-E.—Large Centers v. Networking

The development of international sequencing centers draws enthusiastic response from some quarters and skepticism from others. Proponents such as Walter Gilbert and Akiyoshi Wada advocate the creation of several international centers containing advanced sequencing equipment as the most efficient way to sequence the genome, if not to map it. Critics contend that establishing large central institutions reduces the innovation spawned by small research laboratories doing investigator-initiated projects. Other critics, including many industrialists, argue against “naive internationalism,” stating that the task at hand should be done posthaste, without lengthy delays while international negotiations decide on the division of labor, responsibilities, and benefits.

One solution that could satisfy critics of both stripes is networking: strengthening the links between existing laboratories rather than starting up new research centers. Networking has recently gained popularity in the European community; indeed, Dickson has written that “the top political priority given to the idea that governments should focus their efforts on linking together scientists in existing laboratories, rather than on creating major centers or research facilities, has become perhaps the most important shift in European-level science policy in the 1980s.”

Various research programs supported by such organizations as the European Science Foundation (ESF) and the European Economic Community (EEC) have adopted networking strategies in lieu of costlier and more contentious decisions to set up central collaborative facilities. The ESF has supported laboratory networks for research in areas including polar science and individual psychological development. Particularly relevant for genome research is a network on the molecular neurobiology of mental illness, in which scientists are hunting for pedigrees of families with psychiatric problems in order to locate informative genetic polymorphisms for linkage analysis studies (see ch. 2 and box 7-A). The EEC supports research under the Stimulation Program, providing money to allow scientists from different countries working on the same project to meet, perform joint experiments, and so on. One successful project that the Stimulation Program funded, according to Dickson, was “a research program into the development of new high-field magnets, which now links together scientists working in 58 research institutions in the 12 member states of the EEC.” The EEC’s Biotechnology Action Program, which encourages a transnational approach to the research it sponsors, has developed a similar networking approach: European Laboratories Without Walls (ELWWs). ELWWs link individual researchers from laboratories in different institutions (preferably in more than one country) together for multidisciplinary but focused, precompetitive research projects. The ELWW program emphasizes rapid, open flow of information and material between participants and incorporates joint planning and evaluation of the scheduled experiments.

Perhaps because European laboratories have traditionally been poor at communicating beyond their national borders (European scientists are more likely to collaborate or cooperate with American scientists than with other Europeans), the networking strategy has met with increasing enthusiasm and has fostered notable successes. Whether the strategy would work to link Europe, Japan, and the United States is not certain. Even within Europe there are potential problems. Networking could lead to the support of elite research groups and exclude those from poorer countries that do not yet have the facilities to be desirable research partners. For projects with potential commercial value, proprietary rights and the open exchange of information can become troublesome issues. Dickson reports that some policymakers argue that “the relative absence of centralized strategic thinking could turn out to be a major weakness.” Despite these caveats, networking is a model for international organization that could reduce the anxieties accompanying the planning and implementation of international cooperative or collaborative projects.

SOURCES:
D. Dickson, “Networking: Better Than Creating New Centers?” Science 237:1106-1107, 1987.
J. Heilbron and D. Kevles, see app. A.
J. Maddox, “New European Collaborations,” Nature 330:417, 1987.
J. McConnell, Johnson & Johnson, Science Writers’ Workshop, Brookhaven National Laboratories, Upton, NY, Sept. 14, 1987.
R. van der Meer, E. Magnien, and D. de Nettancourt, “European Laboratories Without Walls: Focused Precompetitive Research,” Trends in Biotechnology 5:318-321, 1987.


DOE also engages, to a limited extent, in international research cooperation and collaboration through its national laboratories. It has been criticized, however, for earning “a poor reputation abroad for long-term commitment to international collaborations,” which “will make it extremely difficult for DOE to attract foreign countries into significant new partnerships” (31). So far, however, DOE scientists working on genome projects have collaborated freely with researchers from other countries (82).

Existing research organizations can also become centers of collaboration. CEPH coordinates over 40 international investigators and research laboratories for mapping studies (see box 7-B). It sends genetic materials to major gene mapping laboratories around the world; in exchange, the laboratories share their results and data.

Washington University-RIKEN Collaboration

A recent agreement between researchers from Washington University in St. Louis, Missouri, and the Institute of Physical and Chemical Research (RIKEN) in Tsukuba, Japan, illustrates the potential of international collaboration at the level of individual institutions (38). The 3-year program, effective November 1, 1987, enables researchers from Washington University’s new Center for Genetics in Medicine (founded by a donation from philanthropist James McDonnell) to work with researchers from the Tsukuba Life Sciences Center of RIKEN. The research will combine the expertise of the university’s scientists in cloning yeast cells with the technological know-how of the RIKEN scientists, who have developed automated DNA analysis equipment. The initial focus of the research will be to sequence the entire yeast genome and to improve techniques for cloning human chromosomes into yeast cells.

This collaboration, the first bilateral agreement between American and Japanese scientists in the field of genetics, also provides for information and personnel exchanges with the Pasteur Institute in Paris and the Academia Sinica in Shanghai, China. Data and results from the collaboration will be disseminated freely to the international community.

International Human Gene Mapping Workshops

A series of biennial international gene mapping workshops (the ninth, HGM 9, was held in Paris in September 1987) has provided a mechanism for extensive international interaction. Prior to each workshop, committees are appointed for each of the human chromosomes. The committees are in charge of evaluating the research that has been done on the chromosome; they solicit papers from the international research community and select the ones to be presented. At the workshop, the committee for each particular chromosome works toward a consensus on which mapping data will be accepted as the standard. The committees also decide upon the official nomenclature for map sites and for probes, and their deliberations provide a measure of quality control for the research. Data accepted at the workshop are submitted to the Human Gene Mapping Library in New Haven, Connecticut, and subsequently entered into that database (see app. D). In 1987, a new database, Genatlas, was initiated specifically for the purpose of managing the mapping data from HGM 9. The conference proceedings are published in the journal Cytogenetics and Cell Genetics. Proceedings of some of the conferences have been independently published.

The growth in the size of the HGM workshops is one indication of the overall growth of the field of human genetics. Early conferences attracted an exclusive group of participants, but the ninth drew hundreds. Data are accumulating so rapidly that biennial conferences may not be sufficient; plans are already underway for an informal workshop, dubbed HGM 9.5, to be held in 1988 (9).

International Journals

The scientific publication process is the most important form of data sharing within and across national borders, an ongoing form of international cooperation. A bibliometric analysis of the international literature showed a rapid rise in the number of mapping and sequencing articles published in international journals between 1977 and 1986 (see figure 7-2). U.S. researchers have consistently contributed the largest number, from 38 to 46 percent of all articles with genetic map or linkage results (see figure 7-1, table 7-1, and app. E) [Computer Horizons, Inc., see app. A]. The United Kingdom is the next largest contributor, publishing 8 to 11 percent of the articles annually, while France and West Germany are next with 5 to 8 percent and 2 to 5 percent, respectively. Japan’s share of the basic research has increased fairly steadily, from 2 percent to 5 percent of the total. These data show the international nature of genome research and of the medical and scientific literature in general. There exists some segregation of Eastern European journals due to restrictions on export of information, and language may pose a barrier for non-English-speaking scientists (since many international journals are published in English), but for the most part scientific journals are thoroughly international. Scientists from one nation freely report data in journals from another.

Figure 7-2. Human Gene Mapping and Sequencing Articles Published Annually, 1977-1986

[Chart not reproduced: articles published annually, by year from 1977 through 1986; vertical axis scaled to 2,000 articles.]

A bibliometric analysis conducted for the Office of Technology Assessment by Computer Horizons, Inc. [app. A] showed a steady increase in the total number of articles published annually in international journals on human gene mapping, gene markers, nucleotide sequences, and related topics from 1977 through 1986. [See app. E for details on the key words used in the literature search.]

SOURCE: Office of Technology Assessment, 1988.

Databases and Repositories

The operation of databases and repositories has been a standard mode of international cooperation in many scientific fields, and human genome projects are no exception; several databases and repositories relevant for human genome projects exist (see app. D). The cooperative arrangements that have evolved among the international databases for nucleotide sequences and for protein sequences are examples of effective international collaboration.

Databases for nucleotide sequences were started at Los Alamos National Laboratory (later funded by NIH and operated under the name GenBank®) and EMBL (officially dubbed the EMBL Data Library) during the late 1970s. By the fall of 1980, the database organizers recognized the need for collaboration between the two, and from 1980 through 1982 the databases exchanged sequence data on an informal basis until their first major releases. In August 1982, GenBank® and EMBL held their first joint meeting and agreed to use a similar system of accession numbers and to divide the journals each would scan for data. The compatibility of the databases was further enhanced by agreements, reached in 1985 and 1986, on common sets of data and annotation. The DNA Data Bank of Japan formally joined the collaboration in May 1987. The division of responsibilities for various aspects of the operation of the databases was formalized in meetings in the summer and fall of 1987.

An international workshop on database needs in molecular biology was convened in Heidelberg, West Germany, in 1987. The participants recommended that an international advisory committee composed of experts from the fields of molecular biology and information sciences be formed to provide advice and guidance for expanded cooperation among the databases (35). The funding agencies that support the databases followed the recommendation and appointed a committee, which consists of three members from the United States, three from Europe, and two from Japan. The committee will meet yearly to advise database staff on matters such as format and annotation. Its recommendations are not binding, however, since each database is responsive primarily to the agencies that support it. The first meeting was held in February 1988.

Formal collaboration on protein sequence databases is more recent. The U.S. database, the Protein Identification Resource (formerly known as the Dayhoff database, or NBRF), was started in the late 1960s. The European and Japanese counterparts, the Martinsried Institute for Protein Sequence Data (MIPS) (27) and the Japan International Protein Information Database (JIPID), began operations in 1987. The close collaboration among the three includes use of the same format, the same software, and a regional division of monitored journals (41).

The continued development and maintenance of databases and repositories are the most commonly endorsed mode of international cooperation on human genome projects (see box 7-D). The National Academy of Sciences supported the establishment of an international organization to gather and distribute data and materials in its 1988 report on human genome mapping (62).

CHAPTER 7 REFERENCES

1. Abstracts in Biocommerce 10:11, January 1988.

2. Altenmueller, G.H., “‘Genetic Engineering’ Commission of Inquiry Presents Its Report: No Blinders, but Firmly in Hand: No Straitjacket for Genetic Engineering Research, Only Clearly Defined Guidelines,” VDI Nachrichten (Düsseldorf), Jan. 23, 1987, p. 1.

3. Ananyev, Y.V., Bochkanov, S.S., Ryzhik, M.V., et al., “Characterization of Cloned Repetitive DNA Sequences of Barley Genome,” Genetika (Moscow) 22:920-928, 1986.

4. Anderson, A., “U.S. Pressures Japan Over Imbalance in Basic Research,” Nature 329:662, 1987.

5. Anderson, M., Embassy of Finland, personal communication, December 1986.

6. Bolund, L., University of Aarhus, Denmark, personal communication, December 1987.

7. Borodovskiy, M.Y., Sprizhitskiy, Y.A., Golovanov, Y.I., et al., “Statistical Characteristics of Primary Functional Regions of Escherichia Coli Genome, Part 3, Computer Recognition of Coding Regions,” Molekulyarnaya Biologiya (Moscow) 20:1390-1398, 1986.

8. Bowman, J., University of Chicago Medical School, personal communication, December 1987.

9. Cahill, G., Howard Hughes Medical Institute, personal communication, December 1987.

10. Cameron, C.M., Department of National Health and Population Development, South Africa, personal communication, October 1987.

11. Cantley, M., Commission of the European Economic Community, personal communications, August 1987; January 1988.

12. Centre d’Etude du Polymorphisme Humain, unpublished report, 1986.

13. Chamberlain, J.W., U.S. Embassy, Brazil, personal communication, August 1987.

14. CODATA, “First CODATA Workshop on Nucleic Acid and Protein Sequencing Data,” program and abstracts from conference held at National Bureau of Standards, Gaithersburg, MD, May 3-6, 1987.

15. Collins, J., and Coulson, A.F.W., University of Edinburgh, personal communication, December 1987.

16. Commission of the European Communities, “Gene Technology: Opportunities and Risks,” unpublished translation of the foreword and recommendations of a report by the Committee of Inquiry of the German Bundestag.

17. Commission of the European Communities, “Proposal for a Council Regulation on a Community Action in the Field of Information Technology and Telecommunications Applied to Health Care: AIM (Advanced Informatics in Medicine in Europe), Pilot Phase,” COM (87) 352 final, Brussels, July 24, 1987.

18. Commission of the European Communities, “BICEPS Background Documents: Outline of a Community Action in the Field of Medical Informatics,” European Research Organisation for Advanced Informatics in Medicine, XII/1036/86, August 1986.

19. Commission of the European Communities, “Single European Act,” Bulletin of the European Communities, Supplement 2/86, 1986.

20. Commission of the European Communities, Official Journal of the European Communities, No. L 83/4, Mar. 25, 1985.

21. Conseil Européen des Fédérations de l’Industrie Chimique, “Bio-Informatics in Europe: An Industry Position Paper” (Brussels: CEFIC, 1987).

22. Consiglio Nazionale delle Ricerche, “Progetto Strategic C.N.R.: Mappaggio e Sequenzimento del Genoma Humano,” unpublished manuscript, 1987.

23. Crawford, M., “Japan’s U.S. R&D Role Widens, Begs Attention,” Science 233:270-272, 1986.

24. Dahllof, S., “500 Million Kroner State Support to Danish Biotech,” Ny Teknik (Stockholm), Jan. 29, 1987, p. 7.

25. Dausset, J., “Human Polymorphism Study Center,” paper presented at the International Conference on New Genetics and the Human Gene Map, Paris, Sept. 11-12, 1987.

26. Dausset, J., Cann, H.M., and Cohen, D., “Centre d’Etude du Polymorphisme Humain (CEPH): Collaborative Mapping of the Human Genome,” unpublished manuscript, April 1987.

27. Dickman, S., “New Protein Database for Europe,” Nature 327:265, 1987.

28. Dickson, D., “EMBL: ‘Small Science’ on a European Scale,” Science 237:1108-1109, 1987.

29. Dickson, D., and Norman, C., “Science and Mutual Self-Interest,” Science 237:1101-1102, 1987.

30. Dowdle, E.B., MEDGENE, South Africa, personal communication, October 1987.

31. Energy Research Advisory Board, International Collaboration in the U.S. Department of Energy’s Research and Development Programs, DOE/S-0047 (Washington, DC: U.S. Government Printing Office, 1985), p. 2.

32. Epstein, J., OTA, personal communication, December 1987.

33. “ESF’s Seibold on Forging Links for European Science,” The Scientist, October 19, 1987, pp. 16-17.

34. “Eureka’s Adolescent Growing Pains,” Le Figaro (Paris), July 12, 1987, p. 21.

35. European Molecular Biology Laboratory and National Institutes of Health, “Future Databases for Molecular Biology,” unpublished report from a workshop held Feb. 25-27, 1987, Heidelberg.

36. European Molecular Biology Laboratory, The 1987-1990 Scientific Programme of the EMBL (SP87), EMBL/86/3, rev. 1 E, Sept. 1, 1986.

37. European Space Agency, Memorandum of Understanding Between the European Space Agency and the United States National Aeronautics and Space Administration, ESA/C(77)51, rev.1 Annexe, unpublished.

38. Fitzpatrick, T., “Washington University, Japanese Research Group Sign Genetic Research Agreement,” news release, Washington University, St. Louis, MO, Dec. 7, 1987.

39. Fujimura, R.K., Biotechnology in Japan (Washington, DC: Department of Commerce, in press).

40. Fujimura, R.K., Oak Ridge National Laboratory, personal communications, August 1987; December 1987.

41. George, D., Protein Identification Resource, Washington, DC, personal communication, December 1987.

42. “Genome Analysis Can Increase Therapy Chances for Genetically Determined Diseases,” Handelsblatt (Düsseldorf), May 14, 1987, p. 6.

43. Gibson, F., Australian National University, personal communication, January 1987.

44. Gilbert, W., “Genome Sequencing: Creating a New Biology for the Twenty-First Century,” Issues in Science and Technology (spring): pp. 26-35, 1987.

45. Grienitz, H., “Human Genetics - Humane Genetics,” Spectrum (East Berlin) 3:10-13, 1987.

46. “Human Frontier Wins Muted Support,” New Scientist, June 18, 1987, p. 32.

47. Hunkapiller, M., Applied Biosystems, Inc., personal communication, October 1987.

48. Ingman, B.J., “Finland: 184 Million Markka Biotechnology Program 1988-1992,” Hufvudstadsbladet (Helsinki), January 31, 1987, p. 15.

49. Issues of Collaboration for Human Genome Projects, OTA workshop, June 26, 1987.

50. Jenkins, T., University of Witwatersrand, Johannesburg, personal communication, November 1986.

51. Kaminsky, H., U.S. Embassy, Italy, personal communication, July 1987.

52. Klemenz, H., University of Heidelberg, personal communication, September 1987.

53. Kohara, Y., Akiyama, K., and Isono, K., “The Physical Map of the Whole E. Coli Chromosome: Application of a New Strategy for Rapid Analysis and Sorting of a Large Genomic Library,” Cell 50:495-508, 1987.

54. Logsdon, J., George Washington University, personal communication, August 1987.

55. Logsdon, J., “U.S.-European Cooperation in Space Science: A 25-Year Perspective,” Science 223:11-16, 1984.

56. Maddox, J., “Brenner Homes in on the Human Genome,” Nature 326:119, 1987.

57. Mannarino, E., Embassy of Italy, personal communication, July 1987.

58. Matsubara, K., Institute for Molecular and Cellular Biology, Osaka University, Japan, personal communication, February 1987.

59. Mohr, J., University of Copenhagen, personal communication, August 1987.

60. Morris, R.G., U.S. Embassy, Argentina, personal communication, September 1987.

61. National Aeronautics and Space Administration, Life Sciences Report, 1987 (Washington, DC: NASA, 1987), p. 65.

62. National Research Council, Mapping and Sequencing the Human Genome (Washington, DC: National Academy Press, 1988).

63. Novello, A.C., “National Institutes of Health Awards to Institutions in Foreign Countries, 1976-85,” The Lancet, Sept. 6, 1986, pp. 561-563.

64. Oddo, G., “More Business Involvement in CNR’s Finalized Programs,” Il Sole 24 Ore (Milan), July 14, 1987, p. 6.

65. Organización de los Estados Americanos, Situacion de la Genetica en la Region: Informe Preliminar (Washington, DC: OAS, 1986).

66. Probert, M., Medical Research Council, United Kingdom, personal communications, July 1987; December 1987.

67. Roe, L., Ministry of State, Canada, personal communication, December 1987.

68. Schmitz, D., “Biotechnology in the Federal Republic of Germany (FRG),” report prepared for the U.S. Department of State, December 1986.

69. Simpkins, L.C., U.S. Embassy, Mexico, personal communication, September 1987.

70. Skou, B., Royal Danish Embassy, personal communications, November 1986; February 1987; April 1987.

71. Smith, R., The Johns Hopkins University, personal communication, July 1987.

72. Soll, D., Yale University, personal communications, August 1987; September 1987.

73. Sommariva, A., “And Italy Will Study Chromosome 22,” Italia Oggi (Milan), May 22, 1987, p. 36.

74. Sukholdolets, V.V., “Bacterial Genome Construction: New Advances in Genetic Engineering,” Genetika (Moscow) 22:901-903, 1986.

75. Swinbanks, D., “Human Frontiers Program Seeks International Help,” Nature 330:683, 1987.

76. Swinbanks, D., “Japan’s Human Frontiers Project Stays in the Doldrums,” Nature 328:100, 1987.

77. Swinbanks, D., “What Future Now for Japanese Biotechnology Research?” Nature 329:661, 1987.

78. Technologies Elettriche (Milan), January 1987, pp. 78-87.

79. U.S. Congress, Office of Technology Assessment, Commercial Biotechnology: An International Analysis (Springfield, VA: National Technical Information Services, 1984).

80. U.S. Congress, Office of Technology Assessment, Star Power: The U.S. and the International Quest for Fusion Energy, OTA-E-338 (Washington, DC: U.S. Government Printing Office, October 1987).

81. U.S. Congress, Office of Technology Assessment, New Developments in Biotechnology, 4: U.S. Investment in Biotechnology (Washington, DC: U.S. Government Printing Office, in press).

82. Van Belkom, P., National Health and Medical Research Council, Australia, personal communications, December 1986; November 1987.

83. van der Meer, R.R., “EC-biotechnology: European Challenge,” Trends in Biotechnology 4:277-279, 1986.

84. Wada, A., “Japanese Super DNA Sequencer Project,” Science and Technology in Japan 6(22):20-21, 1987.

85. Wada, A., “Automated High-Speed DNA Sequencing,” Nature 325:771-772, 1987.

86. Wada, A., University of Tokyo, personal communication, September 1987.

87. Wada, A., “Strategy for Building an Automatic and High Speed DNA-Sequencing System,” in Proceedings of the 4th Congress of the Federation of Asian and Oceanian Biochemists (London: Cambridge University Press, in press).

88. Yarrow, D.J., British Embassy, personal communications, January 1987; December 1987.

89. Yoder, S., “Japanese Tap Advanced Technologies To Accelerate Deciphering of DNA,” Asian Wall Street Weekly, Dec. 14, 1987, pp. 1, 24.

90. Yoshikawa, A., University of California, Berkeley, personal communication, December 1987.

91. Yuan, R., Biotechnology in Western Europe (Washington, DC: Department of Commerce, 1987).

Chapter 8

Technology Transfer

CONTENTS

Patent and Copyright Policies
    Patents
    Copyrights
    Trade Secrets

International Technology Transfer
    Economic Benefits
    Humanitarian and Scientific Benefits
    National Prestige
    Military Applications

Chapter 8 References

“The politics of knowledge, the question of who owns and controls the distribution and use of scientific information, is by no means a new issue. The pure scientist working in an ivory tower has long been extinct.”

Dorothy Nelkin, Science as Intellectual Property: Who Controls Research? (Washington, DC: American Association for the Advancement of Science, 1984), p. 92.

The economic impact of genome projects will depend on how many new products and services are created by them. Some large scientific projects such as space programs and electronics research facilities have been justified by their potential for spinning off technologies. The magnitude of such spinoffs is unpredictable, however. Useful products often emerge that could not have been foreseen [Heilbron and Kevles, see app. A]. Given the many surprises in molecular biology over the past decade, it is impossible to predict exactly how genome projects will result in products, but they undoubtedly will yield many new applications in pharmaceuticals, agriculture, and other industrial sectors. Uncertainty about the magnitude of economic impact means that genome projects cannot be justified purely as an economic investment. As the projects go forward for scientific and medical reasons, however, it makes sense to ensure that their results are fully used. The process of converting scientific knowledge into useful products is technology transfer.

The Federal Government influences the efficiency of technology transfer through its research and development policies. Government has traditionally supported research that will have large but unmeasurable noneconomic benefits (e.g., research aimed at improving health as a value in itself rather than simply disease impact measured in dollars) or that is too risky for individual firms to support (e.g., projects that are expensive, highly uncertain in outcome, or long-term). Arguments for increased Federal support of biomedical research since World War II have generally emphasized improvements in health. Economic arguments for increased biomedical research funding have typically been analyses of economic drag: how much the Nation could save by avoiding disability or disease (18). This argument is changing to concern for efficient translation of science into products. Policymakers are shifting their attention to technology transfer as products derived from molecular genetics find their way to the marketplace, international trade imbalances worsen, and rising deficits intensify scrutiny of Federal budgets.

A major effort is underway in many developed and some developing nations to target biotechnology for investment because it is considered particularly likely to produce economic benefits (3,16,19,23). Most foreign governments’ efforts to promote biotechnology include strategic planning of national research programs and encouragement of research and development in private firms (e.g., tax incentives, subsidies for industrial research centers, business grants, or government risk capital). The United States has no deliberate Federal policy to encourage biotechnology per se (16,19,23), although legislation introduced late in the first session of the 100th Congress would create a national biotechnology policy board.

Most genome projects could produce both direct and indirect economic benefits. Some projects are expected to yield directly marketable products (e.g., DNA sequenators, analytical instruments, DNA probes for diagnostic tests). Others would accelerate development of products (e.g., maps, repositories, and databases).

Different groups have divergent concerns about technology transfer. Scientists fear that corporate participation will inhibit the free flow of information and impede scientific progress. Policymakers want to ensure that a large Federal investment in genome projects is translated efficiently into new products and services, ultimately creating new jobs and other economic benefits. They are wary of projects in which U.S. taxpayers will fund research that is commercialized and used by foreign interests. In this view, foreign governments should support an equitable fraction of basic research, and American investments should not allow jobs and profits to migrate abroad. Industrial representatives want a say in planning research programs and access to scientific results as they are produced. Individual companies wish to ensure that any funds they invest will earn sufficient returns.

Congress could encourage technology transfer by funding personnel exchange among government, academic, and industrial sectors, with minimal bureaucratic strictures, and by supporting symposia, journals, and other modes of information exchange. When advisory committees are formed to guide Federal genome projects, industrial representatives could ensure that projects are planned with an eye to economic exploitation. These options are covered in chapter 6. The remaining options relate to protection of inventions resulting from federally funded research, discussed below.

PATENT AND COPYRIGHT POLICIES

Ideas and know-how (intellectual property) are granted many of the same legal protections as tangible private property. Intellectual property law traces its roots directly to the U.S. Constitution, which authorizes the Federal Government “to promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries.” The purpose of intellectual property protection is to encourage inventors and discoverers to share their knowledge, while ensuring that they benefit from the fruits of their labors. Legal protections balance the social good stemming from wide disclosure of new knowledge against individuals’ or companies’ rights to gain from what would not have existed without their efforts.

Three types of intellectual property protection are relevant to discussion of the technologies likely to emerge from genome projects: patents, copyrights, and trade secrets.

Patents

Patents grant inventors the right to exclude others from producing, using, or marketing their inventions (as defined in the patent claims) for a specified period. The purpose of patent law is to give inventors an incentive to risk their time and money in research and development, while requiring public disclosure. Patent laws in different countries vary in degree of protection, enforcement, penalties for violation, and criteria for approval. In the United States, the period of protection is 17 years, with extensions for pharmaceuticals to cover some of the delay imposed by regulation. Patents apply to inventions, but not to ideas, mathematical formulas, or discoveries of preexisting things. A patentable invention must be new, useful, and not obvious. A patent holder can permit others to use or make the invention by licensing it.

Profit is only one of many motivations for patenting an invention. Another is to maintain control over it. Leo Szilard filed a patent on the process of nuclear fission, for example, hoping to bring it to the attention of military authorities in the United States and Great Britain (12). Cyclotrons used in nuclear physics were patented to ensure their proper medical applications, yet this did not inhibit research (in fact, most physicists were not even aware of the patents) [Heilbron and Kevles, see app. A]. The Rockefeller Institute patented the sphygmomanometer (blood pressure cuff) to ensure that clinicians would have ready access to it and that later discoverers could not limit its use (11). Nonprofit organizations supporting genome projects are likely to encourage patents when they would ensure broad public use (9).

U.S. patents are obtained from the Patent and Trademark Office in the Department of Commerce (other nations have analogous institutions). The patentability of inventions is initially determined by this office. The scope of protection and more refined factors for granting patents are defined by case law, when patents are challenged in court. The principles for determining patentability do not depend on any particular type of technology, but interpretation of them does. Uncertainty about the patentability of inventions is greater in biotechnology than in most other areas because the techniques are new and complex. Interpretation of criteria for granting patents, defining the scope of patent claims, and determining what constitutes infringement have not been clarified by case law, because the case law does not yet exist.

Patent Policies for Federally Funded Research

Uncertainty about patentability need not paralyze research efforts, because interpretation of patent law does not interfere with most federally supported research (as explained below). Patent policies of Federal agencies will nonetheless influence how genome research is commercialized, and these patent policies have changed dramatically over the past decade. The changes are intended to promote commercial application of federally funded research by permitting private ownership and control of its results. The reasoning is that research will be more broadly disseminated and effectively used if those who conduct it are granted title to the patents on resulting inventions, thus providing an incentive to commercialize the inventions [Rosenfeld, see app. A].

Changes in patent policy resulted from studies showing that, while the Federal Government held title to roughly 28,000 inventions in 1975, fewer than 5 percent had been licensed to businesses (15). The Patent and Trademarks Amendments of 1980 (Public Law 96-517) were passed to grant title to small businesses and nonprofit organizations funded to do research by the Federal Government. These were further amended in the Trademark Clarification Act of 1984 (Public Law 98-620), most significantly by removing restrictions on licensing. Regulations implementing these laws were made final by the Department of Commerce in March 1987.

The policies applying to small businesses and nonprofit organizations were extended to large businesses, with some exceptions, by a memorandum from President Ronald Reagan dated February 18, 1983. The Technology Transfer Act of 1986 (Public Law 99-502) permitted new licensing and joint venture arrangements, and granted agencies authority to form consortia with private concerns. Executive Order 12591, issued by President Reagan in April 1987, encouraged technology transfer of federally funded research. The order was based on existing statutes and promoted consortium formation, exchange of research personnel between government laboratories and industrial firms, special technology transfer programs at federally owned laboratories, and transfer of patent rights to government grantees and contractors.

The policies embodied in these statutes, regulations, and executive orders constrain the authority of Federal agencies to force sharing of data if sharing would conflict with recipients’ taking title to inventions [Rosenfeld, see app. A]. The government can override recipients and take title to patents only in special situations. One of these is when an agency determines that retaining title “will better promote the policy and objectives” of the patent statutes. This clause has been narrowly interpreted and has rarely been used by the research agencies involved in genome projects [Rosenfeld, see app. A]. The Federal Government can also impose licensing requirements to “alleviate health and safety needs,” “meet requirements for public use specified by Federal regulations,” or meet “certain statutory provisions requiring products to be manufactured in the United States.” These provisions have also been narrowly interpreted and impose such a daunting burden of proof on agencies that they are unlikely to be used. They could conceivably be invoked if patent rights interfered with the pooling of data that must be collective to be useful or if clinical benefits were delayed (e.g., slow commercialization of genetic tests or therapies), but only if problems were severe and obvious.

Federal research agencies’ patent policies need not unduly slow exchange of information. The degree to which information flow is impeded will depend on when grant recipients and contractors file patent applications. Many genome projects will result in patentable inventions, particularly those focused on technology development. Recipients of Federal funds may follow one of three courses of action: file applications early and subsequently release data; file early and do not take extra actions to release data (relying on the patent process to do so); or decide not to patent.

Filing patent applications early and publishing data soon thereafter are optimal for encouraging rapid dissemination of knowledge, protecting inventors’ rights, and preserving economic benefits in the United States. Early patenting and subsequent disclosure would release data for public use but would help inventors maintain control of their inventions and assure them and their sponsoring institutions of any financial rewards. Early patent application would also protect the Nation because statutes give a preference to U.S. manufacture of any resulting products or services. Early patent application not followed by special efforts to disseminate data would ensure benefits for the grant recipient or contractor but would needlessly delay exchange of useful information: patents are often not published for several years, and it has taken over 7 years for some biotechnology patents to be awarded.

Investigators may decide not to apply for a patent because they wish to avoid substantial legal costs and bureaucratic entanglements or because they believe that science should not become commercially oriented. This can make new methods freely available to all, but it can also inhibit full exploitation of an invention. It is also against the intent of Federal statutes, which require recipients of Federal funds to report patentable inventions. An inventor can lose control of an invention if he or she does not file a patent and another inventor does so. A product or process that is not patented is unlikely to be used commercially, because any firm investing in manufacture will want a guarantee that its investment will be protected. Failure to patent also invites foreign exploitation of research funded at U.S. taxpayers’ expense: patent rights could be claimed by a foreign company, research institution, or individual; U.S. firms would not be given manufacturing preference; and the U.S. inventor could be prevented from use of the invention. Export of economic benefits has occurred frequently in biological sciences when initial discoveries have not been patented. Penicillin was discovered in England, for example, but the patent was obtained by U.S. corporations. The cell fusion process for making monoclonal antibodies was developed in London, but many of its applications were exploited first in the United States. In both cases, the United Kingdom claimed the Nobel Prize, but the United States reaped most of the economic benefits.

Federal agencies and Congress may wish to oversee patent practices of grantees and contractors closely to ensure that patents are filed early and data exchanged soon thereafter. Disclosure of data should not be long delayed by policies designed to encourage patenting of inventions, because data per se are not inventions eligible for patent protection. There is a gray area, however, between invention of new methods and the data that result from using them.

Scientists may be reluctant to disclose details of methods used to generate data if doing so endangers patentability. An invention must be novel to be patented: that is, it must not be widely used by parties other than the inventor for more than one year, and publication of the method cannot precede filing the patent by more than one year. (Some foreign countries do not permit even the one-year grace period.) If investigators are uncertain whether disclosing details of method would threaten a patent, they may choose not to publish those details. Uncertainty over patentability can indeed inhibit the free exchange of information. It has led one commentator to list three possible ways of altering patent laws: 1) making the definition of novelty more flexible; 2) establishing an intellectual property protection that is analogous to but more limited than patents and that requires less rigorous proof of novelty and nonobviousness; or 3) legislating special intellectual property protections for biotechnology (8). Further study is needed “to determine whether and how biotechnology demands special treatment as intellectual property before legislative reform will be in order” (8). This suggests that patent policies might be high on the agenda for congressional oversight but low on the legislative calendar.
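The one-year rule described above can be illustrated with a short sketch. It is a simplification for illustration only, not a statement of patent law: the function names are hypothetical, and treating a filing made exactly one year after publication as timely is an assumption.

```python
from datetime import date


def one_year_after(d: date) -> date:
    """Return the same calendar day one year later (Feb 29 maps to Feb 28)."""
    try:
        return d.replace(year=d.year + 1)
    except ValueError:
        # Only reachable for Feb 29 when the following year is not a leap year.
        return d.replace(year=d.year + 1, day=28)


def filing_within_grace_period(published: date, filed: date) -> bool:
    """True if the filing date is no more than one year after publication.

    Assumption: a filing exactly one year after publication counts as timely.
    A filing made before publication is always within the period.
    """
    return filed <= one_year_after(published)
```

Under these assumptions, a method published on March 1, 1987, could still support a filing on March 1, 1988, but not one on March 2, 1988. In a country with no grace period, as the text notes for some foreign countries, any publication before filing would defeat the application.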

Filing patents early and then disclosing the results could worsen an already considerable backlog of pending patents. Approximately 7,000 biotechnology patents have been filed at the Patent and Trademark Office and await final action (20). If the benefits of patent protection are judged important by Congress, then one option would be to increase the resources in the biotechnology sections of the Patent and Trademark Office. This could include higher salaries, more opportunities for training to keep abreast of technological developments, easier access to technical databases, and more examiners. Increased resources could not only reduce uncertainty by diminishing the backlog of pending patents, but also increase the attention devoted to each application and reduce subsequent litigation.

Patent Policies at Research Agencies

The Department of Commerce recently promulgated final regulations for Federal agencies to use when funding research at small businesses, universities, and nonprofit organizations [37 CFR 401]. While these regulations, issued in March 1987, have had little time to take effect, the National Science Foundation (NSF) and the National Institutes of Health (NIH) have followed similar policies since the late 1970s.

The General Accounting Office found that university administrators, industry representatives, and small businesses all reported a “significant positive impact on research and innovation” from taking title to inventions that resulted from federally funded research. University and industry officials also reported benefits from the 1984 law that removed licensing restrictions (15). Agencies likewise reported a generally positive assessment, with greater potential for licensing patents than when title was retained by the Federal Government.

The situation at the Department of Energy (DOE) is more complex. A substantial fraction of DOE research funding goes to national laboratories, which are owned by the Federal Government and operated by private contractors. At most of the laboratories, the contractor can elect to take title to inventions. Title rights are restricted, however, at facilities that conduct research on weapons and naval propulsion systems. This could prove relevant to genome projects because several of the groups that have been directly engaged in DOE’s Human Genome Initiative are located at laboratories with restricted title policies, namely Lawrence Livermore National Laboratory and Los Alamos National Laboratory, both operated by the University of California. Regulations state that limitations on the contractor’s right to take title should be restricted to “inventions occurring under” naval nuclear propulsion or weapons-related programs [37 CFR Part 401.3(a)(4)]. This should permit the contractor to take title to inventions from human genome projects because the projects would not be conducted under restricted programs, even at the affected laboratories. Negotiations between DOE and contractors are more complicated, however, when restrictions differ among programs at the same facility. Legislation has been proposed to mandate patent policies for genome projects at the national laboratories; the policies would be modeled on those of other research agencies.

The regulations and executive orders implementing patent policies at research agencies are quite recent. It would be premature to alter those policies fundamentally until the results of current law can be assessed (with the possible exception of DOE policies regarding national laboratories, noted above).

There are additional roles for Congress. First, Congress could monitor the practices of Federal agencies and funding recipients to ensure that the intent of existing statutes is carried out. Second, Congress could increase resources to the Patent and Trademark Office to enable more efficient processing of patents. Third, Congress could increase resources for universities and other recipients in order to manage patent filing in the United States and abroad. Finally, Congress could ask agencies engaged in genome projects to specify their patent policies more clearly. At present, written material on patent policies at NIH, DOE, and NSF is difficult to obtain, and there is no single source for information on patent policies at all agencies involved in genome projects [Rosenfeld, see app. A]. The interagency nature of genome projects means that recipient institutions will often be funded by more than one agency. A clear presentation of patent guidelines at various agencies, with explanations of the advantages of early patent filing and the implications of doing so (and not doing so), might diminish confusion and promote commercial application.

Copyrights

Copyright law is intended to protect works of authorship. It has traditionally been applied to works of art, books, and articles but has had to adapt to technological change. Copyrights now extend to computer software and electronic entertainment media, for example (6,17). Copyright is intended to protect the expression of ideas, not the ideas themselves, a difficult but crucial distinction.

The Copyright Act of 1976 is the most recent statute relevant to genome projects, extending protections to nontraditional media such as computer software. The extensions may also prove relevant for research in molecular biology (6). Case law has evolved doctrines to test the distinction between idea and expression and to define the scope of protection. An author can prohibit others from copying his or her book, for example, but the concepts and methods described in the book are not protected. Arguments have been made that copyright could apply to DNA (6), but this line of argument is not widely accepted and the scope of protection (if it exists) is quite narrow (5). The ability to copyright a native DNA sequence derived from a human chromosome or other natural source is particularly uncertain (5). A preliminary communication from the Copyright Registration Office indicates that such sequences would not be accepted, although the book or printed map containing them (the particular expression of map or sequence data) would (10).

Even if DNA maps can be copyrighted, such copyrights are unlikely to inhibit research substantially. In normal circumstances, obtaining a copyright does not require extra time and is thus not a justification for delaying disclosure of results. A company could charge for access to map or sequence information in much the same way that commercial databases charge for information sharing. Access and service charges are not new: molecular biologists routinely pay for services that are less expensively or more rapidly performed by others. They buy copyrighted books and read copyrighted journals. Many materials used in biological research (clones, enzymes, chemicals) can be made by individual investigators, but it is easier to purchase such materials from a company set up to make them.

The type of research conducted by a private company engaged in mapping and sequencing DNA would be feasible in a large number of laboratories. Copyrights would not prevent investigators from using information published or otherwise provided by a company or from duplicating the work. A company that has developed extensive map and sequence information would either charge so little that it is cheaper for a researcher to obtain the information from the company than to do the work, or the researcher would in fact repeat the work. In either case, the community of researchers is no worse off than if the company had not mapped or sequenced.

If copyright practices prove to impede research, then agencies can take steps to correct the deficiencies. Agencies have much broader discretion for copyright policies than for patents [Rosenfeld, see app. A].

Trade Secrets

Information held by one company that is useful in its business and unavailable to competitors is called a trade secret. Trade secrets can be protected from misappropriation (that is, improper disclosure) through the courts, which award monetary damages for unauthorized use. A trade secret must be in continual use, be well established in practice, and have actual or potential commercial value (19). The holder must take steps to guard it. Trade secrets do not involve slow and costly legal steps for registration, their duration is not limited by law, and they need not meet patent or copyright criteria. Uncertainties about patents and copyrights are not relevant (although legal criteria for protection under trade secret laws must be met). Trade secret protections are principally secured under State rather than Federal laws, and there is some variation among the States. Trade secrets have limited scope: in a rapidly moving field they may not last long. Trade secret laws cannot ensure returns on a research investment if another inventor discovers the secret method or finds a new way to do the same thing. Nor does protection apply if competitors figure out the secret by examining a product (reverse engineering). Most important, trade secrets must be kept secret. This would be quite difficult to justify for federally funded research.

The scientific equivalent of a trade secret is nondisclosure. This is referred to pejoratively as sitting on data and is widely viewed as improper beyond the period needed to confirm accuracy of results and take advantage of a lead for further research. The period of nondisclosure varies widely among researchers, even those in the same field. Researchers who share data and materials early and freely are widely praised, such as the many collaborators who worked to find the muscular dystrophy gene (see ch. 3). But nondisclosure for a few months to a year is not uncommon in order to maintain a research advantage or to establish first discovery, even in research leading to Nobel Prizes (or perhaps especially in such research) (21,22). Permanent nondisclosure of an important result is, however, inimical to the purpose of scientific inquiry: the discovery and dissemination of new knowledge.

Nondisclosure is of particular concern when the results must be pooled in order to be useful (e.g., maps derived from data contributed by various groups). The need for pooled data can create a situation known as the prisoner's dilemma, in which cooperation of all parties yields the maximum benefits, but one party can benefit if he does not cooperate and the others do. (So called because prisoners planning a jail break all benefit from cooperation, but one stooge can benefit individually by telling the guard of the plans.) An investigator searching for the location of an unknown gene stands to gain if other groups with markers make them freely available but he does not. He can then use both his and others' work to speed the search, while denying others access to his markers. Similar situations will arise in connection with submitting information to databases, sending materials to other researchers or central repositories, and other cases directly related to genome projects. Agencies will need to monitor the free exchange of data and materials, particularly when the efforts must be collective, and take steps to correct inequities. The need for joint efforts highlights the importance and fragility of collaborative institutions such as the Center for the Study of Human Polymorphism (CEPH) (see ch. 7).
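The incentive structure described above can be illustrated with a small payoff table. The sketch below is only an illustration: the payoff numbers are hypothetical units of research progress chosen to show the pattern, and the names `PAYOFFS`, `best_response`, and `joint_payoff` are not drawn from this report.

```python
# Hypothetical payoffs for two groups that each hold DNA markers.
# The numbers are illustrative assumptions, not data from the report.
PAYOFFS = {
    # (my_choice, other_choice): (my_payoff, other_payoff)
    ("share", "share"): (3, 3),
    ("share", "withhold"): (0, 5),
    ("withhold", "share"): (5, 0),
    ("withhold", "withhold"): (1, 1),
}

CHOICES = ("share", "withhold")


def best_response(other_choice):
    """Return the choice that maximizes my own payoff, given the other group's choice."""
    return max(CHOICES, key=lambda mine: PAYOFFS[(mine, other_choice)][0])


def joint_payoff(first, second):
    """Return the combined payoff of both groups for a pair of choices."""
    return sum(PAYOFFS[(first, second)])


for other in CHOICES:
    print(f"If the other group chooses to {other}, "
          f"my best response is to {best_response(other)}.")

print("Joint payoff when both share:", joint_payoff("share", "share"))
print("Joint payoff when both withhold:", joint_payoff("withhold", "withhold"))
```

Under these assumed payoffs, withholding is the better individual choice whatever the other group does, yet both groups do better when both share. This is the inequity that agency monitoring or incentives would aim to correct.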

Many journals have either explicit or unwritten policies that research data and materials described in an article must be made available to other researchers at the time of publication. Researchers preserve their option for exclusive use from the time of discovery until publication. Many scientists make materials available even before publication, which can require many months. Linking availability of materials to publication is a powerful mechanism, because one measure of scientific prestige is priority—who discovered something first. Priority is generally determined by date of publication. In large collaborative scientific projects, mechanisms have evolved to permit scientists time to pursue hot research leads while ensuring that others gain fair access. (CEPH's policy of sharing one set of data only among collaborators and making another set publicly available is an example.)

An informal policy of disclosure operates in Federal agencies through the process of peer review. If a researcher is known to hoard data—and such information spreads rapidly through scientific communities—then proposals submitted by that individual are unlikely to be given high priority by study sections (7). Review groups withhold support from research whose results they cannot see. This mechanism is slow—it can only be used when a grant is up for renewal, every 3 years or more—but it can be quite effective. If further measures are needed, Federal agencies could require submission of materials and data—map positions or DNA sequence data, for example—to the appropriate database. Such a policy would not be easily enforceable, however, and would be constrained by investigators' patent rights. Some journals now require submission of DNA sequences in proper form to GenBank or its sister database in Europe at the time a paper is accepted. Agencies could devise incentives to make contribution of data and materials attractive, an alternative that is more easily implemented and less politically troublesome than negative sanctions. Those submitting data to CEPH, for example, benefit from knowing the position of their markers relative to markers found by others. Persons managing the DNA sequence databases have contemplated giving researchers a similar incentive.

Federal agencies have substantial power to require disclosure when it does not impede grantees' and contractors' intellectual property rights. Grant recipients and contractors need ample time to file patent applications, but legal protections of intellectual property are unlikely to inhibit agency policies promoting disclosure, particularly when broad access to data is necessary to fulfill the agency's mission.

INTERNATIONAL TECHNOLOGY TRANSFER

Human gene mapping is inherently international in scope. Recent breakthroughs in assembling rough genetic maps, for example, have depended on an international collaboration of investigators from Europe, North America, and Africa using family data from four continents. Several current technologies for sequencing and physical mapping were developed in the United Kingdom and other European nations, not the United States; however, recent years have seen increased emphasis on retaining the economic benefits of federally funded research for the United States.

International technology transfer is the movement of inventions and know-how across national borders. Concerns about international technology transfer fall into four areas: economic benefits, humanitarian and scientific benefits, national prestige, and military applications.

Economic Benefits

Concerns about economic implications of international technology transfer focus primarily on the export of jobs and services generated by research funded at public expense. Policies to combat this fall into three main areas: patent policies, restrictions on flow of information and materials, and promotion of domestic technology transfer so that benefits remain within national borders.

The patent policies described above have several provisions on international technology transfer that are relevant to genome projects. For foreign recipients of Federal funds or those subject to a foreign government, agencies must consider whether the recipient's government or company enters into international cooperative funding agreements on a "comparable basis" and whether the recipient's government protects U.S. intellectual property rights [Executive Order 12591, Apr. 10, 1987]. Recipients of Federal R&D funds must ensure that the products of the invention will be "manufactured substantially in the United States" [35 U.S.C. 204]. Since jobs and economic wealth are linked more tightly to manufacturing than to initial research and development, even foreign-held U.S. patents resulting from Federal funding would have economic benefits in the United States. Moreover, Federal agencies are not required to grant patent rights to foreign recipients or those subject to control of a foreign government, even if they are universities or nonprofit organizations [37 CFR 401.14(a)(1)]. Foreign recipients are thus managed differently than their U.S. counterparts. Agencies could conceivably require foreign recipients to assign title to the U.S. Government or require that U.S. research partners take title.

Exploiting federally funded research inventions abroad will usually entail seeking foreign patents. Several international conventions govern patents, but conditions for granting patents differ among nations. The United States permits a grace period of one year from the date of publication to file a patent application, for example, but many other governments do not. If investigators wish to ensure worldwide patentability, therefore, they must file foreign patents before publication. The period of patent protection also differs. Research institutions accepting Federal funds must know about these and other differences when making decisions about foreign patents. Disseminating knowledge about such differences could be encouraged by research agencies in concert with the Department of Commerce. Agencies could also encourage institutions receiving Federal funds to pursue foreign patents.

The current necessity for filing patents individually in many countries is expensive and wasteful for all nations. International patent policies have been discussed several times at meetings of the Organization for Economic Cooperation and Development. Attempts are being made to harmonize international practices (14).

Humanitarian and Scientific Benefits

The humanitarian and scientific benefits of genome projects will be great. The United States has consistently performed a significantly higher fraction of the total mapping and sequencing effort than any other nation (see ch. 7). The knowledge resulting from these efforts has been freely shared with the rest of the world, to the benefit of citizens of all nations. The scientific knowledge generated at Federal expense since World War II may well prove to be one of the most significant international contributions of modern American culture.

Imposing restrictions on the flow of information and scientific materials from U.S. researchers to researchers abroad would be politically troublesome and technically difficult. Details of what to share and what to restrict would be difficult to describe in advance, and policies restricting the flow of data are against scientific traditions, which transcend national borders. Withholding map locations and DNA sequence information would be a violation of scientific ideals, particularly when such information could be clinically useful. Unilateral restrictions imposed by the United States would invite reciprocation, to the detriment of worldwide scientific progress.

The same tradition of free international exchange does not necessarily apply to the exchange of services and products—for example, mapping services, instruments, automation equipment, and reagents—which is governed more by international trade agreements than by scientific practices. Many national governments wish to assist their companies in developing goods and services for export. Genome projects focused on technology development are likely to be seen in this light. Nationalistic economic policies make projects to develop instruments or other salable goods poor candidates for international cooperation. European nations may be exceptions, because they have a basis for cooperation through several biotechnology programs of the European Economic Community.

Restrictions on international exchange of scientific personnel would disrupt many molecular biology laboratories in the United States and abroad. The United States has often reaped the benefits of international scientific exchange. Senior scientists, postdoctoral fellows, and graduate students from other nations work in U.S. laboratories and attend conferences. In exchange, U.S. scientists visit and are occasionally educated at universities and research centers abroad (12,22). The team of scientists that developed the atomic bomb for the U.S. Army, for example, was heavily dependent on scientists trained in Europe (12). Molecular biologists from abroad have often settled in the United States because it is so conducive to scientific research; several Nobel laureates at American universities immigrated during their scientific careers. Many projects in molecular biology have depended heavily on foreign scientists working in the United States, and many of the best stay or eventually return (4). The United States may in fact benefit from international personnel exchanges more than it is hurt by them. The Federal Government could nonetheless limit funding of foreign researchers at U.S. institutions, although this would probably generate ill will and provoke reciprocal actions by other governments.

One of the problems in assessing the potential impact of policies to reduce funding of foreign researchers in American laboratories is the absence of information about their research careers. If most foreign researchers remain in the United States or are particularly productive investigators while receiving Federal funds, then policies to restrict their ingress would be counterproductive.

Extending current restrictions on use of Federal funds for American researchers to travel abroad would be even less politically acceptable and more difficult to implement. It would result in direct loss of information to the United States, because persons traveling abroad are as likely to import information from their foreign collaborators as to export it. Policies designed to inhibit the exchange of personnel, materials, and information across national borders threaten benefits but gain little for the United States.

Promoting domestic exploitation and foreign patenting of new technologies is a more positive and less politically troublesome means to the same end of improving U.S. economic competitiveness. Such policies can preserve the U.S. lead in research without provoking retaliation or tarnishing the country's prestige.

National Prestige

One argument for Federal sponsorship for genome projects is that they are highly conspicuous and beneficial: Other nations will do the work if the United States does not, to the detriment of U.S. prestige. Similar arguments have been proffered for the supersonic transport, space programs, and other technical projects. These arguments tie the stature of U.S. science and technology to leadership of genome projects. The international prestige attached to genome projects is a purely political judgment; it cannot be assessed technically.

What would be the consequences if Japan or a European nation were to have the first complete set of ordered DNA clones representing all human chromosomes, or the first reference sequence of the human genome? Such questions are best answered primarily by the scientific and technical merits of the projects, not by an appeal to a vague notion of national prestige. If projects are technically unsound or uneconomical, then the United States would not benefit from a commitment to them. Other countries could do so, but they would only be hurting themselves. If the projects are technically sound, then the United States would do well to lead or at least participate in them, but national prestige would not be the principal justification for involvement. National prestige is not a useful basis for judging major scientific or technical projects.

Military Applications

Military applications of results of genome projects should not prove to be a major consideration in technology transfer. U.S. policies ban the export of goods and technologies that could be used for military purposes by specified hostile countries. Such policies are administered by the Department of Commerce in consultation with the Department of Defense and the Department of State. The export of some goods produced using biological technologies could be affected (1). At present, however, DNA mapping, sequencing, and other means of analysis relevant to genome projects are not on the list of controlled technologies, and this should remain true for the foreseeable future (2). The main reason is that the technologies and data resulting from genome projects would not have immediate military applications. Like other technologies and data, some could conceivably be used for a military purpose, such as devising vaccines against biological warfare agents, but genome projects would not in themselves promote biological warfare.

CHAPTER 8 REFERENCES

1. Dashiell, T.R., "The Department of Defense and Biotechnology," Technology in Society 8:223-228, 1986.

2. Dashiell, T.R., comments at Issues of Collaboration for Human Genome Projects, OTA workshop, June 26, 1987.

3. Fujimara, R.K., International Trade Administration, U.S. Department of Commerce, Biotechnology in Japan (Washington, DC: U.S. Government Printing Office, in press).

4. Hall, S.S., Invisible Frontiers: The Race To Synthesize a Human Gene (New York, NY: Atlantic Monthly Press, 1987).

5. Issues of Collaboration for Human Genome Projects, OTA workshop, June 26, 1987.

6. Kayton, I., "Copyright in Living Genetically Engineered Works," George Washington Law Review 50:191-218, 1982.

7. Kirschstein, R., comments at Issues of Collaboration for Human Genome Projects, OTA workshop, June 26, 1987.

8. Kern, D., "Patent and Trade Secret Protection in University-Industry Research Relationships in Biotechnology," Harvard Journal on Legislation 24(winter):191-238, 1987.

9. Kumin, L., Howard Hughes Medical Institute, Bethesda, MD, personal communication, December 1987.

10. Library of Congress, Copyright Registration Office, Washington, DC, personal communication, May 1987.

11. Lowrance, W., comments at Issues of Collaboration for Human Genome Projects, OTA workshop, June 26, 1987.

12. Rhodes, R., The Making of the Atomic Bomb (New York, NY: Basic Books, 1986).

13. Rycroft, R.W., "International Cooperation in Science Policy: The U.S. Role in Macroprojects," Technology in Society 5:51-68, 1983.

14. Tusso, B., Office of Economic Development, Paris, France, January 1988.

15. U.S. Congress, General Accounting Office, Patent Policy: Recent Changes in Federal Law Considered Beneficial, GAO/RCED-87-44 (Washington, DC: General Accounting Office, April 1987).

16. U.S. Congress, Office of Technology Assessment, New Developments in Biotechnology, 4: U.S. Investment in Biotechnology (Washington, DC: U.S. Government Printing Office, in press).

17. U.S. Congress, Office of Technology Assessment, Intellectual Property Rights in an Age of Electronics and Information, OTA-CIT-302 (Washington, DC: U.S. Government Printing Office, April 1986).

18. U.S. Congress, Office of Technology Assessment, Research Funding As an Investment: Can We Measure the Returns? OTA-TM-SET-36 (Washington, DC: U.S. Government Printing Office, April 1986).

19. U.S. Congress, Office of Technology Assessment, Commercial Biotechnology: An International Analysis, OTA-BA-218 (Springfield, VA: National Technical Information Service, January 1984).

20. Van Horn, C., personal communication, U.S. Patent and Trademark Office, Washington, DC, February 1988.

21. Wade, N., The Nobel Duel (Garden City, NY: Anchor/Doubleday, 1981).

22. Watson, J.D., The Double Helix (New York, NY: New American Library, 1969).

23. Yuan, R.T., International Trade Administration, U.S. Department of Commerce, Biotechnology in Western Europe (Washington, DC: U.S. Government Printing Office, 1987).

Appendixes

Appendix A

Topics of OTA Contract Reports

The following reports were prepared by outside contractors for the Office of Technology Assessment for this assessment. They are available on microfiche or as hard copy from the National Technical Information Service (NTIS), 5285 Port Royal Road, Springfield, VA 22161; tel: (703) 487-4650.

● Mapping Our Genes Contractor Reports, Vol. 1, Order No. PB 88-160 783/AS

—Bibliometric Analysis of Work on Human Gene Mapping, Samuel R. Reisher and Michael B. Albert, CHI Research/Computer Horizons, Inc.

—Medical Implications of Extensive Physical and Sequence Characterization of the Human Genome, Theodore Friedmann, Center for Molecular Genetics, School of Medicine, University of California, San Diego

—Mapping the Human Genome: Some Ethical Implications, Jonathan Glover, New College, Oxford University

—Mapping and Sequencing the Human Genome: Considerations From the History of Particle Accelerators, John L. Heilbron, University of California, Berkeley, and Daniel J. Kevles, California Institute of Technology

—Mapping the Human Genome: Historical Background, Horace Freeland Judson, The Johns Hopkins University

—Long-Term Implications of Mapping and Sequencing the Human Genome: Ethical and Philosophical Implications, Mark A. Lappe, College of Medicine, University of Illinois at Chicago

● Mapping Our Genes Contractor Reports, Vol. 2, Order No. PB 88-162 805/AS

—The Mapping and Sequencing of Genomes: A Comparative Analysis of Methods, Benefits and Disbenefits, Stephen M. Mount, Department of Biological Sciences, Columbia University

—Mapping the Human Genome: Experimental Approaches for Cloning and Ordering DNA Fragments, Richard M. Myers, Department of Physiology, University of California, San Francisco

—Mapping and Sequencing the Human Genome in Europe, Peter A. Newmark, Nature

—Application of Human Genome Mapping for the Global Control of Genetic Disease, Sir David J. Weatherall, Nuffield Department of Clinical Medicine, John Radcliffe Hospital, Oxford University

—In Search of the "Ultimate Map" of the Human Genome: The Japanese Efforts, Akihiro Yoshikawa, Berkeley Roundtable on the International Economy, University of California

OTA also sponsored two workshops during this assessment, and the transcripts of those workshops have been submitted to NTIS as:

● Mapping Our Genes, Transcript of Workshop "Issues of Collaboration for Human Genome Projects," June 26, 1987, Order No. PB 88-162 797/AS; and

● Mapping Our Genes, Transcript of Workshop "Costs of Human Genome Projects," Aug. 7, 1987, Order No. PB 88-162 813/AS.

The following report was written for OTA but was not sent to NTIS because it will be available to the public through a law journal article:

● "Sharing of Research Results in a Federally Sponsored Gene Mapping Project," Susan Rosenfeld, Science and the Law Committee, Association of the Bar of the City of New York, August 1987; to be published by the Rutgers Computer and Technology Law Journal, vol. 14, No. 2, 1988.

Appendix B

Estimated Costs of Human Genome Projects

Congress has primary responsibility for funding research through Federal agencies because of its responsibility for the national budget each year. Appropriating Federal funds for any special genome projects will therefore fall to Congress. These appropriations will express Congress' judgment regarding the relative value of genome projects. In setting appropriation levels, Congress will weigh the costs of the programs against their anticipated benefits (in economic and social terms) and will balance the value of proceeding against the costs of not doing so, as measured in lost benefits or opportunities.

Proposals for genome projects are intended to support research, but research needs are inherently unpredictable: Technological breakthroughs could dramatically diminish budget needs, and unanticipated obstacles could just as dramatically increase them. Estimates for near-term projects using existing technologies are necessarily more accurate than those for future projects that presume technological developments. Costs for some of the larger components, such as sequencing significant portions of human or nonhuman DNA, hinge on unit costs that are highly uncertain now and are rapidly changing due to technical advances (e.g., the cost of sequencing a single base pair of DNA). These uncertainties suggest that a 5-year budget plan is the best that can be produced, and projected costs for even the first 5 years might need to be substantially revised. The costs of human genome projects can be separated by components, although the boundaries between some of them are imprecise. The costs projected in this appendix are based on a process followed by OTA to generate estimates from internationally recognized experts.

OTA Cost Estimates

In order to better estimate potential costs of human genome projects, OTA held a workshop on August 7, 1987. At that workshop, there was apparent consensus on rough estimated costs of several components and confusion or disagreement about many others. A follow-up letter was sent to workshop participants and over 150 experts from executive agencies, universities, and corporations to confirm estimates made at the workshop and to expand them. Replies were received from over 70 persons. The revised cost estimates were externally reviewed by over 100 individuals and institutions in a draft report circulated in November 1987, and some minor revisions are based on comments received during this review. The resulting cost projections attempt to include most of the direct costs of research. They do not include indirect costs of university administration (although they do include administration in Federal agencies).

In some cases, it may prove possible to attract funding from the private sector—foundations, medical research institutes, or corporations. If so, Federal spending could be correspondingly reduced. In many cases, however, the Federal Government will eventually pay the full costs. If a company developed mapping and sequencing information or new instruments, for example, the first—and for a long time the predominant—users would remain researchers funded to do biomedical research by the Federal Government. This would be the case for most technologies developed as part of human genome projects (use by researchers being the primary goal of the enterprise). A company's investment would thus be charged back to the government by charging for use of information or purchase of instruments by the research community. In some cases there may be a market for products outside the biomedical research community. If so, the private sector funds could indeed displace government funds. Funding from research foundations, medical institutes, and other philanthropies would also, as a rule, substitute directly for government costs.

There was strong consensus about the importance and feasibility of improving the research infrastructure (databases and repositories) and generating genetic and physical maps of human and nonhuman chromosomes; there was substantial uncertainty about sequencing strategies and their associated costs. It is agreed that the need for new technologies is paramount, but there is disagreement about how much it would cost to develop them or how such efforts should be organized.

Discussion in the following sections reviews costs by component.

Computers and Computational Methods

Cost estimates for the necessary personnel, research, and equipment are $12 million per year for the early years, increasing to 15 percent of the overall budget as it exceeds $80 million annually. This would be in addition to continued support of existing databases and computer facilities. Spending should be relatively flat over time, because hardware will have to be purchased in early years and research will take an increasing proportion of the budget in later years. Hardware will have to be upgraded, however, so cost estimates are necessarily uncertain for future years. While it is logical to link computational needs to human genome projects, funding devoted to storage of genetic data and sophisticated analysis of DNA will prove important in molecular biology even if maps are not completed and other human genome projects are not funded.
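The computing estimate above can be read as a simple rule of thumb. The sketch below is one possible reading, offered only as an illustration: it assumes a flat $12 million per year until the overall annual budget passes $80 million, and 15 percent of the overall budget after that. The function name and the sample overall budgets are hypothetical and are not OTA figures.

```python
# One possible reading of the computing estimate, in millions of dollars per year.
# The threshold rule is an interpretive assumption; sample budgets are hypothetical.
EARLY_YEAR_COST = 12.0      # $ million per year in the early years (from the text)
BUDGET_SHARE = 0.15         # 15 percent of the overall budget (from the text)
BUDGET_THRESHOLD = 80.0     # overall annual budget at which the share applies (from the text)


def computing_cost(overall_budget):
    """Return the estimated annual computing cost for a given overall annual budget."""
    if overall_budget <= BUDGET_THRESHOLD:
        return EARLY_YEAR_COST
    return BUDGET_SHARE * overall_budget


for budget in (60.0, 80.0, 120.0, 200.0):   # hypothetical overall budgets
    print(f"Overall budget ${budget:.0f} million -> "
          f"computing ${computing_cost(budget):.1f} million")
```

Under this reading the two parts of the estimate meet at the threshold, since 15 percent of $80 million is $12 million.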

Genetic Maps

Genetic mapping has been conducted for several years, and a rough map of human chromosomes already exists. Discussion at the OTA cost workshop centered on a map with two to four times the resolution of current maps. Subsequent letters and discussions have centered on a further increase in resolution, preferably such that a gene being studied would be separated from its closest DNA markers (on average) in only 1 of 100 family members. (Geneticists call this a 1-centimorgan map.) Estimates based on existing procedures yield costs of $6 million per year for 5 years. Since this is an existing technology and there are already facilities to do the mapping, a startup period is not needed. Funds saved from new methods could be devoted to automating the processes so further refinements of maps in humans and other organisms would be easier to construct in the future.

The two principal groups constructing human genetic marker maps to date have not been federally supported. One has used private corporate funds, and the other has been funded by the Howard Hughes Medical Institute (HHMI). HHMI-sponsored work is a nearly direct substitute for government funding. Future work would be of greater magnitude, however, and may require Federal investment. In the case of work supported by Collaborative Research (the largest corporate group) and other companies, the Federal Government will probably pay for access to the probes either as a lump sum (to obtain access for all federally funded researchers) or indirectly (as federally funded researchers pay for access to individual DNA markers or mapping services).

Physical Maps

Physical maps would be quite useful for future research. Ordered clone sets linked to them would be even more useful. Pilot projects on selected human chromosomes and on many lower organisms are in progress, and a useful set of ordered clones from all the human chromosomes may be feasible in the next 5 to 10 years.

Projections based on existing technology yield costs of $60 million for a usefully complete set of ordered cosmid clones over 5 years. New technologies may permit the creation of ordered sets of much larger DNA fragments (using yeast artificial chromosomes, YACs), and these would also be extremely useful. Costs of constructing ordered libraries composed of both cosmid and YAC clones are estimated at $70 million.

There are substantial uncertainties regarding both types of clones. Physical mapping of human chromosomes using cosmid clones has only begun in the last year, and therefore the rate and completeness of such mapping are highly uncertain. Mapping with yeast artificial chromosomes is much newer, although promising. The main uncertainty regarding YACs is not cost, but feasibility: If such mapping is possible, it would be substantially less expensive than mapping using cosmids (although cosmid maps might be needed for many research applications).

Ordered clone libraries are difficult to complete. Progress is rapid at first, but it is unlikely that a chromosomal region can be spanned without gaps between groups of continuous clones. Maps complete enough to be useful can be expected from several years' effort, but if truly complete maps are necessary, then efforts must be continued, perhaps at funding levels equal to those for initial construction. Half or more of the total effort may be required for the last 10 percent of the maps. Clone libraries with gaps are quite useful, however, because a chromosomal region of interest is likely to be represented even in incomplete libraries.

Cost estimates start at $10 million for the first year (building on current Federal expenditures), rise to $20 million in $5 million increments over 2 years, and then drop to $10 million (with the proviso that continued higher funding may be necessary if complete maps are deemed essential).
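The funding profile described above can be written out year by year. The sketch below is an illustration only: it assumes a 5-year horizon and assumes that funding stays at $10 million in each year after the peak, neither of which is stated explicitly in the paragraph. The function name and its parameters are hypothetical.

```python
# Year-by-year sketch of the physical mapping estimate, in millions of dollars.
# The 5-year horizon and the flat level after the peak are assumptions.
def physical_map_schedule(start=10, peak=20, step=5, later=10, years=5):
    """Return a list of annual costs: ramp from start to peak, then drop to later."""
    schedule = []
    level = start
    # Ramp up in fixed increments until the peak is reached.
    while level < peak and len(schedule) < years:
        schedule.append(level)
        level += step
    # Add the peak year itself, if the horizon allows.
    if len(schedule) < years:
        schedule.append(peak)
    # Fill the remaining years at the lower level.
    while len(schedule) < years:
        schedule.append(later)
    return schedule


schedule = physical_map_schedule()
for year, cost in enumerate(schedule, start=1):
    print(f"Year {year}: ${cost} million")
print(f"Five-year total: ${sum(schedule)} million")
```

Under these assumptions the schedule is 10, 15, 20, 10, and 10, for a total of $65 million, which falls between the $60 million and $70 million figures quoted for ordered clone sets. Continued higher funding for complete maps would raise the total.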

Projects To Link Genetic and Physical Maps

Identifying the parts of DNA that carry the instructions for making protein and integrating them into genetic and physical maps would be very useful. Which stretches of DNA are actually used to produce protein varies with the tissue (many genes are expressed differently in different tissues or stages of development). The likely process would be to make DNA copies (cDNA) of the RNA that is translated into protein from a variety of tissues (both healthy and diseased) and at various stages of development. Locating cDNAs on physical and genetic maps would result in cDNA maps.

Such maps could be used to pick out protein-coding regions along stretches of DNA of unknown function. This would make the physical map much more useful, by highlighting regions of particular interest, and would provide a missing step in the search for genes whose approximate location had been determined by genetic linkage maps. DNA sequencing might also begin by using cDNAs to select regions likely to be of interest (because they are known to produce protein). Maps of cDNA would give clues to a gene's function if the pattern of expression related to a known biological process. Comparison of cDNAs from human and other organisms can give clues to function by relating expression to degree of evolutionary relatedness. If a genetic disease is located in a certain chromosomal region and cDNA maps show that one DNA segment from that region is transcribed only in the tissue affected by the genetic disease, then the gene corresponding to the cDNA is a good candidate for the gene causing the disease. Maps of cDNAs have been suggested by several groups [2,3,11,12,13,15].

The first step in constructing cDNA maps would be to collect and organize existing sets of cDNA clones. New sets of cDNA clones could be made from missing tissues, disease states, developmental stages, or organisms. The various cDNAs could then be located on genetic linkage maps and physical maps. The cost of this process is highly uncertain, in part because the number of genes in human and many other organisms is not known. Those specifically asked about this component estimated that its costs would likely range from $2 million to $5 million per year, depending on how much work could be done by merely cataloging existing cDNA clone sets; how many new sets would have to be constructed; how many organisms, tissues, developmental stages, and disease states would be used as sources; and the extent of genetic and physical maps. The costs of cDNA mapping would increase with the increasing detail of genetic and physical maps. OTA estimates start from a base of $2 million, increasing annually by $1 million increments.

Resource Material Repositories

Estimated costs of storing the clone sets linked to physical maps, cell lines for genetic research, and the various DNA analytical materials for genetic mapping originally ran to over $250 million. The largest component, dwarfing all others, was the cost of storing the DNA clones linked to physical maps. Such storage costs are virtually prohibitive, and these estimates were dropped. Subsequent discussions with experts on storage of materials for molecular biology, specifically with persons at the American Type Culture Collection, yielded storage estimates an order of magnitude lower. The estimates summarized in table B-1 are for collection and storage of clone sets. Costs of dissemination would be borne by users through user fees. Costs of collecting and storing mutant cell lines and DNA analytical materials (such as probes) have not been included.

Sequencing

There is little consensus on how much DNA sequencing should be done as part of genome projects, particularly whether a complete human reference sequence should be an objective. There is consensus, however, that sequencing technology is crucial and ripe for innovation. Cost projections should become easier in 2 to 3 years, as the first automated DNA sequencing machines are improved; massive sequencing would not begin in most schemes for several years, except as part of pilot projects and for DNA regions known to be of special interest [6]. The debate about sequencing involves disagreement about the costs of sequencing per base pair, the amount of redundancy necessary to make a sequence useful, the expected pace of technological improvements, which laboratory preparation steps are included, and how much DNA would be sequenced as part of human genome projects (rather than through traditional funding mechanisms). Estimates of the cost of sequencing vary widely, ranging from several pennies to several dollars per DNA base [6,16], but there is some agreement that costs would drop to $0.20 to $0.30 per base pair by the end of 1988, based on existing technologies. Some of the discrepancy in the estimates comes from including different components. The costs of special cloning procedures, preparing DNA, use of reagents, technician time, and capital costs of instrumentation should all be included in cost estimates.

Table B-1, total row (millions of dollars, fiscal years 1989 through 1993): 47, 68, 131, 186, 228

Judgments about which technologies to use and how much sequencing should be performed are best made each year by an advisory committee with access to technical experts. Such judgments would presumably be based on costs, the availability of material to sequence, and consensus on which regions to sequence first. For OTA projections, a few assumptions have been made. For the first 2 years’ budget, sequencing would be covered as technology development (performed on lower organisms or human chromosomal regions of known interest) for possible sequencing on a larger scale. For years 3 to 5, it would be based on sequencing one small chromosome per year at $0.20 per base pair ($30 million per year, based on threefold redundancy and 50 megabases per year), permitting a phase-in period for implementation of the technologies. This estimate is for purposes of budgeting only, however, and could prove wildly high or low. If the technologies for cloning, preparing DNA for sequencing, and finally determining a DNA sequence become significantly cheaper, as some experts predicted at the OTA workshop, the amount of DNA sequenced at that cost could be increased. If costs remain high, only a limited amount of DNA could be sequenced, according to priorities set by the oversight board. After the fifth year, the budget could go up or down in proportion to need.
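The $30 million figure can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted in the paragraph above ($0.20 per base pair, 50 megabases per year, threefold redundancy); the function and variable names are illustrative and do not come from the report.

```python
# Figures quoted in the text; names are illustrative assumptions.
COST_PER_BASE_PAIR = 0.20   # dollars per base pair actually sequenced
MEGABASES_PER_YEAR = 50     # finished sequence produced per year
REDUNDANCY = 3              # each base is sequenced three times


def annual_sequencing_cost(megabases, redundancy, cost_per_bp):
    """Return the yearly cost in dollars for a finished-sequence target."""
    raw_base_pairs = megabases * 1_000_000 * redundancy
    return raw_base_pairs * cost_per_bp


cost = annual_sequencing_cost(MEGABASES_PER_YEAR, REDUNDANCY, COST_PER_BASE_PAIR)
print(f"${cost / 1e6:.0f} million per year")
```

Because the cost scales linearly with each input, the same function shows how sensitive the budget line is to the per-base price or the redundancy assumed.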

Quality Control and Reference Standards

The large amounts of map and sequence information and new materials created by human genome projects will be useful only if the information is accurate and resource materials are cataloged reliably. If there are many different groups involved in the efforts, problems of quality control could impede useful applications. The scope and magnitude of this problem will become clear only when the technologies are defined and the results of mapping and sequencing efforts begin to accumulate. Special budget allocations for comparing results from different groups or to establish measurement standards may become necessary. Budget needs for quality control will be nil in the first year and will grow in early years until they constitute 5 percent of the overall budget. For initial estimates, the quality control budget is projected to grow by $1 million per year from a base of zero.

Technology Development

Investments in methods and instruments associated with genome projects are likely to lead quickly to commercial applications. The objective for technology development is open-ended, however, and it could be either the largest component or a relatively small fraction of genome projects. Responses to OTA letters and drafts showed no consensus on the proper budget. Many scientists familiar with industrial development encouraged higher figures, while academic molecular biologists set lower ones. A maximum figure of $500 million to be spent over 5 years was mentioned at the OTA workshop, in line with recommendations of a committee established by the Department of Energy (DOE). There was some support for the alternative of devoting 25 percent of the total budget to technology development [6]. Minimum estimates were for a steady state of $20 million to $30 million. Several individuals noted the importance of developing technologies early on, while recognizing the need to keep early budgets realistically low because a new research program would require the accumulation of trained personnel and pilot work to provide a foundation for later work.

The approach used in OTA estimates is to increase funding from $10 million the first year to a stable figure of $100 million by yearly increments. Funding for biological instrumentation centers under the National Science Foundation might account for part of this, and methods or instruments of great interest to industry might lead to some cost sharing with private firms. If so, Federal funding could be reduced accordingly. Technology development funding, like sequencing, is among the most flexible of the proposed projects and could be adjusted by the oversight board and the congressional appropriations process.

Training of Personnel

Training of investigators and scientific exchange among participants are crucial and would include graduate and postgraduate fellowships, scientific workshops, and national scientific meetings. Some persons urge that fellowship funds be targeted to shortage areas, but others believe that targeted programs are less effective than untargeted ones for the best people in any relevant discipline. If training were targeted, it might include development of dual expertise in computers and molecular biology, organic chemistry and molecular biology, engineering and molecular biology, and clinical medicine and informatics or molecular biology. Training would also be needed for technicians, and for sabbaticals for scientists interested in shifting from their fields to genome projects. Workshops among participating groups and national symposiums to communicate results would permit rapid dissemination of new methods and insights. Exchange programs among industrial, national laboratory, and academic scientists would promote technology transfer. Training and personnel costs are estimated to merit 10 percent of each annual budget. For initial projections, funding might start at $4 million and increase yearly by $2 million.
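Three of the components described in this appendix (cDNA mapping, quality control, and training) are projected as a base amount plus a fixed yearly increment. A minimal sketch of those schedules follows. The base and increment values are the ones quoted in the text; the 5-year horizon is an assumption taken from the fiscal year 1989 to 1993 span used in the summary, and the helper name is hypothetical.

```python
def linear_schedule(base, increment, years=5):
    """Yearly funding in millions of dollars.

    Year 1 receives `base`; each later year adds `increment`.
    The 5-year default is an assumption based on the FY1989-1993 span.
    """
    return [base + increment * year for year in range(years)]


# Base and increment values as quoted in the text (millions of dollars).
components = {
    "cDNA mapping": linear_schedule(2, 1),
    "quality control": linear_schedule(0, 1),
    "training": linear_schedule(4, 2),
}

for name, amounts in components.items():
    print(f"{name:16s}", amounts)
```

The text also describes percentage targets for quality control and training; those are not modeled here, since the linear figures are the ones the appendix gives for initial projections.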

Administrative Costs

Participants in the August 1987 OTA workshop estimated that 1 to 3 percent of each year’s budget would be needed for administrative overhead. That estimate was subsequently increased to 5 percent in response to letters and after analyzing administrative costs at Federal research agencies. Administrative costs include operation of a national advisory board; oversight of databases, repositories, networks, and other services; setting instrumentation standards for cloning, mapping, and sequencing technologies; administration of grants and contracts; and other purposes. Some additional features would be unique to genome projects, for example, analysis of likely social impacts and ethical dilemmas created or intensified by genome projects. The need for such analysis has been explicitly noted in hearings and has been highlighted by research agency administrators and congressional staff. It could be obtained through grants to bioethicists, lawyers, economists, and social scientists for publications or workshops on various topics.

Summary

The costs of the components of human genome projects are projected in table B-1. These would start from a base of $47 million in fiscal year 1989 (if 1989 were the first year) and increase to $228 million in fiscal year 1993. It is not useful to project budgets beyond then, because technological development is so uncertain. The projected figures do not attempt to assign functions to particular agencies, merely to state overall direct research costs. Future budgets will need to be revised in light of actual appropriations.

History of Earlier Estimates

Perhaps the earliest evidence of a human genome project is found in a letter from Robert L. Sinsheimer, then Chancellor of the University of California, Santa Cruz (UCSC), to University of California President David Pierpont Gardner, on November 19, 1984. A potential benefactor had withdrawn support from a project, and Sinsheimer took the opportunity to provide a counterproposal that might interest the benefactor. In doing so, Sinsheimer suggested that a human genome institute be founded at UCSC, with startup costs of $25 million and an annual operating budget of $5 million. This was, in effect, the first cost estimate for a human genome project.

The letter from Sinsheimer to Gardner referred to an enclosure, later to be used as a basis for discussion at the May 1985 Santa Cruz Human Genome Workshop, in which the institute was formally proposed [7]. The proposal assumes existing mapping technologies and a continued rate of development of DNA sequencing speed equal to the exponential increase of the past decade. The proposal then concludes that “within a few years, the human genome could be reduced to an ordered set of cloned fragments” and that “50 technicians could approach completion of the [sequencing] project in 10-20 years.” The proposal estimates the yearly support of each technician at $100,000, yielding an annual budget of $5 million and a total project cost of about $100 million. The proposal also calls for $25 million for startup facilities, and it distributes the operating money among the mapping and sequencing project itself (75 percent), developing techniques (10 percent), application to basic biology and medicine (10 percent), and education and training of students and other personnel (5 percent).
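The proposal's figures can be reproduced directly. The sketch below uses only the numbers quoted above (50 technicians, $100,000 per technician per year, a 10 to 20 year duration, and the 75/10/10/5 percent split); the variable names are illustrative.

```python
TECHNICIANS = 50
SUPPORT_PER_TECHNICIAN = 100_000   # dollars per technician per year

annual_budget = TECHNICIANS * SUPPORT_PER_TECHNICIAN

# Operating shares named in the proposal, as percentages of the annual budget.
shares = {
    "mapping and sequencing": 75,
    "developing techniques": 10,
    "application to basic biology and medicine": 10,
    "education and training": 5,
}
allocation = {name: annual_budget * pct // 100 for name, pct in shares.items()}

# Operating cost over the 10- and 20-year durations mentioned in the proposal.
operating_cost = {years: annual_budget * years for years in (10, 20)}

print(f"annual budget: ${annual_budget:,}")
for name, dollars in allocation.items():
    print(f"  {name}: ${dollars:,}")
for years, dollars in operating_cost.items():
    print(f"{years} years: ${dollars:,}")
```

The "about $100 million" total quoted in the proposal corresponds to the 20-year end of the range; the $25 million for startup facilities is separate from these operating figures.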

The Santa Cruz workshop displays similar optimism about the mapping aspect of a genome project, suggesting that a physical map could be completed by a 20-person group in 3 to 5 years [14]. The workshop also included discussion of a restriction fragment length polymorphism (RFLP), or genetic, map. This map could be achieved in “a few years” at a resolution finer than 50 centimorgans. Based on then current technologies, sequencing the 3 billion base pairs of the human genome was taken as “not feasible.” The workshop went beyond the initial proposal and discussed details about the computer requirements for a project. There was, however, no explicit cost estimate for these details.


The next round of cost estimates came out of DOE’s workshop in Santa Fe in March 1986. Appended to the workshop notes and the correspondence the workshop generated between the participants and DOE’s Mark Bitensky was a cost estimate by Christian Burks of the Los Alamos National Laboratory. Burks calculates the person-years required for various aspects of the project, which, for a physical mapping and sequencing endeavor, including computer and administrative costs and assuming some sequencing advances, totals 3,505 person-years [5]. Allowing for hardware and overruns of his estimate, Burks concludes a genome project would cost between $0.5 and $2.5 billion.

The next major meeting, at Cold Spring Harbor, focused primarily on sequencing. The only estimate to issue from the discussion was the oft-quoted 30,000 person-years required to sequence the human genome one time through [8]. This estimate (translated into $3 billion by either $1 per base pair or $100,000 per person-year) was based solely on existing technology and was therefore obsolete within days of the conference, when the automated sequencer at the California Institute of Technology was announced [18].
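Both routes to the $3 billion figure can be checked with integer arithmetic. The sketch below uses the person-year and unit-cost figures from the paragraph above and the 3 billion base pair genome size cited earlier in this appendix; the implied throughput per person-year is a derived quantity, not a number stated at the meeting.

```python
GENOME_BASE_PAIRS = 3_000_000_000   # genome size cited earlier in the appendix
PERSON_YEARS = 30_000               # Cold Spring Harbor estimate
COST_PER_PERSON_YEAR = 100_000      # dollars
COST_PER_BASE_PAIR = 1              # dollars

cost_by_labor = PERSON_YEARS * COST_PER_PERSON_YEAR
cost_by_bases = GENOME_BASE_PAIRS * COST_PER_BASE_PAIR

# Derived: base pairs one person would need to finish per year
# for the two estimates to agree.
implied_throughput = GENOME_BASE_PAIRS // PERSON_YEARS

print(f"labor route: ${cost_by_labor:,}")
print(f"per-base route: ${cost_by_bases:,}")
print(f"implied throughput: {implied_throughput:,} base pairs per person-year")
```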

By the middle of 1986, the Caltech sequenator had made it clear that advances in sequencing technology would drive costs down. HHMI’s Informational Forum at the National Institutes of Health in Washington, D.C., continued to quote the 30,000 person-year estimate, but it also cautiously offered an estimate of 300 person-years, assuming a two-order-of-magnitude increase from automation [17]. The HHMI forum likewise gave a dual estimate for the physical map (200 person-years, or 30 to 40 with automation advances) [4], and for computer storage of sequence information ($0.30 per base pair, $0.03 with advances) [1].

Nine months later, DOE brought out its own cost estimates, presented as a yearly budget for a genome project. In the Health and Environmental Research Advisory Committee report, the subcommittee scientists estimate that sequencing, with redundancy for accuracy, would cost $60 million, assuming advances in automation [19]. Sufficient automation should be available 5 years hence [10]. The remainder of the budget is not described in detail, but it does specify that $500 million will go to various aspects of technological development, including mapping and informatics, assuming $100,000 per person-year of research [9]. The total for the DOE-proposed projects comes to $1.02 billion. Table B-2 presents a summary of estimates.

The National Research Council of the National Academy of Sciences established the Committee on Mapping and Sequencing the Human Genome, whose report was released in February 1988 [13]. That report represents the views of an exceptionally distinguished panel of experts from diverse scientific backgrounds.

The panel members began their deliberations with widely differing knowledge of the state of gene mapping and divergent opinions about the merit of special research efforts. While writing the report, the panel reached a consensus that a special effort was merited and recommended additional funding of $200 million per year. This level would be reached over the initial 3 years. During the first few years, the budget would be roughly divided into $120 million for research in 10 or so multidisciplinary centers and numerous small research groups. Construction and materials would cost $55 million per year, and $25 million would operate repositories, databases, training, and administrative functions. In later years, the budget would increase for dedicated production of map and sequence data. This $200 million annual budget would continue until at least the year 2000.
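The three parts of the recommended budget add up to the $200 million annual figure, as a short check shows. The 15-year horizon used for the cumulative total is taken from the "(over 15 yrs.)" entry in table B-2 rather than from the paragraph above, and the dictionary keys are shortened labels chosen for illustration.

```python
# Annual amounts in millions of dollars, as described for the early years.
nrc_budget = {
    "research (centers and small groups)": 120,
    "construction and materials": 55,
    "repositories, databases, training, administration": 25,
}

annual_total = sum(nrc_budget.values())

# Horizon taken from the table B-2 entry "(over 15 yrs.)".
YEARS = 15
cumulative_total = annual_total * YEARS

print(f"annual total: ${annual_total} million")
print(f"over {YEARS} years: ${cumulative_total:,} million")
```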

Appendix B References

1. Bell, George I., comments at Informational Forum on the Human Genome, sponsored by Howard Hughes Medical Institute, July 23, 1986, at the National Institutes of Health, Bethesda, MD.

2. Berg, Paul, comments at Costs of Human Genome Projects, OTA workshop, Aug. 7, 1987.

3. Berg, P., Stanford University Medical Center, personal communication, January 1987.

4. Brenner, Sydney, comments at Informational Forum on the Human Genome, sponsored by Howard Hughes Medical Institute, July 23, 1986, at the National Institutes of Health, Bethesda, MD.

5. Burks, Christian, “The Cost of Sequencing the Complete Human Genome,” Santa Fe Genome Sequencing Workshop, Mar. 3-4, 1986.

6. Costs of Human Genome Projects, OTA workshop, Aug. 7, 1987.

7. Edgar, Bob, Noller, Harry, and Ludwig, Bob, “Human Genome Institute: A Position Paper,” enclosure in personal correspondence from Robert L. Sinsheimer, University of California at Santa Cruz, to David Pierpont Gardner (Nov. 19, 1984), distributed at the Santa Cruz Human Genome Workshop, May 24-25, 1985.

8. Gilbert, Walter, comments at Molecular Biology of Homo sapiens, Cold Spring Harbor Laboratories symposium, May 28-June 4, 1986.

9. Hood, L., California Institute of Technology, personal communication, June 1987.

10. Hood, L., California Institute of Technology, personal communication to David Guston, June 1987.

11. Karnei, R., and Stocker, J.T., Armed Forces Institute of Pathology, personal communication, December 1987.

12. McKusick, V.A., and Ruddle, F.H., “Toward a Complete Map of the Human Genome,” Genomics 1:103-106, 1987.

13. National Research Council, National Academy of Sciences, Report of the Committee on Mapping and Sequencing the Human Genome (Washington, DC: National Academy Press, 1988).


14. Notes and Conclusions, Santa Cruz Human Genome Workshop, sponsored by Department of Energy and Los Alamos National Laboratory, June 4, 1985.

15. Philipson, L., and Tooze, J., “The Human Genome Project,” BioFutur, June 1987, pp. 94-101.

16. Roberts, L., “New Sequencers to Take on the Genome,” Science 238:271-273, 1987.

17. Smith, David, remarks at Informational Forum on the Human Genome, sponsored by Howard Hughes Medical Institute, July 23, 1986, at the National Institutes of Health, Bethesda, MD.

18. Smith, Lloyd M., Sanders, Jane Z., Kaiser, Robert J., et al., “Fluorescence Detection in Automated DNA Sequence Analysis,” Nature 321:674-678, 1986.

19. Subcommittee on the Human Genome, Health and Environmental Research Advisory Committee, Report on the Human Genome Initiative, prepared for the Office of Health and Environmental Research, Office of Energy Research, Department of Energy (Germantown, MD: DOE, April 1987).

Table B-2.–Comparison of Genome Cost Estimates (millions of dollars (M) or person-years (py))

Only part of this table is legible. The surviving entries, in their original order, are:

Sources: UCSC position paper (11/19/84); UCSC workshop (5/24-25/85); Santa Fe workshop (3/86); Cold Spring Harbor Symposium (5/28-6/2/86); HHMI/NIH; OTA

150 py administration; total $500-2,500 M

$500 M technology; total $1,020 M

$60 M/yr, 10 centers
$60 M/yr, grants and technology development for small groups
$55 M/yr, facility construction (early years, decreasing later)
$25 M/yr, administration, quality control, advisory committee functions
$200 M/yr; total $3,000 M (over 15 yrs.)


Appendix C

Participants in OTA workshops

ISSUES OF COLLABORATION FOR HUMAN GENOME PROJECTS

Workshop Sponsored by the Office of Technology Assessment and the Howard Hughes Medical Institute

June 26, 1987
2322 Rayburn House Office Building
United States Congress
Washington, DC

Chairman
William W. Lowrance, Ph.D.
Director
Life Sciences and Public Policy Program
The Rockefeller University

Bernadette Alford, Ph.D.
Legal and Regulatory Affairs
Collaborative Research

George F. Cahill, M.D.
Vice President for Scientific Training and Development
Howard Hughes Medical Institute

C. Thomas Caskey, M.D.
Director
Institute for Molecular Genetics
Baylor College of Medicine

James F. Childress, Ph.D.
Department of Religious Studies
University of Virginia

Susan Cozzens, Ph.D.
Department of Social Sciences
Illinois Institute of Technology

George Gould, Esq.
Associate Patent Counsel
Hoffman LaRoche

Geoffrey M. Karny, Esq.
Dickstein, Shapiro & Morin

Susan Rosenfeld, Esq.
Science and the Law Committee
Association of the Bar of the City of New York

Frank Ruddle, M.D.
Department of Biology
Yale University

Robert Smith, Ph.D.
History of Science Department
The Johns Hopkins University

Robert E. Stevenson, Ph.D.
Director
American Type Culture Collection

Charles E. Van Horn
Director
Organic Chemistry and Biotechnology
U.S. Department of Commerce, Group 120
Patent and Trademark Office

LeRoy Walters, Ph.D.
Director
Center for Bioethics
Kennedy Institute of Ethics
Georgetown University


COSTS OF HUMAN GENOME PROJECTS

Workshop Sponsored by the Office of Technology Assessment

August 7, 1987
Office of Technology Assessment Conference Center
Washington, DC

Chairman
Paul Berg, M.D.
Willson Professor
Department of Biochemistry
Stanford University Medical Center

Christian Burks, Ph.D.
Los Alamos National Laboratory

George F. Cahill, M.D.
Vice President for Scientific Training and Development
Howard Hughes Medical Institute

Anthony V. Carrano, Ph.D.
Biomedical Sciences Division
Lawrence Livermore National Laboratory

Helen Donis-Keller, Ph.D.
Collaborative Research, Inc.

Walter Gilbert, Ph.D.
Biological Laboratories
Harvard University

Leroy Hood, M.D.
Chairman
Division of Biology
California Institute of Technology

Keith McKenney, Ph.D.
Center for Chemical Physics
National Bureau of Standards

Mark L. Pearson, Ph.D.
Director of Molecular Biology
E.I. du Pont de Nemours & Co.

Robert E. Stevenson, Ph.D.
Director
American Type Culture Collection

John Sulston, D. Phil.
MRC Laboratory of Molecular Biology
Cambridge, England

James D. Watson, Ph.D.
Director
Cold Spring Harbor Laboratory


Appendix D

Databases, Repositories, and Informatics

Among the most useful products of genome projects will be information and materials: information about genes and their locations and sequences, and biological materials such as DNA fragments from chromosomes of known pedigree, ordered cosmids, and clones. Proper management of data and materials is essential to increase the efficiency and productivity of research and to reduce duplication of efforts so that genome projects can succeed in meeting the needs of medical scientists and molecular biologists in this century and the next.

Existing databases and repositories that gather, maintain, analyze, and distribute data and materials are already struggling to keep up with the exponential growth of molecular biology. Present capabilities will have to expand greatly to handle the increase of information resulting from a targeted set of genome projects. While it is logical to link computational needs to genome projects, funding devoted to storage of genetic data and materials and to sophisticated analysis of DNA will prove important in molecular biology even if a major mapping and sequencing initiative is not undertaken. Because the essential databases, repositories, and linking computer networks provide goods and services for the entire research community, the Federal Government has a long-standing tradition of supporting them and is in a unique position to further enhance the resources.

This appendix describes some existing databases and repositories and outlines present and future database needs relevant for human genome projects specifically and molecular biology in general.

Databases

Various databases exist that serve the needs of researchers in genome mapping and sequencing (see table D-1). One set of databases gathers, stores, and distributes information directly related to genetic maps and physical maps. Some databases specialize in map and sequence information from one specific genome (for example, there are databases exclusively devoted to the mouse, E. coli bacteria, drosophila, and nematode genomes), while others carry particular kinds of information from all the relevant genomes. Other databases gather data on the sequences and structures of proteins and amino acids that are not direct results of mapping and sequencing research but are necessary for addressing basic research problems underpinning genome research. The data from the different types of maps and from different species have important interconnections, so it is essential that the information be linked for comparative studies.

Genetic Maps

Genetic maps can be generated in several ways (see ch. 2). Pedigree analysis of linked traits yields a map in which traits can be ordered sequentially and with a rough estimate of the distance between them. RFLPs and other DNA probes can help link the traits with specific genes or regions of DNA to produce more refined maps. Maps of the functional regions within individual genes aid in the search for underlying causes of genetic diseases and for the mechanisms by which genes control development and function. Several different databases serve the different information needs for specific kinds of maps.

On-Line Mendelian Inheritance in Man (OMIM).–An atlas of human traits that are known to be inherited (expressed genes) has been compiled into a reference work known as Mendelian Inheritance in Man, which has been published in seven editions. The listing has been edited by Victor McKusick of The Johns Hopkins University since 1966. As of March 1, 1988, 4,336 traits had been identified as genetically based, including over 2,000 diseases.

Since 1986, the Howard Hughes Medical Institute (HHMI) has supported computerization of the list, and it is now accessible for on-line searches free of charge (4). It is cross-referenced in the Human Gene Mapping Library so that information on expressed genes can be linked to map data.

Human Gene Mapping Library (HGML).–Also called the New Haven Database, HGML consists of five linked databases, one each for map information, relevant literature, RFLP maps, DNA probes, and contacts (researchers with information on data or materials). In addition, the map database is linked to OMIM. All of the databases are cross-referenced, so that data about a gene or probe of interest can be drawn from all five during the same search (10).

DNA Nucleotide Sequences

Databases containing raw DNA sequences, information about the origin of the DNA segment sequenced (which gene, which organism), and various annotations that summarize information about important features in the sequence (sites cut by DNA-cutting enzymes, regulatory sequences, protein-coding regions) will be directly affected by genome projects that emphasize sequencing. The major databases for nucleotide sequences are GenBank® and its European counterpart, EMBL (8). Each carries sequence data and related information for the human genome as well as bacterial, yeast, fruit fly, mouse, and other genomes. Since 1982, GenBank® and EMBL have split the task of data collection, with each database monitoring specific journals in molecular biology to locate and enter sequence data, and they cooperate closely in sharing and distributing it. They have recently been joined by the DNA Data Bank of Japan (DDBJ), which is in charge of monitoring Asian journals and contributing to the reciprocal exchanges. (DDBJ served primarily as an access node to GenBank® and EMBL starting in 1984, but did not start gathering its own data until 1987.)

Table D-1.–Some Existing U.S. Databases and Repositories

Each entry lists location, funding source, and annual budget (a).

Nucleotide sequence data:
GenBank®: Los Alamos National Laboratory; Intelligenetics Corp., CA. Funding: NIH (b), DOE, NSF, USDA. Annual budget: $3,500,000.

Genetic map data:
On-Line Mendelian Inheritance in Man (OMIM): Johns Hopkins University, Baltimore, MD. Funding: Johns Hopkins University, HHMI, NLM. Annual budget: $550,000 (c).
Human Gene Mapping Library (HGML): New Haven, CT. Funding: HHMI. Annual budget: $500,000.

Protein and amino acid sequence and structure data:
Protein Identification Resource (PIR): National Biomedical Research Foundation, Washington, DC. Funding: NIH (d). Annual budget: $500,000.
Protein Data Bank (PDB): Brookhaven National Laboratory, Upton, NY. Funding: NSF, NIH (e), DOE. Annual budget: $260,000.

Repositories:
American Type Culture Collection (ATCC)/Human DNA Probe and Chromosome Library: Rockville, MD. Funding: NIH (f). Annual budget: $300,000.
Human Genetic Mutant Cell Repository: Coriell Institute for Medical Research, Camden, NJ. Funding: NIH (g). Annual budget: $750,000.

(a) Budget figures are approximate. Several of the databases have multiyear contracts; amount listed is the average yearly allotment.
(b) NIH sponsors of GenBank®, past and present, include the National Institute of General Medical Sciences (NIGMS), the Division of Research Resources (DRR), the National Institute for Allergy and Infectious Diseases, the National Cancer Institute, the National Library of Medicine, the National Eye Institute, and the National Institute of Diabetes and Diseases of the Kidney. The NIGMS administers the contract and coordinates the funding.
(c) The Johns Hopkins University contribution to OMIM is difficult to measure, because it includes many indirect factors (staff support, space, publication costs, etc.). HHMI contributes $318,000 and the NLM $100,000 annually.
(d) The NIH sponsor is DRR.
(e) NIH sponsors are NIGMS and DRR.
(f) NIH sponsors are the National Institute of Child Health and Human Development (NICHD) and DRR; DOE has contributed some funds through DRR.
(g) The NIH sponsor is NIGMS.

SOURCE: Office of Technology Assessment, 1988.

GenBank®.–GenBank® originated at the DOE’s Los Alamos National Laboratory in 1979 and started to receive funding from the NIH in 1982. It is the major U.S. database for nucleic acid sequence information from humans and other organisms (3). GenBank® is presently administered and receives a major portion of its funds from the National Institute of General Medical Sciences (NIGMS) of NIH. Data are entered and updated by curators at Los Alamos and are distributed by Intelligenetics Corp. (Mountain View, CA).

The amount of data contained in GenBank® has grown exponentially since its inception. In addition, the number of users has increased from a small set of one hundred or so who accessed it when the first NIH contract started to tens of thousands of scientists who now access it either directly or through commercial distributors. GenBank®’s new 5-year contract, which took effect in October 1987, significantly increases funding to meet the growing demand.

Protein and Amino Acid Sequences and Structures

Databases that gather information on protein and amino acid structure and function are crucial for the application of genomics research to clinical and pharmaceutical problems, as well as for advancing the understanding of basic problems in biology: how genes function, how they code for proteins and enzymes, and how their protein products are structured and function (see ch. 2). The effects of map and sequence data on these databases will depend on the strategy followed for genome projects. For example, a concerted nucleotide sequencing effort would affect research on protein and amino acid structure more slowly than increased funding to researchers studying specific genes and their gene products (generally proteins) (6).

Protein Identification Resource (PIR).-PIR is “a resource designed to aid the research community in the identification and interpretation of protein sequence information” (14). It contains sequence data for proteins and amino acids, with annotations that indicate known functional regions. PIR is run by the nonprofit National Biomedical Research Foundation and receives most of its funding from NIH’s Division of Research Resources. Modest user fees cover the distribution costs; academic users pay a flat fee, while commercial users are charged by the amount of computer time they use. PIR has recently started cooperating with the Japan International Protein Database (JIPID) and the new European database, Martinsried Institute for Protein Sequence Data (MIPS), to establish an international data network for protein sequences.

Protein Data Bank (PDB).-The Protein Data Bank was founded in 1971 as “an international computerized archive for structural data on biological macromolecules” (1). It gathers information on the atomic coordinates of the structure of nucleic acids, messenger RNA, amino acids, proteins, and carbohydrates that have been derived from crystallographic studies. Structural information is a vital link in the understanding of how proteins function, which eventually leads to knowledge of the mechanisms of genetic disease and suggests possible directions for rational drug design.

PDB is based at DOE’s Brookhaven National Laboratory and supported primarily by NSF, with additional funds from the National Institute of General Medical Sciences of NIH. Modest user fees help cover the costs of distribution. Use of the database has been growing rapidly and is predicted to continue growing in parallel with human genome projects. Linking PDB with databases that contain genetic map and sequence information will enhance the long-term goals of human genome research (12).

Present and Future Needs

The many types of information that are produced in molecular biology necessitate the maintenance of a variety of specialized databases. At the same time, however, the information in different databases must often be combined in order to understand the full dimensions of any specific research problem. It is crucial for the scientific community to be able to access information on a topic of interest from a variety of databases that may handle different aspects of the problem. Thus databases must use standardized or easily translatable formats and they must be interconnected. The problem of format has been recognized and is being addressed in scientific meetings, by database advisers, and by funding agencies. Several programs are underway to improve the linkages between databases. An experimental project at the National Library of Medicine, discussed below, will develop a system to link a variety of databases relevant to molecular biology.
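The requirement that databases use "standardized or easily translatable formats" can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the record layouts, field names, and sample values are hypothetical and do not correspond to the actual formats of GenBank®, EMBL, PIR, or PDB. The sketch translates entries from two differently structured sources into one shared record shape and then groups them by gene and organism.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CommonRecord:
    """Hypothetical shared record shape; not a format used by any real database."""
    source: str    # which kind of database the record came from
    gene: str      # normalized gene symbol
    organism: str  # normalized organism name
    summary: str   # short description of what the source holds


def from_sequence_entry(entry: dict) -> CommonRecord:
    # Assumed layout for a nucleotide-sequence entry: "locus", "species", "length".
    return CommonRecord(
        source="sequence",
        gene=entry["locus"].strip().upper(),
        organism=entry["species"].strip().lower(),
        summary=f"{entry['length']} bases",
    )


def from_protein_entry(entry: dict) -> CommonRecord:
    # Assumed layout for a protein entry: "gene_name", "organism", "residues".
    return CommonRecord(
        source="protein",
        gene=entry["gene_name"].strip().upper(),
        organism=entry["organism"].strip().lower(),
        summary=f"{entry['residues']} residues",
    )


def link_records(records):
    """Group translated records by (gene, organism) so related entries sit together."""
    linked = defaultdict(list)
    for record in records:
        linked[(record.gene, record.organism)].append(record)
    return dict(linked)


# Made-up sample entries; the gene symbol and sizes are placeholders, not real data.
records = [
    from_sequence_entry({"locus": "abc1 ", "species": "Homo sapiens", "length": 1200}),
    from_protein_entry({"gene_name": "ABC1", "organism": "homo sapiens", "residues": 400}),
]
linked = link_records(records)
for key, group in linked.items():
    print(key, [f"{r.source}: {r.summary}" for r in group])
```

The design point is that each source needs only one translator into the shared shape, so adding another database means writing one more function instead of a converter for every pair of formats.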

The speed with which data are entered into the databases has been a major concern. The exponential increase in data has not always been matched by increases in the support for databases and personnel to operate them, causing a lag time of several months or even years between the publication of data and their entry, in fully annotated form, into databases. If the lag time is excessive, the efficiencies of centralized data management and retrieval are lost. One solution that is being explored is the direct submission of data to the databases by the researchers as a requirement for publication in journals. At least one journal has already agreed to cooperate with GenBank® and EMBL in an attempt to speed acquisitions in this way (19). Another possibility is to encourage funding agencies to make the submission of data or materials to the appropriate databases a condition of receiving research grants. The automation of data entry will be necessary as the amount of data increases. Automated methods are already under development; the capacity to enter data may be built into some automated sequencing machines.

The timely exchange of data is also affected by issues of intellectual property rights and technology transfer. Open and rapid exchange of information and materials speeds research and is particularly important when the data have medical or clinical implications. If the data and materials become commercially valuable, however (and many researchers predict that they will), the values of open access and free exchange could clash with the desire to protect proprietary rights on potentially patentable data or materials. Because access to databases and repositories is international, there are concerns that U.S.-funded research could be commercialized by other countries. The problems are not intractable, however: There are several successful precedents of advance contracts that specify how data will be contributed to databases while protecting property rights (4,21). (See also ch. 8.)

A major problem faced by databases for the past decade has been insufficient funding for handling the exponential increase of data. Costs will continue to rise as more map and sequence data are generated. The government agencies and other organizations that support genome projects appear to recognize the importance of continued funding for relevant databases. For example, the increased budget in the new GenBank® contract (for 1987 through 1991) indicates that funding agencies are aware of the need to enhance database maintenance. An initiative within the National Library of Medicine to strengthen information resources for molecular biology and biotechnology (discussed below) should lend further support to databases needed for genome projects. The Howard Hughes Medical Institute has been particularly active in supporting database resources and networks to link them. It is essential that financial support continue to keep pace with the growing body of data.

Repositories

Genome projects will generate biological materials as well as sequence and map data. Access to these materials is a key element in making the map information useful. A scientist searching for a gene of unknown location would want to have access to a panel of DNA markers that could give an approximate location, then a more closely spaced set of markers to locate it more precisely. Once the gene’s location was established on the genetic map, the investigator would select DNA clones covering that region of the human chromosomes from a repository, thus obtaining the DNA encoding the gene. Each of these steps would require access to a set of cloned DNA fragments. Existing repositories are hardly sufficient, but how much must be invested in them will depend on conclusions about the value of centralized sources as opposed to housing materials in individual labs.

Companies developing a new product derived from or related to a human gene would also wish to have access to such materials in many instances. Storage and handling of such DNA resources is thus a crucial function. The materials will be most widely useful if they are stored at national collection and storage facilities. DNA probes, vectors, and some other materials are best maintained at a facility such as the American Type Culture Collection (ATCC). Others, such as cell lines derived from individuals and families with genetic diseases, are stored in the Human Genetic Mutant Cell Repository in Camden, New Jersey. Other materials that are unlikely to have substantial demand from a wide variety of investigators might be stored at the laboratories that generated them and distributed on a more informal basis to those requesting them. Present methods and technologies for the amplification, characterization, storage, and distribution of materials are expensive and time-consuming; the costs of storage could become a major component of mapping and sequencing projects. Newer and cheaper storage methods will have to be developed as production of DNA fragments increases. The development of automated techniques for organizing, managing, and accessing materials will be necessary; research on automated repository management is already underway at ATCC and at DOE’s Los Alamos National Laboratory (11, 21).

Even with the advent of automated repository management techniques, however, the high cost of storing and maintaining materials makes the selection of materials to collect particularly crucial. While it might be desirable to keep large collections of clones generated in an attempt to develop libraries of overlapping clones or contigs (see ch. 2), the curators of repositories and the scientists who use them will have to choose which materials are of utmost importance, and these decisions should be periodically reviewed (22,23).

American Type Culture Collection

The ATCC maintains a variety of different collections of animal, plant, and bacterial cell lines, hybridomas, phage, and recombinant DNA vectors, as well as an NIH-sponsored repository of human DNA probes and chromosome libraries (20). The collection of chromosome libraries includes materials from DOE’s National Gene Mapping Library (see ch. 5). The ATCC amplifies and stores samples and distributes them, along with pertinent information, to investigators for a nominal fee. Investigators must agree not to use the materials for commercial purposes or to sell them.

The repository maintains a database of information on the source and characteristics of the material in its collection. Its advisory committee has recommended that the database be included in a mapping database such as HGML.

Human Genetic Mutant Cell Repository

Sponsored by the National Institute of General Medical Sciences of NIH, the Human Genetic Mutant Cell Repository was founded in 1972 to maintain a collection of well-characterized human cell cultures (2,17). The cultures are available to investigators worldwide at a nominal fee. The repository contains over 4,000 individual cultures, which represent more than 400 genetic diseases and 700 to 800 chromosomal aberrations (7). The curators of the collection have increasingly sought to include material from multigenerational family groups for linkage analysis; the repository now maintains cell lines from the Venezuelan Huntington’s pedigree (see box 7-A) and others such as cystic fibrosis families, families with fragile X-linked mental retardation, and so on.

Data Analysis, Informatics, and Computer Resources

Development of analysis methods to search for and compare sequence information, to predict sequences that code for proteins and the structures of those proteins, and to aid in other aspects of the analysis of data from genome projects will eventually need to utilize parallel processing techniques and the capacity of supercomputers. Most researchers agree that the hardware to tackle the complex problems of sequence analysis and comparison already exists but that satisfactory software must be developed. The DOE, the NIH, the NLM, and the NSF support various programs and grants for the development of software to represent and analyze data and for the development of computer resources such as supercomputing centers and computer networks. Several of these resources are described below. Numerous private firms are developing or marketing computer programs that search databases or analyze data on nucleic acid or protein sequences.

BIONET™

BIONET™ is a nonprofit computer network run by Intelligenetics, Inc. (Mountain View, CA) and funded by the Division of Research Resources of NIH and by modest user fees (13). Its goals are to “provide computation assistance in data analysis and problem solving to molecular biologists and researchers in related fields, to serve as a focus for the development and sharing of new software, and to promote rapid sharing of information and collaboration among a national community of scientists” (9). BIONET™ provides access to several major databases (GenBank®, EMBL, PIR, PDB, and databases of restriction enzymes and plasmid vectors) as well as to software for analyzing nucleic acid and protein sequences. The network also aids communication between its members through a series of bulletin boards on topics of user interest and through an electronic mail system. BIONET™ serves users in the United States, Canada, and Europe.

National Biotechnology Information Center

The National Biotechnology Information Center is an initiative to develop and enhance a range of tools for molecular biology information that is being sponsored by the National Library of Medicine (NLM) (18). The project is presently the subject of several authorization bills but has already received some appropriations for a range of projects, including the building and maintenance of databases, developing a comprehensive listing of existing databases, and improving information retrieval systems. NLM has already developed a prototype of a retrieval system, called the Information Retrieval Experiment (IRX), that connects data from several different databases and graphic and visual sources. For example, a database search for a specific disease gene will yield information on whether the gene has been mapped, the map of the gene in graphic form, bibliographic information on publications about the map, as well as information on clinical symptoms, diagnosis, and visual representations of affected patients (X-rays, diagrams, photos, and so on). The NLM initiative will enhance the management of data from genome projects and will forge links between information from many areas of molecular biology to aid in basic and biomedical research (15). The NLM is in an advantageous position to coordinate database activities through its expertise in handling information through existing literature databases such as MEDLINE.

The Matrix of Biological Knowledge Workshop

The Matrix of Biological Knowledge Workshop, a month-long conference held during the summer of 1987, was an attempt to formulate models and make recommendations for the organization of knowledge and data from all disciplines in biology (16). It was sponsored by the NIH, the DOE, the Sloan Foundation, and the Santa Fe Institute.

The workshop grew out of the efforts of a committee sponsored by the NIH that attempted to set forth and evaluate models used in biomedical research. Several scientific meetings prior to the workshop had addressed the particular complexities of biological data; at the workshop, biologists, computer scientists, and database experts actually tried to work out some of the problems raised at earlier meetings. Participants at the workshop issued the following general recommendations:

. . . that support for a centrally coordinated effort to establish a knowledge base of databases in the biological sciences be aggressively pursued; that the current independent efforts to establish interdatabase structures and analysis tools be coordinated with a long-term view towards maximum integration; . . . that these coordinated efforts incorporate the most up-to-date computer science and analytical methods; and finally, that these activities directly involve the experimental and biotechnology communities in order to ensure the utility of the ensuing developments (16).

These recommendations appear to reinforce the direction of ongoing efforts in agencies that sponsor databases. The specific recommendations issued by working groups in each of seven broad categories may prove useful for the future management of databases in all of biology.

Appendix D References

1. Abola, E.E., Bernstein, F.C., and Koetzle, T.F., “The Protein Data Bank,” in P.S. Glaeser (ed.), The Role of Data in Scientific Progress (New York, NY: Elsevier Science Publishers, 1985).
2. Aronson, M.M., Miller, R.C., Nichols, W.W., et al., “Breakpoint Map of Human Translocation Cell Cultures Available From the NIGMS Human Genetic Mutant Cell Repository,” Journal of Cytogenetics and Cell Genetics 30:179-189, 1981.
3. Burks, C., Fickett, J.W., Goad, W.B., et al., “The GenBank® Nucleic Acid Sequence Database,” Computers Applications in Bioscience 1:225-233, 1985.
4. Cahill, G.C., Vice President for Scientific Training and Development, Howard Hughes Medical Institute, Bethesda, MD, personal communication, December 1987.
5. Cassatt, J., National Institutes of Health, personal communication, April 1987.
6. George, D., Protein Identification Resource, National Biomedical Research Foundation, Washington, DC, personal communication, December 1987.
7. Greene, A.E., Human Genetic Mutant Cell Repository, Coriell Institute for Medical Research, Camden, NJ, personal communication, March 1988.
8. Hamm, G., and Cameron, G., “The EMBL Data Library,” Nucleic Acids Research 14:5-9, 1986.
9. Introduction to BIONET (Mountain View, CA: Intelligenetics, Inc., 1987).
10. Kidd, K., Lecture at “Genomics Meeting: Assessing the Repository, Informatics, and Quality Control Needs,” Lawrence Livermore National Laboratory, Livermore, CA, Aug. 26-27, 1987.
11. Knobeloch, D., and Beugelsdijk, T.J., “Automated Sample Management: Organizing, Managing, and Accessing the DNA Fragments Generated in the Process of Mapping and Sequencing the Human Genome,” in Repository, Data Management, and Quality Assurance Needs for the National Gene Library and Genome Ordering Projects, see ref. 23.
12. Koetzle, T., Protein Data Bank, Brookhaven National Laboratory, Upton, NY, personal communication, September 1987.
13. Kristofferson, D., “The BIONET™ Electronic Network,” Nature 325:555-556, 1987.
14. Ledley & Barker, Protein Identification Resource project description, November 1987.
15. Masys, D., National Library of Medicine, Bethesda, MD, personal communication, December 1987.
16. Morowitz, H.J., and Smith, T. (eds.), Report of the Matrix of Biological Knowledge Workshop (Santa Fe, NM: Santa Fe Institute, 1987).
17. “The National Institute of General Medical Sciences Human Genetic Mutant Cell Repository,” Somatic Cell and Molecular Genetics 12:421, 1986.
18. National Library of Medicine, “Biotechnology Information: A Plan for the National Library of Medicine,” unpublished report, 1987.
19. “A New System for Direct Submission of Data to the Nucleotide Sequence Data Banks,” Nucleic Acids Research 15, No. 18, 1987.
20. Nierman, W.C., Benade, L.E., and Maglott, D.R., American Type Culture Collection: NIH Repository of Human DNA Probes and Libraries (Rockville, MD: American Type Culture Collection, 1987).
21. Stevenson, R., American Type Culture Collection, Rockville, MD, personal communication, October 1987.
22. U.S. Congress, Office of Technology Assessment, “Costs of Human Genome Projects,” workshop held Aug. 7, 1987 (see app. A).
23. U.S. Department of Energy, Office of Health and Environmental Research, and National Institutes of Health, Repository, Data Management, and Quality Assurance Needs for the National Gene Library and Genome Ordering Projects, unpublished report from a workshop, August 1987.


Appendix E

Bibliometric Analysis of Human Genome Research

Computer Horizons, Inc. (CHI) was hired by the Office of Technology Assessment to conduct a bibliometric analysis of work on human gene mapping, including an international bibliography of the most relevant literature. This bibliography centered on an examination of the growth of relevant scientific literature keyed to the words “Gene or Genes or Genetic,” “Marker or Linkage or Map,” and “Human.” Additional key word combinations included “Human Chromosome,” “Human DNA Sequence,” “Human Nucleic Acid Sequence,” “Human Restriction Fragment Length Polymorphism,” and various combinations designed to select papers on methods and techniques of DNA analysis.

The use of publication counts as a measure of research activity is part of the field of bibliometrics. A growing body of research has demonstrated the usefulness of bibliometric techniques: Counts of scientific papers and the numbers of citations to them have been shown to be indicators of research productivity. Limitations to bibliometric techniques do exist, particularly in balancing the treatment of non-English publications. Since the literature of science is dominated by English-speaking researchers, there is an inherent bias against citations of foreign-language publications. In the present case, however, the primary purpose of the literature search was to provide indicators of growth rather than to develop specific bibliographies. As such, the analysis clearly demonstrated a rapid growth in the scientific literature related to mapping and sequencing the human genome, and an acceleration of this growth over very recent years.

Over 11,000 entries of relevant literature were presented in the bibliography, which scanned appropriate publications from 1977 through 1986. The literature search included journals published in English, French, German, Dutch, Italian, Polish, Japanese, Spanish, Russian, Bulgarian, Swedish, Finnish, Norwegian, Danish, and Hebrew. All entries were subsequently grouped by OTA into the country or region of origin to identify national and regional trends in research. The regions included the United States, Western European countries, Japan, and other non-European countries. The table below presents the results. The data were used as the basis for figures 7-1 and 7-2 and table 7-1 in chapter 7, which display the growth in the total number of articles on human gene mapping and sequencing and the breakdown by country or region.

Annual Publications in Human Genetics:
Articles Published on Human Genes or Genetic Markers and Linkage Maps

Year 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986

United States. . . . . . . . . . . . . . . 187 218 235 314 308 364 487 577 689 818

Japan . . . . . . . . . . . . . . . . . . . . . 7 11 17 22 32 45 39 58 67 85

Western Europe
Denmark
Federal Republic of Germany
Finland
France
Italy
Netherlands
United Kingdom
Other

6 14 9 7 12 7 9 12 21 25

203

218

153231

236

348

254919

148

369

175746

337

4215204630

42

5 ;24136645

415

5715188844

519

6444259762

6912703925

12669

78 10015 1494 11444 6650 45

184 18570 92

Other Non-European countries
Australia
Canada
Eastern Europe and U.S.S.R.
South Africa
Other

212

817

1117

1728

1814

2229

2426

2338

20 3860 68

230

2032

176

20

41

217

35

79

388

3375

366

3357

333

4187

364

3261

516

42

81

60 6216 987 6395 101

Uncertain

Total . . . . . . . . . . . . . . . . . . . . 419 516 618 735 772 899 1,070 1,298 1,650 1,885

SOURCE: Office of Technology Assessment, 1988.
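The growth that the table is meant to indicate can be made concrete with a short calculation on the two rows that are fully legible here, the United States row and the Total row. The sketch below is illustrative only: the function names are hypothetical, and the compound annual rate is one convenient summary chosen for this example, not a measure the OTA analysis is known to have used.

```python
# Yearly article counts copied from the table above (1977 through 1986).
years = list(range(1977, 1987))
total = [419, 516, 618, 735, 772, 899, 1070, 1298, 1650, 1885]
united_states = [187, 218, 235, 314, 308, 364, 487, 577, 689, 818]


def growth_factor(series):
    """Ratio of the last year's count to the first year's count."""
    return series[-1] / series[0]


def compound_annual_rate(series):
    """Constant yearly growth rate that would turn the first count into the last."""
    periods = len(series) - 1  # nine year-to-year steps for ten yearly counts
    return growth_factor(series) ** (1 / periods) - 1


def yearly_share(part, whole):
    """Fraction of each year's total contributed by one country or region."""
    return [p / w for p, w in zip(part, whole)]


us_share = yearly_share(united_states, total)

print(f"Total growth factor, {years[0]}-{years[-1]}: {growth_factor(total):.2f}")
print(f"Compound annual growth rate: {compound_annual_rate(total):.1%}")
print(f"U.S. share in {years[0]}: {us_share[0]:.1%}; in {years[-1]}: {us_share[-1]:.1%}")
```

A single compound rate smooths over year-to-year variation, so it summarizes the overall rise but does not by itself show the acceleration in recent years that the text describes; comparing successive year-over-year ratios would be needed for that.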


Appendix F

Acknowledgements

OTA wishes to acknowledge the many people who contributed to the preparation of this report. Members of the advisory panel and contractors are listed at the front of the report. Workshop participants are listed in appendix C. Others helped by commenting on drafts of the report, providing information or advice, or consenting to interviews with OTA staff. OTA particularly thanks those at the National Institutes of Health, the Department of Energy, the National Science Foundation, the Howard Hughes Medical Institute, and other organizations for their invaluable efforts to inform OTA about their activities. Our gratitude is extended to:

Carlos Abella, Embassy of Spain
Bruce M. Alberts, University of California at San Francisco
Duane Alexander, National Institute of Child Health and Human Development
Norman Anderson, Large Scale Biology
Wyatt Anderson, University of Georgia
David G. Baldwin, Tetrarch Inc.
David Baltimore, Whitehead Institute
Phil Beachy, Carnegie Institution of Washington
George I. Bell, Los Alamos National Laboratory
Celeste Berg, Carnegie Institution of Washington
Fred Bergmann, National Institute of General Medical Sciences
Michel Bernon, Embassy of France
Ralph Bledsoe, Domestic Policy Council
Lars Bolund, University of Aarhus, Denmark
Judith Bostock, Office of Management and Budget
David Botstein, Genentech, Inc.
James E. Bowman, University of Chicago
Elbert Branscomb, Lawrence Livermore National Laboratory
Douglas Brutlag, Stanford University
Martin Buechi, Embassy of Switzerland
John Burris, National Academy of Sciences
Andrew Bush, U.S. Senate
Mark Cantley, Commission of the European Economic Community
Charles R. Cantor, College of Physicians and Surgeons of Columbia University
C. Thomas Caskey, Baylor College of Medicine
James Cassatt, National Institute of General Medical Sciences
James W. Chamberlain, U.S. Embassy, Brazil
Andrew T. L. Chen, Centers for Disease Control
James F. Childress, University of Virginia
George M. Church, Harvard University Medical School
Mary Clutter, National Science Foundation
Stanley Cohen, Stanford Medical School
Francis Collins, University of Michigan
P. Michael Conneally, Indiana University Medical Center
Cheryl Corsaro, National Institutes of Health
Alan Coulson, MRC Molecular Biology Laboratory, Cambridge, United Kingdom

Charles L. Coulter, Division of Research Resources, National Institutes of Health
David Cox, University of California at San Francisco
James F. Crow, University of Wisconsin
Terry Curtin, U.S. Senate
Kay E. Davies, Oxford University
Bernard D. Davis, Harvard University Medical School
Ronald Davis, Stanford University
Michael Dean, National Cancer Institute
Larry L. Deaven, Los Alamos National Laboratory
Albert de la Chapelle, University of Helsinki
Enrique Martin del Campo, Organization of American States
Charles DeLisi, Mount Sinai School of Medicine
Vincent DeVita, National Cancer Institute
Denis Dewez, Embassy of Belgium
Russell F. Doolittle, University of California, San Diego
Janet Dorigan, Office of Science and Technology Policy
Renato Dulbecco, The Salk Institute
Irene Eckstrand, National Institute of General Medical Sciences
Peter Farnham, American Society for Biochemistry and Molecular Biology
James Fickett, Los Alamos National Laboratory
John C. Fletcher, University of Virginia
Donald S. Fredrickson, National Institutes of Health
Jean Frezal, Hospital for Sick Children, Paris
F. Edwin Froehlich, U.S. Senate
Robert Fujimura, Oak Ridge National Laboratory
David George, National Biomedical Research Foundation
Frank Gibson, The Australian National University
Paul Gilman, U.S. Senate
Alan Goldhammer, Industrial Biotechnology Association
George M. Gould, Hoffmann-La Roche, Inc.
Denise Greenlaw, U.S. Senate
Santiago Grisolia, University of Kansas Medical Center
Ralph Hardy, Boyce Thompson Institute
Wendy Harris, Johns Hopkins University Press
Nemat Hashem, Ain Shams University, Cairo, Egypt
Cyril G. Hide, South African Embassy
C. Edgar Hildebrand, Los Alamos National Laboratory
Joseph R. Hlubucek, Embassy of Australia
Sverker Hogberg, Swedish Embassy
Michael Hunkapiller, Applied Biosystems, Inc.
Thomas Isenhour, Utah State University
Trefor Jenkins, University of the Witwatersrand, Johannesburg, South Africa
Chalmers Johnson, University of California, Berkeley
Irving S. Johnson, Eli Lilly & Co.
Eric T. Juengst, The Pennsylvania State University
Robert F. Karnei, Armed Forces Institute of Pathology

Steven Keith, U.S. Senate
Michael J. Kelly, Intelligenetics, Inc.
Kenneth Kemphues, Cornell University
Thomas Koetzle, Protein Data Bank, Brookhaven National Laboratory
George S. Kopp, U.S. House of Representatives
Arthur Kornberg, Stanford University
David Kristofferson, BIONET/Intelligenetics
Louis Kunkel, Howard Hughes Medical Institute, Boston
Alphonse Lafontaine, Ministerie van Volksgezondheid en van het Gezin, Brussels, Belgium
Roe Laird, Ministry of State/Science and Technology, Canada
Eric Lander, Whitehead Institute
Norman Latker, U.S. Department of Commerce
Eileen Lee, U.S. House of Representatives
Hans Lehrach, Imperial Cancer Research Fund
Rachel Levinson, National Institutes of Health
Jack G. Lewis, University of Southern California
Jiayao Li, The Embassy of the People’s Republic of China, Washington, D.C.
Donald A.B. Lindberg, National Library of Medicine
John Logsdon, George Washington University
Jeffrey T. Lutz, U.S. Embassy, Indonesia
Jerold R. Mande, U.S. Senate
Emmanuele Mannarino, U.S. Embassy, Italy
Daniel R. Masys, National Library of Medicine
Kenichi Matsubara, Osaka University, Japan
Sunil Maulik, BIONET/Intelligenetics
George Mazuzan, National Science Foundation
Jack McConnell, Johnson & Johnson
Victor A. McKusick, The Johns Hopkins University
Mortimer L. Mendelssohn, Lawrence Livermore National Laboratory
Bruce Merrifield, U.S. Department of Commerce
Bradie Metheny, Tricom, Inc.
Jerome P. Miksche, U.S. Department of Agriculture
Sankar Mitra, Oak Ridge National Laboratory
Jan Mohr, Institute of Clinical Genetics, University of Copenhagen
Robert G. Morris, U.S. Embassy, Argentina
Diane Morton, Cornell University
Jay Moskowitz, National Institutes of Health
Tom Murray, Case Western Reserve University
DeLill Nasser, National Science Foundation
Dorothy Nelkin, New York University
Norrine Noonan, Office of Management and Budget
Stephen O’Brien, National Cancer Institute
Maynard V. Olson, Washington University School of Medicine
Gilbert S. Omenn, University of Washington
Stuart Orkin, Howard Hughes Medical Institute, Boston
Joseph Osterman, U.S. Army Medical Research and Development Command

Joseph Palca, Nature
M. Iqbal Parker, University of Cape Town
Jane Peterson, National Institute of General Medical Sciences
Ulf Petterson, University of Uppsala, Sweden
Kate Phillips, Council on Governmental Relations
Betty Pickett, Division of Research Resources, National Institutes of Health
Maya Pines, Howard Hughes Medical Institute
George Poste, Smith Kline & French Laboratories
Michael Probert, Medical Research Council, United Kingdom
Theodore T. Puck, Eleanor Roosevelt Institute for Cancer Research
Robert Rabin, National Science Foundation
Alan S. Rabson, National Cancer Institute
Lisa J. Raines, Industrial Biotechnology Association
William F. Raub, National Institutes of Health
Jeremy Rifkin, Foundation for Economic Trends
Jerry H. Roberts, National Institutes of Health
Leslie Roberts, Science
Richard Roberts, Cold Spring Harbor Laboratories
Thomas Rollins, U.S. Senate
Leon E. Rosenberg, Yale University
Luigi Rossi-Bernardo, Consiglio Nazionale della Ricerche, Italy
John Roth, University of Utah
Lesley Russell, U.S. House of Representatives
Joseph Sambrook, University of Texas Health Science Center
Mona Sarfaty, U.S. Senate
D. Schmitz, U.S. Embassy, Federal Republic of Germany
David Schwartz, Carnegie Institution of Washington
Charles Scriver, McGill University and Science Council of Canada
Gerald Seizer, National Science Foundation
Leroy C. Simpkins, U.S. Embassy, Mexico
Michael Simpson, Library of Congress
Robert Sinsheimer, California Institute of Technology
Mark Skolnick, University of Utah
Bent Skou, Royal Danish Embassy
David A. Smith, U.S. Department of Energy
Lloyd Smith, University of Wisconsin
Robert Smith, The Johns Hopkins University
Temple Smith, Dana Farber Cancer Institute
Rand Snell, U.S. Senate
M. Anne Spence, University of California, Los Angeles
J. Claiborne Stephens, Human Gene Mapping Library
Robert E. Stevenson, American Type Culture Collection
Irene Stith-Coleman, Library of Congress
J. Thomas Stocker, U.S. Department of Defense
Gary Stormo, University of Colorado
William Szkrybalo, Pharmaceutical Manufacturers’ Association

Bruna Tesso, Organization for Economic Cooperation and Development
Ignacio Tinoco, University of California, Berkeley
Kevin M. Ulmer, SeQ, Ltd.
Victor L. Urquidi, El Colegio de Mexico, A.C.
Paul Van Belkom, National Health and Medical Research Council, Commonwealth of Australia
Dorle Vawter, University of Minnesota Health Center
Robert Walgate, The Panos Institute
Bertil Wennergren, Swedish Commission on Genetic Engineering
Norman Whiteley, Applied Biosystems, Inc.
Robert Williamson, Saint Mary’s Hospital, London
JoAnn Wise, University of Illinois
Carl Woese, University of Illinois
John Wooley, National Science Foundation
James B. Wyngaarden, National Institutes of Health
Douglas J. Yarrow, Embassy of the United Kingdom
Philip Youderian, University of Southern California
Debbie Zucker, U.S. Senate

Appendix G

Glossary

Alleles: Alternative forms of a genetic locus; alleles are inherited separately from each parent (e.g., at a locus for eye color there might be alleles resulting in blue or brown eyes).

Amino acid: Any of a group of 20 molecules that combine to form proteins in living things. The sequence of amino acids in a protein is determined by the genetic code.

Autoradiography: A technique that uses X-ray film to visualize radioactively labeled molecules or fragments of molecules; used in analyzing length and number of DNA fragments after they are separated by gel electrophoresis.

Autosome: A chromosome not involved in sex determination. The diploid human genome consists of 46 chromosomes, 22 pairs of autosomes and 1 pair of sex chromosomes.

Base pair: Two nucleotides (adenosine and thymidine or guanosine and cytidine) held together by weak bonds. Two strands of DNA are held together in the shape of a double helix by the bonds between base pairs.

Centimorgan: A unit of measure of recombination frequency. One centimorgan is equal to a 1 percent chance that a genetic locus will be separated from a marker due to recombination in a single generation. In human beings, 1 centimorgan is equivalent, on average, to 1 million base pairs.
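
The arithmetic behind the centimorgan definition can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the recombinant fraction is treated as a direct estimate of map distance (reasonable only for closely linked loci), and the figure of 1 million base pairs per centimorgan is the human average quoted in the entry, not a constant that holds everywhere in the genome.

```python
# Illustrative only: 1,000,000 bp per centimorgan is the human average quoted
# in the glossary entry; actual recombination rates vary along the genome.
BASE_PAIRS_PER_CENTIMORGAN = 1_000_000


def map_distance_cm(recombinant_offspring, total_offspring):
    """Estimate map distance as the percentage of offspring that are recombinant."""
    if total_offspring <= 0:
        raise ValueError("total_offspring must be positive")
    return 100 * recombinant_offspring / total_offspring


def approximate_base_pairs(centimorgans):
    """Convert a genetic distance to a rough physical distance in base pairs."""
    return centimorgans * BASE_PAIRS_PER_CENTIMORGAN


# Hypothetical data: 3 recombinants observed among 100 offspring.
distance = map_distance_cm(3, 100)
print(distance, approximate_base_pairs(distance))
```

Under these assumptions, a locus and marker that recombine in 3 of 100 offspring would be placed about 3 centimorgans, or very roughly 3 million base pairs, apart.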

Cloning: The process of asexually producing a group of cells (clones), all genetically identical to the original ancestor. In recombinant DNA technology, the process of using a variety of DNA manipulation procedures to produce multiple copies of a single gene or segment of DNA.

Complementary DNA, cDNA: DNA that is synthesized from a messenger RNA template; the single-strand form is often used as a probe in physical mapping.

Contigs: Groups of clones representing overlapping, or contiguous, regions of a genome.

Crossing over: The breaking during meiosis of one maternal and one paternal chromosome, the exchanging of corresponding sections of DNA, and the rejoining of the chromosomes.

C-value paradox: The lack of correlation between the amount of DNA in a haploid genome and the biological complexity of the organism. (C-value refers to haploid genome size.)

Determinism: The theory that for every action taken there are causal mechanisms such that no other action was possible.

Diploid: A full set of genetic material (two paired sets of chromosomes), one from each parental set. All cells except sperm and egg cells have a diploid set of chromosomes. The diploid human genome has 46 chromosomes. Compare haploid.

DNA, deoxyribonucleic acid: The molecule that encodes genetic information. DNA is a double-stranded molecule held together by weak bonds between base pairs of nucleotides. There are four nucleotides in DNA: adenosine (A), guanosine (G), cytidine (C), and thymidine (T). In nature, base pairs form only between A and T and between G and C, thus the sequence of each single strand can be deduced from that of its partner.
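
The statement that one strand can be deduced from its partner can be illustrated with a short Python sketch. The function name is hypothetical, and the sketch assumes the usual convention of writing each strand in its own 5'-to-3' direction, which is why the partner is reversed as well as complemented; the glossary itself does not discuss strand direction.

```python
# Base-pairing rules from the glossary entry: A pairs with T, G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def partner_strand(strand):
    """Return the complementary strand, written 5'-to-3' (assumed convention)."""
    return "".join(PAIRS[base] for base in reversed(strand.upper()))


print(partner_strand("ATGC"))
```

Applying the function twice returns the original sequence, which is one way to see that the two strands carry the same information.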

DNA probes: Segments of single-strand DNA that are labeled with a radioactive or other chemical marker and used to identify complementary sequences of DNA by hybridizing with them. See hybridization.

DNA sequence: The relative order of base pairs, whether in a stretch of DNA, a gene, a chromosome, or an entire genome.

Domain: A discrete portion of a protein with its own function. The combination of domains in a single protein determines its unique overall function.

Double helix: The shape in which two linear strands of DNA are bonded together.

Electrophoresis: A method of separating large molecules (such as DNA fragments or proteins) from a mixture of similar molecules. An electric current is passed through a medium containing the mixture, and each kind of molecule travels through the medium at a different rate, depending on its electrical charge and size. Separation is based on these differences.

Enzyme: A protein that acts as a catalyst, speeding the rate at which a biochemical reaction proceeds but not altering its direction or nature.

Eukaryote: Cell or organism with membrane-bound, structurally discrete nucleus and other well-developed subcellular compartments. Eukaryotes include all organisms except viruses, bacteria, and blue-green algae. Compare prokaryote.

Eugenics: Attempts to improve hereditary qualities through selective breeding. See positive eugenics, negative eugenics, eugenics of normalcy.

Eugenics of normalcy: Policies and programs intended to ensure that each individual has at least a minimum number of normal genes.

Exons: The protein-coding DNA sequences of a gene. Compare introns.

Gamete: Mature male or female reproductive cell with a haploid set of chromosomes (23); that is, a sperm or ovum.

Gene: The fundamental physical and functional unit of heredity. A gene is an ordered sequence of nucleotides located in a particular position on a particular chromosome. See gene expression.

Gene expression: The process by which a gene’s blueprint is converted into the structures present and operating in the cell. Expressed genes include those that are transcribed into mRNA and then translated into protein and those that are transcribed into RNA but not translated into protein (e.g., transfer and ribosomal RNAs).

Gene families: Groups of closely related genes that make similar products.

Gene product: The biochemical material, either RNA or protein, made by a gene. The amount of gene product is used to measure how active a gene is; abnormal amounts can be correlated with disease-causing genes.

Genetic code: The sequence of nucleotides, coded in triplets along the mRNA, that determines the sequence of amino acids in protein synthesis. The DNA sequence of a gene can be used to predict the mRNA sequence, and the genetic code can in turn be used to predict the amino acid sequence.
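
The two predictions described in this entry, DNA to mRNA and mRNA to amino acids, can be sketched in Python. The sketch makes several assumptions that are not in the glossary: the input is the coding (non-template) strand with introns already removed, reading starts at the first base, amino acids are written with standard one-letter codes, and the function names are hypothetical. The table is the standard genetic code; a few organisms and organelles use variants.

```python
# Standard genetic code, built from the conventional U, C, A, G ordering.
# "*" marks a stop codon. One-letter amino acid codes are an added convention.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def transcribe(coding_strand):
    """Predict the mRNA sequence from the DNA coding strand (T becomes U)."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna):
    """Read the mRNA three bases at a time until a stop codon or the end."""
    protein = []
    for start in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[start:start + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate(transcribe("ATGGCTTGGTAA")))
```

Here the twelve-base example is read as four triplets: methionine, alanine, tryptophan, and a stop signal.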

Genetic engineering technologies: See recombinant DNA technologies.

Genetic linkage map: A map of the relative positions of genetic loci on a chromosome, determined on the basis of how often the loci are inherited together. Distance is measured in centimorgans.

Genetics: The study of the patterns of inheritance of specific traits.

Genome: All the genetic material in the chromosomes of a particular organism; its size is generally given as its total number of base pairs.

Genome projects: Research and technology development efforts aimed at mapping and sequencing some or all of the genome of human beings and other organisms.

Genomic library: A collection of clones made from a set of overlapping DNA fragments representing the entire genome of an organism. Compare library.

Haploid: A single set of chromosomes (half the full set of genetic material), present in the egg and sperm cells of animals and in the pollen cells of plants. Human beings have 23 chromosomes in their reproductive cells. Compare diploid.

Homeo box: A short stretch of nucleotides whose sequence is virtually identical in all the genes that contain it. It has been found in many organisms, from fruit flies to human beings. It appears to determine when particular groups of genes are expressed in the development of the fruit fly.

Human gene therapy: Insertion of normal DNA directly into cells to correct a genetic defect.

Human Genome Initiative: Collective name for several projects begun in 1986 by DOE to 1) create an ordered set of DNA segments from known chromosomal locations, 2) develop new computational methods for analyzing genetic map and DNA sequence data, and 3) develop new techniques and instruments for detecting and analyzing DNA.

Hybridization: The process of joining two complementary strands of DNA, or of DNA and RNA, together to form a double-stranded molecule.

Informatics: The study of the application of computer and statistical techniques to the management of information. In genome projects, informatics includes the development of methods to search databases quickly, to analyze DNA sequence information, and to predict protein sequence and structure from DNA sequence data.

International technology transfer: Movement of inventions and technical know-how across national borders.

Introns: The DNA sequences interrupting the protein-coding sequences of a gene that are transcribed into mRNA but are cut out of the message before it is translated into protein. Compare exons.

Karyotype: A photomicrograph of an individual’s chromosomes arranged in a standard format showing the number, size, and shape of each chromosome; used in low-resolution physical mapping to correlate gross chromosomal abnormalities with the characteristics of specific diseases.

Library: A collection of clones in no obvious order whose relationship can be established by physical mapping. Compare genomic library.

Linkage: The proximity of two or more markers (e.g., genes, RFLP markers) on a chromosome; the closer together the markers are, the lower the probability that they will be separated during meiosis and hence the greater the probability that they will be inherited together.

Locus: The position on a chromosome of a gene or other chromosome marker; also, the DNA at that position. Some restrict use of locus to regions of DNA that are expressed. See gene expression.

Marker: An identifiable physical location on a chromosome (e.g., restriction enzyme cutting site, gene, RFLP marker) whose inheritance can be monitored. Markers can be expressed regions of DNA (genes) or some segment of DNA with no known coding function but whose pattern of inheritance can be determined.

Meiosis: The process of two consecutive cell divisions in the diploid progenitors of sex cells. Meiosis results in four rather than two daughter cells, each with a haploid set of chromosomes.

Messenger RNA, mRNA: A class of RNA produced by transcribing the DNA sequence of a gene. The mRNA molecule carries messages specific to each of the 20 amino acids. Its role in protein synthesis is to transmit instructions from DNA sequences (in the nucleus of the cell) to the ribosomes (in the cytoplasm of the cell).

Multifactorial or multigenic disorders: See polygenic disorders.

Mutation: Any change in DNA sequence that results in a new characteristic that can be inherited. Compare polymorphism.

Negative eugenics: Policies and programs intended to reduce the occurrence of genetically determined disease.

Nucleotide: A subunit of DNA or RNA consisting of a nitrogenous base (adenine, guanine, thymine, or cytosine in DNA; adenine, guanine, uracil, or cytosine in RNA), a phosphate molecule, and a sugar molecule (deoxyribose in DNA and ribose in RNA). Thousands of nucleotides are linked to form the DNA or RNA molecule. See DNA, base pair, RNA.

Oncogene: A gene, one or more forms of which is associated with cancer. Many oncogenes are involved, directly or indirectly, in controlling the rate of cell growth.

Physical map: A map of the locations of identifiable landmarks on DNA (e.g., restriction enzyme cutting sites, genes, RFLP markers), regardless of inheritance. Distance is measured in base pairs. For the human genome, the lowest-resolution physical map is the banding patterns of the 24 different chromosomes; the highest-resolution map would be the complete nucleotide sequence of the chromosomes.

Polygenic disorders: Genetic disorders resulting from the combined action of alleles of more than one gene (e.g., heart disease, diabetes, and some cancers). Although such disorders are inherited, they depend on the simultaneous presence of several alleles, thus the hereditary patterns are usually more complex than those of single-gene disorders. Compare single-gene disorders.

Polymorphism: Difference in DNA sequence among individuals. Genetic variations occurring in more than 1 percent of a population would be considered useful polymorphisms for genetic linkage analysis. Compare mutation.

Positive eugenics: The achievement of systematic or planned genetic changes to improve individuals or their offspring.

Prokaryote: Cell or organism lacking membrane-bound, structurally discrete nucleus and subcellular compartments. Bacteria are examples. Compare eukaryote.

Protein: A large molecule composed of chains of smaller molecules (amino acids) in a specific sequence; the sequence is determined by the sequence of nucleotides in the gene coding for the protein. Proteins are required for the structure, function, and regulation of the body’s cells, tissues, and organs, and each protein has a unique function. Examples are hormones, enzymes, and antibodies.

Recombinant DNA technologies: Procedures used to join together DNA segments in a cell-free system (an environment outside of a cell or organism). A recombinant DNA molecule can enter a cell and replicate there, either autonomously or after it has become integrated into a cellular chromosome.

Replication: The synthesis of new DNA strands from existing DNA. In human beings and other eukaryotes, replication occurs in the nucleus of the cell.

Resolution: Degree of molecular detail on a physical map of DNA, ranging from low to high.

Restriction enzyme, endonuclease: A protein that recognizes specific, short nucleotide sequences and cuts DNA at those sites. There are over 400 such enzymes in bacteria that recognize over 100 different DNA sequences. See restriction enzyme cutting site.

Restriction enzyme cutting site: A specific nucleotide sequence of DNA at which a restriction enzyme cuts the DNA. Some sites occur frequently in DNA (e.g., every several hundred base pairs), others much less frequently (e.g., every 10,000 base pairs).
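
How a cutting site divides DNA into fragments can be sketched in Python. The example is an illustration under stated assumptions: the DNA is linear and represented by one strand only, the function names and the 18-base test sequence are hypothetical, and the site used is that of the well-known enzyme EcoRI, which recognizes GAATTC and cuts after the first G. The enzyme is not named in the glossary.

```python
def cut_positions(dna, site, offset):
    """Return the positions at which the enzyme cuts a linear DNA sequence.

    `offset` is how many bases into the recognition site the cut falls.
    """
    positions = []
    start = dna.find(site)
    while start != -1:
        positions.append(start + offset)
        start = dna.find(site, start + 1)
    return positions


def fragment_lengths(dna, site, offset):
    """Return the lengths of the fragments left after cutting at every site."""
    boundaries = [0] + cut_positions(dna, site, offset) + [len(dna)]
    return [end - begin for begin, end in zip(boundaries, boundaries[1:])]


# Hypothetical 18-base sequence containing two EcoRI sites (GAATTC).
sequence = "AAGAATTCCCGAATTCTT"
print(fragment_lengths(sequence, "GAATTC", 1))
```

A sequence change that creates or destroys a site alters these fragment lengths, which is the kind of variation that RFLP analysis detects.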

RFLP, restriction fragment length polymorphism: Variation in DNA fragment sizes cut by restriction enzymes; polymorphic sequences that are responsible for RFLPs are used as markers on genetic linkage maps.

Ribosomal RNA, rRNA: A class of RNA found in the ribosomes of cells.

RNA, ribonucleic acid: A chemical found in the nucleus and cytoplasm of cells; it plays an important role in protein synthesis and other chemical activities of the cell. The structure of RNA is similar to that of DNA. There are several classes of RNA molecules, including messenger RNA, transfer RNA, ribosomal RNA, and other small RNAs, each serving a different purpose.

Sex chromosomes: The X and Y chromosomes in human beings that determine the sex of an individual. Females have two X chromosomes in diploid cells; males have an X and a Y chromosome.

Single-gene disorders: Hereditary disorders caused by a single gene (e.g., Duchenne muscular dystrophy, retinoblastoma, sickle cell disease). Compare polygenic disorders.

Somatic cells: Any cells in the body except reproductive cells and their precursors.

Technology transfer: The process of converting scientific knowledge into useful products.

Transcription: The synthesis of mRNA from a sequence of DNA (a gene); the first step in gene expression. Compare translation.

Transfer RNA, tRNA: A class of RNA having structures with triplet nucleotide sequences that are complementary to the triplet nucleotide coding sequences of mRNA. The role of tRNAs in protein synthesis is to bond with amino acids and transfer them to the ribosomes, where proteins are synthesized according to the instructions carried by mRNA.

Translation: The process in which the genetic code carried by mRNA directs the synthesis of proteins from amino acids. Compare transcription.

Vector: DNA molecule originating from a virus, a bacterium, or the cell of a higher organism used to carry additional DNA base pairs; vectors introduce foreign DNA into host cells, where it can be reproduced in large quantities. Examples are plasmids, cosmids, and yeast artificial chromosomes.

Index

Acid Precipitation Task Force, 120
acquired immune deficiency syndrome, 64, 72, 96, 117
adenosine, 21-22
adenosine deaminase, 61
agriculture, genome mapping implications for, 73
alanine, codon, 23
Alberts, Bruce, 107
albinism, 149
albumin, 61
alcoholism, 86
aldolase, chromosome assignment of, 33
Alexander, Duane, 94 n.1
alleles, 27, 28, 58
alpha globin, 61
alpha interferon, 64
Alzheimer’s disease, 65, 95, 146
American Society for Biochemistry and Molecular Biology, 127
American Type Culture Collection, 101, 141, 190, 192
amino acids
  codon, 23
  databases, 97, 98, 190-191
  generation of, 22-23
anemias, 64, 136
antibodies, 21
apolipoprotein E, 61
Applied Biosystems, Inc., automated DNA sequencer, 47, 108, 138
applied research, government controls on, 87
arginine, codon, 23
Argonne National Laboratory, 122
Armed Forces Institute of Pathology, 103
arthritis, 64
Ashburner, Michael, 42
Asian nations, interest in genome projects, 8
asparagine, codon, 23
aspartic acid, codon, 23
atrial natriuretic factor, 64
Australia, genome research, 8, 133, 148, 195
automation of DNA sequencing
  DOE initiative for, 7-8
  by Japan, 9, 47-48, 137-138
  new technologies, 46-48
  robotic devices, 48, 137-138
  standard setting for equipment for, 103
autoradiography/autoradiographs
  automated scanning of, 48
  use in DNA sequencing, 46
  use in physical mapping, 32, 40, 45
  use in RFLP mapping, 29
autosomes
  karyotyping of, 32
  mapping of genetic loci on, 27, 30, 33-34

bacteria
  DNA sequencing of, 45
  E. coli, 41, 45, 47, 100
  genome mapping, 4, 40, 41
  haploid DNA content, 25
  mitochondria similarities to, 71
  S. typhimurium, 45
bacteriophage T4, genomic map of, 40
Baltimore, David, 124
basic research
  government restriction of, 87
  value of, 81
Baylor College of Medicine, 97
bears, 36, 69
behavior, human, genetic factors in, 85-86
Berg, Paul, 126
beta globin, 61, 65, 70
beta interferon, 64
Billings, John Shaw, 97
Bio-Rad Laboratories, automated DNA sequencer, 47-48
biomedical research
  HHMI support for, 8
  NIH funding, 6, 95
Biomolecular Engineering Programme, 140
BIONET™, 193
biotechnology
  databases, 97
  European research programs in, 140-141
  international competitiveness in, 11
  NBS support for, 103
Biotechnology Action Program, 140-141
Biotechnology Research and Innovation for Development and Growth in Europe, 140-141
bison, 72
blood pressure disorders, 64
Brenner, Sydney, 147
burn treatment, 64
Botstein, David, 126
Bush, Vannevar, 102

C-value paradox, 24-25
Caenorhabditis elegans
  amount sequenced, 47
  genome mapping, 42
  genome size, 47
California Biotechnology, probe development, 58
California Institute of Technology, automated DNA sequencer, 47, 102
Cambridge University, genome mapping project, 42
Canada, genome research, 8, 133, 148, 195
cancer
  mutation-induced, 25
  polygenic nature, 62
  treatment, therapeutic agents, 64
catalase, 61
cataract surgery, 64
cDNA
  clones, 59-63, 67
  libraries, 59
  mapping, 30, 32, 44, 63
  restriction enzyme cutting, 35
cell culture, see somatic cell hybridization

Cell Line Two-Dimensional Gel Electrophoresis Database, 97
cell receptors, 21-22
Center for the Study of Human Polymorphism
  collaborative efforts of, 8, 106, 143-144, 146, 149
  family pedigree data set, 58, 143, 146
  funding for, 106
  mission, 145
Centers for Disease Control, 103
chicken, haploid DNA content, 25
Chiles, Lawton, 97
cholesterol, low-density lipoprotein, 58
chromosome markers
  for single-gene diseases, 57
  funding for research on, 6
  mapping through family inheritance patterns, 27
  maps, low-resolution, 4
  use for genetic linkage studies, 6, 27, 28, 44
chromosomes
  banding patterns, 30, 33, 34-35, 40, 42, 45, 56
  crossing over (recombination), 26, 27
  deletion of, 25, 31, 32
  diploid number, 21
  Drosophila melanogaster salivary gland, 30, 32
  duplication, 25
  E. coli, 41
  gene assignment to, 31
  haploid number, 21
  hybrids, single, 31
  inversion, 26
  isolation techniques, 31-32, 43
  of clinical significance, 44
  phage lambda, 36, 39, 42, 56, 100
  polytene, 42
  sorting, 31-32, 37, 47, 56, 97, 100
  species similarities in, 34-35, 36
  translocation, 25-26, 31, 32
  yeast artificial, 36-37, 39, 43, 56
chromosomes, human
  1, 27
  4, 28
  6, 148
  7, 31, 44, 62
  9, 33, 148
  10, 32, 33
  11, 32
  13, 34
  16, 31, 44, 100, 148
  17, 31, 33
  19, 31, 44, 100
  21, 35, 44, 95, 100, 145-146
  22, 44, 145
  average size, 37
  number, 3
  resemblance to primate chromosomes, 34-35
  X, 24, 27, 30, 31, 44, 63, 100, 148
  Y, 24, 30, 31, 145
chronic granulomatous disease, 57, 59, 62
Church, George, 44-45
cloning/clones
  access to and ownership of, 147
  automation of, 47, 48
  banding patterns, 40
  cDNA, 59-63
  disease-associated genes, 57, 59, 60
  of DNA fragments, 39, 72
  drug development through, 62-63
  E. coli, 41
  fingerprinting method for ordering, 42
  fruit fly chromosomes, 42
  gene isolation by, 31, 59
  libraries of, 39, 42, 59, 62-63; see also contig maps
  microdissection, 42
  NIH grants for, 94
  ordering of, 38-40, 41, 42
  overlapping, 39, 42, 62, 100
  phage lambda chromosomes, 36, 39, 42, 56, 100
  repositories, 97, 101, 115
  S. cerevisiae, 42
  vectors, 35-37, 38-39, 42, 56, 67, 100
  yeast artificial, 36-37, 39, 43, 56, 157
Cold Spring Harbor Laboratories Conference, 6
collaboration on genome research
  by Australia, 148
  by Center for the Study of Human Polymorphism, 8, 106, 143, 146, 149, 157
  center-based vs. networking, 156
  databases and repositories, 8, 139, 158-159
  DOE, 157
  existing frameworks, 155-157
  European, 142, 150
  International Human Gene Mapping Workshops, 29, 157
  international journals, 157-158
  organizational options, 152-155
  precedents for international scientific programs, 150-152
  views on, 152-153
  Washington University-RIKEN, 157
Collaborative Research, Inc.
  DNA probe development, 58, 108
  RFLP linkage map, 6, 30
collagen, 61, 67
color-blindness, 21
Columbia University
  mapping of E. coli genome, 41
  mapping of human chromosome 21, 44, 100
Compton, Arthur Holly, 99
computers, computational methods, and software
  artificial intelligence, 96
  costs, 180-181
  for DNA sequencing, 65, 97
  gene mapping applications to, 57, 146
  networking, 156
  NIH funding for improvements in, 95, 96
  see also databases; informatics
Concertation Unit for Biotechnology in Europe, 140
consortia
  of Federal/private interests, authority for, 16, 121
  funding, 122
  goals, 121
  intellectual property rights, 122

  Midwest Plant Biotechnology Consortium, 122
  national, to administer genome projects, 14-15, 121-123
  peer review, 122
  two-tiered system, 122
contig mapping/maps
  construction, 39, 40
  correlation with large-fragment restriction maps, 42
  forward genetics applications, 61
  nematode, 42, 43
  reverse genetics applications, 62
  strategies, 43-44
  yeast, 42
controversial issues
  Big Science vs. small science, 125-128
  DNA sequencing, extent of, 4, 6, 44, 57, 79, 81
  feasibility of genome mapping, 4
  quotes on, 126
  resolution of genome mapping, 3, 79, 81, 88
  see also ethical issues
corn, haploid DNA content, 25
Coulson, Alan, 44-45, 147-148
Crick, Francis H.C., 3, 21
cysteine, codon, 23
cystic fibrosis, 57, 58, 62, 149
cytidine, 21-22

Dana Farber Cancer Institute, 97
databases
  access to and ownership of, 12, 82, 102, 128, 134, 139, 146; see also technology transfer
  Cell Line Two-Dimensional Gel Electrophoresis, 97
  CODATA Hybridoma Databank, 141
  DNA Data Bank of Japan, 139, 158
  DNA fingerprints, 80
  European support of, 141
  funding for, 7, 12, 96-97, 141, 190
  Genatlas, 157
  GenBank®, 46, 96, 98, 109, 115, 139, 142, 154, 158, 190
  genetic maps, 24, 98, 106, 189-190
  government protection of, 87
  HHMI, 7, 8, 98, 106
  Human Gene Mapping Library, 106, 189-190
  importance, 4, 9
  international collaboration on, 8, 139, 158-159
  Japan Protein Information Database, 159
  linking of, 98
  management of, 12
  Martinsried Institute for Protein Sequence data, 159
  MEDLARS/MEDLINE, 97
  mouse genetics, 106
  National Library of Medicine, 7, 8, 12, 96, 97
  needs for, 191-192
  nucleotide sequence data, 46, 96, 98, 141, 158, 190
  On-Line Mendelian Inheritance in Man, 24, 98, 106, 189-190
  Protein Data Bank, 190-191
  Protein Identification Resource, 97, 98, 158-159, 190-191
Dausset, Jean, 145-146
DeLisi, Charles, 100, 153
Denmark, national genome research efforts, 133, 143, 195
deoxyribonucleic acid, see DNA listings
Department of Defense, biomedical research resources, 104
Department of Energy
  funding for genome projects, 7, 96, 100
  Health and Environmental Research Advisory Committee report, 101-102
  interest in massive sequencing, 9
  international research collaboration, 157
  as lead agency for genome projects, 12, 14, 116, 117
  mission, 7, 99-100
  Office of Health and Environmental Research, 99-101
  organization, 99, 117-118
  peer review, 101, 118
  recommendations for genome projects, 4, 11, 100
  research supported by, 100, 117
  workshops sponsored by, 6, 100
  see also Human Genome Initiative
determinism, effect of genome mapping on, 86
development, see human physiology and development
diabetes, 62
diseases
  infectious, 64
  linking mapping and sequencing data to, 104
  see also genetic diseases; and specific diseases
DNA
  amount relative to organism complexity, 24-25
  C-value paradox, 24-25
  cloning in plasmids, 36-37, 39
  complementary, see cDNA
  discovery, 3
  electrophoretic separation of, 37-39
  expendable fraction, 25, 57
  fingerprints, 89; see also genetic screening
  fragmentation of, 37-39
  mitochondrial, 71
  oldest human samples, 72
  polymerase, 45
  recombinant technology, see recombinant DNA technology
  replication process, 21-22
  structure, 3, 21-22
  transcription to mRNA, 23-24
  see also chromosomes
DNA markers, see chromosome markers
DNA probes
  automated synthesis of, 47, 48
  cDNA, 28-29, 32, 33, 59-61, 63
  companies developing, 58
  fluorescently labeled, 46-48
  for genetic disease diagnosis, 58-59
  in in situ hybridization, 33
  number needed to complete human linkage map, 29
  oligonucleotides, 48
  radioactively labeled, 28-29, 32-33, 40, 44-46, 58
  reliability, 58
  for RFLP markers, 28-29, 56, 58, 61-62

  synthetic, 48, 59-60
  use to clone genes, 60
DNA Segment Library, 97
DNA sequence/sequencing
  automation of, 47-48
  commercialization, 82-83, 133, 138-139
  computer-assisted, 65
  controversies, 4, 6, 44, 79; see also ethical issues
  costs, 6, 182-183
  database, 46, 96, 98, 190
  definition, 3, 21
  directly from genomic DNA, 45
  E. coli, 41, 100
  enhanced fluorescence detection method, 46, 47
  exons, 59, 61, 63, 65, 69
  expenditures, federal, 8
  facilities for, 13
  government role in, 87
  homeo box, 67-68
  importance, 9
  introns, 25, 30, 61, 65, 69-70
  longest stretch determined, 46
  of mitochondria, 71
  multiplex, 44-46
  mutation detection applications, 56
  NIH funding for, 8
  rate, 46
  repeated, 25, 28, 43, 57
  RFLP mapping required for, 37-39
  scanning tunneling microscopy for, 46
  selective amplification without prior cloning, 45-46
  species comparisons, 68-70
  steps, 47
  strategies, 44-45
  technologies, 44-47
  variations, 28, 29
  VNTR, 29
Domestic Policy Council, 8, 105, 109, 119
Donis-Keller, Helen, 30
Down’s syndrome, 32, 35, 58, 95, 146
Drosophila melanogaster
  amount sequenced, 47
  genome mapping, 42-43
  genome size, 47
  salivary gland chromosomes, 30, 33
drugs and pharmaceuticals, development, 62-63
Duchenne muscular dystrophy, 57-59, 61-63
Duffy blood group, 27
Dulbecco, Renato, 100, 126, 145
dwarfism, 64
dystrophin, 63

EG&G Biomolecular, automated DNA sequencer, 47
E.I. du Pont de Nemours & Co., automated DNA sequencer, 47
electrophoresis, see gel electrophoresis
England, see United Kingdom
enzymes
  functions, 22
  see also specific enzymes
epidermal growth factor, 64
Epstein-Barr virus, 46
erythropoietin, 64
Escherichia coli
  amount sequenced, 47
  genome mapping, 41, 43, 100
  genome size, 47
ethical issues
  academic freedom, 87
  access to and ownership of databases and repositories, 16, 82, 88
  access to and use of genetic information, 79-80
  attitudes and perceptions of ourselves and others, 85-86
  commercialization, 16, 82-83, 133
  diagnostic/therapeutic gap, 83
  eugenics, 81, 84-85, 88, 143-144
  genetic fingerprinting, 80
  government role in mapping and sequencing, 87, 88
  international competitiveness, 87-88, 133
  physician practice, 83
  reproductive choices, 83-84, 88
  responsibility for considering, 123-124
eugenics
  negative, 85, 143-144
  of normalcy, 85
  positive, 84-85
eukaryotes, 70-71
Europe, Eastern, interest in genome projects, 8, 133, 143, 195
Europe, Western, genome sequencing and mapping activities, 139-148; see also specific countries and organizations
European Economic Community, genome research, 139-141, 153
European Molecular Biology Laboratory, 8, 139, 141-142, 158
European Molecular Biology Organization, 8, 141
European Research Coordination Agency, 142, 145-146
European Science Foundation, 142, 156
evolution, see molecular evolution

facilities for genome research
  bioprocess engineering, 102
  data handling, European needs, 142
  DOE funding for, 100
  flow cytometry, 32, 97; see also specific national laboratories
  need for, 10, 128
  NSF biology centers, 8, 102-103, 109
factor IX, 61
factor VII, 61
factor VIII:C, 64
familial hypercholesterolemia, 56, 58, 149
family pedigree projects
  CEPH data set on, 58, 136, 146
  Danish, 143
  Egyptian, 134, 136
  on mental illness, 156
  South African, 149

  use in genetic linkage mapping, 27, 33, 58, 61
  Venezuelan, Huntington’s disease, 63-64, 134-136, 143, 146
fatalism, 86
Federal Advisory Committee Act, 124
Federal Republic of Germany, genetics research, 8, 133, 143-144, 195
fibroblast growth factor, 64
Finland, national genome research effort, 133, 144
flow cytometry
  enhanced fluorescence detection in, for DNA sequencing, 46
  extraction of whole chromosomes by, 37
  facility, 32
France
  Center for the Study of Human Polymorphism, 33, 58, 144-146
  genome projects, 8, 144-145
  published genome research, 133, 195
fruit fly
  developmental regulation in, 67
  Drosophila melanogaster, 30, 33, 42-43, 47
  genome mapping, 42-43
  haploid DNA content, 25
  human DNA sequences compared with, 68
  lethal mutations in larval stage, 42
funding for genome projects
  advisory body for determining, 124
  of consortia, 122
  databases, 7, 12, 96-97, 190
  determinants, 98
  determinants of congressional appropriations, 11-12
  DNA marker studies, 6
  DOE, 7, 96, 100, 118, 190
  European Economic Community, 139-143
  HHMI, 7, 190
  international, 8
  NIH, 6, 7, 94-98, 117, 155, 190
  NSF, 7, 8, 96, 190
  pluralism in, 13, 15, 119
  priority setting, 10
  private vs. federal, 79, 83
  recommendations, 4, 11-12, 107
  through a lead agency, effects of, 12-13
  through a national consortium, 14
  USDA, 190
  West German, 144

Gall, Joseph, 126
Galton, Francis, 84
gamma interferon, 64
gel electrophoresis
  database, 97
  DNA separation for physical mapping, 37, 39, 45
  polyacrylamide, 45
  pulsed-field, 37, 44, 56
  in RFLP mapping, 28, 37, 58
GenBank®, 46, 96, 98, 109, 115, 139, 142, 154, 158, 190
gene expression
  control of, 57
  steps in, 23-24
  study centers, 144
gene products
  functions of, 67
  with potential as therapeutic agents, 62, 64
gene therapy, 64, 141
genes
  biochemical identification, 62
  in a chromosome band, number, 33
  color-blindness, 21
  definition, 3, 21, 24
  dosage mapping, 34
  encoding ribosomal RNAs, detection, 33
  expressed, 24, 30
  families of, 25, 70
  functions, approaches to understanding, 66-67, 73
  homeotic, 68
  isolation techniques, 31, 33, 59-62
  largest, 63
  linked, 26, 34; see also genetic linkage mapping/maps
  mapping, see genetic linkage maps
  species similarities in, 34
  structure/function relationships, study of, 144
  see also human genes
genes, human
  aldolase, 33
  chromosomal locations known, 4
  number of loci identified, 24, 30
  number per haploid genome, 24
  sizes, 61
genetic code
  definition, 21-24
  for amino acids, 23
genetic diseases
  chromosomal locations of genes for, 4
  clinical services for, 100
  companies developing DNA probes for diagnosis of, 58
  correlating gross chromosomal abnormalities with, 32
  diagnostic information, physician handling of, 83
  family pedigree studies, 61, 63
  HHMI support of research on, 8
  isolation of genes associated with, 59-62
  mechanisms, 4
  not associated with biochemical defects, 61
  polygenic, 62
  RFLP markers for, 28, 56, 58
  single-gene, 57, 88
  see also specific diseases
genetic information
  access to and use of, 79-80, 82, 84
  causes of changes in, 25-26
  insurer use of, 81, 83
  organization and function, 21-26
genetic linkage mapping/maps
  autoradiography use in, 29
  autosomes, 27
  costs, 181-182
  databases, 24, 98, 106, 189-190
  disease diagnosis applications, 56, 58, 62
  distance measurements on, 27

  early attempts, 4, 6, 21
  electrophoretic technology in, 28
  family pedigree data in, 27, 33, 58, 61
  HHMI funding for, 7
  medical applications, 56, 58, 62-64
  number of markers needed to complete, 29-30
  projects to link physical maps with, 181-182
  purpose, 26-27
  recombinant DNA technology use in, 28
  resolution, 62
  reverse genetics applications, 62
  of RFLP, 28-30, 62
  somatic cell hybridization for, 27
  X chromosome, 27
genetic locus, see chromosome marker
genetic screening
  ethical questions about, 80, 88
  for missing children, 80
  for proof of paternity, 80
genetic selection, see eugenics
genetics
  definition, 21
  forward, 59-61, 62-63
  HHMI funding for, 7
  molecular, NIH research resources activities related to, 97
  NIH funding for, 95
  population, 72-73
  reverse, 59, 61-62
Genetics Institute, robotic devices for DNA sequencing, 48, 108

Genome Corp., physical mapping project, 108
genome mapping
  agricultural applications, 73
  application in developmental studies, 42, 65
  automation, 47
  determinism and, 86
  distance measurements in, 40
  evolutionary applications, 68-72
  facilities, 13
  importance, 9
  international efforts and cooperation, 8, 9, 150-159; see also specific countries
  resolution levels in, 56, 79
  see also genetic linkage mapping/maps; physical mapping
genome mapping, human
  commercialization, 82-83, 138-139
  controversies, 3, 4, 6, 9, 44, 55, 57, 102
  government role in, 87
  priorities for, 88
  scale of efforts, 5, 24
  strategies, 43-46
genome mapping, nonhuman
  bacteria, 4, 40, 41, 44
  fruit fly, 42-43, 44
  importance, 9, 44, 107
  international efforts, 8, 42
  nematodes, 4, 42, 44
  plants, 73, 136, 149
  yeast, 4, 41-42, 44

genome projects
  accountability to Congress, 13, 14, 124
  administration of, 12-15, 115-123, 184
  advisory board structure for, 123-124
  appropriations for, see funding
  benefits, 11, 55, 56, 133, 172-174
  Big Science v. small science approach, 120, 125, 127-128
  center-based vs. networking, 156
  collaboration on, 150-159, see also collaboration on genome research
  commercialization potential, 82-83, 133, 138-139, 151, 165
  common features, 7
  component nature, 4, 6, 10
  congressional oversight, 15-17
  congressional role in, 11-17
  consortium structure for, 14-15, 121-122
  cooperation among agencies, 9, 15, 118-119
  costs, 11-12, 4, 47, 180-185
  definition, 4
  displacement of other research by, 102, 125
  duplication of efforts, 13, 82, 105
  early estimates of costs for, 184-186
  economic impacts, 165, 172
  ethical considerations, 79-88
  expenditures, Federal, 8
  facilities, 10, 13
  focus, 7-9
  funding, see funding for genome projects
  interagency coordination and communications, 8, 11, 123
  interagency task force oversight of, 14, 119-121
  international efforts on, 133-159; see also specific countries
  lead agency concept, 12-14, 115, 116-118
  legislation, 12, 14, 123
  manpower availability, 10
  medical applications, 56-64; see also disease; medicine
  military applications, 174
  misconceptions about, 9-10
  national prestige associated with, 174
  objectives, 7, 9, 55
  organization of, 12-15
  organizations involved in, 6, 7
  policy development for, 134
  political interference with, 127-128
  quality control and reference standards, 103, 127, 183
  resource allocation for, 10; see also funding for genome projects
  scope of, 10, 134
  training of personnel, 183-184
  U.S. competitiveness and, 11, 133
  see also genome mapping; DNA sequence/sequencing; Human Genome Initiative; pilot projects
genome, human
  amount sequenced, 46-47

  bibliometric analysis of research on, 133, 157-158, 195
  size, 24, 43, 46-47
genomes
  bacteriophage T4, 40
  definition, 3, 21
  Epstein-Barr virus, 46
  mitochondrial, 71
  organization, 21
  regeneration, 21
  size, 21, 24-25, 43
  smallest, 148
genomic library, 35
Germany, see Federal Republic of Germany
Gilbert, Walter, 44-46, 126, 153, 156
glutamic acid, codon, 23
glutamine, codon, 23
glycine, codon, 23
granulocyte colony stimulating factor, 64
guanosine, 21-22
Gusella, James, 136

Harvard University, DNA sequencing, 44, 100
heart disease, 58, 62, 64
hemophilia, 56, 57, 58, 64
high-mobility group CoA reductase, 61
Hill, Lister, 97
histidine, codon, 23
Hitachi, Ltd., automated DNA sequencer, 47
Hood, Leroy, 47, 126
hormones, 21
Howard Hughes Medical Institute
  as lead agency for genome projects, 13
  budget, 8
  collaboration with CEPH, 146
  databases, 7, 8, 98, 106
  expenditures, 102, 106
  funding, 7, 105, 109
  genome initiatives, 8, 105-106, 109
  mission, 7
  RFLP mapping project, 6, 29
  university centers, 106
Hpa I, 28
Human Gene Mapping Library, 106, 189-190
Human Gene Mapping Workshop, 29, 106, 144, 157
Human Genetic Mutant Cell Repository, 31, 96, 190, 192-193
Human Genome Initiative
  budget, 7-8, 101
  expenditures, 7-8
  justification for, 102
  management, 6
  objectives, 6, 7, 14
  recommendations on, 101
  stages, 101
  workshops, 6
human growth hormone, 59, 62, 64
human physiology and development
  genome mapping applications to, 65
  NICHD-supported research, 95
Huntington’s disease, 28, 57, 58, 64, 83, 134-136, 146, 149
hybridization, see in situ hybridization; somatic cell hybridization
hypercholesterolemia, 56, 58, 149
hypertension, 64

Imperial Cancer Research Fund, 148
in situ hybridization
  cDNA mapping by, 30, 33-34
  in mapping genes to whole chromosomes, 56
  localization of fruit fly clones by, 42, 43
Index Medicus, 97
Industrial Biotechnology Association, opinions on Federal initiatives in mapping and sequencing, 108
informatics
  Advanced Informatics in Medicine, 141
  Bioinformatics: Collaborative European Programs and Strategy, 141
  BIONET™, 193
  Contextual Measures for R&D in Biotechnology, 140-141
  National Biotechnology Information Center, 193
infrastructure for genome projects
  European, 141
  Federal support, 8, 102
  resource allocation, 10
Institute for Medical Research, somatic cell hybrid line repository, 31
insulin, 21, 59, 61, 62, 64
Integrated Genetics, DNA probe development, 58, 108
intellectual property, protection of, see patent and copyright policies
IntelliGenetics Corp., 96
interleukin-2, 62, 64
international efforts on genome projects
  collaboration and cooperation, 150-159; see also collaboration on genome research
  see also specific countries
International Geophysical Year, 150-151
isoleucine, codon, 23
Italy, human genome research, 8, 133, 145-147, 195

Japan
  automation of DNA sequencing equipment, 47-48, 137
  basic science expertise, 137
  collaboration on research, 157
  commercialization of mapping and sequencing technologies, 133, 138
  competitiveness with U.S., 133, 137-139
  cooperation with U.S., 139
  databases and repositories, 139
  expenditures on genome projects, 8-9, 138
  funding for genome research, 137
  grants program in genetics, 9
  Human Frontiers Science Program, 9, 137-138
  mapping and sequencing research, 136-138
  Ministry of Education, Science, and Culture, 9, 136-137
  Ministry of International Trade and Industry, 137-138

  peer review, 136
  physical mapping of E. coli genome, 41
  policy development on genome research, 136-137
  published genome research, 133, 195
  robotics technology, 137
  Science and Technology Agency, 8-9, 137
  workshop on DNA sequencing technologies, 47

karyotypes, human female, 34
karyotyping, 32-33, 56
Kennedy, John, 97
kidney diseases, 57, 58, 64
Kirschstein, Ruth, 94 n.1, 117
Koshland, Daniel, 126

Lalouel, Jean-Marc, 146
Latin America, genome research, 149
Lawrence Livermore National Laboratory
  chromosome sorting, 100
  mapping of chromosome 19, 44, 100
  ordering of DNA clones, 100, 108
Lederberg, Joshua, 126
leucine, codon, 23
leukemia, 64
Levinson, Rachel, 94 n.1
libraries
  of DNA fragments, construction of, 39
  of overlapping clones, 38-39
  see also repositories
life sciences
  DOE funding for, 7
  HHMI funding for, 7
  NIH funding for, 7
  NSF funding for, 7, 8
Lifecodes, DNA probe development, 58
ligase, 39
lily, haploid DNA content, 25
Lindberg, Donald A. B., 94 n.1
Los Alamos National Laboratory
  chromosome sorting at, 32, 100-101
  mapping of chromosome 16, 44, 100
  ordering of DNA clones, 100
  see also GenBank®
Lovell, Joseph, 97
low-density lipoprotein receptor, 58, 61
lysine, codon, 23

macrophage colony stimulating factor, 64
Massachusetts Institute of Technology, bioprocess engineering center, 102
Maxam, Alan, 44-46
Max Planck Society, 144
McKusick, Victor, 24, 98
medicine
  diagnostic tool development, 56, 58-59
  drug development, 62-63
  human gene therapy prospects, 64
  isolation of genes associated with diseases, 59-62
  see also genetic diseases
meiosis, 26-27
Mendel, Gregor, 3, 73
Mendelian Inheritance in Man, 24, 98, 106, 189
Merriam, John, 42
messenger RNA
  function, 23
  size of human genes, 61
  translation into protein, 23, 30
methionine
  codon, 23
mitochondrial genome, human origin clues from, 71
molecular anthropology, 72
molecular biology
  Big Science vs. small science, 125, 126-127
  of human development, 65, 95
  manpower in, 10
  plant, genome mapping applications to, 73
molecular evolution
  genome mapping and DNA sequencing applications to, 57, 68-72
  human origins, 71
  primate, 70
  unanswered questions in, 69-70
monoamine oxidase, 86
Moskowitz, Jay, 94 n.1
Mount Sinai Medical Center Institute of Human Genomic Studies, 100
mouse
  beta globin gene, 65
  cell hybridization, see somatic cell hybridization
  genetic similarities to humans, 34, 67
  genetics database, 106
  haploid DNA content, 25
  Mus musculus, amount sequenced and genome size, 47
muscular dystrophy
  Becker’s, 63
  Duchenne, 57-59, 61-63
mutations
  artificially induced, 25
  cancer from, 25
  chromosome structural changes involved in, 25-26
  deletion, 25, 31
  detection of, 42, 56, 58, 59
  duplication, 25
  in fruit flies, 42, 68
  Human Genetic Mutant Cell Repository, 31, 96, 190, 192-193
  human rates, 72
  inversion, 26
  lethal, 42
  in nucleotide sequence, 23
  saturating screen technique for, 42
  in sex cells, 25
  in somatic cells, 25, 31
  translocation, 25-26, 31

National Academy of Sciences
  recommendations on genome projects, 107
  role in genome project oversight, 14, 15, 124
  views on international cooperation, 153
  see also National Research Council

National Aeronautics and Space Administration, 150-151
National Biotechnology Information Center, 193
National Bureau of Standards, 103
National Cancer Institute, 93-94, 96, 98, 117
National Center for Biotechnology Information Act of 1986, 97
National Flow Cytometry Resource, 32, 97
National Institute of Allergy and Infectious Diseases, 93-94
National Institute of Child Health and Human Development, 93-94, 95
National Institute of General Medical Sciences, 93, 95, 94
National Institute of Neurological and Communicative Disorders and Stroke, 93-94, 95
National Institute on Aging, 95
National Institute on Mental Health, 95
National Institutes of Health
  budgets for genome projects, 8, 93
  databases, see National Library of Medicine
  expenditures for genome projects, 8, 93, 109
  funding for genome projects, 6, 7, 14, 93-94, 95-97, 109, 155
  genome project objectives, 8, 93-95
  as lead agency for genome projects, 12, 13-14, 116-117
  mission, 7, 93
  organization, 93-94, 99, 116
  origin, 93
  peer review, 98-99, 109
  research infrastructure, 96-98
  research resources activities related to molecular genetics, 97
  see also specific institutes
  working group on human genome, 94
national laboratories, see specific laboratories
National Library of Medicine, 7, 8, 12, 97-98, 117
National Research Council
  physical mapping strategies, 44
  recommendations for genome projects, 4, 11, 107, 116
  sequencing strategies, 44
National Science Foundation
  as lead agency for genome projects, 14
  funding for genome-related research, 7, 8, 96, 102-103, 109
  mission, 7, 102
  role on genome projects, 116
nematodes
  Caenorhabditis elegans, 42, 47, 147-148
  genome mapping, 4, 42, 147-148
  haploid DNA content, 25
  human DNA sequences compared with, 68
  mutant, 67
neurofibromatosis, 149
nucleotides
  base order, 3, 21-22; see also DNA sequence/sequencing
  bonding between base pairs, 21
  dideoxynucleotide, 45
  mutations in sequence of, 23
  number of base pairs in human cells, 5
  substitution and recombination rates, 70
  total sequenced, 46

Office of Science and Technology Policy
  Biotechnology Science Coordination Committee, 104
  Committee on Life Sciences, 15, 104
  interagency coordination of genome projects under, 8, 104, 109, 124
  mission, 104
Office of Management and Budget, role in genome projects, 105
Office of Technology Assessment
  cost estimates for human genome projects, 180-185
  workshop on genome projects, 4
oncogenes, 67
Organization of American States, 149
Oudtshoorn skin disease, 134, 149

Palade, George, 94 n.1
pandas, 36, 69
parasites, 64
parathyroid, 61
Pardue, Mary Lou, 33
Pasteur Institute, 145, 157
patent and copyright policies, 16-17, 82, 88, 166-170
Patent and Trademark Amendments of 1980, 16, 187
peer review
  at DOE, 101, 109, 118
  by Japan, 136
  in a national consortium, 122
  at NIH, 98-99, 109, 118
Pepper, Claude, 97
Perkin-Elmer Cetus Instruments
  DNA sequencing technology, 45-46, 47
  probe development, 58
phenylalanine, codon, 23
phenylalanine hydroxylase, 61
phenylketonuria, 57
Philipson, Lennart, 142, 152
physical mapping/maps
  automation of, 47
  bacterial genome, 41
  bottom-up, 43
  chromosome sorting for, 31-32, 56
  comparative mapping of species, 34
  construction of libraries of DNA fragments for, 38-39
  contig, 30, 39, 40, 41, 43-44, 61
  costs, 181-182
  detail possible on, 37
  distance measurements, 27, 40
  forward genetics applications, 61
  fragmentation of DNA for, 37-39
  fruit fly, 42-43
  gene dosage technique, 34
  high-resolution, 30, 35-46, 56
  human genome, 35-41, 43-46
  in situ hybridization in, 33-34, 56
  karyotyping in, 32-33, 56
  linking of genetic maps with, 181-182
  low-resolution, 30-35, 41, 43, 56
  medical applications, 64
  nematode genome, 42
  NIH grants for, 84
  nonhuman genomes, 41-43

  ordering of clones for, 39, 40
  private enterprises, 108
  purification of chromosomal DNA for, 37
  rapid, mass-analysis approach, 41
  of restriction enzyme sites, 37-41
  size determinants, 43, 81
  somatic cell hybridization in, 27, 30-32, 33, 56
  strategies, 43-44
  time required for, 40-41
  top-down, 43
  yeast genome, 41-42
physicians’ attitudes and practice, effects of genetic information on, 83
physiology, see human physiology and development
Pickett, Betty, 94 n.1
pilot programs
  DNA sequencing, 44
  European, 140, 145, 147
  importance, 4
  NRC recommendations for, 107
  yeast chromosome sequencing, 140
plants, genome mapping of, 73
polycystic kidney disease, 57, 58
polymerase, 45
population biology, genome mapping applications in, 72-73, 134
President’s Commission for the Study of Ethical Problems in Medicine and Biomedical and Behavioral Research, 85
private sector
  foundation support, 109, 137, 147, 148
  funding of genome projects, 14-15, 108, 109
  genome mapping efforts, 9
  government formation of consortia with, 14-15
  role in genome projects, 107-108
  technology development, 108; see also automation; robotics
  see also specific companies
prokaryotes, lack of introns in, 69-70
proline, codon, 23
protein
  classification by period of invention, 68-69
  coding sequences, identification of, 65, 67
  databases, 97, 98, 158-159, 190-191
  engineering, 63, 142, 148
  folding, 66
  functions, 21-24, 67
  kinase C, 61
  life cycle regulator, 67
  for single-gene diseases, identification of, 57, 59
  structure/function relationship, 63, 65
  synthesis, 23-24, 30
  see also amino acids; enzymes; and other specific proteins
publications
  international, 157-158, 195

quagga, 72

recombinant DNA technology
  cloning vector development through, 36, 38
  definition, 24
  use to create genetic linkage maps, 3, 24, 28
repositories for research materials
  access to, 58, 134
  American Type Culture Collection, 101, 141, 190, 192
  Armed Forces Institute of Pathology, 104
  Center for the Study of Human Polymorphism, 33, 58
  clones, 97, 101, 115
  costs, 182
  DNA Segment Library, 97
  family pedigrees, 58-59
  Federal support, 8, 12, 96
  Human DNA Probes and Libraries, 96
  Human Genetic Mutant Cell Library, 31, 96, 190, 192-193
  importance, 4, 9
  international collaboration on, 158
  tissue, 104
reproductive choices, ethical considerations in, 83-84, 85
restriction enzyme cutting
  automation of, 47
  infrequent, 41
  partial, 39
restriction enzyme cutting sites
  on bacteriophage T4 genomic map, 40
  high-resolution physical mapping with, 35, 37-41
  polymorphisms, see restriction fragment length polymorphisms
restriction enzymes
  cDNA cutting with, 35
  Hpa I, 28
  Not I, 41
restriction fragment length polymorphisms (RFLP)
  allelic forms, 28
  diagnostic uses, 58, 59, 62
  DNA probe detection of, 28-29, 61-62
  linkage maps of, 28-30, 62
  mapping of, 28-29, 62, 73
  markers for genetic diseases, 28, 58, 62
retinoblastoma, 57, 59, 62
ribonucleic acid, see RNA
ribosomes
  functions, 23
RIKEN-Washington University collaboration, 157
RNA
  functions, 23-24
  ribosomal, 23, 30, 33
  see also messenger RNA; transfer RNA
  structure, 22
robotics
  devices for DNA sequencing, 48, 137
  DNA extraction devices, 108
  DOE initiative in, 7-8
  microchemical, 48, 137

Saccharomyces cerevisiae
  amount sequenced, 47
  genome mapping, 41-42
  genome size, 47
salamander, haploid DNA content, 24-25
Salmonella typhimurium, 45

Sanger, Fred, 44-45, 47
scanning tunneling microscopy, for DNA sequencing, 46
scleroderma, 64
Seiko, automation of DNA sequencing, 48, 138
Sendai virus, 31
SeQ, Ltd., physical mapping project, 108
serine, codon, 23
sex cells
  mutations, 25
  progenitors, 26
sickle cell disease, 4, 28, 56, 57, 58, 88
Singer, Maxine, 126
Sinsheimer, Robert, 100, 126
Smith, Cassandra, 41
Smith, Lloyd, 126
somatic cell hybrid lines, 31 n.2, 37
somatic cell hybridization
  cDNA mapping by, 30
  genome mapping applications, 27, 30-31, 33, 56
  of single copies of chromosomes, 31
somatic cells
  definition, 21
  mutations in, 25, 31 n.2
South Africa, genome research, 133, 149, 195
Soviet Union, genome research, 133, 149, 195
Sulston, John, 147-148
superoxide dismutase, 64

Task Force for Biotechnology Information, 141
Tay-Sachs disease, 4
technology development
  costs, 183
  NIH grants for, 94-95
  NRC recommendations for, 107
  private sector role in, 108
  see also automation; robotics
technology transfer
  advantages of national consortium for, 14-15, 121-122
  congressional encouragement of, 166
  economic implications, 165, 122, 172
  ethical issues, 87-88, 173-174
  international, 172-174
  military applications, 174
  national prestige issue, 174
  patent/copyright policies, 16-17, 82, 88, 122, 166-170
  strategies for improving, 16
  trade secrets, 170-172
Technology Transfer Act of 1986, 14, 16, 167
thalassemias, 57, 134
threonine, codon, 23
thymidine, 21-22
thyroglobulin, 61
Tinoco, Ignacio, 101
tissue plasminogen activator, 64
Trademark Clarification Act of 1984, 167
transcription
  of DNA into mRNA, 23-24, 25, 30
  of introns, 25, 30
transfer RNA
  functions, 23, 30
  structure, 23
translation of mRNA into protein, 23-24, 30
Trivelpiece, Alvin, 101
tryptophan, codon, 23
tumor necrosis factor, 62, 64
Turner’s syndrome, 32, 64
tyrosine, codon, 23

United Kingdom
  DNA sequencing, 44
  equipment development, 48, 147-148
  expenditures, 147
  Medical Research Council, 42, 44, 147
  national genome research effort, 147-148
  nonhuman genome mapping, 8, 42, 147
  published genome research, 133, 195
University of California at Los Angeles, clone library, 42
University of California at Santa Cruz, workshop, 6, 100
University of Copenhagen Institute of Medical Genetics, 143
University of Helsinki, 144
University of Manchester Institute of Science and Technology, 48, 148
University of Wisconsin at Madison, physical mapping of E. coli genome, 41

valine, codon, 23
vectors
  cloning of, 36, 38-39
  cosmid, 36-37, 38, 42, 56, 101
  plasmid, 36-37, 38, 47, 56, 67, 101
  robotic devices for producing, 47
viral infections, 64

Wada, Akiyoshi, 152, 156
Walsh, James, 126
Washington University
  collaboration with RIKEN, 157
  physical mapping of yeast genome, 41
Watson, James D., 3, 21, 126, 152
Weinberg, Robert, 126
Wexler, Nancy, 135-136
White, Raymond, 29, 126, 146
Wilson, Allan, 126
workshops
  on automation of DNA sequencing, 47
  on collaboration for genome projects, 187
  on costs of genome projects, 188
  DOE-sponsored, 6, 97
  European Economic Community, 141
  International Human Gene Mapping, 29, 106, 144, 157
  information management system applications, 97
  on materials repositories and databases, 97
  Matrix of Biological Knowledge, 193-194
  NIH-supported, 96-97
  OTA, 187-188
  University of California at Santa Cruz, 6, 100
wound healing, 64
Wyngaarden, James, 93

yeast
  chromosomes
    artificial, 36, 39, 43, 56, 157
    electrophoretic separation of, 37
  genome mapping, 4, 41-42, 157
  haploid DNA content, 25
  lack of introns in, 69
  pilot project on sequencing, 140
  Saccharomyces cerevisiae, 41-42, 47
  similarities to other organisms, 67
  use to isolate human gene functions, 67, 68