
Talking One Genetic Language: The Need for a National Biotechnology Information Center

U.S. DEPARTMENT OF HEALTH AND HUMAN SERVICES • Public Health Service • National Institutes of Health

Biotechnology: New Genetic Miracles

Because of Biotechnology, a term coined less than five years ago:

• A man with diabetes is thriving on a new, life-sustaining medicine. He had had a serious, potentially fatal, allergic reaction to various forms of insulin (traditionally extracted from a beef or pork pancreas).

Now he is being treated with pure human insulin, manufactured not in the body, but in the laboratory. To make it, scientists inserted “good” human insulin genes into fast-growing bacteria, which then churned out great quantities of insulin hormone.

• New hope has come to a woman with hairy cell leukemia, a rare cancer. Her doctors are giving her infusions of alpha interferon, another new substance manufactured in the laboratory by splicing genes into bacteria. The results are amazing: the leukemia is simply disappearing.

And tumors in the bodies of over 50 patients, including lung and colon tumors notoriously resistant to conventional therapies, shrank by at least half when they were treated experimentally with “interleukin 2.” This genetically engineered substance turns the body’s white blood cells, often in short supply, into specialized cancer-killing cells.

• Thick mucus clogs the little boy’s lungs. The doctors suspect cystic fibrosis but hesitate to start difficult therapies until they are sure of their diagnosis. Geneticists are called in to do a probe. By mixing a sample of the boy’s blood with a piece of DNA (deoxyribonucleic acid) they determine that a crucial gene is missing. The test is positive; the diagnosis is made; antibiotic and other treatment can begin.

Biotechnology: Roots

New natural drugs and vaccines. Life-saving therapies. Pain-saving interventions.

All these are the result of Biotechnology: research and development involving the all-important molecules that control our life processes (how our bodies grow, how we age, whether we suffer a host of mental and physical diseases). Its central focus is DNA, the long, twisted threads in the nucleus of each of our ten trillion cells.

Scientists have known for some thirty years that genes are essentially pieces, or chemical subunits, of DNA carrying the messages of heredity in the strands of the famous double helix. A string of thousands of these units, which themselves are groups of atoms composed of four different nucleotide bases, A (adenine), T (thymine), C (cytosine), and G (guanine), makes up a gene.

What matters is the arrangement of these chemicals, or their exact sequence in various combinations, along the backbone of the DNA molecule. For if the four different DNA bases are the letters of nature’s alphabet, the chemicals’ arrangement in groups of three is a sort of genetic code: dots and dashes that spell out all the instructions the body needs to manufacture the myriads of proteins which build our bones, our muscles, the enzymes which catalyze our metabolic processes, and all in all make us human beings marvelously different individuals.

If researchers can read and understand the language of heredity, if they can learn the sequence of bases in a gene, they can determine the makeup of the protein for which the gene is the blueprint. And they can clone that protein outside the body. Or find out why certain genes are switched on, or off, during a lifetime. Or which ones are defective, or even missing.

When scientists first began translating genetic messages, they had a difficult time. But in the past few years, they have gained powerful new automated tools with which to decipher and analyze, or sequence, the messages of heredity. And they have developed ways systematically to change DNA and other important molecules, that is, to manufacture new genes and perhaps repair defective ones. As they have done so, the sciences of molecular biology and molecular genetics have become Biotechnology.
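The reading of DNA bases in groups of three, each group standing for one building block of a protein, can be made concrete with a short sketch. It assumes the standard genetic code and reads the coding strand of DNA directly. The function name `translate` and the example sequences are illustrative inventions, not taken from any data base mentioned here.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G order.
# "*" marks a stop signal. One letter stands for one amino acid.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(triplet): amino_acid
    for triplet, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Read a DNA coding sequence three bases at a time, stopping at a stop codon."""
    dna = dna.upper()
    protein = []
    for start in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[start:start + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Illustrative input only: ATG (M), GCC (A), then the stop signal TAA.
print(translate("ATGGCCTAA"))
```

With 4 bases and groups of three there are 4 x 4 x 4 = 64 possible groups, which is why the lookup table above has 64 entries.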

All this effort, all this information would be useless without computerization. There is no way even an Einstein could analyze, store or manage it with a pencil and a human brain. As information has poured out of the laboratories, factual research data bases have been developed to store it. There are now about a dozen such data bases, set up for the most part by researchers with an avocational interest in computers. The “Biology Knowledge Bases” chart describes these data bases.

For scientists depend on the data bases actually to accomplish their research. They must tap into a base like GenBank to find matching sequences.

For example, they had known for a decade or so that oncogenes transform normal cells into cancer cells. Wondering why Nature would put the seeds of our own possible destruction within us, they searched the data bases and discovered that the oncogene sequences matched those of a normal human growth gene. This suggests that cancer may be caused by a normal gene being switched on at the wrong time. And that they might some day find a way to switch it off.
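How a search for matching sequences might work can be sketched in a few lines. This is a minimal illustration, not the method GenBank or any other data base actually uses: it indexes every run of `k` consecutive bases (a "word") in each stored sequence, then counts how many words of a query appear in each entry. The entry names and sequences are made up for the example.

```python
from collections import defaultdict


def build_index(database, k):
    """Map every k-base word to the (entry name, position) pairs where it occurs."""
    index = defaultdict(list)
    for name, sequence in database.items():
        for position in range(len(sequence) - k + 1):
            index[sequence[position:position + k]].append((name, position))
    return index


def search(query, index, k):
    """Count shared k-base words between the query and each indexed entry."""
    hits = defaultdict(int)
    for position in range(len(query) - k + 1):
        for name, _ in index.get(query[position:position + k], ()):
            hits[name] += 1
    # Best-matching entries first.
    return sorted(hits.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data base; these are not real gene sequences.
database = {
    "growth_gene": "ATGGCGTACGTTAGC",
    "unrelated": "TTTTTTTTCCCCCCC",
}
index = build_index(database, k=5)
print(search("GCGTACGTT", index, k=5))
```

The same word-lookup idea underlies searching text for matching words and phrases, which is why literature-search techniques carry over to sequence data.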

But the different data bases use different information systems, different computer languages, and these have resulted in a veritable Tower of Babel. No scientist in the world can know enough about each of these computer systems to tap into all of them.

Even if such a genius appeared, the data bases lag so far behind that he or she might miss a discovery already made. And society might miss a new drug like Captopril, which controls high blood pressure and was designed by knowing what a molecular target in the body looks like, and synthesizing a molecule that attaches to that target. Or a just-announced test that uses genetic material to detect the AIDS virus in blood samples.

How Biotechnology Information is Unique

The complexity and size of Biotechnology information astound the imagination and make it different from other scientific information. And this information is growing today at an amazing rate.

Within every one of us, tens of thousands of individual human genes control special life processes. Three billion units of DNA make up the human genome (all human genes taken together), and only 0.01% of them have been sequenced. With current technology, sequencing these three billion units could consume 30,000 person-years and upward of $2 billion. Fortunately, current technology does not stand still.

Technical advances, including ways to separate large molecules and individual chromosomes, have upped the rate of analysis about ten times in the last decade. A molecular biologist at a technically advanced laboratory today can sequence about 300 DNA units a day. And now scientists at CalTech have reported the successful development of an automatic DNA sequencing machine that again speeds sequencing tenfold (and cuts the cost for each base in half, from one dollar to fifty cents).
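The scale of these figures is easier to grasp with a little arithmetic. The sketch below uses only the numbers quoted above, plus one loud assumption: 250 working days per person-year, which is not stated in the text. The results are rough orders of magnitude, not official estimates.

```python
GENOME_BASES = 3_000_000_000      # units of DNA in the human genome (from the text)
BASES_PER_DAY = 300               # one biologist, advanced laboratory (from the text)
SPEEDUP = 10                      # automatic sequencer, "tenfold" (from the text)
WORKING_DAYS_PER_YEAR = 250       # ASSUMPTION: not given in the text


def person_years(bases, bases_per_day, days_per_year=WORKING_DAYS_PER_YEAR):
    """Person-years needed to sequence `bases` at a given daily rate."""
    return bases / (bases_per_day * days_per_year)


manual_years = person_years(GENOME_BASES, BASES_PER_DAY)
automated_years = person_years(GENOME_BASES, BASES_PER_DAY * SPEEDUP)

# Per-base costs quoted in the text: one dollar before, fifty cents after.
cost_at_one_dollar = GENOME_BASES * 1.00
cost_at_fifty_cents = GENOME_BASES * 0.50

print(f"Manual: about {manual_years:,.0f} person-years")
print(f"Automated: about {automated_years:,.0f} person-years")
print(f"Cost at $1.00 per base: ${cost_at_one_dollar:,.0f}")
print(f"Cost at $0.50 per base: ${cost_at_fifty_cents:,.0f}")
```

Under the assumed 250-day year, the manual figure comes out in the tens of thousands of person-years, the same order of magnitude as the 30,000 person-years cited.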

The Biotechnology Information Gap

The problem is simply stated but hard to solve. The data bases are swamped. Take the best known, GenBank, set up under contract by NIH and co-funded by a number of Federal agencies. By mid-1986, only 54% of the data published in 1985 had made it into the data base.

This is because the rate of publication has grown so rapidly: from one sequence composed of 76 bases in 1965 to a total of 11,552 sequences composed of 9,924,741 bases by 1986. The contents of GenBank by year of publication are in the table “GenBank Contents by Selected Year of Publication.”
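The average sequence lengths reported for GenBank follow directly from the base and sequence counts. The short sketch below recomputes them for a few rows of the GenBank contents table; the figures are copied from that table, while the function name and the choice of rows are illustrative.

```python
def average_length(bases, sequences):
    """Average sequence length in bases, rounded to the nearest whole base."""
    return round(bases / sequences)


# (bases, sequences) for selected years of publication, as of August 1986.
genbank_rows = {
    "1965": (76, 1),
    "1980": (439_717, 852),
    "1984": (2_569_480, 2_475),
    "1985": (2_329_394, 1_957),
    "Total (1965-1986)": (9_924_741, 11_552),
}

for label, (bases, sequences) in genbank_rows.items():
    print(f"{label}: {average_length(bases, sequences):,} bases per sequence")
```

The steady rise in average length shows that growth came from longer sequences as well as from more of them.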

What is at stake here is not merely the convenience of researchers searching for scientific papers. What is at stake is the progress of Biotechnology around the world: our understanding of life at a molecular level and hence our knowledge of health and disease.


BIOLOGY KNOWLEDGE BASES

Cells and Tissues: ATCC Cell/Tumor Bank; Hybridoma Data Bank
Cell/Tissue Protein Arrays: Proteus Technologies, Inc.; Protein Databases, Inc.
Chromosome Libraries: Los Alamos/Livermore Banks
Cytogenetic Maps: Cytogenetics Database
Genetic Maps: Human Gene Library (Yale); Human Gene Map (SHG); Mouse Map (Jackson Labs); Genetic Maps (NIH)
Restriction Maps: Genetic Maps (NIH)
Gene Maps: Genetic Maps (NIH)
DNA Sequences: GenBank (NIH); EMBL Bank (Europe)
mRNA Sequences: GenBank (NIH); EMBL Bank (Europe)
Protein Sequences: Protein Resource (NBRF); Japan Protein Bank
Protein Structures: Brookhaven X-Ray Databank; Crystallographic Data Centre
Mutagens/Carcinogens & Drugs: Interaction with DNA & Proteins

Needed: a National Biotechnology Information Center

The whole biotechnological information system is so overloaded that there is a danger scientific progress may grind to a halt. To prevent that from happening, and to move the field along faster, we need a central repository for storing and sharing the information resulting from genetic research. With this problem in mind, Rep. Claude Pepper introduced a bill (H.R. 5271, 99th Congress; reintroduced in the 100th Congress as H.R. 393) to create a National Biotechnology Information Center at the National Library of Medicine.

Working with the laboratories from which information comes, experts at such a center would seek to coordinate data as it is accumulated: to store, process and make it available to the research community nationwide.

At such a center, too, computer scientists, with the help of some of the world’s outstanding molecular researchers working next door at NIH, would create new computer information systems so that investigators throughout the country could ask questions and get answers quickly. They would encourage consistent terminology for researchers and data bases so that research results entered into computer systems could be shared and be made widely available.

In the new information systems, research would be stored in such a way that data retrieved from one source could be linked to other related findings, and to other research data bases. So investigators could ask one computer a question and that computer would automatically search for the answer not only in its own knowledge base but in other research bases as well.
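One way to picture such linking is a set of data bases whose records carry cross-references to one another, so that a single question can follow the chain from a gene to its protein to the protein's structure. The sketch below is purely illustrative: the three toy data bases, their identifiers, and the function `linked_lookup` are all hypothetical and do not describe any existing system.

```python
# Hypothetical toy data bases; every identifier and record here is invented.
DNA_SEQUENCES = {
    "gene_x": {"sequence": "ATGGCCTAA", "protein_id": "protein_x"},
}
PROTEIN_SEQUENCES = {
    "protein_x": {"sequence": "MA", "structure_id": "structure_x"},
}
PROTEIN_STRUCTURES = {
    "structure_x": {"method": "X-ray crystallography"},
}


def linked_lookup(gene_id):
    """Answer one question by following cross-references across data bases."""
    answer = {}
    dna = DNA_SEQUENCES.get(gene_id)
    if dna is None:
        return answer  # unknown gene: nothing to link
    answer["dna"] = dna["sequence"]

    protein = PROTEIN_SEQUENCES.get(dna.get("protein_id"))
    if protein is None:
        return answer  # no linked protein record
    answer["protein"] = protein["sequence"]

    structure = PROTEIN_STRUCTURES.get(protein.get("structure_id"))
    if structure is not None:
        answer["structure_method"] = structure["method"]
    return answer


print(linked_lookup("gene_x"))
```

The design point is that linking depends on shared identifiers, which is why consistent terminology across data bases matters so much.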

GenBank Contents by Selected Year of Publication (as of August 1986)

Year                Bases       Sequences   Av Seq Len   % of Published
1965*               76          1           76
1970*               249         3           83
1975*               6,160       54          114
1980*               439,717     852         516          90
1983                1,929,368   1,630       717          80
1984                2,569,480   2,475       1,038        75
1985                2,329,394   1,957       1,190        54
1986                290,708     161         1,806        4
Total (1965-1986)   9,924,741   11,552      859          82

Why an Information Center at the National Library of Medicine?

Developing such a national center at the Library would, of course, prevent duplication. It would enable scientists to work in a more collaborative style, conferring frequently to avoid reinventing the wheel.

Most important, it would give researchers the means they need to do their job. Now each of them is essentially alone, working on his or her own piece of the genome puzzle. What they need is something they cannot get in any regional or university center: a national “board” on which they can fit that incredibly complex puzzle together.

Starting a national center at the Library would be a cost-effective, logical move. For it would build on the unquestioned leadership of the Library in biomedical information. Nor would it be necessary to build a new facility, or to invest in extensive computer equipment to get fast results.

This is because the Library is already deeply involved with the communication of biotechnology research findings, through its biomedical literature and computer data bases. Over 97% of all published research containing DNA sequences can already be found among the 6 million references in the MEDLINE (MEDical literature onLINE) system. The Library has twenty-two years of experience in building and maintaining large computer files for biomedical researchers and health care professionals.

Many of the tools and techniques developed by the Library to classify information and biomedical literature can be applied to the understanding of the language of molecules. For example, the computer methods used to search for matching words and phrases in the medical literature can be applied directly to the finding of matching patterns of molecules in different DNA gene sequences.

The Payoffs: Toward Better Health and Less Disease

The functions proposed for the National Biotechnology Information Center seem to be absolutely necessary to do the genetic research job that has to be done in the years ahead.

We need such a Center to prevent duplication and to get more bang for our research bucks. We need it to fit together the pieces of the genetic puzzle and acquire the knowledge that would benefit humankind in many ways.

With such knowledge, we could develop deeper understanding of the root causes of diseases that still plague us. This is true of diseases we know are inherited, of course, like Sickle Cell Anemia or Thalassemia.

But scientists feel that our likelihood of falling victim to the common and serious diseases, like heart disease, cancer, and arthritis, may well be influenced in complicated ways by the workings of our genes. Researchers aim now to learn more about these genes, and their relation to each other.

From this deeper understanding would come exciting new cures and interventions:

• New vaccines to prevent diseases and newer and more effective drugs to treat them;

• Gene therapy to cure even previously fatal disorders;

• Early warning for such diseases as Huntington’s Chorea;

• Control of such dreaded conditions as Alzheimer’s Disease;

• Better health generally through more plentiful food supplies, gained by inserting genes to make new plant species resistant to insects and herbicides.

Down the road a bit we can envision a scenario like this:

A single defective gene has left a little girl with a horrible immunological disorder called ADA (Adenosine Deaminase Deficiency). She is emaciated, and racked with pain. Because her body has absolutely no defenses against infection, she has had pneumonia almost all her life. Similar, luckier, patients have been helped by transfusions from a brother or sister’s bone marrow that could be matched with their own. But only 3 out of 10 patients have such siblings, and this girl is not one of them.

Now this patient is about to receive an exciting form of gene therapy. Building on successful research results, the doctors plan to insert a normal gene into her bone marrow cells. They believe this treatment will produce an enzyme to correct her condition, and cure her.

Another scenario: A vigorous young man has developed a cancer of the lymph nodes. A genetic probe shows that the disease is due to the switching on of an oncogene in his body. Doctors inject a medicine containing a repressor molecule, and this switches the oncogene off. He recovers immediately.

Such miracles, such real cures, are expected from biotechnology research. This is the work the National Biotechnology Information Center is to assist.


NATIONAL LIBRARY OF MEDICINE
8600 Rockville Pike
Bethesda, Maryland 20894