    Functional Genomics in Plants

    Jeffrey L Bennetzen, Purdue University, West Lafayette, Indiana, USA

    Functional genomics refers to a suite of genetic technologies that will contribute to a comprehensive understanding of plant gene function.

    Introduction

    The overall goal of genomics is to identify all of the genes in an organism, and then to determine the functions of these genes within the organism. Genomic technologies are mostly not new, but involve a highly increased scale applied to traditional genetic and molecular approaches. Structural genomics, for instance, involves identifying all of the genes within a single species by the sequencing of large collections of complementary DNAs (cDNAs) and/or total genome sequencing. Understanding the actions and roles of all of the genes in an organism is a much more daunting task that will occupy biologists for many decades to come. Functional genomics refers to a suite of genetic technologies that will contribute tremendously to a comprehensive understanding of gene function, as will concurrent studies in other areas of biology (e.g. physiology, biochemistry, ecology, etc.). Many plant species are receiving some genomic characterization. This array of genomic characterizations, and comparisons with results from other biological kingdoms, will allow a uniquely valuable set of insights into what genetic functions are shared by eukaryotes, which are shared only by plants, and which are unique to individual lineages or species.

    Expression Arrays

    Investigators from many different laboratories have undertaken analysis of expressed genes by the comprehensive cloning and sequencing of cDNA copies of messenger RNA (mRNA) molecules. From a genomics perspective, these sequenced cDNAs can be considered as expressed sequence tags (ESTs) that provide the raw material for simultaneously analysing the expression of all of the genes in an organism. Because many genes are rarely or barely expressed, while others are expressed at very high levels in some tissues, the random cloning and sequencing of ESTs is not likely to yield all of the 25 000 or more genes within a diploid plant species, even when several hundred thousand cDNAs have been sequenced. Genomic sequencing, followed by various pattern-recognition approaches to gene identification within raw DNA sequence, can be used to find genes that were missed by the EST approach. This approach to gene identification, eschewing the one-gene-at-a-time mindset of traditional genetics, is predicated upon the idea that one should best study the expression of genes after they are all identified. Significant EST projects have been undertaken with a large number of different plant species, in both the public and private sectors. Particularly comprehensive projects are now highly advanced in soyabean, tomato, alfalfa, maize, rice, wheat and a model weed, Arabidopsis thaliana.

    With respect to the study of gene expression, the techniques of structural genomics hope to identify the 25 000 or more genes that are expected for a diploid plant. Once sequenced, these 25 000 genes can be individually attached to various types of structural supports, commonly a glass slide, by a robotic arraying device (Figure 1). This slide then represents a unigene set, where fragments representing each of the genes within an organism can be used to measure the level of expression of that gene in any tissue, at any time in development, and in response to any internal or external signal. These slides are usually called microarrays. In plants, scores of different species are slated for microarray analysis. These studies will proceed first in those species that have active EST projects, and academic labs will provide unigene sets as a service to the scientific community for several organisms, including Arabidopsis and maize.

    Microarrays can be hybridized to labelled RNA, and the results quantitated for each fragment on the slide, represented as an individual spot. Various RNA labelling procedures can be utilized, but the representation of mRNAs by reverse transcription with a fluorescently labelled deoxynucleotide is particularly useful. Very sensitive microfluorimeters have been designed to scan hybridized microarrays, allowing detection across three orders of magnitude, with an ability to differentiate twofold differences in expression levels (Richmond and Somerville, 2000).
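The twofold sensitivity mentioned above can be illustrated with a minimal Python sketch. It assumes a simple comparison of one spot's intensity in a control and a treated sample, expressed as a log2 ratio; the function names, the intensity floor and the example values are hypothetical and are not taken from any particular microarray package.

```python
import math


def log2_ratio(treated, control, floor=1.0):
    """Return log2(treated / control); both signals are floored to avoid dividing by zero."""
    return math.log2(max(treated, floor) / max(control, floor))


def call_changes(signals, threshold=1.0):
    """Classify each gene as 'up', 'down' or 'unchanged'.

    signals maps gene -> (control_intensity, treated_intensity).
    A threshold of 1.0 on the log2 scale corresponds to a twofold change.
    """
    calls = {}
    for gene, (control, treated) in signals.items():
        ratio = log2_ratio(treated, control)
        if ratio >= threshold:
            calls[gene] = "up"
        elif ratio <= -threshold:
            calls[gene] = "down"
        else:
            calls[gene] = "unchanged"
    return calls


# Hypothetical spot intensities in arbitrary fluorescence units.
example = {
    "geneA": (500.0, 520.0),
    "geneB": (500.0, 1000.0),
    "geneC": (800.0, 200.0),
}
calls = call_changes(example)
```

In practice the raw intensities would first need background subtraction and normalization between slides, which this sketch deliberately leaves out.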

    Key advantages of a microarray system for measuring gene expression are (1) all of the genes can be measured, in unison, in a single experiment, (2) the amount of sample RNA needed to prepare labelled probe is fairly low, so that small tissues or regions of tissues can be analysed, and (3) the data can be quantitated with a relatively high level of accuracy. However, there are also a large number of

    ENCYCLOPEDIA OF LIFE SCIENCES / & 2002 Macmillan Publishers Ltd, Nature Publishing Group / www.els.net

    potential problems with microarrays, including the fact that multiple different genes (members of the same gene family) will often cross-hybridize, thereby leading to a single spot that hybridizes to more than one gene product. In order to avoid this problem, some projects have used chips that contain thousands of short oligonucleotides, and single mismatch controls, which should be unique to individual genes.

    Beyond the various technical difficulties, however, neither microarray nor DNA chip studies inform the investigator as to de novo synthesis rates, RNA turnover rates, precise tissue of expression, or the quality (e.g. size, degree of intron excision, etc.) of the mRNA from any of the genes that are being expressed. Hence, microarrays can best be used to identify a comprehensive set of candidate genes whose expression can be more carefully measured in subsequent studies by nuclear run off, S1 protection, in situ hybridization or other technologies. Moreover, the most important question in understanding gene expression is knowing the actual tissues and times in which an active protein is present, a phenomenon that does not always mimic mRNA levels. Traditional studies of protein expression patterns and enzyme activity are needed to answer this question, including the use of proteomics for the high throughput identification of the proteins present in a given tissue sample.

    The power of microarray studies has been demonstrated by the large number of genes that have been discovered to be associated with a given biological process. Often these processes had been so extensively investigated by various differential cloning technologies that investigators thought they had found most or all of the associated genes, yet microarray studies uncovered numerous additional loci (Richmond and Somerville, 2000).

    A final challenge to microarray analysis lies in the interpretation and display of the huge volume of data that can come from these experiments. One approach has been to lump together sets of genes that respond in similar time frames or tissue patterns to a particular signal or time in development. Their similarity in response would suggest their involvement in related processes and/or their activation by related signal transduction pathways. Another way

    [Figure 1 panel labels: PCR amplification; Printing microarray; Label transcripts (mRNA A, mRNA B, mRNA C); Hybridize; Fluorimetric analysis.]

    Figure 1 Microarray technology for the comprehensive assessment of gene expression. Individual plasmid clones containing different genes or gene fragments at upper left have their inserts amplified by the polymerase chain reaction (PCR) and the fragments are individually dotted onto a glass slide by a gridding robot. Different messenger RNAs (mRNAs) in a total RNA population are labelled by reverse transcription using fluorescently labelled nucleotides and an oligo dT primer to start the labelling reaction (lower left). The labelled cDNA copies of the mRNAs are then hybridized to the microarray slide. A quantitative assessment of mRNA amounts in the original sample is indicated by the relative intensity of the hybridization to the microarray. In the example shown, the fragment homologous to mRNA B has twice the intensity because it was twice as abundant as the other two mRNAs in the sample. The other genes represented on the grid showed no hybridization, indicating that these genes were not expressed in the tissue that was the source of the sample RNA.

    to investigate the degree of response-relatedness for a set of genes is to investigate how mutations in a particular gene affect the expression of the other genes in the same organism. One can imagine a nearly infinite array of experiments that investigate these questions. It will be interesting to see how many expression patterns and gene sets come from these studies, and to what degree commonalities are observed between species and between environmental, hormonal or developmental signals.
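The idea of lumping together genes that respond in similar time frames can be sketched in a few lines of Python. This is only one simple possibility, assuming that similarity is measured by the Pearson correlation of expression profiles and that a gene joins the first group whose founding member it resembles; the gene names, the profiles and the 0.9 cutoff are hypothetical.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no pattern to correlate
    return cov / (var_x * var_y) ** 0.5


def group_by_profile(profiles, min_r=0.9):
    """Greedy grouping: each gene joins the first group whose first member
    it correlates with at min_r or better, otherwise it founds a new group."""
    groups = []
    for gene, profile in profiles.items():
        for group in groups:
            if pearson(profile, profiles[group[0]]) >= min_r:
                group.append(gene)
                break
        else:
            groups.append([gene])
    return groups


# Hypothetical expression levels at four time points after a signal.
profiles = {
    "gene1": [1.0, 2.0, 4.0, 8.0],
    "gene2": [2.0, 4.1, 7.9, 16.0],
    "gene3": [8.0, 4.0, 2.0, 1.0],
    "gene4": [9.0, 4.5, 2.2, 1.0],
}
groups = group_by_profile(profiles)
```

Here the two induced genes and the two repressed genes end up in separate groups. Real analyses typically use more robust clustering methods, and the result of this greedy version depends on the order in which genes are examined.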

    Reverse Genetics

    The only way in which a particular gene can be proven to determine a particular phenotype is by finding the alteration in phenotype yielded by a mutation in that gene. Traditionally, geneticists started with a phenotype that they wished to study (for instance, flower development) and then used mutagenesis to find genes that were involved in the process. This approach could be very slow, but had the advantage that it required no additional information about the process. In some cases, no mutation was found, suggesting that the process may be determined by genes that are redundant in function. The question of redundancy is particularly problematic in flowering plants, because many angiosperms are derived from recently polyploid parents. Hence, multiple nonallelic loci will often exist for any biological process, thereby making it difficult or impossible to identify a phenotype associated with inactivation of only one locus.

    In the last few years, plant researchers in the public and private sectors have begun to use reverse genetic approaches to identify mutations in candidate genes that appear likely to be involved in a particular process. For instance, these candidate genes might be identified by a microarray analysis as genes that are only expressed during flower development. If so, then it is likely that those genes play a role in flower development, and that mutations in any such gene would affect the phenotype of the developing and/or mature flower. Instead of the random mutagenesis and careful screening of traditional genetics, reverse genetics technology uses specific mutagenesis of a target gene, followed by even more careful screening for any possible resultant phenotype. In plants, the two major approaches to reverse genetics involve tagging with mobile DNA insertions or epigenetic inactivation with homologous sequences.

    Transposable elements have long been useful tools in plant genetics, since their discovery by Dr Barbara McClintock in the 1940s. Although transposable elements differ in the degree and nature of their insertion specificities, a few (like Mutator of maize) appear to insert in essentially any gene, at fairly similar frequencies. Hence, a large population of independent Mutator maize lines is likely to contain individuals with insertions in essentially any gene. The challenge is to find the plant that has an insertion in your particular candidate gene. Figure 2 depicts one strategy that is used to find a specific insertional mutation. Oligonucleotide primers are made to an end of the transposable element and to the candidate gene. Genomic DNA from pools of different Mutator lines is screened by polymerase chain reaction (PCR) using these two primers, under conditions where an amplification product is seen only if the two primers are within 1–2 kb of each other, in opposed orientation. The first pools may contain, for instance, aliquots of DNA from 100 different plants, thereby making it likely that a Mutator insertion in that gene will be found in one out of about every 100 pools. Aliquots of DNA from individual plants within the pool can then be screened to see which contains the insertion. The size of the DNA amplification product also indicates the approximate location of the insertion within (or near) the gene, and additional primers can be used to screen for insertions at different sites within a large gene. Once a maize line is identified with an insertion in the candidate gene, then seed from this line can be planted, and the investigator can look to see if any phenotype in this line cosegregates with the insertional mutation. Although Mutator of maize was the first system used for insertional reverse genetics (Martienssen, 1998), the T-DNA transferred from the bacterium Agrobacterium tumefaciens to its plant host has also been very useful, particularly in Arabidopsis (Krysan et al., 1999), and several other reverse genetic systems of this type are currently under development in several plant species.
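The two-stage pooling strategy described above, and the use of product size to locate the insertion, can be sketched as follows. This is an illustrative simulation rather than a laboratory protocol: the population size, the carrier lines, the primer coordinates and both function names are hypothetical, and the position estimate assumes a simple geometry that is stated in the docstring.

```python
def screen_pools(all_lines, carriers, pool_size=100):
    """Two-stage PCR screen: test pooled DNA first, then individuals in positive pools.

    Returns (lines found to carry the insertion, total PCR reactions run).
    """
    reactions = 0
    found = []
    for start in range(0, len(all_lines), pool_size):
        pool = all_lines[start:start + pool_size]
        reactions += 1  # one reaction on the pooled DNA
        if any(line in carriers for line in pool):
            # Positive pool: retest each plant in it individually.
            for line in pool:
                reactions += 1
                if line in carriers:
                    found.append(line)
    return found, reactions


def insertion_site(gene_primer_pos, product_size, element_primer_offset):
    """Estimate the insertion coordinate (bp) from the PCR product size.

    Assumes the gene primer starts at gene_primer_pos and reads toward the
    insertion, and that the element primer starts element_primer_offset bp
    inside the element end and reads outward.
    """
    return gene_primer_pos + product_size - element_primer_offset


# Hypothetical population: 10 000 lines, two of which carry the insertion.
lines = list(range(10_000))
carriers = {1234, 8765}
found, reactions = screen_pools(lines, carriers)
site = insertion_site(gene_primer_pos=200, product_size=950, element_primer_offset=50)
```

With these made-up numbers, pooling reduces the work from 10 000 individual reactions to 300 (100 pools plus 100 plants in each of the two positive pools), which is the practical motivation for screening pools first.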

    In dealing with redundant genes, reverse genetics can identify mutations in a single gene with a high enough frequency that an investigator can eventually find independent mutations in each member of a gene family. These different mutant lines can then be crossed to generate individuals that are homozygous for insertional inactivations in most or all gene family members, thus indicating the phenotype of such a general inactivation. Another approach to determining the function of a gene can be to test the phenotype of plants that overexpress the gene, or express it in the wrong tissue and/or at the wrong time in development. This can be accomplished by a type of insertional mutation as well, using mobile DNAs that contain a strong promoter or enhancer that activates adjacent genes (Weigel et al., 2000). Alternatively, this type of phenotype test can be conducted by construction of a transgenic plant that contains the targeted gene engineered with a promoter from a gene with a different transcriptional activity.
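For the crossing strategy mentioned above, in which mutations in several family members are combined, a short calculation shows why larger gene families demand larger progeny populations. The sketch below rests on assumptions that the text does not state: the loci are unlinked, the parent is heterozygous for an insertion at every locus and is self-pollinated, segregation is Mendelian, and no genotype is lethal. The function names are hypothetical.

```python
import math
from fractions import Fraction


def fraction_homozygous_all(n_loci):
    """Expected fraction of selfed progeny homozygous mutant at all n unlinked loci.

    Each locus contributes a factor of 1/4 under Mendelian segregation.
    """
    return Fraction(1, 4) ** n_loci


def plants_needed(n_loci, confidence=0.95):
    """Smallest progeny count giving at least `confidence` probability of
    recovering one or more plants homozygous at every locus."""
    p = float(fraction_homozygous_all(n_loci))
    return math.ceil(math.log(1 - confidence) / math.log(1 - p))


# Hypothetical gene families with one, two and three members.
needed = {n: plants_needed(n) for n in (1, 2, 3)}
```

Under these assumptions about 11 progeny suffice for a single gene, 47 for two genes and 191 for three, so genetic linkage or reduced viability of the multiple mutant would raise the numbers further.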

    Expression of a standard gene sequence from the inappropriate, or antisense, strand can be simply accomplished by transforming into a plant a structural gene that contains a new promoter engineered to transcribe in the opposite direction, starting at the normal 3′ terminus of the gene. Antisense expression has been shown to decrease the amount of mRNA that is now available for translation

    from the wild-type gene in the same nucleus, largely by leading to the formation of double-stranded RNAs (dsRNAs) that are rapidly degraded. In both plants and animals, it has also been observed that overexpression of a sense transgene (i.e. one transcribed in the normal direction) can decrease final mRNA levels of all genes in the same nucleus that have extensive sequence homology with the transgene. This so-called sense suppression appears to occur at both an RNA level, inducing apparent dsRNA production and subsequent dsRNA turnover, and at the DNA level, associated with DNA methylation and decreased transcription of the nuclear genes. These epigenetic changes, although not an actual mutational change in DNA sequence, provide a phenotypic copy (phenocopy) of a mutation because they decrease the gene product that is produced. In practice, the investigator can transform sense or antisense constructs of their targeted gene into a plant, and then determine which progeny have lower final levels of the candidate gene's mRNA. Then, these plants can be scored for a new phenotype to see what effect that mRNA change may have had and, hence, the role of the gene. As in the case for reverse genetics by insertional mutation, the investigator must also determine that the phenotype cosegregates with the lowered mRNA level, to be sure that the phenotype is due to the actual epigenetic change that has been engineered.

    Viral vectors have recently been developed that allow efficient epigenetic inactivation without the need for germinal transformation (Ruiz et al., 1998). Infection with an engineered virus that has homology to a normal cellular gene can lead to a loss of translated mRNA for that gene from any tissue that the virus infects. Like germinal suppression by a transgene, this approach can (in theory) lead to loss of function by several homologous genes in the same family within the plant.

    A third technique for gene inactivation involves actual gene replacement, using homologous recombination and/or DNA repair to replace a wild-type version of a gene with a mutant version that has been engineered in vitro. Although some promising avenues are being investigated, this is not yet a workable general approach in plants. At this stage, the large amount of total nuclear DNA in plants

    [Figure 2 panel labels: Line 1 to Line 6 (genes A, B, C); PCR; Gel analysis (lanes 1 to 6); Find homozygous progeny in lines 2 and 5; Score phenotype.]

    Figure 2 Detection of an insertional mutation for use in reverse genetic analysis of gene function. Using short oligonucleotides (small black half arrows) from one end of an insertional DNA and one from inside the targeted gene (gene B) as primers, polymerase chain reaction (PCR) is performed on pools of plant DNA from a population in which new insertions by this mobile DNA occur at a reasonable frequency. Even with a very active mobile DNA, insertions in any particular gene out of the 25 000 or more in a plant will be quite rare, so only a few pools will show an amplification product. The size of the PCR product, determined by gel analysis, also indicates where the insertion has occurred. Once a pool is found with an insertion, then subpools or individual plants from the pool are tested for the insertion by the same PCR procedure. Once an individual mutant plant is found, the investigator can request seed of this line from the appropriate stock centre and can look at progeny to see whether any mutant phenotype cosegregates with the insertional mutation. If it does, then the investigator can use either complementation by transformation or the similar biologies of different insertions in the same gene to prove that the detected phenotype is caused by the mutation in the candidate gene. In the example shown, a primer is employed for gene B, and detects insertions (red boxes) in lines 2 and 5. The insertion in gene C in line 4 is not detected because the distance between the primers is too great to allow PCR amplification.

    has made homologous events very rare compared with nonhomologous (e.g. random) events.

    Genomic Sequencing, Annotation and Comparative Analysis

    Over the last 10 years, the amount of DNA sequence information available to any researcher has increased exponentially. This rate of increase shows no signs of slowing. Various databases contain genomic DNA sequence from scores of organisms, large arrays of EST/cDNA sequence, and predicted protein sequences. In this era, the first step researchers take toward predicting the function of a gene they have identified and cloned is to compare the sequence of the gene to sequences present in these databases. This homology scanning is so routine and easy that most investigators do not stop to think that they are performing a functional genomic test by comparative analysis. When a homologous sequence is found, then the researchers have acquired an approximation of a potential role for this gene, if a function is known for the homologous gene.

    Very few genes in any species are unique to that species, at least by a homology criterion. In comparisons of maize and rice, for instance, two species that have diverged for about 50 million years since their descent from a common ancestor, over 95% of the genes have homologues in each species. This does not mean, however, that the genes perform exactly the same function in each species. At least a few of these genes, although still perhaps very similar in sequence, are responsible for the genetic differences that make each species physiologically and developmentally unique. Small changes, particularly in gene regulation, can have major eventual effects on phenotype. Perhaps the most interesting question in all of biology will be to identify the genes, and the evolved changes therein, that are responsible for the significant differences between any two species.

    Given this commonality of gene content and common similarity in gene function, discovery of a close homologue in one species can provide very useful information to all other researchers interested in the same gene family. However, it is also possible to greatly misinterpret this information. For instance, a research team could find that their newly discovered rice gene shows its highest homology to a predicted protein kinase gene from Drosophila that has been associated with response to cold. This does not mean, though, that the rice gene encodes a protein kinase (although that is a testable hypothesis) or that it is involved in any response to cold (also testable). In some cases, a plant gene might be annotated as most similar to a kinase gene (for instance) from another plant species, which was annotated by its similarity to another gene, etc. It may be several steps of similarity (and tentative annotation) before any gene with a known function is actually found. In these cases, each additional annotation should be taken as being a bit more tentative. Only direct functional tests can determine the role that a gene performs, and all similarities to other genes only provide predictions of possible function. Of course, the more closely related a homologous gene is in sequence and in organism of origin, then the more likely that it will perform a similar function.
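The caution about chains of annotation can be made concrete with a small sketch that counts how many similarity steps separate a gene from any experimentally supported function. The record layout, the gene names and the function name are hypothetical and do not correspond to the format of any real sequence database.

```python
def trace_annotation(gene, annotations):
    """Follow a chain of similarity-based annotations.

    annotations maps gene -> (label, source), where source is either
    "experiment" or the name of the gene the label was copied from.
    Returns (label, steps): steps is the number of similarity hops to
    experimental evidence, or None if the chain never reaches any.
    """
    label = annotations[gene][0]
    steps = 0
    seen = set()
    current = gene
    while current in annotations and current not in seen:
        seen.add(current)
        source = annotations[current][1]
        if source == "experiment":
            return label, steps
        steps += 1
        current = source
    return label, None


# Hypothetical annotation records.
annotations = {
    "rice_gene": ("protein kinase", "arabidopsis_gene"),
    "arabidopsis_gene": ("protein kinase", "fly_kinase"),
    "fly_kinase": ("protein kinase", "experiment"),
    "orphan_gene": ("transporter", "unlisted_gene"),
}
rice_result = trace_annotation("rice_gene", annotations)
orphan_result = trace_annotation("orphan_gene", annotations)
```

The larger the step count, the more tentative the label should be considered, and a result of None flags an annotation that rests on no traceable functional test at all.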

    Beyond sequence analysis, comparative mapping has provided a new tool to comparative genomics. If two genes with high sequence homology also map to colinear locations in their genomes, then it is much more likely that they are directly descended from the same ancestral gene, and hence have a similar function (Devos and Gale, 2000).
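A minimal sketch of a colinearity check is given below. It assumes that each genome segment is available as a list of gene names in map order and that homologue pairs are already known; it then asks whether the homologues occur in the same relative order, allowing for the whole segment being inverted. The gene names and the function name are hypothetical.

```python
def is_colinear(order_a, order_b, homologs):
    """Return True if homologous genes appear in the same relative order
    (or in fully reversed order) along two mapped segments.

    order_a, order_b: gene names in map order for each species.
    homologs: maps a gene in order_a to its homologue in order_b.
    """
    pos_b = {gene: i for i, gene in enumerate(order_b)}
    mapped = [
        pos_b[homologs[gene]]
        for gene in order_a
        if gene in homologs and homologs[gene] in pos_b
    ]
    if len(mapped) < 2:
        return False  # too few shared genes to judge order
    pairs = list(zip(mapped, mapped[1:]))
    increasing = all(x < y for x, y in pairs)
    decreasing = all(x > y for x, y in pairs)
    return increasing or decreasing


# Hypothetical map orders for a rice segment and two maize segments.
rice = ["r1", "r2", "r3", "r4"]
maize_colinear = ["m1", "mX", "m2", "m3", "m4"]
maize_scrambled = ["m3", "m1", "m4", "m2"]
homologs = {"r1": "m1", "r2": "m2", "r3": "m3", "r4": "m4"}

colinear = is_colinear(rice, maize_colinear, homologs)
scrambled = is_colinear(rice, maize_scrambled, homologs)
```

Genes without a homologue in the other segment, such as the hypothetical mX, are simply ignored, so small insertions or losses do not break the test.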

    Summary

    Gene identification, comprehensive gene expression, gene inactivation or activation, and comparative analyses provide a powerful set of tools for identifying the functions of plant genes. All of these tools are universal, and all are growing synergistically in power as information is added to the field. Because so many different plant species are being investigated, functional genomics will provide a uniquely broad understanding of functional evolution in plants. Perhaps the greatest challenge will be in developing ways to present and interpret the mountains of data that will be generated. Although we will not know the precise functions of all the genes in any plant species (the ultimate goal of plant genomics) for a very long time, our level of knowledge will continue to expand at unprecedented rates for the foreseeable future.

    References

    Devos KM and Gale MD (2000) Genome relationships: the grass model in current research. The Plant Cell 12: 637–646.

    Krysan PJ, Young JC and Sussman MR (1999) T-DNA as an insertional mutagen in Arabidopsis. The Plant Cell 11: 2283–2290.

    Martienssen RA (1998) Functional genomics: probing plant gene function and expression with transposons. Proceedings of the National Academy of Sciences of the USA 95: 2021–2026.

    Richmond T and Somerville S (2000) Chasing the dream: plant EST microarrays. Current Opinion in Plant Biology 3: 108–116.

    Ruiz MT, Voinnet O and Baulcombe DC (1998) Initiation and maintenance of virus-induced gene silencing. The Plant Cell 10: 937–946.

    Weigel D, Ahn JH, Blazquez MA et al. (2000) Activation tagging in Arabidopsis. Plant Physiology 122: 1003–1014.
