
Genome-wide identification of osmotic stress response gene in Arabidopsis thaliana


Genomics 92 (2008) 488–493


Method


Yong Li a,1, Yanming Zhu a,⁎,1, Yang Liu b, Yongjun Shu a, Fanjiang Meng c, Yanmin Lu c, Xi Bai a, Bei Liu c, Dianjing Guo b,⁎

a Plant Bioengineering Laboratory, Northeast Agricultural University, Harbin, China
b State Key Lab of Agrobiotechnology and Department of Biology, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong
c Department of Computer Science, Northeast Agricultural University, Harbin, China

⁎ Corresponding authors. Fax: +852 2603 5745.
E-mail addresses: [email protected] (Y. Zhu), [email protected] (D. Guo).
1 These authors contributed equally to this work.

0888-7543/$ – see front matter © 2008 Elsevier Inc. All rights reserved.
doi:10.1016/j.ygeno.2008.08.011

Article info

Article history:
Received 5 May 2008
Accepted 18 August 2008
Available online 1 October 2008

Keywords:
Artificial neural network
Cis-regulatory element
Gene finding
Osmotic stress
Arabidopsis thaliana

Abstract

In this paper, we present a cis-regulatory element based computational approach to genome-wide identification of genes putatively responding to various osmotic stresses in Arabidopsis thaliana. The rationale of our method is that gene expression is largely controlled at the transcriptional level through the interactions between transcription factors and cis-regulatory elements. Using cis-regulatory motifs known to regulate osmotic stress response, we therefore built an artificial neural network model to identify other functionally relevant genes involved in the same process. We performed Gene Ontology enrichment analysis on the 500 top-scoring predictions and found that, except for un-annotated ORFs (∼40%), 91.3% of the enriched GO classification was related to stress response and ABA response. Publicly available gene expression profiling data of Arabidopsis under various stresses were used for cross validation. We also conducted RT-PCR analysis to experimentally verify selected predictions. According to our results, transcript levels of 27 out of 41 top-ranked genes (65.8%) altered under various osmotic stress treatments. We believe that a similar approach could be extensively adopted elsewhere to infer gene function in various cellular processes from different species.

Introduction

Identification of genes that specifically respond to internal and external cues remains one of the most compelling yet elusive areas in computational genomics. Currently, the commonly used gene finding approach is a consensus-based comparative analysis that relies on sequence homology among genes in closely related species [1]. However, this method has limited application because a considerable proportion of sequenced genomes still remain uncharacterized. Furthermore, the consensus-based method may not be an efficient way of identifying genes that are induced under specific environmental stimuli.

Gene expression is largely controlled at the transcriptional level, where the interactions between transcription factors (TFs) and cis-regulatory elements in the promoter region of a gene play crucial roles [2]. Previous research suggests that functionally related genes tend to be co-regulated by similar sets of transcription factors. It is therefore theoretically possible to use cis-regulatory motifs known to regulate gene expression in a particular cellular process to identify other functionally relevant genes involved in the same process. When combined with experimental verification, this has been demonstrated to be an effective approach to genome-wide targeted gene identification [3].

Drought, high salinity, and low temperature are three major stresses that adversely affect plant growth, development, or productivity. Since all three types of stress can cause water deficit, they are collectively termed osmotic stress. Osmotic stress elicits a dehydration response in plants that shares many common elements and interacting signaling pathways [2–4], which have been suggested to be abscisic acid (ABA) dependent [5]. Subsequent analysis of the promoter regions of ABA-regulated genes has led to the identification of several ABA-responsive elements (ABREs) [6,7]. Zhang et al. [3] reported a computational approach to identify putative ABA-responsive genes using the conserved ABA-responsive element (ABRE) and its coupling element (CE). Using a similar cis-element based approach, other researchers have used promoters containing well-defined TF binding motifs for targeted gene finding in Drosophila melanogaster [8] and C. elegans [9]. Although this approach has undoubtedly been successful, researchers have so far used only one or two specifically defined motifs for gene screening. In fact, a growing body of evidence suggests that functionally related genes tend to be regulated by a common set of regulatory proteins to form so-called transcription regulatory modules, which respond to internal and external signals. By organizing the genome into such modules, a living cell can coordinate the activities of many genes and carry out complex functions [10]. To infer gene function in complex cellular processes such as stress response, more sophisticated approaches are required.


Table 1
GOBP description of top predictions using ANN computational model

| GOBP term | No. of genes | P-value |
|---|---|---|
| Cold acclimation | 11 | 2.49E−04 |
| Response to abscisic acid stimulus | 18 | 6.45E−04 |
| Response to water | 14 | 0.006 |
| Response to temperature stimulus | 22 | 0.007 |
| Response to cold | 16 | 0.008 |
| Response to water deprivation | 7 | 0.013 |
| Development | 28 | 0.015 |
| Response to hormone stimulus | 18 | 0.015 |
| Response to abiotic stimulus | 34 | 0.020 |
| Response to salt stress | 5 | 0.034 |
| Seed development | 7 | 0.035 |
| Response to chemical stimulus | 25 | 0.035 |
| Reproductive structure development | 7 | 0.039 |
| Actin cytoskeleton organization and biogenesis | 4 | 0.041 |
| Response to stimulus | 44 | 0.042 |


The Artificial Neural Network (ANN) is an information processing paradigm that takes its inspiration from the way in which biological nervous systems process information. It has been previously used for transcription start site prediction [6,11,12] and translation initiation site prediction [13]. In this study, we extend the ANN modeling approach to plant functional genomics by identification of genes responding to drought, cold, and salinity stress in A. thaliana. We demonstrate its efficacy by Gene Ontology enrichment analysis as well as by RT-PCR analysis.

Results and discussion

In silico gene identification and gene ontology analysis

Using cis-regulatory motifs known to regulate osmotic stress response, we built an artificial neural network model to identify other relevant genes involved in the same process. The trained model (Fig. 1) was able to distinguish between genes that do and do not respond to stress, based on the motif patterns of gene promoters in the training dataset. We then applied the model to the candidate dataset to infer the function of the unknown genes.
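The caption of Fig. 1 states that Bayesian regularization minimizes a combination of squared errors and weights. As a reminder of what that objective usually looks like, a common textbook form is given below. The symbols (\alpha, \beta, E_D, E_W, t_i, y_i, w_j) are standard notation assumed here for illustration and are not taken from the paper.

```latex
\[
F(\mathbf{w}) = \beta\,E_D + \alpha\,E_W, \qquad
E_D = \sum_{i=1}^{N} \bigl(t_i - y_i(\mathbf{w})\bigr)^2, \qquad
E_W = \sum_{j=1}^{M} w_j^2
\]
```

Here E_D is the sum of squared errors between the targets t_i and the network outputs y_i, E_W is the sum of squared weights, and the ratio of \alpha to \beta controls how strongly large weights are penalized, which is what encourages a network that generalizes well.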

According to network theory [14,15], genes in a co-expression context often share conserved biological functions. To investigate the significant functional annotation of our predictions, we selected the 500 top-scoring predictions (Supplementary file 3, 4) and performed Gene Ontology (GO) enrichment analysis (Tables 1 and 2). GO provides a controlled vocabulary for describing genes and gene products in living organisms [16]. We used terms from “Biological Process” (GOBP), which is one of the three broad GO categories (the other two being “Molecular Function” and “Cellular Component”), to represent gene function. GOBP terms are organized into a directed acyclic graph (DAG) to reflect the hierarchical relationships between the terms. Parent GOBP terms are subdivided into increasingly specific child GOBP terms. The GOBP term “response to stress” has 19 child terms, including GO:0009409 [response to cold], GO:0009408 [response to heat], and GO:0009414 [response to water deprivation]. The GO enrichment analysis compares the annotation composition in the analyzed gene list to that of a population of background genes. We used the DAVID default population background in our enrichment calculation, representing the corresponding genome-wide genes with at least one annotation in the analyzing categories. The default background is a good choice for studies in genome-wide scope or close to genome-wide scope.

Fig. 1. Training of ANN model. Bayesian regularization is a network training function that updates the weight and bias values according to Levenberg–Marquardt optimization. It minimizes a combination of squared errors and weights, and then determines the correct combination so as to produce a network that generalizes well. A. Error curves generated during network training. Validation set error (green) line and test set error (red) line follow a similar pattern, indicating a reliable training result. B. Weight curve stabilizes with the training, indicating that a stable network topology is achieved. C. Change of parameters formed from a combination of training error and weight during the training process.

According to our GO enrichment analysis (Table 1), once un-annotated ORFs (∼40%) were discarded, more than 90% (73/80 = 91.3%) of the significantly enriched GO classification was related to stress response or ABA response. ABA is known to play a protective role in plant response to osmotic stress [17–19], and a large number of genes responding to abiotic stress are also inducible by ABA treatment [19]. The GO enrichment analysis clearly demonstrated that our cis-regulatory motif based computational model is a reliable method of inferring gene function at the genome level.

Cross validation using gene expression profiling data

Since many functionally related genes display coordinated transcriptional regulation, large-scale gene expression measurements can serve as a check point for our in silico prediction. Comparison of our prediction with stress microarray data from AtGenExpress revealed that about 30% of the top-scoring gene transcripts were significantly changed upon stress treatment (P-value ≈ 0). However, microarray gene expression data, although a powerful tool for providing global transcriptome information, is notoriously noisy and discontinuous. Some genes that are rapidly and transiently induced by stress may not be detected by microarray analysis. At1g16540, which encodes molybdenum cofactor sulfurase, was predicted as a stress response gene by our method. Although direct measurement of its transcript abundance is lacking, previous studies have indicated that this protein may play an important role in regulating several genes relevant to stress, including RD29A, COR15, COR47, RD22, and P5CS [20,21]. However, published Arabidopsis stress microarray data have failed to record the differential expression of this gene. Another top-scoring gene, At2g23430 (GO:0009737), which encodes a cyclin-dependent kinase inhibitor known to respond to ABA stimulus [22], also failed to be detected in the stress microarray data. Furthermore, gene expression at the mRNA level does not always reflect the protein function due to post-transcriptional and post-translational regulation. Our method may serve as a complementary approach to gene expression analysis in stress response gene identification.

Table 2
Top-scoring osmotic stress-inducible genes and experimental verification

| Locus ID | Description | Treatment | Fold change |
|---|---|---|---|
| At3g29575 | Similar to TMAC2 | Salt | +2.2 |
| At3g22380 | Nucleus-acting plant-specific clock regulator | Cold | +5.6 |
| At2g40970 | Myb family transcription factor | Drought | +6.3 |
| At2g39530 | Integral membrane protein | Drought | −8.7 |
| At2g27990 | BEL1-like homeobox gene | Drought | +10 |
| At2g26720 | Plastocyanin-like gene | Salt | +2.5 |
| At2g25620 | Protein phosphatase | Drought | +5.3 |
| At2g20370 | Exostosin family protein | Drought | +6.7 |
| At5g64780 | Hypothetical | Salt | −3.2 |
| At5g50920 | Similar to ATP-dependent Clp protease | Salt | −2.6 |
| At4g28703 | Hypothetical | Cold | +2.4 |
| At4g25520 | SLK3 (SEUSS-LIKE 3); transcription regulator | Salt | −9.4 |
| At4g20990 | Carbonic anhydrase family protein | Salt | −4.8 |
| At4g16155 | Dihydrolipoamide dehydrogenase | Cold | +1.6 |
| At4g04870 | Cardiolipin synthase | Cold | −5.7 |
| At2g01720 | Ribophorin I family protein | Cold | −2.2 |
| At3g07390 | Hypothetical | Cold | −3.6 |
| At4g21020 | Hypothetical | Salt | +4.6 |
| At3g28380 | ABC transporter | Salt | −2.8 |
| At5g32482 | Similar to CCHC-type zinc knuckle family protein | Cold | +2.3 |
| At3g57280 | Hypothetical | Cold | −3.6 |
| At3g45600 | TETRASPANIN family member | Cold | +4.4 |
| At3g44730 | Kinesin motor protein-related | Cold | −5.8 |
| At3g44410 | Similar to disease resistance protein | Salt | +8.9 |
| At1g16540 | Molybdenum cofactor sulfurase | Salt | +2.6 |
| At3g20720 | Hypothetical | Cold | +3.1 |

Fig. 2. Comparative E-northern analysis of top-scoring genes vs. randomly selected genes. Genes were labeled by different IDs (“At...” for high scoring genes and “at...” for randomly selected genes). Rows represent genes and columns represent experimental conditions, such as high salinity, cold, or drought stress etc. The color at a point represents the log2 of the ratio of the average of replicate treatments relative to the average of corresponding controls.

Fig. 3. RT-PCR analysis of high scoring predictions. Four-week-old Arabidopsis seedlings were subjected to high salinity, cold, or drought for 4 h, 48 h or 72 h respectively. RT-PCR analysis was performed to measure the transcript abundance with Actin 2 as internal control.

To provide visual evidence to support our computational prediction, we generated an Electronic-Northern (E-northern) heat map (Fig. 2) for 42 top-scoring genes (upper part of the image) and 42 randomly selected genes from outside the prediction list (lower part of the image), using the Expression Browser tool of the Botany Array Resource (BAR) [22] and the AtGenExpress stress data. From the E-northern analysis, we observed up-regulation or down-regulation of more than 60% (28 out of 42) of our predictions. Strong up- and down-regulation among the genes in the training data was also observed (data not shown). By contrast, randomly selected genes showed much less change upon stress treatment. Since a gene that shows significant alteration at the transcript level under stress conditions is likely to be involved in the response to stress, we conclude that the regulatory motifs in the promoter region are highly indicative of gene function and can therefore be used for in silico gene identification. The rapid development of cis-element databases such as TRANSFAC [23], PLACE [24,25] and PlantCARE [26,27] provides a valuable resource that can be used to identify more cis-regulatory motifs relevant to cellular response to various internal and external stimuli.
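The color scale of the E-northern heat map is described in the caption of Fig. 2 as the log2 of the ratio of the average of replicate treatments to the average of the corresponding controls. A minimal sketch of that calculation follows; the function name and the example intensities are hypothetical and chosen only for illustration.

```python
import math


def log2_ratio(treatment_replicates, control_replicates):
    """log2 of (mean treatment signal / mean control signal).

    Positive values indicate up-regulation under stress, negative values
    down-regulation. Inputs are assumed to be positive expression signals.
    """
    mean_treatment = sum(treatment_replicates) / len(treatment_replicates)
    mean_control = sum(control_replicates) / len(control_replicates)
    return math.log2(mean_treatment / mean_control)


# Hypothetical intensities: a four-fold increase gives a value of 2.
print(log2_ratio([200.0, 600.0], [100.0, 100.0]))
```

A value of +1 therefore corresponds to a doubling of the average signal and −1 to a halving.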

Experimental validation and comparison with other methods

To further assess the effectiveness of our method, Arabidopsis seedlings were subjected to various osmotic stress treatments (see Materials and methods) and the transcript abundance of selected high scoring genes was monitored by RT-PCR analysis (Fig. 3). Gene-specific primers were designed based on the full-length cDNA sequence. Overall, 27 of 41 tested genes, including At1g16540, which encodes molybdenum cofactor sulfurase, showed an altered level of expression compared to the control, giving a prediction accuracy of 65.8% for the top-scoring predictions (Supplementary figure 1). Table 2 summarizes the RT-PCR results of osmotic stress-inducible genes. Some of the tested genes exhibited varied expression patterns under different types of stress treatment (data not shown), suggesting possible synergistic effects of multiple transcription factors involved in various types of osmotic stress. We believe that the identification of distinctive classes of transcription-factor binding motifs under specific stress conditions will facilitate more efficient computational discovery of stress relevant genes. Knowledge of this kind will also play an important part in helping us to understand the cross-talk between distinctive signaling pathways and the underlying regulatory machinery of cellular stress response.

Cis-regulatory element based gene identification has been reported in three previous studies [3,8,9]. The researchers concerned utilized one or two well-defined cis-regulatory motifs to search for functionally related genes and achieved various degrees of prediction accuracy (ranging from 34% to 72%). As a comparison, we used a set of diverse regulatory motifs (55 in total, Supplementary file 1) identified from promoters of both experimentally validated and computationally predicted stress responsive genes to train an ANN model. The learned model achieved a comparable prediction accuracy of 65.8% for the top-scoring predictions in our study. Our results, together with those from other groups, further demonstrate that cis-regulatory motifs are highly indicative of gene function. As more information on transcription factors and their DNA binding becomes freely available, we believe that this computational approach could be extensively adopted elsewhere to infer gene function in various cellular processes from different organisms.

Materials and methods

Stress responsive genes and cis-regulatory elements

Cis-regulatory elements in the promoter region of drought, salinity, and/or cold stress responsive genes were collected from the public databases PLACE [24,25], PlantCARE [26,27], and DoOP [28]. Other motifs were collected by reference to previous studies in this field. Redundant motifs were eliminated and a total of 55 cis-acting elements were retained for further analysis. A Bioperl module was used to search for significant motifs occurring in the promoter region. A P-value was calculated to confirm the significance of motif detection (Poisson distribution [16]).
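The text does not spell out how the Poisson P-value was computed. One common way to do it, sketched below purely as an illustration, is to compare the observed number of motif hits with the number expected in a random sequence of the same length. The uniform base frequency of 0.25, the restriction to exact A/C/G/T motifs, the motif string, and the function names are all assumptions made for this example and not details from the paper or from the tool-kit cited as [16].

```python
import math


def expected_hits(motif, promoter_length, base_prob=0.25):
    """Expected number of exact motif matches in a random sequence.

    Assumes independent bases, each occurring with probability `base_prob`.
    """
    positions = max(promoter_length - len(motif) + 1, 0)
    return positions * base_prob ** len(motif)


def poisson_p_value(observed, expected):
    """Upper-tail probability P(X >= observed) for X ~ Poisson(expected)."""
    if observed <= 0:
        return 1.0
    below = sum(
        math.exp(-expected) * expected ** k / math.factorial(k)
        for k in range(observed)
    )
    return max(0.0, 1.0 - below)


# Illustrative 6-base motif in a 520-base promoter window.
lam = expected_hits("ACGTGG", 520)
print(lam, poisson_p_value(2, lam))
```

A small P-value indicates that the motif occurs more often than a random sequence of that length would suggest. A real analysis would likely use the genome's actual base composition and handle degenerate motif symbols.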

Promoter sequences

Arabidopsis genome sequences were downloaded from TAIR [29]. Transcription start sites (TSSs) were predicted using TSSP-TCM software from Shahmuradov's group. When multiple TSSs were predicted, the one closest to the ORF was chosen. For each given TSS, we retrieved a segment from 500 bases upstream to 20 bases downstream of the TSS for motif analysis (Supplementary file 2). In total, the TSSs of 18,061 ORFs were retrieved.
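The TSS selection rule and the promoter window described above can be sketched as follows. This is a simplified illustration and not the pipeline used in the study: it handles forward-strand genes only, treats coordinates as 0-based string indices, and the function names are hypothetical.

```python
def closest_tss(tss_positions, orf_start):
    """Pick the predicted TSS nearest to the ORF start.

    If two candidates are equally close, the first one listed is returned.
    """
    return min(tss_positions, key=lambda pos: abs(pos - orf_start))


def promoter_window(chrom_seq, tss, upstream=500, downstream=20):
    """Return the segment from `upstream` bases before the TSS to
    `downstream` bases after it, clipped at the chromosome ends.

    Assumes a forward-strand gene and a 0-based TSS index.
    """
    start = max(0, tss - upstream)
    end = min(len(chrom_seq), tss + downstream)
    return chrom_seq[start:end]


# Toy chromosome of 1200 bases; the full window is 520 bases long.
chrom = "ACGT" * 300
tss = closest_tss([100, 450, 900], orf_start=500)
print(tss, len(promoter_window(chrom, tss)))
```

For genes on the reverse strand the window would need to be taken on the other side of the TSS and reverse-complemented, which is omitted here for brevity.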


Scoring algorithm

A Bioperl module was used to search for significant motifs occurring in the promoter region of reported stress responsive genes. A P-value was calculated to confirm the significance of motif detection. The ANN toolkit in Matlab was used to establish a feed-forward cascade neural network model. For network training and simulation, we retrieved the promoter regions of 362 genes annotated as “response to drought, high salinity, or cold stress” according to Gene Ontology terminology [30,31] and used these as a positive dataset. The promoter sequences of 1086 randomly selected ORFs (three times the size of the positive dataset) from the rest of the gene pool (not annotated as “response to stress or ABA treatment” according to GO) were used as a negative dataset. The number of times each cis-regulatory element appears in the promoter region and the ratio of cis-element length to promoter length (which we define as coverage) were taken as inputs for the network training. Principal component analysis was conducted to eliminate the input nodes with the least effect.
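The two input features described above, the number of occurrences of each cis-regulatory element and its coverage, can be illustrated with a short sketch. The study itself used a Bioperl module; the Python below is only an assumed, simplified equivalent that counts exact forward-strand matches (overlaps allowed) and takes the definition of coverage literally as motif length divided by promoter length. The function names and the toy sequence are hypothetical.

```python
import re


def count_motif(promoter, motif):
    """Count exact occurrences of `motif` in `promoter`, allowing overlaps."""
    pattern = re.compile("(?=" + re.escape(motif) + ")")
    return sum(1 for _ in pattern.finditer(promoter))


def coverage(promoter, motif):
    """Ratio of motif length to promoter length, as defined in the text."""
    return len(motif) / len(promoter)


def feature_vector(promoter, motifs):
    """Build the network input as [count, coverage] pairs, one per motif."""
    features = []
    for motif in motifs:
        features.append(float(count_motif(promoter, motif)))
        features.append(coverage(promoter, motif))
    return features


# Toy promoter and two illustrative motifs.
print(feature_vector("ACGTGACGTGTTTT", ["ACGTG", "TT"]))
```

With the 55 motifs used in the study this layout would give 110 inputs per promoter before any reduction by principal component analysis. Whether the original inputs were arranged in this way is not stated in the text.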

Gene expression data analysis and GO enrichment

Microarray gene expression data was collected from AtGenExpress [32]. The dataset includes global Arabidopsis transcriptome profile changes in response to UV-B light, high salinity, drought and cold stress. The raw data was normalized using RMAExpress [32,33] and differentially expressed genes were detected using BRB ArrayTools [34] (P < 0.05). The subset of differentially expressed genes contains 3276 gene transcript identifiers with significantly higher abundance at least once per treatment and per time-point. We performed GO term enrichment analysis using the software suite DAVID 2007 (The Database for Annotation, Visualization and Integrated Discovery 2007) [35]. The enrichment analysis compares the annotation composition in the analyzed gene list to that of a population of background genes. We used the DAVID default population background, representing the corresponding genome-wide genes with at least one annotation in the analyzing categories, in our enrichment calculation.
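The idea of comparing the annotation composition of a gene list with that of a background population can be illustrated with a plain hypergeometric test, sketched below. This is not DAVID's own statistic, which is a modified Fisher exact test, so the values will not match DAVID output exactly. The function name and the example counts are hypothetical.

```python
from math import comb


def enrichment_p_value(list_hits, list_size, background_hits, background_size):
    """Upper-tail hypergeometric probability P(X >= list_hits).

    X is the number of genes carrying a given GO term when `list_size`
    genes are drawn at random from a background of `background_size`
    genes, of which `background_hits` carry the term.
    """
    total = comb(background_size, list_size)
    upper = min(list_size, background_hits)
    tail = sum(
        comb(background_hits, k)
        * comb(background_size - background_hits, list_size - k)
        for k in range(list_hits, upper + 1)
    )
    return tail / total


# Hypothetical counts: 3 of 5 listed genes carry a term that 10 of 50
# background genes carry.
print(enrichment_p_value(3, 5, 10, 50))
```

A small P-value means the term is over-represented in the list relative to the background. Because many GO terms are tested at once, some form of multiple-testing correction would normally be applied as well.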

Plant materials, stress treatment, and RT-PCR analysis

Four-week-old seedlings were grown in a controlled environment growth chamber under a 16 h light/8 h dark period, a photon fluence rate of 3000 lx, and a temperature of 22 °C. For salinity stress treatment, seedlings were subjected to 250 mM NaCl and samples were collected at the 4 h, 12 h and 24 h time points. For cold stress and drought stress, seedlings were incubated at 4 °C and 22 °C respectively in the dark, and samples were collected at 24 h, 48 h and 72 h. Samples were also collected from control plants grown under the same conditions for parallel comparison.

Total RNA was extracted from whole seedlings of the control plants and stressed plants using the RNeasy Plant Mini Kit (Qiagen, Germany). First-strand cDNA was synthesized from 1.5 μg of total RNA using the SuperScript II kit (Invitrogen) with Oligo(dT)12–18 mer, in accordance with the manufacturer's instructions. PCR was performed using 0.5 to 2 μL of the cDNA in a total reaction volume of 25 μL. The PCR conditions were 2 min at 94 °C, followed by 29 cycles of 1 min at 94 °C, 1 min at 58 °C and 1 min at 72 °C, and a final 5 min at 72 °C. These conditions were chosen because none of the samples analyzed reached a plateau at the end of the amplification (i.e. they were at the exponential phase of the amplification). Actin was used as a loading control, and loading was estimated by staining the gel with ethidium bromide. Expression analysis of each gene was confirmed in at least 2 independent RT reactions using forward and reverse primers. Scanned Molecular Dynamics gel images were quantitatively analyzed using the AFLP-Quantar-Pro™ image analysis software (Keygene, Belgium).

Acknowledgments

This project was supported by a grant from the National Natural Science Foundation of China (30570990), and in part by grants from the Science and Technology Department of Heilongjiang province and the University Grants Committee, Hong Kong UGC AoE plant Agricultural Biotechnology Project (AoE B-07/09).

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.ygeno.2008.08.011.

References

[1] M. Zhang, Computational prediction of eukaryotic protein-coding genes, Nat. Rev. Genet. 3 (2002) 698–709.

[2] A. Brivanlou, J. Darnell, Signal transduction and the control of gene expression, Science 295 (2002) 813–818.

[3] W.X. Zhang, J.H. Ruan, T.H. Ho, Y.S. You, T.T. Yu, R.S. Quatrano, Cis-regulatory element based targeted gene finding: genome-wide identification of abscisic acid- and abiotic stress-responsive genes in Arabidopsis thaliana, Bioinformatics 21 (14) (2005) 3074–3081.

[4] E.A. Bray, Molecular responses to water deficit, Plant Physiol. 103 (1993) 1035–1040.

[5] K. Shinozaki, K. Yamaguchi-Shinozaki, Molecular responses to dehydration and low temperature: difference and cross-talk between two stress signaling pathways, Curr. Opin. Plant Biol. 3 (2000) 217–223.

[6] I. Mahadevan, I. Ghosh, Analysis of E. coli promoter structures using neural networks, Nucleic Acids Res. 22 (1994) 2158–2165.

[7] P.K. Busk, M. Pages, Regulation of abscisic acid-induced transcription, Plant Mol. Biol. 37 (1998) 425–435.

[8] M. Markstein, et al., Genome-wide analysis of clustered Dorsal binding sites identifies putative target genes in the Drosophila embryo, Proc. Natl. Acad. Sci. U. S. A. 22 (2002) 763–768.

[9] A. Wenick, O. Hobert, Genomic cis-regulatory architecture and trans-acting regulators of a single interneuron-specific gene battery in C. elegans, Cell Dev. (2004).

[10] W.S. Wu, W.H. Li, B.S. Chen, Computational reconstruction of transcriptional regulatory modules of the yeast cell cycle, BMC Bioinformatics 7 (2006) 421.

[11] M.G. Reese, Application of a time-delay neural network to promoter annotation in the Drosophila melanogaster genome, Comput. Chem. 26 (1) (2001) 51–56.

[12] B. Demeler, G. Zhou, Neural network optimization for E. coli promoter prediction, Nucleic Acids Res. 19 (1991) 1593–1599.

[13] A.G. Pedersen, H. Nielsen, Neural network prediction of translation initiation sites in eukaryotes, ISMB 5 (1997) 226–233.

[14] A.L. Barabasi, Z.N. Oltvai, Network biology: understanding the cell's functional organization, Nat. Rev. Genet. 5 (2) (2004) 101–113.

[15] A. Subramanian, P. Tamayo, V.K. Mootha, S. Mukherjee, B.L. Ebert, M.A. Gillette, A. Paulovich, S.L. Pomeroy, T.R. Golub, E.S. Lander, J.P. Mesirov, From the cover: gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles, Proc. Natl. Acad. Sci. 102 (43) (2005) 15545–15550.

[16] N.H. Shah, D.C. King, P.N. Shah, N.V. Fedoroff, A tool-kit for cDNA microarray and promoter analysis, Bioinformatics 19 (14) (2003) 1846–1848.

[17] L. Xiong, J.K. Zhu, Molecular and genetic aspects of plant responses to osmotic stress, Plant Cell Environ. 25 (2002) 131–139.

[18] K. Shinozaki, K. Yamaguchi-Shinozaki, M. Seki, Regulatory network of gene expression in the drought and cold stress responses, Curr. Opin. Plant Biol. 6 (2003) 410–417.

[19] Y. Narusaka, K. Nakashima, Z.K. Shinwari, Y. Sakuma, T. Furihata, H. Abe, M. Narusaka, K. Shinozaki, K. Yamaguchi-Shinozaki, Interaction between two cis-acting elements, ABRE and DRE, in ABA-dependent expression of Arabidopsis rd29A gene in response to dehydration and high-salinity stresses, Plant J. 34 (2003) 137–148.

[20] F. Bittner, M. Oreb, R.M. Ralf, ABA3 is a molybdenum cofactor sulfurase required for activation of aldehyde oxidase and xanthine dehydrogenase in Arabidopsis thaliana, J. Biol. Chem. 276 (44) (2001) 40381–40384.

[21] J. Kilian, D. Whitehead, J. Horak, D. Wanke, S. Weinl, O. Batistic, C. D'Angel, E.B. Bauer, J. Kudla, K. Harter, The AtGenExpress global stress expression dataset: protocols, evaluation and model data analysis of UV-B light, drought and cold stress responses, Plant J. 50 (2) (2007) 347–363.

[22] K. Toufighi, S.M. Brady, R. Austin, E. Ly, N.J. Provart, The Botany Array Resource: e-Northerns, Expression Angling, and promoter analyses, Plant J. 43 (1) (2005) 153–163.

[23] V. Matys, TRANSFAC: transcriptional regulation, from patterns to profiles, Nucleic Acids Res. 31 (2003) 374–378.


[24] K. Higo, Y. Ugawa, M. Iwamoto, T. Korenaga, Plant cis-acting regulatory DNA elements (PLACE) database, Nucleic Acids Res. 27 (1999) 297–300.

[25] A Database of Plant Cis-acting Regulatory DNA Elements: http://www.dna.affrc.go.jp/PLACE/.

[26] S. Rombauts, P. Déhais, M. Van Montagu, P. Rouzé, PlantCARE, a plant cis-acting regulatory element database, Nucleic Acids Res. 27 (1) (1999) 295–296.

[27] PlantCARE: http://bioinformatics.psb.ugent.be/webtools/plantcare/html/.

[28] E. Barta, E. Sebestyén, T.B. Pálfy, G. Tóth, C.P. Ortutay, L. Patthy, DoOP: databases of orthologous promoters, collections of clusters of orthologous upstream sequences from chordates and plants, Nucleic Acids Res. 33 (2005) D86–D90.

[29] The Arabidopsis Information Resource (TAIR): http://www.arabidopsis.org.

[30] The Gene Ontology Consortium, Gene Ontology: tool for the unification of biology, Nature Genet. 25 (2000) 25–29.

[31] Gene Ontology: http://www.geneontology.org/.

[32] B.M. Bolstad, R.A. Irizarry, M. Astrand, T.P. Speed, A comparison of normalization methods for high density oligonucleotide array data based on bias and variance, Bioinformatics 19 (2) (2003) 185–193.

[33] R.A. Irizarry, B.M. Bolstad, F. Collin, L.M. Cope, B. Hobbs, T.P. Speed, Summaries of Affymetrix GeneChip probe level data, Nucleic Acids Res. 31 (4) (2003) e15.

[34] BRB ArrayTools: http://linus.nci.nih.gov/BRB-ArrayTools.html.

[35] DAVID: http://david.abcc.ncifcrf.gov/.