
International Journal of Systems Biology and Biomedical Technologies, 1(3), 1-39, July-September 2012 1

Copyright © 2012, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.

Keywords: Biological Information, Data Mining, Gene Networks, Meta-Analysis, Microarray

INTRODUCTION

The ability to investigate an organism’s entire genomic sequence has revolutionized biological sciences. One aspect of this phenomenon was the fabrication of gene microarrays in the late 1980s (Fodor et al., 1991). Array-based high-throughput gene expression analysis is widely used in many research fields; gene expression microarrays have been used in numerous applications, including the identification of novel genes associated with diseases, most notably cancers (Lee, 2006; Kim et al., 2005; Al Moustafa et al., 2002; Lancaster et al., 2006), tumor classification (Perez-Diez, Morgun, & Shulzhenko, 2007; Nguyen & Rocke, 2002; Ray, 2011; Dagliyan, Uney-Yuksektepe, Kavakli, & Turkay, 2011; Best et al., 2003) and the prediction of patient outcome (Mischel, Cloughesy, & Nelson, 2004; Simon, 2003; Futschik, Sullivan, Reeve, & Kasabov, 2003; Michiels, Koscielny, & Hill, 2005; Liu, Li, & Wong, 2005), as well

Data Mining and Meta-Analysis on DNA Microarray Data

Triantafyllos Paparountas, Biomedical Sciences Research Center “Alexander Fleming,” Greece

Maria Nefeli Nikolaidou-Katsaridou, Biomedical Sciences Research Center “Alexander Fleming,” Greece

Gabriella Rustici, European Molecular Biology Laboratory - European Bioinformatics Institute, UK

Vasilis Aidinis, Biomedical Sciences Research Center “Alexander Fleming,” Greece

ABSTRACT

Microarray technology enables high-throughput parallel gene expression analysis, and use has grown exponentially thanks to the development of a variety of applications for expression, genetics and epigenetic studies. A wealth of data is now available from public repositories, providing unprecedented opportunities for meta-analysis approaches, which could generate new biological information, unrelated to the original scope of individual studies. This study provides a guideline for identification of biological significance of the statistically-selected differentially-expressed genes derived from gene expression arrays as well as to suggest further analysis pathways. The authors review the prerequisites for data-mining and meta-analysis, summarize the conceptual methods to derive biological information from microarray data and suggest software for each category of data mining or meta-analysis.

DOI: 10.4018/ijsbbt.2012070101



as the identification of cell line-related drug chemosensitivity (Amundson et al., 2000; Dan et al., 2002; Kikuchi et al., 2003; Sax & El-Deiry, 2003; Ikeda, Jinno, & Shirane, 2007; Baggerly & Coombes, 2009; Ory et al., 2011).

Typically, a microarray experiment generates a list of genes that have been identified as statistically significant differentially expressed genes (DEGs). The real challenge that follows is to assign biological significance to the results and to reconstruct pathways of interactions among DEGs. Several software tools for pathway analysis, gene ontology analysis and gene prioritization are routinely used for identifying common features in lists of DEGs.

As the quantity and size of microarray datasets continue to grow (Table 2, Microarray repositories), researchers are provided with a rich data resource, but also face interoperability and data management issues. The primary data should be stored in a MIAME (Minimum Information About a Microarray Experiment) compliant format. MIAME is a set of guidelines outlining the minimum information that should be included when describing a microarray experiment; it is required in order to allow the experimental results to be interpreted unambiguously and the experiment to be potentially reproduced (Brazma et al., 2001). Complementary to the standardization of data storage, workflows (School of Computer Science, 2008) (Table 3, Holistic Approaches) offer a solution to data management and analysis issues, as they enable the automated and systematic use of distributed bioinformatics data and applications from the scientist’s desktop. In order to address reliability concerns as well as other performance, quality, and data analysis issues, the National Center for Toxicological Research (NCTR) has initiated the MicroArray Quality Control (MAQC) project (Shi et al., 2006, 2010), in response to the FDA’s (U.S. Food and Drug Administration, n.d.) Critical Path Initiative (Coons, 2009; Mahajan & Gupta, 2010; Woodcock & Woosley, 2008). The main target of this initiative is to develop guidelines for microarray data analysis and provide the public with large reference datasets.

1. PREREQUISITES FOR DATA MINING

Generating high quality microarray data requires applying stringent quality control measures and best practices at each individual step of the process, starting with the choice of the most appropriate experimental design for the study, the correct experimental platform, and the protocols for sample preparation and processing, and ending with the data analysis approach for normalization and statistical analysis. Chuaqui et al. (2002) provide a short review on the validation of primary analysis methods; Allison, Cui, Page, and Sabripour (2006), Dupuy and Simon (2007), Ioannidis et al. (2009) and Shi et al. (2010) discuss the reasons for discrepancies in results when raw data are reanalyzed by different teams; and Troester, Millikan, and Perou (2009) provide a short list of guidelines for statistical analysis and reporting of microarray studies.

1.1. Experimental Design

Experimental design is one of the most important aspects of a successful experiment aimed at identifying differential gene expression patterns. Proper experimental design is crucial to ensure that the biological questions of interest can be answered, and answered accurately. Appropriate experimental design (Churchill, 2002; Festing & Altman, 2002; Qiu, 2007; Shaw, Festing, Peers, & Furlong, 2002) allows a more accurate identification of DEGs and prediction of false positives (Benjamini & Hochberg, 1995; Reiner, Yekutieli, & Benjamini, 2003; Wolfinger et al., 2001). The fundamental principles of experimental design are simplicity, replication and statistical power (Festing & Altman, 2002), and bias prevention through randomization and blocking (Damaraju, 2005; Johnson & Besselsen, 2002).

1.1.1. Replication

The major sources of variability in microarray experiments are the effects of the treatment group, subject, sample, gene, probe and noise. Ideally, to estimate the statistically significant changes while accounting for the noise introduced and for unwanted variance factors, replication should be done at the level of the group, the subject and the probe.

Replication safeguards against Type I errors (false positives) and thus ensures results of high statistical significance (Rao, 2009).

Issues that should be taken into consideration when designing an experiment are: the aim of the experiment, the finances governing the number of slides and the amount of biological material required, design extensibility, and the validation method. These factors determine the number of biological replicates or, in the case of few biological replicates, the number of technical replicates that should be used in the experiment (Wei, Li, & Bumgarner, 2004) (Figure 1). The number of replicates (Dobbin & Simon, 2005) depends on the type of array technology chosen (Irizarry et al., 2005), the dye bias (Dobbin, Kawasaki, Petersen, & Simon, 2005), the quality of manufacturing (Mecham et al., 2004), the specific number of arrayed genes and the tolerance level of false positives (Wang, Hessner, Wu, Pati, & Ghosh, 2003). When high within-group signal variance is expected (de Reynies et al., 2006), a higher number of replicates per group is needed to account for false negatives (see statistical power).

The term ‘technical replicates’ refers to multiple arrays hybridized with RNA isolated from a single sample, or to multiple replicates of a single gene on the surface of an array. The term ‘biological replicates’ refers to RNA samples isolated from multiple individuals of a population, treatment and/or group, each hybridized to a different microarray, or to a different array in the case of multi-welled chips. Technical replicates are used mainly for quality control and to assess the reproducibility of the method, whereas biological replicates are used to strengthen the statistical power to detect significant DEGs.

1.1.2. Statistical Power

Statistical power refers to the ability of a statistical test to avoid a Type II error (false negative). The evaluation of the power of a design, referred to as ‘power analysis,’ allows the calculation of the minimum number of replicates needed to detect an effect of a given size (Festing & Altman, 2002). Experiments utilizing subjects with a homogeneous genetic background need fewer subjects to achieve good statistical power. This translates into the ability to detect smaller treatment responses with fewer animals (Festing & Altman, 2002). Useful software packages for calculating power are G*Power (Faul, Erdfelder, Buchner, & Lang, 2009; Faul, Erdfelder, Lang, & Buchner, 2007) and NCSS PASS (NCSS Inc., Utah, USA).

In his article, Churchill (2002) described a simple way to assess statistical power. The method has evolved since, but this approach still holds value, mainly due to its simplicity. According to Churchill, the analysis can be carried out by determining the ‘degrees of freedom’ or ‘Df.’ The ‘Df’ may be calculated in the following way: first count the number of independent units; in the case of multiple treatment factors, all combinations that occur should be counted. From this sum subtract the number of distinct treatments to obtain the ‘Df.’ The ‘Df’ score should be more than 5 in order to ensure that the experiment has enough statistical power to support an analysis based on biological variance.
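The counting rule above can be illustrated with a minimal Python sketch. The function name, the 2 x 2 design and the group sizes are hypothetical and chosen only for illustration; the threshold of "more than 5" follows the text above.

```python
def degrees_of_freedom(units_per_treatment):
    """Independent units minus distinct treatments (treatment combinations)."""
    total_units = sum(units_per_treatment.values())
    distinct_treatments = len(units_per_treatment)
    return total_units - distinct_treatments


# Hypothetical 2 x 2 design (genotype x drug), four animals per combination.
design = {
    ("wild-type", "vehicle"): 4,
    ("wild-type", "drug"): 4,
    ("mutant", "vehicle"): 4,
    ("mutant", "drug"): 4,
}

df = degrees_of_freedom(design)   # 16 units - 4 treatments
has_enough_power = df > 5         # rule of thumb quoted in the text
print(df, has_enough_power)
```

Under this rule a two-group comparison with three animals per group gives a Df of 4, which falls below the suggested threshold.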

1.1.3. Randomization

Randomization in microarray experiments concerns (a) the order in which samples are hybridized and (b) the placement of probes on the arrays. In the first case, randomization accounts for bias in expression levels due to the batch processing effect (for a microarray that holds one sample per array) or the position effect (for a microarray that holds multiple samples per array) (Rao, 2009). Randomization of the position of the probes on each array, on the other hand, prevents the propagation of spatial effects during intensity measurement. If the placement of probes is not randomized, measurements from the training stage to the validation stage may have different biases (Verdugo, Deschepper, Munoz, Pomp, & Churchill, 2009; Barnes, Freudenberg, Thompson, Aronow, & Pavlidis, 2005). It should be noted that, in their assessment of whether such probe-transcript mapping influences the expression values reported by the same platform, Kitchen et al. (2011) report that no such correlation was observed.

1.1.4. Blocking and Block Randomization

Extraneous factors may affect the gene expression that is quantified through the array platforms. The phenomenon that occurs when it is not possible to disentangle the effects of two or more extraneous factors is referred to as confounding (Everitt, 2007; Pearl, 1998). The two effects are usually referred to as aliases. Common examples of confounding factors are gender and age in epidemiological studies, where a trait can be attributed to age or gender and not only to the treatment. In the case of microarrays, the technology behind array construction may also be a confounding factor. During an experiment there are two stages at which confounding factors can be accounted for: the first is during the experimental design, by achieving better experimental control (Johnson & Besselsen, 2002) over the entangled factors (better factor separation during grouping), and the second is during statistical analysis, by applying statistical methods that account for confounding and thus avoid related Type I errors.

A technique applied during experimental design to isolate and, if necessary, eliminate variability due to extraneous causes (Everitt, 2007), and thus produce a better estimate of treatment effects, is termed (randomized) blocking (Damaraju, 2005; Festing & Altman, 2002). Under this design strategy, samples are divided into subgroups called ‘blocks’ so that variability within blocks is less than variability between blocks. Multi-arrayed chips, like NimbleGen 12-well arrays, are especially suited to the randomized blocking technique. When a one-chip-per-sample strategy is used on chips with standardized placement of probes and with no (or minimal) replication of probe sets, like Affymetrix MOE 133A2, HG-U95 or HG-U133 chips, it is impossible to separate array-to-array variability from sample-to-sample variability (Rao, 2009).

Figure 1. Elucidation of differences between technical and biological replicates

Attempts to correct for confounded effects by statistical modeling alone reduce the power of detection for true differential expression, thus leading to an increased rate of false-positive results in the confounded design. Proper normalization (see normalization) improves differential expression testing in both experiments (confounded or not), but randomization has been proven to be the most important factor for establishing accurate results (Verdugo et al., 2009).
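As an illustration of randomized blocking on multi-well chips, the following Python sketch treats each chip as a block, spreads every treatment group evenly over the chips and then shuffles the well positions. It is a simplified sketch under stated assumptions: the function name, sample labels, group sizes and the fixed seed are hypothetical, and real designs may need to balance further factors such as dye or processing date.

```python
import random


def randomized_block_layout(samples_by_group, wells_per_chip, seed=0):
    """Return one list of (group, sample) pairs per chip (block)."""
    rng = random.Random(seed)
    groups = sorted(samples_by_group)
    per_group = wells_per_chip // len(groups)  # samples of each group per chip
    if per_group < 1:
        raise ValueError("a chip must hold at least one sample per group")
    # Shuffle within each group so the subject-to-chip assignment is random.
    pools = {g: rng.sample(samples_by_group[g], len(samples_by_group[g]))
             for g in groups}
    chips = []
    while any(pools[g] for g in groups):
        chip = []
        for g in groups:
            take, pools[g] = pools[g][:per_group], pools[g][per_group:]
            chip.extend((g, s) for s in take)
        rng.shuffle(chip)  # randomize the well position within the chip
        chips.append(chip)
    return chips


# Hypothetical experiment: 12 control and 12 treated samples on 12-well chips.
samples = {
    "control": [f"C{i}" for i in range(1, 13)],
    "treated": [f"T{i}" for i in range(1, 13)],
}
chips = randomized_block_layout(samples, wells_per_chip=12)
for number, chip in enumerate(chips, start=1):
    print(number, chip)
```

Because every chip carries the same number of samples from each group, chip-to-chip variability is not confounded with the treatment effect.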

1.2. Choice of Microarray Platform

The choice of a microarray platform (Table 4, Microarray suppliers) should be based, apart from the cost, on the chip availability for the species under analysis, genome coverage, the starting amount of RNA needed, the quality of array manufacturing, the validity and availability of software tools for image analysis, the quality of gene annotation combined with assured company support in the future, and intra-platform variability. Intra-platform variability and reproducibility have been used as measures of data quality (Yauk & Berndt, 2007). Experiments have been carried out to determine the effective differences in accuracy (proximity to the true value) (de Reynies et al., 2006), sensitivity (ability to accurately detect changes at low concentrations), and specificity (ability to hybridize to the correct gene) among the technologies (Draghici, Khatri, Eklund, & Szallasi, 2006; Hardiman, 2004; van Bakel & Holstege, 2004).

1.3. Quality Controls

Quality controls have been established to ensure the quality of the sample both before and after hybridization, and they provide crucial information on whether or not to use a sample or an array for downstream analysis. Quality controls are divided into two broad categories: biological and software. The choice of the method to apply depends entirely on the step of the experiment. Biological quality controls, which are carried out prior to hybridization, aim at controlling the quality of the prepared RNA sample. Instruments like the Agilent 2100 Bioanalyzer™ and the Nanodrop™ spectrophotometer (NB: Nanodrop can only check the quantity) offer users the ability to assess the quality and quantity of the RNA samples (Kiewe et al., 2009; Thompson & Hackett, 2008). Another measure of quality used at this step is the ‘Frequency Of Incorporation’ (FOI). The FOI is a measure of the level of dye incorporation into a labeled nucleic acid sample. FOI measurements are important to check labeling consistency, and to provide a guide as to how much probe is required for hybridization. FOI requires prior determination of the DNA or RNA product yield and the amount of dye attached to it. The picomoles of dye present are calculated from the dye’s extinction coefficient, and through this the FOI is determined (Promega Inc., 2012).
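The FOI calculation can be sketched in Python as follows. The text above only states that the picomoles of dye are derived from the extinction coefficient; the remaining details are assumptions made for illustration: the Beer-Lambert law with a 1 cm path length, an FOI expressed as dye molecules per 1000 nucleotides, and an average nucleotide mass of 324.5 g/mol. The absorbance, volume, yield and extinction coefficient in the example are hypothetical inputs; the documentation of the labeling kit in use should be consulted for the actual constants.

```python
NUCLEOTIDE_MASS = 324.5  # assumed average mass of one nucleotide, in g/mol


def dye_pmol(absorbance, volume_ul, extinction_coefficient, path_cm=1.0):
    """Beer-Lambert law: picomoles of dye in the measured volume."""
    molar = absorbance / (extinction_coefficient * path_cm)  # mol/L
    return molar * volume_ul * 1e-6 * 1e12                   # uL -> L, mol -> pmol


def frequency_of_incorporation(pmol_dye, nucleic_acid_ng):
    """Dye molecules per 1000 nucleotides (assumed definition of FOI)."""
    nmol_nucleotides = nucleic_acid_ng / NUCLEOTIDE_MASS     # ng / (g/mol) = nmol
    return pmol_dye / nmol_nucleotides


# Hypothetical measurement: A = 0.30 in 60 uL, dye with an extinction
# coefficient of 150000 per M per cm, and 1500 ng of labeled product.
pmol = dye_pmol(absorbance=0.30, volume_ul=60, extinction_coefficient=150_000)
foi = frequency_of_incorporation(pmol, nucleic_acid_ng=1500)
print(round(pmol, 1), round(foi, 2))
```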

Following hybridization, software quality controls come into play. This type of quality control relies on image analysis. One example is the control of the uniformity of the hybridization, e.g., with border element control plots in the case of Affymetrix chips (Affymetrix Inc., 2004). Based on software quality controls, pre-filtering/masking and/or background/signal adjustment are applied to edit out portions of the array image or to balance the intensities of areas with high or low signal. Masking refers to applications of microarray signal correction that account for cross hybridization (Naef, Lim, Patil, & Magnasco, 2002; Naef & Magnasco, 2003), array scratches, improper scanner configuration (Shi et al., 2005; Timlin, 2006), spot light saturation and washing issues (Yauk, Berndt, Williams, & Douglas, 2005) that may have occurred (Speed, 2003). Masking prevents the normalization algorithm from parsing the signals of the excluded areas. A number of different DNA microarray platforms use spiked-in targets to check the performance of the sample preparation and hybridization.

1.4. Normalization

Normalization is performed to correct for systematic differences between samples on the same slide, or between slides, which do not represent true biological variation but are the result of biases introduced throughout the procedure. Normalization is fundamental for experiments to be combined and/or compared. It focuses on adjusting the individual hybridization intensities in order to balance them appropriately so that meaningful biological comparisons can be made (Quackenbush, 2002). Signal scaling factors are used to assess the overall signal quality of the arrays. Apart from a low number of biological replicates, which can affect the strength of the statistical analysis, poor quality of chip construction negatively influences the analysis of differential expression. The signal is adjusted so that the estimated expression values fall on a proper scale. Data must be normalized to remove systematic biases, which include sample preparation, variability in hybridization, spatial effects, scanner settings and experimenter bias (Mecham, Nelson, & Storey, 2010; Argyropoulos et al., 2006). The decision as to which normalization method is appropriate may depend on the biological nature of the dataset examined. For each microarray technology there is a preferred normalization method (Argyropoulos et al., 2006; Bolstad, Irizarry, Astrand, & Speed, 2003; Wu, Xing, Myers, Mian, & Bissell, 2005). Typical normalization methods include global mean or median normalization (Bilban, Buehler, Head, Desoye, & Quaranta, 2002), rank invariant normalization (Tseng, Oh, Rohlin, Liao, & Wong, 2001), quantile (Bolstad et al., 2003), contrast (Astrand, 2003), LOWESS/LOESS methods (Cleveland, Grosse, & Shyu, 1991) and cyclic loess (Dudoit, Yang, Speed, & Callow, 2002). For many types of commercial arrays, R-Bioconductor (Team, 2008; Gentleman et al., 2004) packages can be used for background adjustment and data normalization (Bolstad et al., 2003), including RMA (Robust Multi-Array Average expression measure) (Irizarry et al., 2003), GCRMA (Robust Multi-Array Average expression measure using sequence information) (Wu, Irizarry, Gentleman, Martinez-Murillo, & Spencer, 2004), VSN (Variance Stabilization and Normalization) (Huber, von Heydebreck, Sultmann, Poustka, & Vingron, 2002) and the method of Li and Wong (2001). Data from spike-in experiments, where the mRNA ratios of a set of artificial clones are known, may be used to determine the relative merits of a set of analysis methods (Ryden et al., 2006).
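To make one of the listed methods concrete, the following is a minimal pure-Python sketch of the basic idea behind quantile normalization: every array is forced to share the same intensity distribution, namely the mean of the sorted intensities across arrays. It is an illustration only, not the Bioconductor implementation; ties are not averaged, and the small intensity matrix is hypothetical.

```python
def quantile_normalize(matrix):
    """matrix: list of rows (probes), each a list of per-array intensities."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_cols)]
    # Reference distribution: mean of the k-th smallest value over all arrays.
    sorted_cols = [sorted(col) for col in columns]
    reference = [sum(sc[k] for sc in sorted_cols) / n_cols
                 for k in range(n_rows)]
    normalized = [[0.0] * n_cols for _ in range(n_rows)]
    for j, col in enumerate(columns):
        # Rank the probes within array j, then substitute the reference value.
        order = sorted(range(n_rows), key=lambda i: col[i])
        for rank, i in enumerate(order):
            normalized[i][j] = reference[rank]
    return normalized


# Hypothetical intensities: 4 probes (rows) measured on 3 arrays (columns).
raw = [
    [5.0, 4.0, 3.0],
    [2.0, 1.0, 4.0],
    [3.0, 6.0, 6.0],
    [4.0, 2.0, 8.0],
]
norm = quantile_normalize(raw)
for row in norm:
    print([round(v, 3) for v in row])
```

After normalization each column contains the same set of values, and only the rank of each probe within its array is preserved.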

1.5. Missing Values

Missing values are a serious concern, inherent to the technology behind the manufacturing of many microarrays. Two-color arrays suffer more from missing values than other microarray platforms (e.g., because of array scratches, improper scanner configuration, spot light saturation etc.) (Jornsten, Ouyang, & Wang, 2007). When opting for a platform in which missing values are inherent to the array creation, one possible solution is to exclude whole slides that appear problematic. However, this solution is impractical, since usually no slide is perfect and modern arrays contain tens of thousands of probes, making measurements more sensitive to artifacts. Imputation of missing values (Donders, van der Heijden, Stijnen, & Moons, 2006) is best done either by using many replicates within the same logical set (Jornsten et al., 2007) or by intra-chip probe replication (Du, 2010; Lin, Du, Huber, & Kibbe, 2008), which is especially helpful in the case of custom-built arrays (MYcroarray.com, 2011).
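A deliberately simple illustration of replicate-based imputation is sketched below in Python: a missing measurement for a probe is replaced by the mean of the observed replicates of that probe within the same group. This is an assumption-laden simplification and not the method of the cited works; the function name and values are hypothetical.

```python
def impute_from_replicates(values):
    """Replace None by the mean of the observed replicates of the same probe."""
    observed = [v for v in values if v is not None]
    if not observed:
        return list(values)  # nothing to borrow from, so the gap is left as is
    replicate_mean = sum(observed) / len(observed)
    return [replicate_mean if v is None else v for v in values]


# Hypothetical log2 intensities of one probe across three replicate arrays.
probe = [7.9, None, 8.1]
filled = impute_from_replicates(probe)
print(filled)
```

With many replicates the estimate becomes more stable, which is one reason replication within the same logical set helps imputation.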

1.6. Statistical Selection

Statistical selection is applied to identify the list of statistically significant differentially expressed genes out of the total set of genes found on the arrays. Several statistical selection methods are currently available to test the hypothesis of a gene being differentially expressed. The two main categories are as follows: (i) the parametric tests, like the t-test or, for experiments that compare more than two factors at the same time, ANOVA (Cui & Churchill, 2003; Dudoit et al., 2002; Ideker, Thorsson, Siegel, & Hood, 2000; Kerr, Martin, & Churchill, 2000; Park et al., 2003) and (ii) the non-parametric tests, like the Wilcoxon signed-rank and Kruskal-Wallis tests (Conover, 1980), both of which can be applied to cDNA or oligonucleotide arrays (Affymetrix Inc.) (Tusher, Tibshirani, & Chu, 2001).

These tests result in each gene being given a statistical significance score (p-value). A threshold is then applied to the score to determine, together with the fold change difference of each gene, the DEGs. A common problem with this approach is that, while a strict p-value threshold provides assurance of statistical significance, many genes do not reach this threshold, resulting in a limited number of statistically significant genes and even fewer DEGs. This is often due to few replicates being tested, in which case the best decision is to use Rank Products (Breitling, Armengaud, Amtmann, & Herzyk, 2004; Breitling & Herzyk, 2005). Another issue is the multiple comparisons problem: as the number of individual tests increases, so does the likelihood that an observation satisfies the acceptance criterion by chance alone. Methods to minimize this problem include control of the false discovery rate (Benjamini & Hochberg, 1995; Efron & Tibshirani, 2002; Jung, 2005; Keselman, Cribbie, & Holland, 2002; Reiner et al., 2003; Shedden et al., 2005; Storey, 2002; Tibshirani, 2006; van den Oord & Sullivan, 2003; Yang, Yang, McIndoe, & She, 2003) and the Bonferroni corrections (Holm, 1979). The R library ‘limma’ is considered to be the most widely used package for statistical selection in microarray analysis (Smyth, 2004), and is based on a linear modeling approach to fit microarray intensity data.
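The two families of multiple-testing correction mentioned above can be compared with a short Python sketch. It implements the plain Bonferroni adjustment and the Benjamini-Hochberg step-up adjustment; the p-values are hypothetical, and in practice established packages such as those named in the text would normally be used instead.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (false discovery rate control)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-gene p-values from a differential expression test.
p = [0.001, 0.008, 0.012, 0.041, 0.20]
bonf = bonferroni(p)
bh = benjamini_hochberg(p)
selected_bonf = sum(q < 0.05 for q in bonf)
selected_bh = sum(q < 0.05 for q in bh)
print(selected_bonf, selected_bh)
```

In this toy example the false discovery rate approach retains three genes at the 0.05 level while Bonferroni retains two, which illustrates why the former is usually less conservative for long gene lists.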

1.7. Annotation

Annotation is required to proceed to data mining. Primary annotation uses X,Y map coordinates to link the position of the signal on the microarray surface to the probe ID (Affymetrix Inc.). In a second step, the annotation associated with the probe sequence is retrieved through reference databases (Draghici, Sellamuthu, & Khatri, 2006; Durinck et al., 2005; Haider et al., 2009; Smedley et al., 2009) (Table 6, Gene ID conversion). These steps produce information from the list of DEGs that will be used to extract knowledge through data mining (see data-mining). The importance of updating the annotation prior to data mining cannot be stressed enough (Barbosa-Morais et al., 2010; Liu et al., 2007; Lu, Lee, Salit, & Cam, 2007; Sandberg & Larsson, 2007), the main reasons being that certain probes may be mis-targeting or deprecated, or that new information related to the biology behind the coded oligonucleotide sequence may have been recently uncovered. Annotation update prior to data mining has proven to lead to results of higher quality (Dai et al., 2005; Gautier, Moller, Friis-Hansen, & Knudsen, 2004; Sandberg & Larsson, 2007; Elo et al., 2005) and to better biological interpretation of the DEGs list, and has also aided the comparative analysis of datasets (Tzouvelekis et al., 2007) by providing a map of orthologous genes between species.

2. DATA-MINING: DERIVING BIOLOGICAL INFORMATION FROM MICROARRAY EXPERIMENTS

A successful primary analysis of a microarray experiment leads to a list of statistically significant DEGs. DNA microarray studies often implicate hundreds of genes in the pathogenesis of complex diseases, affecting many different mechanisms and pathways. How can such complexity be understood? How can hypotheses be formulated and tested? To extract the biological information from the lists of DEGs, we need to apply methods to build or identify gene networks that connect the DEGs to common functions, biological pathways, regulatory elements, similarly expressed genes, existing literature, previous experimental data suggesting high specificity roles, and mutation and disease related information. An updated list of links to software currently available for the extraction of biological information can be found at http://www.bioinformatics.gr (Leung, 2007; Paparountas, 2007).

2.1. Clustering

As a first step of data mining, clustering analysis can help in the identification of gene expression patterns by providing a graphical representation of experimental data. Clustering analysis can be divided into two categories: (i) supervised and (ii) unsupervised. In a supervised approach, the classes (clusters) are predefined, whereas in an unsupervised approach the classes are unknown. It is common practice to cluster microarray data after pre-processing of the data (normalization, filtering, imputation of missing values and standardization) (Figure 2).

Several clustering methods exist (Table 5, Clustering methods) (Yeung, Haynor, & Ruzzo, 2001). Clustering can be conducted per sample, per gene, or by a combination of the two, and it relies on direct comparison of gene expression (normalized intensity levels) to identify patterns of co-expression. Per gene clustering is especially useful, as it provides organized data groups that are not biased by a working hypothesis. It can be performed on the DEGs lists to identify common clusters of genes and differences between groups. The resulting sublists can fuel the further data mining presented in the following sections. Briefly, after retrieval of the annotation related to the identified subgroups of genes, we can formulate hypotheses on the genes’ function (e.g., same protein family or same cellular pathway), on their transcriptional regulation (transcription regulatory factors, miRNA) and on genes with unknown function, based on the role of the genes they co-cluster with (‘guilt by association’) (Quackenbush, 2003; Stuart, Segal, Koller, & Kim, 2003; Wolfe, Kohane, & Butte, 2005). Clustering per sample is useful to identify sub-classifications, for example to predict groups of patients, forming a primary indicator of condition outcome or of the effect of treatments with inhibitors/small molecules.
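As an illustration of per gene clustering, the sketch below groups genes by the similarity of their expression profiles using average-linkage agglomerative clustering on a correlation distance (1 minus the Pearson correlation). This is only one of the many possible methods; the gene names, profiles and the distance cut-off are hypothetical, and the code assumes that no profile is constant across samples.

```python
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equally long, non-constant profiles."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den


def cluster_genes(profiles, max_distance=0.5):
    """Average-linkage agglomerative clustering on 1 - Pearson r."""
    dist = {frozenset((g, h)): 1 - pearson(profiles[g], profiles[h])
            for g, h in combinations(profiles, 2)}
    clusters = [[g] for g in profiles]

    def linkage(a, b):
        return mean(dist[frozenset((g, h))] for g in a for h in b)

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        if linkage(a, b) > max_distance:
            break  # the closest pair is too dissimilar, so merging stops
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
    return clusters


# Hypothetical normalized expression profiles over four samples.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.5],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [8.0, 6.0, 4.5, 2.0],
}
clusters = cluster_genes(profiles)
print(clusters)
```

Genes that rise together fall into one cluster and genes that fall together into another, which is the kind of sublist that can then be passed on to annotation-based analysis.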

2.2. Knowledge Based Analysis

For this type of analysis, information stored in databases is retrieved and combined to support the formulation of a hypothesis that describes the biological relation between the genes found in the DEGs list. This type of analysis combines annotation and functional analysis tools.

The knowledge-based analysis can be a one-step or a two-step approach, depending primarily on the design complexity of the experiment. The first step is to retrieve and combine information based on relational or semantic databases. This step is the most that can be applied to an A vs B type experimental design (Churchill, 2002), which corresponds to a single Venn diagram. The second step is applied when multiple Venn diagrams need to be cross-compared. This enables the identification of common or unique traits between conditions, e.g., common KEGG pathways or common transcription factors, even when the compared DEGs sets do not contain the same probes. A following step is the identification of the gene-culprits behind the common traits.
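The cross-comparison described above can be sketched with plain set operations in Python. The example shows two DEG lists that share no gene and yet share a trait (here a pathway), followed by the retrieval of the genes behind the shared trait. Gene names, pathway labels and the annotation table are hypothetical placeholders.

```python
def venn_regions(set_a, set_b):
    """The three regions of a two-set Venn diagram."""
    return {"only_a": set_a - set_b,
            "shared": set_a & set_b,
            "only_b": set_b - set_a}


def traits_of(degs, annotation):
    """All traits (e.g., pathways) annotated to any gene of the list."""
    return {trait for gene in degs for trait in annotation.get(gene, ())}


# Hypothetical annotation: gene -> pathways.
annotation = {
    "Gene1": {"Pathway P1"},
    "Gene2": {"Pathway P2"},
    "Gene3": {"Pathway P1"},
    "Gene4": {"Pathway P3"},
}
degs_x = {"Gene1", "Gene2"}   # DEGs of the first comparison
degs_y = {"Gene3", "Gene4"}   # DEGs of the second comparison

gene_level = venn_regions(degs_x, degs_y)
trait_level = venn_regions(traits_of(degs_x, annotation),
                           traits_of(degs_y, annotation))
# Gene-culprits: the DEGs that carry one of the shared traits.
culprits = {g for g in degs_x | degs_y
            if annotation.get(g, set()) & trait_level["shared"]}
print(gene_level["shared"], trait_level["shared"], sorted(culprits))
```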

2.2.1. Data-Mining Related to Relational Databases

From relational databases, non-complex information is retrieved to provide basic information related to the genes, for example the common transcription factor binding domains present in the regulatory regions upstream of the DEGs’ start sites, microRNA binding sites, etc. Here we provide methods that connect the genes to information related to their common regulatory elements (Tables 7 and 8), drug toxicity analysis (Table 9), mutation and disease (Table 9), existing literature (Table 10), functions (Table 11), biological pathways (Tables 11 and 12), similarly expressed genes (Table 12), previous experimental data suggesting high specificity roles, and similar protein products (Table 13). Furthermore, we suggest integrative approaches that provide information based simultaneously on more than one category. It is important at all times to have a good understanding of what each tool does and of the probability of error associated with each separate error discovery procedure utilized (Gold, Miecznikowski, & Liu, 2009).

2.2.1.1. Transcription Factor Analysis & Motif Analysis Software

Statistically significant genes, or genes derived from the co-expression analysis, are parsed through software that identifies common transcription factor binding sites in upstream regions. By identifying transcription factor binding sites in common between the DEGs, it is possible to formulate hypotheses on common gene control mechanisms (or in some cases hub genes), which might be responsible for gene co-regulation. Regulatory regions are generally conserved across species, and this principle has led to the development of positional prediction tools (Pavlidis, Furey, Liberto, Haussler, & Grundy, 2001). Currently there is a plethora of available string search tools (Table 7, Transcription Factor and motif analysis), each with its own approach and true positive detection potency.

2.2.1.2. MicroRNA Discovery

Software parsers may uncover common hidden binding sites of miRNAs (Lee, Feinbaum, & Ambros, 1993; Ruvkun, 2001). Each miRNA is processed from a primary transcript, known as pri-miRNA, to a short stem-loop structure called pre-miRNA and finally to the functional miRNA. Experimentally derived miRNA sequences are often used as training sets in order to identify miRNA sequences across species with high evolutionary conservation. Some characteristic features are the stem-loop hairpin structure found on the pre-miRNAs, the conservation of sequence and secondary structure of the hairpin across species, and the clustering of miRNAs within close proximity to one another. A list of available search tools is provided (Table 8, miRNA), each utilizing its own database to search for common miRNAs.

Figure 2. Layout of the main experiment analysis, data-mining and meta-analysis procedures



2.2.1.3. Drug Toxicity Analysis and Bioentity Analysis

Specialized databanks for the identification of chemical substances that may target the identified genes or their products can be queried with drug toxicity analysis tools (Table 9, Disease/Toxicity). The principle behind this method is to enrich gene lists with drugs or toxic agents that are known to affect the expression or the downstream regulation of the identified genes. This knowledge environment includes data derived from small molecules and small-molecule screens, and resources for studying the data so that biological and medical insights can be gained. There are a number of different databanks that store an increasingly varied set of cell measurements derived from, among other biological objects, cell lines treated with small molecules. Pharmaceutical companies have their own databanks and analysis tools that allow the relationships between cell states, cell measurements and small molecules to be determined. Database access through commercial entities permits conditional use of such data.

2.2.1.4. Genetic Linkage Analysis

Genetic linkage relates to genetic loci or alleles of genes that are inherited jointly. Genetic loci on the same chromosome are physically connected and tend to segregate together during meiosis. Maps of the genetically linked regions that show the position of known genes and/or genetic markers relative to each other in terms of recombination frequency, rather than as specific physical distance along each chromosome, are built in order to facilitate linkage mapping. This is critical for identifying the location of genes that cause genetic diseases.

In an attempt to combine gene expression analysis with genetic linkage analysis, all differentially expressed genes are mapped to the chromosomes together with the known quantitative trait loci (QTL, chromosomal regions/genes segregating with a quantitative trait) (Aidinis et al., 2005; Tzouvelekis et al., 2007).
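The mapping step described above can be sketched as a simple interval-overlap test. This is a minimal illustration, assuming that each gene and QTL is described by a chromosome name and base-pair coordinates; the `Region` type, the `genes_in_qtl` helper and all names and coordinates are hypothetical and are not taken from the cited studies.

```python
from typing import NamedTuple

class Region(NamedTuple):
    """A named interval on a chromosome (coordinates in base pairs)."""
    name: str
    chrom: str
    start: int
    end: int

def genes_in_qtl(genes, qtls):
    """Map each QTL name to the DEGs whose span overlaps the QTL interval."""
    hits = {q.name: [] for q in qtls}
    for q in qtls:
        for g in genes:
            same_chrom = g.chrom == q.chrom
            overlaps = g.start <= q.end and g.end >= q.start
            if same_chrom and overlaps:
                hits[q.name].append(g.name)
    return hits

# Hypothetical coordinates, for illustration only.
degs = [
    Region("geneA", "chr1", 1_000, 2_000),
    Region("geneB", "chr1", 9_000, 9_500),
    Region("geneC", "chr2", 1_500, 1_800),
]
qtls = [Region("qtl1", "chr1", 1_500, 5_000)]
colocalized = genes_in_qtl(degs, qtls)
print(colocalized)
```

Genes that co-localize with a QTL in this way would then be candidates for closer inspection; real analyses would additionally need consistent genome builds for the gene and QTL coordinates.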

2.2.2. Semantic-Ontology Data Mining

Ontologies provide controlled vocabularies to describe concepts and relationships between them, thereby enabling knowledge sharing (Gruber, 1993). Utilization of information stored in semantic-ontology databases is considered the second subtype of knowledge-based data mining and facilitates a higher-level search among the individual genes constituting the list of DEGs. This analysis is based on the theory that networks in nature are often characterized by a small number of highly connected nodes, while the majority of nodes have few connections. The highly connected nodes serve as hubs that affect many other nodes. The process identifies such hubs that have key roles in the network. In other words, it aims at annotating the results by reducing their complexity, so that a large number of genes is transformed into a shorter list of biological themes (Larsson, Wennmalm, & Sandberg, 2006).
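The notion of a hub can be made concrete with a short sketch. It assumes that the network is given as a list of undirected gene-gene edges without duplicates and that connectivity is measured by node degree, which is only one of several possible measures; the `find_hubs` helper, the `top_fraction` cut-off and the gene names are hypothetical.

```python
from collections import defaultdict

def find_hubs(edges, top_fraction=0.1):
    """Return the most highly connected nodes as (node, degree) pairs."""
    degree = defaultdict(int)
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    # Sort by decreasing degree; ties are broken alphabetically.
    ranked = sorted(degree, key=lambda n: (-degree[n], n))
    n_hubs = max(1, int(len(ranked) * top_fraction))
    return [(n, degree[n]) for n in ranked[:n_hubs]]

# Hypothetical network: one regulator connected to nine genes.
edges = [("TF1", f"g{i}") for i in range(1, 10)] + [("g1", "g2")]
hubs = find_hubs(edges)
print(hubs)
```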

Currently there are many such structured vocabularies (Jegga, 2006) used to represent biological entities and functions, although each is specialized in a certain field of biomedical science. The OBO Foundry is an initiative for the development of new biomedical ontologies that establishes the set of principles for ontology formation (Smith et al., 2007). The Ontology Lookup Service (OLS) (Cote, Jones, Apweiler, & Hermjakob, 2006) (EBI) provides a web service interface to query multiple ontologies from a single location with a unified output format. BioPortal (http://bioportal.bioontology.org/) is a Web-based application for accessing and sharing biomedical ontologies. The three major types of ontology analysis are (i) literature analysis, (ii) functional analysis and (iii) pathway analysis.

2.2.2.1. Literature Analysis

Literature analysis aims at finding associations between genes according to information found in the literature. The simplest way is to find the defined search terms in the literature by text-mining. An advancement of the method is to create gene networks based on the number of times that a relationship has been mentioned in the literature (Table 10, Literature analysis software). A semantic approach to literature analysis utilizes the ontology related to the MeSH terminology of the Medline repository. The MeSH vocabulary is a distinctive feature of the MEDLINE database produced by the United States National Library of Medicine.
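A co-occurrence network of this kind can be sketched in a few lines. The example assumes that abstracts are available as plain strings and that a gene is "mentioned" when its symbol appears as a whole token; the `cooccurrence_network` helper and the toy abstracts are hypothetical, and real text-mining tools handle synonyms and ambiguity far more carefully.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_network(abstracts, genes):
    """Count, for each gene pair, the abstracts in which both symbols occur."""
    counts = Counter()
    for text in abstracts:
        # Crude tokenization: upper-case, strip commas and periods.
        tokens = set(text.upper().replace(",", " ").replace(".", " ").split())
        present = sorted(g for g in genes if g.upper() in tokens)
        for pair in combinations(present, 2):
            counts[pair] += 1
    return counts

# Toy abstracts, for illustration only.
abstracts = [
    "TP53 and MDM2 interact in tumors.",
    "MDM2 regulates TP53, while BRCA1 acts independently.",
    "BRCA1 repairs DNA.",
]
network = cooccurrence_network(abstracts, ["TP53", "MDM2", "BRCA1"])
print(network.most_common())
```

The edge weights (co-mention counts) can then be thresholded to keep only relationships that are supported by several publications.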

2.2.2.2. Functional Analysis

Functional analysis relies on stored information about the location, function and interaction of genes or gene products. It provides a biological interpretation for the data obtained from the primary analysis. The most often used tools are discussed in this paper.

The most widely accepted method for functional analysis is based on Gene Ontology (GO) terms (Aidinis et al., 2005). The GO project (Ashburner et al., 2000) captures and organizes the increasing knowledge on gene properties into three controlled vocabularies describing a gene product in terms of its associated biological processes, cellular components and molecular functions in a species-independent manner. GO terms enriched among a list of DEGs can provide insight into the biological processes and provide a link between biological knowledge and either gene expression profiles or proteomics data (GO-Slim). Additionally, by using this technique it is possible to map GO terms and incorporate manual GO annotation into one's own databases to enhance a given dataset or to validate automated ways of deriving information about gene function (text-mining) (Table 11, Gene ontology analysis software).

2.2.2.3. Pathway Analysis

This approach aims at identifying metabolic pathways that might be over-represented among members of a given gene list. One of the most commonly used resources for pathway enrichment analysis is the KEGG database (Kyoto Encyclopedia of Genes and Genomes) (Kanehisa, Goto, Kawashima, Okuno, & Hattori, 2004). Assessment as to whether a pathway has been activated or not can be carried out in two ways: either by examining the ratio of the active genes divided by the total number of genes known for their role in that pathway, or by identifying whether certain pathways have statistically significant over-representation of active genes according to the results of the hypergeometric test. The additional ability to overlay gene expression details can significantly promote biological interpretation, especially in kinetics-based microarray experiments (Table 12, Pathway analysis software).
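Both assessment routes can be illustrated with a short sketch. It assumes a background of N annotated genes, of which K belong to the pathway, and a list of n active genes, of which k fall in the pathway; the function names and the example numbers are hypothetical. The p-value is the upper tail of the hypergeometric distribution, i.e. the probability of observing k or more pathway genes by chance.

```python
from math import comb

def active_ratio(k, K):
    """Fraction of the pathway's known genes that are active."""
    return k / K

def hypergeom_pvalue(N, K, n, k):
    """P(X >= k) when n genes are drawn without replacement from N,
    of which K belong to the pathway."""
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)

# Toy numbers: 10 background genes, 4 in the pathway,
# 3 active genes, 2 of them in the pathway.
ratio = active_ratio(2, 4)
p_value = hypergeom_pvalue(N=10, K=4, n=3, k=2)
print(ratio, p_value)
```

When many pathways are tested, the resulting p-values would normally be corrected for multiple testing before any pathway is declared over-represented.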

2.2.3. Integrative Data-Mining

Another approach to knowledge-based analysis is to combine the findings of the two types mentioned above to produce results in a top-down (minimal detailed information) or a bottom-up approach (maximization of detailed combined information).

2.2.3.1. Gene Prioritization

Gene prioritization is a process to identify and prioritize genes of interest according to their similarity to a custom-made list of genes that is known a priori to be involved in a particular disease or phenotype. Currently two software suites excel in this field, namely Endeavour (Aerts et al., 2006) and GeneWanderer (Kohler, Bauer, Horn, & Robinson, 2008). Endeavour uses a number of different data sources, including both vocabulary-based sources (such as GO) and other data sources (such as BLAST and microarray databases). The ranking of a test gene for a given data source is calculated based on its similarity with the training genes, while the final prioritization is calculated based on order statistics of the individual rankings. GeneWanderer utilizes interaction data retrieved from major databases of protein interactions (HPRD (Peri et al., 2004), BIND (Alfarano et al., 2005), BioGrid (Stark et al., 2006), IntAct (Kerrien et al., 2007), and DIP (Salwinski et al., 2004)) to create a protein-protein interaction (PPI) network. Gene prioritization is achieved by ranking each of the genes of interest according to (i) the relative position of a test gene to a training gene and (ii) the number of interactions of a test gene with different training genes. The main difference between the two suites is that Endeavour utilizes methods of “shortest path” and “direct interaction” that identify local properties to rank candidate genes, while GeneWanderer utilizes an algorithm for random walk or diffusion kernel that identifies global characteristics of the interaction network.
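The idea of ranking candidates by a random walk on an interaction network can be sketched as follows. This is a generic random walk with restart, not the actual GeneWanderer implementation; it assumes an undirected network stored as an adjacency dictionary in which every node has at least one neighbour, and the `random_walk_with_restart` helper, the `restart` probability and the toy network are hypothetical.

```python
def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Return the steady-state visiting probability of each node for a
    walker that restarts at the seed (training) genes with probability
    `restart` at every step."""
    nodes = sorted(adj)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for n in nodes:
            # Each node passes its probability equally to its neighbours.
            share = (1.0 - restart) * p[n] / len(adj[n])
            for m in adj[n]:
                nxt[m] += share
        diff = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if diff < tol:
            break
    return p

# Toy chain network A - B - C - D with A as the only training gene.
adj = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
seeds = {"A"}
scores = random_walk_with_restart(adj, seeds)
ranking = sorted((n for n in scores if n not in seeds), key=lambda n: -scores[n])
print(ranking)
```

In this toy network the candidates are ranked by their proximity to the training gene; on a realistic network the score also reflects how many distinct paths connect a candidate to the training set, which is what distinguishes a global measure from a purely local one.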

2.2.3.2. Gene Set Enrichment Analysis (GSEA)

Genes of certain groups may be the controlling factors for phenotypes; still, the individual genes of those groups may not be directly related to the phenotype under analysis. Gene groupings are made according to biological function, chromosomal location, or regulation. This approach has two advantages: (i) GSEA provides a way to integrate multiple data-mining tests and (ii) apart from over-representation analysis, it provides the option to take into account the expression levels of the DEGs list, so that a 10x expression will weigh more than a 2x expression after over-representation analysis, a feature that the current software for GO, miRNA, transcription factor and pathway analyses does not provide. The main inhibiting factor for this kind of analysis is the non-controllable quality and amount of information available for each individual gene, a common problem in all data mining software, while a second one is the fact that GSEA does not integrate a wide variety of data sources. Characteristic software includes GSEA (Subramanian et al., 2005), PAGE (Kim & Volsky, 2005) and GeneTrail (Backes et al., 2007).
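How expression levels can enter an enrichment statistic is illustrated below with a weighted running-sum score in the spirit of the statistic described by Subramanian et al. (2005). This is a simplified sketch, not the GSEA software itself: the permutation-based significance estimate is omitted, and the `enrichment_score` helper, the `weight` exponent and the toy ranked list are hypothetical.

```python
def enrichment_score(ranked, gene_set, weight=1.0):
    """Weighted running-sum enrichment score.

    `ranked` is a list of (gene, score) pairs sorted by decreasing score.
    Genes in `gene_set` raise the running sum in proportion to
    |score| ** weight; all other genes lower it by a constant step.
    The returned value is the running-sum extreme farthest from zero.
    """
    hits = [abs(s) ** weight if g in gene_set else 0.0 for g, s in ranked]
    total_hit = sum(hits)
    n_miss = sum(1 for g, _ in ranked if g not in gene_set)
    if total_hit == 0 or n_miss == 0:
        raise ValueError("gene set must cover some, but not all, ranked genes")
    running, best = 0.0, 0.0
    for (g, _), h in zip(ranked, hits):
        running += h / total_hit if g in gene_set else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best

# Toy ranked list (e.g. log fold changes), for illustration only.
ranked = [("g1", 10.0), ("g2", 8.0), ("g3", 2.0),
          ("g4", 1.5), ("g5", -1.0), ("g6", -3.0)]
es_top = enrichment_score(ranked, {"g1", "g2"})
es_bottom = enrichment_score(ranked, {"g5", "g6"})
print(es_top, es_bottom)
```

A set concentrated at the top of the list yields a positive score and a set concentrated at the bottom a negative one; with `weight` above zero, strongly changed genes contribute more than weakly changed ones.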

2.2.3.3. Information Retrieval of Disease and Protein

The retrieval of detailed gene information and related proteins/diseases at an early stage of the analysis may lead to the formation of biological hypotheses that might influence downstream interpretation. This information can be utilized in order to better understand human biology, to predict potential disease risks, and to stimulate the development of new therapies to prevent and treat these diseases. DNA microarray studies of gene-interaction networks of complex diseases may contain modules of co-regulated or interacting genes that have distinct biological functions. Such modules may be linked to specific gene polymorphisms, transcription factors, cellular functions and disease mechanisms. Genes that are reliably active only in the context of their modules can be considered markers for particular modules and may thus be promising candidates for biomarkers or therapeutic targets (Benson & Breitling, 2006).

Diseases are often linked to proteins; therefore a better understanding of protein interactions is essential. Protein-protein interactions are key determinants of protein function. Protein-protein interaction maps can serve as a suitable base to anchor genomics/gene expression, small interfering RNAs and microRNAs (siRNA/miRNA), protein function and post-translational modifications, metabolic/signaling pathways and genetics/clinically-relevant information, as previously demonstrated by the maps generated for model organisms, such as H. pylori (Rain et al., 2001), yeast (Uetz et al., 2000; Gavin et al., 2002; Han et al., 2004), C. elegans (Li et al., 2004), and Drosophila (Giot et al., 2003). These maps can represent an entire organism, a particular cell type or a tissue or an organ such as the mammalian brain (Choudhary & Grant, 2004) (Table 13, Protein-protein interactions).

3. META-ANALYSIS

Decisions about the validity of a hypothesis cannot be based on the results of a single study, due to intrinsic variability. Rather, a mechanism is needed to integrate data across studies. Meta-analysis is the statistical procedure for combining data from multiple studies. Meta-analysis aims to minimize systematic variations due to technical reasons such as lab effect and microarray platform, or biological factors such as circadian rhythm, stress or species-specific intricacies, while enabling recognition of real differences and extraction of valid cross-experiment information. A first target of such analyses is the biological interpretation of a group of data; when the effect of a treatment is consistent from one study to the next, meta-analysis can be used to identify this common effect. When the effect varies from one study to the next, meta-analysis may be used to identify the reason for the variation.
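The distinction between a common effect and a varying effect can be illustrated with a standard inverse-variance fixed-effect model, which is assumed here purely for illustration, since no particular model is prescribed in this review. The sketch takes one effect estimate per study (for example a log fold change of a gene) and its variance; the `fixed_effect_meta` helper and the numbers are hypothetical.

```python
def fixed_effect_meta(effects, variances):
    """Inverse-variance pooled effect, its variance, and Cochran's Q."""
    weights = [1.0 / v for v in variances]
    total_w = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total_w
    pooled_var = 1.0 / total_w
    # Q measures how far the study effects scatter around the pooled
    # effect, relative to their within-study variances.
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    return pooled, pooled_var, q

# Toy per-study log fold changes of one gene and their variances.
effects = [1.0, 1.2, 0.8]
variances = [0.04, 0.04, 0.04]
pooled, pooled_var, q = fixed_effect_meta(effects, variances)
print(pooled, pooled_var, q)
```

Under homogeneity, Q approximately follows a chi-square distribution with (number of studies - 1) degrees of freedom; a Q value well above that expectation suggests that the effect varies between studies and that the source of the variation should be investigated before pooling.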

Apart from the biological interpretation of a group of data, the second target of a meta-analysis is biomarker identification. Biomarkers are genes which, when recognized as being selectively highly expressed in a pathological condition during a gene expression analysis, help in the direct recognition of diseases.

The first rule governing a meta-analysis is the retrieval of datasets from databases containing high quality raw datasets. The retrieved datasets must be updated with the latest annotation (same IDs and same build, preferably the latest version) (Eszlinger, Krohn, Kukulska, Jarzab, & Paschke, 2007; Sandberg & Larsson, 2007). Furthermore, the selected experiments should have good annotation that provides information about the datasets (metadata). Experimental metadata should include information about protocols, microarray platform, sample characteristics, and experimental design, including sample and data relationships.

The availability of the raw data and metadata ensures the conduct of high quality analysis and is the primary concern behind the formulation of the MIAME (Brazma et al., 2001) standard. Compliance with the standard is required in order to facilitate the unambiguous interpretation of the experimental results and to potentially reproduce the experiment.

The type of meta-analysis discussed here produces a list of genes that is either supported by the findings of the constituent experiments or serves as a basis for new hypotheses drawn from further exploratory analysis. This list of genes (considered to be of higher quality than those of the individual constituent experiments) can thereafter be fed into the data mining techniques, providing the best way to create a complete, statistically supported biological interpretation of the condition(s) under question.

A presentation of meta-analysis in reference to the dataset complexity and comparison models has been discussed in past studies (Larsson et al., 2006), while others (Yauk & Berndt, 2007) have reviewed the cross-platform comparability of results.

Comparative expression profiling is a way to exploit previously collected data in relation to the list of statistically significant genes. For this method the expression profiles of the genes of interest from past and current experiments are compared. In most cases the past study results are stored as flat files or in platform-specific databases, the most prominent among them being GEO (Barrett et al., 2005) and ArrayExpress (Parkinson et al., 2005). Certain repository databases, such as T1DBase (Hulbert et al., 2007) and GEO (Barrett et al., 2005), and related database tools (Adler et al., 2009; Kapushesky et al., 2010; Rhodes et al., 2007; Wu et al., 2009) provide the option to compare the normalized raw data of past experiments from the graphical user interface, which permits the direct comparison of expression levels across experiments, thus enabling basic comparative expression profiling analysis. Databases provide expression profiling of specific genes over many experiments and organisms, most often related to a certain disease or field of study (Table 14, Meta-analysis software).

3.1. Integrative Data-Mining and Meta-Analysis

The heterogeneous mix of data and information from the field of Genome Sciences includes functional descriptions of the DNA sequence, molecular interactions, images of molecules or the phenotype of a microbe, plant or animal, and details about the environment in which these organisms live. The advent of the grid computing era has made holistic approaches to relate these data sources with high-throughput biology technologies, such as microarray and next generation sequencing, achievable. Knowledgebases such as FaceBase (Hochheiser et al., 2011) and KBase (Energy, 2010) are drawing near or have started producing actual results (Wolfson, 2008). The ultimate goal behind these multi-million dollar endeavors encompassing many fields of science is the predictive understanding of biological systems. The significant value of these projects is better recognized by (i) the development of freely available frameworks for software-database integration (The NCI Center for Bioinformatics, 2011; Hull et al., 2006; Oster et al., 2007), (ii) the hardware infrastructure to run analyses (Dinh, 2011; Fox, 2011; Halligan, Geiger, Vallejos, Greene, & Twigger, 2009; Kabachinski, 2011; Schatz, Langmead, & Salzberg, 2010), (iii) novel software tools (Blankenberg et al., 2010; Goecks, Nekrutenko, & Taylor, 2010) that are able to fully utilize the grid, and (iv) the training of new scientists on cutting-edge technology to further accelerate scientific research.

4. CONCLUSION

The aforementioned techniques demonstrate the extent of the application of microarray technology. The introduction of annotation-based approaches in data mining and meta-analysis marks a tremendous leap forward, from discovery-driven analysis to hypothesis-driven analysis, indicative of the potential gene discoveries of the immediate future. The gathering of all information for each particular experiment forms a snapshot of information for the individual tissue/disease that the microarray experiment aims to analyze. Combination of individual experimental results of different metabolic and microarray studies will make it possible to model changes throughout the system of the organism under question. Whole organism biology modeling could provide patients with individually customized medical treatment, which constitutes the scientific target in the field of systems biology. Summary points are listed in Table 1.

ACKNOWLEDGMENT

Grateful acknowledgement for proofreading goes to Dr. Elisa Cesarini, Research Assistant at Istituto di Biologia Cellulare e Neurobiologia, CNR, Rome. This work was supported by the Hellenic Ministry for Development GSRT-PENED-136 grant.

REFERENCES

Adler, P., Kolde, R., Kull, M., Tkachenko, A., Peterson, H., Reimand, J., & Vilo, J. (2009). Mining for coexpression across hundreds of datasets using novel rank aggregation and visualization methods. Genome Biology, 10(12), R139. doi:10.1186/gb-2009-10-12-r139

Aerts, S., Lambrechts, D., Maity, S., Van Loo, P., Coessens, B., & De Smet, F. (2006). Gene prioritization through genomic data fusion. Nature Biotechnology, 24(5), 537–544. doi:10.1038/nbt1203

Affymetrix Inc. (2004). Expression analysis technical manual. Retrieved from http://www.affymetrix.com/support/technical/manual/expression_manual.affx

Table 1. Summary points of article

Summary Points
• Microarray experiments may be flawed due to non-optimal sample size, RNA/DNA quality and quantity, inefficient hybridization and normalization, ability to analyze the data.
• There is no global guide for microarray analysis. Data mining depends on the needs and requirements of each individual experiment.
• Vast amount of microarray data and a number of different repositories are publicly available.
• We suggest software for each different aspect of biological information extraction, and combination of data across different datasets.
• The databases suggested in this paper can be utilized for biological information extraction from SNP, CGH, FISH, SAGE, RNA-Seq, Chip-Seq experiments.



Affymetrix Inc. (2006a). Affymetrix data analysis fundamentals. Retrieved from http://www.affymetrix.com/support/downloads/manuals/data_analysis_fundamentals_manual.pdf

Affymetrix Inc. (2006b). Affymetrix NetAFFX. Retrieved from http://www.affymetrix.com/analysis/index.affx

Aidinis, V., Carninci, P., Armaka, M., Witke, W., Harokopos, V., & Pavelka, N. (2005). Cytoskeletal rearrangements in synovial fibroblasts as a novel pathophysiological determinant of modeled rheumatoid arthritis. PLOS Genetics, 1(4), e48. doi:10.1371/journal.pgen.0010048

Al Moustafa, A. E., Alaoui-Jamali, M. A., Batist, G., Hernandez-Perez, M., Serruya, C., & Alpert, L. (2002). Identification of genes associated with head and neck carcinogenesis by cDNA microarray comparison between matched primary normal epithelial and squamous carcinoma cells. Oncogene, 21(17), 2634–2640. doi:10.1038/sj.onc.1205351

Alfarano, C., Andrade, C. E., Anthony, K., Bahroos, N., Bajec, M., & Bantoft, K. (2005). The biomolecular interaction network database and related tools 2005 update. Nucleic Acids Research, 33, 418–424. doi:10.1093/nar/gki051

Allison, D. B., Cui, X., Page, G. P., & Sabripour, M. (2006). Microarray data analysis: From disarray to consolidation and consensus. Nature Reviews. Genetics, 7(1), 55–65. doi:10.1038/nrg1749

Amundson, S. A., Myers, T. G., Scudiero, D., Kitada, S., Reed, J. C., & Fornace, A. J. Jr. (2000). An informatics approach identifying markers of chemosensitivity in human cancer cell lines. Cancer Research, 60(21), 6101–6110.

Argyropoulos, C., Chatziioannou, A. A., Nikiforidis, G., Moustakas, A., Kollias, G., & Aidinis, V. (2006). Operational criteria for selecting a cDNA microarray data normalization algorithm. Oncology Reports, 15, 983–996.

Ashburner, M., Ball, C. A., Blake, J. A., Botstein, D., Butler, H., & Cherry, J. M. (2000). Gene ontology: Tool for the unification of biology. The gene ontology consortium. Nature Genetics, 25(1), 25–29. doi:10.1038/75556

Astrand, M. (2003). Contrast normalization of oligonucleotide arrays. Journal of Computational Biology, 10(1), 95–102. doi:10.1089/106652703763255697

Backes, C., Keller, A., Kuentzer, J., Kneissl, B., Comtesse, N., & Elnakady, Y. A. (2007). GeneTrail--Advanced gene set enrichment analysis. Nucleic Acids Research, 3, 186–192. doi:10.1093/nar/gkm323

Baggerly, K. A., & Coombes, K. R. (2009). Deriving chemosensitivity from cell lines: Forensic bioinformatics and reproducible research in high-throughput biology. Ann. Appl. Stat., 3(4), 25. doi:10.1214/09-AOAS291

Barbosa-Morais, N. L., Dunning, M. J., Samarajiwa, S. A., Darot, J. F., Ritchie, M. E., Lynch, A. G., & Tavare, S. (2010). A re-annotation pipeline for Illumina BeadArrays: Improving the interpretation of gene expression data. Nucleic Acids Research, 38(3), e17. doi:10.1093/nar/gkp942

Barnes, M., Freudenberg, J., Thompson, S., Aronow, B., & Pavlidis, P. (2005). Experimental comparison and cross-validation of the Affymetrix and Illumina gene expression analysis platforms. Nucleic Acids Research, 33(18), 5914–5923. doi:10.1093/nar/gki890

Barrett, T., Suzek, T. O., Troup, D. B., Wilhite, S. E., Ngau, W. C., & Ledoux, P. (2005). NCBI GEO: Mining millions of expression profiles--database and tools. Nucleic Acids Research, 33, 562–566. doi:10.1093/nar/gki022

Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society. Series B. Methodological, 57, 11.

Benson, M., & Breitling, R. (2006). Network theory to understand microarray studies of complex diseases. Current Molecular Medicine, 6(6), 695–701. doi:10.2174/156652406778195044

Best, C. J., Leiva, I. M., Chuaqui, R. F., Gillespie, J. W., Duray, P. H., & Murgai, M. (2003). Molecular differentiation of high- and moderate-grade human prostate cancer by cDNA microarray analysis. Diagnostic Molecular Pathology, 12(2), 63–70. doi:10.1097/00019606-200306000-00001

Bilban, M., Buehler, L. K., Head, S., Desoye, G., & Quaranta, V. (2002). Normalizing DNA microarray data. Current Issues in Molecular Biology, 4(2), 57–64.

Blankenberg, D., Von Kuster, G., Coraor, N., Ananda, G., Lazarus, R., Mangan, M., … Taylor, J. (2010). Galaxy: A web-based genome analysis tool for experimentalists. In F. M. Ausubel, R. Brent, R. E. Kingston, D. D. Moore, J. G. Seidman, J. A. Smith et al. (Eds.), Current protocols in molecular biology (Ch. 19, pp. 1-21). New York, NY: John Wiley & Sons.



Bolstad, B. M., Irizarry, R. A., Astrand, M., & Speed, T. P. (2003). A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics (Oxford, England), 19(2), 185–193. doi:10.1093/bioinformatics/19.2.185

Brazma, A., Hingamp, P., Quackenbush, J., Sherlock, G., Spellman, P., & Stoeckert, C. (2001). Minimum information about a microarray experiment (MIAME)-toward standards for microarray data. Nature Genetics, 29(4), 365–371. doi:10.1038/ng1201-365

Breitling, R., Armengaud, P., Amtmann, A., & Herzyk, P. (2004). Rank products: A simple, yet powerful, new method to detect differentially regulated genes in replicated microarray experiments. FEBS Letters, 573(1-3), 83–92. doi:10.1016/j.febslet.2004.07.055

Breitling, R., & Herzyk, P. (2005). Rank-based methods as a non-parametric alternative of the T-statistic for the analysis of biological microarray data. Journal of Bioinformatics and Computational Biology, 3(5), 1171–1189. doi:10.1142/S0219720005001442

Choudhary, J., & Grant, S. G. (2004). Proteomics in postgenomic neuroscience: the end of the beginning. Nature Neuroscience, 7(5), 440–445. doi:10.1038/nn1240

Chuaqui, R. F., Bonner, R. F., Best, C. J., Gillespie, J. W., Flaig, M. J., & Hewitt, S. M. (2002). Post-analysis follow-up and validation of microarray experiments. Nature Genetics, 32, 509–514. doi:10.1038/ng1034

Churchill, G. A. (2002). Fundamentals of experimental design for cDNA microarrays. Nature Genetics, 32, 490–495. doi:10.1038/ng1031

Cleveland, W. S., Grosse, E., & Shyu, W. M. (1991). Local regression models. In Chambers, J. M., & Hastie, T. (Eds.), Statistical models in S (pp. 309–376). New York, NY: Chapman & Hall.

Conover, W. (1980). Practical nonparametric statistics. New York, NY: John Wiley & Sons.

Coons, S. J. (2009). The FDA’s critical path initiative: A brief introduction. Clinical Therapeutics, 31(11), 2572–2573. doi:10.1016/j.clinthera.2009.11.035

Cote, R. G., Jones, P., Apweiler, R., & Hermjakob, H. (2006). The ontology lookup service, a lightweight cross-platform tool for controlled vocabulary queries. BMC Bioinformatics, 7, 97. doi:10.1186/1471-2105-7-97

Cui, X., & Churchill, G. A. (2003). Statistical tests for differential expression in cDNA microarray experiments. Genome Biology, 4(4), 210. doi:10.1186/gb-2003-4-4-210

Dagliyan, O., Uney-Yuksektepe, F., Kavakli, I. H., & Turkay, M. (2011). Optimization based tumor classification from microarray gene expression data. PLoS ONE, 6(2), e14579. doi:10.1371/journal.pone.0014579

Dai, M., Wang, P., Boyd, A. D., Kostov, G., Athey, B., & Jones, E. G. (2005). Evolving gene/transcript definitions significantly alter the interpretation of GeneChip data. Nucleic Acids Research, 33(20), e175. doi:10.1093/nar/gni179

Damaraju, R., & Lakshmi, V. P. (2005). Block designs: Analysis, combinatorics and applications. Singapore: World Scientific.

Dan, S., Tsunoda, T., Kitahara, O., Yanagawa, R., Zembutsu, H., & Katagiri, T. (2002). An integrated database of chemosensitivity to 55 anticancer drugs and gene expression profiles of 39 human cancer cell lines. Cancer Research, 62(4), 1139–1147.

de Reynies, A., Geromin, D., Cayuela, J. M., Petel, F., Dessen, P., Sigaux, F., & Rickman, D. S. (2006). Comparison of the latest commercial short and long oligonucleotide microarray technologies. BMC Genomics, 7, 51. doi:10.1186/1471-2164-7-51

Dinh, A. K. (2011). Cloud computing 101. Journal of American Health Information Management Association, 82(4), 36–37, 44.

Dobbin, K., & Simon, R. (2005). Sample size determination in microarray experiments for class comparison and prognostic classification. Biostatistics (Oxford, England), 6(1), 27–38. doi:10.1093/biostatistics/kxh015

Dobbin, K. K., Kawasaki, E. S., Petersen, D. W., & Simon, R. M. (2005). Characterizing dye bias in microarray experiments. Bioinformatics (Oxford, England), 21(10), 2430–2437. doi:10.1093/bioinformatics/bti378

Donders, A. R., van der Heijden, G. J., Stijnen, T., & Moons, K. G. (2006). Review: A gentle introduction to imputation of missing values. Journal of Clinical Epidemiology, 59(10), 1087–1091. doi:10.1016/j.jclinepi.2006.01.014

Draghici, S., Khatri, P., Eklund, A. C., & Szallasi, Z. (2006). Reliability and reproducibility issues in DNA microarray measurements. Trends in Genetics, 22(2), 101–109. doi:10.1016/j.tig.2005.12.005



Draghici, S., Sellamuthu, S., & Khatri, P. (2006). Babel’s tower revisited: A universal resource for cross-referencing across annotation databases. Bioinformatics (Oxford, England), 22(23), 2934–2939. doi:10.1093/bioinformatics/btl372

Du, P. (2010). Preprocess Affymetrix data by integrating VST with RMA method (Version lumi v.1.8.3). Retrieved from http://svitsrv25.epfl.ch/R-doc/library/lumi/html/affyVstRma.html

Dudoit, S., Yang, Y. H., Speed, T., & Callow, M. J. (2002). Statistical methods for identifying differentially expressed genes in replicated cDNA microarray experiments. Statistica Sinica, 12, 18.

Dupuy, A., & Simon, R. M. (2007). Critical review of published microarray studies for cancer outcome and guidelines on statistical analysis and reporting. Journal of the National Cancer Institute, 99(2), 147–157. doi:10.1093/jnci/djk018

Durinck, S., Moreau, Y., Kasprzyk, A., Davis, S., De Moor, B., Brazma, A., & Huber, W. (2005). BioMart and Bioconductor: A powerful link between biological databases and microarray data analysis. Bioinformatics (Oxford, England), 21(16), 3439–3440. doi:10.1093/bioinformatics/bti525

Efron, B., & Tibshirani, R. (2002). Empirical Bayes methods and false discovery rates for microarrays. Genetic Epidemiology, 23(1), 70–86. doi:10.1002/gepi.1124

Elo, L. L., Lahti, L., Skottman, H., Kylaniemi, M., Lahesmaa, R., & Aittokallio, T. (2005). Integrating probe-level expression changes across generations of Affymetrix arrays. Nucleic Acids Research, 33(22), e193. doi:10.1093/nar/gni193

Eszlinger, M., Krohn, K., Kukulska, A., Jarzab, B., & Paschke, R. (2007). Perspectives and limitations of microarray-based gene expression profiling of thyroid tumors. Endocrine Reviews, 28(3), 322–338. doi:10.1210/er.2006-0047

Everitt, B. S. (2007). Medical statistics from A to Z: A guide for clinicians and medical students (2nd ed.). Cambridge, UK: Cambridge University Press.

Faul, F., Erdfelder, E., Buchner, A., & Lang, A. G. (2009). Statistical power analyses using G*Power 3.1: Tests for correlation and regression analyses. Behavior Research Methods, 41(4), 1149–1160. doi:10.3758/BRM.41.4.1149

Faul, F., Erdfelder, E., Lang, A. G., & Buchner, A. (2007). G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behavior Research Methods, 39(2), 175–191. doi:10.3758/BF03193146

Festing, M. F., & Altman, D. G. (2002). Guidelines for the design and statistical analysis of experiments using laboratory animals. The Institute for Laboratory Animal Research Journal, 43(4), 244–258.

Fodor, S. P., Read, J. L., Pirrung, M. C., Stryer, L., Lu, A. T., & Solas, D. (1991). Light-directed, spatially addressable parallel chemical synthesis. Science, 251(4995), 767–773. doi:10.1126/science.1990438

Fox, A. (2011). Computer science. Cloud computing--what’s in it for me as a scientist? Science, 331(6016), 406–407. doi:10.1126/science.1198981

Futschik, M. E., Sullivan, M., Reeve, A., & Kasabov, N. (2003). Prediction of clinical behaviour and treatment for cancers. Applied Bioinformatics, 2(3), 53–58.

Gautier, L., Moller, M., Friis-Hansen, L., & Knudsen, S. (2004). Alternative mapping of probes to genes for Affymetrix chips. BMC Bioinformatics, 5, 111. doi:10.1186/1471-2105-5-111

Gavin, A. C., Bosche, M., Krause, R., Grandi, P., Marzioch, M., & Bauer, A. (2002). Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature, 415(6868), 141–147. doi:10.1038/415141a

Gentleman, R. C., Carey, V. J., Bates, D. M., Bolstad, B., Dettling, M., & Dudoit, S. (2004). Bioconductor: Open software development for computational biology and bioinformatics. Genome Biology, 5(10), R80. doi:10.1186/gb-2004-5-10-r80

Giot, L., Bader, J. S., Brouwer, C., Chaudhuri, A., Kuang, B., & Li, Y. (2003). A protein interaction map of Drosophila melanogaster. Science, 302(5651), 1727–1736. doi:10.1126/science.1090289

Goecks, J., Nekrutenko, A., & Taylor, J. (2010). Galaxy: A comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biology, 11(8), R86. doi:10.1186/gb-2010-11-8-r86

Gold, D. L., Miecznikowski, J. C., & Liu, S. (2009). Error control variability in pathway-based microarray analysis. Bioinformatics (Oxford, England), 25(17), 2216–2221. doi:10.1093/bioinformatics/btp385



Gruber, T. R. (1993). A translation approach to por-table ontology specifications. KnowledgeAcquisi-tion, 5(2), 2. doi:10.1006/knac.1993.1008

Haider, S., Ballester, B., Smedley, D., Zhang, J., Rice, P., & Kasprzyk, A. (2009). BioMart Central Portal--unified access to biological data. NucleicAcidsResearch, 37, 23–27. doi:10.1093/nar/gkp265

Halligan, B. D., Geiger, J. F., Vallejos, A. K., Greene, A. S., & Twigger, S. N. (2009). Low cost, scalable proteomics data analysis using Amazon’s cloud computing services and open source search algorithms. Journal of Proteome Research, 8(6), 3148–3153. doi:10.1021/pr800970z

Han, J. D., Bertin, N., Hao, T., Goldberg, D. S., Berriz, G. F., & Zhang, L. V. (2004). Evidence for dynamically organized modularity in the yeast protein-protein interaction network. Nature, 430(6995), 88–93. doi:10.1038/nature02555

Hardiman, G. (2004). Microarray platforms--comparisons and contrasts. Pharmacogenomics, 5(5), 487–502. doi:10.1517/14622416.5.5.487

Hochheiser, H., Aronow, B. J., Artinger, K., Beaty, T. H., Brinkley, J. F., & Chai, Y. (2011). The FaceBase Consortium: A comprehensive program to facilitate craniofacial research. Developmental Biology, 355(2), 175–182. doi:10.1016/j.ydbio.2011.02.033

Holm, S. (1979). A simple sequentially rejective Bonferroni test procedure. Scandinavian Journal of Statistics, 6, 65–70.

Huber, W., von Heydebreck, A., Sultmann, H., Poustka, A., & Vingron, M. (2002). Variance stabilization applied to microarray data calibration and to the quantification of differential expression. Bioinformatics (Oxford, England), 18(Suppl. 1), S96–S104. doi:10.1093/bioinformatics/18.suppl_1.S96

Hulbert, E. M., Smink, L. J., Adlem, E. C., Allen, J. E., Burdick, D. B., & Burren, O. S. (2007). T1DBase: Integration and presentation of complex data for type 1 diabetes research. Nucleic Acids Research, 35(1), 742–746. doi:10.1093/nar/gkl933

Hull, D., Wolstencroft, K., Stevens, R., Goble, C., Pocock, M. R., Li, P., & Oinn, T. (2006). Taverna: A tool for building and running workflows of services. Nucleic Acids Research, 34, 729–732. doi:10.1093/nar/gkl320

Ideker, T., Thorsson, V., Siegel, A. F., & Hood, L. E. (2000). Testing for differentially-expressed genes by maximum-likelihood analysis of microarray data. Journal of Computational Biology, 7(6), 805–817. doi:10.1089/10665270050514945

Ikeda, T., Jinno, H., & Shirane, M. (2007). Chemosensitivity-related genes of breast cancer detected by DNA microarray. Anticancer Research, 27(4C), 2649–2655.

Ioannidis, J. P., Allison, D. B., Ball, C. A., Coulibaly, I., Cui, X., & Culhane, A. C. (2009). Repeatability of published microarray gene expression analyses. Nature Genetics, 41(2), 149–155. doi:10.1038/ng.295

Irizarry, R. A., Hobbs, B., Collin, F., Beazer-Barclay, Y. D., Antonellis, K. J., Scherf, U., & Speed, T. P. (2003). Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics (Oxford, England), 4(2), 249–264. doi:10.1093/biostatistics/4.2.249

Irizarry, R. A., Warren, D., Spencer, F., Kim, I. F., Biswal, S., & Frank, B. C. (2005). Multiple-laboratory comparison of microarray platforms. Nature Methods, 2(5), 345–350. doi:10.1038/nmeth756

Jegga, A. (2006). Bio-Ontologies: A list of links. Retrieved from http://anil.cchmc.org/Bio-Ontologies.html

Johnson, P. D., & Besselsen, D. G. (2002). Practical aspects of experimental design in animal research. The Institute for Laboratory Animal Research Journal, 43(4), 202–206.

Jornsten, R., Ouyang, M., & Wang, H. Y. (2007). A meta-data based method for DNA microarray imputation. BMC Bioinformatics, 8, 109. doi:10.1186/1471-2105-8-109

Jung, S. H. (2005). Sample size for FDR-control in microarray data analysis. Bioinformatics (Oxford, England), 21(14), 3097–3104. doi:10.1093/bioinformatics/bti456

Kabachinski, J. (2011). What’s the forecast for cloud computing in healthcare? Biomedical Instrumentation & Technology, 45(2), 146–150. doi:10.2345/0899-8205-45.2.146

Kanehisa, M. (1995). KEGG: Kyoto encyclopedia of genes and genomes. Kyoto, Japan: Kanehisa Laboratories. doi:10.1093/nar/28.1.27

Kanehisa, M., Goto, S., Kawashima, S., Okuno, Y., & Hattori, M. (2004). The KEGG resource for deciphering the genome. Nucleic Acids Research, 32, 277–280. doi:10.1093/nar/gkh063

Kapushesky, M., Emam, I., Holloway, E., Kurnosov, P., Zorin, A., & Malone, J. (2010). Gene expression atlas at the European bioinformatics institute. Nucleic Acids Research, 38, 690–698. doi:10.1093/nar/gkp936



Kerr, M. K., Martin, M., & Churchill, G. A. (2000). Analysis of variance for gene expression microarray data. Journal of Computational Biology, 7(6), 819–837. doi:10.1089/10665270050514954

Kerrien, S., Alam-Faruque, Y., Aranda, B., Bancarz, I., Bridge, A., & Derow, C. (2007). IntAct–open source resource for molecular interaction data. Nucleic Acids Research, 35, 561–565. doi:10.1093/nar/gkl958

Keselman, H. J., Cribbie, R., & Holland, B. (2002). Controlling the rate of Type I error over a large set of statistical tests. The British Journal of Mathematical and Statistical Psychology, 55(1), 27–39. doi:10.1348/000711002159680

Kiewe, P., Gueller, S., Komor, M., Stroux, A., Thiel, E., & Hofmann, W. K. (2009). Prediction of qualitative outcome of oligonucleotide microarray hybridization by measurement of RNA integrity using the 2100 Bioanalyzer capillary electrophoresis system. Annals of Hematology, 88(12), 1177–1183. doi:10.1007/s00277-009-0751-5

Kikuchi, T., Daigo, Y., Katagiri, T., Tsunoda, T., Okada, K., & Kakiuchi, S. (2003). Expression profiles of non-small cell lung cancers on cDNA microarrays: identification of genes for prediction of lymph-node metastasis and sensitivity to anti-cancer drugs. Oncogene, 22(14).

Kim, J. M., Sohn, H. Y., Yoon, S. Y., Oh, J. H., Yang, J. O., Kim, J. H.,…Kim, N. S. (2005). Identification of gastric cancer-related genes using a cDNA microarray containing novel expressed sequence tags expressed in gastric cancer cells. Clinical Cancer Research, 11(2).

Kim, S. Y., & Volsky, D. J. (2005). PAGE: Parametric analysis of gene set enrichment. BMC Bioinformatics, 6, 144.

Kitchen, R. R., Sabine, V. S., Simen, A. A., Dixon, J. M., Bartlett, J. M., & Sims, A. H. (2011). Relative impact of key sources of systematic noise in Affymetrix and Illumina gene-expression microarray experiments. BMC Genomics, 12, 589. doi:10.1186/1471-2164-12-589

Kohler, S., Bauer, S., Horn, D., & Robinson, P. N. (2008). Walking the interactome for prioritization of candidate disease genes. American Journal of Human Genetics, 82(4), 949–958. doi:10.1016/j.ajhg.2008.02.013

Lancaster, J. M., Dressman, H. K., Clarke, J. P., Sayer, R. A., Martino, M. A., & Cragun, J. M. (2006). Identification of genes associated with ovarian cancer metastasis using microarray expression analysis. International Journal of Gynecological Cancer, 16(5), 1733–1745. doi:10.1111/j.1525-1438.2006.00660.x

Larsson, O., Wennmalm, K., & Sandberg, R. (2006). Comparative microarray analysis. OMICS: A Journal of Integrative Biology, 10(3), 381–397. doi:10.1089/omi.2006.10.381

Lee, R. C., Feinbaum, R. L., & Ambros, V. (1993). The C. elegans heterochronic gene lin-4 encodes small RNAs with antisense complementarity to lin-14. Cell, 75(5), 843–854. doi:10.1016/0092-8674(93)90529-Y

Lee, Z.-J., Lin, S. W., Hsu, C.-C. V., & Huang, Y.-P. (2006, November 14-17). Gene extraction and identification tumor/cancer for microarray data of ovarian cancer. In Proceedings of the IEEE Region 10 Conference (pp. 1-3).

Leung, Y. F. (2007). Functional genomics. Retrieved from http://genomicshome.com/

Li, C., & Hung Wong, W. (2001). Model-based analysis of oligonucleotide arrays: Model validation, design issues and standard error application. Genome Biology, 2(8).

Li, S., Armstrong, C. M., Bertin, N., Ge, H., Milstein, S., Boxem, M.,…Vidal, M. (2004). A map of the interactome network of the metazoan C. elegans. Science, 303(5657), 540–543. doi:10.1126/science.1091403

Lin, S. M., Du, P., Huber, W., & Kibbe, W. A. (2008). Model-based variance-stabilizing transformation for Illumina microarray data. Nucleic Acids Research, 36(2), e11. doi:10.1093/nar/gkm1075

Liu, H., Li, J., & Wong, L. (2005). Use of extreme patient samples for outcome prediction from gene expression data. Bioinformatics (Oxford, England), 21(16), 3377–3384. doi:10.1093/bioinformatics/bti544

Liu, H., Zeeberg, B. R., Qu, G., Koru, A. G., Ferrucci, A., & Kahn, A. (2007). AffyProbeMiner: A web resource for computing or retrieving accurately redefined Affymetrix probe sets. Bioinformatics (Oxford, England), 23(18), 2385–2390. doi:10.1093/bioinformatics/btm360

Lu, J., Lee, J. C., Salit, M. L., & Cam, M. C. (2007). Transcript-based redefinition of grouped oligonucleotide probe sets using AceView: High-resolution annotation for microarrays. BMC Bioinformatics, 8, 108. doi:10.1186/1471-2105-8-108



Mahajan, R., & Gupta, K. (2010). Food and drug administration’s critical path initiative and innovations in drug development paradigm: Challenges, progress, and controversies. Journal of Pharmacy and Bioallied Science, 2(4), 307–313. doi:10.4103/0975-7406.72130

Mecham, B. H., Nelson, P. S., & Storey, J. D. (2010). Supervised normalization of microarrays. Bioinformatics (Oxford, England), 26(10), 1308–1315. doi:10.1093/bioinformatics/btq118

Mecham, B. H., Wetmore, D. Z., Szallasi, Z., Sadovsky, Y., Kohane, I., & Mariani, T. J. (2004). Increased measurement accuracy for sequence-verified microarray probes. Physiological Genomics, 18(3), 308–315. doi:10.1152/physiolgenomics.00066.2004

Michiels, S., Koscielny, S., & Hill, C. (2005). Prediction of cancer outcome with microarrays: a multiple random validation strategy. Lancet, 365(9458), 488–492. doi:10.1016/S0140-6736(05)17866-0

Mischel, P. S., Cloughesy, T. F., & Nelson, S. F. (2004). DNA-microarray analysis of brain cancer: molecular classification for therapy. Nature Reviews. Neuroscience, 5(10), 782–792. doi:10.1038/nrn1518

MYcroarray.com. (2011). Custom microarrays and capture bait libraries. Retrieved July 10, 2011, from http://www.mycroarray.com/mycroarray/cust_arrays.html

Naef, F., Lim, D. A., Patil, N., & Magnasco, M. (2002). DNA hybridization to mismatched templates: A chip study. Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, 65(4), 040902. doi:10.1103/PhysRevE.65.040902

Naef, F., & Magnasco, M. O. (2003). Solving the riddle of the bright mismatches: Labeling and effective binding in oligonucleotide arrays. Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, 68(1), 011906. doi:10.1103/PhysRevE.68.011906

Nguyen, D. V., & Rocke, D. M. (2002). Tumor classification by partial least squares using microarray gene expression data. Bioinformatics (Oxford, England), 18(1), 39–50. doi:10.1093/bioinformatics/18.1.39

Ory, B., Ramsey, M. R., Wilson, C., Vadysirisack, D. D., Forster, N., & Rocco, J. W. (2011). A microRNA-dependent program controls p53-independent survival and chemosensitivity in human and murine squamous cell carcinoma. The Journal of Clinical Investigation, 121(2), 809–820. doi:10.1172/JCI43897

Oster, S., Langella, S., Hastings, S., Ervin, D., Madduri, R., Kurc, T.,…Saltz, J. (2007). caGrid 1.0: A Grid enterprise architecture for cancer research. In Proceedings of the AMIA Annual Symposium (pp. 573-577).

Paparountas, T. (2007). Bioinformatics-Biostatistics and computational biology resources. Retrieved June 16, 2007, from http://www.bioinformatics.gr

Park, T., Yi, S. G., Lee, S., Lee, S. Y., Yoo, D. H., Ahn, J. I., & Lee, Y. S. (2003). Statistical tests for identifying differentially expressed genes in time-course microarray experiments. Bioinformatics (Oxford, England), 19(6), 694–703. doi:10.1093/bioinformatics/btg068

Parkinson, H., Sarkans, U., Shojatalab, M., Abeygunawardena, N., Contrino, S., & Coulson, R. (2005). ArrayExpress--A public repository for microarray gene expression data at the EBI. Nucleic Acids Research, 33, 553–555. doi:10.1093/nar/gki056

Pavlidis, P., Furey, T. S., Liberto, M., Haussler, D., & Grundy, W. N. (2001). Promoter region-based classification of genes. In Proceedings of the Pacific Symposium on Biocomputing (pp. 151-163).

Pearl, J. (1998). Why there is no statistical test for confounding, why many think there is, and why they are almost right (Department, C. S., Trans.). Los Angeles, CA: UCLA University.

Perez-Diez, A., Morgun, A., & Shulzhenko, N. (2007). Microarrays for cancer diagnosis and classification. Advances in Experimental Medicine and Biology, 593, 74–85. doi:10.1007/978-0-387-39978-2_8

Peri, S., Navarro, J. D., Kristiansen, T. Z., Amanchy, R., Surendranath, V., & Muthusamy, B. (2004). Human protein reference database as a discovery resource for proteomics. Nucleic Acids Research, 32, 497–501. doi:10.1093/nar/gkh070

Promega Inc. (2012). Base: Dye Ratio Calculator. Retrieved from http://probes.invitrogen.com/resources/calc/basedyeratio.html

Qiu, W. L., Lee, M. T., & Whitmore, G. A. (2007). Sample size and power calculation in microarray studies using the sizepower package for R-Bioconductor. Retrieved from http://rss.acs.unt.edu/Rdoc/library/sizepower/doc/index.html

Quackenbush, J. (2002). Microarray data normalization and transformation. Nature Genetics, 32, 496–501. doi:10.1038/ng1032

Quackenbush, J. (2003). Genomics. Microarrays--guilt by association. Science, 302(5643), 240–241. doi:10.1126/science.1090887



Rain, J. C., Selig, L., De Reuse, H., Battaglia, V., Reverdy, C., & Simon, S. (2001). The protein-protein interaction map of Helicobacter pylori. Nature, 409(6817), 211–215. doi:10.1038/35051615

Rao, Y. (2009). Statistical analysis of microarray experiments in pharmacogenomics. Athens, OH: Ohio University.

Ray, C. (2011). Cancer identification and gene classification using DNA microarray gene expression patterns. International Journal of Computer Science Issues, 8(2).

Reiner, A., Yekutieli, D., & Benjamini, Y. (2003). Identifying differentially expressed genes using false discovery rate controlling procedures. Bioinformatics (Oxford, England), 19(3), 368–375. doi:10.1093/bioinformatics/btf877

Rhodes, D. R., Kalyana-Sundaram, S., Mahavisno, V., Varambally, R., Yu, J., & Briggs, B. B. (2007). Oncomine 3.0: Genes, pathways, and networks in a collection of 18,000 cancer gene expression profiles. Neoplasia (New York, N.Y.), 9(2), 166–180. doi:10.1593/neo.07112

Ruvkun, G. (2001). Molecular biology. Glimpses of a tiny RNA world. Science, 294(5543), 797–799. doi:10.1126/science.1066315

Ryden, P., Andersson, H., Landfors, M., Naslund, L., Hartmanova, B., Noppa, L., & Sjostedt, A. (2006). Evaluation of microarray data normalization procedures using spike-in experiments. BMC Bioinformatics, 7, 300. doi:10.1186/1471-2105-7-300

Salwinski, L., Miller, C. S., Smith, A. J., Pettit, F. K., Bowie, J. U., & Eisenberg, D. (2004). The database of interacting proteins: 2004 update. Nucleic Acids Research, 32, 449–451. doi:10.1093/nar/gkh086

Sandberg, R., & Larsson, O. (2007). Improved precision and accuracy for microarrays using updated probe set definitions. BMC Bioinformatics, 8, 48. doi:10.1186/1471-2105-8-48

Sax, J. K., & El-Deiry, W. S. (2003). p53 downstream targets and chemosensitivity. Cell Death and Differentiation, 10(4), 413–417. doi:10.1038/sj.cdd.4401227

Schatz, M. C., Langmead, B., & Salzberg, S. L. (2010). Cloud computing and the DNA data race. Nature Biotechnology, 28(7), 691–693. doi:10.1038/nbt0710-691

School of Computer Science. (2008). What is a workflow. Retrieved from http://www.mygrid.org.uk/tools/taverna/what-is-a-workflow/

Shaw, R., Festing, M. F., Peers, I., & Furlong, L. (2002). Use of factorial designs to optimize animal experiments and reduce animal use. Institute for Laboratory Animal Research Journal, 43(4), 223–232.

Shedden, K., Chen, W., Kuick, R., Ghosh, D., Macdonald, J., & Cho, K. R. (2005). Comparison of seven methods for producing Affymetrix expression scores based on false discovery rates in disease profiling data. BMC Bioinformatics, 6, 26. doi:10.1186/1471-2105-6-26

Shi, L., Campbell, G., Jones, W. D., Campagne, F., Wen, Z., & Walker, S. J. (2010). The MicroArray Quality Control (MAQC)-II study of common practices for the development and validation of microarray-based predictive models. Nature Biotechnology, 28(8), 827–838. doi:10.1038/nbt.1665

Shi, L., Reid, L. H., Jones, W. D., Shippy, R., Warrington, J. A., & Baker, S. C. (2006). The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements. Nature Biotechnology, 24(9), 1151–1161. doi:10.1038/nbt1239

Shi, L., Tong, W., Fang, H., Scherf, U., Han, J., & Puri, R. K. (2005). Cross-platform comparability of microarray technology: intra-platform consistency and appropriate data analysis procedures are essential. BMC Bioinformatics, 6(2), 12. doi:10.1186/1471-2105-6-S2-S12

Simon, R. (2003). Using DNA microarrays for diagnostic and prognostic prediction. Expert Review of Molecular Diagnostics, 3(5), 587–595. doi:10.1586/14737159.3.5.587

Smedley, D., Haider, S., Ballester, B., Holland, R., London, D., Thorisson, G., & Kasprzyk, A. (2009). BioMart--Biological queries made easy. BMC Genomics, 10, 22. doi:10.1186/1471-2164-10-22

Smith, B., Ashburner, M., Rosse, C., Bard, J., Bug, W., & Ceusters, W. (2007). The OBO Foundry: Coordinated evolution of ontologies to support biomedical data integration. Nature Biotechnology, 25(11), 1251–1255. doi:10.1038/nbt1346

Smyth, G. K. (2004). Linear models and empirical bayes methods for assessing differential expression in microarray experiments. Statistical Applications in Genetics and Molecular Biology, 3, 3. doi:10.2202/1544-6115.1027

Speed, T. (2003). Statistical analysis of gene expression microarray data. Boca Raton, FL: Chapman & Hall/CRC.



Stark, C., Breitkreutz, B. J., Reguly, T., Boucher, L., Breitkreutz, A., & Tyers, M. (2006). BioGRID: A general repository for interaction datasets. Nucleic Acids Research, 34, 535–539. doi:10.1093/nar/gkj109

Storey, J. D. (2002). A direct approach to false discovery rates. Journal of the Royal Statistical Society. Series B. Methodological, 64, 19. doi:10.1111/1467-9868.00346

Stuart, J. M., Segal, E., Koller, D., & Kim, S. K. (2003). A gene-coexpression network for global discovery of conserved genetic modules. Science, 302(5643), 249–255. doi:10.1126/science.1087447

Subramanian, A., Tamayo, P., Mootha, V. K., Mukherjee, S., Ebert, B. L., Gillette, M. A.,…Mesirov, J. P. (2005). Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the National Academy of Sciences of the United States of America, 102(43), 15545–15550.

Team, R. D. C. (2008). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing.

The NCI Center for Bioinformatics. (2011). caIntegrator: Web-based software package (version 1.3). Retrieved from https://cabig.nci.nih.gov/tools/caIntegrator

Thompson, K. L., & Hackett, J. (2008). Quality control of microarray assays for toxicogenomic and in vitro diagnostic applications. Methods in Molecular Biology (Clifton, N.J.), 460, 45–68. doi:10.1007/978-1-60327-048-9_3

Tibshirani, R. (2006). A simple method for assessing sample sizes in microarray experiments. BMC Bioinformatics, 7, 106. doi:10.1186/1471-2105-7-106

Timlin, J. A. (2006). Scanning microarrays: Current methods and future directions. Methods in Enzymology, 411, 79–98. doi:10.1016/S0076-6879(06)11006-X

Troester, M. A., Millikan, R. C., & Perou, C. M. (2009). Microarrays and epidemiology: Ensuring the impact and accessibility of research findings. Cancer Epidemiology, Biomarkers & Prevention, 18(1), 1–4. doi:10.1158/1055-9965.EPI-08-0867

Tseng, G. C., Oh, M. K., Rohlin, L., Liao, J. C., & Wong, W. H. (2001). Issues in cDNA microarray analysis: Quality filtering, channel normalization, models of variations and assessment of gene effects. Nucleic Acids Research, 29(12), 2549–2557. doi:10.1093/nar/29.12.2549

Tusher, V. G., Tibshirani, R., & Chu, G. (2001). Significance analysis of microarrays applied to the ionizing radiation response. Proceedings of the National Academy of Sciences of the United States of America, 98(9), 5116–5121. doi:10.1073/pnas.091062498

Tzouvelekis, A., Harokopos, V., Paparountas, T., Oikonomou, N., Chatziioannou, A., & Vilaras, G. (2007). Comparative expression profiling in pulmonary fibrosis suggests a role of hypoxia inducible factor 1a in disease pathogenesis. American Journal of Respiratory and Critical Care Medicine, 176, 1108–1119. doi:10.1164/rccm.200705-683OC

Uetz, P., Giot, L., Cagney, G., Mansfield, T. A., Judson, R. S., & Knight, J. R. (2000). A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae. Nature, 403(6770), 623–627. doi:10.1038/35001009

United States Department of Energy. (2010). DOE systems biology knowledgebase implementation plan. Retrieved June 10, 2011, from http://genomicscience.energy.gov/compbio/kbase_plan/index.shtml#page=news

U.S. Food and Drug Administration. (n.d.). MicroArray Quality Control (MAQC) Project. Retrieved from http://www.fda.gov/nctr/science/centers/toxicoinformatics/maqc/

van Bakel, H., & Holstege, F. C. (2004). In control: Systematic assessment of microarray performance. European Molecular Biology Organization, 5(10), 964–969.

van den Oord, E. J., & Sullivan, P. F. (2003). False discoveries and models for gene discovery. Trends in Genetics, 19(10), 537–542. doi:10.1016/j.tig.2003.08.003

Verdugo, R. A., Deschepper, C. F., Munoz, G., Pomp, D., & Churchill, G. A. (2009). Importance of randomization in microarray experimental designs with Illumina platforms. Nucleic Acids Research, 37(17), 5610–5618. doi:10.1093/nar/gkp573

Wang, X., Hessner, M. J., Wu, Y., Pati, N., & Ghosh, S. (2003). Quantitative quality control in microarray experiments and the application in data filtering, normalization and false positive rate prediction. Bioinformatics (Oxford, England), 19(11), 1341–1347. doi:10.1093/bioinformatics/btg154

Wei, C., Li, J., & Bumgarner, R. E. (2004). Sample size for detecting differentially expressed genes in microarray experiments. BMC Genomics, 5(1), 87. doi:10.1186/1471-2164-5-87



Wolfe, C. J., Kohane, I. S., & Butte, A. J. (2005). Systematic survey reveals general applicability of “guilt-by-association” within gene coexpression networks. BMC Bioinformatics, 6, 227. doi:10.1186/1471-2105-6-227

Wolfinger, R. D., Gibson, G., Wolfinger, E. D., Bennett, L., Hamadeh, H., & Bushel, P. (2001). Assessing gene significance from cDNA microarray expression data via mixed models. Journal of Computational Biology, 8(6), 625–637. doi:10.1089/106652701753307520

Wolfson, W. (2008). caBIG: Seeking cancer cures by bits and bytes. Chemistry & Biology, 15(6), 521–522. doi:10.1016/j.chembiol.2008.06.003

Woodcock, J., & Woosley, R. (2008). The FDA critical path initiative and its influence on new drug development. Annual Review of Medicine, 59, 1–12. doi:10.1146/annurev.med.59.090506.155819

Wu, C., Orozco, C., Boyer, J., Leglise, M., Goodale, J., & Batalov, S. (2009). BioGPS: An extensible and customizable portal for querying and organizing gene annotation resources. Genome Biology, 10(11), R130. doi:10.1186/gb-2009-10-11-r130

Wu, W., Xing, E. P., Myers, C., Mian, I. S., & Bissell, M. J. (2005). Evaluation of normalization methods for cDNA microarray data by k-NN classification. BMC Bioinformatics, 6, 191. doi:10.1186/1471-2105-6-191

Wu, Z., Irizarry, R. A., Gentleman, R., Martinez-Murillo, F., & Spencer, F. (2004). A model-based background adjustment for oligonucleotide expression arrays. Journal of the American Statistical Association, 99(468), 8. doi:10.1198/016214504000000683

Yang, M. C., Yang, J. J., McIndoe, R. A., & She, J. X. (2003). Microarray experimental design: power and sample size considerations. Physiological Genomics, 16(1), 24–28. doi:10.1152/physiolgenomics.00037.2003

Yauk, C. L., Berndt, L., Williams, A., & Douglas, G. R. (2005). Automation of cDNA microarray hybridization and washing yields improved data quality. Journal of Biochemical and Biophysical Methods, 64(1), 69–75. doi:10.1016/j.jbbm.2005.06.002

Yauk, C. L., & Berndt, M. L. (2007). Review of the literature examining the correlation among DNA microarray technologies. Environmental and Molecular Mutagenesis. doi:10.1002/em.20290

Triantafyllos Paparountas, BSc in Biochemistry and Molecular Medicine with Hnrs. (2000), Faculty of Biological Sciences, University of Essex, UK; MSc in Bioinformatics, Faculty of Contemporary Sciences, University of Abertay Dundee, UK (2002); PhD in Bioinformatics, Sector II, National Technological University of Athens, Greece (2009); Trainee, Institute for Genome Sciences, University of Maryland, USA (2010); MSc Medical Statistics, Athens University of Economics & Business (2012, underway). PostDoc in Bioinformatics at the BRFAA (Bioacademy.gr), Athens, Greece (2011); PostDoc in Bioinformatics at the Dulbecco Telethon Institute, Epigenetics and Genome Reprogramming lab, Roma, Italy (2012, currently). He has published 4 articles in international peer-reviewed journals. Research interests: advancement of statistical analysis methods in microarrays and sequencing technologies.

Maria Nefeli Nikolaidou-Katsaridou, BSc Biochemistry & Applied Molecular Biology with Hnrs., UMIST, Manchester, U.K. (2001); MSc Biomedical Sciences Research, King’s College, London, U.K. (2002); PhD in Microbial Genetics, University of East Anglia, Norwich, U.K. (2008); Advanced Research Assistant at the Wellcome Trust Sanger Institute, Cambridge, U.K., in the pathogen microarrays team (2003). Current position: Post-doctorate researcher at Dr. V. Aidinis lab, Institute of Immunology (BBSRC). Research interests: Autotaxin expression and its role in health and disease. She has published 4 papers.



Gabriella Rustici, BSc Biology with Hnrs., University of Turin, Italy (1999); PhD in Genetics, University of Cambridge, UK (2004); Post-doctorate at National Cancer Institute, NCI-NIH, Bethesda, USA (2005-2007). Current position: Research and Training Coordinator in the Functional Genomics Group at the European Bioinformatics Institute (EBI), Cambridge, UK. Research interests: functional genomics data analysis and visualization.

Vassilis Aidinis, BSc Biology, University of Patras, Greece (1987). PhD in Molecular Biology, University of Athens (1994). Mandatory military service at the pathology department, Naval Hospital of Athens (1994-96). Post-doctoral research associate at Mount Sinai Medical Center, NYC, USA (1996-1999). Post-doctoral research associate at the Hellenic Pasteur Institute (1999-2000). Researcher grade B (eq. Assistant Professor) at the Institute of Immunology, BSRC Fleming (2001-2006). Researcher grade B (eq. Associate Professor) at the Institute of Immunology, BSRC Fleming (2006-present). Technology interests: expression profiling, mouse databases, bioinformatics. Research interests: phospholipid signaling in health and disease.



APPENDIX

Abbreviations: NCTR: National Center for Toxicological Research; MAQC: MicroArray Quality Control; FDA: US Food and Drug Administration; MIAME: Minimum Information About a Microarray Experiment; Df: Degrees of freedom; FOI: Frequency Of Incorporation; ANOVA: Analysis Of Variance; LOWESS: Locally Weighted Regression; FDR: False Discovery Rate; ChIP-on-chip: Chromatin Immunoprecipitation on chip; SNP: Single Nucleotide Polymorphism; CGH: Comparative Genomic Hybridization; FISH: Fluorescent In Situ Hybridization; SAGE: Serial Analysis of Gene Expression.

SUPPLEMENTARY TABLES

Table 2. Microarray repositories (all free; may need registration)

Name Main Web Page Initial Web Page

Alliance for Cellular Signaling (AfCS) Data Center http://www.signaling-gateway.org/data/ http://www.signaling-gateway.org/data/micro/cgi-bin/micro.cgi

ArrayExpress http://www.ebi.ac.uk/microarray-as/ae/

caArray https://caarraydb.nci.nih.gov/caarray/

CEBS http://cebs.niehs.nih.gov/cebs-browser/cebsHome.do;jsessionid=B9B6C8E67C55832D1CB72C4DB6A7A436 http://cebs.niehs.nih.gov/microarray/manager

Cibex Japan Array Database http://cibex.nig.ac.jp/index.jsp

CleanEx http://www.cleanex.isb-sib.ch/

CycleBase http://www.cyclebase.org/

EPConDB - Endocrine pancreas consortium database http://www.cbil.upenn.edu/epcondb42/

EpoDB - Erythropoiesis Database http://www.cbil.upenn.edu/EpoDB/

ExpressDB - A relational database containing yeast and E. coli RNA expression data http://arep.med.harvard.edu/ExpressDB/

FLIGHT - Drosophila database http://flight.licr.org/

Gene Aging Nexus http://gan.usc.edu/public/index.jsp

Genevestigator https://www.genevestigator.ethz.ch/gv/index.jsp

Genopolis Microarray Database http://www.genopolis.it/index.php https://gc-lab32.btbs.unimib.it/genopolisDB/html/users.php

GEO - Gene Expression Omnibus (NCBI) http://www.ncbi.nlm.nih.gov/geo/

GEOSS (GeneX-Va) http://genes.med.virginia.edu

GermOnline http://www.germonline.org/

GPX-General http://www.pathwaymedicine.ed.ac.uk/GPX http://ebola.gti.ed.ac.uk/GPX/cgi-bin/gpx.cgi

GPX-Macrophage http://www.pathwaymedicine.ed.ac.uk/GPX http://ebola.gti.ed.ac.uk:8090/GPX/htdocs/index.html

HPMR - Human Plasma Membrane Receptome http://www.receptome.org/HPMR/

ITTACA http://bioinfo-out.curie.fr/ittaca/ http://bioinfo-out.curie.fr/

L2L Microarray Database (L2L MDB) http://depts.washington.edu/l2l/database.html

LOLA (only DEGs are stored) List Of Lists Annotated (LOLA) http://www.lola.gwu.edu/

M3D http://m3d.bu.edu/cgi-bin/web/array/index.pl?section=home

Madb http://nciarray.nci.nih.gov/

M-CHiPS (Multi-Conditional Hybridization Intensity Processing System) http://www.dkfz-heidelberg.de/mchips/

MSigDB http://www.broad.mit.edu/gsea/index.jsp http://www.broad.mit.edu/gsea/msigdb/genesets.jsp

Table 3. Holistic approaches

Name Free (Y/N) Website

caGEDA Y http://bioinformatics.upmc.edu/GE2/GEDA.html

taverna Y http://taverna.sourceforge.net/

G-pipe/Pise Y http://gene3.ciat.cgiar.org/Pise/5.a/gpipe.html

wildfire Y http://wildfire.bii.a-star.edu.sg/

spotfire Y http://spotfire.tibco.com/index.cfm

Isys Y http://www.ncgr.org/isys/

Agilent GeneSpring N http://www.chem.agilent.com/en-US/Products/software/lifesciencesinformatics/genespringgx

Rosetta Resolver System N http://www.rosettabio.com/products/resolver

MeV Y http://www.tm4.org/mev.html

JExpress Y http://www.molmine.com

GenePattern Y http://www.broadinstitute.org/cancer/software/genepattern/index.html



Table 4. Microarray suppliers

Name Website

Affymetrix http://www.affymetrix.com/

Agilent http://www.chem.agilent.com/Scripts/PCol.asp?lPage=494

Clontech http://www.clontech.com/

Perkin-Elmer NEN http://lifesciences.perkinelmer.com/

Research Genetics http://www.resgen.com/

Sigma Genosys http://www.sigma-genosys.com/

Virtek Vision http://www.virtek.ca/

Paradigm http://www.paradigmgenetics.com/

MWG Biotech http://www.mwgbiotech.com/html/all/index.php

Imaging Research http://www.imagingresearch.com/

ChromaVision Medical Systems http://www.chromavision.com/

X-Mine http://www.x-mine.com/

Numerical Algorithms Group http://www.nag.co.uk/main_lifesciences.asp

Eurogentec http://www.eurogentec.com/carte/carte.asp

High Throughput Genomics http://www.htgenomics.com/

Table 5. Clustering methods

Name of different clustering methods*
• Hierarchical clustering
• k-means clustering
• Self-organizing maps
• Principal components analysis
• Cluster affinity search technique
• Template matching
• QT_Clust
• Gene shaving
• Evolutionary algorithms
• Utilization of hidden Markov models
• Artificial neural networks
• Relevance networks
• Support vector machines
• Self Organizing Trees (SOTA)
*These are some of the most notable clustering methods.



Table 6. Gene ID conversion and annotation (all free; may need registration)

Name Website

AceView (NCBI) http://www.ncbi.nlm.nih.gov/IEB/Research/Acembly/

Biomart http://www.biomart.org

DAVID http://david.abcc.ncifcrf.gov/

EASE (DAVID) http://david.abcc.ncifcrf.gov/ease/ease.jsp

AILUN http://ailun.stanford.edu

DRAGON http://pevsnerlab.kennedykrieger.org/dragon.htm

FANTOM http://www.gsc.riken.go.jp/e/FANTOM/

GeneALaCart http://www.genecards.org/BatchQueries/index.php

GeneAnnot http://genecards.weizmann.ac.il/geneannot/

GeneTide http://genecards.weizmann.ac.il/genetide-bin/tide.cgi

Genetools (NTNU) http://www.genetools.microarray.ntnu.no/adb/index.php

GeneCodis http://genecodis.dacya.ucm.es/

ID Mapping http://pir.georgetown.edu/pirwww/search/idmapping.shtml

Pathways analysis (Ingenuity Systems) http://www.ingenuity.com/products/pathways_analysis.html

MatchMiner http://discover.nci.nih.gov/matchminer/index.jsp

Onto-Translate - Onto-tools (ISBL) http://vortex.cs.wayne.edu/Projects.html

PANTHER http://www.pantherdb.org/

Resourcerer http://compbio.dfci.harvard.edu/tgi/

SOURCE http://source.stanford.edu/

UCSC Table Browser http://genome.cse.ucsc.edu/cgi-bin/hgTables

WebGestalt http://bioinfo.vanderbilt.edu/webgestalt/

Table 7. Transcription factor and motif analysis

Name Free (Y/N) Website

AlignACE Y http://atlas.med.harvard.edu/download/index.html

BindGene Y http://www.bioinf.manchester.ac.uk/~lockwood/bindgene.html

BioProspector Y http://ai.stanford.edu/~xsliu/BioProspector/

Cis-analyst Y http://rana.lbl.gov/cis-analyst/

FastM & ModelInspector N http://www.genomatix.de/?s=8d50e93b45206c5a9a348fb1a72d5bd6

Greedy EM algorithm Y http://www.cs.uoi.gr/~kblekas/greedy/GreedyEM.html

INCLUSive Y http://homes.esat.kuleuven.be/~dna/Biol/Software.html

MDscan Y http://ai.stanford.edu/~xsliu/MDscan/

MELINA Y http://melina2.hgc.jp/public/index.html




MEME & MAST Y http://meme.sdsc.edu/meme/

Microarray Promoter Extractor N http://www.biorainbow.com/promoter_extractor/index.php

MSCAN Y http://mscan.cgb.ki.se/cgi-bin/MSCAN

MULTIPROFILER (UCSD) Y http://bix.ucsd.edu/

Pattern Search Y http://myhits.isb-sib.ch/cgi-bin/pattern_search

PatternBranching/ProfileBranching (UCSD) Y http://bix.ucsd.edu/

PatSearch (BIG) Y http://www.ba.itb.cnr.it/BIG/PatSearch/

ProGA Y http://wwwmgs.bionet.nsc.ru/mgs/programs/proga/

PROMO Y http://alggen.lsi.upc.es/cgi-bin/promo_v3/promo/promoinit.cgi?dirDB=TF_8.3

Promoter Scan Y http://darwin.nmsu.edu/~molb470/fall2005/projects/vasude/promoscan.htm

Sequence Logos Y http://bioinformatics.weizmann.ac.il/blocks/about_logos.html

Signal Scan TFBIND (Bioinformatics and Analysis Section, National Institutes of Health) Y http://www-bimas.cit.nih.gov/molbio/signal/

Toucan Y http://homes.esat.kuleuven.be/~saerts/software/toucan.php

TRANSFAC (BIOBASE Biological Sciences) N http://www.biobase-international.com/

PathoDB (BIOBASE Biological Sciences) N http://www.biobase-international.com/

CONFAC (EMORY School of Medicine) Y http://morenolab.whitehead.emory.edu/cgi-bin/confac/confacHelp.pl

OMGProm (HSLS) Y http://bioinformatics.med.ohio-state.edu/OMGProm/

oPOSSUM Y http://burgundy.cmmt.ubc.ca/oPOSSUM/

JASPAR Y http://jaspar.cgb.ki.se/

ConSite Y http://www.phylofoot.org/consite


Table 8. MicroRNA specific software

Name Free (Y/N) Website

GeneAct Y http://promoter.colorado.edu/geneact/

FatiGO+ Y http://babelomics.bioinfo.cipf.es/fatigoplus/cgi-bin/fatigoplus.cgi

Eumir Y http://miracle.igib.res.in/eumir/

HairpinFetcher Y http://miracle.igib.res.in/hfinder/

miRacle server Y http://miracle.igib.res.in/miracle/

MAMI Y http://mami.med.harvard.edu/




ProMiR II Y http://cbit.snu.ac.kr/%7EProMiR2/

miRNA Registry Y http://www.sanger.ac.uk/Software/Rfam/mirna/index.shtml

TargetmiR Y http://miracle.igib.res.in/targetmir.html

RNAhybrid Y http://bibiserv.techfak.uni-bielefeld.de/rnahybrid

PicTar Y http://pictar.bio.nyu.edu/

MicroInspector Y http://mirna.imbb.forth.gr/microinspector/

micro RNA target search Y http://www.microrna.org/

miRanda Y http://www.microrna.org/

miTarget Y http://cbit.snu.ac.kr/%7EmiTarget/


Table 9. Disease/drug toxicity

Name Free (Y/N) Website

Ingenuity Systems Pathways analysis N http://www.ingenuity.com/products/pathways_analysis.html

Reverse Engineering/Forward Simulation (REFS™) N http://www.gnsbiotech.com/static_content/our-approach.html

NEXTBIO N http://www.nextbio.com/b/home/home.nb

ChemBank Y http://chembank.broad.harvard.edu

Table 10. Literature analysis software

Name Free (Y/N) Website

AKS2 N http://www.activemotif.com

Biolab Experiment Assistant N http://www.biovista.com

MedScan N http://www.ariadnegenomics.com/products/medscan/

Pubmatrix Y http://pubmatrix.grc.nia.nih.gov

PubGene Y http://www.pubgene.org/

EBIMed Y http://www.ebi.ac.uk/Rebholz-srv/ebimed/index.jsp

Whatizit Y http://www.ebi.ac.uk/webservices/whatizit/info.jsf

Protein Corral Y http://www.ebi.ac.uk/Rebholz-srv/pcorral/index.jsp

Faculty of 1000 Y http://www.f1000biology.com

ElDorado N http://www.genomatix.de/products/ElDorado/index.html

ChipInfo Y http://biosun1.harvard.edu/complab/chipinfo/

CoPub Mapper Y http://services.nbic.nl/cgi-bin/copub/CoPub.pl




MILANO Y http://milano.md.huji.ac.il/

LitInspector Y http://www.genomatix.de/products/ElDorado/index.html

MedGene and BioGene Y http://biodesign.asu.edu/labs/labaer/services/medgene-and-biogene

PDQ Wizard Y http://www.pathwaymedicine.ed.ac.uk/GPX

Ingenuity Systems Pathways analysis N http://www.ingenuity.com/products/pathways_analysis.html


Table 11. Gene ontology analysis software

Name Free (Y/N) Website

CLENCH Y http://www.stanford.edu/~nigam/cgi-bin/dokuwiki/doku.php?id=clench

ArrayXPath Y http://www.snubi.org/software/ArrayXPath/

DAVID Y http://david.abcc.ncifcrf.gov/

EASE (DAVID) Y http://david.abcc.ncifcrf.gov/ease/ease.jsp

eGOn Y http://www.genetools.microarray.ntnu.no/common/intro.php

EasyGO Y http://bioinformatics.cau.edu.cn/easygo/

ermineJ Y http://www.bioinformatics.ubc.ca/ermineJ/

FatiGO+ Y http://babelomics.bioinfo.cipf.es/fatigoplus/cgi-bin/fatigoplus.cgi

FIVA Y http://bioinformatics.biol.rug.nl/standalone/fiva/

FuncAssociate Y http://llama.med.harvard.edu/cgi/func1/funcassociate_advanced

FunCluster Y http://corneliu.henegar.info/FunCluster.htm

FunNet Y http://www.funnet.info/

G-SESAME Y http://bioinformatics.clemson.edu/G-SESAME/

GARBAN Y http://garban.tecnun.es/garban2/index.php

GeneCodis Y http://genecodis.dacya.ucm.es/

GeneMerge Y http://www.oeb.harvard.edu/hartl/lab/publications/GeneMerge/GeneMerge.html

GFINDer Y http://www.medinfopoli.polimi.it/GFINDer/

GOALIE Y http://bioinformatics.nyu.edu/Projects/GOALIE/

GO::TermFinder Y http://bioinformatics.oxfordjournals.org/cgi/content/abstract/bth456v1

GOArray Y http://www.isima.fr/bioinfo/goarrays/

GOdist Y http://basalganglia.huji.ac.il/links.htm

GOEAST Y http://omicslab.genetics.ac.cn/GOEAST/

GO-Diff Y http://www.fishgenome.org/bioinfo/




GoMiner Y http://discover.nci.nih.gov/gominer/

GOstat Y http://gostat.wehi.edu.au/

GoSurfer Y http://bioinformatics.bioen.uiuc.edu/gosurfer/

GO Term Finder Y http://db.yeastgenome.org/cgi-bin/GO/goTermFinder.pl

GOTM Y http://bioinfo.vanderbilt.edu/gotm/

GOToolBox Y http://crfb.univ-mrs.fr/GOToolBox/index.php

GraphWeb Y http://biit.cs.ut.ee/graphweb/

L2L Y http://depts.washington.edu/l2l/

MAPPFinder Y http://www.genmapp.org/

MatchMiner Y http://discover.nci.nih.gov/matchminer/index.jsp

MetaGP Y http://metagp.ism.ac.jp/

OntoGate (OntoBlast) Y http://fazed.molgen.mpg.de:14195/onto/


Table 12. Pathway analysis software

Name Free (Y/N) Website


Cura Tools pathcalling N http://portal.curagen.com/curatools_portal/index.htm

Ingenuity Pathway analysis N http://www.ingenuity.com/

Onto Tools – Pathway Express Y http://vortex.cs.wayne.edu/Projects.html#Pathway-Express

Pathway Studio (Ariadne Genomics) N http://www.ariadnegenomics.com/products/pathway-studio/

Cognia’s Catabolism Database N http://www.cognia.com

GenMapp (Gene Map Annotator and Pathway Profiler) Y http://www.genmapp.org/

Biocarta Y http://www.biocarta.com/

Whole pathway scope Y http://www.abcc.ncifcrf.gov/wps/wps_index.php

TransPath Y http://transpath.gbf.de

KEGG (Kyoto Encyclopedia of Genes and Genomes) Y http://www.genome.ad.jp/kegg/kegg.html

PathoSign Y http://pathosign.bioinf.med.uni-goettingen.de/

Reactome Y http://www.reactome.org/

iHOP Y http://www.ihop-net.org/UniPub/iHOP/

Pathway Explorer Y https://pathwayexplorer.genome.tugraz.at/




Pathway Processor (University of Connecticut) Y http://web.uconn.edu/townsend/software.html

ArrayXPath Y http://www.snubi.org/software/ArrayXPath/

aMAZE (EBI) Y http://www.amaze.ulb.ac.be/

BioMiner (UMR) Y http://web.mst.edu/~bioinf/biominer/

Cytoscape (plug-ins required) Y http://www.cytoscape.org/

DBmcmc (BioSS) Y http://www.bioss.ac.uk/~dirk/software/DBmcmc/

Dynamic Signaling Maps N http://www.hippron.com/hippron/

Genetic Network Analyzer (GNA) N http://www-helix.inrialpes.fr/article122.html

GenePath Y http://www.genepath.org/

GSCope Y http://omicspace.riken.jp/osml/

INCLUSive Y http://tomcatbackup.esat.kuleuven.be/inclusive/

InterViewer3 Y http://interviewer.inha.ac.kr/

KnowledgeEditor Y http://gscope.gsc.riken.go.jp/

PathFinder N http://www.imstarsa.com/productsservices/ondemandplatforms/


Table 13. Protein interaction databases and related Web tools

Name Free (Y/N) Website


Protein Arrays

ProtoArray® (Invitrogen) N http://www.invitrogen.com/site/us/en/home/Products-and-Services/Services/Discovery-Research/ProtoArra-Services.html

Databases & Data Collections

ADAN (EMBL) Y http://adan-embl.ibmc.umh.es/

BID (Texas A&M University) Y http://tsailab.org/BID/index.php

BIND (Biomolecular Interaction Network Database at the Samuel Lunenfeld Research Institute, Toronto, Canada) Y http://www.bind.ca

BioCarta (BioCarta) Y http://www.biocarta.com/genes/index.asp

BioCyc (SRI) Y http://biocyc.org/

BioGRID (Samuel Lunenfeld Research Institute) Y http://www.thebiogrid.org/

BOND (Thomson Corp.) Y http://bond.unleashedinformatics.com/

CSNDB (NIHS) Y http://www.chem.ac.ru/Chemistry/Databases/CSNDB.en.html

DAPID (National Chiao Tung University) Y http://gemdock.life.nctu.edu.tw/dapid




DIP (UCLA) Y http://dip.doe-mbi.ucla.edu/

DOMINO - DOMain peptide INteractiOns database, describing interactions mediated by protein-interaction domains Y http://mint.bio.uniroma2.it/domino/search/searchWelcome.do

DOQCS (NCBS) Y http://doqcs.ncbs.res.in

Drosophila Protein Interaction Map (PIM) Database (Wayne State University) Y http://proteome.wayne.edu/PIMdb.html

E. coli Predicted Protein Interactions Database (Universidad Autónoma Cantoblanco) Y http://ecid.bioinfo.cnio.es/

EchoBASE (University of York) Y http://www.ecoli-york.org/

EDGEdb (University of Massachusetts Medical School) Y http://edgedb.umassmed.edu/IndexAction.do;jsessionid=83C4B5E969161C36F9CFA68A8C0EAF3D

ENCODE Y http://www.genome.gov/10005107

Fly-DPI (National Health Research Institutes) Y http://flydpi.nhri.org.tw/protein/fly/general_search/

HAPPI (Indiana University School of Informatics, Purdue University School of Science) Y http://bio.informatics.iupui.edu/HAPPI/

HIV-1 - Human Protein Interaction Database (NCBI) Y http://www.ncbi.nlm.nih.gov/RefSeq/HIVInteractions/index.html

hp-DPI (National Health Research Institutes) Y http://dpi.nhri.org.tw/protein/hp/ORF/index.php

HPID (Inha University) Y http://wilab.inha.ac.kr/hpid/

HUGE ppi (Kazusa DNA Research Institute) Y http://www.kazusa.or.jp/huge/ppi/

HUGE: Human Unidentified Gene-Encoded large proteins Y http://www.kazusa.or.jp/huge/

Human Protein Reference Database (Johns Hopkins University & The Institute of Bioinformatics, India) Y http://www.hprd.org/

ICBS (University of California) Y http://contact14.ics.uci.edu/index.html

iHOP (Computational Biology Center, Memorial Sloan-Kettering Cancer Center, USA & Protein Design Group, National Center of Biotechnology, Spain) Y http://www.ihop-net.org/UniPub/iHOP/

InCeP (Kazusa DNA Research Institute) Y http://www.kazusa.or.jp/create/index.jsp

Intenz (EBI) Y http://www.ebi.ac.uk/intenz/

INTERPARE (National Genome Information Center, Korea Research Institute of Bioscience and Biotechnology & BiO Centre) Y http://interpare.net/

KDBI (National University of Singapore) Y http://xin.cz3.nus.edu.sg/group/kdbi/kdbi.asp

KEGG BRITE (Kyoto University) Y http://www.genome.ad.jp/brite/brite.html





KEGG LIGAND Y http://www.genome.ad.jp/dbget/ligand.html

Kinase Pathway Database (Human Genome Center) Y http://kinasedb.ontology.ims.u-tokyo.ac.jp/

MINT (CBM, Rome) Y http://cbm.bio.uniroma2.it/mint/

molmovdb.org (Yale University) Y http://molmovdb.mbb.yale.edu/

MPact (MIPS) Y http://mips.gsf.de/genre/proj/mpact/index.html

MPPI (MIPS) Y http://mips.gsf.de/proj/ppi/

NOXclass (Max-Planck-Institut für Informatik) Y http://noxclass.bioinf.mpi-inf.mpg.de/

OPHID (Ontario Cancer Institute & University of Toronto) Y http://ophid.utoronto.ca/ophidv2.201

Pathway Database (Protein Lounge) Y http://www.proteinlounge.com/pathway_home.asp

PDZBase (Weill Medical College of Cornell University) Y http://icb.med.cornell.edu/services/pdz/start

Pfam (Sanger Institute) Y http://www.sanger.ac.uk/Software/Pfam/

PIBASE (University of California) Y http://modbase.compbio.ucsf.edu/pibase/queries.html

POINT (National Health Research Institutes & National Taiwan University) Y http://phos.bioinformatics.tw/

PPIDB (Iowa State University) Y http://ppidb.cs.iastate.edu/

Predictome (Boston University) Y http://predictome.bu.edu/static/sources.html

PreSPI (Information and Communications University) Y http://prespi.icu.ac.kr/

PRIME (Human Genome Center, University of Tokyo) Y http://prime.ontology.ims.u-tokyo.ac.jp:8081/

PRIMOS (BIOMIS, FH Hagenberg) Y http://biomis.fh-hagenberg.at/isp/Primos/

PRISM (Koc University) Y http://gordion.hpc.eng.ku.edu.tr/prism/

PRODISTIN Web Site (LGPD/IBDM, CNRS) Y http://crfb.univ-mrs.fr/webdistin/

Prolinks Database (University of California) Y http://www.doe-mbi.ucla.edu/Services/MTBreg/prolinks.html

ProMesh (University of Queensland) (Restricted Access) Y http://localisation.imb.uq.edu.au/

Protein Interaction Database (Protein Lounge) Y http://www.proteinlounge.com/inter_home.asp

Protein Interaction Maps - PIMs (Hybrigenics) Y http://pimr.hybrigenics.com/

Protein-Protein Interaction Panel using mouse full-length cDNAs (RIKEN, Yokohama Institute) Y http://genome.gsc.riken.go.jp/ppi/

PSIbase (BioSystems Dept., KAIST & BiO centre) Y http://psibase.kobic.re.kr/

PUMA2 (Argonne National Lab) Y http://compbio.mcs.anl.gov/puma2/





Roche Applied Science ‘Biochemical Pathways’ Y http://www.expasy.org/cgi-bin/search-biochem-index

SCOPPI (TU Dresden) Y http://www.scoppi.org/

SMART (EMBL Heidelberg) Y http://smart.embl-heidelberg.de/

SNAPPI-Predict (University of Dundee) Y http://www.compbio.dundee.ac.uk/SNAPPI/predict.jsp

SNAPPIView (University of Dundee) Y http://www.compbio.dundee.ac.uk/SNAPPI/downloads.jsp

SPAD (Kyushu University) Y http://www.grt.kyushu-u.ac.jp/spad/

SPIDer (Beijing Normal University) Y http://cmb.bnu.edu.cn/SPIDer/index.html

SPIN-PP Server (Columbia University) Y http://wiki.c2b2.columbia.edu/honiglab_public/index.php/Software:SPIN-PP

The Interactive Fly (Society for Developmental Biology) Y http://www.sdbonline.org/fly/aimain/1aahome.htm

TRANSCompel (BIOBASE) Y http://www.gene-regulation.com/pub/databases.html#transcompel

TRANSPATH (BIOBASE) Y http://www.biobase-international.com/pages/index.php?id=transpathdatabases

UniHI (Charite - Medical Division, Humboldt-University zu Berlin) Y http://theoderich.fb3.mdc-berlin.de:8080/unihi/home

Wnt Signaling Pathway (Stanford University Medical Center) Y http://www.stanford.edu/%7Ernusse/wntwindow.html

Yeast Interacting Proteins Database (Kanazawa University) Y http://itolab.cb.k.u-tokyo.ac.jp/Y2H/

Yeast Interactome (Boston University) Y http://structure.bu.edu/rakesh/myindex.html

Yeast Pathways in the Comprehensive Yeast Genome Database (MIPS) Y http://mips.gsf.de/proj/yeast/CYGD/db/pathway_index.html

Yeast Protein Linkage Map Data (University of Washington) Y http://depts.washington.edu/sfields/yp_interactions/index.html

YPD™ (BIOBASE) N http://www.biobase-international.com/pages/index.php?id=ypd

3D structures

123+ Y http://123d.ncifcrf.gov/

3D-JIGSAW Y http://www.bmm.icnet.uk/servers/3djigsaw/

3D-PSSM Y http://darwin.nmsu.edu/~molb470/fall2003/Projects/mara/3dPSSM.html

bioinbgu Y http://www.cs.ualberta.ca/~yaser/web/bioinbgu.html

CATH Y http://www.biochem.ucl.ac.uk/bsm/cath_new/index.html


CPHmodels Y http://www.cbs.dtu.dk/services/CPHmodels/

FSSP (Dali) Y http://www.ebi.ac.uk/dali/

Modeller Y http://www.chem.ac.ru/Chemistry/Soft/MODELLER.en.html

OCA Y http://oca.ebi.ac.uk/

PDB Y http://www.rcsb.org/

PDBsum Y http://www.biochem.ucl.ac.uk/bsm/pdbsum/

PUDGE Y http://wiki.c2b2.columbia.edu/honiglab_public/index.php/Software:PUDGE

SAM-T99 Y http://www.cse.ucsc.edu/research/compbio/HMM-apps/T99-model-library-search.html

SCOP Y http://scop.mrc-lmb.cam.ac.uk/scop/

SCOWLP (TU Dresden) Y http://www.scowlp.org/

SDC1 Y http://cl.sdsc.edu/hm.html

STRING (EMBL) Y http://string.embl.de

SWISS-MODEL Y http://www.expasy.ch/swissmod/SWISS-MODEL.html

Threader2 Y http://globin.bio.warwick.ac.uk/

Threadlize Y http://www.cnb.uam.es/~pazos/threadlize/

TOPITS (PHDthreader) Y http://www.predictprotein.org/

YETI (University Edinburgh) Y http://www.yetibio.com/

1D Predictions

Agadir Y http://www.embl-heidelberg.de/Services/serrano/agadir/agadir-start.html

JPred Y http://www.compbio.dundee.ac.uk/~www-jpred/

NPS@ Y http://npsa-pbil.ibcp.fr/

PHDsec Y http://www.predictprotein.org/

Predator Y http://www-db.embl-heidelberg.de/jss/servlet/de.embl.bk.wwwTools.GroupLeftEMBL/argos/predator/predator_info.html

PROF Y http://www.aber.ac.uk/~phiwww/prof/

PSI-pred Y http://bioinf.cs.ucl.ac.uk/psipred/

Solvent Accessibility

HMMTOP Y http://www.enzim.hu/hmmtop/

PHDhtm/PHDtopology Y http://cubic.bioc.columbia.edu/predictprotein/

PHDsec Y http://cubic.bioc.columbia.edu/predictprotein/




TMpred Y http://www.ch.embnet.org/software/TMPRED_form.html

TopPred 2 Y http://www.sbc.su.se/~erikw/toppred2/

Transmembrane Helix Prediction

Coiled-coil Prediction

COILS Y http://www.ch.embnet.org/software/COILS_form.html

Multicoil Y http://groups.csail.mit.edu/cb/multicoil/cgi-bin/multicoil.cgi

Paircoil2 Y http://groups.csail.mit.edu/cb/paircoil2/

Domains and Motifs

FUGUE Y http://www-cryst.bioc.cam.ac.uk/~fugue/

Pfam Y http://www.sanger.ac.uk/Pfam/

ProDom Y http://prodom.prabi.fr/

Prosite Y http://www.expasy.ch/prosite/

4D Predictions

AUTODOCK Y http://www.scripps.edu/pub/olson-web/doc/autodock/

DOCK Y http://dock.compbio.ucsf.edu/

FlexX Y http://www.biosolveit.de/FlexX/

FTdock Y http://www.bmm.icnet.uk/docking/

GRAMM Y http://vakser.bioinformatics.ku.edu/resources/gramm/grammx

Visualization Programs

Chime Y http://www.mdlchime.com/chime/

JMOL Y http://firstglance.jmol.org/

Protein Explorer Y http://www.umass.edu/microbio/chime/pe_beta/pe/protexpl/frntdoor.htm

RasMol Y http://www.umass.edu/microbio/rasmol/

Swiss-PdbViewer Y http://www.expasy.ch/spdbv/mainpage.htm

WhatIF N http://swift.cmbi.ru.nl/whatif/

Evaluation of Prediction Methods

CAFASP experiments Y http://www.cs.bgu.ac.il/~dfischer/CAFASP2/

CASP meetings Y http://predictioncenter.gc.ucdavis.edu/

EVA Y http://cubic.bioc.columbia.edu/eva/

LiveBench Y http://BioInfo.PL/LiveBench/




Table 14. Meta-analysis software

Name Free (Y/N) Website

WEB TOOLS

MAMA Y

R GeneMeta Y http://www.bioconductor.org

R metaArray Y http://www.bioconductor.org

R RankProd Y http://www.bioconductor.org

CLOE Y

yMGV Y http://www.transcriptome.ens.fr/ymgv/

yTAFNET Y http://www.transcriptome.ens.fr/ytafnet/

MiCoViTo Y http://www.transcriptome.ens.fr/micovito/

AILUN Y http://ailun.stanford.edu

Microarray database-based dataset comparisons

Lola Y http://lola.gwu.edu/

OncoMine Y http://www.oncomine.org/

M3D Y http://m3d.bu.edu/cgi-bin/web/array/index.pl?section=home

ITTACA Y http://bioinfo-out.curie.fr/ittaca/

L2L MDB Y http://depts.washington.edu/l2l/database.html

Genevestigator N https://www.genevestigator.ethz.ch/gv/index.jsp

ArrayQuest Y http://proteogenomics.musc.edu/ma/arrayQuest.php?page=home&act=manage

Microarray database-based gene expression profiling (data can be submitted and compared online)

ArrayExpress Y http://www.ebi.ac.uk/microarray-as/ae/

GEO Y http://www.ncbi.nlm.nih.gov/geo/

Gene Aging Nexus Y http://gan.usc.edu/public/index.jsp

caIntegrator based on caArray Y http://caintegrator-info.nci.nih.gov/caintegrator/about

OncoMine Y http://www.oncomine.org/

RefExA Y http://157.82.78.238/refexa/main_search.jsp

ITTACA Y http://bioinfo-out.curie.fr/ittaca/

T1DBase Y http://www.t1dbase.org/page/Welcome/display

Integrative data mining and meta-analysis software

GSEA Y http://www.broad.mit.edu/gsea/

GeneTrail Y http://genetrail.bioinf.uni-sb.de/

caIntegrator Y http://caintegrator-info.nci.nih.gov/csp

Whole pathway scope Y http://www.abcc.ncifcrf.gov/wps/wps_index.php
