Managing Data Modeling
GO Workshop, 3-6 August 2010
Managing Data
  Functional modeling strategy
  Converting between database IDs:
    Ensembl BioMart, UniProt, DAVID, AgBase ArrayIDer
  Array examples to work on
Types of data sets and modeling
  Commercial array data: more likely to have ID mapping to support functional modeling.
  Custom/USDA array data: may need to do your own ID mapping (see examples on the workshop page).
  Proteomics data
  RNA-Seq data sets: computational pipelines to assign GO (GOanna is limited; contact AgBase).
  Real-time data or quantitative proteomics data: hypothesis testing.
Overview of Functional Modeling Strategy
[Flowchart; box labels recovered from the slide]
  Microarray IDs: ArrayIDer
  Protein/gene identifiers: GORetriever
  GO annotations
  Genes/proteins with no GO annotations: GOanna
  GOSlimViewer: summarizes GO function
  GOModeler: hypothesis testing
  GO enrichment analysis, pathways and network analysis: Ingenuity Pathways Analysis (IPA), Pathway Studio, Cytoscape, DAVID, EasyGO/AgriGO, Onto-Express, Onto-Express-to-go (OE2GO)
  Yellow boxes represent AgBase tools; green/purple boxes are non-AgBase resources.
Functional Modeling Considerations
  Should I add my own GO?
    use GOSlimViewer to see how much GO is available for your species
    use GORetriever to see how much GO is available for your dataset
  Should I do GO analysis and pathway analysis and network analysis?
    different functional modeling methods show different aspects of your data (complementary)
    is this type of data available for your species (or a close ortholog)?
  What tools should I use?
    which tools have data for your species of interest?
    what type of accessions are accepted?
    availability (commercial and freely available)
structurally and functionally re-annotated a microarray
quantified the impact of this re-annotation based on GO annotations & pathways represented on the array
tested using a previously published experiment that used this microarray
re-annotation allows more comprehensive GO based modeling and improves pathway coverage
re-annotation resulted in a different model from previously published research findings
Converting accessions
Depending on your data set and the tools you use, you are likely to need to convert between database accessions to do your functional modeling.
  UniProt database: ID mapping tab
  Ensembl BioMart
  Online analysis tools: DAVID, g:Profiler, GORetriever
  ArrayIDer: converts EST accessions for some species (by request)
ID Mapping
  Data types: commercial arrays, custom arrays, EST arrays, proteomics, RNA-Seq data
  Mapping resources: commercial ID mapping (e.g. NetAffx), Ensembl BioMart, online tools (g:Convert, DAVID), ArrayIDer, UniProt ID conversion
Working on your own data:
  New to GO
    GO browser tutorials to familiarize yourself with the GO
    learn what GO is available for your species
  Your own data set
    functional grouping to get an overview (e.g. GOSlimViewer)
    GO enrichment analysis (tools available for your species)
    pathway analysis
  Example data sets available: use as worked examples
Most of these tools (including pathway analysis tools) accept only certain database accessions, so you will often need to convert accessions between databases.
Example: ID conversion
  Ensembl Plants BioMart tool
    currently limited species, but Ensembl is adding more plants
    BioMart allows sophisticated querying of genomic data
  DAVID ID conversion tool
    allows users to convert IDs and do GO enrichment analysis
  UniProt ID conversion
    highly annotated data
  ArrayIDer
    links ESTs to public database IDs
http://plants.ensembl.org/index.html
NOTE: Ensembl is adding new plant species…
1. Ensembl BioMart
Clicking on these headings allows you to set up searches.
Selecting FILTERS gives you different filtering options:
Expand GENE and check “ID list limit” to select a defined list of accessions.
Enter your list of accessions.
Selecting ATTRIBUTES allows you to choose what information is reported:
Check accessions from external databases (UniProt & RefSeq).
Clicking on RESULTS will show you the output information.
Output can be displayed online and/or downloaded (text, Excel).
Selecting FILTERS or ATTRIBUTES will allow you to go back and make changes.
Limited to species represented in Ensembl.
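The FILTERS and ATTRIBUTES choices above can also be written down as a BioMart XML query, which the BioMart interface can export for a search you have set up. The following is a minimal sketch of building such a query string in Python, without contacting any server. The function name and all dataset, filter and attribute names here are hypothetical placeholders; the real names must be looked up in BioMart for your species.

```python
import xml.etree.ElementTree as ET


def build_biomart_query(dataset, id_filter, accessions, attributes,
                        virtual_schema="default"):
    """Build a BioMart XML query string (no network access here).

    All names passed in are supplied by the caller; nothing is checked
    against a real BioMart server.
    """
    query = ET.Element("Query", {
        "virtualSchemaName": virtual_schema,
        "formatter": "TSV",
        "header": "1",
        "uniqueRows": "1",
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset",
                       {"name": dataset, "interface": "default"})
    # FILTERS: restrict the search to a defined list of accessions.
    ET.SubElement(ds, "Filter",
                  {"name": id_filter, "value": ",".join(accessions)})
    # ATTRIBUTES: one element per column reported in the results.
    for attr in attributes:
        ET.SubElement(ds, "Attribute", {"name": attr})
    return ET.tostring(query, encoding="unicode")


# Placeholder names only: replace with the names shown in BioMart.
xml_query = build_biomart_query(
    dataset="example_dataset",
    id_filter="example_id_filter",
    accessions=["GENE0001", "GENE0002"],
    attributes=["example_gene_id", "example_uniprot_accession"],
)
print(xml_query)
```

Saving the query text next to your results gives a record of exactly which filters and attributes were used.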
2. Online analysis tools
Database for Annotation, Visualization and Integrated Discovery (DAVID)
http://david.abcc.ncifcrf.gov/conversion.jsp
This tool works for a wide range of species.
Paste in your accession list
(You can also upload a file of accessions.)
Select accession type.
NOTE: If you choose “Not Sure” the tool will try to decide what type of accession you have.
Select gene list. Submit list.
Select the type of accession you want to convert TO.
Any ambiguous IDs are listed for you to decide.
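After downloading a conversion table from any of these tools, it can help to check which input IDs mapped to exactly one target, which mapped to several, and which did not map at all. The sketch below is not DAVID's own logic; it is an illustrative check on (input ID, converted ID) pairs, and the function name and example IDs are made up.

```python
from collections import defaultdict


def summarize_conversion(input_ids, pairs):
    """Split input IDs into unique, one-to-many and unmapped groups.

    pairs is an iterable of (input_id, converted_id) tuples taken from
    a downloaded conversion table.
    """
    targets = defaultdict(set)
    for source_id, target_id in pairs:
        targets[source_id].add(target_id)
    # Exactly one converted ID: safe to use directly.
    unique = {s: next(iter(t)) for s, t in targets.items() if len(t) == 1}
    # More than one converted ID: needs a manual decision.
    one_to_many = {s: sorted(t) for s, t in targets.items() if len(t) > 1}
    # Submitted but absent from the conversion table.
    unmapped = [i for i in input_ids if i not in targets]
    return unique, one_to_many, unmapped


# Made-up IDs for illustration only.
input_ids = ["probe_1", "probe_2", "probe_3"]
pairs = [("probe_1", "ACC_A"), ("probe_2", "ACC_B"), ("probe_2", "ACC_C")]
unique, one_to_many, unmapped = summarize_conversion(input_ids, pairs)
print(unique, one_to_many, unmapped)
```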
3. UniProt ID Mapping
Paste accession list (>1000 may cause errors).
COMMENT: Note the difference between UniProt Accessions and UniProt IDs.
UniProt accessions are a short string of letters and numerals, 6 or 10 characters long. UniProt IDs have a suffix related to the species name.
E.g. cassava hydroxynitrilase
  Accession: P52705
  ID: HNL_MANES
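To check which kind of identifier a list contains before choosing the "from" type, a simple pattern test can be used. This is a sketch only: the accession pattern follows the format UniProt documents for accession numbers, while the ID (entry name) pattern is a simplified assumption, and the function name is hypothetical. A pattern match does not prove the entry exists in UniProt.

```python
import re

# UniProt accession format (6 or 10 characters).
ACCESSION_RE = re.compile(
    r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})$"
)
# Simplified entry-name shape: NAME_SPECIES (assumption, not exhaustive).
ENTRY_NAME_RE = re.compile(r"^[A-Z0-9]{1,10}_[A-Z0-9]{1,5}$")


def classify_uniprot_identifier(text):
    """Return 'accession', 'id' or 'unknown' based on the pattern only."""
    value = text.strip().upper()
    if ACCESSION_RE.match(value):
        return "accession"
    if ENTRY_NAME_RE.match(value):
        return "id"
    return "unknown"


for identifier in ["P52705", "HNL_MANES", "AAP50233.1"]:
    print(identifier, classify_uniprot_identifier(identifier))
```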
Select the accession type you have:
and the accession type you want to convert to:
Click on MAP
The mapping link will display a tab-separated file that can be opened in Excel:
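Because long lists may cause errors, one option is to split the accession list into batches before pasting, then read each tab-separated result back into a single lookup. The sketch below assumes a two-column result with a header row ("From" and "To" are assumed column names), and the helper names and converted IDs are placeholders.

```python
import csv
import io


def batches(accessions, size=1000):
    """Yield consecutive slices of at most `size` accessions."""
    for start in range(0, len(accessions), size):
        yield accessions[start:start + size]


def read_mapping(tsv_text):
    """Read a two-column tab-separated mapping into {from_id: [to_ids]}."""
    mapping = {}
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    next(reader, None)  # skip the header row
    for row in reader:
        if len(row) < 2:
            continue  # ignore blank or incomplete lines
        mapping.setdefault(row[0], []).append(row[1])
    return mapping


# Placeholder result text; the converted IDs are made up.
example_tsv = (
    "From\tTo\n"
    "P52705\tEXAMPLE_1\n"
    "P52705\tEXAMPLE_2\n"
    "Q9LYA9\tEXAMPLE_3\n"
)
print(read_mapping(example_tsv))
print([len(b) for b in batches(list(range(2500)))])
```

Keeping every converted ID in a list, rather than only the last one seen, avoids silently dropping one-to-many mappings.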
Contact AgBase to request additional species.
4. AgBase: ArrayIDer
Maps ESTs to gene/protein accessions.
Upload a list of dbEST accessions or EST names.
An email will be sent with a link to the results. Results are formatted as an Excel file.
For additional help with database accessions please contact AgBase.
Working on your own data:
NOTE: Always keep a note of which tool you used to do the accession ID mapping/conversion, and its version/update/date.
Keep a copy of your original IDs and what they mapped to so that you can refer back to this during your modeling.
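One way to follow this advice is to write every original ID, what it mapped to, and the tool details into a single tab-separated log. The sketch below is only an illustration: the column names, function name, tool name and IDs are all made up.

```python
import csv
import io
from datetime import date


def write_mapping_log(handle, mapping, tool, tool_version, run_date):
    """Write one row per (original ID, mapped ID) with tool provenance.

    Original IDs with no mapping are kept, with an empty mapped_id.
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["original_id", "mapped_id", "tool",
                     "tool_version", "date"])
    for original_id, mapped_ids in mapping.items():
        for mapped_id in mapped_ids or [""]:
            writer.writerow([original_id, mapped_id, tool,
                             tool_version, run_date.isoformat()])


# Made-up example; in practice, pass an open file instead of StringIO.
buffer = io.StringIO()
write_mapping_log(
    buffer,
    {"probe_1": ["ACC_A"], "probe_2": []},
    tool="ExampleTool",
    tool_version="unknown",
    run_date=date(2010, 8, 3),
)
log_text = buffer.getvalue()
print(log_text)
```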
Tutorial 1: ID conversion
The AgriGO GO enrichment analysis tool accepts the following inputs for rice:
  GenBank ID: AAP50233.1
  DDBJ ID: BAB11514.1
  EMBL ID: CAA18188.1
  UniProt ID: Q9LYA9
  RefSeq Peptide ID: NP_564434
We will convert a list of rice Affy IDs to these IDs for use in the AgriGO tool.
Arrays: ID Mapping
  “annotation” file that shows which database accessions the probes were based on
  array annotation files may include multiple database IDs
  Commercial arrays: may be updated regularly
  Custom/Research arrays: not updated as often
  Always check when the last ID mapping was updated, as this data changes continually
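Since one probe in an annotation file may list several database IDs in a single field, it is often easier to expand each probe into one row per ID before converting. The sketch below assumes the " /// " separator and "---" missing-value marker used in Affymetrix annotation files; other arrays may differ. The function names, column names and IDs are placeholders.

```python
def split_ids(field, separator=" /// ", missing="---"):
    """Split a multi-valued annotation field into a list of IDs."""
    if field is None:
        return []
    ids = [part.strip() for part in field.split(separator)]
    # Drop empty strings and the missing-value marker.
    return [i for i in ids if i and i != missing]


def expand_annotation(rows, probe_key, id_key):
    """Return (probe, database_id) pairs, one per ID listed for a probe."""
    pairs = []
    for row in rows:
        for database_id in split_ids(row.get(id_key)):
            pairs.append((row[probe_key], database_id))
    return pairs


# Made-up annotation rows for illustration only.
rows = [
    {"probe": "probe_1", "ids": "ACC_A /// ACC_B"},
    {"probe": "probe_2", "ids": "---"},
]
print(expand_annotation(rows, probe_key="probe", id_key="ids"))
```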
Array annotation available:
  FHCRC chicken 13K GPL2863
  Agilent-015068 Chicken Gene Expression Microarray 4x44k GPL8764
  Avian Innate Immunity Microarray (AIIM) GPL1461
  Affymetrix Chicken Genome Array GPL3213*
  UIUC Bos taurus 13.2K 70-mer oligoarray GPL2853
  Affymetrix Bovine Genome Array GPL2112
  Agilent-015354 Bovine Oligo Microarray (4x44K)
  Equine Whole Genome Oligonucleotide (EWGO) array
Array annotation in progress:
  ARK-Genomics G. gallus 20K v1.0 GPL5480
  FHCRC Chicken 13K v2.0 GPL1836
  Chicken cDNA DDMET 1700 array version 1.0 GPL3265
Tutorial 1: ID conversion
Work through tutorial 1 on the workshop website.
Alternatively – work on your own data set during this time, using the tutorial as a guide.