Managing Data Modeling GO Workshop 3-6 August 2010

Page 1: Managing Data Modeling

Managing Data Modeling

GO Workshop
3-6 August 2010

Page 2: Managing Data Modeling

Managing Data
  • Functional modeling strategy
  • Converting between database IDs
      • Ensembl BioMart
      • UniProt
      • DAVID
      • AgBase ArrayIDer
  • Arrays
  • Examples to work on

Page 3: Managing Data Modeling

Types of data sets and modeling
  • Commercial array data – more likely to have ID mapping to support functional modeling.
  • Custom/USDA array data – may need to do your own ID mapping: see examples on workshop page.
  • Proteomics data
  • RNA-Seq data sets – computational pipelines to assign GO (GOanna is limited; contact AgBase).
  • Real-time data or quantitative proteomics data – hypothesis testing.

Page 4: Managing Data Modeling

Overview of Functional Modeling Strategy

[Flow diagram. Labels recovered from the figure:]
  • Microarray IDs
  • ArrayIDer
  • Protein/Gene identifiers
  • GORetriever
  • GO annotations; Genes/Proteins with no GO annotations
  • GOanna
  • GOSlimViewer: summarizes GO function
  • GOModeler: hypothesis testing
  • GO enrichment analysis; Pathways and network analysis
  • Tools listed: Ingenuity Pathways Analysis (IPA), Pathway Studio, Cytoscape, DAVID, EasyGO/AgriGO, Onto-Express, Onto-Express-to-go (OE2GO)

Yellow boxes represent AgBase tools. Green/purple boxes are non-AgBase resources.

Page 5: Managing Data Modeling

Functional Modeling Considerations

  • Should I add my own GO?
      • use GOSlimViewer to see how much GO is available for your species
      • use GORetriever to see how much GO is available for your dataset
  • Should I do GO analysis and pathway analysis and network analysis?
      • different functional modeling methods show different aspects about your data (complementary)
      • is this type of data available for your species (or a close ortholog)?
  • What tools should I use?
      • which tools have data for your species of interest?
      • what type of accessions are accepted?
      • availability (commercial and freely available)

Page 6: Managing Data Modeling

structurally and functionally re-annotated a microarray

quantified the impact of this re-annotation based on GO annotations & pathways represented on the array

tested using a previously published experiment that used this microarray

re-annotation allows more comprehensive GO based modeling and improves pathway coverage

re-annotation resulted in a different model from previously published research findings

Page 7: Managing Data Modeling
Page 8: Managing Data Modeling

Converting accessions
Depending on your data set & the tools you use, you are likely to need to convert between database accessions to do your functional modeling.
  • UniProt database – ID mapping tab
  • Ensembl BioMart
  • Online analysis tools:
      • DAVID
      • g:Profiler
      • GORetriever
  • ArrayIDer – converts EST accessions for some species (by request)

Page 9: Managing Data Modeling

ID Mapping

Data types:
  • Commercial arrays
  • Custom arrays
  • EST arrays
  • Proteomics
  • RNA-Seq data

ID mapping resources:
  • Commercial ID mapping, e.g. NetAffx
  • Ensembl BioMart
  • Online tools (g:Convert, DAVID)
  • ArrayIDer
  • UniProt ID Conversion

Page 10: Managing Data Modeling

Working on your own data:
  • New to GO
      • GO browser tutorials to familiarize yourself with the GO
      • learn what GO is available for your species
  • Your own data set
      • functional grouping to get overview (e.g. GOSlimViewer)
      • GO enrichment analysis (tools available for your species)
      • Pathway analysis
  • Example data sets available – use as worked examples

Page 11: Managing Data Modeling


Most of these tools (including pathway analysis tools) accept only certain database accessions, so you will need to convert accessions between databases.

Page 12: Managing Data Modeling

Example: ID conversion
  • Ensembl Plants BioMart tool
      • currently limited species, but Ensembl is adding more plants
      • BioMart allows sophisticated querying of genomic data
  • DAVID ID conversion tool
      • allows users to convert IDs and do GO enrichment analysis
  • UniProt ID conversion
      • highly annotated data
  • ArrayIDer
      • links ESTs to public database IDs

Page 13: Managing Data Modeling

http://plants.ensembl.org/index.html

NOTE: Ensembl is adding new plant species…

Page 14: Managing Data Modeling

1. Ensembl BioMart

Page 15: Managing Data Modeling
Page 16: Managing Data Modeling

Clicking on these headings allows you to set up searches.

Selecting FILTERS gives you different filtering options:

Page 17: Managing Data Modeling

Expand GENE and check “ID list limit” to select a defined list of accessions.

Enter your list of accessions.

Page 18: Managing Data Modeling

Selecting ATTRIBUTES allows you to choose what information is reported:

Check accessions from external databases (UniProt & RefSeq).

Page 19: Managing Data Modeling

Clicking on RESULTS will show you the output information.
  • Output can be displayed online and/or downloaded (text, Excel).
  • Selecting FILTERS or ATTRIBUTES will allow you to go back and make changes.
  • Limited to species represented in Ensembl.
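
The FILTERS, ATTRIBUTES and RESULTS steps above can also be written down as a BioMart XML query, which is handy for keeping a record of exactly what was asked. The sketch below only builds the query text and does not contact any server. The dataset name, filter name, attribute names, gene IDs and the `plants_mart` schema value are placeholders chosen for illustration; check the names your BioMart instance actually offers before using them.

```python
import xml.etree.ElementTree as ET


def build_biomart_query(dataset, filter_name, accessions, attributes,
                        virtual_schema="plants_mart"):
    """Return BioMart query XML for a list of accessions (no network access)."""
    query = ET.Element("Query", {
        "virtualSchemaName": virtual_schema,  # assumption: value for Ensembl Plants
        "formatter": "TSV",                   # tab-separated text output
        "header": "1",
        "uniqueRows": "1",
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    # FILTERS: the "ID list limit" box becomes one comma-separated Filter value.
    ET.SubElement(ds, "Filter", {"name": filter_name, "value": ",".join(accessions)})
    # ATTRIBUTES: one element per column to report in RESULTS.
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    body = ET.tostring(query, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>' + body


if __name__ == "__main__":
    # All names below are hypothetical examples, not verified BioMart names.
    print(build_biomart_query(
        dataset="osativa_eg_gene",
        filter_name="ensembl_gene_id",
        accessions=["GENE_ID_1", "GENE_ID_2"],
        attributes=["ensembl_gene_id", "uniprotswissprot", "refseq_peptide"],
    ))
```

Saving this text alongside the downloaded results gives you a record of the filters and attributes used for the conversion.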

Page 20: Managing Data Modeling

2. Online analysis tools
Database for Annotation, Visualization and Integrated Discovery (DAVID)
http://david.abcc.ncifcrf.gov/conversion.jsp

This tool works for a wide range of species.

Page 21: Managing Data Modeling

Paste in your accession list

(You can also upload a file of accessions.)

Page 22: Managing Data Modeling

Select accession type.

NOTE: If you choose “Not Sure” the tool will try to decide what type of accession you have.

Page 23: Managing Data Modeling

Select gene list.
Submit list.

Page 24: Managing Data Modeling

Select the type of accession you want to convert TO.

Page 25: Managing Data Modeling

Any ambiguous IDs are listed for you to decide.

Page 26: Managing Data Modeling

3. UniProt ID Mapping

Page 27: Managing Data Modeling

Paste accession list (lists of more than 1000 accessions may cause errors).
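
If your list is longer than this, one option is to split it into smaller batches and submit them one at a time. A minimal sketch, assuming the accessions are already in a Python list; the function name and the placeholder accessions are made up for this example.

```python
def batches(accessions, size=1000):
    """Yield consecutive slices of at most `size` accessions."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(accessions), size):
        yield accessions[start:start + size]


if __name__ == "__main__":
    # Placeholder accessions, for illustration only.
    my_list = ["ACC%04d" % i for i in range(2500)]
    for number, batch in enumerate(batches(my_list), start=1):
        print("batch", number, "contains", len(batch), "accessions")
```

Remember to combine the result files afterwards and to check that every batch was returned.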

COMMENT: Note the difference between UniProt Accessions and UniProt IDs.

UniProt accessions are short strings of letters and numerals, 6 or 10 characters long. UniProt IDs have a suffix related to the species name.

E.g. Cassava Hydroxynitrilase
Accession: P52705
ID: HNL_MANES
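
Before pasting a list it can help to check which of the two kinds of identifier it contains. The sketch below is one way to do that with regular expressions. The accession pattern follows the one given in the UniProt help pages; the entry name pattern (a mnemonic, an underscore, then a species code of up to five characters) is a simplified assumption, and the function name is made up for this example.

```python
import re

# Accession pattern as given in the UniProt help pages (6 or 10 characters).
ACCESSION = re.compile(
    r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})$"
)
# Simplified assumption for UniProt IDs (entry names): NAME_SPECIES.
ENTRY_NAME = re.compile(r"^[A-Z0-9]{1,10}_[A-Z0-9]{1,5}$")


def classify(identifier):
    """Return 'accession', 'id' or 'unknown' for one identifier."""
    token = identifier.strip().upper()
    if ACCESSION.match(token):
        return "accession"
    if ENTRY_NAME.match(token):
        return "id"
    return "unknown"


if __name__ == "__main__":
    for item in ["P52705", "HNL_MANES", "NP_564434"]:
        print(item, classify(item))
```

Anything reported as unknown is probably from another database (for example RefSeq or GenBank) and needs a different "from" type in the mapping tool.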

Page 28: Managing Data Modeling

Select the accession type you have:

and the accession type you want to convert to:

Click on MAP

Page 29: Managing Data Modeling

The mapping link will display a tab-separated file that can be opened in Excel:
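
The same tab-separated file can be checked with a short script, for example to find accessions that did not map or that mapped to more than one target. This is a sketch only: the column headers `From` and `To` are an assumption about the downloaded file, so adjust them to match the header row you actually get.

```python
import csv
import io
from collections import defaultdict


def read_mapping(tsv_text, submitted, from_col="From", to_col="To"):
    """Return (mapping, unmapped, multiple) for a tab-separated mapping file."""
    mapping = defaultdict(list)
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        mapping[row[from_col]].append(row[to_col])
    # Accessions you submitted that are absent from the results.
    unmapped = [acc for acc in submitted if acc not in mapping]
    # Accessions that mapped to more than one target.
    multiple = {acc: ids for acc, ids in mapping.items() if len(ids) > 1}
    return dict(mapping), unmapped, multiple


if __name__ == "__main__":
    # HNL_MANES / P52705 is the cassava example from the slide above;
    # NOT_IN_RESULTS is a made-up placeholder.
    text = "From\tTo\nHNL_MANES\tP52705\n"
    mapping, unmapped, multiple = read_mapping(text, ["HNL_MANES", "NOT_IN_RESULTS"])
    print(mapping)
    print("unmapped:", unmapped)
    print("multiple:", multiple)
```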

Page 30: Managing Data Modeling

4. AgBase: ArrayIDer

Maps ESTs to gene/protein accessions.

Contact AgBase to request additional species.

Page 31: Managing Data Modeling

Upload a list of dbEST accessions or EST names.

Page 32: Managing Data Modeling

An email will be sent with a link to the results. Results are formatted as an Excel file.

Page 33: Managing Data Modeling

For additional help with database accessions please contact AgBase.

Page 34: Managing Data Modeling

Working on your own data:

NOTE: Always keep note of what tool you used to do the accession ID mapping/conversion and its version/update/date.

Keep a copy of your original IDs and what they mapped to so that you can refer back to this during your modeling.
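
One simple way to follow this advice is to write the mapping, the tool name, its version and the date into a single tab-separated log. The sketch below shows the idea; the column names, the function name and the version string in the example are all made up for illustration.

```python
import csv
import io
from datetime import date


def mapping_log(mapping, tool, tool_version, run_date=None):
    """Return tab-separated text recording each original ID and what it mapped to."""
    run_date = run_date or date.today().isoformat()
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["original_id", "mapped_id", "tool", "tool_version", "date"])
    for original_id, mapped_ids in mapping.items():
        # Unmapped IDs are kept too, with an empty mapped_id column.
        for mapped_id in mapped_ids or [""]:
            writer.writerow([original_id, mapped_id, tool, tool_version, run_date])
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical record: the version string is a placeholder.
    example = {"HNL_MANES": ["P52705"], "UNMAPPED_EXAMPLE": []}
    print(mapping_log(example, "UniProt ID Mapping", "version-not-recorded"))
```

Keeping unmapped IDs in the log makes it easy to see later which entries dropped out of the analysis.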

Page 35: Managing Data Modeling

Tutorial 1: ID conversion

The AgriGO GO enrichment analysis tool accepts the following inputs for rice:
  • GenBank ID: AAP50233.1
  • DDBJ ID: BAB11514.1
  • EMBL ID: CAA18188.1
  • UniProt ID: Q9LYA9
  • RefSeq Peptide ID: NP_564434

We will convert a list of Rice Affy IDs to these IDs for use in the AgriGO tool.

Page 36: Managing Data Modeling

Arrays: ID Mapping
  • “annotation” file that shows which database accessions the probes were based on
  • array annotation files may include multiple database IDs
  • Commercial arrays – may be updated regularly
  • Custom/Research arrays – not updated as often
  • Always check when the last ID mapping was updated, as this data changes continually
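
Because one probe can list several database IDs in a single cell, it is worth splitting those cells before converting. The sketch below assumes a comma-separated annotation file in which multiple IDs are separated by `///` and missing values are shown as `---`, as in Affymetrix annotation files. The column names and every row in the example are made up, so adapt them to your own annotation file.

```python
import csv
import io


def probe_to_ids(annotation_csv, probe_column, id_column,
                 separator="///", missing="---"):
    """Return {probe: [database IDs]} from comma-separated annotation text."""
    table = {}
    for row in csv.DictReader(io.StringIO(annotation_csv)):
        cell = row[id_column].strip()
        if cell == missing:
            ids = []  # probe has no ID in this column
        else:
            ids = [part.strip() for part in cell.split(separator) if part.strip()]
        table[row[probe_column]] = ids
    return table


if __name__ == "__main__":
    # Made-up rows for illustration; not taken from a real annotation file.
    sample = (
        "Probe Set ID,RefSeq Protein ID\n"
        "probe_0001,ID_A /// ID_B\n"
        "probe_0002,---\n"
    )
    print(probe_to_ids(sample, "Probe Set ID", "RefSeq Protein ID"))
```

Probes with an empty list are the ones that will need another route, such as ArrayIDer or a BioMart query.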

Page 37: Managing Data Modeling
Page 38: Managing Data Modeling
Page 39: Managing Data Modeling

Array annotation available:
  • FHCRC chicken 13K GPL2863
  • Agilent-015068 Chicken Gene Expression Microarray 4x44k GPL8764
  • Avian Innate Immunity Microarray (AIIM) GPL1461
  • Affymetrix Chicken Genome Array GPL3213*
  • UIUC Bos taurus 13.2K 70-mer oligoarray GPL2853
  • Affymetrix Bovine Genome Array GPL2112
  • Agilent-015354 Bovine Oligo Microarray (4x44K)
  • Equine Whole Genome Oligonucleotide (EWGO) array

Array annotation in progress:
  • ARK-Genomics G. gallus 20K v1.0 GPL5480
  • FHCRC Chicken 13K v2.0 GPL1836
  • Chicken cDNA DDMET 1700 array version 1.0 GPL3265

Page 40: Managing Data Modeling

Tutorial 1: ID conversion

Work through tutorial 1 on the workshop website.

Alternatively, work on your own data set during this time, using the tutorial as a guide.