Bioinformatics. Analysis of proteomic data.

Dr Richard J Edwards, 28 August 2009; CALMARO workshop.


[Cartoon: ©Gary Larson]

(In not much detail)

Bioinformatic analysis of proteomic data

Improving sequence identifications: dealing with redundancy; annotating protein hits.

Adding value to protein lists: accession number mapping & data integration; Gene Ontology analysis; protein interaction networks.

Example: identifying E. huxleyi proteins with multi-species and EST sequence databases

Open Discussion

Improving identifications: dealing with redundancy.

Identifying redundancy

The choice of database affects how redundancy can be identified: SwissProt/IPI indicate splice variants; EnsEMBL peptides map back onto non-redundant gene IDs; poor annotation makes it hard to tell whether a hit is a variant, a sequence error or a separate family member.

Example: alpha tubulin protein family

Identifying redundancy

Sometimes, identification cannot be conclusive.

Basic peptide grouping scenarios


Different scenarios can present different problems.

How important is the protein to the study? It might need to be identified through further experiments.

A simplified example of a protein summary list

Identifying redundancy

Final protein list: conclusive IDs; protein groups; inconclusive IDs.

Are inconclusive/group hits redundant? Common causes: the same protein from different species; splice variants.

Does it matter? Redundancy can inflate protein numbers, bias downstream analyses and distort comparisons between experiments.

[Figure legend: peptides unique to protein; unique to group; no unique peptides]
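The three peptide categories above (unique to protein, unique to group, no unique peptides) can be sketched in a few lines of Python. This is a minimal illustration, not a published algorithm: it assumes each hit is given as a set of peptide identifiers and, as a simplifying assumption, treats proteins with identical peptide sets as one indistinguishable group. The protein and peptide names are made up.

```python
from collections import defaultdict


def classify_hits(hits: dict[str, set[str]]) -> dict[str, str]:
    """Label each protein 'conclusive', 'group' or 'inconclusive'.

    Simplifying assumption: proteins with identical peptide sets form one
    indistinguishable group."""
    # Group proteins that share exactly the same peptide evidence.
    groups: dict[frozenset[str], list[str]] = defaultdict(list)
    for protein, peptides in hits.items():
        groups[frozenset(peptides)].append(protein)

    # Count the number of distinct groups in which each peptide occurs.
    group_count: dict[str, int] = defaultdict(int)
    for peptide_set in groups:
        for peptide in peptide_set:
            group_count[peptide] += 1

    labels: dict[str, str] = {}
    for peptide_set, members in groups.items():
        # A peptide found in only this group is unique to it.
        has_unique = any(group_count[p] == 1 for p in peptide_set)
        if not has_unique:
            label = "inconclusive"  # every peptide is shared elsewhere
        elif len(members) == 1:
            label = "conclusive"  # peptide(s) unique to one protein
        else:
            label = "group"  # peptide(s) unique to a group of proteins
        for protein in members:
            labels[protein] = label
    return labels


# Hypothetical protein hits with made-up peptide identifiers.
hits = {
    "P1": {"pep_a", "pep_b"},
    "P2": {"pep_b", "pep_c"},
    "P3": {"pep_c", "pep_d"},
    "P4": {"pep_c", "pep_d"},
    "P5": {"pep_b"},
}
print(classify_hits(hits))
```

Here P1 is conclusive (pep_a), P3 and P4 form a group (pep_d), and P2 and P5 are inconclusive. Real protein inference tools apply more elaborate parsimony rules.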

Homology groupings

BLAST can be used to identify groups of related proteins. This helps identify possible redundancies, but the peptides still need to be examined.

Particularly useful for “off-species” identifications, which tend to produce many hits to the same protein in different species.

Clustering proteins by % identity

http://www.southampton.ac.uk/~re1u06/software/gablam/
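As a simpler stand-in for GABLAM (not its method), proteins can be grouped by single-linkage clustering on BLAST tabular output (`-outfmt 6`, where columns 1 to 3 are query ID, subject ID and percent identity). A minimal sketch, assuming an all-against-all BLAST of the hit list and a hypothetical 90% identity cutoff; the example rows are made up.

```python
def parse_blast_tab(lines):
    """Yield (query, subject, percent_identity) from BLAST -outfmt 6 lines."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines (e.g. from -outfmt 7)
        fields = line.rstrip("\n").split("\t")
        yield fields[0], fields[1], float(fields[2])  # qseqid, sseqid, pident


def cluster_by_identity(pairs, min_identity=90.0):
    """Single-linkage clusters of proteins joined by hits >= min_identity."""
    parent: dict[str, str] = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for query, subject, identity in pairs:
        root_q, root_s = find(query), find(subject)  # registers both proteins
        if identity >= min_identity and root_q != root_s:
            parent[root_s] = root_q  # merge the two clusters

    clusters: dict[str, list[str]] = {}
    for protein in list(parent):
        clusters.setdefault(find(protein), []).append(protein)
    return sorted(sorted(members) for members in clusters.values())


# Made-up BLAST tabular output (12 standard columns, tab-separated).
example = [
    "protA_sp1\tprotA_sp2\t99.6\t451\t2\t0\t1\t451\t1\t451\t0.0\t930",
    "protA_sp1\tprotB_sp1\t41.2\t440\t250\t8\t3\t440\t2\t436\t1e-90\t330",
    "protB_sp1\tprotB_sp2\t98.9\t444\t5\t0\t1\t444\t1\t444\t0.0\t910",
]
print(cluster_by_identity(parse_blast_tab(example)))
```

BLAST's percent identity refers only to the aligned region, so a real analysis would also check alignment coverage before treating two hits as redundant.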

Improving identifications: annotating protein hits.

Protein annotation

[Diagram: Database → Protein List]

Poorly annotated or unannotated proteins: real proteins or database noise? Is the annotation reliable?

Most of our protein data comes from DNA sequences:

PDB: 53,660 structures (3D)

SwissProt: 392,667 entries (curated)

TrEMBL: >6 million entries; UniParc: >16 million entries

TrEMBL and UniParc entries are mostly inferred from DNA, and most annotation is inferred through sequence analysis.

Protein data from translated DNA

Lots of errors! Sequence errors and annotation errors.

[Diagram labels: Translation; Annotation]

Where does the data come from?

Protein annotation

Use standard sequence analysis tools. Manual guidance and care give better results than automated databases!

Homology searching: BLAST vs. UniProtKB; protein domain searches, e.g. Pfam.

Conservation analysis: multiple sequence alignment with homologues. Are functionally important sites conserved?
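Checking whether functionally important sites are conserved can be sketched as follows, assuming a multiple sequence alignment already exists (held here as a dict of equal-length gapped strings) and that the candidate sites are known alignment columns. The score, the fraction of non-gapped homologues sharing the query residue, is just one simple choice; the alignment and column choices are made up.

```python
def site_conservation(alignment: dict[str, str], query: str,
                      columns: list[int]) -> dict[int, float]:
    """Fraction of homologues (ignoring gaps) matching the query residue.

    Columns are 0-based alignment positions (an assumption; many tools
    report 1-based positions)."""
    if len({len(seq) for seq in alignment.values()}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    query_seq = alignment[query]
    scores: dict[int, float] = {}
    for col in columns:
        residue = query_seq[col]
        if residue == "-":
            scores[col] = 0.0  # the query has a gap, so the site is absent
            continue
        others = [seq[col] for name, seq in alignment.items()
                  if name != query and seq[col] != "-"]
        matches = sum(1 for r in others if r == residue)
        scores[col] = matches / len(others) if others else 0.0
    return scores


# Tiny made-up alignment; columns 2, 3, 5 and 6 stand in for "important" sites.
aln = {
    "query": "MKC-GHD",
    "hom1":  "MKCAGHD",
    "hom2":  "MRC-GHE",
    "hom3":  "MKSAGHD",
}
print(site_conservation(aln, "query", [2, 3, 5, 6]))
```

A low score at a supposedly catalytic site is a warning that the transferred annotation may not hold for the query.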

Phylogenetic analysis: evolutionary relationships can help distinguish function, e.g. by assignment to a protein subfamily. Useful where BLAST hits have competing annotation.

http://www.southampton.ac.uk/~re1u06/software/haqesac/

Beyond proteomics: adding value to protein lists.

What Bioinformatics cannot (usually) do

Replace hypothesis-driven research: directed analysis is always better than “fishing” (e.g. with GO).

Provide a definitive answer: it is better suited to ranking/prioritising candidates.

Follow-up analyses

Many possibilities. What was the aim of the study? What resources are available for your organism?

Imitation is the sincerest form of flattery: find a good study and copy the best bits. The analysis is then easier to describe and easier to justify to reviewers.

Hypothesis-driven analysis is best, although many tools facilitate hypothesis generation (data exploration). Be aware of the risk of testing a hypothesis on the same data used to generate it, and of multiple testing issues.
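The slides do not name a multiple testing correction; the Benjamini-Hochberg false discovery rate procedure is shown here only as one widely used option. A minimal pure-Python sketch that returns adjusted p-values in the original order; the input p-values are made up.

```python
def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # smallest p first
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    # Work down from the largest p-value so each adjusted value is the
    # minimum of p * m / rank over all equal or larger ranks.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up p-values, e.g. one per GO term tested.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
```

Terms whose adjusted value falls below the chosen false discovery rate (for example 0.05) are reported; with many terms tested, far fewer survive than raw p-values would suggest.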

Follow-up analyses

EBI and NCBI both provide many useful tools; EBI runs many good courses at Hinxton.

http://www.ebi.ac.uk/Tools/

Seek collaborations

Find a tame bioinformatician to help if needed.

Good collaboration = trade: time/energy and bioinformatics expertise in exchange for papers/grants and improving the bioinformatics, e.g. adding your organism/database to an online resource.

[Cartoon: ©Gary Larson]

Accession number mapping

Other databases (UniProtKB, OMIM etc.) may contain better or more specific annotation.

Results from searches against older databases may need updating

EBI tool: PICR [Protein Identifier Cross-Reference Service]

BioMart: query & cross-reference (Xref) tool for many databases, www.biomart.org

http://www.ebi.ac.uk/Tools/picr/

BioMart
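Once a cross-reference table has been obtained, for example exported from PICR or BioMart, applying it locally is simple. A minimal sketch, assuming a two-column, tab-separated file with no header line (this layout is an assumption, not a documented export format); it flags identifiers that are unmapped or map to several targets, both of which need manual checking. The accessions are hypothetical.

```python
import csv
import io
from collections import defaultdict


def load_mapping(handle) -> dict[str, set[str]]:
    """Read tab-separated 'source<TAB>target' rows (no header line assumed)."""
    mapping: dict[str, set[str]] = defaultdict(set)
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) >= 2 and row[0] and row[1]:
            mapping[row[0]].add(row[1])
    return mapping


def map_ids(ids, mapping):
    """Split IDs into uniquely mapped, ambiguous (one-to-many) and unmapped."""
    unique, ambiguous, unmapped = {}, {}, []
    for acc in ids:
        targets = mapping.get(acc, set())  # .get does not add keys to a defaultdict
        if len(targets) == 1:
            unique[acc] = next(iter(targets))
        elif targets:
            ambiguous[acc] = sorted(targets)  # needs a manual decision
        else:
            unmapped.append(acc)  # possibly obsolete; check by hand
    return unique, ambiguous, unmapped


# Hypothetical mapping table and accessions from an older database search.
table = io.StringIO("OLD1\tNEW1\nOLD2\tNEW2a\nOLD2\tNEW2b\n")
unique, ambiguous, unmapped = map_ids(["OLD1", "OLD2", "OLD3"], load_mapping(table))
print(unique, ambiguous, unmapped)
```

In practice `io.StringIO` would be replaced by `open(path, newline="")` on the exported file.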

Gene Ontology analysis

Gene Ontology [GO] = gene annotation project. Its controlled vocabulary allows standardisation & comparisons.

http://www.geneontology.org/

Gene Ontology analysis

Many Gene Ontology exploration tools exist (AmiGO, GOA, FatiGO, DAVID etc.); their results depend on the source databases they use.

May need to map IDs using PICR first

GO enrichment: assess the frequency of GO terms in your list against an expectation. This is often a big multiple testing issue. Be aware of biases: how is the expectation derived?

E.g. abundant, conserved proteins are more likely to be annotated & more likely to be identified in a proteomics experiment.

Best if hypothesis-driven or used for data confirmation, e.g. checking for the expected enrichment of terms for a particular subcellular fraction.
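Enrichment of a single GO term is commonly assessed with a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test). A minimal sketch, assuming the four counts below are already known; the choice of background N is exactly where the expectation biases mentioned above enter. The example counts are hypothetical, and p-values from many terms still need a multiple testing correction.

```python
from math import comb


def go_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided hypergeometric p-value P(X >= k) for a single GO term.

    N: annotated proteins in the background (the expectation)
    K: background proteins carrying the term
    n: proteins in your list
    k: proteins in your list carrying the term
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    # Sum the probabilities of seeing k or more term-carrying proteins.
    tail = sum(comb(K, x) * comb(N - K, n - x) for x in range(k, min(n, K) + 1))
    return tail / comb(N, n)


# Hypothetical counts: 12 of 200 listed proteins carry a term that 150 of
# 10,000 background proteins carry (about 3 expected by chance).
print(go_enrichment_p(k=12, n=200, K=150, N=10_000))
```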

Protein interaction networks

Can be useful for identifying protein complexes in the data, e.g. STRING [http://string-db.org/]
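One simple way to look for candidate complexes is to take an interaction list (for example downloaded from STRING), keep only pairs where both partners were identified and the confidence score passes a cutoff, and report connected components. A minimal sketch; the edge format (protein A, protein B, score on a 0 to 1000 scale), the 700 cutoff and the protein names are assumptions for illustration, and connected components are only a crude proxy for complexes.

```python
from collections import defaultdict


def candidate_complexes(edges, identified, min_score=700):
    """Connected components of the interaction subnetwork among identified proteins.

    Proteins with no qualifying interaction are not reported."""
    identified = set(identified)
    graph = defaultdict(set)
    for a, b, score in edges:
        if a in identified and b in identified and a != b and score >= min_score:
            graph[a].add(b)
            graph[b].add(a)

    seen, components = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        # Depth-first search collects everything reachable from start.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        components.append(sorted(component))
    return components


# Hypothetical interactions; X was not identified, C-D is below the cutoff.
edges = [("A", "B", 900), ("B", "C", 750), ("D", "E", 800),
         ("C", "D", 400), ("A", "X", 950)]
identified = ["A", "B", "C", "D", "E"]
print(candidate_complexes(edges, identified))
```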

Example: identifying E. huxleyi proteins with multi-species and EST sequence databases

Combined search strategy

Genome unavailable (for download & searching)

Databases: dbEST (90,000 E. huxleyi ESTs); Thalassiosira pseudonana; taxa-limited database (Rhodophyta, Stramenopiles, Haptophyceae, Alveolata, Cryptophyta)

Output: protein list

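Inputs: EST dataset (90,000 E. huxleyi ESTs) → BLAST database; MS/MS data → MASCOT.

BUDAPEST core: MASCOT hits translated to reading frames (RFs) and MASCOT peptides filtered; poor-quality RFs removed.

Optional (manual or automated): FIESTA consensus & annotation, giving the final protein identifications.

Numbers: 173 ESTs (728 peptides) → 189 RFs → 117 consensus sequences (321 peptides), comprising 34 consensus sequences (34 peptides) and 83 consensus sequences (287 peptides).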
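The step in which MASCOT hits are translated to reading frames and peptides are checked against them can be illustrated with a six-frame translation. This is a minimal sketch, not BUDAPEST's implementation: it assumes the standard genetic code, an EST supplied as a DNA string, and treats I and L as equivalent because they have the same mass. The sequence and peptides are made up.

```python
BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons in TCAG order.
CODE = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
        "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: CODE[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate codon by codon; codons containing N etc. become X."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frames(dna):
    """Translations of the three forward and three reverse-complement frames."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames


def frames_containing(peptide, dna):
    """Reading frames whose translation contains peptide (I/L treated alike)."""
    def norm(seq):
        return seq.replace("I", "L")
    target = norm(peptide)
    return sorted(f for f, prot in six_frames(dna).items() if target in norm(prot))


est = "CATGAAACTGGTT"  # made-up EST fragment
print(six_frames(est))
print(frames_containing("MKIV", est))  # I/L-equivalent match to MKLV
```

A real pipeline would also handle the frame-shifts and sequencing errors that are common in ESTs, which this sketch ignores.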

173 EST hits (728 peptides)

83 consensus sequences in 40 clusters by homology (variants/isoforms)

287 peptides: 239 unique to one consensus; 48 shared within one cluster

http://www.southampton.ac.uk/~re1u06/software/budapest/

Annotating EST consensus sequences: homology searching & phylogenetics

[Diagram labels: Sequence database; Consensus; UniProt; Alignment; Protein family identification; Redundancy/Variants]

Combined search strategy: results

dbEST (90,000 E. huxleyi ESTs): 173 hits → 83 consensus sequences → 40+ proteins

Multi-species databases (Thalassiosira pseudonana; taxa-limited database of Rhodophyta, Stramenopiles, Haptophyceae, Alveolata, Cryptophyta): 96 hits → 26+ proteins

64+ proteins (12 common)

Conclusions.

Summary: extra analysis of raw protein lists adds value.

False positives vs. real proteins; annotation of uncharacterised hits.

Numerous tools exist for mining protein lists, for data exploration and/or hypothesis testing. Availability is community/organism dependent, and it is worth contacting bioinformaticians for further development.

Development of customised bioinformatics solutions can greatly increase the power of a study. The increased availability of high-throughput technologies, combined with poor annotation & high error rates, increases the need for bioinformatics post-processing to improve quality.

Open Discussion

R.Edwards@Southampton.ac.uk
