Bioinformatics for Genomic and Proteomic data analysis

• Sequence Analysis

-- Predicting Function, domains etc.

-- Predicting physico-chemical properties of proteins (ProtParam).

-- Predicting signal peptides and transmembrane proteins (SignalP).

-- Finding homology between sequences, identifying repeats etc. (DOTPLOT).

-- Major databases and retrieval techniques.

• Structure analysis

-- Gene Prediction

-- Phylogenetic analysis

-- Alignment techniques (BLAST, PSI-BLAST)

-- Analysis of protein structure and conformation (Rasmol, SwissPDBViewer, VMD etc.).

-- Protein structure prediction: homology modeling (SwissModel, Modeller).

• Some practical applications

-- Sequence analysis

-- Structure analysis

Major Bioinformatics databases, search engines and data formats.

By: Sachin Pundhir, Bioinformatics sub-centre, DAVV, Indore

Database

• Collection of records and files

• Organized for a particular purpose

• Tables

• Tuples (records)

– Attributes

» Values

BIO520 Student Database

Name ID Grade

Amy 123 A

Joe 456 B

Sue 789 C

Attribute.

Database Operations

• Tables – Create, delete

• Tuples (Records) – Read, write, delete

• Search, sort, modify, print…

Name ID Grade

Amy 123 A

Joe 456 B

Sue 789 C
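
The operations listed above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not part of the original slides: the table name `students` and the choice of an in-memory SQLite database are assumptions, and the rows are the ones from the BIO520 example table.

```python
import sqlite3

# In-memory database; the table name "students" is an assumption for illustration.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Create a table: each column is an attribute.
cur.execute("CREATE TABLE students (name TEXT, id INTEGER PRIMARY KEY, grade TEXT)")

# Write tuples (records), one per row of the example table.
cur.executemany(
    "INSERT INTO students (name, id, grade) VALUES (?, ?, ?)",
    [("Amy", 123, "A"), ("Joe", 456, "B"), ("Sue", 789, "C")],
)

# Modify one value, then delete one tuple.
cur.execute("UPDATE students SET grade = ? WHERE id = ?", ("A", 456))
cur.execute("DELETE FROM students WHERE id = ?", (789,))

# Search and sort, then print the result.
rows = cur.execute("SELECT name, id, grade FROM students ORDER BY name").fetchall()
print(rows)

# Delete the table itself.
cur.execute("DROP TABLE students")
conn.close()
```

Each SQL statement corresponds to one operation on the slide: CREATE and DROP act on tables, while INSERT, SELECT, UPDATE and DELETE act on tuples.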

International Nucleotide Sequence Database Collaboration (INSDC)

• Consists of

DDBJ (Japan)

GenBank (USA)

EMBL Nucleotide Sequence Database.

• The three databases exchange new and updated data on a daily basis to achieve optimal synchronisation.

Bioinformatics databases

• Nucleotide sequence database:

– GenBank: Nucleotide sequence database. Highly redundant.

– DDBJ: DNA Data Bank of Japan.

– EMBL: nucleotide sequence database.

– RefSeq: integrated, non-redundant set of sequences, including genomic DNA, transcript (RNA), and protein products, for major research organisms.

Primary databases

• Protein sequence database:

• GenPept: Protein sequence database.

• UniProtKB/Swiss-Prot: curated protein sequence database with a minimal level of redundancy and a high level of integration with other databases.

• UniProtKB/TrEMBL: computer-annotated supplement of Swiss-Prot that contains all the translations of EMBL nucleotide sequence entries not yet integrated in Swiss-Prot.

• RefSeq: Well-curated, non-redundant database.

• Structure Database

• PDB: Protein Data Bank

• MMDB: Molecular Modeling Database

Secondary database

GenBank Record

Header

information that apply to the whole record

Features

annotations on the record

Sequence

GenBank Record

modification date

Header

GenBank Record

Locus Name

Sequence Length

Molecule Type

GenBank Division

Modification Date

Accession Number

Version Number
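
As a rough sketch of how the header fields above appear in a flat file, the following parses a GenBank-style LOCUS line into named fields. The example line is made up for illustration (it is not a real record), the `Locus` type and `parse_locus` function are hypothetical names, and the parser assumes the common whitespace-separated nucleotide layout with a topology column; real records vary (protein records use "aa" instead of "bp", for instance), so treat this as a simplification.

```python
from typing import NamedTuple


class Locus(NamedTuple):
    """Fields of a nucleotide LOCUS line (hypothetical helper type)."""
    name: str
    length: int
    molecule_type: str
    topology: str
    division: str
    modification_date: str


def parse_locus(line: str) -> Locus:
    """Split a LOCUS line on whitespace; assumes the 8-token nucleotide layout."""
    fields = line.split()
    if len(fields) != 8 or fields[0] != "LOCUS" or fields[3] != "bp":
        raise ValueError("unexpected LOCUS layout: " + line)
    return Locus(
        name=fields[1],
        length=int(fields[2]),
        molecule_type=fields[4],
        topology=fields[5],
        division=fields[6],
        modification_date=fields[7],
    )


# Made-up example line, not taken from a real GenBank entry.
example = "LOCUS       EXAMPLE01               1200 bp    mRNA    linear   PRI 01-JAN-2000"
locus = parse_locus(example)
print(locus)
```

The accession and version numbers sit on their own lines further down the header, so they are not covered by this sketch.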

GenBank Record

Link to Seq

FEATURE

GenBank Record

Sequence

Using Entrez

An integrated database

search and retrieval system

WWW Access

Entrez & BLAST

Genomes

Taxonomy

Entrez: Database Integration

PubMed abstracts

Nucleotide sequences

Protein sequences

3-D Structure

Word weight

BLAST

Phylogeny

Database Searching with Entrez

• Using limits and field restriction to find human MutL homolog

• Linking and neighboring with MutL

Global Entrez Search

Document Summaries: MutL[All Fields]

Entrez Nucleotides: Limits & Preview/Index

Entrez Nucleotides: Limits

Fields: Accession, All Fields, Author Name, EC/RN Number, Feature key, Filter, Gene Name, Issue, Journal Name, Keyword, Modification Date, Organism, Page Number, Primary Accession, Properties, Protein Name, Publication Date, SeqID String, Sequence Length, Substance Name, Text Word, Title, Uid, Volume

Field Restriction

Exclude bulk sequences

Entrez Nucleotides: Limits

Title == Definition

Exclude Bulk Sequences

Document Summaries: Limits

Adding Terms: Preview/Index

Fields: Accession, All Fields, Author Name, EC/RN Number, Feature key, Filter, Gene Name, Issue, Journal Name, Keyword, Modification Date, Organism, Page Number, Primary Accession, Properties, Protein Name, Publication Date, SeqID String, Sequence Length, Substance Name, Text Word, Title, Uid, Volume
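
Field restriction can also be expressed as a text query, where each term is followed by a field name in square brackets. The sketch below only builds such a query and the corresponding NCBI E-utilities `esearch` URL; it does not contact the server. The helper names `build_term` and `esearch_url` are hypothetical, and the particular combination of a Title term with an Organism term is an assumed way of reproducing the human MutL search, not the exact query used in the slides.

```python
from urllib.parse import urlencode

# Base address of the E-utilities search service.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_term(clauses):
    """Join (value, field) pairs into an Entrez query such as 'MutL[Title] AND ...'."""
    return " AND ".join(f"{value}[{field}]" for value, field in clauses)


def esearch_url(db, term, retmax=20):
    """Return the esearch URL for a database and query term (no request is sent)."""
    return ESEARCH + "?" + urlencode({"db": db, "term": term, "retmax": retmax})


# Restrict "MutL" to the Title field and limit the organism to human.
term = build_term([("MutL", "Title"), ("human", "Organism")])
url = esearch_url("nucleotide", term)
print(term)
print(url)
```

Restricting to the Title field matches the DEFINITION line of a GenBank record, which is why a Title search is narrower than the MutL[All Fields] search shown earlier.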

Human MutL Search Results

Human MutL RefSeq

GenBank Records

NM_000249: Links

Literature Links

PubMed

NM_000249: PubMed

Books Link

OMIM: Human Disease Genes

Conserved Domain

Sequence Links

Nucleotide Protein

NM_000249: Related Sequences

simila

Original GenBank mRNAs

Original GenBank genomic

Genome Project BAC

Taxonomy Link

The Tax Browser

NCBI’s Taxonomy

Taxonomy Link

NCBI Protein Databases

• GenPept: CDS translations from GenBank, EMBL, DDBJ

• RefSeq: mRNA based (NP_) and genome based (XP_)

• Swiss-Prot: curated, high-quality protein reviews

• PIR: Protein Information Resource, Georgetown University

• PRF: Protein Research Foundation

• PDB: Protein Data Bank, sequences from structures
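
The RefSeq accession prefix indicates what kind of record an accession refers to. The following sketch maps a few prefixes to record types; the function name `refseq_kind` is hypothetical, and the table deliberately covers only the mRNA and protein prefixes relevant here, not the full RefSeq prefix list.

```python
# Partial prefix table: curated records (NM_, NP_) and model records
# predicted from genome annotation (XM_, XP_).
REFSEQ_PREFIXES = {
    "NM_": "mRNA (curated)",
    "NP_": "protein (curated)",
    "XM_": "mRNA (predicted model)",
    "XP_": "protein (predicted model)",
}


def refseq_kind(accession: str) -> str:
    """Classify an accession by its first three characters."""
    prefix = accession[:3].upper()
    return REFSEQ_PREFIXES.get(prefix, "prefix not covered in this sketch")


print(refseq_kind("NM_000249"))
```

For example, NM_000249, the human MutL homolog record used in these slides, is classified as a curated mRNA.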

Protein Link

BLAST Link

Conserved Domains

Related Proteins: Redundancy

Sequence from MutL structure

Related Proteins: Links

BLink: non-redundant relatives

Arabidopsis homolog

Conserved Domain

MLH1 Domain Structure: CDD

ATPase Domain Mismatch Repair Domain

MLH1: ATPase Domain

ATPase structural alignment

ATP Binding site helix

Genome Resources

NM_000249: Genome Links

Higher Genome Resources

MLH1: UniGene Cluster

ESTs in UniGene

The New Homologene

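[Figure: an early globin gene undergoes gene duplication, giving an A-chain gene and a B-chain gene. Frog A, chick A and mouse A are orthologs of one another, as are mouse B, chick B and frog B; A-chain and B-chain genes are paralogs.]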

• No longer UniGene based

• Protein similarities first

• Guided by taxonomic tree

• Includes orthologs and paralogs
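
The ortholog/paralog distinction in the globin example can be written as a toy rule. This is only an illustration of the definitions, under the assumption that each gene is described by a (species, chain) pair; it is not how HomoloGene itself computes its groups, and the function name `relationship` is hypothetical.

```python
def relationship(gene1, gene2):
    """Classify two globin genes, each given as a (species, chain) pair.

    Same chain in different species: separated by speciation, so orthologs.
    Different chains: separated by the gene duplication, so paralogs.
    """
    if gene1 == gene2:
        return "same gene"
    _, chain1 = gene1
    _, chain2 = gene2
    return "orthologs" if chain1 == chain2 else "paralogs"


print(relationship(("frog", "A"), ("mouse", "A")))
print(relationship(("mouse", "A"), ("mouse", "B")))
print(relationship(("frog", "A"), ("chick", "B")))
```

Under this rule, frog A and mouse A are orthologs, while any A-chain gene paired with any B-chain gene counts as paralogs, including pairs from different species.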

The New Homologene

Entrez Genes: integrated gene-based access

LocusLink

Complete Genomes

• eukaryotic

• microbial

• organelle

Genes MLH1: Central Resource

QUESTIONS!!!
