Structural database and their classification by abdul qahar

Amino acids

Presenting By Abdul Qahar (A Q)

Buner Campus

Edited, prepared and shared by Abdul Qahar

Structural databases and their classification

Basic concept of a database

1. What is a database?

A database is a collection of data which can be used:
• alone, or
• combined with / related to other data

to provide answers to the user’s question.
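
The "alone or combined" idea above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table names, column names and every row below are made up for the example and do not come from any real biological database.

```python
import sqlite3

# In-memory database with two hypothetical tables (all rows are invented).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sequences (seq_id TEXT PRIMARY KEY, organism TEXT, residues TEXT);
    CREATE TABLE structures (struct_id TEXT PRIMARY KEY, seq_id TEXT, method TEXT);
    INSERT INTO sequences VALUES ('SEQ1', 'human', 'ALVS');
    INSERT INTO sequences VALUES ('SEQ2', 'mouse', 'MKV');
    INSERT INTO structures VALUES ('STR1', 'SEQ1', 'X-ray');
""")


def sequences_from(organism):
    """Use one table alone: list sequence IDs for a given organism."""
    rows = conn.execute(
        "SELECT seq_id FROM sequences WHERE organism = ? ORDER BY seq_id",
        (organism,),
    )
    return [row[0] for row in rows]


def sequences_with_structure():
    """Combine two related tables: sequences that have a structure entry."""
    rows = conn.execute(
        "SELECT s.seq_id, t.struct_id, t.method "
        "FROM sequences AS s JOIN structures AS t ON t.seq_id = s.seq_id "
        "ORDER BY s.seq_id"
    )
    return rows.fetchall()


print(sequences_from("mouse"))
print(sequences_with_structure())
```

The first query answers a question from one table on its own; the second answers a question that needs data from both tables, linked through the shared seq_id column.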

Data types

• primary data: sequence (DNA, amino acid), e.g. DMPVERILEALAVE… → primary database

• secondary data: secondary protein structure (e.g., alpha-helices, beta-strands) and “motifs” (regular expressions, blocks, profiles, fingerprints) → secondary db

• tertiary data: tertiary protein structure (domains, folding units; atomic co-ordinates) → tertiary db

• interaction data: binary protein-protein interactions / networks; pathways and functional networks → interaction db
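
As a rough illustration of how a motif can be stored as a regular expression, the sketch below converts a simple PROSITE-style pattern into a Python regex and scans a sequence with it. The pattern shown is the N-glycosylation site pattern N-{P}-[ST]-{P}; the helper names and the test sequence are hypothetical, and the converter only handles the basic syntax elements used here.

```python
import re


def prosite_to_regex(pattern):
    """Convert a simple PROSITE-style pattern to a Python regular expression.

    Handles only: '-' separators, [..] allowed residues, {..} forbidden
    residues, 'x' wildcards and (n) repeat counts.
    """
    parts = []
    for element in pattern.split("-"):
        element = element.replace("{", "[^").replace("}", "]")  # forbidden set
        element = element.replace("x", ".")                     # any residue
        element = element.replace("(", "{").replace(")", "}")   # repeat count
        parts.append(element)
    return "".join(parts)


def find_motif(pattern, sequence):
    """Return (start index, matched text) for each non-overlapping match."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.group()) for m in regex.finditer(sequence)]


# Made-up sequence, used only to exercise the pattern.
hits = find_motif("N-{P}-[ST]-{P}", "MKNGSAQNPTLLNYTK")
print(hits)
```

The candidate site starting at index 7 (NPTL) is rejected because proline is forbidden in the second position, which is the kind of rule a plain substring search could not express.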

Primary biological databases

Nucleic acid databases

EMBL

GenBank

DDBJ (DNA Data Bank of Japan)

Protein databases

PIR

MIPS

SWISS-PROT

TrEMBL

NRL-3D

Nucleotide Databases

• EMBL: Nucleotide sequence database

• Ensembl: Automatic annotation of eukaryotic genomes

• Genome Server: Overview of completed genomes at EBI

• Genome-MOT: Genome monitoring table

• EMBL-Align: Multiple sequence alignment database

Sequence data = strings of letters

Nucleotides (bases)

Adenine (A)

Cytosine (C)

Guanine (G)

Thymine (T)

triplet codons

genetic code

20 amino acids (A, L, V, S etc.)
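
The step from a string of bases to a string of amino acids can be sketched as follows. This is a minimal example assuming the standard genetic code and reading frame 1; the input DNA string is invented for illustration, and the function and variable names are not taken from any particular library.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
# '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA string codon by codon, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Made-up input: GCT CTG GTT TCT TAA
print(translate("GCTCTGGTTTCTTAA"))
```

The 64 triplet codons map onto 20 amino acids plus stop signals, which is why several codons share the same letter in the table.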

Three-dimensional protein structure = atomic coordinates in 3D space
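
To make "atomic coordinates" concrete, here is a small sketch that writes and reads the fixed-width ATOM records used in PDB-format files and measures the distance between two atoms. The column positions follow the PDB ATOM record layout; the two atoms and their coordinates are invented, and the formatter is a simplification that assumes atom names of at most three characters and omits the optional trailing fields.

```python
import math
from typing import NamedTuple


class Atom(NamedTuple):
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float


def format_atom_line(atom):
    """Write an Atom as a fixed-width ATOM record (columns 1 to 54 only)."""
    return (
        f"ATOM  {atom.serial:5d}  {atom.name:<3s} {atom.res_name:>3s} "
        f"{atom.chain}{atom.res_seq:4d}    "
        f"{atom.x:8.3f}{atom.y:8.3f}{atom.z:8.3f}"
    )


def parse_atom_line(line):
    """Read the fixed-width columns of an ATOM record back into an Atom."""
    return Atom(
        serial=int(line[6:11]),
        name=line[12:16].strip(),
        res_name=line[17:20].strip(),
        chain=line[21],
        res_seq=int(line[22:26]),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
    )


def distance(a, b):
    """Euclidean distance between two atoms, in the file's length units."""
    return math.dist((a.x, a.y, a.z), (b.x, b.y, b.z))


# Two hypothetical atoms; the coordinates are not from a real structure.
n_atom = Atom(1, "N", "ALA", "A", 1, 1.0, 2.0, 3.0)
ca_atom = Atom(2, "CA", "ALA", "A", 1, 4.0, 6.0, 3.0)
line = format_atom_line(ca_atom)
print(line)
print(parse_atom_line(line))
print(distance(n_atom, ca_atom))
```

Because every field sits at a fixed column, the parser slices by position instead of splitting on whitespace, which would fail when adjacent fields touch.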

Protein folding

EMBL/GenBank/DDBJ

• These 3 databases contain mainly the same information (with a few differences in format and syntax)

• Serve as archives containing all sequences (single genes, ESTs, complete genomes, etc.) derived from:
– Genome projects and sequencing centers
– Individual scientists
– Patent offices (e.g. USPTO, EPO)

• Non-confidential data are exchanged daily.

Databases related to Genomics

• Contain information on genes, gene location (mapping), gene nomenclature and links to sequence databases

• Exist for most organisms important for life science research

• Examples: MIM, GDB (human), MGD (mouse), FlyBase (Drosophila), SGD (yeast), MaizeDB (maize), SubtiList (B. subtilis), etc.

Swiss-Prot

• Annotated protein sequence database established in 1986 and maintained collaboratively since 1987 by the Department of Medical Biochemistry of the University of Geneva and the EBI

• Complete, curated, non-redundant and cross-referenced with 34 other databases

• Highly cross-referenced

• Available from a variety of servers and through sequence analysis software tools

• More than 8,000 different species

• First 20 species represent about 42% of all sequences in the database

• More than 129,000 entries with 4.7 × 10^7 amino acids

PDB: Protein Data Bank

• Holds 3D models of biological macromolecules (protein, RNA, DNA).

• All data are available to the public.

• Obtained by X-Ray crystallography (84%) or NMR spectroscopy (16%).

• Submitted by biologists and biochemists from around the world.

EMBL Nucleotide Sequence Database

• An annotated collection of all publicly available nucleotide and protein sequences

• Created in 1980 at the European Molecular Biology Laboratory in Heidelberg.

• Maintained since 1994 by the EBI, Cambridge.

DDBJ–DNA Data Bank of Japan

• An annotated collection of all publicly available nucleotide and protein sequences

• Started in 1984 at the National Institute of Genetics (NIG) in Mishima.

• Still maintained at this institute by a team led by Takashi Gojobori.

Why protein structure?

Proteins are fundamental components of all living cells, performing a variety of biological tasks.

Each protein has a particular 3D structure that determines its function.

Protein structure is more conserved than protein sequence, and more closely related to function.

Supersecondary structures

Assembly of secondary structures which are shared by many structures.

Beta hairpin

Beta-alpha-beta unit

Helix hairpin

Structural Databases

SCOP: Structural Classification of Proteins

Current Release: 686 folds; 1073 superfamilies; 1827 families representing 15,979 PDB entries

CATH: Class, Architecture, Topology, Homologous superfamily

Levels in SCOP

1. Class
2. Folds
3. Superfamilies
4. Families

Major classes in SCOP

• Classes
– All alpha proteins
– All beta proteins
– Alpha and beta proteins (a/b)
– Alpha and beta proteins (a+b)
– Multi-domain proteins
– Membrane and cell surface proteins
– Small proteins

Folds

• Each class may be divided into one or more folds

• Proteins which have the same secondary structure elements arranged in the same order in the protein chain and in three dimensions are classified as having the same fold

Superfamilies

• Superfamilies are subdivisions of folds

• A superfamily contains proteins which are thought to be evolutionarily related due to
– Sequence
– Function
– Special structural features

• Relationships between members of a superfamily may not be readily recognizable from the sequence alone

Families

• Subdivision of superfamilies

• Contains members whose relationship is readily recognizable from the sequence

• Families are further subdivided into Proteins

• Proteins are divided into Species
– The same protein may be found in several species
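
The class, fold, superfamily and family levels can be read directly from a SCOP concise classification string (sccs) such as a.1.1.2. The sketch below assumes the usual letter codes for the seven major classes (a to g); the helper names are hypothetical and the example strings are used only to show the mechanics.

```python
# Assumed mapping from sccs class letter to the major SCOP classes.
SCOP_CLASSES = {
    "a": "All alpha proteins",
    "b": "All beta proteins",
    "c": "Alpha and beta proteins (a/b)",
    "d": "Alpha and beta proteins (a+b)",
    "e": "Multi-domain proteins",
    "f": "Membrane and cell surface proteins",
    "g": "Small proteins",
}

LEVELS = ["class", "fold", "superfamily", "family"]


def parse_sccs(sccs):
    """Split an sccs string into its four hierarchy levels."""
    cls, fold, superfamily, _family = sccs.split(".")
    return {
        "class": SCOP_CLASSES[cls],
        "fold": f"{cls}.{fold}",
        "superfamily": f"{cls}.{fold}.{superfamily}",
        "family": sccs,
    }


def deepest_shared_level(sccs_a, sccs_b):
    """Return the deepest level two sccs strings share, or None if none."""
    shared = None
    for level, part_a, part_b in zip(LEVELS, sccs_a.split("."), sccs_b.split(".")):
        if part_a != part_b:
            break
        shared = level
    return shared


print(parse_sccs("a.1.1.2"))
print(deepest_shared_level("a.1.1.2", "a.1.1.1"))
print(deepest_shared_level("a.1.1.2", "b.1.1.1"))
```

Two domains in the same superfamily but different families share the first three components of the string, which mirrors the case where the relationship is not readily recognizable from sequence alone.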

All alpha: Hemoglobin

All beta: Immunoglobulin (8fab)

Alpha/beta: Triosephosphate isomerase

CATH

• Levels
– Class
– Architecture: this level is unique to CATH
– Topology: ~ fold (/superfamily) in SCOP
– Homologous superfamily: ~ superfamily (/family) in SCOP

Architecture

• Same overall arrangement of secondary structures
– Example: the architecture “two-layer beta sheet proteins” contains different folds, each with a distinct number and connectivity of strands
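
The point that one architecture contains several different folds (topologies) can be sketched by grouping CATH-style four-number identifiers by their first two components. The identifiers below are written in the C.A.T.H dotted form purely for illustration and should be treated as hypothetical; the function names are not part of any CATH tool.

```python
from collections import defaultdict


def split_cath_id(cath_id):
    """Split a dotted C.A.T.H identifier into its four hierarchy levels."""
    c, a, t, _h = cath_id.split(".")
    return {
        "class": c,
        "architecture": f"{c}.{a}",
        "topology": f"{c}.{a}.{t}",
        "homologous_superfamily": cath_id,
    }


def topologies_by_architecture(cath_ids):
    """Group the distinct topologies found under each architecture."""
    groups = defaultdict(set)
    for cath_id in cath_ids:
        levels = split_cath_id(cath_id)
        groups[levels["architecture"]].add(levels["topology"])
    return {arch: sorted(tops) for arch, tops in groups.items()}


# Illustrative identifiers only.
example_ids = ["2.60.40.10", "2.60.120.200", "2.40.50.140", "3.40.50.720"]
print(topologies_by_architecture(example_ids))
```

In this toy input, architecture 2.60 holds two different topologies, while the other architectures hold one each.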

Abdul Qahar Buneri

www.slideshare.net/abdulqahar045