1

Bioinformatics Master Course: Sequence Alignment

Lecture 9a: Pattern matching, part I


2

Sequence Patterns vs. Protein Structure

I. Protein-Protein interaction
1. enzyme (protein) / substrate: serine protease trypsin
2. receptor (protein) / ligand: growth hormone receptor
3. antibody (protein) / antigen: immunoglobulin (Ig)

II. Protein-Ion and small molecule interaction
1. protein / ion (Ca2+, Mg2+, Na+, K+, Cl–, HCO3–, SO42–): calmodulin
2. pump / ion, coupled to enzymatic function: ATPase
3. channel / water: aquaporin

III. Protein-DNA/RNA interaction
1. enzyme / DNA: Eco-RI, ribozyme
2. binder / DNA groove: leucine zipper, zinc finger
3. regulator / RNA: KH domain

3

Reactions and Interactions

• What is the difference between a reaction and an interaction?
– a reaction involves a change in chemical bonding

• Which one of these is a chemical bond?
1. H3C-CH2-O-H
2. Na+ Cl–
3. H-O-H···OH2
4. H-O-CH2-CH3···H3C-CH2-O-H

4

Bond Strength

• Bond strength and lifetime are a function of temperature
– vibration (bond stretching), thermal background

• Non-covalent interactions depend very much on the medium
– compare a salt crystal with a salt solution

• Interaction strength has a strong distance dependence
– ion-ion ~ r^-2
– dipole-dipole ~ r^-4
– quadrupole-quadrupole ~ r^-6
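To get a feel for these power laws, the short sketch below computes by how much each interaction weakens when the separation is doubled. It assumes a pure r^-n dependence with the exponents quoted above and ignores prefactors, orientation and the medium; the function and dictionary names are illustrative only.

```python
# Exponents n in "strength ~ r**(-n)", as quoted on the slide.
EXPONENTS = {
    "ion-ion": 2,
    "dipole-dipole": 4,
    "quadrupole-quadrupole": 6,
}


def relative_strength(n, distance_ratio):
    """Strength at distance_ratio * r0 relative to the strength at r0,
    assuming a pure power law r**(-n) (an idealisation)."""
    return distance_ratio ** (-n)


if __name__ == "__main__":
    for name, n in EXPONENTS.items():
        factor = relative_strength(n, 2.0)
        print(f"{name:22s} doubling r scales the strength by {factor:.4f}")
```

The higher the exponent, the more short-ranged the interaction, which is one reason why binding interfaces need close, complementary contact.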

5

Binding: Complementary Interfaces

Binding requires complementary interfaces:

Interfaces have characteristic and conserved residue patterns or motifs

6

Sequence Patterns and Profiles

• Comparison between sequence pattern matching and similarity scoring

PATTERN              | SCORE
exact word           | identity
regular expression   | weight matrix
Hidden Markov Model  | profile
generalized profile  | general Hidden Markov Model
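The difference between a pattern (yes/no answer) and a score (graded answer) can be sketched in a few lines. The example below is only an illustration: the test sequence, the regular expression and the three-column weight matrix are invented toy values, not taken from PROSITE or Pfam.

```python
import re


def exact_match(seq, word):
    """Start indices (0-based) where `word` occurs exactly."""
    n = len(word)
    return [i for i in range(len(seq) - n + 1) if seq[i:i + n] == word]


def regex_match(seq, pattern):
    """Start indices of all (possibly overlapping) regular-expression hits."""
    return [m.start() for m in re.finditer(f"(?=(?:{pattern}))", seq)]


def pwm_scores(seq, pwm, default=-1.0):
    """Score every window with a weight matrix (one dict per column).
    Residues missing from a column get the `default` penalty."""
    width = len(pwm)
    return [
        sum(col.get(res, default) for col, res in zip(pwm, seq[i:i + width]))
        for i in range(len(seq) - width + 1)
    ]


# Toy data (assumptions for illustration only).
SEQ = "AWSWAWTF"
TOY_PWM = [{"W": 2.0}, {"S": 2.0, "T": 1.0}, {"W": 2.0, "F": 1.0}]

if __name__ == "__main__":
    print(exact_match(SEQ, "WSW"))          # exact word
    print(regex_match(SEQ, "W[ST][WF]"))    # regular expression
    print(pwm_scores(SEQ, TOY_PWM))         # weight matrix: graded scores
```

The exact word finds one site, the regular expression finds two, and the weight matrix additionally ranks them and gives a weaker positive score to the near-miss window "WAW", which both patterns reject outright.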

7

Resources

• PROSITE: biologically significant sites, patterns and profiles
– www.ebi.ac.uk/ppsearch/

• PFAM: large collection of multiple sequence alignments
– www.sanger.ac.uk/Software/Pfam/

• DIP: interacting proteins
– dip.doe-mbi.ucla.edu/

• Specialized Databases
– Immunoglobulins: imgt.cines.fr/
– Ca2+-binding proteins: structbio.vanderbilt.edu/cabp_database/

• Molecular visualisation packages
– VMD: www.ks.uiuc.edu/Research/vmd/
– MOLMOL: www.mol.biol.ethz.ch/wuthrich/software/molmol/
– Rasmol: www.umass.edu/microbio/rasmol/

8

Protein-Protein Interactions

9

Protein Interaction Networks

Most proteins are functionally linked to other proteins

H Jeong, SP Mason, A-L Barabási & ZN Oltvai "Lethality and centrality in protein networks" Nature 2001;411(6833):41

10

I.1 Enzyme: Serine Protease Trypsin

• Proteases: a specific class of hydrolases
– cleave peptide bonds at specific residue positions
– aspartate proteases, cysteine proteases, serine proteases

• Trypsin is a serine protease
– cleaves C-terminal of the basic residues Lys and Arg
– one of the three principal digestive proteases
  • the other two are pepsin and chymotrypsin
– produced in an inactive form by the pancreas

• Pattern: His57, Asp102 and Ser195 (H-D-S), the catalytic triad
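The cleavage rule stated above (cut C-terminal of Lys and Arg) is simple enough to apply to a sequence in silico. The following is a minimal sketch of that rule only; it deliberately ignores refinements such as missed cleavages or residues that hinder cleavage, and the example peptide is made up.

```python
def trypsin_digest(protein):
    """Split a protein sequence after every Lys (K) or Arg (R).

    Implements only the simple rule from the slide; real digestion
    is less clear-cut.
    """
    peptides = []
    start = 0
    for i, residue in enumerate(protein):
        if residue in "KR":
            peptides.append(protein[start:i + 1])  # K/R stays at the C-terminus
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal remainder
    return peptides


if __name__ == "__main__":
    # Hypothetical test peptide, not a real protein.
    print(trypsin_digest("MKTAYRGGK"))
```

Each resulting peptide (except possibly the last one) ends in K or R, which is the property exploited when tryptic peptides are matched against sequence databases.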

[Figure: reaction scheme of peptide bond cleavage by trypsin, showing the peptide substrate (side chains 'R'), the active-site serine hydroxyl (Trypsin-CH2-OH) and H2O]

11

Serine Protease: Trypsin

• Pattern: His57, Asp102 and Ser195 (H-D-S)

12

Principle of Catalysis

http://www.chemguide.co.uk/physical/basicrates/catalyst.html

13

Trypsin Complex with Inhibitor

1btc.pdb

14

I.2 Receptor: Growth Hormone Receptor

• Membrane-borne receptors:
– extracellular domain
  • ligand-binding site
– transmembrane domain
  • anchoring in the cell membrane
– intracellular domain
  • kinase or another signalling module (typically)

• Receptor for growth hormone
– member of the cytokine receptor superfamily
– dimerizes upon binding growth hormone as ligand
– activates intracellular kinase, triggers cellular signalling cascade

• Most structures contain only the extracellular or the intracellular domain
– transmembrane domain is difficult to crystallize

• Patterns:
– YGEFS (growth hormone receptor)
– WSxWS (cytokine receptor family)
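Both patterns can be located with a regular expression, reading the "x" in WSxWS as "any residue". The sketch below reports 1-based positions; the test sequence is a synthetic string constructed for illustration, not a receptor sequence.

```python
import re

# Patterns from the slide; "x" is translated to the regex wildcard ".".
PATTERNS = {
    "growth hormone receptor": "YGEFS",
    "cytokine receptor family": "WS.WS",
}


def find_patterns(seq, patterns=PATTERNS):
    """Return {name: [(1-based position, matched text), ...]}."""
    return {
        name: [(m.start() + 1, m.group()) for m in re.finditer(regex, seq)]
        for name, regex in patterns.items()
    }


if __name__ == "__main__":
    synthetic = "AAYGEFSAAWSEWSAA"  # hypothetical sequence
    for name, hits in find_patterns(synthetic).items():
        print(name, hits)
```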

15

Growth Hormone Receptor Complex with Growth Hormone

1a22.pdb

16

I.3 Immune System: Antibody

• Antibodies (immunoglobulins, or Ig)
– immune system: bind ‘foreign’ (non-self) characteristic structures
  • e.g. protein surfaces

• Heavy Chain and Light Chain
• Constant part (Fc) and Variable part (Fv)
– Fv: specific recognition of target molecule (‘antigen’)

• structure called ‘Ig fold’:
– two β-sheets face-to-face, with ‘Greek-key’ motif
– binding site between two Ig folds
– hypervariable loops participate in binding:
  • H1, H2, H3 and L1, L2, L3
  • composition characteristic for antigen

17

Pfam Ig Family Alignment

18

Patterns of Hypervariable Loops

Loop   | Before                                             | After                                                        | Length
CDR-L1 | always Cys                                         | always Trp                                                   | 10 to 17
CDR-L2 | generally Ile-Tyr, also Val-Tyr, Ile-Lys, Ile-Phe  | -                                                            | always 7
CDR-L3 | always Cys                                         | always Phe-Gly-xxx-Gly                                       | 7 to 11
CDR-H1 | always Cys-xxx-xxx-xxx                             | always Trp                                                   | 10 to 12
CDR-H2 | typically Leu-Glu-Trp-Ile-Gly                      | [Lys, Arg]-[Leu, Ile, Val, Phe, Thr, Ala]-[Thr, Ser, Ile, Ala] | 16 to 19
CDR-H3 | always Cys-xxx-xxx                                 | always Trp-Gly-xxx-Gly                                       | 3 to 25
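Rules of this kind translate directly into regular expressions with a length constraint. As a sketch, the CDR-L3 row (Cys before, Phe-Gly-xxx-Gly after, 7 to 11 residues) becomes `C(.{7,11})FG.G`. The fragment used below is an illustrative light-chain-like string chosen as an assumption for the example; it is not part of the lecture material.

```python
import re

# CDR-L3 rule from the table: Cys, then 7-11 loop residues, then Phe-Gly-x-Gly.
CDR_L3 = re.compile(r"C(.{7,11})FG.G")


def find_cdr_l3(light_chain):
    """Return (1-based start, loop sequence) of the first CDR-L3 candidate,
    or None if the rule does not match."""
    m = CDR_L3.search(light_chain)
    if m is None:
        return None
    return m.start(1) + 1, m.group(1)


if __name__ == "__main__":
    fragment = "EDFATYYCQQHYTTPPTFGQGTKVEIK"  # illustrative fragment (assumption)
    print(find_cdr_l3(fragment))
```

Only the first match is returned; in a full-length chain a more careful implementation would also check the position relative to CDR-L1 and CDR-L2.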

19

Antibody Structure

Kontou et al. Eur J Biochem 2000;267:2389

1F3R.pdb

20

Antibody Diversity

• Gene translocation
– heavy chain: multiple VH genes join with one DH and one JH
– light chain: multiple VL genes join with one JL gene

www.cat.cc.md.us/courses/bio141/lecguide/unit3/humoral/antibodies/abydiversity/abydiversity.html

21

Protein-Ion and Protein-‘small molecule’ Interactions

22

II.1 Ion Binding: Calmodulin

• Two domains, each with two ‘EF-hands’:
– helix-loop-helix structure
– loop contains Ca2+-binding motif

• Ca2+ ion: 6-fold coordinated:
– oxygens from residues 1, 3, 5, 7, 9, and 12 in the EF loop: D-K-D-G-D-G-T-I-T-T-K-Q
– one water molecule
– three are negatively charged

• Ca2+ binding changes the conformation of the entire protein from closed to open
– open conformation exposes hydrophobic surface area
– binding site for calmodulin target proteins
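The positional rule for the EF loop (ligands at positions 1, 3, 5, 7, 9 and 12) can be checked mechanically. The sketch below takes the loop sequence exactly as written on the slide and lists the residues at those positions; the helper names are illustrative.

```python
EF_LOOP = "DKDGDGTITTKQ"               # loop sequence as given on the slide
LIGAND_POSITIONS = (1, 3, 5, 7, 9, 12)  # 1-based positions within the loop
NEGATIVE = set("DE")                    # Asp and Glu carry a negative charge


def coordinating_residues(loop, positions=LIGAND_POSITIONS):
    """Return [(position, residue), ...] for the Ca2+-coordinating positions."""
    if len(loop) < max(positions):
        raise ValueError("loop is shorter than the highest ligand position")
    return [(p, loop[p - 1]) for p in positions]


def count_negative(ligands):
    """Number of negatively charged side chains among the ligands."""
    return sum(1 for _, res in ligands if res in NEGATIVE)


if __name__ == "__main__":
    ligands = coordinating_residues(EF_LOOP)
    print(ligands)
    print("negatively charged:", count_negative(ligands))
```

For the sequence above this gives three aspartates at positions 1, 3 and 5, consistent with the statement that three ligands are negatively charged.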

23

Calmodulin Complex with Calcium Ions

1exr.pdb

24

II.2 Ion Pump: Calcium ATPase (ATP synthase)

• protein complex
– links electrical potential to ATP hydrolysis/synthesis
– interconversion between mechanical and electrochemical energy in molecular motors

• F1F0 ATPase: reversible proton pump/motor

• P-type ATPases: transport ions across the membrane against a concentration gradient
– Pattern: D-K-T-G-T-[LIVM]-[TIS]
– next to the aspartate that is phosphorylated during the reaction cycle

• Na+/K+-ATPase: ubiquitous membrane transport protein in mammalian cells
– maintains high K+ and low Na+ in the cytoplasm for normal membrane potentials and cellular activities

• Ca-ATPases: transport Ca2+ from the cytoplasm into organelles (mammalian)
– e.g. sarcoplasmic reticulum, endoplasmic reticulum
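The P-type ATPase pattern above is written in PROSITE-style notation, which maps almost one-to-one onto regular expressions. Below is a minimal converter sketch covering only the common elements (single residues, `x`, `[...]` alternatives, `{...}` exclusions and `(n)` / `(n,m)` repeats); anchors and other special cases of the full syntax are not handled, and the test sequence is synthetic.

```python
import re

# One pattern element: residue, x, [alternatives] or {exclusions},
# optionally followed by a repeat count (n) or range (n,m).
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a simple PROSITE-style pattern into a Python regex string."""
    parts = []
    for element in pattern.split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, lo, hi = m.groups()
        if core == "x":
            core = "."                        # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"    # any residue except these
        if lo:
            core += "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(core)
    return "".join(parts)


if __name__ == "__main__":
    regex = prosite_to_regex("D-K-T-G-T-[LIVM]-[TIS]")
    print(regex)
    print(re.search(regex, "AAADKTGTLTAAA"))  # synthetic test sequence
```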

25

ATPases

F1Fo-ATPase: www.rpi.edu/dept/bcbp/molbiochem/MBWeb/mb1/part2/f1fo.htm

Ca2+-ATPase: www.utoronto.ca/maclennan/rint1.htm

26

ATPase: Calcium Ions in Active Site

1eul.pdb

27

II.3 Membrane Channel: Aquaporin

Conserved NPA motifs: Asn, Pro and Ala stabilise loops through multiple hydrogen bonds

Bert de Groot: www.mpibpc.mpg.de/groups/de_groot/bgroot.html

28

Aquaporin: Motifs

• NPA: stabilizes loops B and E

• G(a)xxxG(a)xxG(a):
– crossing of right-handed helical bundles

Andreas Engel and Henning Stahlberg, in: Current Topics in Membranes (2001), Hohmann, Agre & Nielsen (Eds.) Academic Press

29

Aquaporin Subunit

Bert de Groot: www.mpibpc.mpg.de/groups/de_groot/bgroot.html

1j4n.pdb

30

Protein-DNA/RNA Interactions

31

III.1 Enzyme: Eco-RI

• Restriction enzyme:
– cuts palindromic sequences
– complex of one DNA molecule with two Eco-RI molecules with inversion symmetry
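In DNA, "palindromic" means that a site reads the same as its reverse complement, which is what lets two enzyme molecules bind symmetrically. The sketch below tests this property and locates sites in a sequence. The recognition sequence GAATTC for Eco-RI is not stated on the slide and is supplied here from general knowledge; the test DNA string is made up.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def is_palindromic(site):
    """True if the site equals its own reverse complement."""
    return site == reverse_complement(site)


def find_sites(dna, site="GAATTC"):
    """0-based start indices of all occurrences of `site` in `dna`."""
    n = len(site)
    return [i for i in range(len(dna) - n + 1) if dna[i:i + n] == site]


if __name__ == "__main__":
    print(is_palindromic("GAATTC"))
    print(find_sites("TTGAATTCAAGAATTCG"))  # hypothetical DNA fragment
```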

www.accessexcellence.org/RC/VL/GG/restriction.html

32

Eco-RI

1qrh.pdb

33

III.2a DNA recognition: Leucine Zipper

• Dimer
– Leu interactions
– binds DNA by a fork-shaped structure

• ‘coiled-coil’ structure:
– leucines on one side of helix
– 7-residue repeat; one helix turn is 3.6 residues

position:  g abcdefg abcdefg abcdefg ab
256        K VEELLSK NYHLENE VARLKKL VG  279
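The 7-residue repeat can be detected as leucines spaced exactly seven positions apart, i.e. the pattern L-x(6)-L-x(6)-L. The sketch below searches the fragment shown above (residues 256 to 279, written without spaces) for such a run; the function name and the choice of three consecutive leucines as the default are assumptions for illustration.

```python
import re

ZIPPER = "KVEELLSKNYHLENEVARLKKLVG"  # residues 256-279 from the slide
FIRST_RESIDUE = 256


def leucine_heptads(seq, repeats=3):
    """0-based start indices of runs of `repeats` leucines spaced 7 apart.

    A lookahead is used so that overlapping runs are all reported.
    """
    pattern = "(?=(L" + "(?:.{6}L)" * (repeats - 1) + "))"
    return [m.start() for m in re.finditer(pattern, seq)]


if __name__ == "__main__":
    for start in leucine_heptads(ZIPPER):
        print("heptad leucine run starts at residue", start + FIRST_RESIDUE)
```

In this fragment the run starts at Leu260 and continues through Leu267 and Leu274, so these leucines fall on the same face of the helix.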

34

Leucine Zipper: Complex with DNA

1an2.pdb

35

Leucine Zipper: 7-Residue Repeat

36

III.2b DNA Recognition: Zinc Finger Proteins

• zinc coordinates several side chains
– pulls them together to form ‘finger’ loops

• Pattern: C-x2-4-C-x12-15-H-x3-5-H or C-x2-4-C-x12-15-C
– recognize nucleic acids (DNA or RNA)
  • modulate genes (proteins can also be targeted)

• modulate important functions:
– gene expression
– reverse transcription and virus assembly

• drug discovery targets:
– pathogen-specific 3D structures
– different from endogenous (cellular) zinc finger proteins
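Variable spacers such as x2-4 correspond to bounded repeats in a regular expression. The sketch below encodes the first pattern above (Cys-Cys-His-His) and reports the actual spacer lengths of each hit; the test fragment is an illustrative zinc-finger-like string chosen as an assumption for this example.

```python
import re

# C-x2-4-C-x12-15-H-x3-5-H, with each spacer captured as a group.
C2H2 = re.compile(r"C(.{2,4})C(.{12,15})H(.{3,5})H")


def find_c2h2(seq):
    """Return [(1-based start, matched text, spacer lengths), ...]."""
    hits = []
    for m in C2H2.finditer(seq):
        spacers = tuple(len(g) for g in m.groups())
        hits.append((m.start() + 1, m.group(), spacers))
    return hits


if __name__ == "__main__":
    fragment = "YACPVESCDRRFSRSDELTRHIRIHTG"  # illustrative fragment (assumption)
    print(find_c2h2(fragment))
```

Note that `finditer` reports non-overlapping matches only, and that a pattern match by itself does not prove that a zinc ion is bound.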

37

Zinc Finger Complex with DNA

1a1h.pdb

38

III.3 RNA Regulation: KH Domain

• bind to specific DNA/RNA locations
– regulation of RNA synthesis and metabolism
– combination with other domains
– Pattern: G-x-x-G

• ribonucleoprotein (RNP) domain
• double stranded RNA binding domain (dsRBD)
• K Homology (KH) domain
– recognize tetranucleotide motifs
– high affinity/specificity:
  • RNA secondary structure
  • repeated sequence elements

• alpha/beta fold similar to ribosomal proteins

39

KH Domain Complex with RNA

1k1g.pdb

40

Hammerhead Motif of Ribozyme

The HHRz

Copyright ©2005 American Society of Plant Biologists

Przybilski, R., et al. Plant Cell 2005;17:1877-1885

41

Hammerhead Motif of Ribozyme

• three base-paired helices (I-III)
• core of 11 highly conserved, non-complementary nucleotides
– necessary for the catalysis

• catalytic motif discovered by sequence comparison of plant viroids
– site-specific, self-catalyzed cleavage

(Birikh, 1997)

academic.brooklyn.cuny.edu/chem/zhuang/QD/toppage1.htm

42

Hammerhead Ribozyme Action

488d.pdb

43

Copyright ©2005 American Society of Plant Biologists

Przybilski, R., et al. Plant Cell 2005;17:1877-1885

Modeling of the Arabidopsis HHRz Ara2
