DR. OBAIDUR RAHMAN Bioinformatics TOPIC 4 Protein BLAST: BLASTP

Bioinformatics · 2018. 10. 19. · Practical Bioinformatics Author: Michael Agostino Created Date: 10/15/2015 4:37:52 AM

DR. OBAIDUR RAHMAN

Bioinformatics

TOPIC 4

Protein BLAST: BLASTP

LECTURE TOPICS:

Protein BLAST (BLASTP) at the NCBI and ExPASy Websites

The genetic code

Amino acids and their overlapping properties

The BLASTP scoring matrix

BLASTN vs. BLASTP

Type     Query        Database
BLASTN   Nucleotide   Nucleotide
BLASTP   Protein      Protein

There are 64 possible triplets of the four nucleotides. How?

4 bases in the first position

4 bases in the second position

4 bases in the third position

4 x 4 x 4 = 64 codons
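The counting argument can be checked by listing every triplet. This is a minimal sketch; the base order T, C, A, G is just a common convention chosen here, not something the slide specifies.

```python
from itertools import product

BASES = "TCAG"  # the four DNA nucleotides (ordering is an arbitrary convention)

# One base per position, three positions: every ordered choice is one codon.
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]

print(len(codons))   # 64
print(codons[:4])    # ['TTT', 'TTC', 'TTA', 'TTG']
```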

How do 64 codons specify only 20 amino acids?

The genetic code is redundant: most amino acids are encoded by more than one codon. This redundancy is usually referred to as degeneracy.
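Degeneracy can be made concrete by counting how many codons map to each amino acid. The sketch below assumes the standard genetic code (NCBI translation table 1), written as a 64-letter string in T, C, A, G order with "*" marking stop codons; the variable names are illustrative.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon in TCAG order ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Count codons per amino acid, then set the stop codons aside.
codons_per_residue = Counter(CODON_TABLE.values())
stop_codons = codons_per_residue.pop("*")

print(len(codons_per_residue))   # 20 amino acids
print(stop_codons)               # 3 stop codons
print(codons_per_residue["L"], codons_per_residue["M"], codons_per_residue["W"])  # 6 1 1
```

Leucine has six codons, while methionine and tryptophan have only one each.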

A change in the third codon position generally does not change the amino acid.

But this is not true in all cases, e.g. 1

Sometimes even a change in the first position does not affect the amino acid.

As a result, protein sequences appear to be more conserved than DNA sequences.

A protein sequence alignment carries more information than a DNA sequence alignment.
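The position effect can be measured directly: for each codon position, try every single-base change and count how often the amino acid stays the same. This is a sketch under the assumption of the standard genetic code (NCBI translation table 1); stop codons are skipped as starting points, and the function name is illustrative.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def synonymous_fraction(position):
    """Fraction of single-base changes at `position` (0, 1 or 2) that keep the amino acid."""
    same = total = 0
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue  # only mutate codons that encode an amino acid
        for base in BASES:
            if base == codon[position]:
                continue  # not a change
            mutant = codon[:position] + base + codon[position + 1:]
            total += 1
            same += CODON_TABLE[mutant] == aa
    return same / total

for position in range(3):
    print(position + 1, round(synonymous_fraction(position), 3))
```

Most third-position changes are silent, a few first-position changes are (in leucine and arginine codons), and no second-position change is.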

Memorizing the genetic code:

• Most proteins begin with the codon ATG (methionine)

• Translation ends at one of the stop codons: TAA, TAG, TGA

• Because of degeneracy, an organism can choose among several codons for the same amino acid, and some organisms prefer certain codons over others
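The start and stop rules above can be sketched as a small translation routine. It assumes the standard genetic code (NCBI translation table 1) and an input containing only A, C, G and T; the function name and the example sequence are made up for illustration.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate_orf(dna):
    """Translate from the first ATG up to (not including) the first in-frame stop codon."""
    dna = dna.upper()
    start = dna.find("ATG")
    if start == -1:
        return ""  # no start codon found
    protein = []
    for i in range(start, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break  # TAA, TAG or TGA ends translation
        protein.append(aa)
    return "".join(protein)

# Hypothetical sequence: two leading bases, then ATG GCC AAA TGA.
print(translate_orf("ccATGGCCAAATGAgg"))  # MAK
```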

Amino acids Codon

Amino acid properties:

This arrangement, by polarity and charge, is one of several ways to group the amino acids.

Classifying amino acids by polarity is important because their polarity affects which non-covalent interactions they can form.

And these interactions are largely what give proteins their shape.

Protein structure

There are three main kinds of non-covalent interactions.

Protein structure

The weakest ones are van der Waals interactions, such as this one between an aliphatic isoleucine and an aliphatic leucine side chain.

As illustrated on the energy diagram on the right, van der Waals interactions are weak and act only over short distances, although they are present between any pair of atoms in close proximity.

The distance at which the energy is minimal corresponds to the sum of the van der Waals radii of the two atoms, illustrated here by the transparent spheres.
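The slide's energy diagram is not reproduced in these notes. As a rough illustration only, the sketch below uses the Lennard-Jones 12-6 potential, a common textbook model for such curves; treating the slide's diagram as a Lennard-Jones curve is an assumption, and the parameters epsilon (well depth) and sigma (distance where the energy crosses zero) are in arbitrary units.

```python
def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones 12-6 energy at separation r (assumed model, arbitrary units)."""
    ratio = sigma / r
    return 4.0 * epsilon * (ratio ** 12 - ratio ** 6)

# For this model the energy minimum sits at r_min = 2^(1/6) * sigma,
# where the energy equals -epsilon.
r_min = 2 ** (1 / 6)

for r in (0.95, 1.0, r_min, 1.5, 2.5):
    print(f"r = {r:.3f}  E = {lennard_jones(r):+.3f}")
```

The energy rises steeply when the atoms are pushed closer than the minimum and fades quickly beyond it, which is why the interaction is both weak and short-ranged.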

Protein structure

The strongest non-covalent interactions are salt bridges between pairs of oppositely charged groups.

Here, a lysine side chain is paired to the C-terminal carboxylate of the protein.

Depending on the polarity of the environment, a salt bridge can provide more than 10 times the binding energy of a van der Waals interaction.

Protein structure

Finally, hydrogen bonds are two to five times stronger than van der Waals interactions, but they only occur between polar groups with permanent dipoles.

One of these polar groups acts as a hydrogen bond donor, and the other one is a hydrogen bond acceptor.

Here, you can see a hydrogen bond within the backbone of a protein, in an alpha helix.

Hydrogen bonds are unique because they are directional. They are strongest when the two dipoles are aligned.

In contrast, both van der Waals interactions and salt bridges are non-directional.

Protein structure

Let's survey a few of the amino acids.

Acidic amino acids: aspartate and glutamate.

They both have a carboxylic acid group at the end of their side chain.

Or maybe I should say a carboxylate, because at physiological pH they are ionized and negatively charged.

Now glutamate is longer than aspartate by one methylene group.

And you might think that's not a big difference.

But it actually makes a big difference, especially in the types of conformations or rotamers that each of these side chains can achieve.

Protein structure

Let's survey a few of the amino acids.

Acidic amino acids: aspartate and glutamate.

The glutamate side chain can achieve many conformations, many more than the aspartate side chain.

And that means that it might be better able to position itself exactly in the right position to interact with a substrate or a ligand in the active site of an enzyme.

So although the glutamate can optimally position itself, that can come at an entropic cost.

And that's because the conformational flexibility of the many rotamers will then be limited once it reaches its bound conformation, reducing its entropy.

Protein structure

Histidine is another interesting amino acid that's often found in the active sites of enzymes.

Now histidine has a pKa for the imidazole group of its side chain of about six, which means that it can either be uncharged or charged at physiological pH, depending on its environment.

So on the left is a deprotonated or uncharged form of the histidine, whereas on the right is a protonated and positively charged form of the histidine.

Now in the neutral state, the proton can actually be on either nitrogen atom of the imidazole group.
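To see why a pKa of about six puts histidine on the borderline, the Henderson-Hasselbalch relation gives the fraction of side chains that are protonated at a given pH. This is a minimal sketch; the value 7.4 for physiological pH is an assumption (the slide gives no number), and the function name is illustrative.

```python
def fraction_protonated(ph, pka):
    """Henderson-Hasselbalch: fraction of a group in its protonated form at a given pH."""
    return 1.0 / (1.0 + 10 ** (ph - pka))

HISTIDINE_PKA = 6.0      # "about six", from the slide
PHYSIOLOGICAL_PH = 7.4   # assumed value, not stated on the slide

print(f"{fraction_protonated(6.0, HISTIDINE_PKA):.3f}")               # 0.500
print(f"{fraction_protonated(PHYSIOLOGICAL_PH, HISTIDINE_PKA):.3f}")  # 0.038
```

At pH equal to the pKa, half of the side chains carry the positive charge. A modest shift in the local pKa caused by the protein environment is therefore enough to change which form dominates.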

Histidine

Protein structure

These two neutral states have different hydrogen bonding properties, as suggested by the red and blue arrows here.

The transitions between the different states can be used to shuttle protons in active sites.

Protein structure

So from histidine we saw that the pKas of a protein's groups reflect their chemical properties.

Several amino acids have polar groups with pKas spanning a wide range of values, as shown here.

As food for thought, consider why tyrosine would be so much more acidic than threonine and serine, even though they're all alcohols. (Find the answer.)

Protein structure

So most amino acids are formed of carbon, nitrogen, oxygen, and hydrogen.

But two of them have a sulfur atom.

The first one is methionine, essentially a hydrophobic residue.

It's very similar in size and shape to leucine, shown here.

The second one is cysteine, which has a sulfhydryl group.

And this group is interesting, because it is actually quite reactive under physiological conditions.

Protein structure

One of the reactions that it can undergo is to be oxidized to form a disulfide bond.

So cysteines can react to form these disulfide bonds under oxidizing conditions.

And those are conditions that are often found on the extracellular side.

Whereas inside cells, conditions tend to be more on the reducing side, which means that the cysteines will be found in the reduced free form.

Protein structure

So I told you that the amino acids used by the ribosome, the natural amino acids, are L-amino acids.

Well, there are actually two exceptions to this.

Glycine has a hydrogen as a side chain, which means that it now has two hydrogens coming off of its alpha carbon, so it is not chiral.

But also, the small side chain means that it has fewer conformational restrictions.

And that's going to be important in the process of protein folding.

The second exception is proline.

Proline is a cyclic amino acid.

And that's because its side chain is actually covalently linked to its amino group.

Now this covalent linkage means that proline is actually more conformationally restricted than most amino acids.

And again, this interesting conformational property comes into play when we think about protein folding.

Protein structure

Let's revisit how we can classify the amino acids.

This Venn diagram shows some of the many ways to classify amino acids.

Protein structure

For example, if we look at lysine, it is charged at physiological pH because its side chain amino group carries a positive charge.

It can also readily form hydrogen bonds, and therefore it's also polar.

Why is lysine also classified as nonpolar?

Protein structure

Look at the structure again, and I'll give you a hint.

Now aside from the charged amino group at the end of its side chain, the rest of the side chain is aliphatic, or nonpolar.

So that means that lysine can sometimes act as nonpolar in certain situations.

So the sequence of amino acids that make up a protein is called its primary structure.

BLASTP & Scoring Matrix

Class Lab Protocol

Task 1. Retrieving protein sequence

Ref. protein record: the DB source field has a hyperlink to the DNA record encoding this protein.

The CDS section of an NCBI mRNA record. This contains a translation of the protein encoded by this mRNA.

THE RESULTS OF BLASTP

The program detected that the protein belongs to a larger family or “superfamily”: IGF (Insulin-Like Growth Factor), which includes insulin and many related sequences.

An E-value of 1e-52 is extremely small, and nobody would argue that this hit occurred by chance.
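To put such a value in perspective, BLAST statistics relate the E-value (the expected number of chance hits scoring at least this well) to the probability of seeing at least one such hit by chance: P = 1 - exp(-E). The sketch below applies that relation; the function name is illustrative, and `math.expm1` is used so that tiny E-values are not rounded away.

```python
import math

def p_value_from_e_value(e_value):
    """P(at least one chance hit this good) = 1 - exp(-E), computed stably for tiny E."""
    return -math.expm1(-e_value)

for e_value in (10.0, 1.0, 1e-3, 1e-52):
    print(f"E = {e_value:g}  P = {p_value_from_e_value(e_value):.3g}")
```

For small E-values the two numbers are practically identical, so an E-value of 1e-52 corresponds to a chance probability of about 1e-52. At E = 1 the probability of at least one chance hit is about 0.63.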

Distant homologue

PAIRWISE BLAST

USING ExPASy Website
