“Homology-enhanced probabilistic consistency” multiple sequence alignment: a case study on transmembrane protein. Jia-Ming Chang, 2013-July-09. Chang, J-M, P Di Tommaso, J-F Taly, C Notredame. 2012. Accurate multiple sequence alignment of transmembrane proteins with PSI-Coffee. BMC Bioinformatics 13.

TM-Coffee : Accurate multiple sequence alignment of transmembrane proteins with PSI-Coffee


Page 1: TM-Coffee : Accurate multiple sequence alignment of transmembrane proteins with PSI-Coffee

“Homology-enhanced probabilistic consistency” multiple sequence alignment :

a case study on transmembrane protein

Jia-Ming Chang

2013-July-09

Chang, J-M, P Di Tommaso, J-F Taly, C Notredame. 2012. Accurate multiple sequence alignment of transmembrane proteins with PSI-Coffee. BMC Bioinformatics 13.

Page 2: TM-Coffee : Accurate multiple sequence alignment of transmembrane proteins with PSI-Coffee

Transmembrane protein

Membrane proteins are likely to constitute 20-30% of all ORFs contained in genomes.

Odorant receptors

Richard Benton, “Eppendorf winner. Evolution and revolution in odor detection,” Science (New York, N.Y.) 326, no. 5951 (October 16, 2009): 382-383.

Page 3: TM-Coffee : Accurate multiple sequence alignment of transmembrane proteins with PSI-Coffee

Transmembrane protein multiple sequence alignment

• 1994: alignment of transmembrane proteins is first addressed

– Cserzo M, Bernassau JM, Simon I, Maigret B: New alignment strategy for transmembrane proteins. J Mol Biol 1994, 243(3):388-396.

• Only three multiple sequence alignment programs since then

– Shafrir Y, Guy HR: STAM: simple transmembrane alignment method. Bioinformatics 2004, 20(5):758-769.

– Forrest LR, Tang CL, Honig B: On the accuracy of homology modeling and sequence alignment methods applied to membrane proteins. Biophys J 2006, 91(2):508-517.

– Pirovano W, Feenstra KA, Heringa J: PRALINETM: a strategy for improved multiple alignment of transmembrane proteins. Bioinformatics 2008, 24(4):492-497.

Page 4: TM-Coffee : Accurate multiple sequence alignment of transmembrane proteins with PSI-Coffee

BAliBASE 2.0 reference 7

Pirovano W, Feenstra KA, Heringa J: PRALINETM: a strategy for improved multiple alignment of transmembrane proteins. Bioinformatics 2008, 24(4):492-497.

Page 5: TM-Coffee : Accurate multiple sequence alignment of transmembrane proteins with PSI-Coffee

We need an accurate Transmembrane MSA!

Page 6: TM-Coffee : Accurate multiple sequence alignment of transmembrane proteins with PSI-Coffee

Homology-extended

Simossis VA, Kleinjung J, Heringa J: Homology-extended sequence alignment. Nucleic Acids Res 2005, 33(3):816-824.

Page 7: TM-Coffee : Accurate multiple sequence alignment of transmembrane proteins with PSI-Coffee

Homology-extended

Simossis VA, Kleinjung J, Heringa J: Homology-extended sequence alignment. Nucleic Acids Res 2005, 33(3):816-824.

Page 8: TM-Coffee : Accurate multiple sequence alignment of transmembrane proteins with PSI-Coffee

Pair-hidden Markov Model

Do CB, Mahabhashyam MS, Brudno M, Batzoglou S: ProbCons: Probabilistic consistency-based multiple sequence alignment. Genome Res 2005, 15(2):330-340.

Emission probabilities, which correspond to traditional substitution scores, are based on the BLOSUM62 matrix.

Page 9: TM-Coffee : Accurate multiple sequence alignment of transmembrane proteins with PSI-Coffee

Probabilistic consistency transformation

Page 10: TM-Coffee : Accurate multiple sequence alignment of transmembrane proteins with PSI-Coffee

Homology-extended probabilistic consistency

The new emission probabilities are as follows.

$$p'(x_i, y_j) = \sum_{m=1}^{20} \sum_{n=1}^{20} \alpha_m \, \beta_n \, p(\mathrm{A.A.}_m, \mathrm{A.A.}_n)$$

where $\alpha_m$ is the frequency with which residue m appears at position i and $\beta_n$ is the frequency with which residue n appears at position j; $p(\mathrm{A.A.}_m, \mathrm{A.A.}_n)$ is the original emission probability in ProbCons.
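The double sum over residue pairs can be illustrated with a small sketch. It assumes a hypothetical three-letter alphabet in place of the 20 amino acids and made-up joint emission probabilities; in ProbCons the real values are derived from BLOSUM62. The names `ALPHABET`, `P_EMIT`, and `profile_emission` are illustrative only.

```python
# Hypothetical 3-letter alphabet standing in for the 20 amino acids.
ALPHABET = "ACD"

# Toy joint emission probabilities p(m, n): symmetric and summing to 1.
# These numbers are made up for illustration, not taken from BLOSUM62.
P_EMIT = {
    ("A", "A"): 0.20, ("A", "C"): 0.05, ("A", "D"): 0.05,
    ("C", "A"): 0.05, ("C", "C"): 0.20, ("C", "D"): 0.05,
    ("D", "A"): 0.05, ("D", "C"): 0.05, ("D", "D"): 0.30,
}


def profile_emission(alpha, beta, p_emit=P_EMIT):
    """Return p'(x_i, y_j) = sum_m sum_n alpha[m] * beta[n] * p(m, n).

    alpha and beta map residues to their frequencies in the profile
    columns at position i of x and position j of y, respectively.
    """
    return sum(
        alpha.get(m, 0.0) * beta.get(n, 0.0) * p_emit[(m, n)]
        for m in ALPHABET
        for n in ALPHABET
    )


# Profile column at position i: homologs show A and C equally often.
alpha_i = {"A": 0.5, "C": 0.5}
# Profile column at position j: every homolog shows A.
beta_j = {"A": 1.0}
score = profile_emission(alpha_i, beta_j)
```

When each column contains a single residue with frequency 1, the sum collapses to one term and the function returns the original pairwise emission probability, so the plain sequence case is recovered as a special case.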

Page 11: TM-Coffee : Accurate multiple sequence alignment of transmembrane proteins with PSI-Coffee

Homology-extended probabilistic consistency

 

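$$P(x_i \sim y_j \in a^* \mid x, y) \leftarrow \frac{1}{|S|} \sum_{z \in S} \sum_{z_k} \alpha_i \gamma_k \, P(x_i \sim z_k \in a^* \mid x, z) \cdot \beta_j \gamma_k \, P(z_k \sim y_j \in a^* \mid z, y)$$

where $\alpha_i$, $\beta_j$, and $\gamma_k$ are the profile frequencies.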
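A minimal sketch of this weighted transformation follows. The slide does not say exactly how the profile frequencies enter, so the sketch assumes they act as one scalar weight per position (`alpha[i]`, `beta[j]`, `gamma[k]`) and that posterior matrices are plain nested lists; the function names and toy numbers are hypothetical. With all weights set to 1, the computation reduces to an unweighted average of matrix products over the intermediate sequences.

```python
def relax_through(p_xz, p_zy, alpha, beta, gamma):
    """Relax x~y posteriors through one intermediate sequence z.

    result[i][j] = sum_k (alpha[i] * gamma[k] * p_xz[i][k])
                       * (beta[j] * gamma[k] * p_zy[k][j])
    """
    n_x, n_z, n_y = len(p_xz), len(p_zy), len(p_zy[0])
    result = [[0.0] * n_y for _ in range(n_x)]
    for i in range(n_x):
        for j in range(n_y):
            result[i][j] = sum(
                alpha[i] * gamma[k] * p_xz[i][k] * beta[j] * gamma[k] * p_zy[k][j]
                for k in range(n_z)
            )
    return result


def consistency_transform(intermediates, alpha, beta):
    """Average relax_through() over every intermediate sequence z in S.

    intermediates is a list of (p_xz, p_zy, gamma) triples, one per z.
    """
    n_x, n_y = len(alpha), len(beta)
    total = [[0.0] * n_y for _ in range(n_x)]
    for p_xz, p_zy, gamma in intermediates:
        relaxed = relax_through(p_xz, p_zy, alpha, beta, gamma)
        for i in range(n_x):
            for j in range(n_y):
                total[i][j] += relaxed[i][j]
    return [[value / len(intermediates) for value in row] for row in total]


# Toy posteriors (made up): x has 2 positions, z has 2, y has 1.
p_xz = [[0.9, 0.1], [0.2, 0.8]]
p_zy = [[0.5], [0.4]]
unweighted = consistency_transform([(p_xz, p_zy, [1.0, 1.0])], [1.0, 1.0], [1.0])
weighted = consistency_transform([(p_xz, p_zy, [0.5, 1.0])], [1.0, 1.0], [1.0])
```

Because `gamma[k]` appears in both factors, a low-confidence position in z is down-weighted quadratically under this reading of the formula.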

Page 12: TM-Coffee : Accurate multiple sequence alignment of transmembrane proteins with PSI-Coffee

Homology-extended

Simossis VA, Kleinjung J, Heringa J: Homology-extended sequence alignment. Nucleic Acids Res 2005, 33(3):816-824.

Que1: how to build a profile?

Que2: how to score profiles?

Page 13: TM-Coffee : Accurate multiple sequence alignment of transmembrane proteins with PSI-Coffee

Que1: how to build a profile?

• Database Size

• Searching parameters

– E-value : most used, anything else???

1. Matrix file : -M
2. Filter the query sequence for low-complexity subsequences : -F
3. Neighborhood word threshold : -f
4. Truncate the report to a number of alignments : -b

Page 14: TM-Coffee : Accurate multiple sequence alignment of transmembrane proteins with PSI-Coffee

Word hit & Neighborhood

Page 15: TM-Coffee : Accurate multiple sequence alignment of transmembrane proteins with PSI-Coffee

Searching parameters

• Fast, Insensitive search

– High percent identity

– blastp -F "m S" -f 999 -M BLOSUM80 -G 9 -E 2 -e 1e-5

• Slow, Sensitive search

– Increase sensitivity, decrease specificity

– blastp -F "m S" -f 9 -M BLOSUM45 -e 100 -b 10000 -v 10000

• Source: the book “BLAST”, pages 146-147
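The two parameter sets can be kept as named presets and turned into command lines, as in the sketch below. The flag values are copied from the slide; everything else is an assumption: the legacy NCBI `blastall -p blastp` invocation, the `-i` (query) and `-d` (database) options, and the placeholder file and database names. The sketch only builds the argument list and does not run BLAST.

```python
import shlex

# Flag values copied from the two searches on the slide.
PRESETS = {
    "fast": ["-F", "m S", "-f", "999", "-M", "BLOSUM80",
             "-G", "9", "-E", "2", "-e", "1e-5"],
    "sensitive": ["-F", "m S", "-f", "9", "-M", "BLOSUM45",
                  "-e", "100", "-b", "10000", "-v", "10000"],
}


def blast_command(preset, query, database,
                  program=("blastall", "-p", "blastp")):
    """Build an argument list for a protein search with the given preset.

    The default program prefix assumes legacy NCBI BLAST; pass another
    tuple if the local installation is invoked differently.
    """
    if preset not in PRESETS:
        raise ValueError(f"unknown preset: {preset!r}")
    return [*program, "-i", query, "-d", database, *PRESETS[preset]]


# Placeholder query file and database name, for illustration only.
fast_cmd = blast_command("fast", "query.fasta", "UniRef50-TM")
sensitive_cmd = blast_command("sensitive", "query.fasta", "UniRef50-TM")
printable = shlex.join(sensitive_cmd)  # shell-quoted string for logging
```

Keeping the arguments as a list rather than a single string means the filter value `m S`, which contains a space, stays one argument if the list is later handed to `subprocess.run`.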

Page 16: TM-Coffee : Accurate multiple sequence alignment of transmembrane proteins with PSI-Coffee

Different databases

• UniProt (release 15.15, 2010) and UniRef50, UniRef90, UniRef100

• NCBI non-redundant (NR)

• Transmembrane subsets UniRef50TM, UniRef90TM, UniRef100TM, UniProtTM, selected with keyword:"Transmembrane [KW-0812]"

Page 17: TM-Coffee : Accurate multiple sequence alignment of transmembrane proteins with PSI-Coffee

Database Size

Data Set        No. of sequences
UniRef50-TM               87,989
UniRef90-TM              263,306
UniRef100-TM             613,015
UniProt-TM               818,635
UniRef50               3,077,464
UniRef90               6,544,144
UniRef100              9,865,668
UniProt               11,009,767
NCBI NR               10,565,004


Page 18: TM-Coffee : Accurate multiple sequence alignment of transmembrane proteins with PSI-Coffee

Performance comparison of different database sizes for the BAliBASE2-ref7.

UniRef50-TM contains about 100 times fewer sequences than the full UniProt.

The level of accuracy is comparable, and even superior, to that achieved with the default PSI-Coffee, while CPU time requirements decrease by a factor of 10.
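The size ratio quoted on this slide can be checked against the counts in the Database Size table. The sketch below copies those counts; the dictionary and function names are illustrative.

```python
# Sequence counts copied from the Database Size table.
DATABASE_SIZES = {
    "UniRef50-TM": 87_989,
    "UniRef90-TM": 263_306,
    "UniRef100-TM": 613_015,
    "UniProt-TM": 818_635,
    "UniRef50": 3_077_464,
    "UniRef90": 6_544_144,
    "UniRef100": 9_865_668,
    "UniProt": 11_009_767,
    "NCBI NR": 10_565_004,
}


def fold_reduction(small, large="UniProt", sizes=DATABASE_SIZES):
    """Return how many times fewer sequences `small` holds than `large`."""
    return sizes[large] / sizes[small]


ratio = fold_reduction("UniRef50-TM")
```

For UniRef50-TM the ratio comes out at roughly 125, which is in line with the "about 100 times fewer" figure.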

Page 19: TM-Coffee : Accurate multiple sequence alignment of transmembrane proteins with PSI-Coffee
Page 20: TM-Coffee : Accurate multiple sequence alignment of transmembrane proteins with PSI-Coffee

10% more columns are correctly aligned when compared with PRALINETM.

The rows Pairs and Cols denote the total number of correctly aligned pairs and columns, respectively. The reference alignments contain 3,294,102 pairs and 1,781 columns.
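Counting correctly aligned pairs and columns can be sketched as follows. This is a generic illustration of the two counts, not the evaluation script used for the slides: alignments are assumed to be lists of equal-length gapped strings with `-` as the gap character, rows are assumed to be in the same order in both alignments, and the toy alignments are made up.

```python
from itertools import combinations


def residue_columns(alignment):
    """Describe each column as a tuple of residue indices (None for a gap)."""
    counters = [0] * len(alignment)
    columns = []
    for c in range(len(alignment[0])):
        column = []
        for s, row in enumerate(alignment):
            if row[c] == "-":
                column.append(None)
            else:
                column.append(counters[s])
                counters[s] += 1
        columns.append(tuple(column))
    return columns


def aligned_pairs(columns):
    """Collect every pair of residues that share a column."""
    pairs = set()
    for column in columns:
        for a, b in combinations(range(len(column)), 2):
            if column[a] is not None and column[b] is not None:
                pairs.add((a, column[a], b, column[b]))
    return pairs


def count_correct(reference, test):
    """Return (correct pairs, reference pairs, correct columns, reference columns)."""
    ref_cols = residue_columns(reference)
    test_cols = residue_columns(test)
    ref_pairs = aligned_pairs(ref_cols)
    correct_pairs = len(ref_pairs & aligned_pairs(test_cols))
    test_col_set = set(test_cols)
    correct_cols = sum(1 for column in ref_cols if column in test_col_set)
    return correct_pairs, len(ref_pairs), correct_cols, len(ref_cols)


# Toy alignments: the test alignment misplaces the last residue of row 0.
reference = ["AC-G", "ACTG", "A-TG"]
test = ["ACG-", "ACTG", "A-TG"]
result = count_correct(reference, test)
```

Summing the first and third values over all reference alignments in a benchmark gives totals of the kind reported in the Pairs and Cols rows.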

Page 21: TM-Coffee : Accurate multiple sequence alignment of transmembrane proteins with PSI-Coffee

BAliBASE 3.0

The performance figures for the other methods are from Rausch et al. The SP and TC scores of full-length sequences are evaluated on the core blocks (by xml).

Page 22: TM-Coffee : Accurate multiple sequence alignment of transmembrane proteins with PSI-Coffee

Que2: how to score profiles?

Edgar RC, Sjolander K: A comparison of scoring functions for protein sequence profile alignment. Bioinformatics 2004, 20(8):1301-1308.

Page 23: TM-Coffee : Accurate multiple sequence alignment of transmembrane proteins with PSI-Coffee

• Prediction mode : -template_file PSITM

• Output : -output tm_html

This output was obtained on Or94b of D. melanogaster and its orthologs in other Drosophila species. Notably, the predicted topology of the Or94b set is consistent with the conclusion of Benton et al.

Page 24: TM-Coffee : Accurate multiple sequence alignment of transmembrane proteins with PSI-Coffee

Paolo Di Tommaso

http://tcoffee.crg.cat/tmcoffee