Kaizhong Zhang Department of Computer Science University of Western Ontario London, Ontario, Canada Joint work with Bin Ma, Gilles Lajoie, Amanda Doherty-Kirby, Chengzhi Liang, Ming Li The peptide de novo sequencing from MS/MS spectrum

Introduction

Tandem mass spectrometry (MS/MS) now plays a very important role in protein identification because of its speed and high sensitivity.

The derivation of the peptide sequence from its MS/MS spectrum is an important task in proteomics.

Deriving the sequence without the help of a protein database is called de novo sequencing; it is especially important for identifying unknown proteins.

Introduction (2)

The basic lab experimental steps of this method are the following:

1. The proteins are digested with an enzyme to produce peptides;

2. The peptides are charged (ionized) and separated according to their mass-to-charge (m/z) ratios;

3. Each peptide is fragmented into fragment ions and the m/z values of the fragment ions are measured.

Introduction (3)

Steps 2 and 3 are both performed within a tandem mass spectrometer.

Since there are many copies of each peptide being fragmented and the fragmentation can occur anywhere along the peptide, a spectrum of the observed m/z values is obtained.

Mass spectrum

For each possible fragment ion there could be a peak at the corresponding m/z value.

The height of the peak is proportional to the frequency with which the m/z value is observed by the mass spectrometer.

In general, proteins consist of 20 types of amino acids, all of which have distinct masses except for one pair, leucine and isoleucine.

Mass spectrum (2)

Consequently different peptides usually produce different spectra.

It is therefore possible, and now a common practice, to use the spectrum of a peptide to determine its sequence.

Peptide fragmentation

A charged peptide may be fragmented into two pieces in three ways, which may produce a pair of a- and x-ions, a pair of b- and y-ions, or a pair of c- and z-ions.

Theoretically, a fragmentation can occur at any place in a peptide and a spectrum is expected to contain all the possible ion peaks.

In practice, due to uneven strength of the bonds at different positions, different ions occur with different frequencies.

Peptide fragmentation (2)

Peptide fragmentation (3)

The most abundant ions are y-ions, which often form a complete series in a spectrum.

Next come the a- and b-ions, many of which are not observed.

The c-, x-, and z-ions occur much less frequently.

In addition, these ions often give rise to further ions through the loss of water or ammonia.

The approximate masses of some atoms that appear in peptides, where C13 denotes the carbon-13 isotope of C:

Atom            C     C13   H     O     N
Mass (Dalton)   12    13    1     16    14

Mass of an amino acid

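For any amino acid a, we use $\|a\|$ to denote the mass of its residue C2H2NOR (R being the side chain), i.e., the mass of the amino acid a minus one water molecule.

For a sequence of amino acids $P = a_1 a_2 \ldots a_k$, let $\|P\| = \sum_{1 \le j \le k} \|a_j\|$.

Therefore the actual mass of peptide P is $18 + \|P\|$, because the peptide contains one extra H2O (an H at the N-terminus and an OH at the C-terminus).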

The approximate masses of the 20 amino acids

Amino acid      A        R        N        D
Mass (Dalton)   71.04    156.10   114.04   115.03

Amino acid      C        E        Q        G
Mass (Dalton)   103.01   129.04   128.06   57.02

Amino acid      H        I        L        K
Mass (Dalton)   137.06   113.08   113.08   128.09

Amino acid      M        F        P        S
Mass (Dalton)   131.04   147.07   97.05    87.03

Amino acid      T        W        Y        V
Mass (Dalton)   101.05   186.08   163.06   99.07
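As a small illustration of $\|P\|$ and the peptide mass $18 + \|P\|$, the Python sketch below sums the approximate residue masses from the table above. Only the table values come from the slides; the names `RESIDUE_MASS`, `residue_sum`, and `peptide_mass` are hypothetical.

```python
# Approximate residue masses ||a|| in daltons, copied from the table above.
RESIDUE_MASS = {
    "A": 71.04, "R": 156.10, "N": 114.04, "D": 115.03, "C": 103.01,
    "E": 129.04, "Q": 128.06, "G": 57.02, "H": 137.06, "I": 113.08,
    "L": 113.08, "K": 128.09, "M": 131.04, "F": 147.07, "P": 97.05,
    "S": 87.03, "T": 101.05, "W": 186.08, "Y": 163.06, "V": 99.07,
}


def residue_sum(peptide: str) -> float:
    """||P||: the sum of the residue masses of the amino acids in the peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide)


def peptide_mass(peptide: str) -> float:
    """Actual peptide mass 18 + ||P||: the residues plus one extra H2O."""
    return 18 + residue_sum(peptide)


print(round(peptide_mass("GAV"), 2))  # prints 245.13
```

Because I and L have the same mass in this table, any computation based on these masses alone cannot tell them apart.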

The hypothetical spectrum of P

Let $A = a_1 a_2 \ldots a_n$ be a sequence of amino acids. We introduce two notations:

$\|A\|_b = 1 + \|A\|$

$\|A\|_y = 19 + \|A\|$

The hypothetical spectrum of P (2)

Let $b_i$ be the mass of the b-ion of P with i amino acids. Then

$b_i = \|a_1 a_2 \ldots a_i\|_b \quad (1 \le i < k)$.

Let $y_i$ be the mass of the y-ion of P with i amino acids. Then

$y_i = \|a_{k-i+1} \ldots a_k\|_y \quad (1 \le i < k)$.

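Clearly, $y_{k-i} + b_i = 20 + \|P\|$, since $y_{k-i} + b_i = (19 + \|a_{i+1} \ldots a_k\|) + (1 + \|a_1 \ldots a_i\|)$ and the two fragments together contain every residue of P exactly once.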
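The b- and y-ion series can be computed directly from these definitions. The Python sketch below assumes the approximate residue masses from the table; `b_ions`, `y_ions`, and `check_complement` are hypothetical names. It also checks the identity $y_{k-i} + b_i = 20 + \|P\|$ numerically.

```python
RESIDUE_MASS = {
    "A": 71.04, "R": 156.10, "N": 114.04, "D": 115.03, "C": 103.01,
    "E": 129.04, "Q": 128.06, "G": 57.02, "H": 137.06, "I": 113.08,
    "L": 113.08, "K": 128.09, "M": 131.04, "F": 147.07, "P": 97.05,
    "S": 87.03, "T": 101.05, "W": 186.08, "Y": 163.06, "V": 99.07,
}


def norm(seq: str) -> float:
    """||A||: the sum of the residue masses of seq."""
    return sum(RESIDUE_MASS[aa] for aa in seq)


def b_ions(peptide: str) -> list[float]:
    """[b_1, ..., b_{k-1}] with b_i = ||a_1 ... a_i||_b = 1 + ||a_1 ... a_i||."""
    k = len(peptide)
    return [1 + norm(peptide[:i]) for i in range(1, k)]


def y_ions(peptide: str) -> list[float]:
    """[y_1, ..., y_{k-1}] with y_i = ||a_{k-i+1} ... a_k||_y = 19 + ||suffix of length i||."""
    k = len(peptide)
    return [19 + norm(peptide[k - i:]) for i in range(1, k)]


def check_complement(peptide: str, tol: float = 1e-9) -> bool:
    """Verify y_{k-i} + b_i = 20 + ||P|| for every 1 <= i < k."""
    k = len(peptide)
    b, y = b_ions(peptide), y_ions(peptide)
    total = 20 + norm(peptide)
    # Python lists are 0-based: b_i is b[i - 1] and y_{k-i} is y[k - i - 1].
    return all(abs(y[k - i - 1] + b[i - 1] - total) < tol for i in range(1, k))


print(check_complement("PEPTIDE"))  # prints True
```

The small tolerance absorbs floating-point rounding, since the prefix and suffix sums are added in a different order than the full sum.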

The hypothetical spectrum of P (3)

Other peaks may appear around each y-ion peak.

For each y-ion with mass x, the corresponding x-ion and z-ion have masses x+26 and x-17, respectively.

An ion may lose a water molecule, generating a peak at mass x-18.

An ion with mass x is usually accompanied by a peak at x+1, corresponding to the isotopic ion that contains one C13 atom.
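These offsets follow from the approximate atomic masses given earlier. The short Python check below derives them; `formula_mass` is a hypothetical helper that only handles simple formulas such as `H2O`, not a general chemistry parser. The a- and c-ion offsets (x-28 and x+17 relative to a b-ion) are included because they come from the same arithmetic.

```python
import re

# Approximate atomic masses in daltons, from the table of atom masses.
ATOM_MASS = {"C": 12, "H": 1, "O": 16, "N": 14}
C13_MASS = 13  # carbon-13 isotope


def formula_mass(formula: str) -> int:
    """Mass of a simple formula such as 'H2O' or 'NH3' (hypothetical helper)."""
    total = 0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ATOM_MASS[element] * (int(count) if count else 1)
    return total


WATER = formula_mass("H2O")    # 18: water loss gives a peak at x - 18
AMMONIA = formula_mass("NH3")  # 17: z-ion = y-ion - NH3 (x - 17); c-ion = b-ion + NH3 (x + 17)
CO = formula_mass("CO")        # 28: a-ion = b-ion - CO (x - 28)
X_SHIFT = CO - 2 * ATOM_MASS["H"]          # 26: x-ion = y-ion + CO - 2H (x + 26)
ISOTOPE_SHIFT = C13_MASS - ATOM_MASS["C"]  # 1: one C13 atom gives a peak at x + 1

print(WATER, AMMONIA, CO, X_SHIFT, ISOTOPE_SHIFT)  # prints 18 17 28 26 1
```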

The hypothetical spectrum of P (4)

Therefore, for each y-ion with mass x, peaks may occur at the masses in the following set:

$Y(x) = \{x-18,\ x-17,\ x,\ x+1,\ x+26\}$

Similarly, for each b-ion with mass x, peaks may occur at the masses in the following set, where x-28 corresponds to the a-ion and x+17 to the c-ion:

$B(x) = \{x-28,\ x-18,\ x,\ x+1,\ x+17\}$

The hypothetical spectrum of P (5)

Therefore, the hypothetical spectrum of the peptide P has peaks at each mass in the following set:

$S(P) = \bigcup_{0 < i < k} \big( B(b_i) \cup Y(y_i) \big)$
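Putting the pieces together, a minimal Python sketch of $S(P)$ might look as follows. It assumes the approximate residue masses from the table and exactly the offset sets $B(x)$ and $Y(x)$ defined above; the function names are illustrative only.

```python
RESIDUE_MASS = {
    "A": 71.04, "R": 156.10, "N": 114.04, "D": 115.03, "C": 103.01,
    "E": 129.04, "Q": 128.06, "G": 57.02, "H": 137.06, "I": 113.08,
    "L": 113.08, "K": 128.09, "M": 131.04, "F": 147.07, "P": 97.05,
    "S": 87.03, "T": 101.05, "W": 186.08, "Y": 163.06, "V": 99.07,
}


def norm(seq: str) -> float:
    return sum(RESIDUE_MASS[aa] for aa in seq)


def B(x: float) -> set[float]:
    # b-ion x, a-ion x-28, water loss x-18, C13 isotope x+1, c-ion x+17
    return {x - 28, x - 18, x, x + 1, x + 17}


def Y(x: float) -> set[float]:
    # y-ion x, water loss x-18, z-ion x-17, C13 isotope x+1, x-ion x+26
    return {x - 18, x - 17, x, x + 1, x + 26}


def hypothetical_spectrum(peptide: str) -> list[float]:
    """S(P): the union of B(b_i) and Y(y_i) over 0 < i < k, sorted."""
    k = len(peptide)
    masses: set[float] = set()
    for i in range(1, k):
        b_i = 1 + norm(peptide[:i])       # ||a_1 ... a_i||_b
        y_i = 19 + norm(peptide[k - i:])  # ||a_{k-i+1} ... a_k||_y
        masses |= B(b_i) | Y(y_i)
    return sorted(masses)


print(len(hypothetical_spectrum("GAV")))  # prints 20
```

For a peptide of length k this yields at most 10(k-1) masses; fewer appear when two ion types happen to coincide exactly.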

The de novo sequencing problem

Let P be a peptide and $M = \|P\| + 20$. Given a solution containing peptide P, a tandem mass spectrometer can measure a peak list L.

L is a set of pairs $\{(x_i, h_i) \mid 1 \le i \le n\}$, where $0 < x_1 < \cdots < x_n$ are the masses and $h_i$ is the intensity of the peak at $x_i$.

The total mass of P, which equals $M - 2$, can also be measured.

The de novo sequencing problem (2)

The masses given by the spectrometer are not accurate.

The maximum error varies from 0.01 dalton to 0.5 dalton depending on the type of spectrometer used.

The de novo sequencing problem (3)

Let $\delta$ be the error bound of the spectrometer, and let S be a set of masses. We say a peak $(x, h)$ in L is supported by S if there is a $y \in S$ such that $|x - y| < \delta$.

The subset of peaks in L supported by S is denoted by $L_S$:

$L_S = \{(x, h) \in L \mid \exists\, y \in S \text{ such that } |x - y| < \delta\}$
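Computing $L_S$ straight from the definition compares each peak with every mass in S. The sketch below (an illustrative implementation choice, not something stated in the slides) sorts S once and uses binary search, so only the two masses adjacent to each peak need to be tested. Peaks are represented as `(mass, intensity)` tuples, and `delta` stands for the spectrometer error bound.

```python
from bisect import bisect_left


def supported_peaks(peaks, S, delta):
    """L_S: the peaks (x, h) for which some y in S satisfies |x - y| < delta."""
    masses = sorted(S)
    result = []
    for x, h in peaks:
        j = bisect_left(masses, x)
        # masses[j - 1] < x <= masses[j], so the nearest mass is one of these two.
        neighbours = masses[max(j - 1, 0):j + 1]
        if any(abs(x - y) < delta for y in neighbours):
            result.append((x, h))
    return result


peaks = [(58.0, 10.0), (75.3, 2.0), (118.1, 7.5)]
print(supported_peaks(peaks, {58.02, 75.02, 118.07}, 0.1))
# prints [(58.0, 10.0), (118.1, 7.5)]
```

With m masses in S and n peaks, this takes O(m log m + n log m) time instead of O(nm).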

The de novo sequencing problem (4)

Therefore, $L_{S(P)}$ consists of all the peaks in L that are supported by the masses of the hypothetical ions of P.

The more high-intensity peaks $L_{S(P)}$ contains, the more likely it is that L is the mass spectrum of P.

The de novo sequencing problem (5)

For any peak list $L'$, we define $h(L') = \sum_{(x,h) \in L'} h$.

The de novo sequencing problem is defined as follows.

Given a mass spectrum L, a positive number M, and an error bound $\delta$, construct a peptide P such that $\big|\, \|P\| + 20 - M \,\big| < \delta$ and $h(L_{S(P)})$ is maximized.
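The objective can be evaluated for any single candidate peptide. The Python sketch below assumes the residue table and the offset sets $B(x)$ and $Y(x)$ from the earlier slides, and uses the same error bound for the mass constraint and for peak matching, as in the definition. The names `objective` and `hypothetical_spectrum` and the toy peak list are hypothetical; `objective` returns `None` when the mass constraint fails.

```python
RESIDUE_MASS = {
    "A": 71.04, "R": 156.10, "N": 114.04, "D": 115.03, "C": 103.01,
    "E": 129.04, "Q": 128.06, "G": 57.02, "H": 137.06, "I": 113.08,
    "L": 113.08, "K": 128.09, "M": 131.04, "F": 147.07, "P": 97.05,
    "S": 87.03, "T": 101.05, "W": 186.08, "Y": 163.06, "V": 99.07,
}


def norm(seq):
    return sum(RESIDUE_MASS[aa] for aa in seq)


def hypothetical_spectrum(seq):
    """S(P): the union of B(b_i) and Y(y_i) for 0 < i < k."""
    k, masses = len(seq), set()
    for i in range(1, k):
        b, y = 1 + norm(seq[:i]), 19 + norm(seq[k - i:])
        masses |= {b - 28, b - 18, b, b + 1, b + 17}
        masses |= {y - 18, y - 17, y, y + 1, y + 26}
    return masses


def h(peaks):
    """h(L'): the total intensity of a peak list."""
    return sum(intensity for _, intensity in peaks)


def objective(peptide, peaks, M, delta):
    """h(L_S(P)) if | ||P|| + 20 - M | < delta, otherwise None (infeasible)."""
    if abs(norm(peptide) + 20 - M) >= delta:
        return None
    S = hypothetical_spectrum(peptide)
    supported = [(x, hx) for x, hx in peaks if any(abs(x - y) < delta for y in S)]
    return h(supported)


peaks = [(58.0, 10.0), (118.1, 7.5), (150.0, 3.0)]
print(objective("GAV", peaks, 247.13, 0.1), objective("AGV", peaks, 247.13, 0.1))
# prints 17.5 7.5
```

GAV and AGV have the same total mass, so both satisfy the constraint, but GAV explains more of the peak intensity and therefore scores higher.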

Algorithms

There are two major difficulties in the de novo sequencing problem.

First, each fragmentation may produce a pair of ions.

This means that both ends of the spectrum must be considered at the same time.

Algorithms (2)

Second, the types of the peaks are unknown, and a peak may be matched by zero, one, or two different types of ions.

When a peak is matched by two ions, its height can be counted only once.
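A toy example makes the counting rule concrete. Both functions below are hypothetical illustrations: `score_once` follows the definition $h(L_{S(P)})$, where $L_{S(P)}$ is a subset of L and so each peak appears at most once, while `score_per_ion` shows the double counting that must be avoided.

```python
def score_once(peaks, ion_masses, delta):
    """h(L_S): each peak contributes its intensity at most once."""
    return sum(h for x, h in peaks
               if any(abs(x - m) < delta for m in ion_masses))


def score_per_ion(peaks, ion_masses, delta):
    """Incorrect variant: a peak matched by two ions is counted twice."""
    return sum(h for m in ion_masses for x, h in peaks if abs(x - m) < delta)


# One peak at 100.0 lies within delta of two different hypothetical ion masses.
peaks = [(100.0, 5.0), (130.0, 2.0)]
ions = [99.98, 100.03]
print(score_once(peaks, ions, 0.05), score_per_ion(peaks, ions, 0.05))
# prints 5.0 10.0
```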

Algorithms (3)

The straightforward approach of “growing” the peptide from one terminus to the other does not work.

We use a more sophisticated dynamic programming algorithm for the de novo sequencing problem.

Our algorithm gradually “grows” a prefix and a suffix of the optimal solution along a carefully designed pathway until the prefix and the suffix are long enough to form the optimal solution.
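The slides do not spell out the dynamic programming algorithm, so it is not reproduced here. For very short peptides, however, a brute-force reference solver for the problem as stated can help check a faster implementation. The Python sketch below is such a baseline and is explicitly not the authors' method: it enumerates every residue sequence whose mass stays within the bound, which takes exponential time. It merges I into L because the two have the same mass, and all names are hypothetical.

```python
# Approximate residue masses from the table; I is omitted because it equals L.
RESIDUE_MASS = {
    "A": 71.04, "R": 156.10, "N": 114.04, "D": 115.03, "C": 103.01,
    "E": 129.04, "Q": 128.06, "G": 57.02, "H": 137.06,
    "L": 113.08, "K": 128.09, "M": 131.04, "F": 147.07, "P": 97.05,
    "S": 87.03, "T": 101.05, "W": 186.08, "Y": 163.06, "V": 99.07,
}


def norm(seq):
    return sum(RESIDUE_MASS[aa] for aa in seq)


def hypothetical_spectrum(seq):
    """S(P): the union of B(b_i) and Y(y_i) for 0 < i < k."""
    k, masses = len(seq), set()
    for i in range(1, k):
        b, y = 1 + norm(seq[:i]), 19 + norm(seq[k - i:])
        masses |= {b - 28, b - 18, b, b + 1, b + 17}
        masses |= {y - 18, y - 17, y, y + 1, y + 26}
    return masses


def score(seq, peaks, delta):
    """h(L_S(P)): intensity of the peaks supported by S(P), each counted once."""
    S = hypothetical_spectrum(seq)
    return sum(h for x, h in peaks if any(abs(x - y) < delta for y in S))


def brute_force_de_novo(peaks, M, delta):
    """Exhaustively search all P with | ||P|| + 20 - M | < delta (exponential time)."""
    target = M - 20  # required value of ||P||
    best, best_score = None, float("-inf")

    def extend(prefix, mass):
        nonlocal best, best_score
        if prefix and abs(mass - target) < delta:
            s = score(prefix, peaks, delta)
            if s > best_score:
                best, best_score = prefix, s
        for aa, m in RESIDUE_MASS.items():
            if mass + m < target + delta:  # prune prefixes that are already too heavy
                extend(prefix + aa, mass + m)

    extend("", 0.0)
    return best, best_score


peaks = [(58.0, 10.0), (118.1, 7.5), (150.0, 3.0)]
print(brute_force_de_novo(peaks, 247.13, 0.1))  # prints ('GAV', 17.5)
```

Ties are resolved in favour of the first sequence found, and `(None, -inf)` is returned when no sequence satisfies the mass constraint.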

Experiments

Our model and algorithm account for most of the ion types that have been observed in practice.

Overlaps of two different ions are modeled correctly.

The model and algorithm tolerate mass errors and handle missing ions in the spectrum.

Experiments (2)

Experimental results demonstrated that our algorithm performed extremely well.

The program has been integrated into a software package, PEAKS, which is accessible online at http://www.BioinformaticsSolutions.com