Example data – MALDI-TOF
[Figure: MALDI-TOF spectrum (file ATP.txt) at increasing zoom: m/z 1000 to 4500, 2280 to 2400, 1300 to 1460, 1444 to 1458 and 2378 to 2394; intensity vs m/z]
Peptide intensity vs m/z
Fragment intensity vs m/z
Example data – ESI-LC-MS/MS
[Figure: LC-MS map (time vs m/z) and MS/MS spectrum (% relative abundance vs m/z, 250 to 1000) of the [M+2H]2+ ion at m/z 762, with fragment ions labelled between m/z 260 and 1080]
Peptide intensity vs m/z vs time
Fourier Transform
from numpy import *
x = 2.0*pi*arange(1000.0)/100000.0
sin1 = sin(1000.0*x)
sin2 = 0.2*sin(10000.0*x)
sin12 = sin1 + sin2
fft12 = fft.rfft(sin12)
[Figure: spectrum vs frequency]
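To read off which frequencies the two sine waves end up at, the coefficients from `rfft` can be paired with `rfftfreq`. A minimal sketch, assuming unit sample spacing (so frequencies are in cycles per sample) and using `import numpy as np` instead of the star import:

```python
import numpy as np

# Same test signal as above: 1000 samples of two superposed sine waves
x = 2.0 * np.pi * np.arange(1000.0) / 100000.0
sin12 = np.sin(1000.0 * x) + 0.2 * np.sin(10000.0 * x)

fft12 = np.fft.rfft(sin12)
freqs = np.fft.rfftfreq(len(sin12))   # cycles per sample, assuming unit spacing
amplitude = np.abs(fft12)

# The two largest coefficients sit at the two sine frequencies
top2 = np.sort(np.argsort(amplitude)[-2:])
print(freqs[top2])        # 0.01 and 0.1 cycles per sample
print(amplitude[top2])    # about 500 and 100: N/2 times each sine amplitude
```

Because both sines complete a whole number of cycles in the 1000 samples, each lands in a single frequency bin with no leakage into its neighbours.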
Inverse Fourier Transform
from numpy import *
x = 2.0*pi*arange(1000.0)/100000.0
sin1 = sin(1000.0*x)
sin2 = 0.2*sin(10000.0*x)
sin12 = sin1 + sin2
fft12 = fft.rfft(sin12)
sin12_ = fft.irfft(fft12, len(sin12))
[Figure: spectrum vs frequency]
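Because the transform is invertible, a simple low-pass filter can be sketched by zeroing the high-frequency coefficients before calling `irfft`. The cutoff bin (50) is an arbitrary illustrative value, not one taken from the slides:

```python
import numpy as np

x = 2.0 * np.pi * np.arange(1000.0) / 100000.0
sin1 = np.sin(1000.0 * x)          # slow component (bin 10)
sin2 = 0.2 * np.sin(10000.0 * x)   # fast component (bin 100)
sin12 = sin1 + sin2

fft12 = np.fft.rfft(sin12)

# Round trip: irfft undoes rfft up to floating-point error
roundtrip = np.fft.irfft(fft12, len(sin12))

# Low-pass filter: keep only coefficients below a hypothetical cutoff bin
cutoff = 50
filtered_fft = fft12.copy()
filtered_fft[cutoff:] = 0.0
lowpass = np.fft.irfft(filtered_fft, len(sin12))

print(np.max(np.abs(roundtrip - sin12)))  # close to machine precision
print(np.max(np.abs(lowpass - sin1)))     # the fast component is gone
```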
A Peak
[Figure: a peak (intensity) annotated with its centroid, full width at half maximum (FWHM), area, height and maximum; the peak can also be described by its mean, variance, skewness and kurtosis]
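The descriptors in the figure can be computed directly from sampled data by treating the intensities as weights. A minimal sketch, assuming uniformly spaced samples `x`, `y` that contain one isolated peak on a zero baseline; `peak_descriptors` is a hypothetical helper name:

```python
import numpy as np

def peak_descriptors(x, y):
    """Describe an isolated peak sampled at positions x with intensities y.

    Assumes uniform sampling, a zero baseline and a single peak.
    """
    dx = x[1] - x[0]
    area = np.sum(y) * dx
    mean = np.sum(x * y) / np.sum(y)                 # centroid
    var = np.sum((x - mean) ** 2 * y) / np.sum(y)
    sd = np.sqrt(var)
    skewness = np.sum(((x - mean) / sd) ** 3 * y) / np.sum(y)
    kurtosis = np.sum(((x - mean) / sd) ** 4 * y) / np.sum(y)  # 3 for a Gaussian
    height = np.max(y)
    maximum = x[np.argmax(y)]
    # FWHM: distance between the outermost samples at or above half height
    above = np.where(y >= height / 2)[0]
    fwhm = x[above[-1]] - x[above[0]]
    return dict(area=area, mean=mean, variance=var, skewness=skewness,
                kurtosis=kurtosis, height=height, maximum=maximum, fwhm=fwhm)

x = np.linspace(-1, 1, 2001)
y = np.exp(-x**2 / (2 * 0.1**2))            # Gaussian with s = 0.1
d = peak_descriptors(x, y)
print(d["mean"], d["variance"], d["fwhm"])  # close to 0, 0.01 and 2.355 * 0.1
```

The FWHM found this way is only as precise as the sample spacing; interpolating between the samples on either side of half height would refine it.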
A Gaussian Peak
def gaussian(x, x0, s):
    return exp(-(x-x0)**2/(2*s**2))
x = linspace(-1, 1, 1000)
y = gaussian(x, 0, 0.1)
ffty = fft.rfft(y)
[Figure: spectrum vs frequency]
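The Fourier transform of a Gaussian is again a Gaussian: a peak exp(-x²/(2s²)) has a spectrum whose magnitude falls off as exp(-2π²s²f²), so a narrower peak spreads further into high frequencies. A quick numerical check of that statement, using `rfftfreq` with the sample spacing so that f is in cycles per unit of x:

```python
import numpy as np

def gaussian(x, x0, s):
    return np.exp(-(x - x0)**2 / (2 * s**2))

x = np.linspace(-1, 1, 1000)
dx = x[1] - x[0]
s = 0.1
y = gaussian(x, 0, s)
ffty = np.fft.rfft(y)

f = np.fft.rfftfreq(len(x), d=dx)               # cycles per unit of x
measured = np.abs(ffty) / np.abs(ffty[0])       # normalized to 1 at f = 0
expected = np.exp(-2 * np.pi**2 * s**2 * f**2)  # analytic Gaussian spectrum

print(np.max(np.abs(measured - expected)))      # tiny: the spectrum is Gaussian
```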
A skewed peak
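from scipy.special import erf  # vectorized error function (NumPy has none)
def pdf(x): return 1/sqrt(2*pi) * exp(-x**2/2)   # standard normal density
def cdf(x): return (1 + erf(x/sqrt(2))) / 2      # standard normal CDF
def skew(x, e=0, w=1, a=0):
    # skew-normal density with location e, scale w and shape a
    t = (x-e) / w
    return 2 / w * pdf(t) * cdf(a*t)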
[Figure: spectrum vs frequency]
Normal noise
x = linspace(-1, 1, 1000)
y = 0.2*random.normal(size=len(x))
If the noise is not normally distributed, try to find a transform that makes it normal
[Figure: spectrum vs frequency]
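As an illustration of such a transform, assume multiplicative noise that follows a log-normal distribution (an assumption made for this example, not a claim about any particular instrument). Taking the logarithm makes it normal, which shows up as the sample skewness dropping to about zero:

```python
import numpy as np

def sample_skewness(v):
    # third standardized moment of the sample
    d = v - v.mean()
    return np.mean(d**3) / np.mean(d**2) ** 1.5

rng = np.random.default_rng(0)
noise = rng.lognormal(mean=0.0, sigma=0.5, size=100_000)  # skewed, strictly positive

print(sample_skewness(noise))          # clearly positive (about 1.75 in theory)
print(sample_skewness(np.log(noise)))  # close to 0: the log of log-normal noise is normal
```

For other noise models a different transform would be needed; for example, a square root is often used for counting (Poisson) noise.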
Skewed noise
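# rejection sampling from the skewed peak shape defined above
x_test = random.uniform(-1.0, 1.0, size=10*len(x))  # candidate noise values
y = random.uniform(0.0, 1.0, size=10*len(x))        # acceptance levels
yskew = skew(x_test, -0.1, 0.2, 10)
yskew = yskew/max(yskew)                            # scale the density to a maximum of 1
yn_skew = x_test[y < yskew][:len(x)]                # keep the first len(x) accepted values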
[Figure: spectrum vs frequency]
Convolution
http://en.wikipedia.org/wiki/Convolution
$$(f * g)(t) = \int_{-\infty}^{\infty} f(\tau)\, g(t-\tau)\, d\tau$$
Describes the response of a linear and time-invariant system to an input signal
The inverse Fourier transform of the pointwise product in frequency space
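The second statement is the convolution theorem. For sampled signals it can be checked with NumPy: `np.convolve` in 'full' mode matches the inverse FFT of the product of the two spectra, provided both signals are zero-padded to the full output length so that the circular convolution computed by the FFT does not wrap around. The input signal and the Hanning kernel are arbitrary examples:

```python
import numpy as np

rng = np.random.default_rng(1)
f = rng.normal(size=200)      # arbitrary input signal
g = np.hanning(31)            # arbitrary kernel (the system response)

direct = np.convolve(f, g, mode="full")   # length len(f) + len(g) - 1

n = len(f) + len(g) - 1                   # pad to avoid circular wrap-around
via_fft = np.fft.irfft(np.fft.rfft(f, n) * np.fft.rfft(g, n), n)

print(np.max(np.abs(direct - via_fft)))   # floating-point round-off only
```

For long signals the FFT route is much faster than direct convolution, which is why it is preferred for wide kernels.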
Smoothing
# moving average over 2*width+1 points
w = ones(2*width+1, 'd')
convolve(w/w.sum(), y, 'valid')
[Figure: smoothed intensity and three spectra vs frequency]
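A moving average of 2*width+1 points acts as a low-pass filter: for independent noise it reduces the standard deviation by roughly a factor sqrt(2*width+1). A sketch with simulated normal noise; `width = 12` is an arbitrary example value:

```python
import numpy as np

rng = np.random.default_rng(2)
y = rng.normal(size=100_000)            # white noise with standard deviation 1

width = 12                              # half-width of the window (example value)
w = np.ones(2 * width + 1, 'd')
smoothed = np.convolve(w / w.sum(), y, 'valid')

print(y.std(), smoothed.std())          # about 1 and about 1/sqrt(25) = 0.2
```

The same averaging also lowers and broadens any peak narrower than the window, which is why the window should not be much wider than the peaks of interest.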
Adaptive Background Correction (unsharp masking)
$$I'(k,w,d) = I(k) - \frac{d}{2w+1} \sum_{l=k-w}^{k+w} I(l)$$
Unsharp masking
[Figure panels: original and unsharp-masked spectra]
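# x: spectrum; window_len: half-width of the window; d: how much of the smoothed background to subtract
wi = linspace(1, window_len, window_len)
w = 1 / (2*r_[wi[::-1], 0, wi] + 1)        # weights 1/(2|j|+1) for offsets j = -window_len..window_len
x_ = x - d*convolve(w/w.sum(), x, 'same')  # 'same' keeps the length of x so the subtraction works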
Savitzky-Golay smoothing
[Figure: Savitzky-Golay smoothing with polynomial order 3, 5 and 7 and bin sizes 25, 75 and 150]
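Savitzky-Golay smoothing fits a low-order polynomial by least squares inside a sliding window, so it preserves peak heights better than a plain moving average of the same length. SciPy provides it as `scipy.signal.savgol_filter`. A sketch, assuming the "bin size" in the figure corresponds to `window_length` (an odd value is used here) and a made-up narrow peak with normal noise:

```python
import numpy as np
from scipy.signal import savgol_filter

rng = np.random.default_rng(3)
x = np.linspace(-1, 1, 1000)
peak = np.exp(-x**2 / (2 * 0.02**2))         # narrow peak, height 1
y = peak + 0.05 * rng.normal(size=x.size)    # add normal noise

window = 25                                  # odd window length ("bin size")
sg = savgol_filter(y, window_length=window, polyorder=3)
ma = np.convolve(np.ones(window) / window, y, mode="same")  # moving average, same window

# Savitzky-Golay keeps more of the peak height than the moving average
print(sg.max(), ma.max())
```

Raising the polynomial order (5, 7) follows the peak shape even more closely but removes less noise for a given window.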
Background Subtraction Using Smoothing
[Figure: smoothing and background subtraction with bin sizes 100, 200 and 300]
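Background subtraction with smoothing can be sketched as two steps: estimate the background with a moving average whose window is much wider than the peaks, then subtract that estimate. The linear background, the peak and the 201-point window are illustrative assumptions:

```python
import numpy as np

x = np.linspace(0, 1, 2000)
background = 5.0 + 3.0 * x                      # slowly varying baseline
peak = 10.0 * np.exp(-(x - 0.5)**2 / (2 * 0.002**2))
y = background + peak

window = 201                                    # much wider than the peak
w = np.ones(window) / window
estimate = np.convolve(y, w, mode="same")       # smoothed signal = background estimate
corrected = y - estimate

half = window // 2
interior = slice(half, len(x) - half)           # skip edges where the window is truncated
far = np.abs(x - 0.5) > 0.1                     # points well away from the peak
print(np.max(np.abs(corrected[interior][far[interior]])))  # about 0: baseline removed
print(corrected[np.argmax(peak)])               # the peak is largely preserved
```

A wider window leaves more of the peak intact but follows curved backgrounds less closely; that trade-off is what the bin sizes 100, 200 and 300 explore.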
Root Mean Square Deviation (RMSD)
$$\mathrm{RMSD}(k) = \sqrt{\frac{1}{2w} \sum_{|k-l| \le w} \bigl(I(k) - I(l)\bigr)^2}$$
The Root Mean Square Deviation (RMSD) is often roughly constant where there is only noise, and larger at a peak, if the window size is approximately the width of the peak.
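A sliding RMSD can be sketched with NumPy's `sliding_window_view` (NumPy 1.20 or later). This sketch follows the slide's definition as far as the formula can be read: the deviation of every point in the window from the centre point I(k), divided by 2w. That reading is an assumption; `sliding_rmsd` is a hypothetical helper, and the peak height of 10 times the noise is made up:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def sliding_rmsd(y, w):
    """sqrt(sum over l = k-w..k+w of (y[k] - y[l])**2 / (2w)); NaN within w of the edges."""
    windows = sliding_window_view(y, 2 * w + 1)   # row i covers y[i : i + 2w + 1]
    centre = windows[:, w:w + 1]                  # y[k] for k = i + w, kept 2-D for broadcasting
    rmsd = np.full(len(y), np.nan)
    rmsd[w:len(y) - w] = np.sqrt(((windows - centre) ** 2).sum(axis=1) / (2 * w))
    return rmsd

rng = np.random.default_rng(4)
x = np.linspace(-1, 1, 1000)
y = rng.normal(size=x.size) + 10.0 * np.exp(-x**2 / (2 * 0.01**2))  # noise sigma 1, peak height 10

r = sliding_rmsd(y, w=10)                     # window of 21 points, about the peak width
noise_level = np.nanmedian(r)
print(noise_level, r[np.argmin(np.abs(x))])   # near the noise level, much larger at the peak
```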
Background Subtraction using RMSD
[Figure: RMSD and intensity for bin sizes 100, 200 and 300]
Convolution, Cross-correlation, and Autocorrelation
http://en.wikipedia.org/wiki/Convolution
Convolution describes the response of a linear and time-invariant system to an input signal.
The inverse Fourier transform of the pointwise product in frequency space.
Cross-correlation is a measure of similarity of two signals.
It can be used for finding a shift between two signals.
Auto-correlation is the cross-correlation of a signal with itself.
It can be used for finding periodic signals obscured by noise.
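Both uses can be sketched with `np.correlate`. The shift is the lag at which the cross-correlation peaks; the period is the first strong peak of the autocorrelation after lag 0. The pulse shape, the shift of 37 samples, the period of 50 samples and the noise levels are made-up test values:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 2000
t = np.arange(n)

def pulse(center, width=5.0):
    return np.exp(-(t - center)**2 / (2 * width**2))

# Cross-correlation: recover the shift between two noisy copies of one pulse
true_shift = 37
a = pulse(1000 + true_shift) + 0.05 * rng.normal(size=n)
b = pulse(1000) + 0.05 * rng.normal(size=n)
xcorr = np.correlate(a, b, mode="full")           # index i corresponds to lag i - (n - 1)
shift = np.argmax(xcorr) - (n - 1)
print(shift)                                      # close to 37

# Autocorrelation: recover the period of a sine wave buried in noise
period = 50
s = np.sin(2 * np.pi * t / period) + rng.normal(size=n)
s -= s.mean()
acorr = np.correlate(s, s, mode="full")[n - 1:]   # lags 0 .. n-1
start = period // 2                               # skip the peak at lag 0
est_period = start + np.argmax(acorr[start:3 * period // 2])
print(est_period)                                 # close to 50
```

The search range for the period is chosen knowing roughly where to look; without that prior, one would scan the autocorrelation for its first local maximum after lag 0.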
Cross-correlation and autocorrelation
Cross-correlation:
$$(f \star g)(t) = \int_{-\infty}^{\infty} \overline{f(\tau)}\, g(t+\tau)\, d\tau$$
http://en.wikipedia.org/wiki/Convolution
Autocorrelation:
$$(f \star f)(t) = \int_{-\infty}^{\infty} \overline{f(\tau)}\, f(t+\tau)\, d\tau$$
How similar are two signals?
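Dot product
$A = (a_1, a_2, \ldots, a_n)$, $B = (b_1, b_2, \ldots, b_n)$
$$A \cdot B = \sum_{i=1}^{n} a_i b_i = |A|\,|B|\cos\theta$$
Identical vectors: $\theta = 0$, $\cos\theta = 1$. Perpendicular vectors: $\theta = \pi/2$, $A \cdot B = 0$.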
The dot product is the same as the cross-correlation at zero lag:
$$(f \star g)(0) = \int_{-\infty}^{\infty} \overline{f(\tau)}\, g(\tau)\, d\tau$$
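A small sketch of the normalized dot product (the cosine of the angle between two vectors) and of its link to cross-correlation, using `np.correlate`, whose element at index len(b) - 1 in 'full' mode is the zero-lag value. The vectors are arbitrary examples and `cosine_similarity` is a hypothetical helper name:

```python
import numpy as np

def cosine_similarity(a, b):
    """cos(theta) = A.B / (|A| |B|): 1 for vectors pointing the same way, 0 for perpendicular ones."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([2.0, 4.0, 6.0, 8.0])     # same direction as a
c = np.array([2.0, -1.0, 0.0, 0.0])    # perpendicular to a

print(cosine_similarity(a, b))         # 1.0 (up to rounding)
print(cosine_similarity(a, c))         # 0.0

# The dot product equals the cross-correlation at zero lag
xcorr = np.correlate(a, b, mode="full")
zero_lag = xcorr[len(b) - 1]
print(np.dot(a, b), zero_lag)          # 60.0 60.0
```

Normalizing by the vector lengths makes the comparison insensitive to overall intensity, so two spectra that differ only by a scale factor still score 1.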
What are the characteristics of the dot product?
[Figure: dot product for Signal+Noise and for Noise alone, at S/N = 10, 3, 1, 0.3 and 0.1 and at 10, 100 and 1000 dimensions]
Coincidence – enhances the signal
The signal to noise can be dramatically increased by measuring several independent signals of the same phenomenon and combining these signals.
[Figure panels: ideal signal; four measurements; product of the four measurements]
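A simulation sketch of that idea, assuming four independent measurements of the same peak, each with its own normal noise, combined by multiplying them point by point as in the figure. The peak height, noise level and the S/N measure (value at the true peak over the background RMS) are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(6)
x = np.linspace(-1, 1, 1000)
ideal = 4.0 * np.exp(-x**2 / (2 * 0.02**2))       # ideal signal, height 4

# four independent measurements of the same signal, noise sigma = 1
measurements = np.array([ideal + rng.normal(size=x.size) for _ in range(4)])
product = measurements.prod(axis=0)               # point-by-point product (coincidence)

def snr(y):
    """Value at the true peak position divided by the RMS of y far from the peak."""
    background = y[np.abs(x) > 0.2]
    return y[np.argmax(ideal)] / np.sqrt(np.mean(background**2))

print(snr(measurements[0]), snr(product))         # the product has the larger S/N
```

The gain comes from independence: noise rarely rises in all four measurements at the same position, while the real peak does.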
Coincidence – suppresses interference
[Figure panels: ideal signal; four measurements with interference; product of the four measurements]
Peak Finding
The derivative of a function is zero at its minima and maxima.
The second derivative is negative at maxima and positive at minima.
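On sampled data the derivative becomes a finite difference: a maximum is where the first difference changes sign from positive to non-positive. A minimal sketch on a noise-free test signal with two made-up peaks; `find_maxima` is a hypothetical helper, and on real data the signal would normally be smoothed first, since noise creates many spurious sign changes:

```python
import numpy as np

def find_maxima(y):
    """Indices where the first difference changes from positive to non-positive."""
    d = np.diff(y)                               # d[i] = y[i+1] - y[i]
    return np.where((d[:-1] > 0) & (d[1:] <= 0))[0] + 1

x = np.linspace(0, 10, 1001)
y = np.exp(-(x - 3)**2 / 0.1) + 0.5 * np.exp(-(x - 7)**2 / 0.1)

peaks = find_maxima(y)
print(x[peaks])                                  # [3. 7.]
```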
Peak Finding
1. Characterize the signal and the noise
2. Make a model of the data
3. Select detection method
4. Select parameters using simulations
Peak Finding: Characterizing the noise
Removing the peaks by looking for outliers in the root mean square deviation (RMSD)
Peak Finding: Model of data
# noise and signal set the noise and peak amplitudes (S/N = signal/noise)
points = 1000
x = linspace(-1, 1, points)
y = noise*random.normal(size=len(x))
y += signal*gaussian(x, 0, 0.01)
[Figure: simulated data for S/N = 1, 2 and 4]
Peak Finding: Detection method
[Figure panels: S/N = 1, 2 and 4]
Peaks can be detected by finding maxima in the moving average with a window size similar to the peak width
$$S(k) = \sum_{l=k-w}^{k+w} I(l)$$
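The detection step and the "select parameters using simulations" step can be sketched together: simulate data with the model from the previous slide (the `gaussian` helper is repeated so the block runs on its own), compute the moving sum S(k), take its maximum as the detected peak, and count how often it lands within 10 points of the true position. The window half-width, tolerance, number of trials and the helper names are illustrative choices:

```python
import numpy as np

def gaussian(x, x0, s):
    return np.exp(-(x - x0)**2 / (2 * s**2))

def moving_sum(y, w):
    """S(k) = sum of y over l = k-w .. k+w (zero-padded at the edges)."""
    return np.convolve(y, np.ones(2 * w + 1), mode="same")

def detection_rate(signal, noise=1.0, w=5, trials=200, tol=10, seed=0):
    rng = np.random.default_rng(seed)
    x = np.linspace(-1, 1, 1000)
    true_pos = np.argmax(gaussian(x, 0, 0.01))
    hits = 0
    for _ in range(trials):
        y = noise * rng.normal(size=len(x)) + signal * gaussian(x, 0, 0.01)
        k = np.argmax(moving_sum(y, w))          # detected peak position
        hits += abs(k - true_pos) <= tol
    return hits / trials

for sn in (1, 2, 4):
    print(sn, detection_rate(sn))                # detection rate grows with S/N
```

Repeating the loop for several window sizes (the bin sizes 5, 20 and 80 in the figure) is one way to pick the window that detects the peak most reliably.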
Peak Finding: Detection method – moving average
[Figure: moving average for S/N = 1, 2 and 4 with bin sizes 5, 20 and 80, compared with the signal]
Peak Finding: Detection method – RMSD
[Figure: RMSD for S/N = 1, 2 and 4 with bin sizes 5, 20 and 80, compared with the signal]
Peak Finding: Information about the Peak
[Figure: a peak (intensity) annotated with its centroid (mean), full width at half maximum (FWHM), area, height and maximum; the peak can also be described by its mean, variance, skewness and kurtosis]
Information about a Peak
A peak is defined by $f(x)$:
area $= \sum_x f(x)$
centroid or mean $= \frac{\sum_x x\, f(x)}{\sum_x f(x)}$
height $= \max\bigl(f(x)\bigr)$
To calculate any of these measures, we need to know where the peak starts and ends.
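One simple rule for where a peak starts and ends, assumed here for illustration, is to walk outwards from the maximum until the intensity drops below a threshold (in practice a few times the noise level; here 1% of the maximum on noise-free data). The measures above are then sums over that range. `peak_bounds` is a hypothetical helper and the two test peaks are made up:

```python
import numpy as np

def peak_bounds(y, threshold):
    """Walk left and right from the maximum until y falls below threshold."""
    top = int(np.argmax(y))
    start, end = top, top
    while start > 0 and y[start - 1] >= threshold:
        start -= 1
    while end < len(y) - 1 and y[end + 1] >= threshold:
        end += 1
    return start, end                      # inclusive indices

x = np.linspace(0, 10, 1001)
y = 8.0 * np.exp(-(x - 4.0)**2 / (2 * 0.2**2)) + 3.0 * np.exp(-(x - 7.0)**2 / (2 * 0.2**2))

start, end = peak_bounds(y, threshold=0.01 * y.max())
xs, fs = x[start:end + 1], y[start:end + 1]
dx = x[1] - x[0]

height = fs.max()                          # max(f(x))
area = fs.sum() * dx                       # sum of f(x), scaled by the sample spacing
centroid = (xs * fs).sum() / fs.sum()      # sum x f(x) / sum f(x)
print(x[start], x[end], height, area, centroid)
```

Only the larger peak at x = 4 is measured; the second peak at x = 7 is excluded because the intensity falls below the threshold in between.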
Estimating peptide quantity
[Figure: peak height, curve fitting and peak area as estimates of peptide quantity]
Sampling
[Figure: intensity vs m/z (0.8 to 1) for a peak sampled with 3 points; acquisition time = 0.05 s]
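How coarsely a peak can be sampled depends on its shape and on the noise. For a noise-free Gaussian of standard deviation sigma with one sample on the apex (both assumptions), the simple sum of samples times spacing already gives the area very accurately once the spacing is about sigma or smaller (roughly 2.4 samples per FWHM, since FWHM ≈ 2.355 sigma) and degrades quickly for coarser sampling:

```python
import numpy as np

sigma = 1.0
exact = sigma * np.sqrt(2 * np.pi)              # area under exp(-x^2 / (2 sigma^2))

errors = {}
for step in (0.5, 1.0, 2.0, 3.0):               # sample spacing, in units of sigma
    n = int(np.ceil(10 * sigma / step))
    x = step * np.arange(-n, n + 1)             # samples out to about 10 sigma, one on the apex
    y = np.exp(-x**2 / (2 * sigma**2))
    errors[step] = abs(y.sum() * step - exact) / exact   # relative error of the area
    print(step, errors[step])
```

Real peaks add counting noise and non-Gaussian shapes, so this is an optimistic lower bound on the sampling needed, not a recommendation.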
What is the best way to estimate quantity?
Peak height – resistant to interference; poor statistics
Peak area – better statistics; more sensitive to interference
Curve fitting – better statistics; needs to know the peak shape; slow
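Curve fitting can be sketched with `scipy.optimize.curve_fit`, assuming the peak shape is Gaussian (the "needs to know the peak shape" point above). The fitted amplitude and width give the area as amp * s * sqrt(2π); the simulated peak and noise level are made up for comparison with the height and summed-area estimates:

```python
import numpy as np
from scipy.optimize import curve_fit

def gaussian(x, amp, x0, s):
    """Assumed peak shape: Gaussian with height amp, centre x0 and width s."""
    return amp * np.exp(-(x - x0)**2 / (2 * s**2))

rng = np.random.default_rng(7)
x = np.linspace(0.8, 1.0, 200)
y = gaussian(x, 100.0, 0.9, 0.01) + 5.0 * rng.normal(size=x.size)   # made-up peak plus noise
dx = x[1] - x[0]

height = y.max()                                  # estimate 1: peak height
area_sum = y.sum() * dx                           # estimate 2: peak area by summation
p0 = (y.max(), x[np.argmax(y)], 0.005)            # rough starting guess for the fit
(amp, x0, s), _ = curve_fit(gaussian, x, y, p0=p0)
area_fit = amp * abs(s) * np.sqrt(2 * np.pi)      # estimate 3: area of the fitted curve

print(height, area_sum, area_fit)   # true area is 100 * 0.01 * sqrt(2 pi), about 2.51
```

The fit uses every sample but only through the assumed shape, which is why it combines good statistics with a dependence on that assumption and a higher computational cost.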
Summary
Fourier transform – transformation to frequency space and back
Signal – how do we detect and characterize signals?
Noise – how do we characterize noise?
Modeling signal and noise
Simulation to select thresholds and parameters
Filters – filtering by low-pass (i.e. smoothing) and high-pass filters (e.g. adaptive background correction)
Detection methods based on moving average and RMSD
Convolution – describes the response of a linear and time-invariant system to an input signal
Cross-correlation is a measure of similarity of two signals
Autocorrelation can be used for finding periodic signals obscured by noise
The dot product can be used to determine how similar two signals are
Coincidence measurements enhance the signal and suppress noise
The quantity associated with a peak – height and area
Sampling – how often do we need to sample a peak to get a good estimate of its area?