Variable Penalty Dynamic Time Warping for Aligning Chromatography Data
David Clifford
Research Scientist
June 2009
CSIRO Issues in aligning multiple - MS spectra
Talk Outline
• Gas Chromatography Mass Spectrometry
• Examples and Properties
• Dynamic time warping – origins in speech recognition
• Uses in the 21st century: aligning GC-MS data
• Central Idea of the talk – variable penalty DTW, joint work with Glenn Stone
• Results of alignment and how to do it
Gas Chromatography
• Separates a gas into its constituent parts
• These elute from the machine over a period of 40 minutes
• Measures quantity several times a second
• Does not identify compounds
• Gold standard in analytical chemistry
• Slow process, expensive technology
Uses of Gas Chromatography
• Wine Chemistry
• Meat quality
• Metabolomic studies
• Data format is similar to Liquid Chromatography-MS etc.
Goal of this talk
• How can we align the two signals?
• How can we align many signals?
• Dynamic time warping – yes, but it overdoes the warping
• Variable penalty DTW – balances warping with alignment needs
• VPdtw package now available on CRAN
Calling for a taxi….
• Matches what you say with a database of placenames
• Dynamic time warping was invented in the late 1960s and early 1970s to do this kind of matching
• DTW can expand or contract your words to match placenames
• DTW is a natural choice for matching speech
• Speed of speech differs between individuals
• Ums and ahs need to be cut out, etc.
• DTW is a fast dynamic programming algorithm and achieves the global optimum
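The matching described on this slide can be illustrated with a minimal Python sketch of classic DTW. It is not the speech recognition system mentioned above and not the VPdtw package; the function name `dtw` and the absolute-difference local cost are assumptions chosen for illustration.

```python
def dtw(reference, query):
    """Classic DTW sketch: return (total cost, warping path).

    Assumes an absolute-difference local cost and the three standard moves.
    """
    n, m = len(reference), len(query)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of reference[:i] with query[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(reference[i - 1] - query[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j - 1],  # diagonal move
                cost[i - 1][j],      # reference advances, query sample reused
                cost[i][j - 1],      # query advances, reference sample reused
            )
    # Walk back from the end to recover one optimal path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(steps)
    path.reverse()
    return cost[n][m], path


reference = [0, 0, 1, 2, 1, 0, 0]
query = [0, 1, 2, 1, 0, 0, 0]  # same peak, one sample earlier
total, path = dtw(reference, query)
```

The table has one cell per (reference, query) pair, so the work grows with the product of the two lengths, and the minimum over all monotone paths is found exactly.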
Linear Transformation (Shift and Stretch)
[Figure: alignment grid, REFERENCE along the horizontal axis, QUERY along the vertical axis]
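A linear shift and stretch can be sketched as a resampling of the query at evenly spaced positions. The function name `shift_and_stretch`, its parameters, and the use of linear interpolation are assumptions for illustration; the slide does not specify how the transformation is computed.

```python
def shift_and_stretch(query, shift, stretch, n):
    """Resample `query` at positions stretch * t + shift for t = 0..n-1.

    Positions are clamped to the signal and values are linearly interpolated.
    """
    last = len(query) - 1
    out = []
    for t in range(n):
        pos = min(max(stretch * t + shift, 0.0), float(last))
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append((1 - frac) * query[lo] + frac * query[hi])
    return out


ramp = [0, 1, 2, 3, 4]
shifted = shift_and_stretch(ramp, shift=1, stretch=1, n=4)
stretched = shift_and_stretch(ramp, shift=0, stretch=0.5, n=5)
```

Only two numbers describe the whole alignment here, which is why a linear transformation cannot follow local changes in elution speed.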
Asymmetric Dynamic Time Warping
[Figure: alignment grid, REFERENCE along the horizontal axis, QUERY along the vertical axis]
Sakoe-Chiba DTW (bound on shift)
Memory efficient variation of DTW – faster method
[Figure: alignment grid, REFERENCE along the horizontal axis, QUERY along the vertical axis]
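The bound on shift can be sketched in Python by filling only the cells within `maxshift` of the diagonal. The function name `dtw_banded` is an assumption, and for readability this sketch still allocates the full table; a memory-efficient version would store only the band.

```python
def dtw_banded(reference, query, maxshift):
    """DTW restricted to |i - j| <= maxshift.

    Returns (total cost, number of cells evaluated).
    """
    n, m = len(reference), len(query)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    cells = 0
    for i in range(1, n + 1):
        lo = max(1, i - maxshift)
        hi = min(m, i + maxshift)
        for j in range(lo, hi + 1):  # cells outside the band stay infinite
            d = abs(reference[i - 1] - query[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1],
                                 cost[i - 1][j],
                                 cost[i][j - 1])
            cells += 1
    return cost[n][m], cells


reference = [0, 0, 1, 2, 1, 0, 0]
query = [0, 1, 2, 1, 0, 0, 0]
no_shift = dtw_banded(reference, query, maxshift=0)
one_shift = dtw_banded(reference, query, maxshift=1)
```

With `maxshift=0` only the diagonal is allowed, so the result is the plain sample-by-sample distance. Widening the band to one sample is enough to align this pair while evaluating 19 of the 49 cells.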
Dynamic Time Warping
Guaranteed global optimum, but lots of non-diagonal moves
[Figure: alignment grid, REFERENCE along the horizontal axis, QUERY along the vertical axis]
Why do we need to care about this?
Analysis is based on peak area – and overwarping will affect peak shape and area.
Overwarping introduces artificial features into data.
Overwarping occurs due to too many non-diagonal moves
Solution #1: penalise non-diagonal moves
Solution #2: variable penalty dependent on size of peaks
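The effect of non-diagonal moves on peak area can be seen in a small Python sketch. The move encoding ("D" diagonal, "E" expand, "X" drop) and the function names are hypothetical, and area is taken as a simple sum of samples at unit spacing.

```python
def apply_moves(query, moves):
    """Apply a string of moves to a query signal.

    "D": copy the current sample and advance (diagonal move)
    "E": copy the current sample without advancing (expansion)
    "X": advance without copying (sample dropped)
    """
    warped = []
    j = 0
    for move in moves:
        if move == "D":
            warped.append(query[j])
            j += 1
        elif move == "E":
            warped.append(query[j])
        elif move == "X":
            j += 1
        else:
            raise ValueError("unknown move: " + move)
    return warped


def peak_area(signal):
    """Area under the signal, assuming unit spacing between samples."""
    return sum(signal)


peak = [0, 1, 4, 1, 0]
expanded = apply_moves(peak, "DDEEDDD")  # apex repeated twice
dropped = apply_moves(peak, "DDXDD")     # apex removed
```

Here the original area of 6 becomes 14 after two expansions at the apex and 2 after the apex is dropped, which is the kind of distortion a penalty on non-diagonal moves is meant to discourage.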
Variable penalty DTW
• Minimise over paths w:
    \sum_{i=1}^{n} \left| y(t_i) - x(w(t_i)) \right| + \sum_{i=1}^{n} \lambda_i \, \mathbf{1}(i\text{th move is non-diagonal})
  where \lambda_i is the ith entry of the penalty vector
• Choose penalty vector using a dilation of the signals
• Large penalty with large peaks
• Minimise this function using dynamic programming
• Easy to implement
• How does it compare to DTW, constant penalty DTW, and parametric time warping?
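The idea on this slide, a distance term plus a penalty for every non-diagonal move, can be sketched with a small change to the DTW recursion. This is an illustrative Python sketch under stated assumptions, not the VPdtw package: it uses the symmetric three-move recursion, indexes the penalty by reference position, and the name `penalised_dtw` is hypothetical.

```python
def penalised_dtw(reference, query, penalty):
    """DTW in which a non-diagonal move into reference position i costs penalty[i].

    Returns (total cost, number of non-diagonal moves on the chosen path).
    Ties at a cell are broken towards fewer non-diagonal moves.
    """
    n, m = len(reference), len(query)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    nondiag = [[0] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        lam = penalty[i - 1]  # assumption: penalty indexed by reference sample
        for j in range(1, m + 1):
            d = abs(reference[i - 1] - query[j - 1])
            candidates = [
                (cost[i - 1][j - 1], nondiag[i - 1][j - 1]),      # diagonal, free
                (cost[i - 1][j] + lam, nondiag[i - 1][j] + 1),    # non-diagonal
                (cost[i][j - 1] + lam, nondiag[i][j - 1] + 1),    # non-diagonal
            ]
            best_cost, best_count = min(candidates)
            cost[i][j] = d + best_cost
            nondiag[i][j] = best_count
    return cost[n][m], nondiag[n][m]


reference = [0, 0, 1, 2, 1, 0, 0]
query = [0, 1, 2, 1, 0, 0, 0]
light = penalised_dtw(reference, query, [1] * 7)
heavy = penalised_dtw(reference, query, [10] * 7)
```

With a small penalty the path still shifts the peak by one sample at the price of two non-diagonal moves. With a large penalty the cheapest path is the diagonal, so the signals are left unwarped. Letting the penalty vary along the signal gives control over where warping is allowed.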
Key Ingredient for VPdtw
• Penalty vector – proportional to a dilation of the signal.
• There is some subjectivity here to balance the need for alignment with the effect on raw signals.
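A penalty vector of this kind can be sketched in Python as a scaled running maximum of the signal. The names `dilation`, `penalty_vector`, `span` and `scale` are assumptions for illustration; the arguments of the package's own `dilation` function are not given in these slides.

```python
def dilation(signal, span):
    """Morphological dilation sketch: running maximum over a centred window.

    The window covers `span` samples and is truncated at the ends of the signal.
    """
    half = span // 2
    n = len(signal)
    return [max(signal[max(0, i - half):min(n, i + half + 1)])
            for i in range(n)]


def penalty_vector(signal, span, scale):
    """Penalty proportional to the dilated signal (scale is a tuning choice)."""
    return [scale * v for v in dilation(signal, span)]


signal = [0, 1, 4, 1, 0, 0, 0]
dilated = dilation(signal, span=3)
penalty = penalty_vector(signal, span=3, scale=0.5)
```

The dilation spreads each peak's height over its neighbourhood, so the penalty is large on and around large peaks and small in flat regions. The choice of `span` and `scale` is where the subjectivity mentioned on the slide comes in.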
Before Alignment – can’t see detail but
[Figure: Total Intensity (log scale, 10^4 to 10^7) against Elution Time (s), 400 to 2200 s]
VPdtw package – now on CRAN, GPL 2
• VPdtw, dilation, plot.VPdtw, print.VPdtw
• result <- VPdtw(reference, query, penalty, maxshift = 350)
• print(result)
• plot(result, "Before")
• plot(result, "After")
• plot(result, "Shifts")
• plot(result)
• Many queries, one penalty
• One query, many penalties
• Reference can be NULL
Summary
• Introduced GC-MS data
• This talk is really about improving data quality
• Improvement via alignment
    • without data reduction
    • without unnatural features
    • via fast computation
• VPdtw available on CRAN
    • Faster
    • Better than available alternatives
References
DTW:
Vintsyuk, T. K. Kibernetika 1968, 4, 81–88.
Sakoe, H.; Chiba, S. Proceedings of the International Congress on Acoustics, Budapest, Hungary, 1971; paper 20 c 13.
Parametric Time Warping:
Eilers, P. H. C. Anal. Chem. 2004, 76, 404–411.
Variable Penalty DTW:
Clifford, Stone, Montoliu, Rezzi, Martin, Guy, Bruce and Kochhar. Alignment Using Variable Penalty Dynamic Time Warping. Anal. Chem. 2009, 81 (3), 1000–1007.
Thank you
Statistical Bioinformatics - Agribusiness
David Clifford
Research Scientist
CSIRO Division of Mathematics, Informatics and Statistics
Phone: +61 2 9325 3210
Email: [email protected]
Web: www.csiro.au/science/org/CMIS.html
Contact Us
Phone: 1300 363 400 or +61 3 9545 2176
Email: [email protected]
Web: www.csiro.au
VPdtw package – print(result)
Reference is NULL. Query column # 13 is chosen at random.
Query matrix is made up of 16 samples of length 5000.
Single Penalty vector supplied by user.
Max allowed shift is 150.

              Cost  Overlap  Max Obs Shift  # Diag Moves  # Expanded  # Dropped
Query #1:  1521.10     4994             51          4996          47          2
Query #2:  1708.30     4996             53          5000          49          0
Query #3:  1479.60     4998             59          5000          57          0
Query #4:  1302.30     4998             62          5000          60          0
Query #5:  1505.40     4996             61          5000          57          0
Query #6:  1296.80     4997             60          5000          57          0
Query #7:  1420.80     5000             61          5000          62          0
Query #8:  1484.20     5000             59          5000          60          0
Query #9:  1424.30     5000             51          5000          53          0
Query #10: 1306.30     4997             42          5000          39          0
Query #11: 1193.30     4994             29          4990          28          5
Query #12:  225.04     4999             13          4998          13          1
Query #13:    0.00     5000              0          5000           0          0
Query #14:  266.09     4944             56          4894           2         53
Query #15:  746.93     4937             63          4880           4         60
Query #16:  345.87     4914             86          4836           0         82