Variable Penalty Dynamic Time Warping For Aligning Chromatography Data David Clifford Research Scientist June 2009

Variable Penalty Dynamic Time Warping For Aligning Chromatography Data

David Clifford

Research Scientist

June 2009

CSIRO Issues in aligning multiple - MS spectra

Talk Outline

• Gas Chromatography Mass Spectrometry
• Examples and Properties
• Dynamic time warping – origins in speech recognition
• Uses in the 21st century – aligning GC-MS data
• Central idea of the talk – variable penalty DTW, joint work with Glenn Stone
• Results of alignment and how to do it

Gas Chromatography

• Separates a gas into its constituent parts
• These elute from the machine over a period of 40 minutes
• Measures quantity several times a second
• Does not identify compounds
• Gold standard in analytical chemistry
• Slow process, expensive technology

Uses of Gas Chromatography

• Wine Chemistry
• Meat quality
• Metabolomic studies
• Data format is similar to Liquid Chromatography-MS etc.

Goal of this talk

• How can we align two signals?
• How can we align many signals?
• Dynamic time warping – yes, but it overdoes the warping
• Variable penalty DTW – balances warping with alignment needs
• VPdtw package now available on CRAN

Before and After Alignment

Calling for a taxi….

• Matches what you say with a database of placenames
• Dynamic time warping was invented in the late 1960s and early 1970s to do this kind of matching
• DTW can expand or contract your words to match placenames
• DTW is a natural choice for matching speech
• Speed of speech differs between individuals
• Ums and ahs need to be cut out, etc.
• DTW is a very fast algorithm and achieves the global optimum

No alignment

[Figure: axes labelled REFERENCE and QUERY]

Alignment by Shift

[Figure: axes labelled REFERENCE and QUERY]

Linear Transformation (Shift and Stretch)

[Figure: axes labelled REFERENCE and QUERY]

Parametric Time Warping

[Figure: axes labelled REFERENCE and QUERY]

Asymmetric Dynamic Time Warping

[Figure: axes labelled REFERENCE and QUERY]

Sakoe-Chiba DTW (bound on shift)

Memory efficient variation of DTW – faster method

[Figure: axes labelled REFERENCE and QUERY]
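A minimal Python sketch of the band idea named on this slide, written for illustration and not taken from the talk or from the VPdtw package. Cells with |i - j| greater than `band` are never visited, and only two rows of the cost table are kept. The function name `dtw_banded`, the absolute-difference cost and the toy signals are assumptions.

```python
def dtw_banded(reference, query, band):
    """DTW cost when the path may stray at most `band` cells from the diagonal."""
    n, m = len(reference), len(query)
    inf = float("inf")
    prev = [inf] * m                      # accumulated costs for row i - 1
    for i in range(n):
        curr = [inf] * m                  # accumulated costs for row i
        lo, hi = max(0, i - band), min(m - 1, i + band)
        for j in range(lo, hi + 1):       # only cells inside the band
            d = abs(reference[i] - query[j])
            if i == 0 and j == 0:
                curr[j] = d
                continue
            best = inf
            if i > 0:
                best = min(best, prev[j])          # move in reference only
                if j > 0:
                    best = min(best, prev[j - 1])  # diagonal move
            if j > 0:
                best = min(best, curr[j - 1])      # move in query only
            curr[j] = d + best
        prev = curr
    return prev[m - 1]                    # inf if the band excludes the end cell


# Toy signals: the same peak, one sample earlier in the query.
reference = [0, 0, 1, 3, 1, 0, 0]
query = [0, 1, 3, 1, 0, 0, 0]
print(dtw_banded(reference, query, band=0))  # 6: diagonal only, no warping
print(dtw_banded(reference, query, band=1))  # 0: a shift of one sample is allowed
```

With `band=0` the path is forced onto the diagonal and the result is the plain pointwise difference. The sketch returns only the cost; recovering the path would need the full table inside the band.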

Dynamic Time Warping

Guaranteed global optimum, but lots of non-diagonal moves

[Figure: axes labelled REFERENCE and QUERY]
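The point about non-diagonal moves can be made concrete with a small Python sketch of unconstrained DTW. This is an illustration under stated assumptions, not code from the talk: the function names `dtw_path` and `count_non_diagonal`, the absolute-difference cost and the toy signals are all hypothetical.

```python
def dtw_path(reference, query):
    """Classic DTW: return (total cost, warping path) for two 1-D sequences."""
    n, m = len(reference), len(query)
    inf = float("inf")
    # cost[i][j]: minimal accumulated cost of a path from (0, 0) to (i, j)
    cost = [[inf] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = abs(reference[i] - query[j])
            if i == 0 and j == 0:
                cost[i][j] = d
                continue
            best = inf
            if i > 0 and j > 0:
                best = min(best, cost[i - 1][j - 1])
            if i > 0:
                best = min(best, cost[i - 1][j])
            if j > 0:
                best = min(best, cost[i][j - 1])
            cost[i][j] = d + best
    # Backtrack from the end cell; on ties the diagonal predecessor is preferred
    # because it has the smallest (i, j) among the candidates.
    i, j = n - 1, m - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        candidates = []
        if i > 0 and j > 0:
            candidates.append((cost[i - 1][j - 1], i - 1, j - 1))
        if i > 0:
            candidates.append((cost[i - 1][j], i - 1, j))
        if j > 0:
            candidates.append((cost[i][j - 1], i, j - 1))
        _, i, j = min(candidates)
        path.append((i, j))
    path.reverse()
    return cost[n - 1][m - 1], path


def count_non_diagonal(path):
    """Number of steps in the path that are not (+1, +1)."""
    return sum(1 for (a, b), (c, d) in zip(path, path[1:])
               if (c - a, d - b) != (1, 1))


reference = [0, 0, 1, 3, 1, 0, 0]
query = [0, 1, 3, 1, 0, 0, 0]
total, path = dtw_path(reference, query)
print(total, count_non_diagonal(path))  # 0 2
```

Nothing in this cost function discourages non-diagonal moves, so on noisy signals the optimal path is free to use many of them.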

Paths found with two different penalties

Why do we need to care about this?

Analysis is based on peak area, and overwarping affects peak shape and area.

Overwarping introduces artificial features into the data.

Overwarping occurs when the path contains too many non-diagonal moves.

Solution #1: penalise non-diagonal moves

Solution #2: make the penalty variable, dependent on the size of the peaks

Variable penalty DTW

• Minimise over paths w (λ is the penalty vector):

$$\min_{w} \; \sum_{i=1}^{n} \left| y(t_i) - x(w(t_i)) \right| + \sum_{i=1}^{n} \lambda_i \, \mathbf{1}\left(i^{\text{th}} \text{ move is non-diagonal}\right)$$

• Choose penalty vector using a dilation of the signals
• Large penalty with large peaks
• Minimise this function using dynamic programming
• Easy to implement
• How does it compare to DTW, constant penalty DTW, and parametric time warping?
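The penalised objective on this slide can be sketched in a few lines of Python. This is a simplified illustration, not the algorithm implemented in the VPdtw package: it uses the three symmetric DTW moves, charges `penalty[i]` for any non-diagonal move into reference index i, and breaks cost ties in favour of fewer non-diagonal moves. The function name `penalised_dtw`, the indexing of the penalty by reference position and the toy values are assumptions.

```python
def penalised_dtw(reference, query, penalty):
    """Return (total cost, number of non-diagonal moves) of the best path.

    Each step pays |reference[i] - query[j]|, plus penalty[i] if the step
    into cell (i, j) is not diagonal.
    """
    n, m = len(reference), len(query)
    inf = float("inf")
    # cost[i][j] = (accumulated cost, non-diagonal moves used to get there)
    cost = [[(inf, 0)] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = abs(reference[i] - query[j])
            if i == 0 and j == 0:
                cost[i][j] = (d, 0)
                continue
            options = []
            if i > 0 and j > 0:                      # diagonal: no penalty
                c, k = cost[i - 1][j - 1]
                options.append((c + d, k))
            if i > 0:                                # reference advances alone
                c, k = cost[i - 1][j]
                options.append((c + d + penalty[i], k + 1))
            if j > 0:                                # query advances alone
                c, k = cost[i][j - 1]
                options.append((c + d + penalty[i], k + 1))
            cost[i][j] = min(options)                # lowest cost, then fewest moves
    return cost[n - 1][m - 1]


reference = [0, 0, 1, 3, 1, 0, 0]
query = [0, 1, 3, 1, 0, 0, 0]
print(penalised_dtw(reference, query, [0] * 7))   # (0, 2): plain DTW warps freely
print(penalised_dtw(reference, query, [10] * 7))  # (6, 0): warping is too expensive
# Variable penalty: cheap to warp in the flat baseline, expensive at the peak.
print(penalised_dtw(reference, query, [0.5, 0.5, 0.5, 5, 0.5, 0.5, 0.5]))  # (1.0, 2)
```

A zero penalty reproduces ordinary DTW, while a large constant penalty forces the diagonal path. The variable penalty sits between the two: warping is still allowed, but only where it is cheap.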

Key Ingredient for VPdtw

• Penalty vector – proportional to a dilation of the signal.
• There is some subjectivity here to balance the need for alignment with the effect on raw signals.
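One way to read "proportional to a dilation of the signal" is a running maximum over a window, scaled by a constant. The Python sketch below illustrates that reading only; it is not the `dilation` function of the VPdtw package, and the names `dilate` and `penalty_from_signal`, the window half-width `span` and the factor `scale` are assumptions.

```python
def dilate(signal, span):
    """Morphological dilation: running maximum over a window of half-width span."""
    n = len(signal)
    return [max(signal[max(0, i - span):min(n, i + span + 1)]) for i in range(n)]


def penalty_from_signal(signal, span, scale):
    """Penalty vector proportional to the dilated signal (scale is subjective)."""
    return [scale * value for value in dilate(signal, span)]


signal = [0, 0, 1, 3, 1, 0, 0]
print(dilate(signal, span=1))                          # [0, 1, 3, 3, 3, 1, 0]
print(penalty_from_signal(signal, span=1, scale=0.5))  # [0.0, 0.5, 1.5, 1.5, 1.5, 0.5, 0.0]
```

The dilation widens each peak, so the penalty is large across the whole peak and not only at its apex. The choice of `span` and `scale` is where the subjectivity mentioned on the slide comes in.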

Before Alignment – can’t see detail but

[Figure: total intensity (log scale, 10^4 to 10^7) against elution time (s), 400 to 2200]

Check Alignment #1

Check Alignment #2

Check Alignment #3

How far are points moved by alignment?

VPdtw package – now on CRAN, GPL 2

• VPdtw, dilation, plot.VPdtw, print.VPdtw
• result <- VPdtw(reference, query, penalty, maxshift = 350)
• print(result)
• plot(result, "Before")
• plot(result, "After")
• plot(result, "Shifts")
• plot(result)
• Many queries, one penalty
• One query, many penalties
• Reference can be NULL

Comparisons – Time

Summary

• Introduced GC-MS data
• This talk is really about improving data quality
• Improvement via alignment
• without data reduction
• without unnatural features
• via fast computation
• VPdtw available on CRAN
• Faster
• Better than available alternatives

References

DTW:

Vintsyuk, T. K. Kibernetika 1968, 4, 81–88.

Sakoe, H., and Chiba, S. Proceedings of the International Congress on Acoustics, Budapest, Hungary, 1971; paper 20 c 13.

Parametric Time Warping:

Eilers, P. H. C. Anal. Chem. 2004, 76, 404–411.

Variable Penalty DTW:

Clifford, Stone, Montoliu, Rezzi, Martin, Guy, Bruce and Kochhar. Alignment Using Variable Penalty Dynamic Time Warping. Anal. Chem. 2009, 81 (3), 1000–1007.

Thank you

Statistical Bioinformatics - Agribusiness
David Clifford
Research Scientist
CSIRO Division of Mathematics, Informatics and Statistics

Phone: +61 2 9325 3210
Email: [email protected]
Web: www.csiro.au/science/org/CMIS.html

Contact Us
Phone: 1300 363 400 or +61 3 9545 2176
Email: [email protected]
Web: www.csiro.au

VPdtw package – plot(result, "Before")

VPdtw package – plot(result, "After")

VPdtw package – print(result)

Reference is NULL. Query column # 13 is chosen at random.
Query matrix is made up of 16 samples of length 5000.
Single Penalty vector supplied by user.
Max allowed shift is 150.

              Cost  Overlap  Max Obs Shift  # Diag Moves  # Expanded  # Dropped
Query #1:  1521.10     4994             51          4996          47          2
Query #2:  1708.30     4996             53          5000          49          0
Query #3:  1479.60     4998             59          5000          57          0
Query #4:  1302.30     4998             62          5000          60          0
Query #5:  1505.40     4996             61          5000          57          0
Query #6:  1296.80     4997             60          5000          57          0
Query #7:  1420.80     5000             61          5000          62          0
Query #8:  1484.20     5000             59          5000          60          0
Query #9:  1424.30     5000             51          5000          53          0
Query #10: 1306.30     4997             42          5000          39          0
Query #11: 1193.30     4994             29          4990          28          5
Query #12:  225.04     4999             13          4998          13          1
Query #13:    0.00     5000              0          5000           0          0
Query #14:  266.09     4944             56          4894           2         53
Query #15:  746.93     4937             63          4880           4         60
Query #16:  345.87     4914             86          4836           0         82