Applied Physics Thesis
Delft University of Technology

Breath analysis by quantum cascade laser spectroscopy

Author: Olav Grouwstra
Direct supervisor: A. Reyes Reyes
Supervisor: N. Bhattacharya
Professor: H.P. Urbach

March 5, 2015


Contents

1 Introduction
    1.1 Gas analysis
    1.2 Quantum cascade laser spectroscopy
    1.3 Previous work on the project
    1.4 Goals and thesis structure

2 Quantum Cascade Laser Spectroscopy
    2.1 Absorbance spectrum
    2.2 Setup
        2.2.1 The quantum cascade laser
        2.2.2 Multipass cavity
        2.2.3 Distance meter
        2.2.4 Detection

3 Data Calibration
    3.1 Wavenumber calibration
        3.1.1 Four measurement signals
        3.1.2 Data acquisition and pre-processing
        3.1.3 Need for additional calibration
        3.1.4 Wavenumber calibration
    3.2 CO2 and H2O calibration
        3.2.1 Method for calibration to CO2 and H2O
        3.2.2 CO2 and H2O calibration results

4 Molecules and Concentrations
    4.1 List of molecules
        4.1.1 Standard molecules
        4.1.2 Molecules selected to distinguish between data sets
        4.1.3 Selection based on measurable concentration
    4.2 Concentration determination
        4.2.1 method
        4.2.2 Input and results
    4.3 Ethanol estimation by CO2 removal
        4.3.1 Motivation for CO2 removal
        4.3.2 Implementation of CO2 removal for ethanol estimation

5 Disease prediction by machine learning
    5.1 Machine learning for classification
    5.2 Classification by trial-and-error
        5.2.1 Basic explanation of some algorithms used
    5.3 Results

6 Conclusions and outlook
    6.1 QCL setup
    6.2 Data calibration
    6.3 Determination of molecules
    6.4 Determination of concentrations
    6.5 Categorizing health status by classification

Appendices

A Results for a sample of a healthy patient
    A.1 Standard molecules
    A.2 Molecules found by p-region analysis
    A.3 Concentrations found for a single healthy patient

B List of molecules
    B.1 Breath molecules with absorbance in the 1020 to 1100 cm⁻¹ region

1. Introduction

1.1 Gas analysis

Gas analysis is done to find the different molecules present in a gas. Identifying the composition of a substance is vital for various practical applications such as medical diagnostics and environmental monitoring[1][2]. In this work the focus lies on gas analysis applied to medical diagnosis.

The diagnosis of diseases is helped by many different techniques such as blood tests or radiological studies like magnetic resonance imaging (MRI), as they give an idea of what is inside the human body. Blood tests can provide this knowledge since changes in metabolic processes significantly influence the compounds in human blood. For example, diabetes increases the amount of acetone in the bloodstream. Hence knowing the concentration of acetone could help diagnose diabetes. The compounds in blood diffuse through the lungs into the breath we exhale, and diagnosis can therefore also be done by analysis of human breath.

Breath analysis, as opposed to blood tests, has the benefit of being non-intrusive, which makes it much easier for medical professionals to implement. Non-intrusive medical instruments also undergo less strict control by regulatory bodies like the Food and Drug Administration (FDA) and the European Medicines Agency (EMA). While diagnosis through imaging techniques might give a better idea about the origin of a disease, it does not give information on the influence of a disease on the metabolic processes.

1.2 Quantum cascade laser spectroscopy

Quantum cascade laser (QCL) spectroscopy is only one among a variety of spectroscopy methods, each of which has its pros and cons.

Currently one of the most common techniques used for gas analysis is gas chromatography-mass spectrometry (GC-MS). Although GC-MS achieves high accuracy, it has certain limitations. GC-MS is a specific test, meaning it requires a priori knowledge of what to test for[3]. Its use is therefore limited to cases where symptoms are already present, and it thus requires pre-diagnosis. GC-MS also requires sample preparation, limiting its use to trained professionals. A gas analysis method using a quantum cascade laser for spectroscopy is treated in this work. This form of analysis does not need sample preparation and is therefore more easily accessible. A priori knowledge of what molecules to test for is also not necessary for the measurement itself, which makes it a promising tool for establishing statistical relations between concentrations of molecules and the health status of individuals.

Photoacoustic spectroscopy (PAS) is another useful technique. Like QCL spectroscopy, it works by shining light on a sample. In PAS, however, it is the absorbed light that creates the signal: it thermally expands the sample, thereby creating a sound wave which can be measured. As in the case of QCL spectroscopy, PAS requires no sample preparation. But PAS requires the signal to go through more energy conversions, which makes it susceptible to vibrational noise[4].

Electronic noses are also used for the detection of gases. These are however designed to detect specific molecules, as opposed to broadband QCL spectroscopy, which can detect any molecule with an absorbance spectrum overlapping with the emission spectrum of the QCL.

1.3 Previous work on the project

In this thesis a spectroscopy setup using a quantum cascade laser (QCL) is described and the subsequent data processing is treated extensively. The setup was built and is described in detail by A. Reyes Reyes as his PhD project, funded by the Fundamenteel Onderzoek Materie (FOM) foundation[5]. Z. Hou developed the data acquisition under supervision of A. Reyes Reyes[3]. Measurements processed in this research are from breath samples of 35 healthy volunteers and 35 asthmatic volunteers. Two different breath samples are used from each volunteer. These samples are from children from the Sophia Children's hospital in Rotterdam, all with their parents' informed consent. The absorbance spectra obtained from these samples show an uncertainty in the wavenumbers. In this work calibration is done through post-processing in order to reduce this uncertainty.

1.4 Goals and thesis structure

The goal of this work is twofold. The first goal is to decrease the uncertainties with which the compounds and their concentrations are determined. The second goal is to find a reliable way to distinguish between breath of healthy and asthmatic children based on the compounds present and their concentrations.

This thesis works towards these goals by first explaining the general working principle of the setup and the theory behind QCL spectroscopy, highlighting the parts that have the most influence on the noise and resolution of the signal. The measured absorbance spectrum and its calibration are then treated. This is followed by an explanation of how the molecular species present and their concentrations are found. Finally a way of distinguishing between samples of healthy and asthmatic patients is treated, and the overall results are discussed.


2. Quantum Cascade Laser Spectroscopy

In this chapter the relation between the intensity of the laser light, the absorbance of the gas, and the concentrations of the molecules in the gas is explained. The parameters that need to be measured and the equipment used to measure them are explained as well. Measurements on the gas are performed with a spectroscopy setup using a quantum cascade laser (QCL). The various components and conditions of the measurement setup that influence the determination of the gas compounds and concentrations are explored. In order to understand how these components influence the eventual concentration, the path of light through the components is described starting from the quantum cascade laser and ending at the mercury cadmium telluride (HgCdTe) detectors. Getting the absorbance spectrum from a breath sample takes around 60 minutes, of which only 5 minutes are spent on the actual scanning of the gas by the setup. Transferring the large datasets between computers and loading them into processing software takes up most of the time. The setup as it is used for sample measurements was built by A. Reyes Reyes[5].

2.1 Absorbance spectrum

In order to obtain information on the gas through spectroscopy, the interaction between light and matter needs to be established. The relation of the concentrations of the compounds in the gas to the light passing through the gas is given by the Beer-Lambert law[6, 7]:

\[ I_o(\nu, C) = I_i(\nu)\, 10^{-A(\nu, C)} \tag{2.1} \]

where $I_o$ is the beam intensity after interacting with the gas, $I_i$ the beam intensity before interacting with the gas, and $\nu$ and $C$ denote the dependence on the wavenumber and the concentrations of the compounds respectively. $A$ is the absorbance, as given by

\[ A(\nu, C) = \sum_{i=1}^{n} \varepsilon_i(\nu)\, c_i\, l_\text{path} \tag{2.2} \]

with

\[ C = (c_1, c_2, \ldots, c_n) \tag{2.3} \]

where $n$ is the number of molecular species, and $c_i$ and $\varepsilon_i(\nu)$ denote the concentration and molar absorptivity of each molecular species in the gas. The vector $C$ is used simply as an abbreviated way to indicate the dependence of the absorbance $A$ on the concentrations of all molecular species. The absorbance is also dependent on the interaction length $l_\text{path}$ of the light with the gas. In order to determine the concentrations $c_i$ of the molecules, all other parameters in Equation 2.1 and Equation 2.2 should be known. The light is provided by the QCL, and the intensities $I_o$ and $I_i$ are obtained by processing the signals measured by the detectors in the setup. The molar absorptivity $\varepsilon_i(\nu)$ is a measure of how strongly a certain molecular species absorbs light at a given wavelength. It is an intrinsic property of a molecule and can be considered unique to each molecule. Hence the molar absorptivity can be looked up in the Northwest Infrared database (NWIR)[8] and the high resolution transmission molecular absorption database (HITRAN)[9], which are put together by the Pacific Northwest National Laboratory (PNNL) and the Harvard-Smithsonian Center for Astrophysics respectively. The interaction length $l_\text{path}$ is a property of the setup, and more specifically of the cavity, and can be measured from the cavity as described in subsection 2.2.2.
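As an illustration of Equations 2.1 and 2.2, the following minimal Python sketch evaluates the absorbance of a two-species mixture at a single wavenumber and inverts the relation for a single absorbing species. The molar absorptivities and concentrations are hypothetical placeholder numbers in mutually consistent units, not values from the databases or from the measurements; only the path length of 54.36 m is taken from subsection 2.2.2.

```python
import math


def absorbance(epsilons, concentrations, l_path):
    """Equation 2.2 at one wavenumber: A = sum_i eps_i * c_i * l_path."""
    return sum(eps * c for eps, c in zip(epsilons, concentrations)) * l_path


def transmitted_intensity(i_in, a):
    """Equation 2.1: I_o = I_i * 10**(-A)."""
    return i_in * 10.0 ** (-a)


def concentration_single_species(i_in, i_out, eps, l_path):
    """Invert Equations 2.1 and 2.2 when only one species absorbs."""
    return -math.log10(i_out / i_in) / (eps * l_path)


# Hypothetical example values: eps in 1/(ppm*m), concentrations in ppm.
eps = [2.0e-4, 5.0e-5]
conc = [10.0, 40.0]
l_path = 54.36  # interaction length in metres (subsection 2.2.2)

A = absorbance(eps, conc, l_path)
I_o = transmitted_intensity(1.0, A)
print(f"A = {A:.5f}, I_o / I_i = {I_o:.4f}")
```

With several absorbing species a single wavenumber is not enough to separate the contributions, which is why the full spectrum is needed to determine the concentrations.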

2.2 Setup

To find the concentration of compounds within a gas the spectroscopy setup depicted in Figure 2.1 is used. The laser beam is produced at the QCLs. From the QCLs the beam falls onto a beam splitter, which sends one beam straight to the monitor detector, where the intensity spectrum as emitted by the QCLs is measured. The other beam from the beam splitter travels to the multipass cavity. The beam gets partially absorbed by the gas present in the cavity, and then travels onwards to the main detector. The distance meter is placed in such a way that its beam coincides with that of the QCLs in order to measure the optical path length in the cavity. The two detectors relay their signals to the NI-PCI 5922 acquisition card, which in turn sends the signals to the computer.


Figure 2.1: Diagram of the quantum cascade laser spectroscopy setup as used for measuring. Courtesy of Z. Hou[3]

2.2.1 The quantum cascade laser

The laser system used is Block Engineering's LaserScope, which consists of two QCLs arranged such that together they can scan over the wavenumber range of 832 cm⁻¹ to 1263 cm⁻¹[10]. As the LaserScope is by itself designed to be a spectroscopic system, it has an internal detector and a set of two QCLs, which are portrayed in Figure 2.1 as the monitor detector and the QCLs. The main specifications of the LaserScope can be found in Table 2.1, where it should be noted that the spectral resolution stated is the one obtained when the signal is processed using Block Engineering's own software.

Table 2.1: Specifications of the quantum cascade laser as given by Block Engineering[10].

| Parameter | Specification |
|---|---|
| Spectral region | 1267 to 833 cm⁻¹ (8 to 12 µm) |
| Tuning | Continuous or single wavelength |
| Emission type | Pulsed |
| Pulse repetition frequency | 200 kHz |
| Minimum average power | 0.5 mW |
| Maximum average power | 12 mW |
| Beam diameter | 2.5 mm x 5 mm ellipse |
| Spectral resolution | Down to 1 cm⁻¹ |

By design, the QCLs emit light at several wavenumbers simultaneously[11]. The HgCdTe detectors detect the intensity of the light without differentiating between the various wavenumbers. Since the intensity per wavenumber is needed, a Littrow configuration with a diffraction grating mounted on a piezo is used, as seen in Figure 2.2[3].


Figure 2.2: Diagram of a Littrow configuration external cavity diode laser with back extraction. Courtesy of Z. Hou[3]

This Littrow configuration uses a diffraction grating to steer the first order diffraction of the light with the desired wavenumber back through the laser diode, and into the setup. The wavenumber selected using the grating depends on the spacing of the grating and the incident angle of the light on the grating. The incident angle is controlled by applying a voltage over the piezo.
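The dependence of the selected wavenumber on grating spacing and incident angle can be sketched with the standard Littrow condition $m\lambda = 2d\sin\theta$, which is textbook grating physics and is not stated in this thesis. The groove density of 150 lines/mm below is purely hypothetical, since the grating of the LaserScope is not specified here; the sketch only shows how a change in angle maps to a change in wavenumber.

```python
import math


def littrow_angle_deg(wavenumber_cm, grooves_per_mm, order=1):
    """Littrow angle from m * lambda = 2 * d * sin(theta)."""
    wavelength_um = 1.0e4 / wavenumber_cm   # lambda [um] = 10^4 / wavenumber [cm^-1]
    spacing_um = 1.0e3 / grooves_per_mm     # groove spacing d [um]
    s = order * wavelength_um / (2.0 * spacing_um)
    if not 0.0 < s <= 1.0:
        raise ValueError("no Littrow solution for this wavenumber and grating")
    return math.degrees(math.asin(s))


GROOVES_PER_MM = 150.0  # hypothetical value, not a LaserScope specification

angles = {nu: littrow_angle_deg(nu, GROOVES_PER_MM) for nu in (832.0, 1000.0, 1263.0)}
for nu, theta in angles.items():
    print(f"{nu:7.1f} cm^-1 -> {theta:6.2f} degrees")
```

Under these assumptions a lower wavenumber requires a larger grating angle, so any jitter in the piezo voltage translates directly into jitter in the emitted wavenumber.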

The intensity of the light emitted by the QCL is furthermore influenced by mode hopping, which can be caused by temperature fluctuations in the gain medium of the laser or fluctuations in the injection current. In chapter 3 a closer look is taken at the signal from the QCL and its noise. The intensity spectrum of the QCL system is given in Figure 2.3. The general shape of this signal portrays the intensity spectrum the laser outputs. The intensity drop around 1010 cm⁻¹ indicates the switch between the two QCLs that together scan over the entire wavenumber range.


[Plot: "Spectrum emitted by the QCL". Horizontal axis: Wavenumber (cm⁻¹), 850 to 1250. Vertical axis: Intensity (counts), scale ·10⁸.]

Figure 2.3: Intensity spectrum of the laser system used: Block Engineering’s LaserScope.

2.2.2 Multipass cavity

A cavity is used to achieve a large interaction length $l_\text{path}$ between the gas and the beam. The cavity is depicted in Figure 2.4. The beam enters the cavity through a ZnSe window. It then reflects between the two mirrors at either end of the cavity, until the beam falls on the window again and exits the cavity. The mirrors (Aerodyne Research, Inc.) are astigmatic, meaning each mirror has two different radii of curvature, which allows more reflections to take place and thus a larger interaction distance $l_\text{path}$[12]. A large interaction distance is desirable since it increases the absorbance. A multipass cavity is used instead of a resonance cavity since the high reflection mirrors used in a resonance cavity are not available for the wide wavelength range used in this experiment.


Figure 2.4: Diagram of the multi-pass cavity. The ZnSe window is located on the left in yellow, and the two mirrors in blue on either end of the cavity. Courtesy of Z. Hou[3]

The length of interaction of the light with the gas, $l_\text{path}$, is measured using a laser distance meter to be 54.36 m[3]. A multipass cavity with such a long path length is used in order to have a large absorbance. The mirrors in the cavity and the ZnSe window leading into the cavity also absorb 94.85% of the light[3, p. 25]. Part of the light is absorbed by the gas in the cavity, according to the absorbance profiles of the molecules constituting the gas. The remaining light exits the cavity and is measured at the main detector. In order to clean the cavity between measurements a series of mass flow controllers and a diaphragm pump are used. This way contamination in the cavity is prevented[5].

2.2.3 Distance meter

The distance meter (Leica DISTO D2) emits light pulses along the path depicted by the red dashed line in Figure 2.1. The path the pulse travels outside the cavity is simply measured with a ruler. As can be seen from Figure 2.1, subtracting the length of the path the pulse travels outside of the cavity from the total length measured by the distance meter gives the optical path length inside the cavity[5]. This optical path length inside the cavity is also checked by comparing the time a pulse from the QCLs takes to travel straight to a detector with the time a pulse from the QCLs takes when it passes through the cavity and then onto a detector. The delay between the pulses measured is given in Figure 2.5, from which the same result of 54.36 m is obtained.
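The time-of-flight check can be sketched in a few lines of Python. The sketch assumes a refractive index of 1 for the gas in the cavity and assumes that any difference in path length outside the cavity has already been subtracted from the delay; both are simplifications made for illustration and are not statements about how the check was actually evaluated.

```python
C_LIGHT = 299_792_458.0  # speed of light in vacuum, m/s


def path_length_from_delay(delay_s, n_refr=1.0):
    """Optical path length that corresponds to a measured pulse delay."""
    return C_LIGHT * delay_s / n_refr


def delay_from_path_length(length_m, n_refr=1.0):
    """Pulse delay expected for a given extra path length."""
    return length_m * n_refr / C_LIGHT


# Delay expected for the path length reported in subsection 2.2.2.
delay = delay_from_path_length(54.36)
print(f"expected delay: {delay * 1e9:.1f} ns")
print(f"round trip: {path_length_from_delay(delay):.2f} m")
```

Under these assumptions a path length of 54.36 m corresponds to a delay of roughly 181 ns between the direct pulse and the pulse that passed through the cavity.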


Figure 2.5: Delay time between a pulse going straight to a detector and a pulse passing through the cavity for three different wavenumbers. (a) 950 cm⁻¹, (b) 1000 cm⁻¹, (c) 1150 cm⁻¹. Courtesy of A. Reyes Reyes[5]

2.2.4 Detection

A HgCdTe detector measures the light before it enters the gas, and another HgCdTe detector measures the light when it exits the gas. These are called the monitor and main detector respectively, and the signals obtained from these detectors will henceforth be referred to as the monitor signal and the main signal. The specifications of the monitor detector are given in Table 2.2.

Table 2.2: Specifications of the monitor detector as given by Block Engineering[13].

| Parameter | Specification |
|---|---|
| Detector element | TE cooled HgCdTe |
| Active area | 1 × 1 mm² |
| Window | BaF2 |
| Wavelength range | 6–12 µm |
| Total gain (typical) | 800 V/W |
| Cut-off frequency range | 10 Hz to 20 MHz |
| Operating temperature (typical) | 223 K |
| Saturation threshold power (at 8 µm, 20 MHz speed) | 5 mW (peak) |

The main detector is from VIGO System S.A. with part number PVI-4TE-10.6-0.3x0.3-TO8-BaF2, the specifications of which are given in Table 2.3.


Table 2.3: Specifications of the main detector as given by VIGO System S.A.[14].

| Parameter | Specification |
|---|---|
| Detector element | TE cooled HgCdTe |
| Active area | 0.3 × 0.3 mm² |
| Window | BaF2 |
| Wavelength range | 2–12 µm |
| Detectivity at λpeak | ≥ 4.0 × 10⁹ cm·√Hz/W |
| Detectivity at λopt | ≥ 2.0 × 10⁹ cm·√Hz/W |
| Optimal wavelength | 10.6 µm |
| Operating temperature (typical) | 195 K |

The intensity of the light emitted by the QCL, $I_\text{QCL}$, as seen in Figure 2.6 varies per measurement due to temperature fluctuations and noise on the voltage applied over the piezo. Therefore the signals of multiple measurements need to be normalized in order to be compared to each other, for which the two detectors are used.

[Plot: "Measurements of emitted intensity of the QCLs". Horizontal axis: Wavenumber (cm⁻¹), 1173.35 to 1173.8. Vertical axis: Intensity (counts), 7 to 9 ·10⁸.]

Figure 2.6: Intensity spectrum of various measurements with the monitor detector.


3. Data Calibration

3.1 Wavenumber calibration

As mentioned in chapter 2 the intensity of the light emitted by the QCLs is subject to noise. This noise is corrected for through normalisation by using the monitor and main detector. In this chapter an attempt is made to reduce the standard deviation in the monitor and the main signals caused by the uncertainty in which wavenumber is emitted from the QCLs. The wavenumber corresponding to the intensity measured at the detectors is determined by the angle of the grating mounted on the piezo, as mentioned in subsection 2.2.1. The wavenumber attributed to the measured beam intensities at a point in time is determined from the voltage on the piezo, which should predictably determine the angle of the grating. Since there is noise on the voltage over the piezo element, the behaviour of the piezo is not completely predictable, and thus the angle of the grating cannot be inferred precisely from the voltage on the piezo at a point in time. This gives an uncertainty in the wavenumbers attributed to the measured beam intensities, as can be seen in Figure 3.1. The wavenumber data in Figure 3.1 is obtained by recording files made by Block Engineering's firmware, which measures the voltage over the piezo and contains knowledge of what voltage corresponds to what wavenumber. Unfortunately Block Engineering's firmware outputs its spectrum at a lower wavenumber resolution than the acquisition described in subsection 3.1.2 using the NI-PCI 5922 acquisition card.

This wavenumber uncertainty in turn increases the standard deviation of the absorbance when multiple measurements are compared. QCL systems are a relatively new development and the QCLs in the LaserScope should be among the best currently available, hence there is no certainty that significant improvements can be made with the knowledge on QCL design and piezo elements currently available. Given this, and the complicated nature of the electronics in the LaserScope QCL system, replacing the piezo could take a long time, for which there is no room within the scope of the project. Instead the measurements are calibrated to each other in wavenumber during post-processing, in order to reduce the resulting uncertainty in the absorbance.

3.1.1 Four measurement signals

As established in section 2.2 a main signal and a monitor signal are obtained during each measurement. Multiple measurements are performed per sample. In order to get information on the gas, measurements are made using the main detector when a gas is inside the cavity. These measurements will be called gas measurements from now on. In order to single out any effects due to the different paths the beams take to the main and monitor detector, measurements are also done without gas in the cavity. These measurements without gas will henceforth be referred to as reference measurements. These reference measurements are also used to establish the beam intensity without interaction with the gas, which is $I_i$ in Equation 2.1. The effects of the components of the setup such as the cavity, the lenses, and the mirrors on the beam to the main detector are all packed together in a factor $f_\text{path1}(\nu)$. This factor is defined such that multiplying the intensity of the beam as it exits the QCLs by this factor gives the intensity of the beam as it arrives at the main detector, and the factor therefore takes values between 0 and 1 over the wavenumber range. $f_\text{path2}(\nu)$ and $f_\text{gas}(\nu, C)$ are similarly defined for the effect of the elements between the QCLs and the monitor detector and for the effect of a gas sample in the cavity, respectively.

[Plot: "Measuring time - wavenumber relation". Panel (a): horizontal axis Time stamp (a.u.), 200 to 1000; vertical axis Wavenumber (cm⁻¹), 850 to 1000. Panel (b): horizontal axis Time stamp (a.u.), 468 to 470; vertical axis Wavenumber (cm⁻¹), 920.8 to 921.]

Figure 3.1: Effect of the voltage noise on the emitted wavenumber over time for several measurements. (a) Several scans superposed. (b) A zoom into a section of the figure shown in (a).

Due to effects in the QCLs, the signals vary in intensity between different measurements. Therefore monitor signals are also recorded during both gas and reference measurements. The monitor signals are necessary to be able to reduce the effects of the QCL: since the QCL affects the monitor signal and the main signal in the same way, its effects can be identified. For a clear overview the various effects on the signal of both detectors in the two types of measurement are laid out in Table 3.1. Wavenumber dependence in Table 3.1 is indicated by $\nu$ and $\nu'$, to indicate that different measurements cannot be compared wavenumber-wise. This is a result of the uncertainty caused by effects in the QCL.

Table 3.1: The contributions to the intensity as measured by the two detectors.

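| Signal type | Gas measurement | Reference measurement |
|---|---|---|
| Main | $I_\text{QCL}(\nu')\, f_\text{path1}(\nu')\, f_\text{gas}(\nu', C)$ | $I_\text{QCL}(\nu)\, f_\text{path1}(\nu)$ |
| Monitor | $I_\text{QCL}(\nu')\, f_\text{path2}(\nu')$ | $I_\text{QCL}(\nu)\, f_\text{path2}(\nu)$ |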

3.1.2 Data acquisition and pre-processing

The signal from the detectors is processed before it is calibrated, following the method developed by Z. Hou[3]. The light emitted by the QCL is a series of pulses lasting 333 ns and spaced approximately 5600 ns apart. These light pulses go through the setup and end up at the detectors, where the resulting signals are sampled at a rate of 10 MHz, or 1 sample per 100 ns, by the NI-PCI 5922 acquisition card. Since saving the data from the memory of the acquisition card to the computer is one of the most time-consuming parts of the data acquisition, the data is filtered using LabVIEW such that only the peaks and valleys of the pulse signal are saved to the computer. For this, LabVIEW's peak detection is used at a width of 3 points, which gives an amplitude error for the peaks of 0.39%. As the valleys are essentially parts where the laser is emitting no light but there is still a signal, the signal strength of the valleys surrounding a peak is subtracted from the peak to remove any bias due to the electronics of the detector. A final resolution of 0.05 cm⁻¹ is achieved. For each sample ten gas measurements and ten reference measurements are done and processed this way. The spectrum of one of these processed monitor signals is shown in Figure 3.2. The entire spectrum spans from 832 cm⁻¹ to 1263 cm⁻¹.
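The peak-minus-valley step can be sketched as follows. This is not the LabVIEW implementation: the sketch uses a plain local-maximum search above a hypothetical `threshold` instead of LabVIEW's peak detection, and takes the valley level on each side of a pulse as the lowest sample between that pulse and its neighbour. It only illustrates how a constant electronic bias drops out of the pulse amplitude.

```python
def pulse_amplitudes(samples, threshold):
    """Return one baseline-corrected amplitude per detected pulse.

    A peak is a local maximum above `threshold`. The valley level on each
    side is the minimum sample between this peak and the neighbouring peak
    (or the end of the trace). The mean of both valley levels is subtracted.
    """
    peaks = [i for i in range(1, len(samples) - 1)
             if samples[i] > threshold
             and samples[i] >= samples[i - 1]
             and samples[i] > samples[i + 1]]
    amplitudes = []
    for k, p in enumerate(peaks):
        left_start = peaks[k - 1] + 1 if k > 0 else 0
        right_end = peaks[k + 1] if k + 1 < len(peaks) else len(samples)
        left_valley = min(samples[left_start:p])
        right_valley = min(samples[p + 1:right_end])
        amplitudes.append(samples[p] - 0.5 * (left_valley + right_valley))
    return amplitudes


# Synthetic trace: two pulses of height 1.0 and 2.0 on a constant bias of 0.1.
trace = [0.1, 0.1, 1.1, 0.1, 0.1, 0.1, 2.1, 0.1, 0.1]
amps = pulse_amplitudes(trace, threshold=0.5)
print(amps)
```

Saving one amplitude per pulse instead of every raw sample is what reduces the amount of data transferred from the acquisition card to the computer.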

[Plot: "Monitor signals of gas and reference measurement". Horizontal axis: Wavenumber (cm⁻¹), 800 to 1300. Vertical axis: Intensity (counts), 0 to 1.4 ·10⁹.]

Figure 3.2: A monitor signal after pre-processing as described in subsection 3.1.2.

3.1.3 Need for additional calibration

The absorbance is obtained by inverting Equation 2.1:

\[ A(\nu, C) = -\log_{10}\left( \frac{I_o(\nu, C)}{I_i(\nu)} \right). \tag{3.1} \]

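Using Table 3.1 the measured signals are substituted for the beam intensity with gas in the cavity $I_o$ and the beam intensity without gas in the cavity $I_i$:

\[ I_o(\nu', C) = \frac{I_\text{main,gas}}{I_\text{mon,gas}} = \frac{I_\text{QCL}(\nu')\, f_\text{path1}(\nu')\, f_\text{gas}(\nu', C)}{I_\text{QCL}(\nu')\, f_\text{path2}(\nu')} = \frac{f_\text{path1}(\nu')\, f_\text{gas}(\nu', C)}{f_\text{path2}(\nu')} \tag{3.2a} \]

\[ I_i(\nu) = \frac{I_\text{main,ref}}{I_\text{mon,ref}} = \frac{I_\text{QCL}(\nu)\, f_\text{path1}(\nu)}{I_\text{QCL}(\nu)\, f_\text{path2}(\nu)} = \frac{f_\text{path1}(\nu)}{f_\text{path2}(\nu)}, \tag{3.2b} \]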

where the wavenumber dependence as indicated by $\nu$ and $\nu'$ shows that different measurements cannot be compared wavenumber-wise, as shown at the start of section 3.1. The division by the monitor signals is done to remove the influence of the QCLs' intensity instability in $I_\text{QCL}(\nu)$ and $I_\text{QCL}(\nu')$. Substituting Equation 3.2 into Equation 3.1 gives Equation 3.3:

\[ A(\nu, \nu', C) = -\log_{10}\left[ \frac{ \dfrac{f_\text{path1}(\nu')\, f_\text{gas}(\nu', C)}{f_\text{path2}(\nu')} }{ \dfrac{f_\text{path1}(\nu)}{f_\text{path2}(\nu)} } \right] = -\log_{10}\left[ \frac{f_\text{path1}(\nu')\, f_\text{path2}(\nu)\, f_\text{gas}(\nu', C)}{f_\text{path1}(\nu)\, f_\text{path2}(\nu')} \right]. \tag{3.3} \]

Note that the absorbance $A(\nu, \nu', C)$ is still dependent on the influences of the optical path because the two separate wavenumber dependencies cannot be resolved. The mean absorbance from ten measurements of a single sample, with the standard deviation calculated per wavenumber, is shown in Figure 3.3.
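Equations 3.1 to 3.3 can be made concrete with a small Python sketch for a single wavenumber. All numbers below are hypothetical. The sketch assumes the ideal case $\nu = \nu'$, in which the path factors cancel and the absorbance reduces to $-\log_{10} f_\text{gas}$, even though the QCL intensity differs between the gas and the reference measurement.

```python
import math


def absorbance_from_signals(main_gas, mon_gas, main_ref, mon_ref):
    """Equations 3.1 and 3.2: normalise each main signal by its monitor
    signal, then take -log10 of the gas-to-reference ratio."""
    i_o = main_gas / mon_gas   # Equation 3.2a
    i_i = main_ref / mon_ref   # Equation 3.2b
    return -math.log10(i_o / i_i)


# Hypothetical factors at one wavenumber (ideal case: nu equals nu').
f_path1 = 0.05             # QCL to main detector, including the cavity
f_path2 = 0.50             # QCL to monitor detector
f_gas = 10.0 ** (-0.2)     # gas transmission that corresponds to A = 0.2

i_qcl_gas = 1.0e9          # QCL intensity during the gas measurement
i_qcl_ref = 0.8e9          # different QCL intensity during the reference

A = absorbance_from_signals(
    main_gas=i_qcl_gas * f_path1 * f_gas,
    mon_gas=i_qcl_gas * f_path2,
    main_ref=i_qcl_ref * f_path1,
    mon_ref=i_qcl_ref * f_path2,
)
print(f"A = {A:.4f}")
```

When the gas and reference measurements are not on the same wavenumber grid, the path factors no longer cancel, which is the motivation for the wavenumber calibration.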

[Plot: "Mean absorbance and standard deviation of ten measurements". Panel (a): horizontal axis Wavenumber (cm⁻¹), 800 to 1200; vertical axis Absorbance, −0.2 to 0.4. Panel (b): horizontal axis Wavenumber (cm⁻¹), 1172 to 1176; vertical axis Absorbance, 0 to 0.1.]

Figure 3.3: Mean absorbance (red line) and standard deviation (dashed blue lines) of ten measurements. (a) Entire scan range. (b) A zoom into a section of the scan shown in (a).

The large standard deviation in Figure 3.3 is partly because the multiple measurements over which the mean is taken cannot be compared properly, since the relation between the wavenumbers of different measurements is not known. A close look at Figure 3.4a and Figure 3.4b shows this wavenumber discrepancy: the measurements do not overlap.

[Plot, panel (a): "Uncalibrated monitor measurements". Horizontal axis: Wavenumber (cm⁻¹), 1172 to 1176. Vertical axis: Intensity (counts), 0.6 to 1 ·10⁹. Panel (b): "Uncalibrated main measurements". Horizontal axis: Wavenumber (cm⁻¹), 1172 to 1176. Vertical axis: Intensity (counts), 1 to 2 ·10⁸.]

Figure 3.4: Ten uncalibrated intensity measurements of a single sample. Five of these are from measurements with a gas in the cavity, and five are from a reference measurement. (a) Measured signals at the monitor detector. (b) Measured signals at the main detector.

This local horizontal shift from one measurement to another results in an increase in standard deviation of the eventual absorbance and concentration. The absorbance of a single sample is calculated from multiple measurements with gas inside the cavity and the reference measurements without gas in the cavity. All these measurements need to be aligned to the same wavenumber values.

3.1.4 Wavenumber calibration

In order to calibrate the various monitor and main signals, a basis to which they can all be calibrated is chosen. The mean of all the main signals is not a useful basis to calibrate towards, since the main signals of the reference measurements differ from the main signals of the gas measurements: in the latter a gas is present and absorbs part of the light. The mean of all the monitor signals is used as the basis for calibration, since the monitor signals are nearly identical for all measurements: the monitor beam does not enter the cavity and is therefore not affected by the gas in the cavity. Since the monitor and main signal of a single measurement suffer from the same wavenumber discrepancies from the QCL, the transformations that calibrate the monitor signals also calibrate their respective main signals.

Calibration starts with the monitor signals, and is done towards the mean per wavenumber of all the monitor signals of a sample, shown in Figure 3.5 as a dashed red line. The positions of the peaks and valleys of this mean are then determined, as denoted by black boxes in Figure 3.5. Each monitor signal is calibrated separately to the mean of the monitor signals. Calibration of a monitor signal, denoted by the blue line in Figure 3.5, towards a single valley of the mean is shown in Figure 3.5b and Figure 3.5c. The position of the valley of the mean used to calibrate is marked by a vertical dashed grey line, and the two surrounding peaks of the mean are marked by vertical dashed black lines in Figure 3.5b. The minimum value of the monitor signal between the positions of these two peaks of the mean is taken, as marked by the star in Figure 3.5c. This minimum is taken as the signal’s valley corresponding to the valley of the mean marked by the grey line. This valley of the monitor signal is then shifted towards the corresponding valley of the mean of all the monitors. This process is repeated for all valleys of the mean. The peaks of the monitor signal are calibrated in a similar fashion. Every signal is now locally shifted so that its peaks and valleys match those of the mean signal. All data between peaks and valleys is stretched or compressed by means of resampling to fit the adjusted peaks and valleys. Since the data between a peak and a valley is mostly straight, linear interpolation is sufficient for resampling.
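The procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used for the thesis: signals are plain lists on a common index grid, the landmarks of the mean are given as a hypothetical list of (index, kind) pairs, the first and last landmark only serve as search bounds, and all function names are invented for this example. The sketch also assumes that the landmarks found in the signal come out in increasing order, which is not guaranteed for real data.

```python
from bisect import bisect_right


def find_landmarks(signal, ref_landmarks):
    """Locate the signal's own extremum for every interior landmark of the mean.

    ref_landmarks: sorted list of (index, kind) with kind "peak" or "valley",
    alternating (hypothetical input format).
    """
    found = []
    for k in range(1, len(ref_landmarks) - 1):
        lo = ref_landmarks[k - 1][0]  # neighbouring landmarks of the mean
        hi = ref_landmarks[k + 1][0]  # bound the search window
        pick = max if ref_landmarks[k][1] == "peak" else min
        found.append(pick(range(lo, hi + 1), key=lambda i: signal[i]))
    return found


def interp(x, xs, ys):
    """Piecewise linear interpolation; xs must be strictly increasing."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    k = bisect_right(xs, x) - 1
    t = (x - xs[k]) / (xs[k + 1] - xs[k])
    return ys[k] + t * (ys[k + 1] - ys[k])


def sample_at(signal, pos):
    """Linearly interpolated signal value at the fractional index pos."""
    i = min(int(pos), len(signal) - 2)
    t = pos - i
    return signal[i] * (1 - t) + signal[i + 1] * t


def align_to_mean(signal, ref_landmarks):
    """Shift the signal's landmarks onto those of the mean and resample."""
    n = len(signal)
    sig_idx = find_landmarks(signal, ref_landmarks)
    ref_idx = [i for i, _ in ref_landmarks[1:-1]]
    src = [0] + sig_idx + [n - 1]  # where the landmarks are in the signal
    dst = [0] + ref_idx + [n - 1]  # where they should end up
    return [sample_at(signal, interp(j, dst, src)) for j in range(n)]


# Made-up example: the signal's landmarks lie one sample to the right of the mean's.
reference = [(0, "valley"), (4, "peak"), (8, "valley"), (12, "peak"), (16, "valley")]
shifted = [0, 1, 2, 3, 4, 5, 4, 3, 2, 1, 2, 3, 4, 5, 4, 3, 2]
aligned = align_to_mean(shifted, reference)
```

Since the same transformation has to be applied to the main signal of the same measurement, the lists `src` and `dst` could be returned as well and reused for that signal.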


[Figure 3.5, title: Calibration of a monitor signal. Panels (a), (b) and (c). Axes: Wavenumber (cm⁻¹) versus Intensity (counts). Legend: Mean of all monitors; A monitor signal; Peaks and valleys of mean.]

Figure 3.5: Calibration of a monitor signal to a valley of the mean.

The complete calibration of a monitor signal following the procedure shown in Figure 3.5 can be seen in Figure 3.6.


[Figure 3.6, title: Calibrated monitor signal. Axes: Wavenumber (cm⁻¹) versus Intensity (counts). Legend: Mean of all monitors; A monitor signal; Peaks and valleys of mean.]

Figure 3.6: A single monitor signal calibrated towards the mean of all monitor signals of a sample.

The miscalibrated blue peak in Figure 3.6 around 1175.2 cm⁻¹ arises because the minimum of the monitor signal between the surrounding peaks of the mean is located at a higher wavenumber than this monitor peak, rather than at a lower one. Since the indices of the peaks and valleys found in the monitor signal are later sorted under the assumption that peaks and valleys alternate, the small blue peak is misassigned as a valley and is consequently placed at the valley of the mean at 1175.2 cm⁻¹ instead of at the peak of the mean at 1175.4 cm⁻¹. This can be solved by comparing the indices of the monitor peaks with those of the monitor valleys, and replacing the index of any monitor valley that is larger than the index of the subsequent peak by the index of the minimum between that monitor peak and the monitor peak before it.
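A minimal sketch of the proposed fix is given below. It assumes, for illustration only, that `peaks` holds one more index than `valleys` and that `valleys[k]` is supposed to lie between `peaks[k]` and `peaks[k + 1]`; the function name and this layout are not taken from the thesis.

```python
def repair_valleys(signal, peaks, valleys):
    """Replace every valley index that lies beyond its subsequent peak.

    Assumed layout (hypothetical): valleys[k] should fall between
    peaks[k] and peaks[k + 1].
    """
    fixed = list(valleys)
    for k, v in enumerate(valleys):
        prev_peak, next_peak = peaks[k], peaks[k + 1]
        if v > next_peak:
            # Take the minimum of the signal between the two peaks instead.
            fixed[k] = min(range(prev_peak, next_peak + 1),
                           key=lambda i: signal[i])
    return fixed


# Made-up example: peaks at indices 1 and 4, valley wrongly found at index 5.
signal = [1, 5, 2, 4, 6, 3, 7]
repaired = repair_valleys(signal, peaks=[1, 4], valleys=[5])
```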

Figure 3.7 demonstrates how the wavenumber calibration procedure calibrates ten measurements of a sample towards each other. Five of the signals are from gas measurements, and five are from reference measurements. Figure 3.7c and Figure 3.7d are the wavenumber calibrated versions of Figure 3.7a and Figure 3.7b respectively.


[Figure 3.7, panels: (a) Uncalibrated monitor measurements; (b) Uncalibrated main measurements; (c) Calibrated monitor measurements; (d) Calibrated main measurements. Axes: Wavenumber (cm⁻¹) versus Intensity (counts).]

Figure 3.7: Ten monitor signals and their respective main signals shown uncalibrated and calibrated. (a) Uncalibrated monitor signals. (b) Uncalibrated main signals. (c) Calibrated monitor signals. (d) Calibrated main signals.

The calibration done this way does not align the signals perfectly, as can be seen in Figure 3.7. The majority of the monitor peaks and valleys in Figure 3.7c are aligned perfectly, and the slightly unaligned peaks can be attributed to the calibration method only looking for a valley or peak of the signal between two peaks or valleys of the mean. When the valley or peak of the signal is located exactly on a peak or valley of the mean, the closest point to the peak is most likely the highest and will therefore be aligned as a peak, giving a slight misalignment.

For each sample ten gas measurements and ten reference measurements are done, giving a total of twenty monitor signals and twenty main signals. The drop in standard deviation of these signals as a result of the calibration can be seen in Table 3.2. The drop is calculated for signals from reference and gas measurements separately, and for the collection of signals of both reference and gas measurements. The numbers indicate the percentage drop from the uncalibrated signals to the calibrated signals. From Table 3.2 it seems that the calibration works better for monitor signals than for main signals. This can be attributed partly to the fact that the monitor signals served as the basis of the calibration, and Figure 3.7 also suggests that the calibration works better for the monitor signals. However, since each monitor and its respective main signal experience the same wavenumber distortions from the QCLs, and the exact transform used on each monitor signal is also used on its respective main signal, the large difference in calibration result cannot entirely be attributed to the basis of calibration. Rather, the standard deviation of the main signals is already relatively higher than that of the monitor signals, so the wavenumber uncertainty makes up a smaller part of the total standard deviation of the main signals than of the monitor signals. Calibrating for the wavenumber uncertainty therefore decreases the total standard deviation of the main signals by a smaller percentage.

Table 3.2: Decrease in standard deviation for the monitor and main signals as a result of the wavenumber calibration. This decrease in standard deviation is calculated separately for just gas measurements, just reference measurements, and gas and reference measurements together.

Signal type   Gas measurements (%)   Reference measurements (%)   Gas & reference measurements (%)
Main          7.04                   6.50                         6.46
Monitor       16.6                   17.0                         18.4

It is also interesting to note that the standard deviation of the main signals improves less when both reference and gas measurements are taken into account than when only one type of measurement is considered. This can be explained by the fact that the gas and reference measurements of the main signal should differ, due to the absorption of the main signal by the gas in the gas measurement. Because the gas and reference measurements are so different, the total standard deviation when taking both into account is greater than that of the separate measurements, and hence the standard deviation caused by the wavenumber uncertainty is a smaller part of it.

After aligning all these signals as explained in subsection 3.1.4, ten absorbances, each from a combination of the four signals from Table 3.1, are calculated according to Equation 3.1, Equation 3.2, and Equation 3.3. However, since the wavenumbers of the various measurements are now calibrated,

\[ \nu = \nu', \tag{3.4} \]

and thus the absorbance as in Equation 3.3 becomes

\[ A(\nu, C) = -\log_{10}\left[ \frac{f_{\mathrm{path1}}(\nu)\, f_{\mathrm{path2}}(\nu)\, f_{\mathrm{gas}}(\nu, C)}{f_{\mathrm{path1}}(\nu)\, f_{\mathrm{path2}}(\nu)} \right] = -\log_{10}\left[ f_{\mathrm{gas}}(\nu, C) \right]. \tag{3.5} \]

The standard deviation of these ten absorbances decreases from 0.0313 before calibration to 0.0291 after calibration, a decrease of 7.06%. For a large part of the absorbance spectrum this is still a relatively large standard deviation as seen in Figure 3.8, where the mean absorbance found this way for one sample is given.
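As a hedged illustration of how such a percentage drop can be computed, the sketch below averages the pointwise standard deviation of a set of spectra over all grid points and compares the values before and after calibration. How the thesis aggregates the standard deviation over wavenumbers is not stated, so the averaging step and both function names are assumptions. Applied to the rounded values quoted above, the formula gives about 7.03%; the small difference from 7.06% presumably stems from rounding of the two standard deviations.

```python
from statistics import mean, stdev


def mean_pointwise_std(signals):
    """Standard deviation across signals at each grid point, averaged
    over all grid points (assumed aggregation, not taken from the thesis)."""
    return mean(stdev(column) for column in zip(*signals))


def percent_drop(before, after):
    """Relative decrease from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before


# Using the rounded standard deviations quoted in the text.
drop = percent_drop(0.0313, 0.0291)
```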

[Figure 3.8, title: Absorbance calibrated by wavenumber. Axes: Wavenumber (cm⁻¹) versus Absorbance.]

Figure 3.8: The wavenumber calibrated absorbance as calculated from the measurements of a single sample.

3.2 CO2 and H2O calibration

When looking at the absorbance spectrum of a breath sample as in Figure 3.8, a few regions stand out by their relatively high absorbance. From Equation 3.1 it can be seen that high molar absorptivities ε(ν) in a region and high concentrations of molecules with such molar absorptivities contribute to a high local absorbance. When checking for molecules known to have relatively high concentrations in breath, such as CO2 with about 4% per volume, H2O with about 5% per volume, and N2 with about 78% per volume, a cause of the high absorbance regions is easily identified, as seen in Figure 3.9. The molar absorptivity spectrum overlays of CO2 and H2O in Figure 3.9 are scaled with factors 60 and 5 · 10⁴ respectively to clearly show the matching regions. CO2 and H2O molar absorptivities are taken from the HITRAN database[9].

[Figure 3.9, title: Absorbance with overlaid CO2 and H2O molar absorptivity. Axes: Wavenumber (cm⁻¹) versus Absorbance (base-10). Legend: Experimental data; CO2; H2O.]

Figure 3.9: Absorbance from wavenumber calibrated data overlaid with spectra of CO2 and H2O from HITRAN database[9].

When looking more closely at these matching regions between the absorbance spectrum and CO2 and H2O, it is seen in Figure 3.10 that even after the wavenumber calibration the absorbance spectrum does not completely match the literature spectra of CO2 and H2O.


[Figure 3.10, title: Closer look at absorbance with CO2 and H2O overlaid. Axes: Wavenumber (cm⁻¹) versus Absorbance (base-10). Legend: Experimental data; CO2; H2O.]

Figure 3.10: Absorbance from wavenumber calibrated data overlaid with spectra of CO2 and H2O from HITRAN database[9].

A method similar to the wavenumber calibration is used in order to calibrate the absorbance spectrum to the database spectra of CO2 and H2O. The difference is that now only the peaks are matched to CO2 and H2O, and all data points in between, including valleys, are linearly interpolated between the matched peaks.

3.2.1 Method for calibration to CO2 and H2O

Input and preprocessing

This method uses the absorbance spectrum as obtained from the wavenumber calibration, the accompanying wavenumber range, and molar absorptivity spectra of CO2 and H2O. Since the methods used for the calibration to CO2 and to H2O are almost identical, only the CO2 case is described here. The CO2 spectrum is taken from the HITRAN database[9] in the range of 832 cm⁻¹ to 1263 cm⁻¹ with a resolution of 0.05 cm⁻¹. The CO2 spectrum is scaled by a factor 60 so that the height of its peaks approximately matches that of the experimental data.


CO2 peak and absorbance peak identification

To facilitate the calibration of the experimental data to the CO2 spectrum from the HITRAN database, the peaks and valleys of the CO2 database spectrum serve as reference points. They are identified using a peakfinder[15] algorithm, which finds peaks above a certain adjustable threshold, based on the value of a data point compared to the surrounding data points, and returns the indices and intensities of all data points that it identifies. If the first peak lies before the first identified valley, or the last peak lies after the last identified valley, that peak is removed. This way the range to adjust always starts and ends with a valley of the CO2 database spectrum.

Now that all peaks and valleys of the CO2 database spectrum are found, they are used to find the absorbance peaks and valleys of the experimental data. The first absorbance peak is identified by taking the maximum absorbance between the first two valleys of the CO2 database spectrum. The neighbouring absorbance valley is identified by taking the minimum between the first absorbance peak and the next peak of the CO2 database spectrum. The next valleys/peaks of the experimental data are found by looking between the last peak/valley found in the experimental data and the next peak/valley of the CO2 database spectrum, alternating between finding a peak and a valley. This process is repeated up to the last identified valley of the CO2 database spectrum. This way all absorbance valleys are found between two peaks, and all absorbance peaks are found between two valleys. Thus the same number of absorbance peaks and valleys is identified in the experimental data as in the CO2 database spectrum, making it easy to match them.¹
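The alternating search can be sketched as follows. The sketch assumes that the reference landmarks are given as two index lists on the grid of the measured absorbance, with one more valley than peaks and `ref_valleys[k] < ref_peaks[k] < ref_valleys[k + 1]`; the function name and the data layout are hypothetical.

```python
def find_absorbance_landmarks(absorbance, ref_valleys, ref_peaks):
    """Alternate between searching a peak and a valley in the measured data.

    ref_valleys and ref_peaks are indices of the database landmarks
    (assumed layout: one more valley than peaks, strictly alternating).
    """
    peaks, valleys = [], []
    start = ref_valleys[0]
    for k in range(len(ref_peaks)):
        # Peak: maximum between the last landmark found in the data
        # and the next valley of the database spectrum.
        stop = ref_valleys[k + 1]
        p = max(range(start, stop + 1), key=lambda i: absorbance[i])
        peaks.append(p)
        if k + 1 < len(ref_peaks):
            # Valley: minimum between this peak and the next database peak.
            v = min(range(p, ref_peaks[k + 1] + 1), key=lambda i: absorbance[i])
            valleys.append(v)
            start = v
    return peaks, valleys


# Made-up example: measured landmarks lie one sample to the right of the reference.
measured = [0, 1, 2, 3, 2, 1, 2, 3, 2]
found_peaks, found_valleys = find_absorbance_landmarks(
    measured, ref_valleys=[0, 4, 8], ref_peaks=[2, 6])
```

In this layout the search returns as many absorbance peaks as there are reference peaks and two fewer absorbance valleys than reference valleys, in line with the footnote to this paragraph.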

Peak calibration

All the peaks and valleys of the CO2 database spectrum are now put in a single list, ordered by index. The same is done for the peaks and valleys of the experimental data. The peaks in the experimental data list are now matched to coincide with the peaks of CO2, and all the measurement data in between two peaks is interpolated to fit between the surrounding CO2 peaks. The same procedure is applied to H2O. The result of these calibrations is seen in Figure 3.11.

3.2.2 CO2 and H2O calibration results

Except for the sides of the CO2 and H2O spectra where the peak heights are around the noise level, nearly all CO2 and H2O peaks seem to be correctly calibrated. This is also indicated by Figure 3.11, though some peculiarities are worth mentioning.

Around 1081 cm⁻¹ the measured absorbance has no peak, yet according to the CO2 spectrum it should have. One reason for the absence of an absorbance peak could be that the calibration algorithm miscalibrated, and that the next absorbance peak at 1082.3 cm⁻¹ should actually be at 1081 cm⁻¹. This can be rejected on the basis that the spacing of the other absorbance peaks seems to match well with the spacing of the CO2 peaks, even before calibration as seen in Figure 3.10. The question remains, assuming that no miscalibration took place around 1081 cm⁻¹, why there is no peak where one would be expected from the CO2 database spectrum. This might be explained by noise; however, in the monitor and main signals from which the measured absorbance is calculated the noise is not any higher than at other places.

¹Since there is one more CO2 valley than CO2 peak, the number of found absorbance peaks matches the number of CO2 peaks. However, two fewer absorbance valleys are found than there are CO2 valleys. Hence the first and last CO2 valley are removed.

[Figure 3.11, title: Closer look at absorbance calibrated to CO2 and H2O. Axes: Wavenumber (cm⁻¹) versus Absorbance (base-10). Legend: Experimental data adjusted for CO2 and H2O; CO2; H2O.]

Figure 3.11: Absorbance calibrated to CO2 and H2O overlaid with spectra of CO2 and H2O from HITRAN database[9].

A related peculiarity, namely that the heights of the measured absorbance peaks do not match the heights of the CO2 peaks in general, is also at first ascribed to noise. However, when comparing absorbances of multiple different samples as in Figure 3.12, it can be seen that the height of each peak is about equal between samples. This indicates that the difference in height between the various measured absorbance peaks and the CO2 peaks is not a matter of noise, but rather a consistent wavenumber dependent influence from a part of the setup. A source for these mismatching absorbances has not yet been pinpointed.

One event of miscalibration can still be argued to have taken place. In Figure 3.10 the closest thing to an absorbance peak near the CO2 database peak at 1081.1 cm⁻¹ is the peak located at 1081.8 cm⁻¹, which is not calibrated to the CO2 database peak as seen in Figure 3.11. From Figure 3.10 it can be seen that this small peak was not detected and calibrated since it is located outside the region where the search was performed, from the valley in the absorbance spectrum at 1081.0 cm⁻¹ to the valley in the CO2 spectrum at 1081.6 cm⁻¹. This might be solved by increasing the search region, though this could cause problems for other peaks. It can also be argued that since the small peak at 1081.8 cm⁻¹ is actually located closer to the CO2 peak at 1082.3 cm⁻¹ than to the peak at 1081.1 cm⁻¹, it is not necessarily better to match it to the peak at 1081.1 cm⁻¹. Nor can it be said whether the absorbance peak that would be expected near the small peak would be located at the same position, and thus no conclusion can be drawn on whether that peak would also have failed to be calibrated to its corresponding CO2 peak at 1081.1 cm⁻¹.

[Figure 3.12, title: Comparison of height of CO2 peaks to measured samples. Axes: Wavenumber (cm⁻¹) versus Absorbance (base-10). Legend: Experimental data (three samples); CO2 spectrum.]

Figure 3.12: Measured absorbances of three different samples calibrated to CO2 and H2O. The molar absorptivity overlay of CO2 is scaled by 60. CO2 molar absorptivity is taken from the HITRAN database[9].


4. Molecules and Concentrations

In this chapter a list of molecules is assembled. This list contains the molecules that are likely present in breath. The list is assembled in such a way as to find molecules with concentration differences between the healthy volunteers and the asthmatic volunteers that contributed their breath for this research. Seventy breath samples of healthy volunteers and seventy breath samples of asthmatic volunteers are used. The absorbance spectra of these samples were calibrated according to chapter 3. In this chapter the calibrated spectra are checked for the concentrations of the molecules in the list. Lastly, a specific method for acquiring a more accurate concentration for ethanol is described.

4.1 List of molecules

To check the absorbance spectra for concentrations, a list of molecules with known molar absorptivities is assembled. The initial list consists of all molecules with known molar absorptivities from the PNNL[8] database and the HITRAN[9] database. Currently these databases provide information on a total of 456 different molecules.

From the 456 molecules available in the PNNL and HITRAN databases it is necessary to select those likely present in breath. Not all of the molecules are expected to be present in breath, as many of them are neither organic nor known to be in the atmosphere in an amount detectable by the current setup. Since the absorbance still has uncertainty after the calibration, the concentrations determined will have a related uncertainty. As a result the fitted concentrations of molecules that are not present will not necessarily be zero, and will therefore influence the concentrations determined for molecules that are present in the gas sample. Thus before determining the concentrations it is vital to get a list of molecules which excludes these superfluous molecules. In order to distinguish between the healthy and asthmatic groups, the list should also contain molecules likely to have large differences in concentration between the groups.

The list is assembled in two steps. The first step is determining the molecules that have been previously reported to be present in breath, the standard molecules. The second step is adding those molecules likely to show different concentrations between the samples of the healthy and the asthmatic volunteers. When one is not interested in finding differences between data sets, the list of standard molecules should be extended in another way. This can be done by selecting molecules based on their absorptivity and a set minimal absorbance.


4.1.1 Standard molecules

This first group of molecules is a combination of atmospheric molecules[16] and molecules known to be in breath[17] with high absorbance in the relevant wavenumber region of 832 cm⁻¹ to 1263 cm⁻¹. Molecules such as CO2, H2O, ethanol, and acetone are the major contributors to the absorbance spectrum of breath in the relevant wavenumber region. These molecules must therefore be included in the list regardless of whether they are likely to have differences in concentration between the healthy and the asthmatic group. The list of molecules chosen this way can be found in Table 4.1.

Table 4.1: List of standard molecules used.

Molecule
Acetic acid
Acetone
Ammonia anhydrous
Benzene
Carbon dioxide
Carbon monoxide
Ethane
Ethyl alcohol
Ethylene
Formaldehyde monomer
Methane
Pentane
Propane
Water

4.1.2 Molecules selected to distinguish between data sets

To determine whether a gas was exhaled by a healthy person or by someone with asthma, a recognizable difference in measurement results is desirable. P-values are used to quantify this difference between the measurements from healthy people and asthmatic people. Since p-values are often misunderstood, an explanation of how p-values are calculated and the significance of p-values to this research is given.

P-values

The p-value is defined as the probability of obtaining a result at least as extreme as the one observed, under the assumption that a certain hypothesis is true. In the context of these breath measurements the null hypothesis is that being asthmatic has no effect on the concentrations of molecules, and will give absorption profiles identical to those of healthy people. For each wavenumber, the combined absorbances of multiple samples of healthy and asthmatic people are then expected to be two groups of points spread around the same absorbance value. A normal distribution is assigned to both the asthmatic and the healthy samples, and the mean and standard deviation of both are calculated. Now, assuming the null hypothesis is true, the mean and standard deviation of the asthmatic group should be the same as those of the healthy group. The p-value per wavenumber is found by calculating the probability that the absorbances of the asthmatic samples take the values they do, assuming the null hypothesis, which claims that the asthmatic samples follow the distribution of the healthy samples.

Assume the null hypothesis is not necessarily true, but rather that the alternative (having asthma does give a different absorbance spectrum) could also be true. The preferred hypothesis is the one for which the measured absorbance is most likely to occur. Although no p-values are calculated for the alternative hypothesis, a small p-value for the null hypothesis is still used as an indicator that samples of asthmatic and healthy patients are inherently different. The p-values between the 70 absorbances of healthy patients on the one hand and the 70 absorbances of asthmatic patients on the other hand are obtained this way for every wavenumber. This is done using an algorithm from K. Brand[18] that uses the topTable function from the LIMMA package in R.
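To make the idea of a p-value per wavenumber concrete, the sketch below uses a plain two-sample test with a normal approximation. This is a deliberately simpler technique swapped in for illustration; it is not the LIMMA-based algorithm used in the thesis and will not reproduce its numbers. The function names and the list-of-spectra layout are assumptions.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def p_value(healthy, asthmatic):
    """Two-sided p-value for equal group means (normal approximation).

    Assumes at least one of the groups has a non-zero variance.
    """
    se = sqrt(variance(healthy) / len(healthy)
              + variance(asthmatic) / len(asthmatic))
    z = (mean(asthmatic) - mean(healthy)) / se
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def p_values_per_wavenumber(healthy_spectra, asthmatic_spectra):
    """One p-value per grid point; each spectrum is a list on the same grid."""
    columns = zip(zip(*healthy_spectra), zip(*asthmatic_spectra))
    return [p_value(h, a) for h, a in columns]


# Made-up absorbances: three samples per group, two wavenumbers.
healthy = [[1.0, 1.0], [1.1, 2.0], [0.9, 3.0]]
asthmatic = [[2.0, 1.0], [2.1, 2.0], [1.9, 3.0]]
p = p_values_per_wavenumber(healthy, asthmatic)
```

In this made-up example the groups differ clearly at the first wavenumber (small p-value) and are identical at the second (p-value of 1).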

P-region analysis

In p-region analysis the compounds most likely to have different concentrations between the two data sets are sought. This is done by checking the p-values corresponding to the two data sets and finding regions of consecutive low p-values. The reasoning behind this is as follows: organic molecules such as the ones expected in human breath are larger and more complicated than, for example, H2O and CO2, and thereby have a spectrum consisting of broad smooth profiles. If the concentration of such a molecule differs greatly from one data set to the other, this would show as low p-values in a broad band. Thus to find these molecules, the p-values corresponding to the two data sets need to be below a certain threshold for a certain region of consecutive wavenumber values. For these so-called p-regions, a list of compounds is checked for their absorptivity. Compounds with an average absorptivity above a certain threshold within any of these regions are likely candidates for the difference in the measured intensity between the two sample groups, and are listed to be checked for their concentration in the data set. This list includes not only the molecules, but also the wavenumber regions in which they are present. Two p-regions are found using a minimum of 10 consecutive data points with p-values of 0.05 or lower, one located between 1181.80 cm⁻¹ and 1182.55 cm⁻¹ and another from 1261.40 cm⁻¹ to 1262.05 cm⁻¹. The complete set of compounds from the HITRAN and PNNL databases is checked for an average molar absorptivity above 10⁻⁴ ppm⁻¹ m⁻¹ in these regions. A table with the regions and a list of molecules found this way can be seen in Appendix A.2.
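Finding the p-regions amounts to a search for sufficiently long runs of low p-values. A minimal sketch is given below, using the threshold of 0.05 and the minimum of 10 consecutive points mentioned above as defaults; the function name and the return format (inclusive index pairs) are choices made for this example.

```python
def find_p_regions(p_values, threshold=0.05, min_length=10):
    """Return (first_index, last_index) of every run of at least
    min_length consecutive p-values at or below the threshold."""
    regions, start = [], None
    for i, p in enumerate(p_values):
        if p <= threshold:
            if start is None:
                start = i  # a new run begins here
        else:
            if start is not None and i - start >= min_length:
                regions.append((start, i - 1))
            start = None
    # A run may extend to the end of the spectrum.
    if start is not None and len(p_values) - start >= min_length:
        regions.append((start, len(p_values) - 1))
    return regions


# Made-up p-values: one run of ten low values and one run of only four.
example = [0.5] * 3 + [0.01] * 10 + [0.5] * 2 + [0.01] * 4
regions = find_p_regions(example)
```

The index pairs can be converted to wavenumbers with the grid of the spectra, after which the average molar absorptivity of each database compound inside a region can be compared with the chosen limit.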

The p-values between the absorbance spectra of the healthy and the asthmatic patients vary over narrow wavenumber regions, as can be seen in Figure 4.1. The red line in Figure 4.1 indicates the upper limit of 0.05 below which p-values are taken to indicate a significant difference in absorbance between healthy and asthmatic. The p-region from 1181.80 cm⁻¹ to 1182.55 cm⁻¹ can be identified using the red line. The p-values vary over wavenumber regions that are narrow compared to the width of the absorbance regions expected from molecules. This is explained by the mean-to-standard deviation ratio of 0.7707¹ over the wavenumber range of ten absorbance spectra, as calculated from a single gas sample of a healthy person. Even though the p-values are calculated between two groups of 70 samples each, the large variations in p-values as observed in Figure 4.1 seem to indicate that either the samples within each group do not have much in common with each other, or that the noise on the signal is too large to make out significant differences between the two groups. Hence it seems that the p-values might not be accurate indicators of subtle differences in concentrations of molecules between the healthy and asthmatic groups when using measurements with a large uncertainty. Regardless of this, relevant candidate molecules have been found using this method. An example is Freon-114, which is identified as being present in both p-regions and is used as a propellant in metered-dose inhalers, which are the most commonly used delivery systems for treating asthma[19].

[Figure 4.1, title: P-values of absorbance spectra of 70 healthy vs. 70 asthmatic patients. Axes: Wavenumber (cm⁻¹) versus P-value. Legend: P-values of healthy vs. asthmatic; P-region selection threshold.]

Figure 4.1: P-values as calculated between 70 absorbance spectra of healthy patients and 70 absorbance spectra of asthmatic patients.

¹The mean-to-standard deviation ratio over the entire spectrum of one sample is 0.7707. It is worth mentioning that the mean-to-standard deviation ratio in the regions with high absorbance of CO2 and H2O is lower. The mean-to-standard deviation ratio can also vary considerably over the different measurements.


4.1.3 Selection based on measurable concentration

The list of molecules can be shortened by checking their absorbances. Since the concentrations of molecules with a very low absorbance in the wavenumber region of interest cannot be accurately estimated at the current noise level, these molecules can be eliminated. It is known from Z. Hou[3, p. 71] that the sensitivity of the system is in the order of 1 ppmv, and from A. Reyes Reyes[5] that the sensitivity can even reach the order of hundreds of ppbv. This sensitivity is then used to establish a lower limit on the concentration for all molecules. In this case the chosen limit is even lower, at 0.001 ppmv, to make sure no molecules are eliminated unnecessarily. Absorbances are calculated for all molecules from this concentration limit and their molar absorptivities.

From an initial concentration estimation using only the standard molecules, a lower limit for the absorbance is set to 50 · 10⁻⁷. For all molecules the absorbance calculated using the lower limit of concentration is then checked against this lower limit of absorbance. Molecules whose absorbance does not exceed the lower limit of absorbance at any point are dismissed, and the other molecules are put on the list and further checked for their concentrations. A list of 199 molecules that are reported to be present in breath is checked this way[17]. When the entire spectrum is checked, some 130 molecules pass. Adapting this method for the wavenumber range of 1020 to 1100 cm⁻¹, which is the major absorptivity region for ethanol, some 69 molecules pass, as listed in Appendix B.1. This list is used for determining the concentration of ethanol, as further explained in section 4.3.
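The screening step can be sketched as below, using the limits of 0.001 ppmv and 50 · 10⁻⁷ quoted in the text as defaults. The absorbance is computed as molar absorptivity times concentration times path length, with the absorptivity assumed to be in ppm⁻¹ m⁻¹ and the path length in metres. The path length is not given in this section, so the value of 10 m in the example, the absorptivity values and the function names are purely hypothetical.

```python
def exceeds_absorbance_limit(absorptivity, path_length_m,
                             c_limit_ppmv=0.001, a_limit=50e-7):
    """True if the absorbance at the concentration limit exceeds the
    absorbance limit at one or more wavenumbers."""
    return any(eps * c_limit_ppmv * path_length_m > a_limit
               for eps in absorptivity)


def screen(molecules, path_length_m):
    """Keep the names of all molecules that pass the absorbance check.

    molecules: dict mapping a name to its molar absorptivity spectrum
    (a list on the common wavenumber grid, assumed in ppm^-1 m^-1).
    """
    return [name for name, eps in molecules.items()
            if exceeds_absorbance_limit(eps, path_length_m)]


# Made-up absorptivities and a hypothetical path length of 10 m.
candidates = {"strong": [1e-4, 1e-3], "weak": [1e-5, 1e-4]}
kept = screen(candidates, path_length_m=10.0)
```

Restricting the absorptivity lists to a narrower wavenumber range, such as the ethanol region mentioned above, makes the same check stricter.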

Since the absorptivity of a molecule that is present might be lower than that of an absent molecule, there is no right choice for the absorbance and concentration limits, and the selection based on these limits ends up being a trade-off. The limits can be chosen stricter, with the risk of excluding molecules that are present in the gas and thereby decreasing the accuracy of the determined concentrations, but with the benefit of excluding molecules that are not actually present in the gas, thereby increasing that accuracy.

Since the method for finding differences in concentration between the healthy and asthmatic groups filters on minimum absorbance and on more specific wavenumber ranges, it is stricter and therefore preferred when possible.

4.2 Concentration determination

4.2.1 Method

Henceforth only the molecules in the list found in section 4.1 are used to determine the concentrations. At this stage the strictness of the list can be adjusted by only allowing molecules present in a certain number or more of the previously determined regions of low p-values to be processed further. In order to process the molecules further and compare them to the measured data, the information on the molecules from the databases is first converted to match the format of the measured data, i.e. in wavenumber range and resolution. This comes down to removing all the information outside of the relevant wavenumber region, and matching the wavenumber spacing used in the measured data through interpolation. Furthermore the measured data and molecules need to be expressed in the same physical quantity, for which in this case absorbance is the most useful. The aggregated absorbance of the gas is determined by the terms of Equation 2.2 and for a single molecule by Equation 4.1

$$A(\nu, c_{\mathrm{mol}}) = \varepsilon_{\mathrm{mol}}(\nu)\, c_{\mathrm{mol}}\, l_{\mathrm{path}}, \tag{4.1}$$

where the absorbance $A(\nu, c_{\mathrm{mol}})$ is determined by the concentration $c_{\mathrm{mol}}$, the optical path length of the laser in the gas $l_{\mathrm{path}}$, and the molar absorptivity $\varepsilon_{\mathrm{mol}}$. Since the path length is measured from the setup and the molar absorptivities of the molecules are obtained from the databases, the absorbance is left as a function of the concentration. All calculations are done over the wavenumber region of 832 cm⁻¹ to 1263 cm⁻¹, as determined by the range of the QCL. Determining the concentrations is essentially a curve-fitting problem: the absorbances of the molecules are laid over the measured absorbance and fitted to it by adjusting the concentrations of the molecules. This curve-fitting problem can be expressed as a non-linear least squares problem as in Equation 4.2:

$$r(\nu, C) = A_{\mathrm{spectrum}}(\nu) - \sum_{i=1}^{n} \varepsilon_i(\nu)\, c_i\, l_{\mathrm{path}} \tag{4.2}$$

where $A_{\mathrm{spectrum}}(\nu)$ is the measured spectrum, the sum is the spectrum to be fitted to it, and $C = (c_1, \dots, c_n)$ collects the concentrations of the $n$ molecules. Now take

$$S(C) = \sum_{\nu = 832\,\mathrm{cm}^{-1}}^{1263\,\mathrm{cm}^{-1}} r(\nu, C)^2, \tag{4.3}$$

where $r$ is the residual and $S$ is the quantity to be minimized. The spacing of the sum is 0.05 cm⁻¹. The residual is squared so that larger residuals per wavenumber influence $S$ more heavily, and are therefore avoided by the minimization algorithm. This non-linear least squares problem is solved using the Levenberg–Marquardt algorithm[20], which adjusts the concentrations starting from a given set of initial concentrations.
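The resampling and fitting steps of this subsection can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: because the model of Equation 4.2 is linear in the concentrations, the sketch minimises $S(C)$ by solving the normal equations directly instead of calling lsqnonlin, it applies no bounds, and all function names and numbers are hypothetical.

```python
def interpolate(x_src, y_src, x_new):
    """Linearly interpolate (x_src, y_src) onto x_new (both ascending, same range)."""
    out, j = [], 0
    for x in x_new:
        while j < len(x_src) - 2 and x_src[j + 1] < x:
            j += 1
        t = (x - x_src[j]) / (x_src[j + 1] - x_src[j])
        out.append(y_src[j] + t * (y_src[j + 1] - y_src[j]))
    return out


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_concentrations(a_measured, eps, l_path):
    """Minimise S(C) = sum over nu of (A_spectrum - sum_i eps_i c_i l_path)^2."""
    basis = [[e * l_path for e in eps_i] for eps_i in eps]
    n = len(basis)
    ata = [[sum(p * q for p, q in zip(basis[i], basis[j])) for j in range(n)]
           for i in range(n)]
    atb = [sum(p * q for p, q in zip(basis[i], a_measured)) for i in range(n)]
    return solve(ata, atb)


# Hypothetical database spectra on a coarse grid, resampled to 0.05 cm^-1 spacing.
grid = [832.0, 832.05, 832.1, 832.15, 832.2]
db_nu = [832.0, 832.1, 832.2]
eps = [interpolate(db_nu, [1e-3, 5e-3, 1e-3], grid),
       interpolate(db_nu, [4e-3, 0.0, 3e-3], grid)]
l_path = 10.0
true_c = [2.0, 0.5]
measured = [sum(c * e[k] * l_path for c, e in zip(true_c, eps))
            for k in range(len(grid))]
fitted = fit_concentrations(measured, eps, l_path)
print(fitted)
```

On this noise-free synthetic spectrum the fit should recover the concentrations used to build it. An iterative solver such as the one used in this work additionally needs the initial concentrations described in the next subsection.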

4.2.2 Input and results

The method described in the previous subsection is applied to the molecules found with the p-region analysis, where all molecules present in one or more regions are checked for their concentration against the measured absorbance. The initial concentrations of Table 4.2 are used.

Table 4.2: Table of initial concentrations used to calculate the concentrations as given in Appendix A.3.

Molecule | Concentration (ppmv)
--- | ---
Carbon dioxide | 1000
Water | 1000
Standard molecules | 1
Other molecules | 0.01


The Levenberg–Marquardt algorithm is implemented through the lsqnonlin function in MATLAB, with its options argument set as: options = optimset('Display','off','TolFun',1e-15).

The absorbances of six molecules with some of the highest concentrations are plotted over the measured absorbance in Figure 4.2. These are benzene, carbon dioxide, ethane, ethyl alcohol, ethylene, and water, all of which are also reported to be present in breath by B. de Lacy Costello et al.[17]. The concentrations found for all molecules for this sample are given in Appendix A.3.

[Plot: "Measured absorbance spectrum overlaid with molecule absorbances". Horizontal axis: wavenumber (cm⁻¹), 800 to 1300. Vertical axis: absorbance (base-10). Legend: Absorbance, Benzene, Carbon dioxide, Ethane, Ethyl alcohol, Ethylene, Water.]

Figure 4.2: Measured absorbance of breath of a healthy patient. The measured absorbance is overlaid with absorbances of molecules with some of the highest concentrations.

4.3 Ethanol estimation by CO2 removal

4.3.1 Motivation for CO2 removal

As discussed in subsection 3.2.2, the peaks of the measured absorbance and the peaks of the CO2 absorbance from the calculated concentrations do not match in height, as seen in Figure 4.3.

[Plot: "Closer look at the measured absorbance and calculated CO2 and ethanol". Horizontal axis: wavenumber (cm⁻¹), 1076 to 1084. Vertical axis: absorbance (base-10). Legend: Experimental data, Carbon dioxide, Ethyl alcohol, Water.]

Figure 4.3: Measured absorbance of the breath of a healthy patient. The absorbance is overlaid with the spectra of water, carbon dioxide, and ethyl alcohol. This figure shows how the measured carbon dioxide peaks are far from the calculated ones.

This difference in height between the calculated CO2 peaks and the measured peaks also affects the estimated concentration of the other major molecule present in this region, namely ethanol. Hence it could prove useful to remove CO2 and its uncertainties in order to get a better estimate of the ethanol concentration, much in the same way as A. Reyes-Reyes et al.[21] did for acetone.

4.3.2 Implementation of CO2 removal for ethanol estimation

The method described in section 4.2 is used to determine the concentrations of the molecules found using subsection 4.1.1 and subsection 4.1.3. The calculated absorbance contributions of all molecules except CO2 are subtracted from the measured absorbance, as seen in Figure 4.4b. This is done to single out the absorbance contribution of CO2 as well as possible, so that it can be removed more easily. All that is left is noise around zero absorbance and the absorbance contribution of CO2. The peaks from CO2 are now identified using a peak detection algorithm, and then fitted by normal distributions. This way the actual measured CO2 peaks are fitted, including any deviations they have from the CO2 spectrum according to the database. The fits are then subtracted from the signal, and all that is left is the difference between the measured absorbance and the absorbance contributions of all molecules as calculated using section 4.2, which is mainly noise around zero absorbance, as seen in Figure 4.4c. The contributions of all molecules other than CO2 are now added back to this in order to recover the full measurement minus any absorbance by CO2, as shown in Figure 4.4d. From this absorbance spectrum the concentrations are once again determined using the method described in section 4.2.
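The removal procedure can be sketched in a few lines. This is an illustration under stated assumptions rather than the code used in this work: peaks are taken as local maxima above a threshold, and each normal distribution is estimated from the local moments of the peak instead of by an iterative fit. The function names, the threshold and the synthetic spectrum are hypothetical.

```python
import math


def find_peaks(y, threshold):
    """Indices of local maxima that rise above the threshold."""
    return [i for i in range(1, len(y) - 1)
            if y[i] > threshold and y[i] >= y[i - 1] and y[i] > y[i + 1]]


def gaussian_from_moments(x, y, centre_idx, half_window):
    """Estimate height, centre and width of one peak from its local moments."""
    lo = max(0, centre_idx - half_window)
    hi = min(len(y), centre_idx + half_window + 1)
    xs = x[lo:hi]
    ys = [max(v, 0.0) for v in y[lo:hi]]
    total = sum(ys)
    mu = sum(a * b for a, b in zip(xs, ys)) / total
    var = sum(b * (a - mu) ** 2 for a, b in zip(xs, ys)) / total
    return y[centre_idx], mu, math.sqrt(var)


def remove_co2(x, measured, others, threshold, half_window=10):
    """Subtract fitted CO2 peaks; 'others' is the summed absorbance of all
    molecules except CO2, calculated from the fitted concentrations."""
    residual = [m - o for m, o in zip(measured, others)]      # as in Figure 4.4b
    cleaned = residual[:]
    for i in find_peaks(residual, threshold):
        h, mu, sigma = gaussian_from_moments(x, residual, i, half_window)
        for k in range(len(x)):
            cleaned[k] -= h * math.exp(-0.5 * ((x[k] - mu) / sigma) ** 2)
    # 'cleaned' corresponds to Figure 4.4c; adding 'others' back gives 4.4d.
    return [c + o for c, o in zip(cleaned, others)]


# Synthetic example: a smooth background plus one narrow CO2-like peak.
x = [1080.0 + 0.05 * k for k in range(81)]
others = [0.01 + 0.001 * (v - 1080.0) for v in x]
co2 = [0.2 * math.exp(-0.5 * ((v - 1082.0) / 0.1) ** 2) for v in x]
measured = [o + c for o, c in zip(others, co2)]
without_co2 = remove_co2(x, measured, others, threshold=0.05)
```

Because the peak shape is taken from the measured residual rather than from the database, a peak that is taller or shorter than the database predicts is still removed in full, which is the point of the method.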

The newly calculated ethanol concentration is 2% lower than that calculated before the use of this method. Since this method of ethanol estimation has not been tested on any samples for which the ethanol concentration is known beforehand, not much can be said about its accuracy. The same method has however been applied successfully to determine the acetone concentration in the region 1150 cm⁻¹ to 1250 cm⁻¹ by removing water peaks[21, 22].


[Plot: "Step by step removal of CO2 peaks", four panels. Horizontal axis: wavenumber, 1020 to 1100. Vertical axis: absorbance (base-10). Panel (a): Experimental data. Panel (b): Experimental data minus database molecules, with CO2. Panel (c): Experimental data minus database molecules, removed CO2. Panel (d): Experimental data with database molecules, removed CO2.]

Figure 4.4: Step by step removal of the CO2 peaks.


5. Disease prediction by machine learning

For the breath samples measured, an analytical study based on p-values was performed to predict and to categorize the asthmatic and healthy samples[18]. This study did not yield any definitive results for prediction. Multiple articles in the literature propose that methods more sophisticated than just using p-values could give better results[23][24]. Finding the molecules which differentiate between a healthy and non-healthy subject, and predicting diseases of a patient are both mentioned as possibilities. Since neither of these sources applies machine learning to actually predict disease, and the application of machine learning to breath data is not widespread[17], an attempt to do so is made. The analysis is performed on 70 absorbance spectra from healthy patients, and 70 absorbance spectra from asthmatic patients.

5.1 Machine learning for classification

Algorithms performing classification by machine learning look for patterns in the data of the different samples. Because for the current set of samples it is known whether they originate from healthy or asthmatic patients, supervised learning methods are applied. This means the machine learning algorithm is trained to recognize patterns of healthy and asthmatic samples by feeding it a part of all samples. In this case 98 of the 140 samples are used to train the algorithms and the other 42 to test their accuracy.
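A split of this size can be sketched as below. How the 98 training samples were actually selected is not specified here, so the random shuffle, the seed and the function name are assumptions made for illustration.

```python
import random


def split_samples(samples, n_train, seed=0):
    """Shuffle a copy of the samples and split it into training and test parts."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


# Placeholder samples: (label, index) pairs standing in for absorbance spectra.
samples = ([("healthy", i) for i in range(70)]
           + [("asthmatic", i) for i in range(70)])
train, test = split_samples(samples, n_train=98)
print(len(train), len(test))
```

Keeping the test samples out of training is what makes the error on the 42 test samples an estimate of performance on unseen data.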

5.2 Classification by trial-and-error

Since the scope of this project does not allow for a study of the details of machine learning and the construction of an algorithm specifically made for these datasets, existing algorithms are used. Based on an extensive study of what kind of algorithms are most useful in the field of breath analysis, the focus initially lies on support vector machines (SVMs)[23]. Use is made of the website mlcomp.org, which was made specifically for uploading datasets, uploading machine learning algorithms, and running these algorithms on the datasets[25]. This website expresses the rate of success in a single number, the error, which is the number of falsely classified samples divided by the total number of samples. Since most algorithms run in less than 5 minutes and they can run simultaneously, many different algorithms can be tested in a short amount of time. Aside from the complete set of absorbances of 140 samples over the wavenumber range of 832 to 1263 cm⁻¹, various subsets of this are tested as well. Among these are the set of 36 wavenumbers for which the p-values are below 0.005, the set of concentrations of 167 compounds as derived from the 140 absorbances, and the set of 30 compounds with the highest concentrations as derived from the 140 absorbances.

5.2.1 Basic explanation of some algorithms used

The basic concept of some of the better working algorithms as seen in Table 5.1 is explained here. Commonly, these algorithms reduce each spectrum to a set of features which are then used to represent the various spectra as points in a feature space. The entire space is then divided into volumes with borders based on the geometrical distances between the training spectra: these are the classifiers. The test spectra are then represented as points in this space in the same way as the training spectra and classified according to which volume they inhabit. The way the spectra are represented as points in a space, and the way the classifiers are built up from these points, is different for each algorithm.

Method 1 in Table 5.1, the Normalized Gaussian radial basis function network, uses at its core a method of k-means clustering. k-means clustering aims to partition the absorbance spectra into k clusters in which each spectrum belongs to the cluster with the nearest mean.
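The clustering idea can be sketched with a bare-bones k-means loop. This only illustrates the clustering step, not the radial basis function network built on top of it; the function name and the two-feature toy points are hypothetical.

```python
import math


def kmeans(points, initial_centres, iterations=20):
    """Alternate between nearest-mean assignment and recomputing the means."""
    centres = [list(c) for c in initial_centres]
    labels = []
    for _ in range(iterations):
        labels = [min(range(len(centres)),
                      key=lambda k: math.dist(p, centres[k]))
                  for p in points]
        for k in range(len(centres)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous centre
                centres[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centres


# Toy "spectra" with two features each, forming two well separated groups.
spectra = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
labels, centres = kmeans(spectra, initial_centres=[[0.0, 0.0], [10.0, 10.0]])
print(labels, centres)
```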

Possibly the simplest classifiers, the nearest neighbour algorithms of methods 2 and 3 in Table 5.1 classify the test spectra based on the closest training spectra in the feature space.
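A nearest neighbour classifier in its most basic form can be written as follows. The Euclidean distance, the feature values and the function name are assumptions for illustration; the implementations run on the website may use other distance measures and more than one neighbour.

```python
import math


def classify_nearest(train_features, train_labels, query):
    """Return the label of the training point closest to the query point."""
    nearest = min(range(len(train_features)),
                  key=lambda i: math.dist(train_features[i], query))
    return train_labels[nearest]


# Hypothetical two-feature training spectra with known health status.
train_features = [[0.10, 0.02], [0.12, 0.03], [0.30, 0.20], [0.28, 0.22]]
train_labels = ["healthy", "healthy", "asthmatic", "asthmatic"]
prediction = classify_nearest(train_features, train_labels, [0.29, 0.19])
print(prediction)
```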

Methods 4, 5, and 6 in Table 5.1 are support vector machines. They represent the spectra as points in space, mapped so that the spectra of the separate categories are divided by a clear gap that is as wide as possible. New spectra are then mapped into that same space and predicted to belong to a category based on which side of the gap they fall on.

Table 5.1: Best error rates for eight algorithms out of 41 run on the entire dataset of absorbances of 70 healthy and 70 asthmatic patients. Errors have possible values between 0 and 1. Names of algorithms are as stated on the website[25].

Method | Description of algorithm | Name of algorithm | Error
--- | --- | --- | ---
1 | Normalized Gaussian radial basis function network | RBFNetwork weka nominal | 0.167
2 | K-nearest neighbours classifier | IBk weka nominal | 0.190
3 | Nearest neighbours classifier | IB1 weka nominal | 0.190
4 | Linear support vector machine for multiclass classification | svmlight multiclass-linear | 0.190
5 | Support vector machine for multiclass classification | svmlight multiclass | 0.190
6 | Support vector machine for binary classification | svmlight-linear | 0.214


5.3 Results

Results over the various datasets generally seem to be best for algorithms such as nearest neighbour and support vector machines. A trend is also seen in that the larger data sets yield fewer misclassifications. The smallest errors are found for the unmodified data set of all absorbances over the entire wavenumber range; these are given in Table 5.1.

The similarity of the errors from Table 5.1 can be explained by the small test set of 42 samples. The various nearest neighbour algorithms function quite similarly, and the same can be said for the support vector machines. As the results from Table 5.1 are the best from a total of 41 different algorithms, they corroborate the use of SVMs as advised by the study on machine learning algorithms for breath analysis[23]. The best algorithm is the Normalized Gaussian radial basis function network with an error of 0.167, or 7 misclassified samples out of the 42 test samples. The exact reason why this algorithm performs better than the others has not been determined. The errors stated are the combined errors of healthy samples falsely classified as asthmatic and the other way around. By running the algorithms outside of mlcomp.org the separate errors of both types of misclassification can likely be obtained, which increases the knowledge of the accuracy of a certain type of classification.
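The effect of the small test set can be made concrete with a short calculation. With 42 test samples the error can only change in steps of 1/42, so the three distinct errors in Table 5.1 are consistent with 7, 8 and 9 misclassified samples. Only the count of 7 is stated explicitly; the counts of 8 and 9 are inferred from the rounded errors.

```python
n_test = 42
step = 1 / n_test  # smallest possible change in the error, about 0.024

# Error for 7, 8 and 9 misclassified test samples, rounded as in Table 5.1.
errors = {m: round(m / n_test, 3) for m in (7, 8, 9)}
success_rate = round(100 * (1 - 7 / n_test), 1)
print(errors, success_rate)
```

Neighbouring entries in the table therefore differ by a single test sample, which is why several algorithms share exactly the same error.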

Without in-depth knowledge of the algorithms and further testing, there is no guarantee that they will also work on further datasets. The importance of these preliminary results lies in the clear quantitative results expressed as the error. Since a success rate of 83.3% is obtained with little knowledge, further research in machine learning applied to breath classification could yield more successful algorithms that can be applied with the current QCL spectroscopy setup in medical diagnosis.


6. Conclusions and outlook

Here the main problems from the previous chapters and their solutions are discussed. Additionally, improvements for further research are suggested.

6.1 QCL setup

The intensity fluctuations in the light emitted by the QCL caused by mode hopping are resolved by normalisation using the monitor and main detector as described in chapter 2.

One way to improve the system and forego the need for all the calibrations described in this thesis is to use the monitor signal to measure the actual wavenumber. Adding a Michelson interferometer was considered when the setup was still being built. It was omitted since there were plans to move the setup and a Michelson interferometer would be too sensitive to vibrations. Both a diffraction spectrometer and a reference cell can be used as well, the latter being favorable as it is the cheaper option.

The sensitivity can also be improved by increasing the delay the multi-pass cavity applies to the beam such that both the beam going to the monitor and the beam going to the main detector can instead be measured using one detector. This improves the signal since measuring on two separate detectors introduces two independent sources of noise, whereas a single detector does not[26].

6.2 Data calibration

The wavenumber uncertainty caused by noise in the voltage over the piezo is resolved by calibration done in post-processing as described in chapter 3. Despite this calibration, the standard deviation over ten absorbance spectra from a single sample is still relatively large.

Even after the wavenumber calibration the resulting absorbance spectrum does not match the CO2 and H2O peaks from the databases. This is caused by the high variation of the twenty monitor measurements of which the mean is the basis of the wavenumber calibration. Regardless, wavenumber calibration is still useful as an intermediate calibration step, as it calibrates more details than the calibration to CO2 and H2O performed afterwards. One cause for this variation between measurements is expected to be the noise on the voltage applied over the piezo. Other contributions are temperature fluctuations that influence the laser performance, and the diffraction grating that reflects a small range of wavelengths rather than a single wavelength back into the laser diode.


The height of the measured absorbance peaks does not generally match the heights of the CO2 peaks from the database, as seen in Figure 3.12. It can be seen that the height of each peak is about equal between samples. This indicates that the difference in height between the various measured absorbance peaks and the CO2 peaks is not a matter of noise, but rather a consistent wavenumber-dependent influence from a part of the setup, or possibly a molecule in breath. A source for these mismatching absorbances has not yet been pinpointed.

6.3 Determination of molecules

A general disadvantage of choosing molecules based on their molar absorptivity is that some molecules might have a large impact on the eventual absorbance due to a high concentration, despite having a low absorptivity. They would therefore be falsely eliminated from the list of molecules that is checked.

There is also the drawback of excluding molecules which might be an important part of the absorbance, but are not selected since they do not have adequate molar absorptivity in the p-regions. This makes the subsequent determination of concentrations less accurate. It is possible to allow for more p-regions, either by decreasing the minimum number of consecutive points needed or by raising the maximum p-value of which these regions consist. This way more molecules will have some absorptivity in one of the regions and therefore will not be scrapped.
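The two parameters mentioned above, the maximum p-value and the minimum number of consecutive points, can be seen in a minimal sketch of a p-region search. The function name, the threshold of 0.05 and the p-values are hypothetical and only serve to show how the parameters act.

```python
def find_p_regions(wavenumbers, p_values, p_max, min_points=10):
    """Return (start, end) wavenumbers of runs of at least min_points
    consecutive points whose p-value does not exceed p_max."""
    regions, start = [], None
    # The appended infinity closes a run that reaches the end of the data.
    for i, p in enumerate(list(p_values) + [float("inf")]):
        if p <= p_max:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_points:
                regions.append((wavenumbers[start], wavenumbers[i - 1]))
            start = None
    return regions


# Hypothetical p-values on a 0.05 cm^-1 grid: one run of 12 and one run of 4.
wn = [1181.0 + 0.05 * i for i in range(22)]
p = [0.5] * 3 + [0.01] * 12 + [0.5] * 2 + [0.01] * 4 + [0.5]
regions = find_p_regions(wn, p, p_max=0.05)
relaxed = find_p_regions(wn, p, p_max=0.05, min_points=4)
print(regions, relaxed)
```

Lowering `min_points` from 10 to 4 admits the second, shorter run as an additional region, which is the relaxation described in the paragraph above.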

Both problems have to do with selecting molecules based on intensity, and they can be resolved in advance by putting molecules that play an important role in the eventual absorbance in the standard list of molecules. This requires a priori knowledge of the important molecules, which is obtained from literature[17] and simply by performing the least squares method to get the concentrations on a large list of molecules. The concentrations will not be as accurate for a large list, but all molecules that contribute considerably to the spectrum can be picked out for later runs with strict molecule lists.

6.4 Determination of concentrations

From the shapes of CO2 and H2O in Figure 4.2 it can be seen that the concentrations calculated using the lsqnonlin function are in the right order of magnitude. Since the standard deviation over several measured absorbances of a single sample is still relatively high even after wavenumber calibration, the calculated concentrations for the molecules with lower concentrations are not very accurate. The average concentrations found for several molecules over all samples are given in Figure 6.1. The difference in the amount of H2O present can be explained by a filter in the setup which catches the moisture in breath before it enters the cavity. The large amount of ethanol measured for the samples can be explained by the cleaning with ethanol of the mouthpieces through which the breath samples are collected. The standard deviations shown are combinations of the uncertainties from the measurements and the variation between the breath of the patients.


[Plot: "Average molecule concentrations over all samples". Horizontal axis: CO2, H2O, acetone, NH3, CH4, ethanol. Vertical axis: concentration (ppmv), logarithmic from 0.001 to 100,000. Legend: Healthy samples, Asthmatic samples, VSL healthy.]

Figure 6.1: Average concentrations of molecules for healthy and asthmatic children compared to VSL data of healthy adults.

6.5 Categorizing health status by classification

An algorithm with a classification success rate of 83.3% is obtained. Further research in machine learning applied to breath classification could yield more successful algorithms that can be applied with the current QCL spectroscopy setup in medical diagnosis.

The accuracy of the classification can be improved by looking more closely into the algorithms used, and improving them using a priori knowledge of the samples such as regions of relatively high signal-to-noise ratio. Wavenumber regions can be given weights based on the signal-to-noise ratios so the algorithm depends more heavily on these accurate regions.

The error rate could also tell a lot more if the numbers of false positives and false negatives are given. This would give separate error rates based on whether the algorithm judges the sample as healthy or asthmatic. It is possible that one of these error rates is much lower than the other, thereby increasing the certainty of the result above the total error rate.
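The separate error rates can be computed as sketched below. The split of the 7 misclassified test samples into 1 healthy sample judged asthmatic and 6 asthmatic samples judged healthy is invented purely for illustration, as is the equal division of the 42 test samples over the two groups; the actual split is not known.

```python
def error_by_prediction(true_labels, predicted_labels):
    """Error rate per predicted class: wrong predictions of a class divided by
    all predictions of that class."""
    rates = {}
    for label in set(predicted_labels):
        truths = [t for t, p in zip(true_labels, predicted_labels) if p == label]
        rates[label] = sum(t != label for t in truths) / len(truths)
    return rates


# Hypothetical outcome for 42 test samples with 7 misclassifications in total.
true_labels = ["healthy"] * 21 + ["asthmatic"] * 21
predicted = (["healthy"] * 20 + ["asthmatic"] * 1      # 1 healthy judged asthmatic
             + ["healthy"] * 6 + ["asthmatic"] * 15)   # 6 asthmatic judged healthy
total_error = sum(t != p for t, p in zip(true_labels, predicted)) / len(true_labels)
rates = error_by_prediction(true_labels, predicted)
print(total_error, rates)
```

In this invented case a sample judged asthmatic is wrong far less often than the total error of 0.167 suggests, while a sample judged healthy is wrong more often.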


Bibliography

[1] I. Horvath and J.C. de Jongste. Exhaled Biomarkers. European Respiratory Society,2010.

[2] Erik Swietlicki, Sanjiv Puri, Hans-Christen Hansson, and Hans Edner. Urban air pollution source apportionment using a combination of aerosol and gas monitoring techniques. Atmospheric Environment, 30(15):2795–2809, 1996.

[3] Zhe Hou. Breath analysis using infrared spectroscopy. August 2013.

[4] Ellen Holthoff, John Bender, Paul Pellegrino, and Almon Fisher. Quantum cascade laser-based photoacoustic spectroscopy for trace vapor detection and molecular discrimination. Sensors, 10(3):1986–2002, 2010.

[5] A Reyes-Reyes, Z Hou, E Van Mastrigt, R C Horsten, J C De Jongste, M W Pijnenburg, H P Urbach, and N Bhattacharya. Multicomponent gas analysis using broadband quantum cascade laser spectroscopy. Optics express, 2014.

[6] DF Swinehart. The beer-lambert law. Journal of chemical education, 39(7):333, 1962.

[7] Silvia E Braslavsky. Glossary of terms used in photochemistry, (iupac recommendations 2006). Pure and Applied Chemistry, 79(3):293–465, 2007.

[8] Northwest infrared https://secure2.pnl.gov/nsd/nsd.nsf/welcome.

[9] Hitran database http://www.cfa.harvard.edu/hitran/.

[10] Block engineering laserscope for gas analysis.

[11] Jerome Faist, Federico Capasso, Deborah L Sivco, Carlo Sirtori, Albert L Hutchinson, and Alfred Y Cho. Quantum cascade laser. Science, 264(5158):553–556, 1994.

[12] J Barry McManus, Paul L Kebabian, and MS Zahniser. Astigmatic mirror multipass absorption cells for long-path-length spectroscopy. Applied Optics, 34(18):3336–3348, 1995.

[13] Mct ir detector module.

[14] Pvi-4te detector specifications.


[15] Matlab peakfinder code http://www.mathworks.com/matlabcentral/fileexchange/25500-peakfinder.

[16] Atmosphere of earth - http://en.wikipedia.org/wiki/atmosphere of earth.

[17] B de Lacy Costello, A Amann, H Al-Kateb, C Flynn, W Filipiak, T Khalid, D Osborne, and N M Ratcliffe. A review of the volatiles from the healthy human body. Journal of Breath Research, 8(1):014001, 2014.

[18] Esther Van Mastrigt, Adonis Reyes-Reyes, Karl Brand, Nandini Bhattacharya, H.P. Urbach, Andrew Stubbs, Johan C. De Jongste, and Marielle W. Pijnenburg. Exhaled Breath Profiling In Children With Asthma And Cystic Fibrosis: A Laser Spectroscopy-Based Method Focused On Hydrocarbons, pages A5639–A5639. American Thoracic Society, May 2014.

[19] Chet L. Leach. Approaches and challenges to use freon propellant replacements. Aerosol Science and Technology, 22(4):328–334, 1995.

[20] Jorge J. Moré. The levenberg-marquardt algorithm: Implementation and theory. In G.A. Watson, editor, Numerical Analysis, volume 630 of Lecture Notes in Mathematics, pages 105–116. Springer Berlin Heidelberg, 1978.

[21] Adonis Reyes-Reyes, Roland C. Horsten, H. Paul Urbach, and Nandini Bhattacharya. Study of the exhaled acetone in type 1 diabetes using quantum cascade laser spectroscopy. Analytical Chemistry, 87(1):507–512, 2015. PMID: 25506743.

[22] Florian Adler, Piotr Masłowski, Aleksandra Foltynowicz, Kevin C. Cossel, Travis C. Briles, Ingmar Hartl, and Jun Ye. Mid-infrared fourier transform spectroscopy with a broadband frequency comb. Opt. Express, 18(21):21861–21872, Oct 2010.

[23] A Smolinska, A-Ch Hauschild, R R R Fijten, J W Dallinga, J Baumbach, and F J van Schooten. Current breathomics—a review on data pre-processing techniques and machine learning in metabolomics breath analysis. Journal of Breath Research, April 2014.

[24] J. D. Malley, A. Dasgupta, and J. H. Moore. The limits of p-values for biological data mining. BioData Min, 6(1):10, 2013.

[25] Mlcomp - http://mlcomp.org/.

[26] D.D. Nelson, J.H. Shorter, J.B. McManus, and M.S. Zahniser. Sub-part-per-billion detection of nitric oxide in air using a thermoelectrically cooled mid-infrared quantum cascade laser spectrometer. Applied Physics B, 75(2-3):343–350, 2002.


Appendices


A. Results for a sample of a healthy patient

A.1 Standard molecules

Table A.1: List of standard molecules used.

Molecule
Acetic acid
Acetone
Ammonia anhydrous
Benzene
Carbon dioxide
Carbon monoxide
Ethane
Ethyl alcohol
Ethylene
Formaldehyde monomer
Methane
Pentane
Propane
Water

A.2 Molecules found by p-region analysis

Table A.2: Table of molecules that are present in either or both of the p-regions. The p-regions were found using a maximum p-value of 05 for at least 10 consecutive data points or 0.5 cm⁻¹. These molecules have a mean absorptivity in this region of at least 10⁻⁴.

Molecule | 1181.8-1182.55 | 1261.4-1262.05
--- | --- | ---
1-Chloro-1 1 2 2-tetrafluoroethane (R124A) | 1 | 1
1-Hexanoic acid | 1 | 0
1 1 1-Chlorodifluoroethane (Freon-142B) | 1 | 0
1 1 1-Trifluoroacetone | 1 | 0
1 1 1-Trifluoroethane (R143A) | 1 | 1
1 1 1 2 3 3 3-Heptafluoropropane (HFC227EA) | 1 | 1
1 2-Dibromotetrafluoroethane (Freon-114B2) | 1 | 0
1 2-Dibromoethane (EDB) | 1 | 0
1 2-Dichloro-1-fluoroethane (R141) | 1 | 0
1 2-Dichloro-1 1 2-trifluoroethane (F132A) | 1 | 0
1 2-Dimethoxyethane | 1 | 0
1 4-Dioxane | 0 | 1
2-Chloro-1 1 1-trifluoroethane (R133A) | 1 | 1
2-Chloropropane | 0 | 1
2-Chloro-1 1 1 2-tetrafluoroethane (HCFC-124) | 1 | 0
2-Ethoxyethyl acetate | 1 | 1
2-Methoxyethanol | 1 | 0
2H 3H-Perfluoropentane | 1 | 1
2 2-Dichloro-1 1 1-trifluoroethane (Freon-123) | 1 | 1
2 2-Difluorotetrachloroethane (F112A) | 1 | 0
2 2 2-Trifluoroethanol | 1 | 1
2 2 4-Trimethyl-1 3-pentanediol isobutyrate (Texanol) | 1 | 1
3-Methyl-2-pentanone | 1 | 0
Acetic acid | 1 | 1
Acetic anhydride | 1 | 0
Acetic acid dimer | 0 | 1
Acetone cyanohydrin | 1 | 0
Aniline | 0 | 1
Benzyl chloride | 0 | 1
Bromochlorodifluoromethane (Freon-12B1) | 1 | 0
Butyl acetate | 0 | 1
Butyric acid | 1 | 0
Carbonyl fluoride | 0 | 1
Chloromethyl ethyl ether | 1 | 1
Chloromethyl methyl ether | 1 | 0
Chloropentafluoroethane (R115) | 1 | 1
Chlorosulfonyl isocyanate (CSI) | 1 | 0
Diborane | 1 | 0
Dichloromethane | 0 | 1
Diethyl ether | 1 | 0
Diethyl sulfide | 0 | 1
Diisopropyl ether | 1 | 0
Diisopropylamine | 1 | 0
Diketene | 0 | 1
Dimethoxymethane | 1 | 0
Dimethyl carbonate | 0 | 1
Dimethyl ether | 1 | 0
Dimethylcarbamoyl chloride | 0 | 1
Dipropylene glycol methyl ether | 1 | 1
Ethyl acetate | 0 | 1
Ethyl acrylate | 1 | 1
Ethyl bromide (Halon-2001) | 0 | 1
Ethyl butyrate | 1 | 1
Ethyl chloroformate | 1 | 0
Ethyl formate | 1 | 0
Ethyl methyl ether | 1 | 0
Ethyl tert-butyl ether | 0 | 1
Ethyl trifluoroacetate | 1 | 1
Ethylvinyl ether (EVE) | 1 | 0
Formic acid dimer | 1 | 0
Freon-113 (1 1 2-Trichlorotrifluoroethane) | 1 | 0
Freon-114 (1 2-dichlorotetrafluoroethane) | 1 | 1
Freon-134a (1 1 1 2-tetrafluoroethane) | 1 | 0
Freon-218 (octafluoropropane) | 0 | 1
Freon-C318 (octafluorocyclobutane) | 0 | 1
Furan | 1 | 0
Glycoaldehyde | 0 | 1
Guaiacol | 1 | 1
Hexachloro-1 3-butadiene | 1 | 0
Hexafluoroacetone | 1 | 1
Hexafluoroethane (Freon-116) | 0 | 1
Hexafluoroisobutylene | 1 | 1
Hexafluoropropene | 1 | 0
Hydrogen peroxide | 0 | 1
Isobutyl acetate | 0 | 1
Isopentyl acetate | 0 | 1
Isopropyl acetate | 0 | 1
Methacryloyl chloride | 0 | 1
Methanesulfonyl chloride | 1 | 0
Methyl acetate | 0 | 1
Methyl acrylate | 1 | 1
Methyl benzoate | 1 | 1
Methyl formate | 1 | 0
Methyl iodide | 0 | 1
Methyl isoamyl ketone | 1 | 0
Methyl isobutyl ketone (MIBK) | 1 | 0
Methyl isobutyrate | 1 | 1
Methyl methacrylate | 1 | 0
Methyl pivalate | 1 | 0
Methyl propionate | 1 | 1
Methyl propyl ketone | 1 | 0
Methyl salicylate | 1 | 1
Methylchloroformate (MCF) | 1 | 0
Methyldichlorodisilanes (mixed isomers) | 0 | 1
Methylethyl ketone | 1 | 0
Methyltrichlorosilane | 0 | 1
Methylvinyl ketone | 1 | 1
N N'-Dimethyl formamide (DMF) | 0 | 1
N N-Diethylaniline | 0 | 1
N N-diethyl formamide | 0 | 1
Nitrogen dioxide and dinitrogen tetroxide | 0 | 1
Octanoic acid | 1 | 0
Paraldehyde | 1 | 0
Pentafluoroethane (f125) | 1 | 0
Perfluorobutane | 1 | 1
Perfluoroisobutylene (PFIB) | 1 | 0
Phenol | 1 | 1
Propargyl chloride | 0 | 1
Propionic acid (and some dimer) | 1 | 0
Propyl acetate | 0 | 1
Propylene glycol | 0 | 1
Sulfuryl chloride | 1 | 0
Sulfuryl fluoride | 0 | 1
Tetrafluoromethane | 0 | 1
Trichlorofluoroethylene | 1 | 0
Trifluoroacetic acid | 1 | 1
Trifluoroacetic anhydride | 1 | 1
Trifluoroacetyl chloride | 1 | 1
Trifluoromethane (Freon-23) | 1 | 0
Trifluoromethylsulfur pentafluoride | 0 | 1
Trifluoronitrosomethane | 1 | 1
Trimethylamine | 1 | 0
Vinyl acetate | 0 | 1
bis(2-Chloroethyl) ether | 0 | 1
cis-1 3-Dichloropropene | 0 | 1
m-Cresol | 1 | 0
n-Amyl acetate | 0 | 1
o-Toluidine | 0 | 1
tert-Amyl methyl ether | 1 | 0

A.3 Concentrations found for a single healthy patient

In order to find the concentrations, the following options argument is used for the lsqnonlin function in MATLAB: options = optimset('Display','off','TolFun',1e-15)

Table A.3: Table of initial concentrations used to calculate the concentrations as given in Appendix A.3.

Molecule | Concentration (ppmv)
--- | ---
Carbon dioxide | 1000
Water | 1000
Standard molecules | 1
Other molecules | 0.01


Table A.4: Concentrations found using the method described in section 4.2. These areof a single sample from a healthy patient. The standard list of molecules as given insection A.1 and the list of molecules as given by section A.2 are used.

Molecule concentration (ppmv)Acetic acid 2.22045e-14Acetone 2.22045e-14Ammonia anhydrous 2.22045e-14Benzene 1.79671Carbon dioxide 0.912382Carbon monoxide 1538.28Ethane 2.66622Ethyl alcohol 0.946815Ethylene 0.109119Formaldehyde monomer 2.22045e-14Methane 2.22045e-14Pentane 2.22045e-14Propane 2.22045e-14Water 0.09880241-Chloro-1 1 2 2-tetrafluoroethane (R124A) 2.22045e-141-Hexanoic acid 2.22045e-141 1 1-Chlorodifluoroethane (Freon-142B) 0.004787731 1 1-Trifluoroacetone 2.22045e-141 1 1-Trifluoroethane (R143A) 2.22045e-141 1 1 2 3 3 3-Heptafluoropropane (HFC227EA) 2.22045e-141 2-Dibomotetrafluoroethane (Freon-114B2) 2.22045e-141 2-Dibromoethane (EDB) 2.22045e-141 2-Dichloro-1-fluoroethane (R141) 2.22045e-141 2-Dichloro-1 1 2-trifluoroethane (F132A) 2.22045e-141 2-Dimethoxyethane 2.22045e-141 4-Dioxane 2.22045e-142-Chloro-1 1 1-trifluoroethane (R133A) 2.22045e-142-Chloropropane 2.22045e-142-Chlroro-1 1 1 2-tetrafluoroethane (HCFC-124) 2.22045e-142-Ethoxyethyl acetate 2.22045e-142-Methoxyethanol 2.30625e-142H 3H-Perfluoropentane 2.22045e-142 2-Dichloro-1 1 1-trifluoroethane (Freon-123) 2.22045e-142 2-Difluorotetrachloroethane (F112A) 0.01049792 2 2-Trifluoroethanol 2.22045e-142 2 4-Trimethyl-1 3-pentanediol isobutyrate (Texanol) 2.22045e-143-Methyl-2-pentanone 2.22045e-14Acetic acid 2.22045e-14

Acetic anhydride                                       2.22045e-14
Acetic acid dimer                                      2.22045e-14
Acetone cyanohydrin                                    2.22045e-14
Aniline                                                0.692194
Benzyl chloride                                        2.22045e-14
Bromochlorodifluoromethane (Freon-12B1)                2.22045e-14
Butyl acetate                                          2.22045e-14
Butyric acid                                           2.22045e-14
Carbonyl fluoride                                      2.22045e-14
Chloromethyl ethyl ether                               2.22045e-14
Chloromethyl methyl ether                              2.22045e-14
Chloropentafluoroethane (R115)                         0.0115797
Chlorosulfonyl isocyanate (CSI)                        2.22045e-14
Diborane                                               0.114448
Dichloromethane                                        2.22045e-14
Diethyl ether                                          0.0152475
Diethyl sulfide                                        2.22045e-14
Diisopropyl ether                                      2.22045e-14
Diisopropylamine                                       2.22045e-14
Diketene                                               2.22045e-14
Dimethoxymethane                                       2.22045e-14
Dimethyl carbonate                                     2.22045e-14
Dimethyl ether                                         2.22045e-14
Dimethylcarbamoyl chloride                             2.22045e-14
Dipropylene glycol methyl ether                        2.22045e-14
Ethyl acetate                                          2.22045e-14
Ethyl acrylate                                         0.039846
Ethyl bromide (Halon-2001)                             2.22046e-14
Ethyl butyrate                                         2.22045e-14
Ethyl chloroformate                                    2.22045e-14
Ethyl formate                                          2.22045e-14
Ethyl methyl ether                                     2.22045e-14
Ethyl tert-butyl ether                                 0.195303
Ethyl trifluoroacetate                                 2.22045e-14
Ethylvinyl ether (EVE)                                 2.22045e-14
Formic acid dimer                                      2.22045e-14
Freon-113 (1 1 2-Trichlorotrifluoroethane)             2.22045e-14
Freon-114 (1 2-dichlorotetrafluoroethane)              2.22045e-14

Freon-134a (1 1 1 2-tetrafluoroethane)                 2.22045e-14
Freon-218 (octafluoropropane)                          2.22045e-14
Freon-C318 (octafluorocyclobutane)                     2.22045e-14
Furan                                                  2.22045e-14
Glycolaldehyde                                         2.22045e-14
Guaiacol                                               2.22045e-14
Hexachloro-1 3-butadiene                               2.22045e-14
Hexafluoroacetone                                      2.22045e-14
Hexafluoroethane (Freon-116)                           2.22045e-14
Hexafluoroisobutylene                                  2.22045e-14
Hexafluoropropene                                      2.22045e-14
Hydrogen peroxide                                      2.22045e-14
Isobutyl acetate                                       2.22045e-14
Isopentyl acetate                                      2.22045e-14
Isopropyl acetate                                      2.22045e-14
Methacryloyl chloride                                  0.101246
Methanesulfonyl chloride                               0.0190093
Methyl acetate                                         2.22045e-14
Methyl acrylate                                        2.22045e-14
Methyl benzoate                                        2.22045e-14
Methyl formate                                         2.22045e-14
Methyl iodide                                          2.22045e-14
Methyl isoamyl ketone                                  2.22045e-14
Methyl isobutyl ketone (MIBK)                          2.22045e-14
Methyl isobutyrate                                     2.22045e-14
Methyl methacrylate                                    2.22045e-14
Methyl pivalate                                        0.0277697
Methyl propionate                                      2.22045e-14
Methyl propyl ketone                                   2.22045e-14
Methyl salicylate                                      2.22045e-14
Methylchloroformate (MCF)                              2.22046e-14
Methyldichlorodisilanes (mixed isomers)                2.22045e-14
Methylethyl ketone                                     2.22045e-14
Methyltrichlorosilane                                  2.22046e-14
Methylvinyl ketone                                     2.22045e-14
N N’-Dimethyl formamide (DMF)                          2.22045e-14
N N-Diethylaniline                                     2.22045e-14
N N-diethyl formamide                                  2.22045e-14

Nitrogen dioxide and dinitrogen tetroxide              2.22045e-14
Octanoic acid                                          2.22045e-14
Paraldehyde                                            2.22045e-14
Pentafluoroethane (f125)                               2.22045e-14
Perfluorobutane                                        2.22045e-14
Perfluoroisobutylene (PFIB)                            0.000842043
Phenol                                                 2.22045e-14
Propargyl chloride                                     0.250046
Propionic acid (and some dimer)                        2.22045e-14
Propyl acetate                                         2.22045e-14
Propylene glycol                                       2.22045e-14
Sulfuryl chloride                                      2.22045e-14
Sulfuryl fluoride                                      2.22046e-14
Tetrafluoromethane                                     2.22045e-14
Trichlorofluoroethylene                                2.22045e-14
Trifluoroacetic acid                                   2.22045e-14
Trifluoroacetic anhydride                              0.00520583
Trifluoroacetyl chloride                               2.22045e-14
Trifluoromethane (Freon-23)                            2.22045e-14
Trifluoromethylsulfur pentafluoride                    2.22045e-14
Trifluoronitrosomethane                                2.22045e-14
Trimethylamine                                         2.22045e-14
Vinyl acetate                                          2.22045e-14
bis(2-Chloroethyl) ether                               2.22045e-14
cis-1 3-Dichloropropene                                2.22045e-14
m-Cresol                                               2.22045e-14
n-Amyl acetate                                         2.22045e-14
o-Toluidine                                            2.22045e-14
tert-Amyl methyl ether                                 2.22045e-14
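
The value 2.22045e-14 that recurs throughout Table A.4 equals 100 times the double-precision machine epsilon. The short check below reproduces the number. Reading these entries as a numerical floor of the fit (a concentration that is effectively zero) rather than as a measured value is an interpretation offered here as an assumption; the text does not state it.

```matlab
% eps is MATLAB's built-in double-precision machine epsilon (about 2.22e-16).
floorValue = 100 * eps;

% Print with six significant digits, the precision used in Table A.4.
fprintf('%g\n', floorValue);
```

Under that reading, entries such as 2.22046e-14 and 2.30625e-14 lie only marginally above the same floor.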

B. List of molecules

B.1 Breath molecules with absorbance in the 1020 to 1100 cm−1 region.

Table B.1: Molecules that remain after applying the filter of subsection 4.1.3, in the 1020 to 1100 cm−1 region, to 199 molecules that should be present in breath [17].

Molecules
1-Hexanoic acid
1-Pentanol (n-amyl alcohol)
1-Propanol
1 2-Dichlorobenzene
1 3-Dichlorobenzene
1 4-Dichlorobenzene
1 4-Dioxane
2-Butoxyethanol
2-Ethoxyethyl acetate
2-Ethyl-1-hexanol
2-Methoxyethanol
2-methyl-1-propanol (IBA isobutanol)
2-Methyl-2-propanal (Isobutenal)
2-Methylfuran
2 3-Butanedione
3-methylfuran
Acetic acid
Acetic anhydride
Acetol
Allene (1 2-propadiene)
Allyl alcohol
Ammonia anhydrous
Benzene
Benzyl alcohol
Butyl acetate
Butyric acid

Cadaverine
Chlorobenzene
Cyclopentene
Cyclopropane
Diethyl ether
Dimethoxymethane
Dimethyl ether
Dimethyl sulfoxide
Dipropylene glycol methyl ether
Ethyl acetate
Ethyl acrylate
Ethyl alcohol
Ethyl butyrate
Ethyl tert-butyl ether
Ethylene
Ethylvinyl ether (EVE)
Furan
Furfural
Furfuryl alcohol
Glycolaldehyde
Guaiacol
Isobutyl acetate
Isopentyl acetate
Isopropyl acetate
Isopropyl alcohol
Methyl acetate
Methyl alcohol
Methyl methacrylate
Methyl propionate
Methylamine
n-Butyl alcohol
N N’-Dimethyl formamide (DMF)
N N-diethyl formamide
Octanoic acid
Propionic acid (and some dimer)
Propylene glycol
Propylene sulfide
Pyridine
sec-Butyl alcohol

tert-Butyl methyl ether
Thiophene
trans-Crotonaldehyde
Trimethylamine
