Eindhoven University of Technology MASTER Near-ear vital signs camera Balmaekers, B.M. Award date: 2015 Link to publication Disclaimer This document contains a student thesis (bachelor's or master's), as authored by a student at Eindhoven University of Technology. Student theses are made available in the TU/e repository upon obtaining the required degree. The grade received is not published on the document as presented in the repository. The required complexity or quality of research of student theses may vary by program, and the required minimum study period may vary in duration. General rights Copyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright owners and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights. • Users may download and print one copy of any publication from the public portal for the purpose of private study or research. • You may not further distribute the material or use it for any profit-making activity or commercial gain




near-ear Vital Signs Camera

Master’s Thesis

B.M. Balmaekers

Department of Mathematics and Computer Science
Electronic Systems Group, Department of Electrical Engineering

Committee members:
prof. dr. ir. G. de Haan
V. Jeanne, MSc
dr. ir. S. Stuijk
dr. G. Dubbelman

publishing date: 29-06-2015

Eindhoven, June 2015


Abstract

Personal health monitoring has become increasingly popular in recent years, as evidenced by the growing availability of electronic consumer products such as step counters and fitness trackers. Yet, while the general awareness of personal health monitoring as a means to an improved well-being is rising, the need to buy, wear and use special devices still forms a roadblock for wide adoption.

This research explores the possibility to use a smartphone to automatically record a user’s vital sign metrics, such as heart rate. More specifically, the feasibility of using the phone’s front camera during phone calls to obtain a heart rate estimate is explored using an Apple iPhone. The main concept is based on remote photoplethysmography (rPPG), which measures the volume changes of oxygenated hemoglobin in the skin. However, this work points out the importance of optimization along the full signal chain from data acquisition, through preprocessing and PPG extraction to final estimate. Different methods to extract the PPG signal from the video signal are compared. In addition, several methods to improve the input signal through optimized use of the hardware, as well as to improve overall accuracy through pre- and postprocessing, are introduced. Additionally, the motion sensors present on the platform are leveraged to assess the user’s movement during a recording.

Other efforts of the research focus on the implementation of some of these methods and algorithms to run on the mobile device, which is limited in memory, computational power and energy.

The results of this research show that it is possible to extract heart and breathing rate metrics when the phone is held to the user’s ear, by actively controlling the screen and camera, and processing the resulting data to select segments of sufficient quality. Finally, these methods are shown to be able to run on the device itself, resulting in a fully contained software solution that can run on devices many consumers already own.


Contents

Contents
List of Figures
List of Tables

1 Introduction
    1.1 Vital signs
    1.2 Photoplethysmography
        1.2.1 Remote photoplethysmography
    1.3 Problem statement
    1.4 Overview

2 Related work
    2.1 PPG
    2.2 rPPG
    2.3 Challenges

3 Initial exploration
    3.1 Platform
    3.2 Backlight as a light source
        3.2.1 Screen wavelength filter and camera wavelength sensitivity
    3.3 Dynamic camera setting
        3.3.1 Methodology
        3.3.2 Results
    3.4 Experimental set-up
        3.4.1 Experimental data

4 Video-based methods
    4.1 Assessing performance of existing algorithms
        4.1.1 Independent component analysis
        4.1.2 Chrominance-based algorithms
        4.1.3 Empirical Mode Decomposition
        4.1.4 Singular Spectrum Analysis
        4.1.5 Implementation and evaluation
        4.1.6 Results
        4.1.7 Discussion
        4.1.8 Conclusion
    4.2 Partial image processing
        4.2.1 Non-uniform segmentation
        4.2.2 Uniform segmentation (block processing)
        4.2.3 Results
        4.2.4 Discussion
    4.3 Motion detection
        4.3.1 Video analysis
        4.3.2 Motion sensors
        4.3.3 Stable period selection
        4.3.4 Results
        4.3.5 Conclusion
    4.4 Video motion compensation
        4.4.1 Block-based
        4.4.2 Planar
        4.4.3 Method of assessment
        4.4.4 Results
    4.5 Backlight strobing
        4.5.1 Methodology
        4.5.2 Assessment
        4.5.3 Conclusion
    4.6 Heart rate and respiratory rate interaction
        4.6.1 Assessment
        4.6.2 Results

5 Implementation

6 Conclusions
    6.1 Future work

Bibliography


List of Figures

1.1 Overview of the 4 main challenges in the scope of the problem statement.

3.1 a) Custom app for testing screen and camera settings and recording raw video data. b) Custom app for testing real-time performance of various algorithms.
3.2 Setup for screen wavelength filter and camera wavelength sensitivity experiments.
3.3 Color spectra emitted by backlight and blocked by color filter.
3.4 Camera captures with different screen colors as the only illuminant (left-to-right: red, green, blue, white).
3.5 Channel intensity distribution for red screen light.
3.6 Channel intensity distribution for green screen light.
3.7 Channel intensity distribution for blue screen light.
3.8 Channel intensity distribution for white screen light.
3.9 Montage of frames of a recording with automatic ISO adjustment.
3.10 Average green channel signal of a recording with automatic ISO adjustment.
3.11 Experimental setup for both ‘static’ and ‘talking’ recordings.
3.12 Frames from the recordings of each subject (rows) for each condition (columns).

4.1 SNR for 5 static subjects using different algorithms.
4.2 SNR for 5 talking subjects using different algorithms.
4.3 SNR for 5 walking subjects using different algorithms.
4.4 a) Still from a recording, showing dynamic range across different parts of the ear. b) Mask superimposed over frame, indicating selected pixels for ranked order brightness processing.
4.5 Example of a highlight.
4.6 Results of selected pixels excluding 40% pixels with lowest or highest brightness, together with reference.
4.7 Correlation between absolute difference with reference heart rate and different block metrics.
4.8 Motion quantification over duration of video recording (video-based).
4.9 Thresholded motion quantification over duration of video recording.
4.10 Removal of stable streaks below a minimal length.
4.11 Motion compensation masks using 3DRS for static subject and window length of 4s (a) and 8s (b).
4.12 Example of alternating image capture with and without backlight.
4.13 a) Trace of green channel. b) Traces with stride, showing individual traces with and without backlight.
4.14 a) Spectra of traces with and without backlight. b) Spectrum of extracted component.
4.15 Pulse oximeter PPG signal showing low-frequency effect of breathing on the pulse signal.
4.16 Frequency spectrum of pulse oximeter PPG signal showing heart rate peak sidebands resulting from modulation by the breathing rate.
4.17 Detection of spectra that exhibit respiratory rate and heart rate interaction pattern.

5.1 Example of implemented demonstration App.

6.1 Overview of the efforts performed in this research across the signal processing chain.


List of Tables

3.1 Experimental reference data.

4.1 Results using different algorithms on static subjects, along with reference.
4.2 Results using different algorithms on talking subjects, along with reference.
4.3 Results using different algorithms on walking subjects, along with reference.
4.4 Absolute error with respect to reference [BPM] for full-frame processing as well as ranked brightness region selection.
4.5 Results for selecting video segments with limited motion.
4.6 Results for block-based and planar motion compensation on 4s windows.
4.7 Results for block-based and planar motion compensation on 8s windows.
4.8 Results for heart rate detection based on interaction with breathing rate.

Chapter 1

Introduction

Recently, there has been a clear increase in interest in personal health monitoring, as evidenced by numerous consumer-oriented devices to track health-related metrics such as heart rate, oxygenation levels, pedometry and sleep patterns [1]. However, these require additional expenses, as well as commitment to regular use or wear. Still, personal health monitoring can be very valuable, since certain health metrics can be indicative of current conditions such as stress [2], or potential physical [3] or even psychological risks [4]. Additionally, aggregated health data could be used to improve the effect of urban planning on people’s well-being [5].

Therefore, even though gaining insight into one’s health could ultimately result in the motivation of lifestyle changes, for initial and wide adoption of these new technologies, they should be as accessible as possible. Ideally, metrics should be collected without requiring any attention from or changed behavior of the user. As a particular application of this idea, this project aims to explore how a smartphone can be used to automatically collect physiological metrics while making a phone call. The starting point for this will be heart rate detection using the phone’s front camera, leveraging heart rate extraction methods that will be further detailed in the remainder of this Chapter. The envisioned application of the results of this project is a smartphone App that allows for automatic heart rate measurements during regular phone calls. Users could then simply install the application and let it run in the background. No further interaction is then required, but users can open the app to see the results of previous measurements as well as trends. This makes the user interaction entertaining rather than a chore, and might thereby be more likely to motivate people to learn more about their daily health-related habits.

For this project, the Apple iPhone has been used as a platform. The iPhone covers a significant market share with only a limited number of different hardware models (for the fourth quarter of 2014, global market share was 20.4% with only 4 different models being sold [6]). This allows any hardware-specific results of this project to immediately be applicable to a wide market.

The remainder of this chapter offers an overview of some of the basic concepts involved, as well as an overview of the particular challenges of the project.


1.1 Vital signs

Vital signs are measurements of the body’s most basic functions that can help in assessing a person’s general health. The 4 most widely used vital signs are:

• heart rate

• respiratory rate

• blood pressure

• body temperature

The range of values for each metric that is considered to be ‘normal’ depends on a person’s age, sex, general fitness and activity level, and can furthermore differ between individuals for a variety of reasons. Establishing a personal baseline for these metrics can therefore be valuable by providing context for their values when a person develops medical symptoms and visits a medical professional.

1.2 Photoplethysmography

The classic way of monitoring heart rate is through electrocardiography (ECG). However, a full ECG requires attaching up to 10 electrodes to the subject, which is both uncomfortable and time consuming. While ECG provides very detailed information about the electrical activity of the heart, if less detailed information such as only the heart rate is required, a device that can be simply clipped onto the subject’s finger can be used. Such a device, referred to as a pulse oximeter, relies on photoplethysmography instead of electrical activity.

Photoplethysmography (PPG) is based on the light absorbing characteristics of oxygenated hemoglobin. When light passes through subcutaneous tissue, part of it is absorbed by skin tissue, and part of it is absorbed by the oxyhemoglobin in the blood vessels. Because the volume of oxyhemoglobin varies due to the cardiac cycle, the amount of absorbed light will reflect this variation. By measuring the light reflected by (or transmitted through) the skin, a signal containing this variation can be obtained. This signal is the sum of a static (DC) component caused by the skin absorbance and a varying (AC) component caused by the varying volume of oxyhemoglobin.

The variation of the volume of oxyhemoglobin is the result of complex underlying physiological processes, including cardiac activity and respiration. Correct processing and interpretation of the PPG therefore also potentially allows multiple vital signs to be extracted from it, such as heart rate, breathing rate, and oxygen saturation levels.
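The DC/AC decomposition described above can be illustrated with a minimal Python sketch. It is not taken from this work: the signal is synthetic, the function name `estimate_heart_rate_bpm`, the 0.7 to 4.0 Hz search band and all amplitudes are assumptions chosen for illustration only.

```python
import numpy as np

def estimate_heart_rate_bpm(signal, fs, lo_hz=0.7, hi_hz=4.0):
    """Return the strongest frequency of `signal` within [lo_hz, hi_hz], in beats per minute."""
    signal = np.asarray(signal, dtype=float)
    ac = signal - signal.mean()                   # remove the static (DC) component
    power = np.abs(np.fft.rfft(ac)) ** 2          # power spectrum of the AC component
    freqs = np.fft.rfftfreq(len(ac), d=1.0 / fs)  # frequency of each FFT bin in Hz
    band = (freqs >= lo_hz) & (freqs <= hi_hz)    # assumed plausible heart rate band
    peak_hz = freqs[band][np.argmax(power[band])]
    return 60.0 * peak_hz

# Synthetic example: 600 samples at 30 frames per second, pulse at 1.2 Hz (72 BPM).
fs = 30.0
t = np.arange(600) / fs
dc = 100.0                                # hypothetical skin absorbance level
ac = 0.5 * np.sin(2 * np.pi * 1.2 * t)    # hypothetical pulsatile variation
ppg = dc + ac
heart_rate = estimate_heart_rate_bpm(ppg, fs)  # close to 72 BPM for this clean signal
```

On real recordings the AC component is far smaller relative to noise than in this clean example, which is why the later chapters spend most of their effort on improving the signal before any such spectral estimate is made.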

1.2.1 Remote photoplethysmography

Since PPG relies on light (rather than electrical signals as in an ECG), this method can also potentially allow for camera-based applications. However, a regular PPG measurement device, by being in direct contact with the skin and providing its own light source, will only record the part of the light from this source that has travelled through or is reflected by the skin. Therefore, even though only part of this light will have been absorbed in some amount by oxyhemoglobin, the signal-to-noise ratio will be fairly high. However, when increasing the distance between the light sensor and the skin, the light sensor will also pick up ambient light. This ambient light could be constant, but could also exhibit periodic or non-periodic transients. In either case this decreases the signal-to-noise ratio. Additionally, when the sensor and light source are no longer affixed to the skin surface, even small body movements will result in changes in the amount of light that is reflected by the skin into the light sensor. This introduces additional noise in the signal, which makes remote PPG (rPPG) extraction much more challenging than a regular PPG.


1.3 Problem statement

The goal of this project is to assess the potential for using a smartphone for automatically collecting vital signs of a user. As a specific starting point, the detection of a user’s heart rate while making a phone call will be explored.

1.4 Overview

Figure 1.1: Overview of the 4 main challenges in the scope of the problem statement.

Based on the Problem statement, the following 4 main challenges are identified:

• Acquisition

• Pre-processing

• PPG extraction algorithms

• Post-processing

For each challenge, different methods will be explored. While discussing these methods, their relation to each of these challenges will be illustrated using the graphic shown in Figure 1.1.

Because solutions to these challenges can affect each other, exploring all methods pertaining to one challenge at a time is not the best approach. The explored methods are therefore further grouped in two chapters:

• Initial explorations, where Acquisition will be the main focus

• Video-based methods, where methods pertaining to all challenges with regard to video processing will be discussed. The focus is on the latter 3 challenges.


Chapter 2

Related work

2.1 PPG

Various camera-based heart rate extraction algorithms have been proposed in previous work, which can roughly be divided into contact PPG and remote PPG. Contact camera-based PPG requires a subject to place their fingertip directly on the camera lens [7][8][9]. Karen et al. proposed methods to assess the optimal Region Of Interest (ROI) in the camera frame [10], based on metrics resulting from a segmentation-based peak-detection method [11].

2.2 rPPG

Regarding remote PPG (rPPG), Sun et al. used video of the palm of the hand illuminated with a custom infrared lighting setup [12]. Most other efforts in rPPG use footage of a subject’s face, either using one of the color channels averaged over a region of interest (ROI) [13] or using the averaged Red, Green and Blue (RGB) color channels as input to the Independent Component Analysis (ICA) decomposition method to obtain a cleaner PPG component [14][15][16][17]. Since subject movement can lead to artifacts, some of these methods use face detection to automatically select the ROI in each frame. This is however not applicable in the current project.

De Haan et al. have proposed methods to compensate for the motion of skin relative to the sensor caused by physical motion [18]. One method is based on the assumption that variations caused by motion relative to the light source should be equal for all color channels, and therefore the ratio between them should remain unchanged. This ratio should therefore be a more motion-robust signal. However, since the total observed color is a combination of diffuse and specular reflection, and their respective contributions to the resulting color can change with motion as well, motion can still cause a change in the ratio between color channels. Therefore, they also propose a method that maps the RGB signals to two orthogonal chrominance signals, which are both equally affected by motion, but whose ratio will change as a result of blood volume changes. Although these algorithms are still based on the assumption that motion causing pixels to focus on different parts of the skin can be eliminated by spatial averaging (over an ROI that is selected based on face detection), which does not hold in the case of this project, they are an interesting starting and reference point for the research in this project.
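The color-ratio argument can be made concrete with a small Python sketch. This is only the idealized case described above, in which motion scales all channels by the same factor; it does not model specular reflection and is not the chrominance mapping of [18]. All channel levels, amplitudes and frequencies are hypothetical.

```python
import numpy as np

fs = 30.0
t = np.arange(600) / fs

# Hypothetical intensity change caused by motion, identical for all channels.
motion = 1.0 + 0.2 * np.sin(2 * np.pi * 0.3 * t)

# Hypothetical mean levels and pulse amplitudes (pulse assumed strongest in green).
pulse = np.sin(2 * np.pi * 1.2 * t)
red = motion * (120.0 + 0.1 * pulse)
green = motion * (90.0 + 0.6 * pulse)

ratio = green / red  # the common motion factor cancels in the ratio

def relative_power_at(signal, fs, f_hz):
    """Fraction of the AC spectral power that falls in the FFT bin closest to f_hz."""
    ac = signal - signal.mean()
    power = np.abs(np.fft.rfft(ac)) ** 2
    freqs = np.fft.rfftfreq(len(ac), d=1.0 / fs)
    return power[np.argmin(np.abs(freqs - f_hz))] / power.sum()

motion_share_in_green = relative_power_at(green, fs, 0.3)  # motion dominates the raw trace
pulse_share_in_ratio = relative_power_at(ratio, fs, 1.2)   # pulse dominates the ratio
```

In this idealized setting almost all power in the raw green trace sits at the motion frequency, while the ratio is dominated by the pulse. As soon as the motion factor differs per channel, as with a changing specular contribution, the cancellation is no longer exact.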

2.3 Challenges

A specific application of PPG extraction algorithms to close-up footage of the ear has not been found. Additionally, no work on mobile PPG extraction has been found that is not based either on a form of facial tracking, or on direct contact between the subject and the camera. The specific application that is subject of this study is therefore believed to be original and innovative.

This also illustrates the challenge of the problem explored in this project, since it aims to operate without direct contact between sensor and skin (introducing motion artifacts and less favorable and controllable lighting conditions), but without being able to use face tracking for stabilization.


Chapter 3

Initial exploration

In this chapter, the initial efforts are presented. These consist of an exploration of which data is available given the platform, as well as how best to collect data, both for further use in the research performed in this project, as well as in a potential final application.

Specifically, the properties and degrees of freedom pertaining to the hardware are explored, such that a suitable solution can be selected to perform data acquisition on several subjects.

3.1 Platform

An iOS application has been created to explore the platform and assess the constraints it imposes (Fig. 3.1a). Firstly, the camera performance and accessibility were investigated. The operating system (OS) allows access to camera frames through a call-back function (a function that is called upon a certain event, such as arrival of a new sample in a queue or buffer). Since this is asynchronous, the timing of the call-backs is not directly usable to assess the actual timing between frames. However, each frame is assigned a time stamp. This allowed confirmation that the frame rate was stable. The image buffer further allows access to the raw image data, with 32 bits per pixel: 8 bits per Red, Green and Blue color channel, and an unused ‘alpha’ channel. Writing the frames to storage memory for later access, however, results in dropped frames. This is solved by performing the writes in a separate thread.

Next, access to the motion sensors is tested. The platform contains a 3-axis accelerometer and a 3-axis gyroscope. Access to sensor values is possible either through polling or through a call-back function. For an initial assessment, a compound metric of all three accelerometer axes is polled and stored in the unused ‘alpha’ byte for each frame.
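The two checks described above can be sketched offline in Python. This is an illustration under assumptions, not the code of the iOS application: the function names are hypothetical, the alpha byte is assumed to be the fourth byte of each pixel, and the ‘compound metric’ is assumed to be the Euclidean magnitude of the acceleration, scaled against an assumed full-scale value of 2 g.

```python
import numpy as np

def frame_interval_stats(timestamps_s):
    """Return (mean frame rate in Hz, largest deviation of an interval from the mean, in s)."""
    intervals = np.diff(np.asarray(timestamps_s, dtype=float))
    mean_interval = intervals.mean()
    return 1.0 / mean_interval, np.max(np.abs(intervals - mean_interval))

def pack_motion_into_alpha(frame_rgba, accel_xyz, full_scale_g=2.0):
    """Return a copy of the frame with the acceleration magnitude stored in the alpha byte."""
    magnitude = float(np.linalg.norm(accel_xyz))                  # assumed compound metric
    level = int(round(255 * min(magnitude / full_scale_g, 1.0)))  # clip and scale to 0..255
    out = frame_rgba.copy()
    out[..., 3] = level  # assumes alpha is the fourth byte of each pixel
    return out

# Hypothetical time stamps of 10 frames at exactly 30 frames per second.
timestamps = np.arange(10) / 30.0
rate_hz, max_jitter_s = frame_interval_stats(timestamps)

# Hypothetical 2x2 frame with 4 bytes per pixel and a full-scale acceleration sample.
frame = np.zeros((2, 2, 4), dtype=np.uint8)
packed = pack_motion_into_alpha(frame, (0.0, 0.0, 2.0))
```

Storing the motion value inside the frame keeps it synchronized with the image data without a separate log file, at the cost of reducing the sensor reading to 8 bits.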

Regarding camera settings, the seventh version of the operating system (iOS 7) only allows two options: ‘automatic settings’, or ‘fix current settings’. A possible solution is explored by making use of the proximity sensor, which is used by default to disable the screen when holding the phone to the ear. On the event of ‘close proximity’ (the OS only allows a binary proximity state), a timer is set for 1 second. When the timer fires, the camera settings are fixed. This allows the camera settings to adjust automatically to the ‘phone call’ conditions, while preventing them from changing during the recording.

The most recent version of the OS, iOS 8, allows more specific camera settings. The possibilities this enables will be discussed in the next Chapter.
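The proximity-triggered locking logic can be sketched as a small state machine. The Python below is a platform-independent illustration with hypothetical names; it simulates the timer with explicit time values instead of using any iOS API. Releasing the lock when the phone is moved away from the ear is an assumption that the text above does not specify.

```python
class ExposureLockController:
    """Lock the camera settings a fixed delay after the proximity sensor reports 'near'."""

    def __init__(self, delay_s=1.0):
        self.delay_s = delay_s
        self.near_since = None  # time at which 'near' was first reported, if any
        self.locked = False

    def on_proximity(self, near, now_s):
        """Handle a binary proximity event at time now_s (seconds)."""
        if near and self.near_since is None:
            self.near_since = now_s  # start the simulated 1 second timer
        elif not near:
            self.near_since = None   # assumed behavior: cancel the timer and unlock
            self.locked = False

    def on_tick(self, now_s):
        """Call periodically, e.g. once per frame; returns True while settings are locked."""
        timer_expired = (self.near_since is not None
                         and now_s - self.near_since >= self.delay_s)
        if not self.locked and timer_expired:
            self.locked = True       # a real app would fix the camera settings here
        return self.locked

controller = ExposureLockController()
controller.on_proximity(True, now_s=0.0)
locked_early = controller.on_tick(now_s=0.5)  # False: still adapting automatically
locked_later = controller.on_tick(now_s=1.0)  # True: settings fixed after 1 second
```

The delay gives the automatic exposure time to settle on the darker ‘phone call’ conditions before the settings are frozen.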

Lastly, to obtain an impression of the computational power available, several chrominance-based PPG extraction algorithms (see Section 4.1.2) have been implemented on the device (Fig. 3.1b). These (unoptimized) algorithms were able to run in real-time, resulting in a CPU usage of ≈ 20%. This means that, even though energy consumption should of course always be minimized for use on a battery-powered device, plenty of computational power is available for additional processing if necessary. This allows the remainder of this research to focus on accuracy, with only a smaller emphasis on power optimization.

Figure 3.1: a) Custom app for testing screen and camera settings and recording raw video data. b) Custom app for testing real-time performance of various algorithms.

3.2 Backlight as a light source

When making a phone call, much of the ambient light that could be reflected or absorbed by the ear might get blocked by the phone itself. To improve the signal, the goal is therefore to use the screen backlight as an additional light source.

3.2.1 Screen wavelength filter and camera wavelength sensitivity

Since the light absorption of oxyhemoglobin is wavelength-dependent [19], selecting a particular wavelength as the light source, as well as using a sensor with a particular wavelength sensitivity, could potentially improve the signal-to-noise ratio (SNR). It is therefore interesting to look further into the wavelengths that can be emitted by the backlight, and the wavelength sensitivity of the camera sensor.

Emission spectrum

The screen of the iPhone (and many other commercially available smartphones) consists of two main parts:

• a ’white’ backlight, consisting of blue Light Emitting Diodes (LEDs) and a phosphor coating to generate a broad spectrum of wavelengths

• a grid of color filters, each letting through only one of the colors Red, Green and Blue

The spectrum of the light coming out of the screen therefore depends on the spectrum of the light initially generated by the backlight and on the absorption spectrum of each color filter. To assess the available range of wavelengths, the screen spectrum was measured with an OceanView Spectrometer in a room that could be completely darkened, with the screen set to fully Red, Green and Blue respectively. The light spectra that can be generated are limited to sums of these three individual spectra. The measured spectra are shown in Figure 3.3.

Camera wavelength sensitivity

The specificity with which a certain wavelength can be selected ultimately depends on the image sensor. The image sensor consists of an array of three different types of light sensors, each of which is most sensitive to light with wavelengths corresponding to either Red, Green or Blue. However, these sensors are sensitive to a band of wavelengths rather than one single specific wavelength. Moreover, since some of these bands overlap, some wavelengths can generate a signal in more than one type of sensor, albeit in different amounts. Therefore, it is interesting to assess the sensitivity of the sensors to each of the possible (orthogonal) spectra that can be generated by the display. It is likely that they have a behavior similar to that of Figure 3.3. However, since this cannot directly be measured, it has to be approximated.

To achieve this, in the same darkened room as for the emitted spectrum experiment (Figure 3.2), each of the separate colors Red, Green, Blue and White (all subpixels) is shone on a white piece of paper using the display, and an image is captured using the front phone camera (with fixed channel gains, i.e. fixed white balance). The resulting images are shown in Figure 3.4. For each of these images, the intensity levels of all pixels are divided between ten linear bins for each color channel (red, green, blue). The results are shown in Figures 3.5, 3.6, 3.7 and 3.8.
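The binning step described above can be illustrated with a minimal Python sketch. It assumes 8-bit channel values (0 to 255) and pixels supplied as (r, g, b) tuples; the function names are hypothetical and are not taken from the thesis, whose own implementation is not shown.

```python
def channel_histogram(values, bins=10, max_value=255):
    """Count intensity values into `bins` equal-width (linear) bins."""
    counts = [0] * bins
    width = (max_value + 1) / bins  # 25.6 intensity levels per bin for 8-bit data
    for v in values:
        # Clamp the index so that max_value falls into the last bin.
        counts[min(int(v / width), bins - 1)] += 1
    return counts


def rgb_histograms(pixels, bins=10):
    """Return one histogram per color channel for a list of (r, g, b) pixels."""
    pixels = list(pixels)
    return {
        name: channel_histogram([p[i] for p in pixels], bins)
        for i, name in enumerate(("red", "green", "blue"))
    }


# Two example pixels captured under (hypothetical) red screen light.
example = rgb_histograms([(255, 0, 30), (200, 10, 26)])
```

Comparing the three histograms for one screen color then indicates how strongly each sensor type responds to that illuminant.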


Figure 3.2: Setup for screen wavelength filter and camera wavelength sensitivity experiments.

Figure 3.3: Color spectra emitted by backlight and blocked by color filter.

From these figures, we can see that the ’Red’ sensor is also slightly sensitive to the ’Green’ backlight, and just barely sensitive to the ’Blue’ backlight. The ’Green’ sensor is also reasonably sensitive to the ’Red’ backlight, and slightly less so to ’Blue’. Lastly, the ’Blue’ sensor is also somewhat sensitive to ’Green’, and slightly less sensitive to ’Red’.

This means that the ’Red’ channel appears to have the narrowest sensitivity band, followed by blue and green respectively. More specifically, the ’Green’ sensor has the broadest sensitivity which, given Figure 3.3, has to span a range of at least approximately 460–590 nm. Since the relative PPG amplitude is highest at ≈ 550 nm [19], the following two conclusions follow:

• The Green camera channel will be more sensitive to PPG modulation than the other channels


• Since the Green channel is also sensitive to light emitted by the ’Blue’ and ’Red’ backlight, but these wavelengths will only exhibit a small PPG modulation, the ’Blue’ and ’Red’ illumination will likely decrease the total PPG modulation of the signal in the Green channel. Therefore, assuming no other noise sources, the best signal-to-noise ratio will be achieved by using only Green backlight and selecting the Green image sensor.

Figure 3.4: Camera captures with different screen colors as the only illuminant (left-to-right: red, green, blue, white).

Figure 3.5: Channel intensity distribution for red screen light.

Discussion

It is likely that other sources of noise will be present in a recording, such as camera motion and environmental light. Using the full range of backlight and camera channels might be useful in trying to suppress this noise. This will be further explored in the next Chapter.


Figure 3.6: Channel intensity distribution for green screen light.

Figure 3.7: Channel intensity distribution for blue screen light.


Figure 3.8: Channel intensity distribution for white screen light.


3.3 Dynamic camera setting

Since a phone is held close to the ear during a phone call, the amount of light available to the camera sensor could be limited. However, the exact amount is a function of the environmental light and the exact distance between the ear and camera, and might differ between calls or even during a call. Because both a lack of light and saturation of the camera signal decrease the PPG signal strength, it is important to optimize the camera settings dynamically, if possible. However, while it is possible to enable automatic camera settings on the iPhone, continuously changing settings could actually introduce a new source of noise. Additionally, automated settings optimized for visually pleasing pictures or footage might not be optimal for PPG extraction. Therefore, a custom optimization is devised and implemented.

3.3.1 Methodology

There are several settings affecting how a frame is captured: exposure, ISO level, and ’white balance’ or channel gains. Since a high frame rate is desirable for PPG extraction [17], the frame rate is fixed to the maximum supported rate. This automatically limits the maximum exposure. Since exposure determines the number of photons actually contributing to the resulting sample, a long exposure is desirable as well. To obtain a stable exposure time, the exposure time is set to half the inter-frame period. Finally, to obtain a signal value that fits within the 8 bits available per color channel, the ISO level can be adjusted. This is done by calculating the mean color values of a frame in real time, and adjusting the ISO levels accordingly. However, since changing settings will lead to a signal segment that cannot be used for processing, changes should not occur too often either. This is remedied in two ways:

• After an ISO level change, another change cannot occur in the next two seconds.

• The color means and associated ISO levels contain hysteresis, such that no oscillation between ISO levels will occur.
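The two safeguards listed above can be sketched as a small controller. This is only an illustration under stated assumptions: the ISO steps, the two brightness thresholds (whose gap acts as a simple dead band, one possible form of hysteresis) and the class name are hypothetical, and only the two-second hold-off is taken from the text.

```python
class IsoController:
    """Step the ISO level up or down based on the mean frame brightness."""

    def __init__(self, iso_levels, low=60.0, high=200.0, holdoff_s=2.0):
        self.iso_levels = list(iso_levels)  # ascending, hypothetical ISO steps
        self.index = 0                      # start at the lowest ISO
        self.low = low                      # below this mean: image too dark
        self.high = high                    # above this mean: close to saturation
        self.holdoff_s = holdoff_s          # minimum time between two changes
        self.last_change = None

    @property
    def iso(self):
        return self.iso_levels[self.index]

    def update(self, frame_mean, t):
        """Process one frame mean (0-255) at time t (seconds); return the ISO."""
        if self.last_change is not None and t - self.last_change < self.holdoff_s:
            return self.iso  # still inside the hold-off period
        if frame_mean < self.low and self.index < len(self.iso_levels) - 1:
            self.index += 1
            self.last_change = t
        elif frame_mean > self.high and self.index > 0:
            self.index -= 1
            self.last_change = t
        # Means between `low` and `high` never trigger a change (dead band).
        return self.iso
```

A change is only triggered when the mean leaves the band between the two thresholds, so a mean that hovers around a single threshold value cannot make the ISO toggle on every frame.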

3.3.2 Results

Figure 3.9 shows a montage of frames from a recording with automatic ISO adjustment. Note that although the ISO already started out relatively high, as visible in the white top left frames, putting the phone up to the ear resulted in near-black images. As a result, the ISO was automatically increased further, leading to a more satisfactory image. The mean green channel time series of this segment is shown in Figure 3.10.


Figure 3.9: Montage of frames of a recording with automatic ISO adjustment.

Figure 3.10: Average green channel signal of a recording with automatic ISO adjustment.


3.4 Experimental set-up

Figure 3.11: Experimental setup for both ’static’ and ’talking’ recordings.

To test the effect of the different methods and their robustness to different conditions, data acquisition was performed using the developed iPhone app as well as a reference pulse oximeter. The camera recordings were performed at 60 fps, with a 1/120 s exposure and neutral white balance. The ISO is fixed at 1050 for these recordings to obtain stable conditions during the full length of the recording. The backlight is set to full brightness, and the screen is set to fully white to accommodate algorithms that rely on all three color channels to reject artifacts. Recordings were then performed under the following three conditions:

1. Static: the subject is sitting down, holding the phone in their right hand. Their left hand is resting on a table, with a pulse oximeter connected to their left index finger.

2. Talking: as Static, except the subject is reading a poem during the recording. This simulates movement induced by speech, as might occur during a phone call.

3. Walking: the subject is walking indoors at regular pace, holding the phone in their right hand. The subject walks at the same pace for 1 minute before starting the recording, to allow the heart rate to settle. Heart rate is then measured immediately after the recording using the pulse oximeter, as its performance is unreliable during walking.

Recordings lasted for 1 minute in all cases.


3.4.1 Experimental data

Five subjects (age 23-28) with different skin tones participated in the experiment. The reference heart rates (HR) are shown in Table 3.1. For the ’Static’ and ’Talking’ experiments, this is the average heart rate during the recording. For the ’Walking’ experiment, this is the heart rate measured directly after the recording.

Additionally, for ’Walking’, the average walking pace is shown. This can help assess whether the detected heart rate is likely to be dominated by the walking pace. Breathing rate (BR) is shown for these recordings as well. Both walking and breathing pace are determined manually based on audio recorded simultaneously with the camera signal by the application developed for the experiment.

Figure 3.12 shows frames from each of the recordings. This already illustrates the difference in signal obtained between different subjects, as a result of exact individual positioning of the phone with respect to the ear, as well as skin tone.

When not mentioned otherwise, this is the set of test data used to obtain the results for the different methods discussed in the following chapters.

Subject   HR [BPM]   Pace [SPM]   BR [BPM]
1s        54         -            -
1t        72         -            -
1w        77         105          17
2s        62         -            -
2t        61         -            -
2w        73         103          15
3s        69         -            -
3t        75         -            -
3w        96         106          18
4s        59         -            -
4t        67         -            -
4w        69         115          22
5s        84         -            -
5t        93         -            -
5w        112        121          19

Table 3.1: Experimental reference data.


Figure 3.12: Frames from the recordings of each subject (rows) for each condition (columns).


Chapter 4

Video-based methods

In this chapter, video-based methods are explored. First, existing algorithms for PPG extraction are applied to this new application to assess their performance under these conditions. This provides a starting and reference point for further, more specific optimization.

4.1 Assessing performance of existing algorithms

As a reference, several existing PPG extraction algorithms are evaluated. As these algorithms have been designed to rely on a static camera and trackable faces remaining in the frame, their performance might not be optimal for close-up use, where camera motion and movement of the ROI out of the frame are dominant. However, they do implement different motion-rejection techniques, and are therefore a good starting point, both as a reference and for identifying specific weaknesses.

4.1.1 Independent component analysis

Poh et al. [14] proposed performing Independent Component Analysis (ICA) on all three RGB color channels. ICA is a method to separate multiple signals into as many independent components by minimizing their Gaussianity. Since PPG and (coarse) motion are expected to be generally unrelated, they are also expected to be separated into different components. The challenge then remains of selecting the component deemed to carry the PPG signal. Since the PPG signal is highly periodic, the component with the highest periodicity is selected.

4.1.2 Chrominance-based algorithms

De Haan et al. [18] proposed several chrominance-based algorithms. These are proposed as alternatives to Blind Source Separation (BSS) techniques, such as ICA, that do not rely on identification of the correct component. This is important both when motion is periodic as well, e.g. during walking, and when there are relatively fast changes in the environment that prevent the long, accurate observation required for frequency analysis. The different algorithms are explained below.


RoverG

The initial assumption is that, while changes in oxyhemoglobin concentration will affect the absorption of certain wavelengths more than others, changes due to motion with respect to the light source will affect the amount of reflected light proportionally for all color channels. Therefore, a ratio of two normalized color channels would not be affected. This results in the equation

$$S = \frac{G_n}{R_n} - 1 \tag{4.1}$$

XoverY

While RoverG deals with the diffuse part of the light, there is also a part of the light that is reflected from the skin, referred to as specular reflection. The assumption here is that this specular reflection adds a proportional fraction across all color channels. This fraction is identical for all channels assuming white light. A change in this reflection would therefore effectively result in a change in brightness, but not color. The color channels are now first mapped to two orthogonal chrominance channels, which would both be equally affected by changes in specular reflection. The ratio of these signals should therefore be unaffected by changes in reflection due to motion:

$$X = R - G \tag{4.2}$$

$$Y = 0.5R + 0.5G - B \tag{4.3}$$

$$S = \frac{X_n}{Y_n} - 1 \tag{4.4}$$

Fixed

Since XoverY assumes white illumination, its performance might suffer under other types of illumination. A solution is proposed by introducing a fixed skin tone, i.e. a standardized skin tone that has a low error with regard to actual skin tone, assuming white illumination. Conversely, by multiplying the color channels by this normalized skin tone, the algorithm can be made to work correctly independent of the color of the illuminant. The normalized signals then become:

$$X_s = 3R_n - 2G_n \tag{4.5}$$

$$Y_s = 1.5R_n + G_n - 1.5B_n \tag{4.6}$$

$$S = \frac{X_s}{Y_s} - 1 \tag{4.7}$$
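Equations 4.1 to 4.7 can be sketched in Python as follows. This is a minimal illustration, not the implementation used in this work: the inputs are assumed to be per-frame mean color values, and "normalized" is assumed to mean division by the temporal mean of the signal. All function names are hypothetical.

```python
from statistics import mean


def normalize(signal):
    """Divide every sample by the temporal mean (assumed normalization)."""
    m = mean(signal)
    return [v / m for v in signal]


def ratio_4_1(r, g):
    """Eq. 4.1 as printed: normalized green over normalized red, minus 1."""
    rn, gn = normalize(r), normalize(g)
    return [gv / rv - 1 for gv, rv in zip(gn, rn)]


def x_over_y(r, g, b):
    """Eqs. 4.2-4.4: ratio of the two normalized chrominance signals, minus 1."""
    x = [rv - gv for rv, gv in zip(r, g)]
    y = [0.5 * rv + 0.5 * gv - bv for rv, gv, bv in zip(r, g, b)]
    xn, yn = normalize(x), normalize(y)
    return [xv / yv - 1 for xv, yv in zip(xn, yn)]


def fixed(r, g, b):
    """Eqs. 4.5-4.7: chrominance signals built from normalized color channels."""
    rn, gn, bn = normalize(r), normalize(g), normalize(b)
    xs = [3 * rv - 2 * gv for rv, gv in zip(rn, gn)]
    ys = [1.5 * rv + gv - 1.5 * bv for rv, gv, bv in zip(rn, gn, bn)]
    return [xv / yv - 1 for xv, yv in zip(xs, ys)]


# A pure brightness change scales all channels by the same factor per frame.
k = [1.0, 1.2, 0.8, 1.1, 0.9]
r = [180 * v for v in k]
g = [120 * v for v in k]
b = [90 * v for v in k]
```

For this example input, which contains only a brightness change common to all channels, all three functions return values that are zero up to floating-point rounding, which is the motion-rejection property the ratios are designed for.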

XsminαYs

In the Fixed algorithm, the PPG signal is expected to be present in the difference between the Xs and Ys signals. However, this difference might be small compared to the fluctuations of Xs and Ys themselves. In turn, if the standardized skin tone is slightly off, this could still result in a disturbance in the resulting signal. Therefore, a final improvement is proposed by equalizing the standard deviation of both chrominance signals. For best results, this is performed on band-passed versions Xf and Yf of these signals. The resulting signal, referred to as XsminαYs, then becomes:

$$S = X_f - \alpha Y_f, \tag{4.8}$$

with

$$\alpha = \frac{\sigma(X_f)}{\sigma(Y_f)} \tag{4.9}$$
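Equations 4.8 and 4.9 translate almost directly into code. The sketch below assumes that the inputs are already band-passed chrominance signals of equal length; the band-pass filter itself is left out, and the function name is hypothetical.

```python
from statistics import pstdev


def xs_min_alpha_ys(xf, yf):
    """Return (S, alpha) with S = Xf - alpha * Yf and alpha = sigma(Xf) / sigma(Yf)."""
    alpha = pstdev(xf) / pstdev(yf)
    s = [x - alpha * y for x, y in zip(xf, yf)]
    return s, alpha


# A disturbance that appears in both signals, twice as strong in Yf.
xf = [1.0, -1.0, 2.0, -2.0]
yf = [2.0, -2.0, 4.0, -4.0]
s, alpha = xs_min_alpha_ys(xf, yf)
```

In this example alpha evaluates to approximately 0.5 and the common disturbance cancels, so S is approximately zero everywhere. With real data, the pulse component that differs between the two signals would remain.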


4.1.3 Empirical Mode Decomposition

Empirical Mode Decomposition (EMD) is a decomposition method that decomposes a single signal (as opposed to e.g. ICA, which uses multiple input signals) into a set of narrow-band components. The number of components is data-dependent. Assuming that the heart rate signal is narrow-band, i.e. has a near-constant frequency (for a suitably small time window), this method could potentially provide an improved separation of signal from artifacts [20].

Each component resulting from EMD is an Intrinsic Mode Function (IMF). An IMF is a signal with these two properties:

1. The number of extrema and zero-crossings differ by at most 1

2. The mean between the envelope of the local maxima and that of the local minima is zero
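The first IMF property can be checked with a few lines of Python. The sketch below is illustrative only: it treats strict local maxima and minima as extrema and strict sign changes as zero-crossings, so plateaus and samples that are exactly zero are ignored. The second property requires the spline envelopes and is not covered here. Function names are hypothetical.

```python
def count_extrema(signal):
    """Count strict local maxima and minima (end points are not counted)."""
    n = 0
    for prev, cur, nxt in zip(signal, signal[1:], signal[2:]):
        if (cur > prev and cur > nxt) or (cur < prev and cur < nxt):
            n += 1
    return n


def count_zero_crossings(signal):
    """Count sign changes between consecutive samples."""
    return sum(1 for a, b in zip(signal, signal[1:]) if a * b < 0)


def satisfies_first_imf_property(signal):
    """True if the extrema and zero-crossing counts differ by at most 1."""
    return abs(count_extrema(signal) - count_zero_crossings(signal)) <= 1


oscillation = [1, -1, 1, -1, 1]   # 3 extrema, 4 zero-crossings
riding_wave = [1, 2, 1, 2, 1]     # 3 extrema, no zero-crossings
```

The first example satisfies the property, while the second, an oscillation riding on an offset, does not and would need further sifting.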

The process with which EMD constructs these IMFs is called sifting. First, the upper and lower envelopes of the input signal are generated by connecting the extrema with cubic splines. The mean of the envelopes is then subtracted from the signal, resulting in a new signal. This process is then repeated for the new signal, until the remaining signal is an IMF (or a stopping criterion is reached). This IMF is then subtracted from the original signal, after which this process is repeated to construct more IMFs, until the remaining signal has become monotonic (or its amplitude falls below a certain threshold).

4.1.4 Singular Spectrum Analysis

Similar to EMD, Singular Spectrum Analysis (SSA) decomposes an input signal into multiple narrow-band components. It further aims to separate components into trends, periodic signals and noise [21]. In the case of SSA, noise is assumed to be non-periodic. This means that if the signal actually does contain periodic noise, some kind of additional classification of the resulting components is still required.

SSA consists of first constructing a matrix consisting of delayed versions of the input signal, which are called lagged vectors. The resulting matrix is called the trajectory matrix. Then, Singular Value Decomposition (SVD) is applied to this matrix, resulting in eigentriples. These are then grouped such that reconstructions with a small Euclidean distance are merged together. The resulting matrices are then diagonally averaged to construct the final decomposed time series.
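The first and last steps of SSA, embedding and diagonal averaging, can be sketched without any numerical library. The SVD and grouping steps are deliberately left out here; the window length and function names are hypothetical choices for illustration.

```python
def trajectory_matrix(series, window):
    """Build the trajectory matrix: column j holds series[j : j + window]."""
    k = len(series) - window + 1  # number of lagged vectors
    return [[series[i + j] for j in range(k)] for i in range(window)]


def diagonal_average(matrix):
    """Average over anti-diagonals to turn a matrix back into a time series."""
    rows, cols = len(matrix), len(matrix[0])
    sums = [0.0] * (rows + cols - 1)
    counts = [0] * (rows + cols - 1)
    for i in range(rows):
        for j in range(cols):
            sums[i + j] += matrix[i][j]
            counts[i + j] += 1
    return [s / c for s, c in zip(sums, counts)]


series = [1, 2, 3, 4, 5]
matrix = trajectory_matrix(series, window=3)   # [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
restored = diagonal_average(matrix)            # [1.0, 2.0, 3.0, 4.0, 5.0]
```

Applied to the unmodified trajectory matrix, diagonal averaging simply returns the input series. In SSA proper it is applied to each grouped matrix obtained from the SVD, which yields one component time series per group.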

4.1.5 Implementation and evaluation

The chrominance algorithms have been implemented in Matlab. For ICA, EMD and SSA, existing Matlab implementations have been used [22][23]. Since the latter three methods result in multiple components, further processing is necessary to select one component for heart rate extraction. For ICA, the component with the highest periodicity within the heart rate range (0.6-3 Hz) is selected. For EMD and SSA, this is less appropriate, because components are narrow-band and might not be proportionally scaled. Instead, for these methods, the first component (i.e. the one carrying the highest frequency) whose dominant frequency falls within the heart rate range is selected.

The different algorithms are finally compared using two separate metrics:

• Signal-to-Noise Ratio (SNR): the power of the frequency spectrum around the reference heart rate frequency (and the first harmonic) is compared to the power in the remainder of the spectrum (within the heart rate range). This makes it possible to determine whether the heart rate frequency is significantly represented in the signal.

• Mean error (ME): the heart rate is expected to be represented by the peak frequency within the heart rate range. By comparing this frequency with the reference frequency, the accuracy of the signal can be assessed.
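Both metrics rely on the power spectrum within the heart rate range. A minimal Python sketch is given below, using a naive DFT to stay within the standard library. Several details are assumptions that the text does not specify: the width of the band around the reference frequency (here ±0.1 Hz), the interpretation of "first harmonic" as twice the reference frequency, and the expression of the SNR in decibels. Function names are hypothetical.

```python
import cmath
import math


def power_spectrum(signal, fs):
    """Naive DFT power spectrum of the mean-removed signal (positive bins only)."""
    n = len(signal)
    m = sum(signal) / n
    centered = [v - m for v in signal]
    freqs, power = [], []
    for k in range(1, n // 2 + 1):
        coeff = sum(v * cmath.exp(-2j * math.pi * k * i / n)
                    for i, v in enumerate(centered))
        freqs.append(k * fs / n)
        power.append(abs(coeff) ** 2)
    return freqs, power


def peak_bpm(signal, fs, band=(0.6, 3.0)):
    """Heart rate estimate: strongest frequency inside the heart rate range."""
    freqs, power = power_spectrum(signal, fs)
    in_band = [(p, f) for f, p in zip(freqs, power) if band[0] <= f <= band[1]]
    return 60.0 * max(in_band)[1]


def snr_db(signal, fs, ref_bpm, band=(0.6, 3.0), half_width=0.1):
    """Power near the reference frequency and 2x that frequency vs. the rest."""
    freqs, power = power_spectrum(signal, fs)
    f0 = ref_bpm / 60.0
    sig = noise = 0.0
    for f, p in zip(freqs, power):
        if not band[0] <= f <= band[1]:
            continue
        if abs(f - f0) <= half_width or abs(f - 2 * f0) <= half_width:
            sig += p
        else:
            noise += p
    return 10.0 * math.log10(sig / noise)


# Synthetic example: 10 s at 20 Hz, a 1.2 Hz "pulse" plus a weaker 2.0 Hz disturbance.
fs = 20
signal = [math.sin(2 * math.pi * 1.2 * i / fs) + 0.1 * math.sin(2 * math.pi * 2.0 * i / fs)
          for i in range(200)]
```

For this synthetic signal the peak lies at 1.2 Hz (72 BPM), and the SNR with respect to a 72 BPM reference is close to 20 dB, since the disturbance has one tenth of the pulse amplitude.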


The reason for using these two metrics is that, while the peak frequency might not represent the heart rate, the heart rate could still be present in the signal to a certain extent. Alternatively, even if the peak frequency does represent the heart rate, different algorithms might provide a different relative strength. Using both metrics together allows assessment of the performance of all algorithms relative to each other in either case.

4.1.6 Results

Static   Ref   Green   RoverG   XoverY   Fixed   XsminαYs   ICA   EMD   SSA
1        54    48      47       63       50      65         47    98    45
2        62    72      80       50       85      52         77    140   38
3        69    51      70       56       99      58         58    84    45
4        59    60      77       77       67      84         53    60    60
5        84    84      55       84       63      65         92    107   39
ME       -     7       15       10       17      15         9     32    21

Table 4.1: Results using different algorithms on static subjects, along with reference.

Talking  Ref   Green   RoverG   XoverY   Fixed   XsminαYs   ICA   EMD   SSA
1        72    48      102      72       82      67         99    64    40
2        61    49      71       54       61      81         68    99    49
3        75    52      71       76       93      78         81    68    41
4        67    66      79       85       86      56         84    160   67
5        93    59      76       60       67      88         103   91    59
ME       -     19      15       12       15      9          13    30    22

Table 4.2: Results using different algorithms on talking subjects, along with reference.

Walking  Ref   Green   RoverG   XoverY   Fixed   XsminαYs   ICA   EMD   SSA
1        77    55      63       68       52      72         68    111   55
2        73    105     51       55       53      65         77    105   105
3        96    46      73       52       58      55         99    40    38
4        69    60      60       62       49      89         54    119   60
5        112   61      48       55       48      56         66    121   121
ME       -     33      26       27       33      26         15    36    26

Table 4.3: Results using different algorithms on walking subjects, along with reference.

Tables 4.1, 4.2 and 4.3 show the BPM estimates for 5 subjects for the different algorithms being evaluated under each of the three conditions, along with the reference value based on a pulse oximeter signal, and the mean error between these values. Figures 4.1, 4.2 and 4.3 show the SNR for these algorithms across the same 5 subjects.
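The ME rows in the tables appear to be the mean absolute difference between the estimates and the reference, rounded to whole BPM. As a worked example, the Green column of Table 4.1 can be checked with a short Python snippet (the function name is hypothetical):

```python
def mean_absolute_error(estimates, references):
    """Mean of the absolute differences between estimates and references."""
    return sum(abs(e - r) for e, r in zip(estimates, references)) / len(references)


reference = [54, 62, 69, 59, 84]  # 'Ref' column of Table 4.1
green = [48, 72, 51, 60, 84]      # 'Green' column of Table 4.1

# Absolute errors are 6, 10, 18, 1 and 0, which sum to 35.
print(mean_absolute_error(green, reference))  # 7.0
```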

4.1.7 Discussion

For the Static condition, frequency analysis of the green channel offers the lowest mean error, followed by ICA and the XoverY algorithm. However, for the Talking condition, XsminαYs offers the lowest mean error, followed by XoverY. The simple green channel analysis now performs significantly worse. For the Walking condition, the ICA mean error is lowest, followed by RoverG and XsminαYs. The green channel again does not perform well on mean error in this condition, and looking at the SNR graphs, it is clear that this method is least robust to disturbances.

Between ICA, RoverG, XoverY and XsminαYs, taking into consideration that Talking is likely a better representation than Static of people making small involuntary movements during a phone call (although subjects were not specifically instructed to keep still in either condition), XsminαYs performs best, having the lowest mean error in that condition and generally a better SNR than the other algorithms, save for ICA. The performance of ICA remains notable, especially for the Walking condition.

Figure 4.1: SNR for 5 static subjects using different algorithms.

4.1.8 Conclusion

The XsminαYs algorithm has the best performance in the ’phone call’ configuration compared to other chrominance methods and ICA. The results also show that this algorithm offers both a good starting and reference point, as well as leaving room for improvement.


Figure 4.2: SNR for 5 talking subjects using different algorithms.

Figure 4.3: SNR for 5 walking subjects using different algorithms.


4.2 Partial image processing

Because of the ear’s geometry, and the possible variation in position of the phone with respect to the ear, it is possible that some parts of the image contain a clearer PPG signal than other parts. Averaging over the full frame in such a case means that the resulting signal is of lesser quality than if only the correct parts of the image were selected. The challenge is detecting which part of the image to select, since there is no a-priori knowledge about the spatial quality distribution.

In the first subsection, non-uniform segmentation will be discussed. The remainder of this section will then treat uniform, i.e. block-based processing.

4.2.1 Non-uniform segmentation

Because of the non-uniform geometry of the ear, and its relation to the camera and light source, different parts of the image could have different signal parameters. For example, even if the overall image might have good lighting conditions, some parts could be under- or overexposed. These parts will likely have a lower SNR than well-exposed parts of the image, leading to a lower average SNR. To counter this effect, ranked-order filtering can be used. A ranked-order filter uses elements based on their ranking of a certain feature, such as brightness. In this case, the mean color value can be calculated based on only a subset, e.g. 60%, of the pixels, where both the dimmest and the brightest pixels are discarded.

The percentage of pixels used for this step constitutes yet another opportunity for optimization, as it affects performance. This has however not been performed as part of this project. While generally a lower percentage will allow filtering out larger erroneous parts of the image while not necessarily negatively affecting the performance in case most of the image is correctly lit, a larger percentage is chosen in this particular case to still take advantage of the spatial averaging this allows, in view of the general low-light (and therefore noisy) conditions of the application.

Figure 4.4: a) Still from a recording, showing dynamic range across different parts of the ear. b) Mask superimposed over frame, indicating selected pixels for ranked order brightness processing.

Because the pixel brightness will vary over time, ranking the pixel intensities for each frame and selecting the middle 60% would likely result in different pixels being selected between frames. This would effectively mean that the ROI could (drastically) change every frame, which is not desirable here. Therefore, windowing is applied and only pixels that remain in the 60% subset over the full window are selected as ROI. Figure 4.4 shows an example of a frame with a dynamic range related to the ear geometry, and the pixels with a medium intensity across a 400-frame window.

The pixels in this ROI are then processed with the XsminαYs algorithm. Finally, majority voting is used across the windows to determine the heart rate estimate for the recording.
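The windowed ranked-order selection can be sketched as follows. This is a minimal illustration assuming that each frame is given as a flat list of pixel brightness values with a fixed pixel order; the 60% fraction follows the text, while the function names are hypothetical.

```python
def mid_rank_mask(frame, keep=0.6):
    """Indices of the pixels in the middle `keep` fraction of the brightness ranking."""
    n = len(frame)
    order = sorted(range(n), key=lambda i: frame[i])  # darkest to brightest
    drop = int(round(n * (1 - keep) / 2))             # pixels dropped at each end
    return set(order[drop:n - drop])


def stable_roi(frames, keep=0.6):
    """Pixels that stay in the middle-rank subset for every frame of the window."""
    roi = None
    for frame in frames:
        mask = mid_rank_mask(frame, keep)
        roi = mask if roi is None else roi & mask
    return roi or set()


# Two tiny 10-pixel "frames": pixels 7 and 9 swap brightness between frames.
frame_a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
frame_b = [0, 1, 2, 3, 4, 5, 6, 9, 8, 7]
roi = stable_roi([frame_a, frame_b])  # {2, 3, 4, 5, 6}
```

Pixel 7 is selected in the first frame only and pixel 9 in the second frame only, so neither ends up in the ROI. Intersecting over a longer window can shrink the ROI considerably, which is one reason to keep the retained fraction large.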

4.2.2 Uniform segmentation (block processing)

For pixel brightness, the image could directly be segmented using the feature itself as segmentation requirement. Essentially, in that case the segmentation and processing steps are combined. For derived features however, such as local SNR (regardless of relative brightness), a separate segmentation needs to be performed before the processing step. This can be done by dividing the frame into smaller blocks. For every block, one or more metrics can then be derived, and based on their value, a decision to include a block for further processing can be made.

Metrics and classification

• Majority voting: each block can be interpreted as an individual video signal that can beprocessed to obtain a heart rate value. Then, the most common heart rate value across allblocks could be interpreted as being correct. This most basic method of block processingassumes that the highest signal-to-noise ratio is only present in a minority of the blocks. Thismight be the case in some situations, e.g. at frame boundaries or small parts of the framethat are not optimally exposed, even if settings are optimal for the full frame in general.By excluding these outliers, the overall SNR could be improved. However if, conversely,a minority of blocks would have the highest SNR, using these method, these would bediscarded. The challenge in such a case would be to properly classify these blocks, which isthe focus of the remaining methods

• SNR: by assessing the block SNR, the blocks with the highest SNR can be selected for further processing. In this instance, since the actual signal is not yet known, the SNR has to be defined as the ratio between the dominant frequency peak in a restricted frequency range and the remaining power in that frequency range.

• Spatial brightness variability: the spatial makeup of a block can contain information about the likelihood that it will contain a clean PPG signal. For example, due to the ear's geometry, some parts of the image could contain 'highlights', i.e. parts of the ear closest to the camera and usually with low perfusion, see Figure 4.5. These blocks are likely to contain a low PPG signal strength, but can induce strong noise due to high-contrast parts moving in and out of the block (or frame) due to motion of the camera (and light) relative to the ear. By looking at the brightness variance across a block, a measure for its spatial smoothness can be obtained, and high-contrast areas can be identified.

• Temporal brightness variability: during movement of the camera (and light) with respect to the ear, some blocks might capture different parts of a 'uniform' region of the ear, while some other blocks might capture high-contrast parts. While movement along high-contrast parts will lead to strong artifacts, blocks capturing movement across uniform regions might still be able to provide a reliable PPG signal. To this end, an appropriate metric could be the brightness variance over time of a block.

• EMD: by performing EMD on each block (see Section 4.1.3), individual components with a dominant frequency within the heart rate range can be selected. In this case, components with the highest frequency can be selected. Alternatively, the component with the highest frequency could be selected, assuming noise is typically of a lower frequency than heart rate. Note that, due to the way EMD components are constructed, higher harmonics of the heart rate will not be present as components.

• SSA: classification using SSA (see Section 4.1.4) is similar to using EMD.
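As an illustration of the SNR, spatial variability and temporal variability metrics listed above, the following is a minimal sketch and not the implementation used in this work. It assumes a block is available as a NumPy array of shape (frames, height, width); the function names, the 0.7 Hz to 2 Hz band defaults and the dB scaling are assumptions made for the example.

```python
import numpy as np

def block_snr_db(trace, fps, f_lo=0.7, f_hi=2.0):
    """Dominant in-band peak power over the remaining in-band power, in dB."""
    trace = np.asarray(trace, dtype=float)
    power = np.abs(np.fft.rfft(trace - trace.mean())) ** 2
    freqs = np.fft.rfftfreq(trace.size, d=1.0 / fps)
    in_band = power[(freqs >= f_lo) & (freqs <= f_hi)]
    peak = in_band.max()
    rest = in_band.sum() - peak
    # Guard against division by zero for a perfectly clean synthetic trace.
    return 10.0 * np.log10(peak / max(rest, 1e-12))

def block_metrics(block_video, fps):
    """block_video: array of shape (frames, height, width) holding one block."""
    block_video = np.asarray(block_video, dtype=float)
    trace = block_video.mean(axis=(1, 2))  # spatial mean per frame
    return {
        "snr_db": block_snr_db(trace, fps),
        # Spatial brightness variance, averaged over all frames.
        "spatial_var": block_video.var(axis=(1, 2)).mean(),
        # Variance over time of the block's mean brightness.
        "temporal_var": trace.var(),
    }
```

A block would then be included or rejected by comparing one of these values against a threshold, which is exactly the decision the metrics are meant to support.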


Figure 4.5: Example of a highlight.

To assess whether these metrics are suitable for deciding whether a block should be included for further processing, the correlation of each metric with the absolute difference between the dominant frequency (between 0.7 Hz and 2 Hz) and the reference heart rate is calculated.
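The assessment described above can be sketched as follows. This is a hedged example rather than the code used for Figure 4.7: the function names are hypothetical, the Pearson correlation coefficient is assumed as the correlation measure, and the dominant frequency is converted from Hz to BPM so that it can be compared with the reference heart rate.

```python
import numpy as np

def dominant_bpm(trace, fps, f_lo=0.7, f_hi=2.0):
    """Dominant frequency of a block trace within [f_lo, f_hi] Hz, in BPM."""
    trace = np.asarray(trace, dtype=float)
    spectrum = np.abs(np.fft.rfft(trace - trace.mean()))
    freqs = np.fft.rfftfreq(trace.size, d=1.0 / fps)
    band = (freqs >= f_lo) & (freqs <= f_hi)
    return 60.0 * freqs[band][np.argmax(spectrum[band])]

def metric_error_correlation(metric_values, block_traces, fps, reference_bpm):
    """Pearson correlation between a block metric and the absolute error."""
    errors = [abs(dominant_bpm(t, fps) - reference_bpm) for t in block_traces]
    return float(np.corrcoef(metric_values, errors)[0, 1])
```

A metric is only useful for classification if this correlation is clearly and consistently non-zero across subjects.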

4.2.3 Results

Figure 4.6: Results of selected pixels, excluding the 40% of pixels with the lowest or highest brightness, together with the reference.

Figure 4.6 shows the results for the non-uniform segmentation using a brightness ranked-order filter, excluding the 20% brightest and the 20% darkest pixels. Table 4.4 shows the absolute errors of full-frame processing and of ranked brightness region selection with respect to the reference value. The ranked-order results show an overall improvement for the Static and Talking conditions. For the Static recording of subject 5, the error is high both with and without region selection, pointing


          static        talking       walking
subject   full  rank    full  rank    full  rank
1         11    0       5     14      5     22
2         10    10      20    1       8     14
3         11    -       3     -       41    -
4         25    4       11    5       20    14
5         19    36      5     15      56    66

Table 4.4: Absolute error with respect to the reference [BPM] for full-frame processing as well as ranked brightness region selection.

Figure 4.7: Correlation between the absolute difference with the reference heart rate and the different block metrics.

towards a low SNR issue (the full region estimate, while better, might still be based on an artifact rather than heart rate). For the Walking condition, performance is worse, likely due to the smaller stable region.

For recordings 7, 8 and 9 (subject 3), there is no heart rate estimate. This is because the mask is empty, which in turn means that no pixels were part of the retained 60% of pixels (after excluding the 20% brightest and the 20% darkest) during any full window duration. Although in this case this method does not improve signal extraction, it might be a good indicator that the signal is so close to noise level that it should be rejected, rather than allowed to enter any kind of extraction algorithm. As such, this pre-processing step could still be valuable for overall accuracy improvement.

The correlation between the absolute difference with the reference heart rate and the different block classification metrics is shown in Figure 4.7. We first note that for all metrics, correlation is rather low across subjects. Additionally, we note that correlation is not consistently positive or negative across subjects for any of the metrics.

4.2.4 Discussion

Selecting pixels with a limited brightness range could potentially worsen performance if extreme values are directly caused by the PPG signal, or when the dynamic range across the frame is low. Furthermore, the window length for pixel selection constitutes a trade-off between allowing more pulse periods in a window, and allowing better spatial averaging by having more pixels in the mask itself.

With regard to block processing, unfortunately none of the defined metrics correlates enough with the reference heart rate to allow for reliable classification of blocks based on any of these metrics.
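The ranked-order pixel selection and the window-length trade-off discussed in this section can be illustrated with a minimal sketch. It assumes that a pixel is kept only if its brightness stays between the 20th and 80th percentile of its frame for every frame in the window; the function names and the use of per-frame quantiles are assumptions for the example, not a description of the original implementation.

```python
import numpy as np

def rank_mask(frames, low=0.2, high=0.8):
    """Boolean (height, width) mask of pixels that stay inside the retained
    brightness rank range in every frame of the window.

    frames: array of shape (n_frames, height, width) with pixel brightness.
    """
    frames = np.asarray(frames, dtype=float)
    lo = np.quantile(frames, low, axis=(1, 2), keepdims=True)
    hi = np.quantile(frames, high, axis=(1, 2), keepdims=True)
    inside = (frames >= lo) & (frames <= hi)
    return inside.all(axis=0)  # a longer window can only shrink the mask

def masked_trace(frames, mask):
    """Mean brightness over the masked pixels per frame, or None if empty."""
    frames = np.asarray(frames, dtype=float)
    if not mask.any():
        return None  # empty mask: no heart rate estimate for this window
    return frames[:, mask].mean(axis=1)
```

Because the mask is the intersection over all frames, lengthening the window adds pulse periods but removes pixels, which is the trade-off noted above; an empty mask corresponds to the missing estimates in Table 4.4.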


4.3 Motion detection

Motion of the camera with respect to the ear is a major source of noise. Due to their close proximity, even small non-deliberate movements of the head and/or the hand holding the phone can lead to drastic differences in the video image. Therefore, detecting motion can be valuable for selecting video segments that are suitable for heart rate detection, and rejecting segments that are likely to be noisy.

4.3.1 Video analysis

One way of detecting motion is through analysis of the video signal itself. Specifically, the Sum of Absolute Differences (SAD) between two frames can be used as a metric for change in an image. Changes larger than those resulting from the (expected) PPG signal, i.e. changes exceeding a threshold, can then be attributed to motion. This makes it possible to select signal segments that remain below this threshold for a certain minimal amount of time, i.e. long enough to contain several heart beat periods.
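A minimal sketch of the SAD metric between consecutive frames is given here, assuming grayscale frames stacked in a NumPy array of shape (frames, height, width). The function names and the threshold value passed in by the caller are assumptions for the example.

```python
import numpy as np

def sad_per_frame(frames):
    """Sum of Absolute Differences between each pair of consecutive frames."""
    # Cast first: differences of unsigned 8-bit pixels would otherwise wrap.
    frames = np.asarray(frames, dtype=float)
    return np.abs(np.diff(frames, axis=0)).sum(axis=(1, 2))

def is_stable(sad, threshold):
    """True where the frame-to-frame change stays below the motion threshold."""
    return np.asarray(sad) < threshold
```

For a recording of N frames this yields N - 1 SAD values, one per frame transition.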

4.3.2 Motion sensors

Another way to assess motion is by using the motion sensors that are present on the platform by default. The advantage is that this requires less processing than using the video signal. A potential downside could be that movement of the phone could still result in a stable video signal, as long as the position of the camera with respect to the ear doesn't change. However, in practice, such movement very often does result in a change in relative position as well.

Since motion data for detecting periods of movement does not need to be sampled at a very high frequency, the motion sensor values are polled whenever a frame arrives. A threshold again determines whether motion is within an acceptable range, and segments of sufficient length where this is the case are selected for further processing.

4.3.3 Stable period selection

For each 1-minute recording, both video and motion signal metrics are computed (Figure 4.8). Then an appropriate threshold is applied to each (based on visual inspection, constant across all subjects), see Figure 4.9. Next, stable periods below a minimum length are discarded, so that only periods containing at least 1 heart rate period are considered (Figure 4.10). For a general application, each period of sufficient length could be evaluated. For the purpose of assessing the potential benefit of this method however, the longest period is selected for further processing, assuming a longer (stable) recording will allow for the best heart rate estimation.
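The selection steps above (thresholding, discarding short stable streaks, keeping the longest one) can be sketched in a few lines. The function names are hypothetical, and the threshold and minimum length are left as parameters because the text only states that they were chosen by visual inspection.

```python
def stable_runs(metric, threshold, min_len):
    """Return (start, end) index pairs, end exclusive, of runs where the
    motion metric stays below the threshold for at least min_len samples."""
    runs, start = [], None
    for i, value in enumerate(metric):
        if value < threshold:
            if start is None:
                start = i  # a new stable streak begins
        elif start is not None:
            if i - start >= min_len:
                runs.append((start, i))
            start = None
    # Close a streak that lasts until the end of the recording.
    if start is not None and len(metric) - start >= min_len:
        runs.append((start, len(metric)))
    return runs

def longest_run(runs):
    """The longest stable period, or None if no period is long enough."""
    return max(runs, key=lambda r: r[1] - r[0]) if runs else None
```

The same routine applies to either the video-based metric or the motion sensor signal, since both are reduced to one value per frame.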


Figure 4.8: Motion quantification over duration of video recording (video-based).

Figure 4.9: Thresholded motion quantification over duration of video recording.

4.3.4 Results

Table 4.5 shows the results when selecting segments of the video signal with limited motion, based on either the video signal itself or the gyroscope.

These results show that in some cases, there is little to no change between processing only a stable part of the segment and processing the full segment. In other cases, both stability detection methods provide a more accurate estimation than assessment of the full segment. Lastly, there are some cases where the estimate based on video stability differs significantly from that based on motion stability. This could be due to the thresholds that have been selected for each method, resulting in a slightly different delineation of periods of physical motion by each method. Fine-tuning of these thresholds might therefore bring the results of both methods more into agreement. However, the video-based method, while having the advantage of being based on the same signal that is used for the PPG extraction, will suffer from (parts of) the image pixel intensities being close to zero or saturation level. In the case of disagreement between the results of both methods,


Figure 4.10: Removal of stable streaks below a minimal length.

          static                  talking                 walking
subject   ref  full  video gyro   ref  full  video gyro   ref  full  video gyro
1         54   65    65    65     72   67    66    54     77   72    46    84
2         62   52    68    68     61   81    81    82     73   65    48    71
3         69   58    58    58     75   78    78    78     96   55    55    103
4         59   84    84    88     67   56    56    70     69   89    88    65
5         84   65    92    92     93   88    87    88     112  56    57    65
ME             15.2  12.2  13          8.8   9.2   10          26    34.2  13.4

Table 4.5: Results for selecting video segments with limited motion.

the gyroscope-based method is generally more accurate. Interestingly, motion detection using the gyroscope allows stable sections to be extracted even during a continuous walking motion.

4.3.5 Conclusion

The results presented in this section show that rejecting parts of the footage based on a stability metric is an effective way to reject noisy parts of the signal that would otherwise result in inaccurate heart rate estimates. Specifically, using a metric based on the gyroscope signal allows for more robust motion rejection, even when the video signal is already noisy due to exposure artifacts.


4.4 Video motion compensation

Using motion detection (Section 4.3), it is possible to reject parts of the video signal to prevent incorrect heart rate estimates. However, using video motion compensation it might be possible to recover some of the PPG signal in video segments containing motion artifacts. To this end, three motion compensation techniques have been explored: two block-based methods, and one plane-based method. These methods provide different trade-off points between computational complexity and accuracy. It is therefore relevant to assess the difference in performance between these methods for this particular application.

4.4.1 Block-based

Block-based motion estimation works by dividing the input image into blocks, and then matching these blocks to a second image. The difference in the blocks' positions is assumed to represent their respective movement.

Various block-based motion estimation algorithms exist, each with different strengths. The main trade-off for these algorithms is generally between speed and accuracy, although this also depends on the contents of the video material. Here, 3DRS was used, which offers high accuracy at low computational cost [24].

4.4.2 Planar

Since the ear is the only object that is assumed to be in the camera frame, the assumption could be made that all parts of the image exhibit related movement. In that case, it might be possible to detect features in the full image, and track these across frames to calculate the Rigid Transform matrices –extend–. Note that the latter is required to compensate for rotation and perspective transformation, which would otherwise cause different parts of the frame to exhibit different motion vectors within the image, even if they are actually on the same plane. The OpenCV framework –REF– has been used for the feature detection, Rigid Transform calculation, and inverse image transforms.

4.4.3 Method of assessment

Because images can drift out of frame, and tracking accuracy can deteriorate over time (or rather, over the number of frames), tracking duration should not be too long. However, for accurate heart rate extraction, duration should also not be too short, preferably such that it can contain at least a couple of heart beat periods. For assessment of the different methods, both 4s and 8s segments (with 50% overlap) will be compared. After each segment, tracking is reset.

Because of motion, correction can lead to pixels that cannot be tracked throughout the full segment duration. This results in parts of the image 'disappearing' over the time course of the segment. Simply taking the average of each full frame (equaling the average of 'trackable' pixels in each frame) will however still contain a strong motion component, since pixels will 'disappear' due to motion. Therefore, only pixels that can be tracked across the full window, i.e. that carry a signal throughout the full window, are included in the average. An example of masks for windows of 4s and 8s using 3DRS on a static subject is shown in Figure 4.11.
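The windowing and masking described above can be sketched as follows. This is an illustrative example under stated assumptions: the motion-compensated frames and a per-frame boolean validity map (True where a pixel could still be tracked) are assumed to be given, and the function names are hypothetical.

```python
import numpy as np

def window_bounds(n_frames, fps, window_s, overlap=0.5):
    """(start, end) frame indices, end exclusive, of overlapping windows."""
    size = int(round(window_s * fps))
    step = max(1, int(round(size * (1.0 - overlap))))
    return [(s, s + size) for s in range(0, n_frames - size + 1, step)]

def tracked_trace(frames, valid):
    """Average only over pixels that are trackable in every frame.

    frames, valid: arrays of shape (n_frames, height, width) for one window.
    Returns one mean value per frame, or None if no pixel survives.
    """
    frames = np.asarray(frames, dtype=float)
    mask = np.asarray(valid, dtype=bool).all(axis=0)
    if not mask.any():
        return None  # no part of the image is visible during the whole window
    return frames[:, mask].mean(axis=1)
```

Tracking would be reset at the start of each window returned by `window_bounds`, so a longer window gives pixels more time to drift out of the mask.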


Figure 4.11: Motion compensation masks using 3DRS for a static subject and window lengths of 4s (a) and 8s (b).

          static                   talking                  walking
subject   ref  raw   3DRS  Planar  ref  raw   3DRS  Planar  ref  raw   3DRS  Planar  SPM
1         54   65    134   102     72   67    88    88      77   72    130   102     105
2         62   52    46    123     61   81    120   88      73   65    105   60      103
3         69   58    120   63      75   78    70    91      96   55    81    120     106
4         59   84    81    102     67   56    113   91      69   89    67    81      115
5         84   65    109   84      93   88    88    53      112  56    60    123     121
ME             15.2  38.8  31.6         8.8   26.2  24.6         26    30.8  17

Table 4.6: Results for block-based and planar motion compensation on 4s windows.

          static                   talking                  walking
subject   ref  raw   3DRS  Planar  ref  raw   3DRS  Planar  ref  raw   3DRS  Planar  SPM
1         54   65    84    130     72   67    -     141     77   72    88    88      105
2         62   52    67    81      61   81    56    63      73   65    77    70      103
3         69   58    53    74      75   78    49    127     96   55    46    67      106
4         59   84    49    120     67   56    -     88      69   89    -     60      115
5         84   65    -     49      93   88    -     95      112  56    -     88      121

Table 4.7: Results for block-based and planar motion compensation on 8s windows.

4.4.4 Results

Results are shown in Tables 4.6 and 4.7 for 4s and 8s windows, respectively. Note that for some cases, there is no heart rate for the 3DRS method when using 8s windows. This means that (estimated) motion during the 8s window is such that no part of the image is visible during the whole window. For the Walking condition, the pace in steps per minute (SPM) is shown as well.

Although in a few cases motion compensation seems to improve the estimate (e.g. subject 5 static, 4s for Planar; subject 2 static, 8s for 3DRS), most motion-compensation estimates deviate strongly from the reference heart rate. The lack of clear features (see also Figure 3.12) is a likely cause for estimation errors. One possible solution could be to assess the motion estimates and further limit the allowed (speed of) position changes. However, since small physical movements can lead to significant position changes of an ROI, such a limitation would also limit the potential benefit of such a method. Alternatively, an assessment of the motion in a particular segment (such as discussed in Section 4.3), together with an assessment of the quality of (potential) features in the image, could be used to determine whether motion compensation should be performed on this segment. This could potentially allow a portion of segments that would otherwise be rejected for processing to still be used for acquiring a reliable estimate. However, since compensation of significant motion, which is a main source of noise, would still not be reliably possible, the fraction of segments able to potentially benefit from motion compensation would likely be small.


4.5 Backlight strobing

4.5.1 Methodology

The backlight is only enabled every other frame. This results in two interleaved video signals:

• ambient light

• ambient light + backlight

However, the aim is to obtain a signal based on the backlight illumination alone. Therefore, the two interleaved signals are separated (see Figure 4.13), normalized, and used as inputs to an ICA algorithm (see Section 4.1.1). Then, the correlation between each of the two resulting components and the ambient light signal is computed. The component having the lowest correlation is then selected as the component most based on the signal resulting from the backlight illumination.
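The separation and component selection steps can be sketched as follows, leaving out the ICA step itself (any ICA implementation returning two components could be placed in between). Several points are assumptions made for the example: that the backlight is on for odd-numbered frames, that normalization means zero mean and unit variance, and that the absolute correlation is compared because the sign of an ICA component is arbitrary. All function names are hypothetical.

```python
import numpy as np

def split_strobed(trace, backlight_on_odd=True):
    """Split an interleaved trace into (lit, ambient) traces of equal length."""
    trace = np.asarray(trace, dtype=float)
    even, odd = trace[0::2], trace[1::2]
    n = min(even.size, odd.size)  # drop a trailing unpaired frame
    even, odd = even[:n], odd[:n]
    return (odd, even) if backlight_on_odd else (even, odd)

def normalize(x):
    """Zero-mean, unit-variance version of a trace."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

def pick_backlight_component(components, ambient):
    """Index of the component least correlated with the ambient trace."""
    corr = [abs(np.corrcoef(c, ambient)[0, 1]) for c in components]
    return int(np.argmin(corr))
```

Note that each of the two separated traces has half the frame rate of the original recording.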

Figure 4.12: Example of alternating image capture with and without backlight.

4.5.2 Assessment

To assess the performance of the strobing method, the extracted backlight-illuminated component spectrum is compared to that of the signal containing both environmental and backlight illumination. The results are shown in Figure 4.14.


Figure 4.13: a) Trace of green channel. b) Traces with stride, showing individual traces with and without backlight.

The extracted component has an SNR of -12.5 dB, compared to -14.6 dB for the raw signal containing both backlight and environmental light, showing that strobing can indeed provide an improved signal.

However, since the operating system does not allow explicit synchronous control of the screen and camera, it is not possible to guarantee when frames are captured or when the screen or backlight is toggled. When the two are not in sync, this leads to artifacts due to bleed-over between the signals that are supposed to be interleaved. Stable synchronization could not be achieved above 20 Hz, resulting in an effective sampling rate of 10fps. This is a significant difference with the maximum of 60fps that could otherwise be attained.

Figure 4.14: a) Spectra of traces with and without backlight. b) Spectrum of extracted component.

4.5.3 Conclusion

Strobing allows an improvement of the SNR compared to a raw signal with the same frame rate. However, the implementation comes at the cost of reduced temporal resolution: at least by half, but currently more due to technical limitations. If future hardware or software would allow for better synchronization between screen or backlight and image capture, this could therefore prove a viable method for signal improvement. However, currently, a sampling rate of 60fps will result in a more reliable signal than a derived signal of 10fps.


4.6 Heart rate and respiratory rate interaction

We recall that the plethysmographic signal, besides containing the cardiac pulse signal, is furthermore modulated by respiration. This can clearly be seen in the pulse oximeter PPG signal shown in Figure 4.15. Respiratory rate can be indicative of a developing health issue [25][26][27], and would therefore be a valuable metric to extract from the video signal as well.

Figure 4.15: Pulse oximeter PPG signal showing low-frequency effect of breathing on the pulse signal.

Additionally however, this modulation of the heart rate could also help in more reliably identifying the heart rate peak in the frequency spectrum. As shown in Figure 4.16, the modulation due to respiration causes side-bands located at fh ± fb, where fh is the heart rate frequency and fb the breathing frequency. The identification of such a pattern can increase the likelihood that a frequency peak in the extracted signal is indeed caused by the changes in blood volume.

To this end, the following steps are performed in a windowed approach, applied to the signal obtained by using the XminαY algorithm:

All peaks that have adjacent peaks at a distance equal (up to a margin ε) to the frequency of a peak also present in the signal are identified. Then, the peak exhibiting the largest amplitude difference with the relevant adjacent peaks is selected as the peak most likely to represent the heart rate. The rationale for this is that, due to the low frequency of the typical breathing rate and the fact that the heart rate signal (or another signal) might not be completely periodic within the window period, smaller differences in amplitude between multiple peaks might actually be the result of frequency analysis inaccuracies rather than being caused by modulation between two frequencies.

Note that it is also possible that no peaks conforming to this pattern are found in a certain window. An example of the resulting estimates is shown in Figure 4.17. Further processing to perform outlier detection, such as averaging or majority voting, can then be used to further improve reliability.
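One possible reading of the steps above is sketched here for a single window, given its amplitude spectrum. Several choices are assumptions made for the example and are not stated in the text: breathing peak candidates are taken below the lower edge of the heart rate band, both a lower and an upper side-band are required, the "amplitude difference" is taken as the central peak amplitude minus the larger side-band amplitude, and this difference must be positive. All names are hypothetical.

```python
import numpy as np

def local_peaks(amps):
    """Indices of strict local maxima of an amplitude spectrum."""
    return [i for i in range(1, len(amps) - 1)
            if amps[i] > amps[i - 1] and amps[i] > amps[i + 1]]

def strongest_near(freqs, amps, peaks, target, eps):
    """Index of the strongest peak within eps of target, or None."""
    near = [p for p in peaks if abs(freqs[p] - target) <= eps]
    return max(near, key=lambda p: amps[p]) if near else None

def sideband_heart_rate(freqs, amps, eps=0.05, hr_band=(0.7, 2.0)):
    """Frequency (Hz) of the in-band peak with the clearest fh +/- fb pattern,
    or None if no peak in this window conforms to the pattern."""
    peaks = local_peaks(amps)
    best, best_margin = None, 0.0
    for c in peaks:  # candidate heart rate peak
        if not hr_band[0] <= freqs[c] <= hr_band[1]:
            continue
        for b in peaks:  # candidate breathing peak
            if freqs[b] >= hr_band[0]:
                continue
            lo = strongest_near(freqs, amps, peaks, freqs[c] - freqs[b], eps)
            hi = strongest_near(freqs, amps, peaks, freqs[c] + freqs[b], eps)
            if lo is None or hi is None or c in (lo, hi):
                continue
            margin = amps[c] - max(amps[lo], amps[hi])
            if margin > best_margin:
                best, best_margin = c, margin
    return None if best is None else float(freqs[best])
```

In this sketch a strong peak without matching side-bands is ignored even if it is the largest peak in the heart rate band, which is the intended benefit of the pattern check.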


Figure 4.16: Frequency spectrum of pulse oximeter PPG signal showing heart rate peak side-bands resulting from modulation by the breathing rate.

Figure 4.17: Detection of spectra that exhibit respiratory rate and heart rate interaction pattern.

4.6.1 Assessment

To assess the performance of this method, the experimental recordings are processed using a 4-second window with 50% overlap. Then, the mode of the resulting estimates is calculated and interpreted as the heart rate estimate for that segment.
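Taking the mode over the per-window estimates can be sketched as follows. The example assumes that windows without a conforming pattern yield `None` and that estimates are rounded to whole BPM before counting; both are assumptions, as is the function name.

```python
from collections import Counter

def mode_estimate(window_estimates):
    """Most frequent per-window heart rate (BPM), ignoring empty windows."""
    valid = [round(e) for e in window_estimates if e is not None]
    if not valid:
        return None  # no window showed the interaction pattern
    return Counter(valid).most_common(1)[0][0]
```

Because the mode discards deviating windows entirely, a gradual change in heart rate during the recording can spread the estimates over several values and weaken the result.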

4.6.2 Results

The results are shown in Table 4.8, together with the reference and the XminαY chrominance-based results for comparison.

In the Static case, this method seems to offer a definite improvement, in particular in the case


of subjects 1, 3 and 5. For the Talking condition, performance is slightly worse, although still generally similar to the results without additional processing. A reason could be that the method of excluding outliers used here is sensitive to more instantaneous changes in heart rate during the recording.

For the Walking condition, although the Mean Error is lower using this method, estimates are still far off the reference values.

          static                 talking                walking
subject   ref  chrom.  pattern   ref  chrom.  pattern   ref  chrom.  pattern
1         54   65      51        72   67      62        77   72      63
2         62   52      83        61   81      83        73   65      84
3         69   58      77        75   78      79        96   55      64
4         59   84      62        67   56      46        69   89      60
5         84   65      81        93   88      90        112  56      69
ME             15.2    7.6            8.8     12             26      21.8

Table 4.8: Results for heart rate detection based on interaction with breathing rate.


Chapter 5

Implementation

This chapter discusses the work performed on the iPhone platform. In addition to the preliminary work discussed in Section 3.1, the following implementations have been performed:

• Hardware-accelerated Fast Fourier Transform (FFT), to facilitate peak detection in the frequency domain.

• Hardware-accelerated band pass filtering.

• The EMD Matlab code [23] has been converted to C using the Matlab Code Generator. Wrapper functions have then been written to interface between the Matlab MEX multi-dimensional array structures and Objective-C arrays.

• Logging of reliable estimates: if 5 identical heart rate estimations have occurred (estimates are performed once per second), the estimate is considered correct and is saved. Additionally, it is exported to the Health App that is available by default in iOS 8, so that the user can view the trend of these measurements over time along with their other health metrics.

• Although strobing is not used in the final application (for the sake of being able to use the full 60 fps temporal resolution), the most reliable toggling of the backlight has been achieved through the use of OpenGL. The OS provides the OpenGL context with a high priority to fulfill its frame rate, making this more reliable than changing the screen color using the UI thread, which happens asynchronously and offers no guarantee of update speed.

• Custom graphs have been created to offer feedback to the user.
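The rule for logging reliable estimates can be sketched as a small state machine. This is a platform-independent illustration in Python and not the Objective-C code of the App; it assumes that the 5 identical estimates must be consecutive, which the text does not state explicitly, and the class and attribute names are hypothetical. The export to the Health App is not shown.

```python
class ReliableEstimateLogger:
    """Accept a heart rate once it has been estimated `required` times in a row."""

    def __init__(self, required=5):
        self.required = required
        self.last = None
        self.count = 0
        self.saved = []

    def update(self, bpm):
        """Feed one per-second estimate; return True when it is saved."""
        if bpm is not None and bpm == self.last:
            self.count += 1
        else:
            # A different value (or a missing estimate) restarts the streak.
            self.last = bpm
            self.count = 1 if bpm is not None else 0
        if self.count == self.required:
            self.saved.append(bpm)
            return True
        return False
```

With one estimate per second, the earliest a value can be saved under this assumption is 5 seconds after the estimate first appears.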

These implementations finally resulted in a proof-of-concept App as shown in Figure 5.1.


Figure 5.1: Example of implemented demonstration App.


Chapter 6

Conclusions

This work has presented an overview of the challenges involved in performing automatic heart rate estimation when using a smart-phone to make a phone call. The various parts of the processing chain involved were determined and explored.

Existing PPG extraction algorithms were evaluated against this new paradigm of near but non-contact video-based PPG recording. The importance of explicit and dynamic camera settings has been shown, along with the effect of screen backlight as a source of illumination.

Furthermore, several preprocessing techniques to improve heart rate estimation accuracy have been devised and tested. With regard to selection of parts of the image that offer a better SNR than other parts, two different methods have been explored: non-uniform segmentation based on ranked order filtering of pixel brightness, and block-based processing. Brightness-based segmentation offers a potential for improvement, but at the cost of a trade-off between reduced spatial or temporal information. Block-based processing does not have this limitation, but does require a metric to evaluate which blocks contain the highest PPG signal component. Unfortunately, a suitable metric has not been found.

As the main source of noise is due to relative motion between the camera and the skin surface, assessing the amount of motion can be used as a metric to reject temporal parts of the signal, to prevent false heart rate estimates. Methods to do this, both based on the camera signal itself, as well as using motion sensor data, have been developed and shown to be effective. Rejection based on the gyroscope signal proved to be the most effective, even while walking.

Since motion compensation could potentially allow for recorded signals to be used for reliable estimation instead of being rejected, ultimately resulting in the availability of more reliable estimates, two methods to achieve this have been explored: one based on block-based motion estimation, and one based on planar motion estimation. Unfortunately, both methods rely on the availability of sufficiently clear features (high-contrast image parts), which is problematic given the proximity of the camera to relatively smooth skin surfaces and the lighting conditions. As a result, erroneous motion estimation is likely to occur, resulting in a degradation of signal quality.

Lastly, a novel method has been proposed to improve control over the illumination source used for the PPG extraction. Strobing can improve the SNR in the presence of noisy ambient light. While this comes at the cost of decreased effective sampling rate, and is currently limited by restrictive hardware control, this could prove effective in future iterations of the platform.


Figure 6.1: Overview of the efforts performed in this research across the signal processing chain.

6.1 Future work

Future efforts should focus on optimizing the thresholds for determining signal stability, such that as many reliable estimates as possible can be extracted. Additionally, further dynamic use of different parts of the signal chain could possibly be exploited, for example by selecting the illumination and PPG extraction algorithm to be used in real-time based on signal properties.

Building on the availability of pulse data, a future effort could aim at accurately determining derived metrics, such as heart rate variability (HRV). HRV is a measure for the (variation in) time between a pair of pulses. While a certain variability between pulse periods is indicative of healthy cardiac performance, it can also be indicative of abnormalities such as arrhythmia. Respiratory information, the detection of which is also covered in the present work, can further aid in the interpretation of these results [28]. As such, HRV can provide yet another insight into a person's cardiac health.


Bibliography

[1] J. Meyer, “Innovations in Ubicomp Products for Everyone !,” pp. 13–16, 2014. 1

[2] M. a. Hogervorst, A. M. Brouwer, and W. K. Vos, “Physiological correlates of stress inindividuals about to undergo eye laser surgery,” Proceedings - 2013 Humaine AssociationConference on A↵ective Computing and Intelligent Interaction, ACII 2013, pp. 473–478, 2013.1

[3] F. Gouaux, L. Simon-Chautemps, S. Adami, M. Arzi, D. Assanelli, J. Fayn, M. Forlini,C. Malossi, A. Martinez, J. Placide, G. Ziliani, and P. Rubel, “Smart devices for the early de-tection and interpretation of cardiological syndromes,” 4th International IEEE EMBS SpecialTopic Conference on Information Technology Applications in Biomedicine, 2003., pp. 291–294, 2003. 1

[4] A. Mendez and M. Lado, “Heart of Darkness Heart Rate Variability on patients with risk ofsuicide,” Conference: Information Systems and Technologies (CISTI), 2013. 1

[5] A. Clarke and R. Steele, “How personal fitness data can be re-used by smart cities,” Proceed-ings of the 2011 7th International Conference on Intelligent Sensors, Sensor Networks andInformation Processing, ISSNIP 2011, pp. 395–400, 2011. 1

[6] BGR, “Apple market share figures q4 2014.” http://bgr.com/2015/03/03/

iphone-market-share-q4-2014-gartner/. Accessed: 06-05-2015. 1

[7] K. Banitsas, P. Pelegris, T. Orbach, D. Cavouras, K. Sidiropoulos, and S. Kostopoulos, “Asimple algorithm to monitor HR for Real Time treatment applications,” in Proceedings of the9th International Conference on Information Technology and Applications in Biomedicine(ITAB), no. November in 9, 2009. 5

[8] P. Pelegris, K. Banitsas, T. Orbach, and K. Marias, “A novel method to detect heart beatrate using a mobile phone.,” Conference proceedings : Annual International Conference ofthe IEEE Engineering in Medicine and Biology Society (EMBS)., vol. 2010, pp. 5488–5491,Jan. 2010. 5

[9] J. B. Bolkhovsky, C. G. Scully, and K. H. Chon, “Statistical analysis of heart rate and heartrate variability monitoring through the use of smart phone cameras.,” Conference proceedings: ... Annual International Conference of the IEEE Engineering in Medicine and BiologySociety. IEEE Engineering in Medicine and Biology Society. Conference, vol. 2012, pp. 1610–3, Jan. 2012. 5

[10] W. Karlen, J. M. Ansermino, G. A. Dumont, and C. Scheffer, “Detection of the optimal region of interest for camera oximetry,” Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBS, vol. 2013, pp. 2263–2266, Jan. 2013. 5

[11] W. Karlen, J. M. Ansermino, and G. Dumont, “Adaptive pulse segmentation and artifact detection in photoplethysmography for mobile applications,” Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBS, vol. 2012, pp. 3131–3134, Jan. 2012. 5

[12] Y. Sun, S. Hu, V. Azorin-Peris, S. Greenwald, J. Chambers, and Y. Zhu, “Motion-compensated noncontact imaging photoplethysmography to monitor cardiorespiratory status during exercise,” Journal of Biomedical Optics, vol. 16, p. 077010, July 2011. 5

[13] W. Verkruysse, L. O. Svaasand, and J. S. Nelson, “Remote plethysmographic imaging using ambient light,” Optics Express, vol. 16, pp. 21434–21445, Dec. 2008. 5

[14] M.-Z. Poh, D. J. McDuff, and R. W. Picard, “Non-contact, automated cardiac pulse measurements using video imaging and blind source separation,” Optics Express, vol. 18, no. 10, pp. 10762–10774, 2010. 5, 21

[15] T. Pursche, J. Krajewski, and R. Moeller, “Video-based heart rate measurement from human faces,” Proceedings of the IEEE International Conference on Consumer Electronics (ICCE), 2012, pp. 544–545, 2012. 5

[16] S. Kwon, H. Kim, and K. S. Park, “Validation of heart rate extraction using video imaging on a built-in camera system of a smartphone,” Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBS, pp. 2174–2177, 2012. 5

[17] M.-Z. Poh, D. J. McDuff, and R. W. Picard, “Advancements in noncontact, multiparameter physiological measurements using a webcam,” IEEE Transactions on Biomedical Engineering, vol. 58, pp. 7–11, Jan. 2011. 5, 14

[18] G. de Haan and V. Jeanne, “Robust pulse rate from chrominance-based rPPG,” IEEE Transactions on Biomedical Engineering, vol. 60, pp. 2878–2886, Oct. 2013. 5, 21

[19] J. A. Crowe and D. Damianou, “The wavelength dependence of the photoplethysmogram and its implication to pulse oximetry,” 1992 14th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, vol. 6, pp. 2423–2424, 1992. 9, 10

[20] N. E. Huang, Z. Shen, S. R. Long, M. C. Wu, H. H. Shih, Q. Zheng, N.-C. Yen, C. C. Tung, and H. H. Liu, “The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis,” Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, 1998. 23

[21] D. Broomhead and G. P. King, “Extracting qualitative dynamics from experimental data,” Physica D: Nonlinear Phenomena, vol. 20, no. 2-3, pp. 217–236, 1986. 23

[22] F. Cardoso, “Jade for real-valued data.” http://perso.telecom-paristech.fr/~cardoso/Algo/Jade/jadeR.m. Accessed: 05-06-2014. 23

[23] Y. Kopsinis and S. McLaughlin, “Improved EMD using doubly-iterative sifting and high order spline interpolation,” EURASIP Journal on Advances in Signal Processing, vol. 2008, no. 1, p. 128293, 2008. 23, 41

[24] G. de Haan, P. W. A. C. Biezen, H. Huijgen, and O. A. Ojo, “True-motion estimation with 3-D recursive search block matching,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 3, no. 5, pp. 368–379, 1993. 34

[25] M. Buist, S. Bernard, T. V. Nguyen, G. Moore, and J. Anderson, “Association between clinically abnormal observations and subsequent in-hospital mortality: A prospective study,” Resuscitation, vol. 62, pp. 137–141, 2004. 38

[26] T. J. Hodgetts, G. Kenward, I. G. Vlachonikolis, S. Payne, and N. Castle, “The identification of risk factors for cardiac arrest and formulation of activation criteria to alert a medical emergency team,” vol. 54, pp. 125–131, 2002. 38

[27] M. Cretikos, An Evaluation of Activation and Implementation of the Medical Emergency Team System. PhD thesis, Sydney: The University of New South Wales, 2006. 38

[28] D. W. Young, “Self-measure of heart rate variability (HRV) and arrhythmia to monitor and to manage atrial arrhythmias: personal experience with high intensity interval exercise (HIIE) for the conversion to sinus rhythm,” Frontiers in Physiology - Computational Physiology and Medicine, vol. 5, no. July, pp. 1–4, 2014. 44
