Eye-CU: Sleep Pose Classification for Healthcare using Multimodal Multiview Data

Carlos Torres† Victor Fragoso‡ Scott D. Hammond† Jeffrey C. Fried* B.S. Manjunath†
†Univ. of California Santa Barbara ‡West Virginia University *Santa Barbara Cottage Hospital
{carlostorres@ece, shammond@tmrl, manj@ece}.ucsb.edu [email protected] [email protected]

arXiv:1602.02343v2 [cs.CV] 22 Feb 2016

Abstract

Manual analysis of body poses of bed-ridden patients requires staff to continuously track and record patient poses. Two limitations in the dissemination of pose-related therapies are scarce human resources and unreliable automated systems. This work addresses these issues by introducing a new method and a new system for robust automated classification of sleep poses in an Intensive Care Unit (ICU) environment. The new method, coupled-constrained Least-Squares (cc-LS), uses multimodal and multiview (MM) data and finds the set of modality trust values that minimizes the difference between expected and estimated labels. The new system, Eye-CU, is an affordable multi-sensor modular system for unobtrusive data collection and analysis in healthcare. Experimental results indicate that the performance of cc-LS matches the performance of existing methods in ideal scenarios. The method outperforms the latest techniques in challenging scenarios by 13% for those with poor illumination and by 70% for those with both poor illumination and occlusions. Results also show that a reduced Eye-CU configuration can classify poses without pressure information with only a slight drop in performance.

Keywords: sleep poses, sleep analysis, patient positioning, coupled-constrained Least-Squares optimization, multimodal, multiview, ICU monitoring, pose classification, healthcare, patient monitoring, modality contribution.

1. Introduction

New methods for non-disruptive monitoring and analysis of patient-on-bed body configurations, such as those observed in sleep-pose patterns, add objective metrics for evaluating and predicting health status. Clinical scenarios where body poses of patients correlate with medical conditions include sleep apnea, where obstruction of the airway is affected by supine positions [16]. Mothers-to-be are advised to lie on their sides to improve fetal blood flow [11]. The findings of [2, 8, 19] correlate sleep positions with various effects on patient health. These studies highlight the importance of automated analysis of patient sleep poses in natural scenarios; they substantiate the need for this work and its potential benefits. The benefits include improving patient quality of life and quality of care by continuously monitoring patient poses, correlating poses with medical diagnoses, and optimizing treatments by manipulating poses. The proposed Eye-CU system and cc-LS fusion method tackle the classification of sleep poses in a natural ICU environment with conditions that range from bright and clear to dark and occluded. The system collects sleep-pose data using an array of RGB-D cameras and a pressure mat. The method extracts features from each modality, estimates unimodal pose labels, fuses the unimodal decisions based on trust (prior) values, and infers a multimodal pose label. The trusts are estimated via cc-LS optimization, which minimizes the distance between the oracle and multimodal matrices. In this context, the term multimodal refers to the various Eye-CU sensor measurements.

1.1. Related Work

Computer vision methods using RGB data to detect body configurations of patients on beds are discussed in [9, 10, 13], but they are limited to scenes with constant illumination and/or without occlusions. The deformable-parts-model approach commonly used on RGB images, presented in [20], requires images with relatively uniform illumination and is limited to minor self-occlusions. The discriminative approach from [17] uses depth images and is robust to illumination changes; however, it requires clean depth segmentation and contrast, and it is susceptible to occlusions. A controlled method to classify human sleep poses using RGB images and a low-resolution pressure array is presented in [7]. It uses normalized geometric and load-distribution features interdependently and requires a clear view of the patient.

The cc-LS work builds upon our previous work [18], where features from R, D, and P sensors from a single view are combined to overcome challenging scene conditions. The trust method uses unimodal features to propose label candidates and infer a multimodal label. It improves the unimodal decisions of Linear Discriminant Analysis (LDA) and Support Vector Classifier (SVC) models via modality trust. Modality trust is defined as the mean classification accuracy of the unimodal pose classifiers (under the measured scene conditions). The trust system uses a high-resolution pressure mat, and its performance relies heavily on a fixed camera over the patient's bed. A trust-adjustment method accounts for sensor failures; however, performance declines greatly without pressure data.

Figure 1: Diagram of the Eye-CU physical setup showing the pressure mat (left) in green; the camera views (center): top (v_t) in red, side (v_s) in blue, and head (v_h) in black; and the mock-up ICU (right) where the system is tested.

1.2. Proposed Work

The work presented in this paper differs from [18] by introducing a new probabilistic method to estimate trusts. We use cc-LS optimization (section 4) to estimate trusts, learn modality priors, and improve classification accuracy by up to 30%. Instead of using a multimodal system with a single camera view and a pressure mat, the Eye-CU system uses multimodal and multiview (MM) data. Results suggest that combining reduced Eye-CU configurations with cc-LS robustly classifies sleep poses with incomplete views and without pressure information. Figure 1 shows two perspective views of the system in the mock-up ICU room.

Main contributions of this work: (1) cc-LS, a simple and elegant method to estimate modality trusts, which improves pose classification accuracy; (2) Eye-CU, a complete modular MM system that performs sleep-pose classification with very high accuracy in healthcare (one node is shown in Figure 2, and the system is currently deployed in a medical ICU); and (3) a fully annotated MM dataset of 66,000 sleep-pose images.¹

2. Eye-CU System Description

The various Eye-CU system configurations depend on the combination of modalities used, RGB (R), depth (D), and pressure (P), and on the available camera views, head (h), top (t), and side (s). The following configurations are explored:

¹Will be available online at http://vision.ece.ucsb.edu

Figure 2: Multimodal Eye-CU node with environmental sensors, RGB-D camera, aluminum enclosure, Panda Board, and battery pack. Four nodes are used to monitor a medical ICU room.

Figure 3: Multimodal and multiview representation of the fetal left-oriented pose observed by three RGB-D cameras and one pressure mat, collected using the Eye-CU system.

• Multimodal and Multiview (MM): uses RDP data and the h, s, t views. It is the most complex configuration and has the best performance, but it is difficult to deploy.

• Multimodal partial-Multiview (MpM): uses RDP data and fewer than three views. MpM with a top view is equivalent to the configuration used in competing methods.

• Partial-Multimodal and Multiview (PMM): uses R, D, or RD data from the three camera views (h, s, t). Its performance depends on having all views available.

• Partial-Multimodal partial-Multiview (PMpM): the simplest configuration. It uses RD data from two views (hs, ht, or st) and sets the lower bound on performance.

Why Multimodal? Suitability tests (section 5) of existing methods and available modalities indicate that neither a single modality nor a concatenation of modalities can be used to classify poses in a natural ICU environment.


Why Multiview? The ICU is a dynamic environment where equipment is moved around continuously and can block sensors and views of the patients. A multiview system improves classification performance, increases the chances of observing the patients, and enables monitoring using simple and affordable sensors. Cameras do not make contact with patients and avoid the risk of infection by touch.

3. Data Collection

Sample MM data collected from one actor in various poses and scene conditions, using all camera views and modalities, is shown in Figure 4. The complete dataset is constructed with sleep poses collected from five actors in a mock-up ICU setting with a real ICU bed and equipment. The observations are the set of sleep poses Z = {Background, Soldier U, Soldier D, Faller R, Faller L, Log R, Log L, Yearner R, Yearner L, Fetal R, Fetal L} of size L (= |Z|), indexed by l. The letters U and D indicate that the patient is up-facing or down-facing, and the letters L and R indicate lying-on-left and lying-on-right sides. The variable Z_l is used to identify one specific pose label (e.g., Z_0 = Background). The scene conditions are simulated using three illumination levels, bright (light sensor at 70-90% saturation), medium (50-70% saturation), and dark (below 50% saturation), as well as four occlusion types: clear (no occlusion), blanket (covering 90% of the actor's body), blanket and pillow, and pillow (between the actor's head and upper back and the pressure mat). The illumination intensities are based on the percent-saturation values of an illumination sensor, and the occlusions are detected using radio-frequency identification (RFID) and proximity sensors, all by .NET Gadgeteer. The combination of illumination levels and occlusion types generates a 12-element scene set C = {bright, medium, dark} × {clear, blanket, pillow, blanket+pillow}. The variable c ∈ C is used to indicate a single illumination-occlusion combination (e.g., c = 1 indicates a bright and clear scene). The dataset is created by letting one scene be the combination of one actor in one pose under a single scene condition. Ten measurements are collected from each scene: three modalities (R, D, and synthetic binary masks) from each of the three camera views in the set V = {t, h, s}, and one pressure image (P). The data-collection process includes acquiring the background (empty bed) and asking the actors to rotate through the 10 poses (11 classes including the background) under each of the 12 scene conditions. The process is repeated 10 times for each of the five actors. In total, this process generates a dataset of 66,000 images (five actors × 10 sessions × 10 images × 11 classes × 12 scenes).
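As a quick sanity check, the counts above can be reproduced in a few lines of Python (illustrative only; the variable names are ours, not the authors'):

```python
# Scene set C: 3 illumination levels x 4 occlusion types = 12 conditions
illuminations = ["bright", "medium", "dark"]
occlusions = ["clear", "blanket", "pillow", "blanket+pillow"]
C = [(i, o) for i in illuminations for o in occlusions]
assert len(C) == 12

# 10 measurements per scene: 3 modalities x 3 views + 1 pressure image
images_per_scene = 3 * 3 + 1

# Total: 5 actors x 10 sessions x 11 classes x 12 conditions x 10 images
total = 5 * 10 * 11 * 12 * images_per_scene
assert total == 66000
```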

3.1. Modalities

This section describes the modalities used by the Eye-CU system (see Figures 3 and 4). It presents the modalities' basic properties, discusses their pros and cons, and provides an intuitive justification for their complementary use in the cc-LS formulation.

RGB. Standard RGB video data provides reliable information for representing and classifying human sleep poses in scenes with relatively ideal conditions. However, most people sleep in imperfectly illuminated scenarios using sheets, blankets, and pillows that block and disturb sensor measurements. The system collects RGB color images of dimensions 640 × 480 from each actor in each of the scene conditions and extracts pose-appearance features representative of the lines in the human body (i.e., limbs and extremities).

Depth. Infrared depth cameras can be resilient to illumination changes. The Eye-CU system uses PrimeSense Carmine devices to collect depth data. The devices are designed for indoor use and acquire images of dimensions 640 × 480. These sensors use 16 bits to represent pixel intensity values, which correspond to the distance from the sensor to a point in the scene. Their operating range is 0.8 m to 3.5 m, and their spatial resolution for scenes 2.0 m away is 3.5 mm along the horizontal (x) and vertical (y) axes and 3.0 mm along the depth (z) axis. The system uses the depth images to represent the three-dimensional shape of the poses. The usability of these images, however, depends on depth contrast, which is affected by the deformation properties of the mattress and blankets present in ICU environments.

Pressure. In preliminary studies, the pressure modality remained constant in the presence of sheets and blankets. The Eye-CU system uses the Tekscan Body Pressure Measurement System (BPMS), model BRE5315-4. The complete mat is composed of four independent pressure arrays, each with its own handle (i.e., USB adapter), to measure the pressure distribution on support surfaces. The data from the four arrays are synchronized and acquired using the proprietary Tekscan BPMS software. The complete pressure-sensing area is 1950.7 mm × 426.7 mm, with a total of 8064 sensing elements (or sensels). The sensel density is 1 sensel/cm², and each sensel has a sensing pressure range from 0 to 250 mm Hg (0-5 psi). The images generated using the pressure mat have dimensions of 3341 × 8738 pixels. Although the size of the pressure images is relatively large, the generation of such images depends on consistent physical body-mattress contact. In particular, pillows, the deformation properties of the mattress, and bed configurations (not explored in this work) can disturb the measurements and the images generated by the mat. In addition, proper pressure-image generation requires a sensor array with high resolution and full bed coverage, the use of which can be prohibitively expensive and constrictive due to sanitation procedures and limited technical support.


Figure 4: Multimodal and multiview dictionary of sleep poses for a single actor in various sleep configurations and scene conditions. It contains R and D images (equalized for display) from the t, s, and h views, and the pressure mat P. Images are transformed w.r.t. the t view.

3.2. Feature Extraction

The sensors and camera views are calibrated using the standard methods from [5]. Homography transformations are computed relative to the top view, and gradient and shape features are then extracted from the transformed images.

Histogram of Oriented Gradients (HOG). HOG features are extracted from RGB images to represent sleep-pose limb structures, as demonstrated by [3, 20]. The HOG extraction parameters are four orientations, 16-by-16 pixels per cell, and two-by-two cells per block, which yield a 5776-element vector per image.
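The descriptor length follows from the block-sliding geometry of HOG. The sketch below (our own arithmetic, not code from the paper) shows the relationship; the stated 5776-element length is consistent with a 20 × 20 grid of 16 × 16-pixel cells, which would imply a 320 × 320-pixel input region — an assumption, since the paper does not state the crop size:

```python
def hog_length(cells_x, cells_y, cells_per_block=2, orientations=4):
    """Length of a HOG descriptor: blocks slide one cell at a time,
    and each block contributes cells_per_block^2 histograms."""
    blocks_x = cells_x - cells_per_block + 1
    blocks_y = cells_y - cells_per_block + 1
    return blocks_x * blocks_y * cells_per_block ** 2 * orientations

# 20x20 cells -> 19x19 blocks -> 19 * 19 * 4 * 4 = 5776 elements
assert hog_length(20, 20) == 5776
```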

Geometric Moments (gMOM). Image gMOM features, introduced in [6] and validated in [1, 15], are used to represent sleep-pose shapes. The in-house implementation uses the raw pixel values from tiled depth and pressure images instead of the standard binarized pixel values. The six-by-six tile dimensions are determined empirically to balance accuracy and complexity. Finally, moments up to the third order are extracted from each block to generate a 10-element vector per block. The vectors from each of the 36 blocks are concatenated to form a 360-element vector per image. Figure 5 shows how features are extracted from each modality.
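A minimal sketch of this construction (our interpretation; the function names are ours): raw moments m_pq = Σ x^p y^q I(x, y) up to order p + q ≤ 3 give ten values per tile, and a 6 × 6 tiling yields the 360-element descriptor:

```python
import numpy as np

def raw_moments(tile, max_order=3):
    """Raw image moments m_pq = sum(x^p * y^q * I) for p + q <= max_order."""
    ys, xs = np.mgrid[0:tile.shape[0], 0:tile.shape[1]]
    return np.array([(tile * xs ** p * ys ** q).sum()
                     for p in range(max_order + 1)
                     for q in range(max_order + 1 - p)])  # 10 values

def gmom(image, grid=(6, 6)):
    """Tile the image into a 6x6 grid and concatenate per-tile raw moments."""
    th, tw = image.shape[0] // grid[0], image.shape[1] // grid[1]
    feats = [raw_moments(image[i * th:(i + 1) * th, j * tw:(j + 1) * tw])
             for i in range(grid[0]) for j in range(grid[1])]
    return np.concatenate(feats)  # 36 tiles x 10 moments = 360 elements

depth = np.arange(48 * 48, dtype=float).reshape(48, 48)  # stand-in depth image
assert gmom(depth).shape == (360,)
```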

4. Multimodal-Multiview Formulation

Explanation of the method begins with the problem statement in section 4.1, followed by a description of the single-view multimodal formulation in section 4.2. This formulation is expanded to include multiview data in section 4.4. The multimodal classification framework for a single-view system is shown in Figure 6; it is applied to the set of pose labels Z of size L, indexed by l. The multimodal dataset (X) of size K, indexed by k, is separated for each scene c ∈ C. The dataset is composed of features extracted from a set of M modalities N = {R, D, P}, indexed by m (e.g., f_{N_m} with m = 1 gives f_R). The k-th datapoint in the dataset has the form

X_k = \{f_{N_m}\}_M = \{f_R, f_D, f_P\} = \{\mathrm{HOG}(R), \mathrm{gMOM}(D), \mathrm{gMOM}(P)\},   (1)

where f_{N_m} is the feature vector extracted from the m-th modality. These features are used to train the ensemble of M unimodal SVM (and LDA) classifiers (CLF_m). For a given input datapoint X_k, each of the classifiers outputs a probability vector

CLF_{km} = [s_{k1m}, \ldots, s_{kLm}]^T,

where the elements s represent the probability of label l given modality feature m. The classifier label probabilities are computed using the implementations from [12] of Platt's method for SVC and Bayes' rule for LDA. The feature-classifier combinations are quantified at the trust-estimation stage, where the unimodal trust values w^c = [w_R^c, w_D^c, w_P^c]^T are computed for a specific scene c. The multimodal trusted classifier is formed by fusing the candidate label decisions from the unimodal classifiers into one. The objective of this formulation is to find the pose label Z_{\hat{l}} with the highest MM probability for a given input query X_k, where \hat{l} is the estimated label index. The variables used throughout this paper are listed in Table 1.

Figure 5: Multimodal representation of the Fetal L pose showing the features extracted from each modality.

4.1. Problem Statement

The proposed fusion technique uses probabilistic concepts to compute the probability of a given class by marginalizing the joint probability over the modalities. The joint probability is calculated from the conditional probability of each class and the set of prior probabilities for each modality. The conditional probabilities are extracted from the classifiers in the ensemble of M unimodal classifiers (i.e., P(Z = Z_l | X = X_k) = P(Z_l | X_k)) and re-written as

P(Z_l \mid X_k) = \sum_{m=1}^{M} P(Z_l \mid X_k, M = m)\, P(M = m).   (2)

Methods such as Platt's [14] for SVMs enable the computation of the conditional probabilities, given by

s_{klm} = P(Z_l \mid X_k, M = m).   (3)

However, the prior probability for each modality, w_m = P(M = m), remains unknown. The trust method finds the set of priors for each modality m in the ensemble of M modalities that approximates the probability

b_{kl} = P(Z = Z_l \mid X = X_k, \mathrm{Oracle}),   (4)

produced by an oracle-observed datapoint X = X_k. The estimation process is repeated for all c's; however, c is omitted to simplify the notation (i.e., w^c becomes w).

The method uses the following coupled optimization problem to find the modality priors w_m for scene c:

\underset{w}{\text{minimize}} \quad \frac{1}{2} \sum_{k=1}^{K} \sum_{l=1}^{L} \Big( \sum_{m=1}^{M} s_{klm} w_m - b_{kl} \Big)^2
\text{subject to} \quad \mathbf{1}^T w = 1, \quad 0 \le w_m \le 1, \; m = 1, \ldots, M.   (5)

The objective is to find the weights w_m that approximate the oracle b_{kl} for every data point X_k. Using the loss in Eq. 5, the problem becomes a cc-LS optimization problem. This type of problem uses all points and pose labels from the training set to find, at once, the set of priors that approximates the values produced by the oracle for each point X_k.

Table 1: Variables and their descriptions.

A: multimodal matrix, A ∈ R^{U×M}
a_m: m-th column vector of A, with U elements
b: oracle vector, b ∈ R^U
b_m: oracle column vector for modality m
C: scene set (illumination × occlusion combinations)
c: scene index, 1 ≤ c ≤ |C|
CLF_{km}: classifier for the m-th modality
f_{N_m,k}: set of M feature vectors for the k-th datapoint
D: depth modality
h: head camera view
K: dataset size, K = |X|
k: datapoint index, 1 ≤ k ≤ K
L: size of the set of pose labels, L = |Z|
l: index of the pose label Z_l, 1 ≤ l ≤ L
l̂: index of the estimated pose label, 1 ≤ l̂ ≤ L
l*: index of the ground-truth label Z_{l*}, 1 ≤ l* ≤ L
MM: Multimodal and Multiview
MpM: Multimodal and partial-Multiview
M: size of the modality set, M = |N|
m: modality index, 1 ≤ m ≤ M
N: modality set N = {R, D, P}, indexed by m
P: pressure modality
PMM: Partial-Multimodal and Multiview
PMpM: Partial-Multimodal partial-Multiview
R: RGB modality
s: side camera view
s_{klm}: probability of label l from CLF_{km}
t: top camera view
U: multimodal dimension, U = KL
V: view set V = {t, s, h}
V: number of views, V = |V|
v: view index, 1 ≤ v ≤ V
w^c: trusts w = [w_R, w_D, w_P]^T for scene c
w_{N_m}: modality trust value (e.g., w_R for m = 1)
X: dataset indexed by k (i.e., X_k)
X_k: k-th datapoint, with f_{N_m,k} = {f_R, f_D, f_P}_k
Y: MM dimension, Y = KLV
Z: sleep-pose set

4.2. Multimodal Construction

The estimation method uses cc-LS optimization to minimize the difference between the oracle (b) and the multimodal matrix (A). It frames the trust estimation as a linear system of equations of the form Aw - b = 0, where the modality trust values are the elements of the vector w = [w_R, w_D, w_P]^T that approximates Aw to b.

Construction of the Multimodal Matrix (A). The matrix A contains label probabilities for each of the datapoints in the training set (K = |X_train|). This matrix has U rows (U = KL) and M columns, where L is the total number of labels (L = |Z|) and M is the number of modalities (M = 3), and it has the following structure:

A = [S_{k=1}^T \; \ldots \; S_{k=K}^T]^T \in \mathbb{R}^{U \times M},   (6)

where S_k(l, m) = s_{klm}.

Figure 6: Diagram of the trusted multimodal classifier for the MpM configuration. Image features are extracted from the RDP camera and pressure data. Then the features are used to train unimodal classifiers (CLF_m), which are in turn used to estimate the modality trust values. In the last stage of the MM classifier, the unimodal decisions are trusted and combined.

Construction of the Multimodal Oracle Vector (b). The vector b is generated by the oracle and quantifies the classification ability of the combined modalities. It is used to corroborate estimation correctness when compared to the ground truth. The b_m column vectors have U rows:

b_m = [b_{k=1}^T \; \ldots \; b_{k=K}^T]^T,   (7)

where b_k = [b_{k,l=1}, \ldots, b_{k,l=L}]^T. The values of the b_{kl} elements are set using the following condition:

b_{kl} = \begin{cases} 1 & \text{if } \hat{l} = l^* \text{ for } X_k \\ 0 & \text{otherwise,} \end{cases}   (8)

where \hat{l} = \mathrm{argmax}_l \, s_{klm} is the index of the estimated label and l^* is the index of the ground-truth label for X_k.

The construction of the oracle b depends on how the columns b_m (i.e., the unimodal oracles) are combined. The system is tested with a uniform construction, and the results are reported in section 5. In the uniform construction, each modality has a 1/M voting power, and the entries can add up to one via

b = \frac{1}{M} \sum_{\forall m} b_m.   (9)

4.3. Coupled-Constrained Least-Squares (cc-LS)

Finally, the weight vector w = [w_R, w_D, w_P]^T is computed by substituting A and b into Eq. (5) and solving the cc-LS optimization problem:

\underset{w}{\text{minimize}} \quad \frac{1}{2} \|Aw - b\|_2^2
\text{subject to} \quad \mathbf{1}^T w = 1, \quad 0 \le w_m \le 1, \; m = 1, \ldots, M.   (10)

Intuitively, the cc-LS problem finds the modality priors that allow the method to fuse information from different modalities to approximate the oracle probabilities.
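Problem (10) is a small quadratic program over the probability simplex. One way to solve it is with SciPy's SLSQP solver; the sketch below is our illustration, not the authors' implementation:

```python
import numpy as np
from scipy.optimize import minimize

def cc_ls(A, b):
    """Solve min 0.5*||A w - b||^2  s.t.  sum(w) = 1, 0 <= w_m <= 1."""
    M = A.shape[1]
    res = minimize(
        lambda w: 0.5 * np.sum((A @ w - b) ** 2),
        x0=np.full(M, 1.0 / M),              # start from uniform trusts
        jac=lambda w: A.T @ (A @ w - b),     # gradient of the quadratic loss
        bounds=[(0.0, 1.0)] * M,
        constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],
        method="SLSQP",
    )
    return res.x

# Toy example: modality 0 reproduces the oracle exactly, so it should
# receive (nearly) all of the trust.
A = np.array([[1.0, 0.0, 0.5],
              [0.0, 1.0, 0.5],
              [1.0, 0.0, 0.5],
              [0.0, 1.0, 0.5]])
b = np.array([1.0, 0.0, 1.0, 0.0])
w = cc_ls(A, b)
assert abs(w.sum() - 1.0) < 1e-4
assert w[0] > 0.9
```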

4.4. Multiview Formulation

The bounded multimodal formulation is expanded to include multiview data using V views, indexed by v. The values that v can take indicate which camera view is used (e.g., v = 1 for the top view, v = 2 for the side view, and v = 3 for the head view). The multimodal and multiview matrix A has the following form:

A = \big[ [A^{(v=1)}] \; \ldots \; [A^{(v=V)}] \big]^T \in \mathbb{R}^{Y \times M},   (11)

where Y = LKV for a system with V views and M modalities. The b_m multimodal and multiview oracle vector is constructed by concatenating data from all the views in the set V via

b_m = \big[ [b^{(v=1)}]^T \; \ldots \; [b^{(v=V)}]^T \big]^T \in \mathbb{R}^{Y},   (12)

and the b column vector is generated using (9).

4.5. Testing

The test process is shown in Figure 7. The room sensors, in combination with the N = {R, D, P} measurements, are collected from the ICU scene. Features f_{N_m,k} are extracted from the modalities in N and are used as inputs to the trusted multimodal classifier. The classifier outputs a set of label candidates, from which the label with the largest probability for datapoint X_k = {f_{N_m}}_k is selected via

\hat{l}_k = \underset{l \in L}{\mathrm{argmax}} \sum_{\forall m} w_{N_m} \, \mathrm{CLF}_m\{f_{N_m,k}\}.   (13)
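Reading Eq. (13) as the trust-weighted sum from Eq. (2), the fused decision reduces to one weighted product over the stacked unimodal probability vectors. A sketch under that interpretation (the variable names and toy numbers are ours):

```python
import numpy as np

# S[m, l]: probability of label l from the m-th unimodal classifier (CLF_m)
S = np.array([[0.7, 0.3],   # RGB classifier
              [0.2, 0.8],   # depth classifier
              [0.5, 0.5]])  # pressure classifier
w = np.array([0.5, 0.3, 0.2])  # trusts estimated by cc-LS for this scene

fused = w @ S                  # P(Z_l | X_k) = sum_m w_m * s_klm
label = int(np.argmax(fused))  # estimated pose index l-hat
assert np.allclose(fused, [0.51, 0.49])
assert label == 0
```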

Missing Modalities. Hardware failures are simulated by evaluating the classification performance with one modality removed at a time. The trust value of a missing or failing sensor modality (w^*_{N_n}) is set to zero, and its original value (w_{N_n}) is proportionally distributed to the others via

w^*_{N_m} = w_{N_m} \Big( 1 + \frac{|w_{N_n} - w_{N_m}|}{W} \Big),   (14)

for n \in \{1, \ldots, M\}, m \in \{1, \ldots, M\} \setminus \{n\}, and W = \sum_{\forall m} w_m.
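A sketch of the redistribution rule in Eq. (14) (our reading of it; note that, as written, the rescaled trusts need not sum exactly to one):

```python
import numpy as np

def redistribute(w, n):
    """Zero the trust of failed modality n and boost the surviving
    trusts per Eq. (14): w*_m = w_m * (1 + |w_n - w_m| / W)."""
    w = np.asarray(w, dtype=float)
    W = w.sum()
    w_star = np.array([w[m] * (1.0 + abs(w[n] - w[m]) / W)
                       for m in range(len(w))])
    w_star[n] = 0.0  # the failed sensor contributes nothing
    return w_star

w = np.array([0.5, 0.3, 0.2])   # trusts for R, D, P
w_star = redistribute(w, n=2)   # simulate a failed pressure mat
assert np.allclose(w_star, [0.65, 0.33, 0.0])
```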

5. Experiments

Validation of modalities and views for sleep-pose classification substantiates the need for a multiview and multimodal system. The cc-LS method is tested on the MpM, MM, PMM, and PMpM Eye-CU configurations using data collected from scenes with various illumination levels and occlusion types. The labels are estimated using the multi-class linear SVC (C = 0.5) and LDA classifiers from [12]. A validation set is used to tune the SVC's C parameter and the Ada parameters. Classification accuracies are computed using five-fold cross-validation with in-house implementations of competing methods and are reported as percent-accuracy values inside color-scaled cells.

5.1. Modality and View Assessment

Classification results obtained using unimodal and multimodal data without modality trust are shown in Figure 8. The cell values indicate the classification percent accuracy for each individual modality and modality combination with three common classification methods. The labels of the column blocks at the top of the figure indicate the modalities used. The labels at the bottom of the figure show which classifier is used. The labels on the left and right indicate the scene illumination level and type of occlusion. The figure only shows classification results for the top camera view because the variation across the views tested was not statistically significant.

5.2. Performance of Reduced Eye-CUs

The complete MM configuration achieves the best classification performance, followed closely by the performances of the MpM, PMM, and PMpM configurations, as summarized in Figure 9. The values inside the cells represent the classification percent accuracy of the cc-LS method combined with the various Eye-CU system configurations. The top row indicates the configuration. The second row indicates the views. The labels on the bottom of the figure identify the modalities. The labels on the left and right indicate illumination level and occlusion type. The red scale ranges from light red (worst) to dark red (best). The figure shows that the complete MM system in combination with the cc-LS method performs the best across all scenes. However, it requires information from a pressure mat. The PMM and PMpM configurations do not require the pressure mat and are still capable of performing reliably, with only a slight drop in performance. For example, in dark and occluded scenes, the PMM and PMpM configurations reach 77% and 80% classification rates, respectively (see row DARK, Blanket & Pillow).

5.3. Comparison with Existing Methods

Performance of cc-LS and the in-house implementations of the competing methods from [7] and [18], as well as Ada [4], are shown in Figure 10. The figure shows results using the MpM configuration, which most closely resembles the configurations used in the competing methods. All the methods use a multimodal system with a top camera view and a pressure mat. The values inside the cells are the classification percent accuracy. The green scale goes from light green (worst) to dark green (best). The top row divides the methods into competing and proposed. The second row cites the methods. The bottom row indicates which classifier and (in parentheses) which modalities are used. The labels on the left and right indicate illumination level and occlusion type. The results are obtained using the four methods with the MM dataset.

Confusion Matrices The confusion matrices in Figure11 show how the indexes of estimated labels l match the ac-tual labels llowast The top three matrices are from a scene withbright and clear ICU conditions (Figure 11a) The bottomthree matrices illustrate the performance of the methods ina dim and occluded ICU scenario (Figure 11b) A dark bluediagonal in the confusion matrices indicates perfect classifi-cation In the selected scenes all methods achieved a 100 classification for the bright and clear scene However their


Figure 7. Block diagram for testing a single-view multimodal trusted classifier. Observations (R, D, P) are collected from the scene. Features are extracted from the observations and sent to the unimodal classifiers, which provide sets of score-ranked pose candidate labels. The sets of candidates are trusted and combined into one multimodal set, from which the label with the highest score is selected.

Figure 8. Performance evaluation of modalities and modality combinations using SVC, LDA, and Ada-Boosted SVC (Ada), based on their classification percent accuracy (cell values). The evaluation is performed over all the scene conditions considered in this study. The results indicate that no single modality (R, D, P) or combination of concatenated modalities (RD, RP, DP, RDP), with any of the three classification techniques, can be used directly to recognize poses in all scenes. The top row indicates which modality or combination of modalities is used. The labels at the bottom indicate which classifier is used. The labels to the left and right indicate the scene's illumination level and occlusion type. The gray-scaled boxes range from worst (white) to best (black) performance.

performance varies greatly in dim and occluded scenes. The matrix generated using [7] achieves 7% classification accuracy (bottom left), the matrix generated using [18] achieves 55% accuracy (bottom center), and the matrix generated with the cc-LS method achieves 86.7% accuracy (bottom right). The MpM configuration with the cc-LS method outperforms the competing methods by approximately 30%.

Performance of Ada-Boost. The system is tested using the Ada-Boost (Ada) algorithm [4] to improve the decisions of weak unimodal SVCs. The results in Figure 8 show a slight improvement over SVC. The comparison in Figure 10


Figure 9. Classification performance, in red scale (dark: best, light: worst), of the various Eye-CU configurations using LDA. The PMpM configuration has the lowest performance, 76.7%, using the s,h views of a dark and occluded scene. The method from [18] performs below 50%, and the method from [7] is not suited for such conditions. The top row identifies the configuration and the second row indicates the views used. The bottom labels indicate the modalities used (in parentheses). The labels on the left and right indicate scene illumination and occlusion type. A similar pattern is observed with SVC.

shows that Ada's improvement is small. It barely outperforms the reduced MpM configuration with the cc-LS method in some scenes (see row MID, Blanket). Overall, Ada is outperformed by the combination of cc-LS and MpM.

6. Discussion

The results in Figure 10 show performance disparities between the results obtained with the in-house implementation and those reported in [7]. The data and code from [7] were not released, so the findings and implementation details reported in this paper cannot be compared at the finest level. Nevertheless, the accuracy variations observed are most likely due to differences in data resolution, sensor capabilities, scene properties, and tuning parameters.

The performance of the MM and MpM configurations, which use a pressure mat, is slightly better. However, the deployment and maintenance of such systems in the real world can be very difficult, and perhaps logistically impossible. The cc-LS method in combination with the PMM or PMpM configurations, which do not use a pressure mat, matches or outperforms the competing techniques in ideal and challenging scenarios (see Figure 10).

Figure 10. Mean classification performance, in green scale (dark: best, light: worst), of MaVL, Huang's [7], Torres' [18], Freund's [4], and the cc-LS method using SVC and LDA. The combination of cc-LS and MpM matches the performance of the competing methods in bright and clear scenes. Classification is improved with cc-LS by 70% with SVC and by 30% with LDA in dark and occluded scenes. The top row distinguishes between competing and proposed methods, and the second row cites them. The bottom row indicates the classifier and modalities (in parentheses) used. The labels on the left and right indicate scene illumination and occlusion type. N/A indicates not suitable.

7. Conclusion and Future Work

This work introduced a new modality-trust estimation method based on cc-LS optimization. The trust values are chosen to minimize the difference between the trust-weighted multimodal candidate labels A and the expected oracle labels b. The Eye-CU system uses the trusts to weight the label propositions of the available modalities and views. The cc-LS method with the MM Eye-CU system outperforms three competing methods. Two reduced Eye-CU variations reliably classify sleep poses without pressure data. The MM properties allow the system to handle occlusions and avoid problems associated with a pressure mat (e.g., sanitation and sensor integrity).

Reliable pose classification methods and systems enable clinical researchers to design, enhance, and evaluate pose-related healthcare protocols and therapies. Given that the Eye-CU system is capable of reliably classifying human sleep poses in an ICU environment, an expansion of the system and methods is under investigation to include temporal information. Future analysis will seek to quantify and typify pose sequences (i.e., durations and transitions). Future work


(a) Bright scene clear of occlusions

(b) Dark scene with pillow and blanket occlusions

Figure 11. Confusion matrices, in blue scale (dark: best, light: worst), generated using a top camera view and applying the methods from Huang [7], Torres [18], and cc-LS with MpM. The top matrices show that all methods achieve perfect classification in ideal scenes (i.e., a solid main diagonal). The bottom matrices are [7] with 7%, [18] with 55%, and cc-LS with 86.7% for dark and occluded scenes. The matrices show the matches between estimated ($\hat{l}$) and ground-truth ($l^*$) indices.

will investigate removing the constraints that narrowly define the set of sleep poses and will explore tools from novelty detection to identify other (e.g., helpful and harmful) patient poses that occur in an ICU. Recent studies indicate that deep features might improve the classification performance of the Eye-CU system in the most challenging healthcare scenarios. Hence, future work will also investigate the performance and integration of deep features into the cc-LS method and the Eye-CU system.

Acknowledgements. This project was supported in part by the Institute for Collaborative Biotechnologies (ICB) through grant W911NF-09-0001 from the U.S. Army Research Office, and by the U.S. Office of Naval Research (ONR) through grant N00014-12-1-0503. The content of the information does not necessarily reflect the position or the policy of the Government, and no official endorsement should be inferred.

References

[1] M. A. R. Ahad, J. K. Tan, H. Kim, and S. Ishikawa. Motion history image: its variants and applications. Machine Vision and Applications, 2012.

[2] S. Bihari, R. D. McEvoy, E. Matheson, S. Kim, R. J. Woodman, and A. D. Bersten. Factors affecting sleep quality of patients in intensive care unit. Journal of Clinical Sleep Medicine, 2012.

[3] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2005.

[4] Y. Freund and R. E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 1997.

[5] R. I. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, 2nd edition, 2004.

[6] M.-K. Hu. Visual pattern recognition by moment invariants. IRE Trans. on Information Theory, 1962.

[7] W. Huang, A. A. P. Wai, S. F. Foo, J. Biswas, C.-C. Hsia, and K. Liou. Multimodal sleeping posture classification. In Proc. of the IEEE Int'l Conf. on Pattern Recognition (ICPR), 2010.

[8] C. Idzikowski. Sleep position gives personality clue. BBC News, September 16, 2003.

[9] C.-H. Kuo, F.-C. Yang, M.-Y. Tsai, and L. Ming-Yih. Artificial neural networks based sleep motion recognition using night vision cameras. Biomedical Engineering: Applications, Basis and Communications, 2004.

[10] W.-H. Liao and C.-M. Yang. Video-based activity and movement pattern analysis in overnight sleep studies. In Proc. of the IEEE Int'l Conf. on Pattern Recognition (ICPR), 2008.

[11] S. Morong, B. Hermsen, and N. de Vries. Sleep position and pregnancy. In Positional Therapy in Obstructive Sleep Apnea. Springer, 2015.

[12] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 2011.

[13] T. Penzel and R. Conradt. Computer based sleep recording and analysis. Sleep Medicine Reviews, 2000.

[14] J. Platt et al. Fast training of support vector machines using sequential minimal optimization. Advances in Kernel Methods: Support Vector Learning, 1999.

[15] S. Ramagiri, R. Kavi, and V. Kulathumani. Real-time multi-view human action recognition using a wireless camera network. In Proc. of the ACM/IEEE Int'l Conf. on Distributed Smart Cameras (ICDSC), 2011.

[16] C. Sahlin, K. A. Franklin, H. Stenlund, and E. Lindberg. Sleep in women: normal values for sleep stages and position and the effect of age, obesity, sleep apnea, smoking, alcohol and hypertension. Sleep Medicine, 2009.

[17] J. Shotton, R. Girshick, A. Fitzgibbon, T. Sharp, M. Cook, M. Finocchio, R. Moore, P. Kohli, A. Criminisi, and A. Kipman. Efficient human pose estimation from single depth images. IEEE Trans. on Pattern Analysis and Machine Intelligence (PAMI), 2013.

[18] C. Torres, S. D. Hammond, J. C. Fried, and B. S. Manjunath. Multimodal pose recognition in an ICU using multimodal data and environmental feedback. In Proc. of Springer Int'l Conf. on Computer Vision Systems (ICVS), 2015.

[19] G. L. Weinhouse and R. J. Schwab. Sleep in the critically ill patient. Sleep, 2006.

[20] Y. Yang and D. Ramanan. Articulated human detection with flexible mixtures of parts. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2013.



Figure 1. Diagram of the Eye-CU physical setup, showing the pressure mat (left, in green); the camera views (center): top ($v_t$) in red, side ($v_s$) in blue, and head ($v_h$) in black; and the mock-up ICU (right) where the system is tested.

pose label candidates and infer a multimodal label. It improves the unimodal decisions of Linear Discriminant Analysis (LDA) and Support Vector Classifier (SVC) via modality trust. Modality trust is defined as the mean classification accuracy of the unimodal pose classifiers (under the measured scene conditions). The trust system uses a high-resolution pressure mat, and its performance relies heavily on a fixed camera over the patient's bed. A trust-adjustment method accounts for sensor failures; however, performance declines greatly without pressure data.

1.2. Proposed Work

The work presented in this paper differs from [18] by introducing a new probabilistic method to estimate trusts. We use cc-LS optimization (section 4) to estimate trusts, learn modality priors, and improve classification accuracy by up to 30%. Instead of using a multimodal system with a single camera view and a pressure mat, the Eye-CU system uses multimodal and multiview (MM) data. Results suggest that combining reduced Eye-CU configurations with cc-LS robustly classifies sleep poses with incomplete views and without pressure information. Figure 1 shows two perspective views of the system in the mock-up ICU room.

Main Contributions of this work: (1) cc-LS, a simple and elegant method to estimate modality trusts, which improves pose classification accuracy; (2) Eye-CU, a complete modular MM system that performs sleep-pose classification with very high accuracy in healthcare (one node is shown in Figure 2, and the system is currently deployed in a medical ICU); and (3) a fully annotated MM dataset of 66,000 sleep-pose images.¹

2. Eye-CU System Description

The various Eye-CU system configurations depend on the combination of modalities used: RGB (R), depth (D), and pressure (P), and on the available camera views: head (h), top (t), and side (s). The following configurations are explored:

¹ Will be available online at http://vision.ece.ucsb

Figure 2. Multimodal Eye-CU node with environmental sensors: RGB-D camera, aluminum enclosure, Panda Board, and battery pack. Four nodes are used to monitor a medical ICU room.

Figure 3. Multimodal and multiview representation of the fetal left-oriented pose, observed by three RGB-D cameras and one pressure mat and collected using the Eye-CU system.

• Multimodal and Multiview (MM) uses R, D, P data and the h, s, t views. It is the most complex configuration and has the best performance, but it is difficult to deploy.

• Multimodal partial-Multiview (MpM) uses R, D, P data and fewer than three views. MpM with a top view is equivalent to the configuration used in competing methods.

• Partial-Multimodal and Multiview (PMM) uses R, D, or RD data from all three camera views (h, s, t). Its performance depends on having all views available.

• Partial-Multimodal partial-Multiview (PMpM) is the simplest configuration. It uses RD data from two views (hs, ht, or st) and sets the lower bound on performance.

Why Multimodal? Suitability tests (section 5) of existing methods and available modalities indicate that neither a single modality nor a concatenation of modalities can be used to classify poses in a natural ICU environment.


Why Multiview? The ICU is a dynamic environment where equipment is moved around continuously and can block sensors and views of the patients. A multiview system improves classification performance, increases the chances of observing the patients, and enables monitoring using simple and affordable sensors. Cameras do not make contact with patients and avoid the risk of infection by touch.

3. Data Collection

Sample MM data collected from one actor in various poses and scene conditions, using all camera views and modalities, is shown in Figure 4. The complete dataset is constructed from sleep poses collected from five actors in a mock-up ICU setting with a real ICU bed and equipment. The observations are the set of sleep poses Z = {Background, Soldier U, Soldier D, Faller R, Faller L, Log R, Log L, Yearner R, Yearner L, Fetal R, Fetal L} of size L (= |Z|), indexed by l. The letters U and D indicate that the patient is up-facing or down-facing, and the letters L and R indicate lying-on-left and lying-on-right sides. The variable $Z_l$ is used to identify one specific pose label (e.g., $Z_0$ = Background). The scene conditions are simulated using three illumination levels: bright (light sensor with 70-90% saturation), medium (50-70% saturation), and dark (below 50% saturation), as well as four occlusion types: clear (no occlusion), blanket (covering 90% of the actor's body), blanket and pillow, and pillow (between the actor's head and upper back and the pressure mat). The illumination intensities are based on the percent-saturation values of an illumination sensor, and the occlusions are detected using radio-frequency identification (RFID) and proximity sensors, all by .NET Gadgeteer. The combination of the illumination levels and occlusion types generates a 12-element scene set C = (bright, medium, dark) × (clear, blanket, pillow, blanket+pillow). The variable c ∈ C is used to indicate a single illumination and occlusion combination (e.g., c = 1 indicates a bright and clear scene). The dataset is created by letting one scene be the combination of one actor in one pose under a single scene condition. Ten measurements are collected from each scene: three modalities (R, D, and synthetic binary masks) from each of the three camera views in the set V = {t, h, s}, and one pressure image (P). The data collection process includes acquiring the background (empty bed) and asking the actors to rotate through the 10 poses (11 classes including the background) under each of the 12 scene conditions. The process is repeated 10 times for each of the five actors. In total, this process generates a dataset of 66,000 images (five actors × 10 sessions × 10 images × 11 classes × 12 scenes).
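As a quick arithmetic check, the dataset size quoted above follows directly from the collection protocol (the variable names below are illustrative):

```python
# Dataset size from the Eye-CU collection protocol described above.
actors, sessions, images_per_scene, classes, scene_conditions = 5, 10, 10, 11, 12
total_images = actors * sessions * images_per_scene * classes * scene_conditions
print(total_images)  # 66000
```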

3.1. Modalities

This section describes the modalities used by the Eye-CU system (see Figures 3 and 4). It presents the modalities' basic properties, discusses their pros and cons, and provides an intuitive justification for their complementary use in the cc-LS formulation.

RGB. Standard RGB video data provides reliable information to represent and classify human sleep poses in scenes with relatively ideal conditions. However, most people sleep in imperfectly illuminated scenarios, using sheets, blankets, and pillows that block and disturb sensor measurements. The system collects RGB color images of dimensions 640 × 480 from each actor in each of the scene conditions and extracts pose-appearance features representative of the lines in the human body (i.e., limbs and extremities).

Depth. Infrared depth cameras can be resilient to illumination changes. The Eye-CU system uses PrimeSense Carmine devices to collect depth data. The devices are designed for indoor use and can acquire images of dimensions 640 × 480. These sensors use 16 bits to represent pixel intensity values, which correspond to the distance from the sensor to a point in the scene. Their operating distance range is 0.8 m to 3.5 m, and their spatial resolution for scenes 2.0 m away is 3.5 mm along the horizontal (x) and vertical (y) axes and 3.0 mm along the depth (z) axis. The system uses the depth images to represent the three-dimensional shape of the poses. The usability of these images, however, depends on depth contrast, which is affected by the deformation properties of the mattress and blanket present in ICU environments.

Pressure. In preliminary studies, the pressure modality remained constant in the presence of sheets and blankets. The Eye-CU system uses the Tekscan Body Pressure Measurement System (BPMS), model BRE5315-4. The complete mat is composed of four independent pressure arrays, each with its own handle (i.e., USB adapter), to measure the pressure distribution on support surfaces. The data from the four arrays is synchronized and acquired using the proprietary Tekscan BPMS software. The complete pressure-sensing area is 1950.7 mm × 426.7 mm, with a total of 8064 sensing elements (sensels). The sensel density is 1 sensel/cm², each with a sensing pressure range from 0 to 250 mmHg (0-5 psi). The images generated using the pressure mat have dimensions of 3341 × 8738 pixels. Although the size of the pressure images is relatively large, the generation of such images depends on consistent physical body-mattress contact. In particular, pillows, the deformation properties of the mattress, and bed configurations (not explored in this work) can disturb the measurements and the images generated by the mat. In addition, proper pressure-image generation requires a sensor array with high resolution and full bed coverage, the use of which can be prohibitively expensive and constrictive due to sanitation procedures and limited technical support.


Figure 4. Multimodal and multiview dictionary of sleep poses for a single actor in various sleep configurations and scene conditions. It contains R and D images (equalized for display) from the t, s, and h views, and the pressure-mat images P. Images are transformed w.r.t. the t view.

3.2. Feature Extraction

The sensors and camera views are calibrated using the standard methods from [5]. Homography transformations are computed relative to the top view, and gradient and shape features are then extracted from the transformed images.

Histogram of Oriented Gradients (HOG). HOG features are extracted from RGB images to represent sleep-pose limb structures, as demonstrated by [3, 20]. The HOG extraction parameters are four orientations, 16-by-16 pixels per cell, and two-by-two cells per block, which yield a 5776-element vector per image.

Geometric Moments (gMOM). Image gMOM features, introduced in [6] and validated in [1, 15], are used to represent sleep-pose shapes. The in-house implementation uses the raw pixel values from tiled depth and pressure images instead of the standard binarized pixel values. The six-by-six tile dimensions are determined empirically to balance accuracy and complexity. Finally, moments up to the third order are extracted from each block to generate a 10-element vector per block. The vectors from each of the 36 blocks are concatenated to form a 360-element vector per image. Figure 5 shows how features are extracted from each modality.
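The tiled raw-moment extraction described above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: it computes the 10 raw moments $m_{pq}$ with $p + q \le 3$ per tile on a 6-by-6 grid, using raw (non-binarized) pixel values, and concatenates them into a 360-element vector.

```python
import numpy as np

def gmom_features(image, grid=6):
    """Concatenate raw geometric moments m_pq (p + q <= 3) from each tile.

    Uses raw pixel values, as in the in-house variant described above.
    Returns a grid*grid*10 element vector (360 elements for a 6x6 grid).
    """
    h, w = image.shape
    th, tw = h // grid, w // grid
    # The 10 moment orders with p + q <= 3: (0,0), (0,1), ..., (3,0).
    orders = [(p, q) for p in range(4) for q in range(4 - p)]
    feats = []
    for i in range(grid):
        for j in range(grid):
            tile = image[i * th:(i + 1) * th, j * tw:(j + 1) * tw]
            y, x = np.mgrid[0:tile.shape[0], 0:tile.shape[1]]
            for p, q in orders:
                # Raw moment m_pq = sum_x sum_y x^p y^q I(x, y).
                feats.append(float(((x ** p) * (y ** q) * tile).sum()))
    return np.asarray(feats)

# Toy depth tile standing in for a 640x480 transformed depth image.
depth = np.random.default_rng(0).random((48, 48))
vec = gmom_features(depth)
print(vec.shape)  # (360,)
```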

4. Multimodal-Multiview Formulation

The explanation of the method begins with the problem statement in section 4.1, followed by a description of the single-view multimodal formulation in section 4.2. This formulation is expanded to include multiview data in section 4.4. The multimodal classification framework for a single-view system is shown in Figure 6; it is applied to the set of pose labels Z of size L, indexed by l. The multimodal dataset $\mathcal{X}$ of size K, indexed by k, is separated for each scene c ∈ C. The dataset is composed of features extracted

Figure 5. Multimodal representation of the Fetal L pose, showing the features extracted from each modality.

from a set of M modalities N = {R, D, P}, indexed by m (e.g., $f_{N_m}$ with m = 1 gives $f_R$). The k-th datapoint in the dataset has the form

$$X_k = \{f_{N_m}\}_M = \{f_R, f_D, f_P\} = \{\mathrm{HOG}(R), \mathrm{gMOM}(D), \mathrm{gMOM}(P)\}, \quad (1)$$

where $f_{N_m}$ is the feature vector extracted from the m-th modality. These features are used to train the ensemble of M unimodal SVM (and LDA) classifiers ($\mathrm{CLF}_m$). For a given input datapoint $X_k$, each of the classifiers outputs a probability vector $\mathrm{CLF}_{k,m} = [s_{k1m}, \ldots, s_{kLm}]^T$, where the elements s represent the probability of label l given modality feature m. The classifier label probabilities are computed using the implementations from [12]


of Platt's method for SVC and Bayes' rule for LDA. The feature-classifier combinations are quantified at the trust-estimation stage, where the unimodal trust values $w^c = [w^c_R, w^c_D, w^c_P]^T$ are computed for a specific scene c. The multimodal trusted classifier is formed by fusing the candidate-label decisions from the unimodal classifiers into one. The objective of this formulation is to find the pose label $Z_{\hat{l}}$ with the highest MM probability for a given input query $X_k$, where $\hat{l}$ is the estimated label index. The variables used throughout this paper are listed in Table 1.

4.1. Problem Statement

The proposed fusion technique uses probabilistic concepts to compute the probability of a given class by marginalizing the joint probability over the modalities. The joint probability is calculated from the conditional probability of each class and the set of prior probabilities for each modality. The conditional probabilities are extracted from the classifiers in the ensemble of M unimodal classifiers (i.e., $\mathbb{P}(Z = Z_l \mid X = X_k) = \mathbb{P}(Z_l \mid X_k)$) and re-written as

$$\mathbb{P}(Z_l \mid X_k) = \sum_{m=1}^{M} \mathbb{P}(Z_l \mid X_k, M = m)\,\mathbb{P}(M = m). \quad (2)$$

Methods such as Platt's [14] for SVMs enable the computation of the conditional probabilities, given by

$$s_{klm} = \mathbb{P}(Z_l \mid X_k, M = m). \quad (3)$$

However, the prior probability for each modality, $w_m = \mathbb{P}(M = m)$, remains unknown. The trust method finds the set of priors for each modality m in the ensemble of M modalities that approximates the probability

$$b_{kl} = \mathbb{P}(Z = Z_l \mid X = X_k, \mathrm{Oracle}), \quad (4)$$

produced by an oracle that observed the datapoint $X = X_k$. The estimation process is repeated for all c; however, c is omitted to simplify the notation (i.e., $w^c$ becomes $w$).

The method uses the following coupled optimization problem to find the modality priors $w_m$ for scene c:

$$\begin{aligned}
\underset{w}{\text{minimize}} \quad & \frac{1}{2} \sum_{k=1}^{K} \sum_{l=1}^{L} \left( \sum_{m=1}^{M} s_{klm} w_m - b_{kl} \right)^2 \\
\text{subject to} \quad & \mathbf{1}^T w = 1 \\
& 0 \le w_m \le 1, \; m = 1, \ldots, M
\end{aligned} \quad (5)$$

The objective is to find the weights $w_m$ that approximate the oracle $b_{kl}$ for every datapoint $X_k$. Using the loss in Eq. (5), the problem becomes a cc-LS optimization problem. This type of problem uses all points and pose labels from the training set to find the set of priors that approximates the values produced by the oracle for each point $X_k$ at once.

VARIABLES

Symbol: Description
$A$: Multimodal matrix $\in \mathbb{R}^{U \times M}$
$a_m$: m-th column vector of $A$, with U elements
$b$: Oracle vector $\in \mathbb{R}^{U}$
$b_m$: Oracle column vector for modality m
$C$: Scene set, (light × occlusion) combinations
$c$: Scene index, $1 \le c \le |C|$
$\mathrm{CLF}_{k,m}$: Classifier for the m-th modality
$f_{N_m,k}$: Set of M feature vectors for the k-th datapoint
$D$: Depth modality
$h$: Head camera view
$K$: Dataset size, $K = |\mathcal{X}|$
$k$: Datapoint index, $1 \le k \le K$
$L$: Size of the set of pose labels, $L = |Z|$
$l$: Index of the pose label $Z_l$, $1 \le l \le L$
$\hat{l}$: Index of the estimated pose label, $1 \le \hat{l} \le L$
$l^*$: Index of the ground-truth label $Z_{l^*}$, $1 \le l^* \le L$
MM: Multimodal and Multiview
MpM: Multimodal and partial-Multiview
$M$: Size of the modality set, $M = |N|$
$m$: Modality index, $1 \le m \le M$
$N$: Modality set $N = \{R, D, P\}$, indexed by m
$P$: Pressure modality
PMM: Partial-Multimodal and Multiview
PMpM: Partial-Multimodal partial-Multiview
$R$: RGB modality
$s$: Side camera view
$s_{klm}$: Probability of label l from $\mathrm{CLF}_{k,m}$
$t$: Top camera view
$U$: Multimodal dimension, $U = KL$
$\mathcal{V}$: View set $\mathcal{V} = \{t, s, h\}$
$V$: Number of views, $V = |\mathcal{V}|$
$v$: View index, $1 \le v \le V$
$w^c$: Trusts $w = [w_R, w_D, w_P]^T$ for scene c
$w_{N_m}$: Modality trust value (e.g., $w_R$ for m = 1)
$\mathcal{X}$: Dataset indexed by k (i.e., $X_k$)
$X_k$: k-th datapoint, with $f_{N_m,k} = \{f_R, f_D, f_P\}_k$
$Y$: MM dimension (= KLV)
$Z$: Sleep-pose set

Table 1. Variables and their descriptions.

4.2. Multimodal Construction

The estimation method uses cc-LS optimization to minimize the difference between the oracle b and the multimodal matrix A. It frames the trust estimation as a linear system of equations of the form Aw − b = 0, where the modality trust values are the elements of the vector $w = [w_R, w_D, w_P]^T$ that brings Aw close to b.

Construction of the Multimodal Matrix (A). The matrix A contains the label probabilities for each of the datapoints in the training set ($K = |\mathcal{X}_{\mathrm{train}}|$). This matrix has U rows (U = KL) and M columns, where L is the total number


Figure 6. Diagram of the trusted multimodal classifier for the MpM configuration. Image features are extracted from the R, D, P camera and pressure data. The features are then used to train the unimodal classifiers ($\mathrm{CLF}_m$), which are in turn used to estimate the modality trust values. In the last stage of the MM classifier, the unimodal decisions are trusted and combined.

of labels (L = |Z|) and M is the number of modalities (M = 3), and it has the following structure:

$$A = \big[S_{k=1}^T, \ldots, S_{k=K}^T\big]^T_{U \times M}, \quad (6)$$

where $S_k(l, m) = s_{klm}$.

Construction of the Multimodal Oracle Vector (b). The vector b is generated by the oracle and quantifies the classification ability of the combined modalities. It is used to corroborate estimation correctness when compared to the ground truth. The $b_m$ column vectors have U rows:

$$b_m = \big[b_{k=1}^T, \ldots, b_{k=K}^T\big]^T, \quad (7)$$

where $b_k = [b_{k,l=1}, \ldots, b_{k,l=L}]^T$. The values of the $b_{kl}$ elements are set using the following condition:

$$b_{kl} = \begin{cases} 1 & \text{if } l = l^* \text{ for } X_k \\ 0 & \text{otherwise,} \end{cases} \quad (8)$$

where $\hat{l} = \arg\max_l s_{klm}$ is the index of the estimated label and $l^*$ is the index of the ground-truth label for $X_k$.

The construction of the oracle b depends on how the columns $b_m$ (i.e., the unimodal oracles) are combined. The system is tested with a uniform construction, and the results are reported in section 5. In the uniform construction, each modality has a 1/M voting power, and the contributions can add up to one via

$$b = \sum_{\forall m} \frac{b_m}{M}. \quad (9)$$
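As a concrete sketch of the constructions in Eqs. (6)-(9), the following example stacks per-datapoint score matrices into A and builds the uniform oracle b from ground-truth labels. The array names and toy sizes are illustrative; in practice the scores would come from the unimodal classifiers.

```python
import numpy as np

rng = np.random.default_rng(0)
K, L, M = 4, 3, 3  # datapoints, pose labels, modalities (toy sizes)

# s[k, l, m]: probability of label l for datapoint k from modality m.
s = rng.random((K, L, M))
s /= s.sum(axis=1, keepdims=True)  # normalize over labels per modality

# Eq. (6): A stacks the K score matrices S_k (L x M) into a (U x M) matrix.
A = s.reshape(K * L, M)  # U = K * L rows

# Eq. (8): the oracle entries are one-hot in the ground-truth label l*.
l_star = rng.integers(0, L, size=K)      # ground-truth label indices
b_k = np.eye(L)[l_star]                  # (K x L) one-hot rows
b_m = np.tile(b_k.reshape(K * L, 1), M)  # identical unimodal oracle columns

# Eq. (9): the uniform construction averages the M oracle columns.
b = b_m.sum(axis=1) / M

print(A.shape, b.shape)  # (12, 3) (12,)
```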

4.3. Coupled Constrained Least-Squares (cc-LS)

Finally, the weight vector $w = [w_R, w_D, w_P]^T$ is computed by substituting A and b into Eq. (5) and solving the cc-LS optimization problem:

$$\begin{aligned}
\underset{w}{\text{minimize}} \quad & \frac{1}{2} \|Aw - b\|_2^2 \\
\text{subject to} \quad & \mathbf{1}^T w = 1 \\
& 0 \le w_m \le 1, \; m = 1, \ldots, M
\end{aligned} \quad (10)$$

Intuitively, the cc-LS problem finds the modality priors that allow the method to fuse information from the different modalities to approximate the oracle probabilities.
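The problem in Eq. (10) is a small simplex-constrained least-squares program and can be solved with an off-the-shelf solver. Below is a minimal sketch using SciPy's SLSQP; this is a generic solver choice for illustration, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import minimize

def solve_cc_ls(A, b):
    """Minimize 0.5 * ||A w - b||^2 subject to sum(w) = 1, 0 <= w_m <= 1."""
    M = A.shape[1]
    objective = lambda w: 0.5 * np.sum((A @ w - b) ** 2)
    grad = lambda w: A.T @ (A @ w - b)  # analytic gradient of the loss
    res = minimize(
        objective,
        x0=np.full(M, 1.0 / M),          # start at uniform trusts
        jac=grad,
        method="SLSQP",
        bounds=[(0.0, 1.0)] * M,          # 0 <= w_m <= 1
        constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],
    )
    return res.x

# Toy check: if b is generated with known feasible trusts, cc-LS recovers them.
rng = np.random.default_rng(1)
A = rng.random((60, 3))
w_true = np.array([0.5, 0.3, 0.2])
b = A @ w_true
w = solve_cc_ls(A, b)
print(np.round(w, 3))  # close to [0.5, 0.3, 0.2]
```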

4.4. Multiview Formulation

The bounded multimodal formulation is expanded to include multiview data using V views, indexed by v. The values that v can take indicate which camera view is used (e.g., v = 1 for the top view, v = 2 for the side view, and v = 3 for the head view). The multimodal and multiview matrix A has the following form:

$$A = \big[[A^{(v=1)}], \ldots, [A^{(v=V)}]\big]^T_{Y \times M}, \quad (11)$$


where Y = LKV for a system with V views and M modalities. The multimodal and multiview oracle vector $b_m$ is constructed by concatenating the data from all the views in the set $\mathcal{V}$ via

$$b_m = \Big[\big[b^{(v=1)}\big]^T, \ldots, \big[b^{(v=V)}\big]^T\Big]^T_{Y}, \quad (12)$$

and the b column vector is generated using Eq. (9).

4.5. Testing

The test process is shown in Figure 7. The room sensors, in combination with the N = {R, D, P} measurements, are collected from the ICU scene. Features ($f_{N_m,k}$) are extracted from the modalities in N and are used as inputs to the trusted multimodal classifier. The classifier outputs a set of label candidates, from which the label with the largest probability for datapoint $X_k = \{f_{N_m}\}_k$ is selected via

$$\hat{l}_k = \underset{l \in L}{\arg\max} \big( w_{N_m} \mathrm{CLF}_m(f_{N_m,k}) \big), \; \forall m. \quad (13)$$
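A hedged sketch of the trusted fusion in Eq. (13): the unimodal probability vectors are weighted by the learned trusts, summed over modalities (the marginalization of Eq. (2)), and the argmax gives the pose-label index. The probability vectors below are toy stand-ins for the classifier outputs.

```python
import numpy as np

def fuse_and_classify(scores, w):
    """scores: (M, L) per-modality label probabilities; w: (M,) trusts."""
    multimodal = w @ scores  # sum_m w_m * P(Z_l | X_k, m), an (L,) vector
    return int(np.argmax(multimodal))

scores = np.array([
    [0.6, 0.3, 0.1],  # RGB classifier favors label 0
    [0.2, 0.5, 0.3],  # depth classifier favors label 1
    [0.1, 0.2, 0.7],  # pressure classifier favors label 2
])
w = np.array([0.2, 0.2, 0.6])  # pressure is the most trusted in this scene
print(fuse_and_classify(scores, w))  # 2
```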

Missing Modalities. Hardware failures are simulated by evaluating the classification performance with one modality removed at a time. The trust value of a missing or failing sensor modality ($w^*_{N_n}$) is set to zero, and its original value ($w_{N_n}$) is proportionally distributed to the others via

$$w^*_{N_m} = w_{N_m} \left( 1 + \frac{|w_{N_n} - w_{N_m}|}{W} \right) \quad (14)$$

for $n \in \{1, \ldots, M\}$, $m \in \{1, \ldots, M\} \setminus \{n\}$, and $W = \sum_{\forall m} w_m$.
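The redistribution rule in Eq. (14) can be sketched as follows. This is a direct transcription of the formula; whether the adjusted trusts are re-normalized afterward is not specified in the text, so the sketch leaves them as computed.

```python
import numpy as np

def redistribute_trust(w, failed):
    """Zero the trust of a failed modality and boost the rest per Eq. (14)."""
    w = np.asarray(w, dtype=float)
    W = w.sum()
    # w*_m = w_m * (1 + |w_n - w_m| / W) for the surviving modalities m != n.
    w_star = w * (1.0 + np.abs(w[failed] - w) / W)
    w_star[failed] = 0.0  # the failed modality contributes nothing
    return w_star

w = np.array([0.5, 0.3, 0.2])             # trusts for (R, D, P)
w_star = redistribute_trust(w, failed=2)  # simulate a failed pressure mat
print(np.round(w_star, 3))  # trusts become approximately [0.65, 0.33, 0.0]
```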

5. Experiments

Validation of the modalities and views for sleep-pose classification substantiates the need for a multiview and multimodal system. The cc-LS method is tested on the MpM, MM, PMM, and PMpM Eye-CU configurations, with data collected from scenes with various illumination levels and occlusion types. The labels are estimated using the multi-class linear SVC (C = 0.5) and LDA classifiers from [12]. A validation set is used to tune the SVC's C parameter and the Ada parameters. Classification accuracies are computed using five-fold cross-validation with in-house implementations of the competing methods, and they are reported as percent-accuracy values inside color-scaled cells.

5.1. Modality and View Assessment

Classification results obtained using unimodal and multimodal data without modality trust are shown in Figure 8. The cell values indicate the classification percent accuracy for each individual modality and modality combination with three common classification methods. The labels of the column blocks at the top of the figure indicate the modalities used. The labels at the bottom of the figure show which classifier is used. The labels on the left and right indicate the scene illumination level and type of occlusion. The figure only shows classification results for the top camera view because the variation across the tested views was not statistically significant.

52 Performance of Reduced Eye-CUs

The complete MM configuration achieves the best clas-sification performance followed closely by the perfor-mances of the MpM PMM and PMpM configurationswhich is summarized in figure 9 The values inside thecells represent classification percent accuracy of the cc-Lsmethod combined with various Eye-CU system configura-tions The top row indicates the configuration The secondrow indicates the views The labels on the bottom of thefigure identify the modalities The labels on the left andright indicate illumination level and occlusion type Thered scale ranges from light red (worst) to dark red (best)The figure shows that the complete MM system in combi-nation with the cc-LS method performs the best across allscenes However it requires information from a pressuremat The PMM and PMpM configurations do not re-quire the pressure mat and are still capable of performingreliably and with only a slight drop in their performanceFor example in dark and occluded scenes the PMM andPMpM configurations reach 77 and 80 classificationrates respectively (see row DARK Blanket amp Pillow)

5.3. Comparison with Existing Methods

The performance of cc-LS and of the in-house implementations of the competing methods from [7] and [18] and Ada [4] is shown in Figure 10. The figure shows results using the MpM configuration, which most closely resembles the setups used in the competing methods. All the methods use a multimodal system with a top camera view and a pressure mat. The values inside the cells are the classification percent accuracies. The green scale goes from light green (worst) to dark green (best). The top row divides the methods into competing and proposed. The second row cites the methods. The bottom row indicates which classifier and (in parentheses) which modalities are used. The labels on the left and right indicate illumination level and occlusion type. The results are obtained using the four methods with the MM dataset.

Confusion Matrices. The confusion matrices in Figure 11 show how the indices of the estimated labels l̂ match the actual labels l*. The top three matrices are from a scene with bright and clear ICU conditions (Figure 11a). The bottom three matrices illustrate the performance of the methods in a dim and occluded ICU scenario (Figure 11b). A dark blue diagonal in a confusion matrix indicates perfect classification. In the selected scenes, all methods achieved 100% classification for the bright and clear scene. However, their


Figure 7. Block diagram for testing of a single-view multimodal trusted classifier. Observations (R, D, P) are collected from the scene. Features are extracted from the observations and sent to the unimodal classifiers to provide a set of score-ranked pose candidate labels. The set of candidates is trusted and combined into one multimodal set, from which the label with the highest score is selected.

Figure 8. Performance evaluation of modalities and modality combinations using SVC, LDA, and Ada-Boosted SVC (Ada), based on their classification percent accuracy (cell values). The evaluation is performed over all the scene conditions considered in this study. The results indicate that no single modality (R, D, P) or combination of concatenated modalities (RD, RP, DP, RDP), in combination with one of the three classification techniques, can be directly used to recognize poses in all scenes. The top row indicates which modality or combination of modalities is used. The labels on the bottom indicate which classifier is used. The labels to the left and right indicate the scene's illumination level and occlusion types. The gray-scaled boxes range from worst (white) to best (black) performance.

performance varies greatly in dim and occluded scenes. The matrix generated using [7] achieves 7% classification accuracy (bottom left), the matrix generated using [18] achieves 55% accuracy (bottom center), and the matrix generated with the cc-LS method achieves 86.7% accuracy (bottom right). The MpM configuration with the cc-LS method outperforms the competing methods by approximately 30%.

Performance of Ada-Boost. The system is tested using the Ada-Boost (Ada) algorithm [4] to improve the decisions of weak unimodal SVCs. The results in Figure 8 show a slight SVC improvement. The comparison in Figure 10


Figure 9. Classification performance, in red scale (dark = best, light = worst), of the various Eye-CU configurations using LDA. The PMpM has the lowest performance, 76.7%, using the s and h views of a dark and occluded scene. The method from [18] performs below 50%, and the method from [7] is not suited for such conditions. The top row identifies the configuration. The second row indicates the views used. The bottom labels indicate the modalities used (in parentheses). The labels on the left and right indicate scene illumination and occlusion type. A similar pattern is observed with SVC.

shows that Ada's improvement is small. It barely outperforms the reduced MpM configuration with the cc-LS method in some scenes (see row MID, Blanket). Overall, Ada is outperformed by the combination of cc-LS and MpM.

6. Discussion

The results in Figure 10 show performance disparities between the results obtained with the in-house implementation and those reported in [7]. The data and code from [7] were not released, so the findings and implementation details reported in this paper cannot be compared at the finest level. Nevertheless, the accuracy variations observed are most likely due to differences in data resolutions, sensor capacities, scene properties, and tuning parameters.

The performance of the MM and MpM configurations, which use a pressure mat, is slightly better. However, the deployment and maintenance of such systems in the real world can be very difficult, and perhaps logistically impossible. The cc-LS method in combination with the PMM or PMpM configurations, which do not use a pressure mat, matches and outperforms the competing techniques in ideal and challenging scenarios (see Figure 10).

Figure 10. Mean classification performance, in green scale (dark = best, light = worst), of Huang's [7], Torres' [18], Freund's [4], and the cc-LS method using SVC and LDA. The combination of cc-LS and MpM matches the performance of the competing methods in bright and clear scenes. Classification is improved with cc-LS by 70% with SVC and by 30% with LDA in dark and occluded scenes. The top row distinguishes between competing and proposed methods; the second row cites them. The bottom row indicates the classifier and (in parentheses) the modalities used. The labels on the left and right indicate scene illumination and occlusion type. N/A indicates not suitable.

7. Conclusion and Future Work

This work introduced a new modality-trust estimation method based on cc-LS optimization. The trust values are found by minimizing the difference between the multimodal candidate labels (A) and the expected oracle labels (b). The Eye-CU system uses the trusts to weight the label propositions of the available modalities and views. The cc-LS method with the MM Eye-CU system outperforms three competing methods. Two reduced Eye-CU variations reliably classify sleep poses without pressure data. The MM properties allow the system to handle occlusions and avoid the problems associated with a pressure mat (e.g., sanitation and sensor integrity).

Reliable pose-classification methods and systems enable clinical researchers to design, enhance, and evaluate pose-related healthcare protocols and therapies. Given that the Eye-CU system is capable of reliably classifying human sleep poses in an ICU environment, expansion of the system and methods to include temporal information is under investigation. Future analysis will seek to quantify and typify pose sequences (i.e., duration and transition). Future work


(a) Bright scene, clear of occlusions.

(b) Dark scene with pillow and blanket occlusions.

Figure 11. Confusion matrices, in blue scale (dark = best, light = worst), generated using a top camera view and the methods from Huang [7], Torres [18], and cc-LS with MpM. The top matrices show that all methods achieve perfect classification in ideal scenes (i.e., along the main diagonal). The bottom matrices are [7] with 7%, [18] with 55%, and cc-LS with 86.7% for dark and occluded scenes. The matrices show the matches between estimated (l̂) and ground-truth (l*) indices.

will investigate removing the constraints that clearly define the set of sleep poses, and will explore tools from novelty detection to identify other (e.g., helpful and harmful) patient poses that occur in an ICU. Recent studies indicate that deep features might improve the classification performance of the Eye-CU system in the most challenging healthcare scenarios. Hence, future work will investigate the performance and integration of deep features into the cc-LS method and the Eye-CU system.

Acknowledgements. This project was supported in part by the Institute for Collaborative Biotechnologies (ICB) through grant W911NF-09-0001 from the U.S. Army Research Office, and by the U.S. Office of Naval Research (ONR) through grant N00014-12-1-0503. The content of the information does not necessarily reflect the position or the policy of the Government, and no official endorsement should be inferred.

References

[1] M. A. R. Ahad, J. K. Tan, H. Kim, and S. Ishikawa. Motion history image: its variants and applications. Machine Vision and Applications, 2012.

[2] S. Bihari, R. D. McEvoy, E. Matheson, S. Kim, R. J. Woodman, and A. D. Bersten. Factors affecting sleep quality of patients in intensive care unit. Journal of Clinical Sleep Medicine, 2012.

[3] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2005.

[4] Y. Freund and R. E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 1997.

[5] R. I. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, 2nd edition, 2004.

[6] M.-K. Hu. Visual pattern recognition by moment invariants. IRE Trans. on Information Theory, 1962.

[7] W. Huang, A. A. P. Wai, S. F. Foo, J. Biswas, C.-C. Hsia, and K. Liou. Multimodal sleeping posture classification. In Proc. of the IEEE Int'l Conf. on Pattern Recognition (ICPR), 2010.

[8] C. Idzikowski. Sleep position gives personality clue. BBC News, September 16, 2003.

[9] C.-H. Kuo, F.-C. Yang, M.-Y. Tsai, and L. Ming-Yih. Artificial neural networks based sleep motion recognition using night vision cameras. Biomedical Engineering: Applications, Basis and Communications, 2004.

[10] W.-H. Liao and C.-M. Yang. Video-based activity and movement pattern analysis in overnight sleep studies. In Proc. of the IEEE Int'l Conf. on Pattern Recognition (ICPR), 2008.

[11] S. Morong, B. Hermsen, and N. de Vries. Sleep position and pregnancy. In Positional Therapy in Obstructive Sleep Apnea. Springer, 2015.

[12] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 2011.

[13] T. Penzel and R. Conradt. Computer based sleep recording and analysis. Sleep Medicine Reviews, 2000.

[14] J. Platt et al. Fast training of support vector machines using sequential minimal optimization. Advances in Kernel Methods: Support Vector Learning, 1999.

[15] S. Ramagiri, R. Kavi, and V. Kulathumani. Real-time multi-view human action recognition using a wireless camera network. In Proc. of the ACM/IEEE Int'l Conf. on Distributed Smart Cameras (ICDSC), 2011.

[16] C. Sahlin, K. A. Franklin, H. Stenlund, and E. Lindberg. Sleep in women: normal values for sleep stages and position and the effect of age, obesity, sleep apnea, smoking, alcohol and hypertension. Sleep Medicine, 2009.

[17] J. Shotton, R. Girshick, A. Fitzgibbon, T. Sharp, M. Cook, M. Finocchio, R. Moore, P. Kohli, A. Criminisi, and A. Kipman. Efficient human pose estimation from single depth images. IEEE Trans. on Pattern Analysis and Machine Intelligence (PAMI), 2013.

[18] C. Torres, S. D. Hammond, J. C. Fried, and B. S. Manjunath. Multimodal pose recognition in an ICU using multimodal data and environmental feedback. In Proc. of the Springer Int'l Conf. on Computer Vision Systems (ICVS), 2015.

[19] G. L. Weinhouse and R. J. Schwab. Sleep in the critically ill patient. Sleep, 2006.

[20] Y. Yang and D. Ramanan. Articulated human detection with flexible mixtures of parts. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2013.



Why Multiview? The ICU is a dynamic environment where equipment is moved around continuously and can block the sensors and views of the patients. A multiview system improves classification performance, increases the chances of observing the patients, and enables monitoring using simple and affordable sensors. Cameras do not make contact with patients and avoid the risk of infection by touch.

3. Data Collection

Sample MM data collected from one actor in various poses and scene conditions, using all camera views and modalities, is shown in Figure 4. The complete dataset is constructed with sleep poses collected from five actors in a mock-up ICU setting with a real ICU bed and equipment. The observations are the set of sleep poses Z = {Background, Soldier U, Soldier D, Faller R, Faller L, Log R, Log L, Yearner R, Yearner L, Fetal R, Fetal L} of size L (= |Z|) and indexed by l. The letters U and D indicate that the patient is up-facing or down-facing, and the letters L and R indicate lying-on-left and lying-on-right sides. The variable Z_l is used to identify one specific pose label (e.g., Z_0 = Background). The scene conditions are simulated using three illumination levels: bright (light sensor with 70–90% saturation), medium (50–70% saturation), and dark (below 50% saturation), as well as four occlusion types: clear (no occlusion), blanket (covering 90% of the actor's body), blanket and pillow, and pillow (between the actor's head and upper back and the pressure mat). The illumination intensities are based on the percent-saturation values of an illumination sensor, and the occlusions are detected using radio-frequency identification (RFID) and proximity sensors, all by .NET Gadgeteer. The combination of the illumination levels and occlusion types generates a 12-element scene set C = {bright, medium, dark} × {clear, blanket, pillow, blanket+pillow}. The variable c ∈ C is used to indicate a single illumination and occlusion combination (e.g., c = 1 indicates a bright and clear scene). The dataset is created by letting one scene be the combination of one actor in one pose under a single scene condition. Ten measurements are collected from one scene: three modalities (R, D, and synthetic binary masks) from each of the three camera views in the set V = {t, h, s}, and one pressure image (P). The data collection process includes acquiring the background (empty bed) and asking the actors to rotate through the 10 poses (11 classes including the background) under each of the 12 scene conditions. The process is repeated 10 times for each of the five actors. In total, this process generates a dataset of 66,000 images (five actors × 10 sessions × 10 images × 11 classes × 12 scenes).
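The scene set and dataset counts above reduce to straightforward enumeration; a minimal sketch (variable names are illustrative, not from the authors' code):

```python
from itertools import product

# Scene set C: 3 illumination levels x 4 occlusion types = 12 conditions.
ILLUMINATION = ["bright", "medium", "dark"]
OCCLUSION = ["clear", "blanket", "pillow", "blanket+pillow"]
SCENES = list(product(ILLUMINATION, OCCLUSION))

# 5 actors x 10 sessions x 10 images x 11 classes x 12 scenes = 66,000 images.
ACTORS, SESSIONS, IMAGES_PER_CLASS, CLASSES = 5, 10, 10, 11
TOTAL_IMAGES = ACTORS * SESSIONS * IMAGES_PER_CLASS * CLASSES * len(SCENES)
```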

3.1. Modalities

This section describes the modalities used by the Eye-CU system (see Figures 3 and 4). It presents the modalities' basic properties, discusses their pros and cons, and provides an intuitive justification for their complementary use in the cc-LS formulation.

RGB. Standard RGB video data provides reliable information to represent and classify human sleep poses in scenes with relatively ideal conditions. However, most people sleep in imperfectly illuminated scenarios using sheets, blankets, and pillows that block and disturb sensor measurements. The system collects RGB color images of dimensions 640 × 480 from each actor in each of the scene conditions, and extracts pose-appearance features representative of the lines in the human body (i.e., limbs and extremities).

Depth. Infrared depth cameras can be resilient to illumination changes. The Eye-CU system uses PrimeSense Carmine devices to collect depth data. The devices are designed for indoor use and can acquire images of dimensions 640 × 480. These sensors use 16 bits to represent pixel-intensity values, which correspond to the distance from the sensor to a point in the scene. Their operating distance range is 0.8 m to 3.5 m, and their spatial resolution for scenes 2.0 m away is 3.5 mm along the horizontal (x) and vertical (y) axes and 3.0 mm along the depth (z) axis. The system uses the depth images to represent the three-dimensional shape of the poses. The usability of these images, however, depends on depth contrast, which is affected by the deformation properties of the mattress and blankets present in ICU environments.

Pressure. In preliminary studies, the pressure modality remained constant in the presence of sheets and blankets. The Eye-CU system uses the Tekscan Body Pressure Measurement System (BPMS), model BRE5315-4. The complete mat is composed of four independent pressure arrays, each with its own handle (i.e., USB adapter), to measure the pressure distribution on support surfaces. The data from each of the four arrays was synchronized and acquired using the proprietary Tekscan BPMS software. The complete pressure-sensing area is 1950.7 mm × 426.7 mm, with a total of 8064 sensing elements (or sensels). The sensel density is 1 sensel/cm², each with a sensing pressure range from 0 to 250 mmHg (0–5 psi). The images generated using the pressure mat have dimensions of 3341 × 8738 pixels. Although the size of the pressure images is relatively large, the generation of such images depends on consistent physical body-mattress contact. In particular, pillows, the deformation properties of the mattress, and bed configurations (not explored in this work) can disturb the measurements and the images generated by the mat. In addition, proper pressure-image generation requires a sensor array with high resolution and full bed coverage, the use of which can be prohibitively expensive and constrictive due to sanitation procedures and limited technical support.


Figure 4. Multimodal and multiview dictionary of sleep poses for a single actor in various sleep configurations and scene conditions. It contains R and D (equalized for display) images from the t, s, and h views, and the pressure mat P. Images are transformed w.r.t. the t view.

3.2. Feature Extraction

The sensors and camera views are calibrated using the standard methods from [5]. Homography transformations are computed relative to the top view, and gradient and shape features are then extracted from the transformed images.

Histogram of Oriented Gradients (HOG). HOG features are extracted from the RGB images to represent sleep-pose limb structures, as demonstrated by [3, 20]. The HOG extraction parameters are four orientations, 16-by-16 pixels per cell, and two-by-two cells per block, which yield a 5776-element vector per image.
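The paper's exact HOG pipeline is not released; the following is a simplified numpy sketch matching the stated parameters (four unsigned orientation bins, 16×16-pixel cells, 2×2-cell blocks with L2 normalization). It omits refinements such as image resizing and gradient interpolation, so its output length depends on the input image size rather than matching the 5776-element vector reported above:

```python
import numpy as np

def hog_features(img, n_orient=4, cell=16, block=2):
    """Simplified HOG: per-cell orientation histograms of gradient
    magnitude, followed by L2 normalization over sliding cell blocks."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    # Unsigned orientation in [0, pi), binned into n_orient bins.
    ang = np.mod(np.arctan2(gy, gx), np.pi)
    bins = np.minimum((ang / np.pi * n_orient).astype(int), n_orient - 1)

    ch, cw = img.shape[0] // cell, img.shape[1] // cell
    hist = np.zeros((ch, cw, n_orient))
    for i in range(ch):
        for j in range(cw):
            b = bins[i*cell:(i+1)*cell, j*cell:(j+1)*cell]
            m = mag[i*cell:(i+1)*cell, j*cell:(j+1)*cell]
            for o in range(n_orient):
                hist[i, j, o] = m[b == o].sum()

    # Slide a block window over the cell grid and L2-normalize each block.
    feats = []
    for i in range(ch - block + 1):
        for j in range(cw - block + 1):
            v = hist[i:i+block, j:j+block].ravel()
            feats.append(v / (np.linalg.norm(v) + 1e-9))
    return np.concatenate(feats)
```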

Geometric Moments (gMOM). Image gMOM features, introduced in [6] and validated in [1, 15], are used to represent sleep-pose shapes. The in-house implementation uses the raw pixel values from tiled depth and pressure images instead of the standard binarized pixel values. The six-by-six tile dimensions are determined empirically to balance accuracy and complexity. Finally, moments up to the third order are extracted from each block to generate a 10-element vector per block. The vectors from each of the 36 blocks are concatenated to form a 360-element vector per image. Figure 5 shows how features are extracted from each modality.
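A raw-pixel variant of the geometric moments, matching the tiling described above (six-by-six grid, moments up to third order, 10 values per tile, 360 per image), can be sketched as:

```python
import numpy as np

def raw_moments(tile, max_order=3):
    """Raw geometric moments m_pq = sum_xy x^p * y^q * I(x, y)
    for all p + q <= max_order (10 moments when max_order = 3)."""
    h, w = tile.shape
    y, x = np.mgrid[0:h, 0:w].astype(float)
    return np.array([(x**p * y**q * tile).sum()
                     for p in range(max_order + 1)
                     for q in range(max_order + 1 - p)])

def gmom_features(img, grid=6):
    """Tile the image into a grid x grid layout and concatenate the
    per-tile raw moments (grid=6 -> 36 tiles x 10 moments = 360 values)."""
    h, w = img.shape
    th, tw = h // grid, w // grid
    feats = [raw_moments(img[i*th:(i+1)*th, j*tw:(j+1)*tw])
             for i in range(grid) for j in range(grid)]
    return np.concatenate(feats)
```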

4. Multimodal-Multiview Formulation

Explanation of the method begins with the problem statement in Section 4.1, followed by a description of the single-view multimodal formulation in Section 4.2. This formulation is expanded to include multiview data in Section 4.4. The multimodal classification framework for a single-view system is shown in Figure 6, and it is applied to the set of pose labels Z of size L, indexed by l. The multimodal dataset (X) of size K, indexed by k, is separated for each scene c ∈ C. The dataset is composed of features extracted

Figure 5. Multimodal representation of the Fetal L pose, showing the features extracted from each modality.

from a set of M modalities N = {R, D, P}, indexed by m (e.g., f_{N_m} with m = 1 gives f_R). The k-th datapoint in the dataset has the form

X_k = {f_{N_m}}^M = {f_R, f_D, f_P} = {HOG(R), gMOM(D), gMOM(P)},   (1)

where f_{N_m} is the feature vector extracted from the m-th modality. These features are used to train the ensemble of M unimodal SVM (and LDA) classifiers (CLF_m). For a given input datapoint X_k, each of the classifiers outputs a probability vector

CLF_{k,m} = [s_{k,1,m}, ..., s_{k,L,m}]^T,

where the elements (s) represent the probability of label l given modality feature m. The classifier label probabilities are computed using the implementations from [12]


of Platt's method for SVC and Bayes' rule for LDA. The feature-classifier combinations are quantified at the trust-estimation stage, where the unimodal trust values w^c = [w^c_R, w^c_D, w^c_P]^T are computed for a specific scene c. The multimodal trusted classifier is formed by fusing the candidate label decisions from the unimodal classifiers into one. The objective of this formulation is to find the pose label (Z_l̂) with the highest MM probability for a given input query X_k, where l̂ is the estimated label index. The variables used throughout this paper are listed in Table 1.

4.1. Problem Statement

The proposed fusion technique uses probabilistic concepts to compute the probability of a given class by marginalizing the joint probability over the modalities. The joint probability is calculated from the conditional probability of each class and the set of prior probabilities for each modality. The conditional probabilities are extracted from the classifiers in the ensemble of M unimodal classifiers (i.e., P(Z = Z_l | X = X_k) = P(Z_l | X_k)) and re-written as

P(Z_l | X_k) = Σ_{m=1}^{M} P(Z_l | X_k, M = m) P(M = m).   (2)

Methods such as Platt's [14] for SVMs enable the computation of the conditional probabilities, given by

s_{klm} = P(Z_l | X_k, M = m).   (3)

However, the prior probability for each modality, w_m = P(M = m), remains unknown. The trust method finds the set of priors for each modality m in the ensemble of M modalities that approximates the probability

b_{kl} = P(Z = Z_l | X = X_k, Oracle),   (4)

produced by an oracle-observed datapoint X = X_k. The estimation process is repeated for all c's; however, c is omitted to simplify the notation (i.e., w^c becomes w).

The method uses the following coupled optimization problem to find the modality priors w_m for scene c:

minimize_w   (1/2) Σ_{k=1}^{K} Σ_{l=1}^{L} ( Σ_{m=1}^{M} s_{klm} w_m − b_{kl} )²
subject to   1^T w = 1,
             0 ≤ w_m ≤ 1,  m = 1, ..., M.   (5)

The objective is to find the weights w_m that approximate the oracle b_{kl} for every datapoint X_k. Using the loss in Eq. (5), the problem becomes a cc-LS optimization problem. This type of problem uses all points and pose labels from the training set to find, at once, the set of priors that approximates the values produced by the oracle for each point X_k.

SYMBOL       DESCRIPTION
A            Multimodal matrix ∈ R^{U×M}
a_m          m-th column vector of A, with U elements
b            Oracle vector ∈ R^U
b_m          Oracle column vector for modality m
C            Scene set (illumination × occlusion combinations)
c            Scene index, 1 ≤ c ≤ |C|
CLF_{k,m}    Classifier for the m-th modality
f_{N_m,k}    Set of M feature vectors for the k-th datapoint
D            Depth modality
h            Head camera view
K            Dataset size, K = |X|
k            Datapoint index, 1 ≤ k ≤ K
L            Size of the set of pose labels, L = |Z|
l            Index of a pose label (Z_l), 1 ≤ l ≤ L
l̂            Index of the estimated pose label, 1 ≤ l̂ ≤ L
l*           Index of the ground-truth label (Z_{l*}), 1 ≤ l* ≤ L
MM           Multimodal and Multiview
MpM          Multimodal and partial-Multiview
M            Size of the modality set, M = |N|
m            Modality index, 1 ≤ m ≤ M
N            Modality set N = {R, D, P}, indexed by m
P            Pressure modality
pMM          Partial-Multimodal and Multiview
PMpM         Partial-Multimodal and partial-Multiview
R            RGB modality
s            Side camera view
s_{klm}      Probability of label l from CLF_{k,m}
t            Top camera view
U            Multimodal dimension, U = KL
V            View set V = {t, s, h}
V            Number of views, V = |V|
v            View index, 1 ≤ v ≤ V
w^c          Trust vector w = [w_R, w_D, w_P]^T for scene c
w_{N_m}      Modality trust value (e.g., w_R for m = 1)
X            Dataset indexed by k (i.e., X_k)
X_k          k-th datapoint, with f_{N_m,k} = {f_R, f_D, f_P}_k
Y            MM dimension (= KLV)
Z            Sleep-pose set

Table 1. Variables and their descriptions.

4.2. Multimodal Construction

The estimation method uses cc-LS optimization to minimize the difference between the oracle (b) and the multimodal matrix (A). It frames the trust estimation as a linear system of equations of the form Aw − b = 0, where the modality trust values are the elements of the vector w = [w_R, w_D, w_P]^T that brings Aw close to b.

Construction of the Multimodal Matrix (A). The matrix A contains the label probabilities for each of the datapoints in the training set (K = |X_train|). This matrix has U rows (U = KL) and M columns, where L is the total number


Figure 6. Diagram of the trusted multimodal classifier for the MpM configuration. Image features are extracted from the R, D, P camera and pressure data. Then, the features are used to train the unimodal classifiers (CLF_m), which are in turn used to estimate the modality trust values. In the last stage of the MM classifier, the unimodal decisions are trusted and combined.

of labels (L = |Z|) and M is the number of modalities (M = 3). It has the structure

A = [S_{k=1}^T, ..., S_{k=K}^T]^T ∈ R^{U×M},   (6)

where S_k(l, m) = s_{klm}.

Construction of the Multimodal Oracle Vector (b). The vector b is generated by the oracle and quantifies the classification ability of the combined modalities. It is used to corroborate estimation correctness when compared to the ground truth. The b_m column vectors have U rows:

b_m = [b_{k=1}^T, ..., b_{k=K}^T]^T,   (7)

where b_k = [b_{k,l=1}, ..., b_{k,l=L}]^T. The values of the b_{kl} elements are set using the condition

b_{kl} = 1 if l = l* for X_k, and 0 otherwise,   (8)

where l̂ = argmax_l s_{klm} is the index of the estimated label and l* is the index of the ground-truth label for X_k.

The construction of the oracle b depends on how the columns b_m (i.e., the unimodal oracles) are combined. The system is tested with a uniform construction, and the results are reported in Section 5. In the uniform construction, each modality has a 1/M voting power, and the entries can add up to one via

b = (1/M) Σ_{∀m} b_m.   (9)

4.3. Coupled Constrained Least-Squares (cc-LS)

Finally, the weight vector w = [w_R, w_D, w_P]^T is computed by substituting A and b into Eq. (5) and solving the cc-LS optimization problem

minimize_w   (1/2) ||Aw − b||₂²
subject to   1^T w = 1,
             0 ≤ w_m ≤ 1,  m = 1, ..., M.   (10)

Intuitively, the cc-LS problem finds the modality priors that allow the method to fuse information from different modalities to approximate the oracle probabilities.
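Eq. (10) is a small convex problem over the probability simplex (the sum-to-one and non-negativity constraints already imply w_m ≤ 1). One simple way to solve it, not necessarily the authors' solver, is projected gradient descent with a Euclidean simplex projection:

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / np.arange(1, len(v) + 1) > 0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def solve_cc_ls(A, b, steps=2000):
    """Minimize 0.5 * ||A w - b||^2 over the simplex, Eq. (10),
    via projected gradient descent."""
    M = A.shape[1]
    w = np.full(M, 1.0 / M)                        # start from uniform trust
    lr = 1.0 / (np.linalg.norm(A, 2) ** 2 + 1e-9)  # step from Lipschitz bound
    for _ in range(steps):
        grad = A.T @ (A @ w - b)
        w = project_simplex(w - lr * grad)
    return w
```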

4.4. Multiview Formulation

The bounded multimodal formulation is expanded to include multiview data using V views, indexed by v. The values that v can take indicate which camera view is used (e.g., v = 1 for the top view, v = 2 for the side view, and v = 3 for the head view). The multimodal and multiview matrix A has the form

A = [[A^{(v=1)}], ..., [A^{(v=V)}]]^T ∈ R^{Y×M},   (11)


where Y = LKV for a system with V views and M modalities. The multimodal and multiview oracle vector b_m is constructed by concatenating data from all the views in the set V via

b_m = [[b^{(v=1)}]^T, ..., [b^{(v=V)}]^T]^T ∈ R^Y,   (12)

and the b column vector is generated using Eq. (9).

4.5. Testing

The test process is shown in Figure 7. The room sensors, in combination with the N = {R, D, P} measurements, are collected from the ICU scene. Features (f_{N_m,k}) are extracted from the modalities in N and are used as inputs to the trusted multimodal classifier. The classifier outputs a set of label candidates, from which the label with the largest probability for datapoint X_k = {f_{N_m}}_k is selected via

l̂_k = argmax_{l ∈ L} ( w_{N_m} CLF_m(f_{N_m,k}) ), ∀m.   (13)

Missing Modalities. Hardware failures are simulated by evaluating the classification performance with one modality removed at a time. The trust value of a missing or failing sensor modality (w*_{N_n}) is set to zero, and its original value (w_{N_n}) is proportionally distributed to the others via

w*_{N_m} = w_{N_m} ( 1 + |w_{N_n} − w_{N_m}| / W ),   (14)

for n ∈ {1, ..., M}, m ∈ {1, ..., M} \ {n}, and W = Σ_{∀m} w_m.
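Eqs. (13) and (14) can be sketched as follows; note that, as written, Eq. (14) does not exactly renormalize the remaining trusts to sum to one:

```python
import numpy as np

def fuse_label(scores_k, w):
    """Trusted fusion, Eq. (13): weight each modality's label
    probabilities by its trust and pick the best label.

    scores_k: array (L, M) with scores_k[l, m] = s_klm
    w:        array (M,) of modality trusts
    """
    return int(np.argmax(scores_k @ w))

def drop_modality(w, n):
    """Missing-modality handling, Eq. (14): zero the trust of the
    failing modality n and boost the remaining trusts."""
    W = w.sum()
    w_star = np.array([w[m] * (1.0 + abs(w[n] - w[m]) / W)
                       for m in range(len(w))])
    w_star[n] = 0.0
    return w_star
```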


Figure 10 Mean classification performance in green scale (darkbest lightworst) of MaVL Huangrsquos [7] Torresrsquo [18] Feundrsquos [4]and the cc-LS method using SVC and LDA The combination ofcc-LS and MpM matches the performance of competing methodsin bright and clear scenes Classification is improved with cc-LSby 70 with SVC and by 30 with LDA in dark and occludedscenes The top row distinguishes between competing and pro-posed methods the second row cites them The bottom row indi-cates classifier and modalities (in parenthesis) used The labels onthe left and right indicate scene illumination and occlusion typeNA indicates not suitable

7 Conclusion and Future Work

This work introduced a new modality trust estimationmethod based on cc-LS optimization The trust values ap-proximate the difference between the multimodal candidatelabels A and the expected oracle b labels The Eye-CUsystem uses the trust to weight label propositions of avail-able modalities and views The cc-LS method with theMM Eye-CU system outperforms three competing meth-ods Two reduced Eye-CU variations reliably classify sleepposes without pressure data The MM properties allow thesystem to handle occlusions and avoid problems associatedwith a pressure mat (eg sanitation and sensor integrity)

Reliable pose classification methods and systems enableclinical researchers to design enhance and evaluate pose-related healthcare protocols and therapies Given that theEye-CU system is capable of reliably classifying humansleep poses in an ICU environment expansion of the sys-tem and methods is under investigation to include temporalinformation Future analysis will seek to quantify and typifypose sequences (ie duration and transition) Future work

9

(a) Bright scene clear of occlusions

(b) Dark scene with pillow and blanket occlusions

Figure 11 Confusion matrices generated in blue scale (darkbest light worst) using a top camera view and applying the meth-ods from Huangrsquos [7] Torresrsquo [18] and cc-LS with MpM Thetop matrices show all methods have perfect classification in idealscenes (ie main diagonal) The bottom matrices are [7] with7 [18] with 55 and cc-LS with 867 for dark and occludedscenes The matrices show the matches between estimated (l) andground truth (llowast) indices

will investigate removing the constraints that clearly definethe set of sleep poses and explore tools from novelty de-tection to identify other (eg helpful and harmful) patientposes that occur in an ICU Recent studies indicate that deepfeatures might improve the classification performance of theEye-CU system in the most challenging healthcare scenar-ios Hence future work will investigate the performanceand integration of deep features into the cc-LS method andthe Eye-CU system

Acknowledgements This project was supported in partby the Institute for Collaborative Biotechnologies (ICB)through grant W911NF-09-0001 from the US Army Re-search Office and by the US Office of Naval Research(ONR) through grant N00014-12-1-0503 The content ofthe information does not necessarily reflect the position orthe policy of the Government and no official endorsementshould be inferred

References[1] M A R Ahad J K Tan H Kim and S Ishikawa Motion history

image its variants and applications Machine Vision and Applica-tions 2012

[2] S Bihari R D McEvoy E Matheson S Kim R J Woodman andA D Bersten Factors affecting sleep quality of patients in intensivecare unit Journal of Clinical Sleep Medicine 2012

[3] N Dalal and B Triggs Histograms of oriented gradients for humandetection In Proc of the IEEE Conf on Computer Vision and PatternRecognition (CVPR) 2005

[4] Y Freund and R E Schapire A decision-theoretic generalization ofon-line learning and an application to boosting Jrnrsquol of computerand sys sci 1997

[5] R I Hartley and A Zisserman Multiple View Geometry in ComputerVision Cambridge University Press 2nd edition 2004

[6] M-K Hu Visual pattern recognition by moment invariants IEEEIRE Trans on Info Theory 1962

[7] W Huang A A P Wai S F Foo J Biswas C-C Hsia and K LiouMultimodal sleeping posture classification In Proc of the IEEE IntrsquolConf on Pattern Recognition (ICPR) 2010

[8] C Idzikowski Sleep position gives personality clue BBC NewsSeptember 16 2003

[9] C-H Kuo F-C Yang M-Y Tsai and L Ming-Yih Artificial neuralnetworks based sleep motion recognition using night vision camerasBiomedical Engineering Applications Basis and Communications2004

[10] W-H Liao and C-M Yang Video-based activity and movementpattern analysis in overnight sleep studies In Proc of the IEEE IntrsquolConf on Pattern Recognition (ICPR) 2008

[11] S Morong B Hermsen and N de Vries Sleep position and preg-nancy In Positional Therapy in Obstructive Sleep Apnea Springer2015

[12] F Pedregosa G Varoquaux A Gramfort V Michel B ThirionO Grisel M Blondel P Prettenhofer R Weiss V Dubourg J Van-derplas A Passos D Cournapeau M Brucher M Perrot andE Duchesnay Scikit-learn Machine learning in Python Journalof Machine Learning Research 2011

[13] T Penzel and R Conradt Computer based sleep recording and anal-ysis Sleep medicine reviews 2000

[14] J Platt et al Fast training of support vector machines using sequen-tial minimal optimization Advances in kernel methods support vec-tor learning 1999

[15] S Ramagiri R Kavi and V Kulathumani Real-time multi-view hu-man action recognition using a wireless camera network In Proc ofthe ACMIEEE Intrsquol Conf on Distributed Smart Cameras (ICDSC)2011

[16] C Sahlin K A Franklin H Stenlund and E Lindberg Sleep inwomen normal values for sleep stages and position and the effect ofage obesity sleep apnea smoking alcohol and hypertension Sleepmedicine 2009

[17] J Shotton R Girshick A Fitzgibbon T Sharp M Cook M Finoc-chio R Moore P Kohli A Criminisi and A Kipman Efficienthuman pose estimation from single depth images IEEE Trans onPattern Analysis and Machine Intelligence (PAMI) 2013

[18] C Torres S D Hammond J C Fried and B S Manjunath Mul-timodal pose recognition in an icu using multimodal data and envi-ronmental feedback In Proc of Springer Intrsquol Conf on ComputerVision Sys (ICVS) 2015

[19] G L Weinhouse and R J Schwab Sleep in the critically ill patientSleep-New York Then Westchester 2006

[20] Y Yang and D Ramanan Articulated human detection with flexiblemixtures of parts IEEE Trans on Pattern Analysis and MachineIntelligence 2013

10

  • 1 Introduction
    • 11 Related Work
    • 12 Proposed Work
      • 2 Eye-CU System Description
      • 3 Data Collection
        • 31 Modalities
        • 32 Feature Extraction
          • 4 Multimodal-Multiview Formulation
            • 41 Problem Statement
            • 42 Multimodal Construction
            • 43 Coupled Constrained Least-Squares (cc-LS)
            • 44 Multiview Formulation
            • 45 Testing
              • 5 Experiments
                • 51 Modality and View Assessment
                • 52 Performance of Reduced Eye-CUs
                • 53 Comparison with Existing Methods
                  • 6 Discussion
                  • 7 Conclusion and Future Work
Page 4: f g.ucsb.edu victor.fragoso@mail.wvu.edu jfried@sbch.org ......1.1. Related Work Computer vision methods using RGB data to detect body configurations of patients on beds are discussed

Figure 4: Multimodal and multiview dictionary of sleep poses for a single actor in various sleep configurations and scene conditions. It contains R and D (equalized for display) images from the t, s, and h views and the pressure mat P. Images are transformed w.r.t. the t view.

3.2. Feature Extraction

The sensors and camera views are calibrated using the standard methods from [5]. Homography transformations are computed relative to the top view, and gradient and shape features are then extracted from the transformed images.
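As a minimal sketch of the view-alignment step, the following applies a 3×3 homography H (assumed to have been estimated during calibration) to 2D image points; the matrix and points below are illustrative, not the paper's calibration data.

```python
import numpy as np

def apply_homography(H, pts):
    """Map 2D points into the top-view frame using a 3x3 homography H."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])  # homogeneous coordinates
    mapped = pts_h @ H.T
    return mapped[:, :2] / mapped[:, 2:3]             # back to Cartesian

# Illustrative homography: scale by 2, translate by (1, 2)
H = np.array([[2.0, 0.0, 1.0],
              [0.0, 2.0, 2.0],
              [0.0, 0.0, 1.0]])
top_view_pts = apply_homography(H, np.array([[1.0, 1.0]]))  # -> [[3., 4.]]
```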

Histogram of Oriented Gradients (HOG): HOG features are extracted from RGB images to represent sleep-pose limb structures, as demonstrated by [3, 20]. The HOG extraction parameters are four orientations, 16-by-16 pixels per cell, and two-by-two cells per block, which yield a 5776-element vector per image.
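As a quick sanity check on the descriptor size, the dimensionality implied by these parameters can be computed for a dense HOG with a block stride of one cell. The 320×320 input resolution is an assumption chosen to be consistent with the reported 5776-element vector; the paper does not restate its image size here.

```python
def hog_length(img_w, img_h, orientations=4, cell=16, cells_per_block=2):
    """Length of a dense HOG descriptor with a block stride of one cell."""
    cells_x, cells_y = img_w // cell, img_h // cell
    blocks_x = cells_x - (cells_per_block - 1)  # sliding-window block count
    blocks_y = cells_y - (cells_per_block - 1)
    return blocks_x * blocks_y * cells_per_block ** 2 * orientations

# A 320x320 frame gives 20x20 cells, hence 19x19 blocks of 2x2 cells
# with 4 orientations each: 19 * 19 * 4 * 4 = 5776
length = hog_length(320, 320)
```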

Geometric Moments (gMOM): Image gMOM features, introduced in [6] and validated in [1, 15], are used to represent sleep-pose shapes. The in-house implementation uses the raw pixel values from tiled depth and pressure images instead of the standard binarized pixel values. The six-by-six tile dimensions are determined empirically to balance accuracy and complexity. Finally, moments up to the third order are extracted from each block to generate a 10-element vector per block. The vectors from each of the 36 blocks are concatenated to form a 360-element vector per image. Figure 5 shows how features are extracted from each modality.
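A sketch of the gMOM extraction under the stated settings: raw (non-binarized) pixel values, a six-by-six tiling, and raw moments m_pq with p + q ≤ 3, which gives ten values per tile and 360 per image. The exact tiling and moment conventions of the in-house implementation may differ.

```python
import numpy as np

def gmom_features(img, grid=6, max_order=3):
    """Per-tile raw geometric moments m_pq with p + q <= max_order,
    computed on raw pixel values (10 moments x 36 tiles = 360 features)."""
    h, w = img.shape
    th, tw = h // grid, w // grid
    feats = []
    for i in range(grid):
        for j in range(grid):
            tile = img[i * th:(i + 1) * th, j * tw:(j + 1) * tw].astype(float)
            ys, xs = np.mgrid[0:th, 0:tw]            # per-tile pixel coordinates
            for p in range(max_order + 1):
                for q in range(max_order + 1 - p):   # orders with p + q <= 3
                    feats.append(float(np.sum(xs ** p * ys ** q * tile)))
    return np.array(feats)

# e.g. a 36x36 depth (or pressure) image yields a 360-element vector
vec = gmom_features(np.ones((36, 36)))
```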

4. Multimodal-Multiview Formulation

Explanation of the method begins with the problem statement in Section 4.1, followed by a description of the single-view multimodal formulation in Section 4.2. This formulation is expanded to include multiview data in Section 4.4. The multimodal classification framework for a single-view system is shown in Figure 6. It is applied to the set of pose labels Z of size L, indexed by l. The multimodal dataset X of size K, indexed by k, is separated for each scene c ∈ C. The dataset is composed of features extracted

Figure 5: Multimodal representation of the Fetal L pose, showing the features extracted from each modality.

from a set of M modalities N = {R, D, P}, indexed by m (e.g., $f_{N_m}$ with m = 1 gives $f_R$). The k-th datapoint in the dataset has the form

$$X_k = \{f_{N_m}\}_{M} = \{f_R, f_D, f_P\} = \{\mathrm{HOG}(R),\ \mathrm{gMOM}(D),\ \mathrm{gMOM}(P)\}, \quad (1)$$

where $f_{N_m}$ is the feature vector extracted from the m-th modality. These features are used to train the ensemble of M unimodal SVM (and LDA) classifiers ($\mathrm{CLF}_m$). For a given input datapoint $X_k$, each of the classifiers outputs a probability vector

$$\mathrm{CLF}_{k,m} = [s_{k1m}, \ldots, s_{kLm}]^T,$$

where the elements s represent the probability of label l given modality feature m. The classifier label probabilities are computed using the implementations from [12] of Platt's method for SVC and Bayes' rule for LDA. The feature-classifier combinations are quantified at the trust-estimation stage, where the unimodal trust values $w^c = [w^c_R, w^c_D, w^c_P]^T$ are computed for a specific scene c. The multimodal trusted classifier is formed by fusing the candidate label decisions from the unimodal classifiers into one. The objective of this formulation is to find the pose label $Z_l$ with the highest MM probability for a given input query $X_k$, where $\hat{l}$ is the estimated index label. The variables used throughout this paper are listed in Table 1.

4.1. Problem Statement

The proposed fusion technique uses probabilistic concepts to compute the probability of a given class by marginalizing the joint probability over the modalities. The joint probability is calculated from the conditional probability of each class and the set of prior probabilities for each modality. The conditional probabilities are extracted from the classifiers in the ensemble of M unimodal classifiers (i.e., $\mathrm{P}(Z = Z_l \mid X = X_k) = \mathrm{P}(Z_l \mid X_k)$) and re-written as

$$\mathrm{P}(Z_l \mid X_k) = \sum_{m=1}^{M} \mathrm{P}(Z_l \mid X_k, M = m)\,\mathrm{P}(M = m). \quad (2)$$

Methods such as Platt's [14] for SVMs enable the computation of the conditional probabilities, given by

$$s_{klm} = \mathrm{P}(Z_l \mid X_k, M = m). \quad (3)$$

However, the prior probability for each modality, $w_m = \mathrm{P}(M = m)$, remains unknown. The trust method finds the set of priors for each modality m in the ensemble of M modalities that approximates the probability

$$b_{kl} = \mathrm{P}(Z = z_l \mid X = X_k, \mathrm{Oracle}) \quad (4)$$

produced by an oracle-observed datapoint $X = X_k$. The estimation process is repeated for all c's; however, c is omitted to simplify the notation (i.e., $w^c$ becomes $w$).

The method uses the following coupled optimization problem to find the modality priors $w_m$ for scene c:

$$\begin{aligned} \underset{w}{\text{minimize}} \quad & \frac{1}{2}\sum_{k=1}^{K}\sum_{l=1}^{L}\left(\sum_{m=1}^{M} s_{klm}\, w_m - b_{kl}\right)^{2} \\ \text{subject to} \quad & \mathbf{1}^T w = 1, \\ & 0 \le w_m \le 1,\quad m = 1, \ldots, M. \end{aligned} \quad (5)$$

The objective is to find the weights $w_m$ that approximate the oracle $b_{kl}$ for every datapoint $X_k$. With the loss in Eq. (5), the problem becomes a cc-LS optimization problem. This type of problem uses all datapoints and pose labels from the training set to find the set of priors that approximates the values produced by the oracle for each point $X_k$ at once.

SYMBOL       DESCRIPTION
A            Multimodal matrix ∈ R^{U×M}
a_m          m-th column vector of A, with U elements
b            Oracle vector ∈ R^U
b_m          Oracle column vector for modality m
C            Scene set (illumination × occlusion combinations)
c            Scene index, 1 ≤ c ≤ |C|
CLF_{k,m}    Classifier for the m-th modality
f_{N_m,k}    Set of M feature vectors for the k-th datapoint
D            Depth modality
h            Head camera view
K            Dataset size, K = |X|
k            Datapoint index, 1 ≤ k ≤ K
L            Size of the set of pose labels, L = |Z|
l            Index of the pose label Z_l, 1 ≤ l ≤ L
l̂            Index of the estimated pose label, 1 ≤ l̂ ≤ L
l*           Index of the ground-truth label Z_{l*}, 1 ≤ l* ≤ L
MM           Multimodal and Multiview
MpM          Multimodal and partial-Multiview
M            Size of the modality set, M = |N|
m            Modality index, 1 ≤ m ≤ M
N            Modality set N = {R, D, P}, indexed by m
P            Pressure modality
pMM          Partial-Multimodal and Multiview
PMpM         Partial-Multimodal and partial-Multiview
R            RGB modality
s            Side camera view
s_{klm}      Probability of label l from CLF_{k,m}
t            Top camera view
U            Multimodal dimension, U = KL
V            View set V = {t, s, h}
V            Number of views, V = |V|
v            View index, 1 ≤ v ≤ V
w^c          Trusts w = [w_R, w_D, w_P]^T for scene c
w_{N_m}      Modality trust value (e.g., w_R for m = 1)
X            Dataset indexed by k (i.e., X_k)
X_k          k-th datapoint with f_{N_m,k} = {f_R, f_D, f_P}_k
Y            MM dimension (= KLV)
Z            Sleep-pose set

Table 1: Variables and their descriptions.

4.2. Multimodal Construction

The estimation method uses cc-LS optimization to minimize the difference between the oracle vector (b) and the multimodal matrix (A). It frames the trust estimation as a linear system of equations of the form Aw − b = 0, where the modality trust values are the elements of the vector $w = [w_R, w_D, w_P]^T$ that make Aw approximate b.

Figure 6: Diagram of the trusted multimodal classifier for the MpM configuration. Image features are extracted from the R, D, and P camera and pressure-mat data. Then, the features are used to train unimodal classifiers (CLF_m), which are in turn used to estimate the modality trust values. In the last stage of the MM classifier, the unimodal decisions are trusted and combined.

Construction of the Multimodal Matrix (A): The matrix A contains label probabilities for each of the datapoints in the training set ($K = |X_{\mathrm{train}}|$). This matrix has U rows (U = KL) and M columns, where L is the total number of labels (L = |Z|) and M is the number of modalities (M = 3), and it has the following structure:

$$A = \left[S_{k=1}^T, \ldots, S_{k=K}^T\right]^T_{U \times M}, \quad (6)$$

where $S_k(l, m) = s_{klm}$.

Construction of the Multimodal Oracle Vector (b): The vector b is generated by the oracle and quantifies the classification ability of the combined modalities. It is used to corroborate estimation correctness when compared to the ground truth. The $b_m$ column vectors have U rows:

$$b_m = \left[b_{k=1}^T, \ldots, b_{k=K}^T\right]^T, \quad (7)$$

where $b_k = [b_{k,l=1}, \ldots, b_{k,l=L}]^T$. The values of the $b_{kl}$ elements are set using the following condition:

$$b_{kl} = \begin{cases} 1 & \text{if } l = l^{*} \text{ for } X_k \\ 0 & \text{otherwise,} \end{cases} \quad (8)$$

where $\hat{l} = \arg\max_l s_{klm}$ is the index of the estimated label and $l^*$ is the index of the ground-truth label for $X_k$.

The construction of the oracle b depends on how the columns $b_m$ (i.e., the unimodal oracles) are combined. The system is tested with a uniform construction, and the results are reported in Section 5. In the uniform construction, each modality has a 1/M voting power, and the elements can add up to one via

$$b = \frac{\sum_{\forall m} b_m}{M}. \quad (9)$$

4.3. Coupled Constrained Least-Squares (cc-LS)

Finally, the weight vector $w = [w_R, w_D, w_P]^T$ is computed by substituting A and b into Eq. (5) and solving the cc-LS optimization problem:

$$\begin{aligned} \underset{w}{\text{minimize}} \quad & \frac{1}{2}\left\lVert Aw - b \right\rVert_2^2 \\ \text{subject to} \quad & \mathbf{1}^T w = 1, \\ & 0 \le w_m \le 1,\quad m = 1, \ldots, M. \end{aligned} \quad (10)$$

Intuitively, the cc-LS problem finds the modality priors that allow the method to fuse information from different modalities to approximate the oracle probabilities.
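The problem in Eq. (10) can be handed to any quadratic-programming routine. As an illustrative sketch (not the paper's implementation), the following solves it with projected gradient descent, using a Euclidean projection onto the probability simplex, which enforces both constraints at once; the synthetic data and function names are assumptions.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / idx > 0)[0][-1]
    theta = css[rho] / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def cc_ls_trusts(S, b, iters=3000):
    """Solve Eq. (10). S is (K, L, M) with entries s_klm; b is (K, L)."""
    K, L, M = S.shape
    A = S.reshape(K * L, M)               # multimodal matrix (Eq. 6)
    bv = b.reshape(K * L)                 # oracle vector
    w = np.full(M, 1.0 / M)               # uniform starting point
    lr = 1.0 / np.linalg.norm(A, 2) ** 2  # step size from the Lipschitz constant
    for _ in range(iters):
        grad = A.T @ (A @ w - bv)         # gradient of (1/2)||Aw - b||^2
        w = project_simplex(w - lr * grad)
    return w

# Synthetic check: modality 0 reproduces the oracle exactly, the others are noise
rng = np.random.default_rng(0)
K, L, M = 40, 5, 3
y = rng.integers(0, L, size=K)
b = np.zeros((K, L)); b[np.arange(K), y] = 1.0
S = np.stack([b, rng.random((K, L)), rng.random((K, L))], axis=2)
w = cc_ls_trusts(S, b)  # trust concentrates on modality 0
```

On this synthetic input the estimated trust vector sums to one and places nearly all weight on the modality that matches the oracle, which is the behavior the trust estimation is designed to produce.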

4.4. Multiview Formulation

The bounded multimodal formulation is expanded to include multiview data using V views indexed by v. The values that v can take indicate which camera view is used (e.g., v = 1 for the top view, v = 2 for the side view, and v = 3 for the head view). The multimodal and multiview matrix A has the following form:

$$A = \left[[A^{(v=1)}], \ldots, [A^{(v=V)}]\right]^T_{Y \times M}, \quad (11)$$

where Y = LKV for a system with V views and M modalities. The $b_m$ multimodal and multiview oracle vector is constructed by concatenating data from all the views in the set V via

$$b_m = \left[[b^{(v=1)}_{k=1}]^T, \ldots, [b^{(v=V)}_{k=1}]^T\right]^T_{Y}, \quad (12)$$

and the b column vector is generated using Eq. (9).

4.5. Testing

The test process is shown in Figure 7. The room sensors collect the N = {R, D, P} measurements from the ICU scene. Features ($f_{N_m,k}$) are extracted from the modalities in N and are used as inputs to the trusted multimodal classifier. The classifier outputs a set of label candidates, from which the label with the largest probability for datapoint $X_k = \{f_{N_m}\}_k$ is selected via

$$\hat{l}_k = \underset{l \in L}{\arg\max}\ \left(w_{N_m}\, \mathrm{CLF}_m(f_{N_m,k})\right), \quad \forall m. \quad (13)$$
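The fusion in Eq. (13) reduces to a trust-weighted sum of the unimodal probability vectors followed by an argmax. A minimal sketch, with illustrative scores and trusts:

```python
import numpy as np

def classify_pose(scores, w):
    """scores: (M, L) unimodal label probabilities, one row per modality;
    w: (M,) modality trusts. Returns the estimated label index (Eq. 13)."""
    fused = w @ scores            # trust-weighted sum over modalities
    return int(np.argmax(fused))

# Three modalities, two pose labels; the trusts favor the first classifier
scores = np.array([[0.7, 0.3],
                   [0.2, 0.8],
                   [0.4, 0.6]])
w = np.array([0.6, 0.3, 0.1])
label = classify_pose(scores, w)  # fused scores [0.52, 0.48] -> label 0
```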

Missing Modalities: Hardware failures are simulated by evaluating the classification performance with one modality removed at a time. The trust value of a missing or failing sensor modality ($w^*_{N_n}$) is set to zero, and its original value ($w_{N_n}$) is proportionally distributed to the others via

$$w^*_{N_m} = w_{N_m}\left(1 + \frac{|w_{N_n} - w_{N_m}|}{W}\right) \quad (14)$$

for $n \in \{1, \ldots, M\}$, $m \in \{1, \ldots, M\} \setminus \{n\}$, and $W = \sum_{\forall m} w_m$.
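A direct transcription of the reweighting rule in Eq. (14); the trust vector below is illustrative. Note that, as written in the paper, the redistributed trusts need not sum exactly to one.

```python
import numpy as np

def redistribute_trust(w, n):
    """Zero the trust of failed modality n and boost the remaining trusts
    per Eq. (14): w*_m = w_m * (1 + |w_n - w_m| / W), with W = sum(w)."""
    w = np.asarray(w, dtype=float)
    W = w.sum()
    return np.array([0.0 if m == n else w[m] * (1.0 + abs(w[n] - w[m]) / W)
                     for m in range(len(w))])

# e.g. simulate a pressure-mat failure (modality 0, illustrative trusts)
w_star = redistribute_trust([0.5, 0.3, 0.2], n=0)  # roughly [0, 0.36, 0.26]
```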

5. Experiments

Validation of modalities and views for sleep-pose classification substantiates the need for a multiview and multimodal system. The cc-LS method is tested on the MpM, MM, PMM, and PMpM Eye-CU configurations and on data collected from scenes with various illumination levels and occlusion types. The labels are estimated using multi-class linear SVC (C = 0.5) and LDA classifiers from [12]. A validation set is used to tune the SVC's C parameter and the Ada parameters. Classification accuracies are computed using five-fold cross-validation with in-house implementations of the competing methods and reported as percent-accuracy values inside color-scaled cells.

5.1. Modality and View Assessment

Classification results obtained using unimodal and multimodal data without modality trust are shown in Figure 8. The cell values indicate the classification percent accuracy for each individual modality and each modality combination with three common classification methods. The labels of the column blocks at the top of the figure indicate the modalities used. The labels at the bottom of the figure show which classifier is used. The labels on the left and right indicate the scene illumination level and the type of occlusion. The figure only shows classification results for the top camera view because the variation across the tested views was not statistically significant.

5.2. Performance of Reduced Eye-CUs

The complete MM configuration achieves the best classification performance, followed closely by the MpM, PMM, and PMpM configurations, as summarized in Figure 9. The values inside the cells represent the classification percent accuracy of the cc-LS method combined with the various Eye-CU system configurations. The top row indicates the configuration. The second row indicates the views. The labels on the bottom of the figure identify the modalities. The labels on the left and right indicate the illumination level and occlusion type. The red scale ranges from light red (worst) to dark red (best). The figure shows that the complete MM system in combination with the cc-LS method performs the best across all scenes. However, it requires information from a pressure mat. The PMM and PMpM configurations do not require the pressure mat and are still capable of performing reliably, with only a slight drop in performance. For example, in dark and occluded scenes the PMM and PMpM configurations reach 77% and 80% classification rates, respectively (see row DARK-Blanket & Pillow).

5.3. Comparison with Existing Methods

The performance of cc-LS and of the in-house implementations of the competing methods from [7] and [18] and Ada [4] is shown in Figure 10. The figure shows results using the MpM configuration, which most closely resembles the configurations used in the competing methods. All the methods use a multimodal system with a top camera view and a pressure mat. The values inside the cells are the classification percent accuracies. The green scale goes from light green (worst) to dark green (best). The top row divides the methods into competing and proposed. The second row cites the methods. The bottom row indicates which classifier and, in parentheses, which modalities are used. The labels on the left and right indicate the illumination level and occlusion type. The results are obtained using the four methods with the MM dataset.

Confusion Matrices: The confusion matrices in Figure 11 show how the indices of the estimated labels l̂ match the actual labels l*. The top three matrices are from a scene with bright and clear ICU conditions (Figure 11a). The bottom three matrices illustrate the performance of the methods in a dim and occluded ICU scenario (Figure 11b). A dark blue diagonal in a confusion matrix indicates perfect classification. In the selected scenes, all methods achieved 100% classification for the bright and clear scene. However, their performance varies greatly in dim and occluded scenes. The matrix generated using [7] achieves 7% classification accuracy (bottom left), the matrix generated using [18] achieves 55% accuracy (bottom center), and the matrix generated with the cc-LS method achieves 86.7% accuracy (bottom right). The MpM configuration with the cc-LS method outperforms the competing methods by approximately 30%.

Figure 7: Block diagram for testing of a single-view multimodal trusted classifier. Observations (R, D, P) are collected from the scene. Features are extracted from the observations and sent to the unimodal classifiers, which provide sets of score-ranked pose-candidate labels. The sets of candidates are trusted and combined into one multimodal set, from which the label with the highest score is selected.

Figure 8: Performance evaluation of modalities and modality combinations using SVC, LDA, and Ada-Boosted SVC (Ada), based on their classification percent accuracy (cell values). The evaluation is performed over all the scene conditions considered in this study. The results indicate that no single modality (R, D, P) or combination of concatenated modalities (RD, RP, DP, RDP), paired with one of the three classification techniques, can be directly used to recognize poses in all scenes. The top row indicates which modality or combination of modalities is used. The labels on the bottom indicate which classifier is used. The labels to the left and right indicate the scene's illumination level and occlusion types. The gray-scaled boxes range from worst (white) to best (black) performance.

Performance of Ada-Boost: The system is tested using the Ada-Boost (Ada) algorithm [4] to improve the decisions of weak unimodal SVCs. The results in Figure 8 show a slight improvement over the SVC. The comparison in Figure 10 shows that Ada's improvement is small: it barely outperforms the reduced MpM configuration with the cc-LS method in some scenes (see row MID-Blanket). Overall, Ada is outperformed by the combination of cc-LS and MpM.

Figure 9: Classification performance in red scale (dark = best, light = worst) of the various Eye-CU configurations using LDA. The PMpM has the lowest performance of 76.7%, using the s and h views of a dark and occluded scene. The method from [18] performs below 50%, and the method from [7] is not suited for such conditions. The top row identifies the configuration. The second row indicates the views used. The bottom labels indicate the modalities used (in parentheses). The labels on the left and right indicate the scene illumination and occlusion type. A similar pattern is observed with SVC.

6. Discussion

The results in Figure 10 show performance disparities between the results obtained with the in-house implementation and those reported in [7]. The data and code from [7] were not released, so the findings and implementation details reported in this paper cannot be compared at the finest level. Nevertheless, the accuracy variations observed are most likely due to differences in data resolutions, sensor capacities, scene properties, and tuning parameters.

The performance of the MM and MpM configurations, which use a pressure mat, is slightly better. However, the deployment and maintenance of such systems in the real world can be very difficult and perhaps logistically impossible. The cc-LS method in combination with the PMM or PMpM configurations, which do not use a pressure mat, matches and outperforms the competing techniques in ideal and challenging scenarios (see Figure 10).

Figure 10: Mean classification performance in green scale (dark = best, light = worst) of MaVL, Huang's [7], Torres' [18], Freund's [4], and the cc-LS method using SVC and LDA. The combination of cc-LS and MpM matches the performance of the competing methods in bright and clear scenes. Classification is improved with cc-LS by 70% with SVC and by 30% with LDA in dark and occluded scenes. The top row distinguishes between competing and proposed methods; the second row cites them. The bottom row indicates the classifier and the modalities (in parentheses) used. The labels on the left and right indicate the scene illumination and occlusion type. NA indicates not suitable.

7. Conclusion and Future Work

This work introduced a new modality trust-estimation method based on cc-LS optimization. The trust values are chosen so that the weighted multimodal candidate labels (A) approximate the expected oracle labels (b). The Eye-CU system uses the trusts to weight the label propositions of the available modalities and views. The cc-LS method with the MM Eye-CU system outperforms three competing methods. Two reduced Eye-CU variations reliably classify sleep poses without pressure data. The MM properties allow the system to handle occlusions and avoid problems associated with a pressure mat (e.g., sanitation and sensor integrity).

Reliable pose classification methods and systems enable clinical researchers to design, enhance, and evaluate pose-related healthcare protocols and therapies. Given that the Eye-CU system is capable of reliably classifying human sleep poses in an ICU environment, an expansion of the system and methods is under investigation to include temporal information. Future analysis will seek to quantify and typify pose sequences (i.e., duration and transition). Future work will investigate removing the constraints that clearly define the set of sleep poses, and will explore tools from novelty detection to identify other (e.g., helpful and harmful) patient poses that occur in an ICU. Recent studies indicate that deep features might improve the classification performance of the Eye-CU system in the most challenging healthcare scenarios. Hence, future work will investigate the performance and integration of deep features into the cc-LS method and the Eye-CU system.

(a) Bright scene, clear of occlusions. (b) Dark scene with pillow and blanket occlusions.

Figure 11: Confusion matrices generated in blue scale (dark = best, light = worst) using a top camera view and applying the methods from Huang [7], Torres [18], and cc-LS with MpM. The top matrices show that all methods achieve perfect classification in ideal scenes (i.e., a solid main diagonal). The bottom matrices are [7] with 7%, [18] with 55%, and cc-LS with 86.7% for dark and occluded scenes. The matrices show the matches between the estimated (l̂) and ground-truth (l*) indices.

Acknowledgements: This project was supported in part by the Institute for Collaborative Biotechnologies (ICB) through grant W911NF-09-0001 from the U.S. Army Research Office and by the U.S. Office of Naval Research (ONR) through grant N00014-12-1-0503. The content of the information does not necessarily reflect the position or the policy of the Government, and no official endorsement should be inferred.

References

[1] M. A. R. Ahad, J. K. Tan, H. Kim, and S. Ishikawa. Motion history image: its variants and applications. Machine Vision and Applications, 2012.

[2] S. Bihari, R. D. McEvoy, E. Matheson, S. Kim, R. J. Woodman, and A. D. Bersten. Factors affecting sleep quality of patients in intensive care unit. Journal of Clinical Sleep Medicine, 2012.

[3] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2005.

[4] Y. Freund and R. E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 1997.

[5] R. I. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, 2nd edition, 2004.

[6] M.-K. Hu. Visual pattern recognition by moment invariants. IRE Trans. on Information Theory, 1962.

[7] W. Huang, A. A. P. Wai, S. F. Foo, J. Biswas, C.-C. Hsia, and K. Liou. Multimodal sleeping posture classification. In Proc. of the IEEE Int'l Conf. on Pattern Recognition (ICPR), 2010.

[8] C. Idzikowski. Sleep position gives personality clue. BBC News, September 16, 2003.

[9] C.-H. Kuo, F.-C. Yang, M.-Y. Tsai, and L. Ming-Yih. Artificial neural networks based sleep motion recognition using night vision cameras. Biomedical Engineering: Applications, Basis and Communications, 2004.

[10] W.-H. Liao and C.-M. Yang. Video-based activity and movement pattern analysis in overnight sleep studies. In Proc. of the IEEE Int'l Conf. on Pattern Recognition (ICPR), 2008.

[11] S. Morong, B. Hermsen, and N. de Vries. Sleep position and pregnancy. In Positional Therapy in Obstructive Sleep Apnea. Springer, 2015.

[12] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 2011.

[13] T. Penzel and R. Conradt. Computer based sleep recording and analysis. Sleep Medicine Reviews, 2000.

[14] J. Platt et al. Fast training of support vector machines using sequential minimal optimization. Advances in Kernel Methods: Support Vector Learning, 1999.

[15] S. Ramagiri, R. Kavi, and V. Kulathumani. Real-time multi-view human action recognition using a wireless camera network. In Proc. of the ACM/IEEE Int'l Conf. on Distributed Smart Cameras (ICDSC), 2011.

[16] C. Sahlin, K. A. Franklin, H. Stenlund, and E. Lindberg. Sleep in women: normal values for sleep stages and position and the effect of age, obesity, sleep apnea, smoking, alcohol and hypertension. Sleep Medicine, 2009.

[17] J. Shotton, R. Girshick, A. Fitzgibbon, T. Sharp, M. Cook, M. Finocchio, R. Moore, P. Kohli, A. Criminisi, and A. Kipman. Efficient human pose estimation from single depth images. IEEE Trans. on Pattern Analysis and Machine Intelligence (PAMI), 2013.

[18] C. Torres, S. D. Hammond, J. C. Fried, and B. S. Manjunath. Multimodal pose recognition in an ICU using multimodal data and environmental feedback. In Proc. of the Springer Int'l Conf. on Computer Vision Systems (ICVS), 2015.

[19] G. L. Weinhouse and R. J. Schwab. Sleep in the critically ill patient. Sleep, 2006.

[20] Y. Yang and D. Ramanan. Articulated human detection with flexible mixtures of parts. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2013.

10

  • 1 Introduction
    • 11 Related Work
    • 12 Proposed Work
      • 2 Eye-CU System Description
      • 3 Data Collection
        • 31 Modalities
        • 32 Feature Extraction
          • 4 Multimodal-Multiview Formulation
            • 41 Problem Statement
            • 42 Multimodal Construction
            • 43 Coupled Constrained Least-Squares (cc-LS)
            • 44 Multiview Formulation
            • 45 Testing
              • 5 Experiments
                • 51 Modality and View Assessment
                • 52 Performance of Reduced Eye-CUs
                • 53 Comparison with Existing Methods
                  • 6 Discussion
                  • 7 Conclusion and Future Work
Page 5: f g.ucsb.edu victor.fragoso@mail.wvu.edu jfried@sbch.org ......1.1. Related Work Computer vision methods using RGB data to detect body configurations of patients on beds are discussed

of Platt's method for SVC and Bayes' rule for LDA. The feature-classifier combinations are quantified at the trust-estimation stage, where the unimodal trust values w^c = [w_R^c, w_D^c, w_P^c]^T are computed for a specific scene c. The multimodal trusted classifier is formed by fusing the candidate-label decisions from the unimodal classifiers into one. The objective of this formulation is to find the pose label (Z_l) with the highest MM probability for a given input query X_k, where l̂ is the index of the estimated label. The variables used throughout this paper are listed in Table 1.

4.1. Problem Statement

The proposed fusion technique uses probabilistic concepts to compute the probability of a given class by marginalizing the joint probability over the modalities. The joint probability is calculated from the conditional probability of each class and the set of prior probabilities for each modality. The conditional probabilities are extracted from the classifiers in the ensemble of M unimodal classifiers (i.e., P(Z = Z_l | X = X_k) = P(Z_l | X_k)) and re-written as

    P(Z_l | X_k) = \sum_{m=1}^{M} P(Z_l | X_k, M = m) \, P(M = m)    (2)

Methods such as Platt's [14] for SVMs enable the computation of the conditional probabilities, given by

    s_{klm} = P(Z_l | X_k, M = m)    (3)

However, the prior probability for each modality, w_m = P(M = m), remains unknown. The trust method finds the set of priors for each modality m in the ensemble of M modalities that approximates the probability

    b_{kl} = P(Z = Z_l | X = X_k, \text{Oracle})    (4)

produced by an oracle for the observed datapoint X = X_k. The estimation process is repeated for all c's; however, c is omitted to simplify the notation (i.e., w^c becomes w).

The method uses the following coupled optimization problem to find the modality priors w_m for scene c:

    \underset{w}{\text{minimize}} \quad \frac{1}{2} \sum_{k=1}^{K} \sum_{l=1}^{L} \Big( \sum_{m=1}^{M} s_{klm} w_m - b_{kl} \Big)^2
    \text{subject to} \quad \mathbf{1}^T w = 1, \quad 0 \le w_m \le 1, \; m = 1, \ldots, M    (5)

The objective is to find the weights w_m that approximate the oracle b_{kl} for every data point X_k. Using the loss in Eq. 5, the problem becomes a cc-LS optimization problem. This type of problem uses all points and pose labels from the training set to find the set of priors that approximates the values produced by the oracle for each point X_k at once.

Table 1: Variables and their descriptions.

  A          Multimodal matrix, A ∈ R^{U×M}
  a_m        m-th column vector of A, with U elements
  b          Oracle vector, b ∈ R^U
  b_m        Oracle column vector for modality m
  C          Scene set (illumination × occlusion combinations)
  c          Scene index, 1 ≤ c ≤ |C|
  CLF_m      Classifier for the m-th modality
  f_k^{N_m}  Set of M feature vectors for the k-th datapoint
  D          Depth modality
  h          Head camera view
  K          Dataset size, K = |X|
  k          Datapoint index, 1 ≤ k ≤ K
  L          Size of the set of pose labels, L = |Z|
  l          Index of a pose label (Z_l), 1 ≤ l ≤ L
  l̂          Index of the estimated pose label, 1 ≤ l̂ ≤ L
  l*         Index of the ground-truth label (Z_{l*}), 1 ≤ l* ≤ L
  MM         Multimodal and Multiview
  MpM        Multimodal and partial-Multiview
  M          Size of the modality set, M = |N|
  m          Modality index, 1 ≤ m ≤ M
  N          Modality set, N = {R, D, P}, indexed by m
  P          Pressure modality
  pMM        partial-Multimodal and Multiview
  pMpM       partial-Multimodal and partial-Multiview
  R          RGB modality
  s          Side camera view
  s_{klm}    Probability of label l from CLF_m
  t          Top camera view
  U          Multimodal dimension, U = KL
  𝒱          View set, 𝒱 = {t, s, h}
  V          Number of views, V = |𝒱|
  v          View index, 1 ≤ v ≤ V
  w^c        Trust vector w = [w_R, w_D, w_P]^T for scene c
  w_{N_m}    Modality trust value (e.g., w_R for m = 1)
  X          Dataset, indexed by k (i.e., X_k)
  X_k        k-th datapoint, with f_k^{N_m} = {f_R, f_D, f_P}_k
  Y          MM dimension, Y = KLV
  Z          Sleep-pose set

4.2. Multimodal Construction

The estimation method uses cc-LS optimization to minimize the difference between the oracle (b) and the multimodal matrix (A). It frames the trust estimation as a linear system of equations of the form Aw - b = 0, where the modality trust values are the elements of the vector w = [w_R, w_D, w_P]^T that make Aw approximate b.

Construction of the Multimodal Matrix (A). The matrix A contains the label probabilities for each of the datapoints in the training set (K = |X_train|). It has U rows (U = KL) and M columns, where L is the total number of labels (L = |Z|) and M is the number of modalities (M = 3), and it has the following structure:

    A = [S_{k=1}^T, \ldots, S_{k=K}^T]^T \in \mathbb{R}^{U \times M}    (6)

where S_k(l, m) = s_{klm}.

Figure 6: Diagram of the trusted multimodal classifier for the MpM configuration. Image features are extracted from the RGB-D camera and pressure data. The features are then used to train the unimodal classifiers (CLF_m), which are in turn used to estimate the modality trust values. In the last stage of the MM classifier, the unimodal decisions are trusted and combined.

Construction of the Multimodal Oracle Vector (b). The vector b is generated by the oracle and quantifies the classification ability of the combined modalities. It is used to corroborate estimation correctness when compared to the ground truth. The b_m column vectors have U rows:

    b_m = [b_{k=1}^T, \ldots, b_{k=K}^T]^T    (7)

where b_k = [b_{k,l=1}, \ldots, b_{k,l=L}]^T. The values of the b_{kl} elements are set using the following condition:

    b_{kl} = \begin{cases} 1 & \text{if } \hat{l} = l^* \text{ for } X_k \\ 0 & \text{otherwise} \end{cases}    (8)

where l̂ = argmax_l s_{klm} is the index of the estimated label and l* is the index of the ground-truth label for X_k.

The construction of the oracle b depends on how the columns b_m (i.e., the unimodal oracles) are combined. The system is tested with a uniform construction, and the results are reported in Section 5. In the uniform construction, each modality has a 1/M voting power, and the votes can add up to one via

    b = \frac{\sum_{\forall m} b_m}{M}    (9)
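The construction of A (Eq. 6) and of the uniform oracle b (Eqs. 7-9) can be sketched in a few lines of NumPy. The sizes, random scores, and ground-truth labels below are toy assumptions, and Eq. 8 is implemented under one plausible reading: a unimodal oracle entry is 1 only where that modality's top-scoring label matches the ground truth.

```python
import numpy as np

K, L, M = 4, 3, 3                          # datapoints, labels, modalities (toy sizes)
rng = np.random.default_rng(0)

s = rng.random((K, L, M))                  # s[k, l, m] = P(Z_l | X_k, M = m)
s /= s.sum(axis=1, keepdims=True)          # each modality sums to 1 over the labels

A = s.reshape(K * L, M)                    # Eq. (6): stack the S_k blocks, U = K*L rows

l_star = np.array([0, 2, 1, 0])            # toy ground-truth label indices l*

b_m = np.zeros((K * L, M))                 # unimodal oracles (Eqs. 7-8)
for m in range(M):
    for k in range(K):
        l_hat = int(np.argmax(s[k, :, m]))  # estimated label for modality m
        if l_hat == l_star[k]:
            b_m[k * L + l_hat, m] = 1.0

b = b_m.sum(axis=1) / M                    # Eq. (9): uniform 1/M voting power
```

The pair (A, b) is exactly the input to the cc-LS problem of Section 4.3.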

4.3. Coupled-Constrained Least-Squares (cc-LS)

Finally, the weight vector w = [w_R, w_D, w_P]^T is computed by substituting A and b into Eq. (5) and solving the cc-LS optimization problem:

    \underset{w}{\text{minimize}} \quad \frac{1}{2} \|Aw - b\|_2^2
    \text{subject to} \quad \mathbf{1}^T w = 1, \quad 0 \le w_m \le 1, \; m = 1, \ldots, M    (10)

Intuitively, the cc-LS problem finds the modality priors that allow the method to fuse information from the different modalities so as to approximate the oracle probabilities.
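Eq. (10) is a least-squares problem over the probability simplex. The paper does not name a solver; a minimal sketch using SciPy's SLSQP (an assumed choice, not the authors' implementation) is:

```python
import numpy as np
from scipy.optimize import minimize

def solve_cc_ls(A, b):
    """Sketch of Eq. (10): minimize (1/2)||Aw - b||^2 subject to
    1^T w = 1 and 0 <= w_m <= 1 (the probability simplex)."""
    M = A.shape[1]
    w0 = np.full(M, 1.0 / M)                       # uniform starting point
    objective = lambda w: 0.5 * np.sum((A @ w - b) ** 2)
    constraints = ({"type": "eq", "fun": lambda w: w.sum() - 1.0},)
    bounds = [(0.0, 1.0)] * M
    res = minimize(objective, w0, method="SLSQP",
                   bounds=bounds, constraints=constraints)
    return res.x

# Sanity check on synthetic data: when b = A @ w_true for a
# simplex-feasible w_true, the recovered trusts should be close to it.
rng = np.random.default_rng(1)
A = rng.random((60, 3))
w_true = np.array([0.6, 0.3, 0.1])
w = solve_cc_ls(A, A @ w_true)
```

Because the objective is convex and the feasible set is a simplex, any compliant QP or NLP solver should recover the same trusts.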

4.4. Multiview Formulation

The bounded multimodal formulation is expanded to include multiview data using V views indexed by v. The value of v indicates which camera view is used (e.g., v = 1 for the top view, v = 2 for the side view, and v = 3 for the head view). The multimodal and multiview matrix A has the following form:

    A = [[A^{(v=1)}], \ldots, [A^{(v=V)}]]^T \in \mathbb{R}^{Y \times M}    (11)


where Y = LKV for a system with V views and M modalities. The multimodal and multiview oracle vector b_m is constructed by concatenating the data from all the views in the set 𝒱 via

    b_m = [[b_m^{(v=1)}]^T, \ldots, [b_m^{(v=V)}]^T]^T \in \mathbb{R}^{Y}    (12)

and the b column vector is generated using (9).
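The multiview extension of Eq. (11) amounts to row-stacking the per-view multimodal matrices; a sketch with toy sizes (our assumptions) is:

```python
import numpy as np

# Per-view multimodal matrices A^(v), each of size (K*L) x M, are
# stacked row-wise into a single Y x M matrix with Y = K*L*V.
K, L, M, V = 4, 3, 3, 3
rng = np.random.default_rng(3)
A_views = [rng.random((K * L, M)) for _ in range(V)]   # A^(v), v = 1..V
A = np.vstack(A_views)                                  # Eq. (11)
```

The oracle vectors b_m are concatenated the same way (Eq. 12), so the cc-LS solver of Section 4.3 applies unchanged to the multiview system.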

4.5. Testing

The test process is shown in Figure 7. The room sensors, in combination with the N = {R, D, P} measurements, are collected from the ICU scene. Features (f_k^{N_m}) are extracted from the modalities in N and are used as inputs to the trusted multimodal classifier. The classifier outputs a set of label candidates, from which the label with the largest probability for datapoint X_k = {f_k^{N_m}} is selected via

    \hat{l}_k = \underset{l \in L}{\arg\max} \big( w_{N_m} \, CLF_m(f_k^{N_m}) \big), \quad \forall m    (13)
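The fusion of Eqs. (2) and (13) reduces to a trust-weighted sum of the unimodal label probabilities followed by an argmax. A minimal sketch (the function name, array layout, and toy scores are ours, not the paper's API):

```python
import numpy as np

def fuse_and_classify(scores, trusts):
    """scores[m][l] holds s_klm for one datapoint; trusts is w.
    Returns the estimated label index and the fused probabilities
    P(Z_l | X_k) = sum_m s_klm * w_m."""
    fused = scores.T @ trusts
    return int(np.argmax(fused)), fused

# Toy example: three modalities (R, D, P) over three pose labels.
scores = np.array([[0.7, 0.2, 0.1],    # RGB classifier scores
                   [0.1, 0.6, 0.3],    # depth classifier scores
                   [0.2, 0.2, 0.6]])   # pressure classifier scores
trusts = np.array([0.2, 0.5, 0.3])     # w_R, w_D, w_P, e.g. from cc-LS
label, fused = fuse_and_classify(scores, trusts)   # label 1 wins (0.40)
```

Note how the fused decision can overrule the RGB classifier's confident but untrusted vote, which is the intended behavior in dark or occluded scenes.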

Missing Modalities. Hardware failures are simulated by evaluating the classification performance with one modality removed at a time. The trust value of a missing or failing sensor modality (w^*_{N_n}) is set to zero, and its original value (w_{N_n}) is proportionally distributed to the others via

    w^*_{N_m} = w_{N_m} \Big( 1 + \frac{|w_{N_n} - w_{N_m}|}{W} \Big)    (14)

for n ∈ {1, ..., M}, m ∈ {1, ..., M} \ {n}, and W = \sum_{\forall m} w_m.
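Eq. (14) can be sketched directly; the function name is our assumption, and note that, as the formula is written, the redistributed trusts need not sum exactly to one:

```python
import numpy as np

def redistribute_trust(w, n):
    """Sketch of Eq. (14): zero the trust of the failed modality n and
    boost each remaining trust w_m by the factor (1 + |w_n - w_m| / W),
    where W = sum(w)."""
    W = w.sum()
    w_new = np.array([wm * (1.0 + abs(w[n] - wm) / W) for wm in w])
    w_new[n] = 0.0                     # the failed sensor gets no vote
    return w_new

w = np.array([0.5, 0.3, 0.2])          # w_R, w_D, w_P
w_star = redistribute_trust(w, n=2)    # simulate a failed pressure mat
# w_star = [0.5*(1+0.3), 0.3*(1+0.1), 0] = [0.65, 0.33, 0]
```

A final renormalization step would restore the simplex constraint if downstream code requires probabilities.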

5. Experiments

Validation of the modalities and views for sleep-pose classification substantiates the need for a multiview and multimodal system. The cc-LS method is tested on the MpM, MM, pMM, and pMpM Eye-CU configurations and on data collected from scenes with various illumination levels and occlusion types. The labels are estimated using multi-class linear SVC (C = 0.5) and LDA classifiers from [12]. A validation set is used to tune the SVC's C parameter and the Ada parameters. Classification accuracies are computed using five-fold cross-validation, with in-house implementations of the competing methods, and are reported as percent-accuracy values inside color-scaled cells.
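The evaluation protocol can be sketched with the scikit-learn components the paper cites [12]. The synthetic, well-separated features below are only a stand-in for the real Eye-CU features, which are not public:

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

# Toy data: three "pose" classes with well-separated Gaussian features.
rng = np.random.default_rng(2)
n_per_class, n_features, n_poses = 30, 10, 3
X = np.vstack([rng.normal(loc=4.0 * c, size=(n_per_class, n_features))
               for c in range(n_poses)])
y = np.repeat(np.arange(n_poses), n_per_class)

# Multi-class linear SVC (C = 0.5) and LDA, scored with five-fold CV.
svc_acc = cross_val_score(LinearSVC(C=0.5), X, y, cv=5).mean()
lda_acc = cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=5).mean()
```

In practice the per-modality scores s_klm would come from Platt-calibrated versions of these classifiers rather than from their raw decisions.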

5.1. Modality and View Assessment

Classification results obtained using unimodal and multimodal data without modality trust are shown in Figure 8. The cell values indicate the classification percent accuracy for each individual modality and for modality combinations with three common classification methods. The labels of the column blocks at the top of the figure indicate the modalities used. The labels at the bottom of the figure show which classifier is used, and the labels on the left and right indicate the scene illumination level and the type of occlusion. The figure shows classification results only for the top camera view, because the variation across the tested views was not statistically significant.

5.2. Performance of Reduced Eye-CUs

The complete MM configuration achieves the best classification performance, followed closely by the MpM, pMM, and pMpM configurations, as summarized in Figure 9. The values inside the cells represent the classification percent accuracy of the cc-LS method combined with the various Eye-CU system configurations. The top row indicates the configuration, and the second row indicates the views. The labels at the bottom of the figure identify the modalities, and the labels on the left and right indicate the illumination level and occlusion type. The red scale ranges from light red (worst) to dark red (best). The figure shows that the complete MM system in combination with the cc-LS method performs best across all scenes; however, it requires information from a pressure mat. The pMM and pMpM configurations do not require the pressure mat and still perform reliably, with only a slight drop in performance. For example, in dark and occluded scenes, the pMM and pMpM configurations reach 77% and 80% classification rates, respectively (see row DARK, Blanket & Pillow).

5.3. Comparison with Existing Methods

The performance of cc-LS and of the in-house implementations of the competing methods from [7], [18], and Ada [4] is shown in Figure 10. The figure shows results using the MpM configuration, which most closely resembles the configurations used by the competing methods. All of the methods use a multimodal system with a top camera view and a pressure mat. The values inside the cells are the classification percent accuracies. The green scale goes from light green (worst) to dark green (best). The top row divides the methods into competing and proposed, and the second row cites the methods. The bottom row indicates which classifier and, in parentheses, which modalities are used. The labels on the left and right indicate the illumination level and occlusion type. The results are obtained by running the four methods on the MM dataset.

Confusion Matrices. The confusion matrices in Figure 11 show how the indices of the estimated labels l̂ match the actual labels l*. The top three matrices are from a scene with bright and clear ICU conditions (Figure 11a). The bottom three matrices illustrate the performance of the methods in a dim and occluded ICU scenario (Figure 11b). A dark blue diagonal in a confusion matrix indicates perfect classification. In the selected scenes, all methods achieve 100% classification for the bright and clear scene; however, their performance varies greatly in dim and occluded scenes. The matrix generated using [7] achieves 7% classification accuracy (bottom left), the matrix generated using [18] achieves 55% accuracy (bottom center), and the matrix generated with the cc-LS method achieves 86.7% accuracy (bottom right). The MpM configuration with the cc-LS method outperforms the competing methods by approximately 30%.

Figure 7: Block diagram for testing a single-view multimodal trusted classifier. Observations (R, D, P) are collected from the scene. Features are extracted from the observations and sent to the unimodal classifiers, which provide sets of score-ranked pose-candidate labels. The sets of candidates are trusted and combined into one multimodal set, from which the label with the highest score is selected.

Figure 8: Performance evaluation of the modalities and modality combinations using SVC, LDA, and Ada-Boosted SVC (Ada), based on their classification percent accuracy (cell values). The evaluation is performed over all the scene conditions considered in this study. The results indicate that no single modality (R, D, P) or combination of concatenated modalities (RD, RP, DP, RDP), paired with one of the three classification techniques, can be used directly to recognize poses in all scenes. The top row indicates which modality or combination of modalities is used. The labels at the bottom indicate which classifier is used, and the labels to the left and right indicate the scene's illumination level and occlusion types. The gray-scaled boxes range from worst (white) to best (black) performance.

Performance of Ada-Boost. The system is tested using the Ada-Boost (Ada) algorithm [4] to improve the decisions of weak unimodal SVCs. The results in Figure 8 show a slight SVC improvement. The comparison in Figure 10 shows that Ada's improvement is small: it barely outperforms the reduced MpM configuration with the cc-LS method in some scenes (see row MID, Blanket). Overall, Ada is outperformed by the combination of cc-LS and MpM.

Figure 9: Classification performance, in red scale (dark: best, light: worst), of the various Eye-CU configurations using LDA. The pMpM configuration has the lowest performance, 76.7%, using the s,h views of a dark and occluded scene; the method from [18] performs below 50%, and the method from [7] is not suited for such conditions. The top row identifies the configuration, and the second row indicates the views used. The bottom labels indicate the modalities used (in parentheses). The labels on the left and right indicate the scene illumination and occlusion type. A similar pattern is observed with SVC.

6. Discussion

The results in Figure 10 show performance disparities between the results obtained with the in-house implementation and those reported in [7]. The data and code from [7] were not released, so the findings and implementation details reported in this paper cannot be compared at the finest level. Nevertheless, the accuracy variations observed are most likely due to differences in data resolution, sensor capacities, scene properties, and tuning parameters.

The performance of the MM and MpM configurations, which use a pressure mat, is slightly better. However, the deployment and maintenance of such systems in the real world can be very difficult and perhaps logistically impossible. The cc-LS method in combination with the pMM or pMpM configurations, which do not use a pressure mat, matches or outperforms the competing techniques in ideal and challenging scenarios (see Figure 10).

Figure 10: Mean classification performance, in green scale (dark: best, light: worst), of Huang's [7], Torres' [18], and Freund's [4] methods and of the cc-LS method, using SVC and LDA. The combination of cc-LS and MpM matches the performance of the competing methods in bright and clear scenes. In dark and occluded scenes, classification is improved with cc-LS by 70% with SVC and by 30% with LDA. The top row distinguishes between competing and proposed methods, and the second row cites them. The bottom row indicates the classifier and modalities (in parentheses) used. The labels on the left and right indicate the scene illumination and occlusion type. NA indicates not suitable.

7. Conclusion and Future Work

This work introduced a new modality-trust estimation method based on cc-LS optimization. The trust values minimize the difference between the multimodal candidate labels (A) and the expected oracle labels (b). The Eye-CU system uses the trusts to weight the label propositions of the available modalities and views. The cc-LS method with the MM Eye-CU system outperforms three competing methods, and two reduced Eye-CU variations reliably classify sleep poses without pressure data. The MM properties allow the system to handle occlusions and to avoid problems associated with a pressure mat (e.g., sanitation and sensor integrity).

Reliable pose-classification methods and systems enable clinical researchers to design, enhance, and evaluate pose-related healthcare protocols and therapies. Given that the Eye-CU system is capable of reliably classifying human sleep poses in an ICU environment, an expansion of the system and methods to include temporal information is under investigation. Future analysis will seek to quantify and typify pose sequences (i.e., durations and transitions). Future work will also investigate removing the constraints that clearly define the set of sleep poses and will explore tools from novelty detection to identify other (e.g., helpful and harmful) patient poses that occur in an ICU. Recent studies indicate that deep features might improve the classification performance of the Eye-CU system in the most challenging healthcare scenarios; hence, future work will investigate the performance and integration of deep features into the cc-LS method and the Eye-CU system.

Figure 11: Confusion matrices, in blue scale (dark: best, light: worst), generated using a top camera view and applying the methods from Huang [7], Torres [18], and cc-LS with MpM. (a) Bright scene clear of occlusions: the top matrices show that all methods achieve perfect classification in ideal scenes (i.e., a main diagonal). (b) Dark scene with pillow and blanket occlusions: the bottom matrices show [7] with 7%, [18] with 55%, and cc-LS with 86.7%. The matrices show the matches between the estimated (l̂) and ground-truth (l*) indices.

Acknowledgements. This project was supported in part by the Institute for Collaborative Biotechnologies (ICB) through grant W911NF-09-0001 from the U.S. Army Research Office and by the U.S. Office of Naval Research (ONR) through grant N00014-12-1-0503. The content of the information does not necessarily reflect the position or the policy of the Government, and no official endorsement should be inferred.

References

[1] M. A. R. Ahad, J. K. Tan, H. Kim, and S. Ishikawa. Motion history image: its variants and applications. Machine Vision and Applications, 2012.

[2] S. Bihari, R. D. McEvoy, E. Matheson, S. Kim, R. J. Woodman, and A. D. Bersten. Factors affecting sleep quality of patients in intensive care unit. Journal of Clinical Sleep Medicine, 2012.

[3] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2005.

[4] Y. Freund and R. E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 1997.

[5] R. I. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, 2nd edition, 2004.

[6] M.-K. Hu. Visual pattern recognition by moment invariants. IRE Trans. on Information Theory, 1962.

[7] W. Huang, A. A. P. Wai, S. F. Foo, J. Biswas, C.-C. Hsia, and K. Liou. Multimodal sleeping posture classification. In Proc. of the IEEE Int'l Conf. on Pattern Recognition (ICPR), 2010.

[8] C. Idzikowski. Sleep position gives personality clue. BBC News, September 16, 2003.

[9] C.-H. Kuo, F.-C. Yang, M.-Y. Tsai, and L. Ming-Yih. Artificial neural networks based sleep motion recognition using night vision cameras. Biomedical Engineering: Applications, Basis and Communications, 2004.

[10] W.-H. Liao and C.-M. Yang. Video-based activity and movement pattern analysis in overnight sleep studies. In Proc. of the IEEE Int'l Conf. on Pattern Recognition (ICPR), 2008.

[11] S. Morong, B. Hermsen, and N. de Vries. Sleep position and pregnancy. In Positional Therapy in Obstructive Sleep Apnea. Springer, 2015.

[12] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 2011.

[13] T. Penzel and R. Conradt. Computer based sleep recording and analysis. Sleep Medicine Reviews, 2000.

[14] J. Platt et al. Fast training of support vector machines using sequential minimal optimization. Advances in Kernel Methods: Support Vector Learning, 1999.

[15] S. Ramagiri, R. Kavi, and V. Kulathumani. Real-time multi-view human action recognition using a wireless camera network. In Proc. of the ACM/IEEE Int'l Conf. on Distributed Smart Cameras (ICDSC), 2011.

[16] C. Sahlin, K. A. Franklin, H. Stenlund, and E. Lindberg. Sleep in women: normal values for sleep stages and position and the effect of age, obesity, sleep apnea, smoking, alcohol and hypertension. Sleep Medicine, 2009.

[17] J. Shotton, R. Girshick, A. Fitzgibbon, T. Sharp, M. Cook, M. Finocchio, R. Moore, P. Kohli, A. Criminisi, and A. Kipman. Efficient human pose estimation from single depth images. IEEE Trans. on Pattern Analysis and Machine Intelligence (PAMI), 2013.

[18] C. Torres, S. D. Hammond, J. C. Fried, and B. S. Manjunath. Multimodal pose recognition in an ICU using multimodal data and environmental feedback. In Proc. of the Springer Int'l Conf. on Computer Vision Systems (ICVS), 2015.

[19] G. L. Weinhouse and R. J. Schwab. Sleep in the critically ill patient. Sleep, 2006.

[20] Y. Yang and D. Ramanan. Articulated human detection with flexible mixtures of parts. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2013.


Figure 10 Mean classification performance in green scale (darkbest lightworst) of MaVL Huangrsquos [7] Torresrsquo [18] Feundrsquos [4]and the cc-LS method using SVC and LDA The combination ofcc-LS and MpM matches the performance of competing methodsin bright and clear scenes Classification is improved with cc-LSby 70 with SVC and by 30 with LDA in dark and occludedscenes The top row distinguishes between competing and pro-posed methods the second row cites them The bottom row indi-cates classifier and modalities (in parenthesis) used The labels onthe left and right indicate scene illumination and occlusion typeNA indicates not suitable

7 Conclusion and Future Work

This work introduced a new modality trust estimationmethod based on cc-LS optimization The trust values ap-proximate the difference between the multimodal candidatelabels A and the expected oracle b labels The Eye-CUsystem uses the trust to weight label propositions of avail-able modalities and views The cc-LS method with theMM Eye-CU system outperforms three competing meth-ods Two reduced Eye-CU variations reliably classify sleepposes without pressure data The MM properties allow thesystem to handle occlusions and avoid problems associatedwith a pressure mat (eg sanitation and sensor integrity)

Reliable pose classification methods and systems enableclinical researchers to design enhance and evaluate pose-related healthcare protocols and therapies Given that theEye-CU system is capable of reliably classifying humansleep poses in an ICU environment expansion of the sys-tem and methods is under investigation to include temporalinformation Future analysis will seek to quantify and typifypose sequences (ie duration and transition) Future work

9

(a) Bright scene clear of occlusions

(b) Dark scene with pillow and blanket occlusions

Figure 11 Confusion matrices generated in blue scale (darkbest light worst) using a top camera view and applying the meth-ods from Huangrsquos [7] Torresrsquo [18] and cc-LS with MpM Thetop matrices show all methods have perfect classification in idealscenes (ie main diagonal) The bottom matrices are [7] with7 [18] with 55 and cc-LS with 867 for dark and occludedscenes The matrices show the matches between estimated (l) andground truth (llowast) indices

will investigate removing the constraints that clearly definethe set of sleep poses and explore tools from novelty de-tection to identify other (eg helpful and harmful) patientposes that occur in an ICU Recent studies indicate that deepfeatures might improve the classification performance of theEye-CU system in the most challenging healthcare scenar-ios Hence future work will investigate the performanceand integration of deep features into the cc-LS method andthe Eye-CU system

Acknowledgements This project was supported in partby the Institute for Collaborative Biotechnologies (ICB)through grant W911NF-09-0001 from the US Army Re-search Office and by the US Office of Naval Research(ONR) through grant N00014-12-1-0503 The content ofthe information does not necessarily reflect the position orthe policy of the Government and no official endorsementshould be inferred

References[1] M A R Ahad J K Tan H Kim and S Ishikawa Motion history

image its variants and applications Machine Vision and Applica-tions 2012

[2] S Bihari R D McEvoy E Matheson S Kim R J Woodman andA D Bersten Factors affecting sleep quality of patients in intensivecare unit Journal of Clinical Sleep Medicine 2012

[3] N Dalal and B Triggs Histograms of oriented gradients for humandetection In Proc of the IEEE Conf on Computer Vision and PatternRecognition (CVPR) 2005

[4] Y Freund and R E Schapire A decision-theoretic generalization ofon-line learning and an application to boosting Jrnrsquol of computerand sys sci 1997

[5] R I Hartley and A Zisserman Multiple View Geometry in ComputerVision Cambridge University Press 2nd edition 2004

[6] M-K Hu Visual pattern recognition by moment invariants IEEEIRE Trans on Info Theory 1962

[7] W Huang A A P Wai S F Foo J Biswas C-C Hsia and K LiouMultimodal sleeping posture classification In Proc of the IEEE IntrsquolConf on Pattern Recognition (ICPR) 2010

[8] C Idzikowski Sleep position gives personality clue BBC NewsSeptember 16 2003

[9] C-H Kuo F-C Yang M-Y Tsai and L Ming-Yih Artificial neuralnetworks based sleep motion recognition using night vision camerasBiomedical Engineering Applications Basis and Communications2004

[10] W-H Liao and C-M Yang Video-based activity and movementpattern analysis in overnight sleep studies In Proc of the IEEE IntrsquolConf on Pattern Recognition (ICPR) 2008

[11] S Morong B Hermsen and N de Vries Sleep position and preg-nancy In Positional Therapy in Obstructive Sleep Apnea Springer2015

[12] F Pedregosa G Varoquaux A Gramfort V Michel B ThirionO Grisel M Blondel P Prettenhofer R Weiss V Dubourg J Van-derplas A Passos D Cournapeau M Brucher M Perrot andE Duchesnay Scikit-learn Machine learning in Python Journalof Machine Learning Research 2011

[13] T Penzel and R Conradt Computer based sleep recording and anal-ysis Sleep medicine reviews 2000

[14] J Platt et al Fast training of support vector machines using sequen-tial minimal optimization Advances in kernel methods support vec-tor learning 1999

[15] S Ramagiri R Kavi and V Kulathumani Real-time multi-view hu-man action recognition using a wireless camera network In Proc ofthe ACMIEEE Intrsquol Conf on Distributed Smart Cameras (ICDSC)2011

[16] C Sahlin K A Franklin H Stenlund and E Lindberg Sleep inwomen normal values for sleep stages and position and the effect ofage obesity sleep apnea smoking alcohol and hypertension Sleepmedicine 2009

[17] J Shotton R Girshick A Fitzgibbon T Sharp M Cook M Finoc-chio R Moore P Kohli A Criminisi and A Kipman Efficienthuman pose estimation from single depth images IEEE Trans onPattern Analysis and Machine Intelligence (PAMI) 2013

[18] C Torres S D Hammond J C Fried and B S Manjunath Mul-timodal pose recognition in an icu using multimodal data and envi-ronmental feedback In Proc of Springer Intrsquol Conf on ComputerVision Sys (ICVS) 2015

[19] G L Weinhouse and R J Schwab Sleep in the critically ill patientSleep-New York Then Westchester 2006

[20] Y Yang and D Ramanan Articulated human detection with flexiblemixtures of parts IEEE Trans on Pattern Analysis and MachineIntelligence 2013

10

  • 1 Introduction
    • 11 Related Work
    • 12 Proposed Work
      • 2 Eye-CU System Description
      • 3 Data Collection
        • 31 Modalities
        • 32 Feature Extraction
          • 4 Multimodal-Multiview Formulation
            • 41 Problem Statement
            • 42 Multimodal Construction
            • 43 Coupled Constrained Least-Squares (cc-LS)
            • 44 Multiview Formulation
            • 45 Testing
              • 5 Experiments
                • 51 Modality and View Assessment
                • 52 Performance of Reduced Eye-CUs
                • 53 Comparison with Existing Methods
                  • 6 Discussion
                  • 7 Conclusion and Future Work
Page 7: f g.ucsb.edu victor.fragoso@mail.wvu.edu jfried@sbch.org ......1.1. Related Work Computer vision methods using RGB data to detect body configurations of patients on beds are discussed

where $Y = L \cdot K \cdot V$ for a system with $V$ views and $M$ modalities. The multimodal and multiview oracle vector $\mathbf{b}_m$ is constructed by concatenating data from all the views in the set $\mathcal{V}$ via

\[
\mathbf{b}_m = \Big[ \big[\mathbf{b}^{(v=1)}_{k=1}\big]^{T}, \dots, \big[\mathbf{b}^{(v=V)}_{k=1}\big]^{T} \Big]^{T}_{Y},
\tag{12}
\]

and the $\mathbf{b}$ column vector is generated using (9).
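The concatenation in Eq. (12) can be sketched numerically; the array names and toy sizes below are assumptions for illustration, not the paper's code:

```python
import numpy as np

# Toy sizes assumed for illustration: L pose labels, K datapoints, V views.
L, K, V = 3, 4, 2
# Hypothetical per-view oracle vectors b^(v), each of length L*K.
b_views = [np.random.rand(L * K) for _ in range(V)]
# Stacking the V views yields the multiview oracle b_m of length Y = L*K*V.
b_m = np.concatenate(b_views)
assert b_m.shape == (L * K * V,)
```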

4.5. Testing

The test process is shown in Figure 7. The room sensors collect the set of measurements $N = \{R, D, P\}$ from the ICU scene. Features ($f^{N_m}_k$) are extracted from the modalities in $N$ and used as inputs to the trusted multimodal classifier. The classifier outputs a set of label candidates, from which the label with the largest probability for datapoint $X_k = f^{N_m}_k$ is selected via

\[
l_k = \operatorname*{arg\,max}_{l \in L} \big( w_{N_m}\, \mathrm{CLF}_m\, f^{N_m}_{k} \big) \quad \forall m.
\tag{13}
\]
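The trust-weighted selection of Eq. (13) can be illustrated with a small sketch; the score vectors and trust values below are invented for illustration, not taken from the paper:

```python
import numpy as np

def fuse_labels(scores, trusts):
    """scores: (M, L) per-modality label scores from the unimodal classifiers.
    trusts: (M,) modality trust values w_{N_m}.
    Returns the index of the label with the largest trust-weighted score."""
    fused = (np.asarray(trusts)[:, None] * np.asarray(scores)).sum(axis=0)
    return int(np.argmax(fused))

scores = [[0.2, 0.5, 0.3],   # RGB (R)
          [0.1, 0.7, 0.2],   # depth (D)
          [0.4, 0.3, 0.3]]   # pressure (P)
print(fuse_labels(scores, [0.5, 0.3, 0.2]))  # prints 1
```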

Missing Modalities. Hardware failures are simulated by evaluating the classification performance with one modality removed at a time. The trust value of a missing or failing sensor modality ($w^{\ast}_{N_n}$) is set to zero, and its original value ($w_{N_n}$) is proportionally distributed to the others via

\[
w^{\ast}_{N_m} = w_{N_m}\left(1 + \frac{|w_{N_n} - w_{N_m}|}{W}\right),
\tag{14}
\]

for $n \in \{1, \dots, M\}$, $m \in \{1, \dots, M\} \setminus \{n\}$, and $W = \sum_{\forall m} w_m$.
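A minimal sketch of the redistribution rule in Eq. (14), under the assumption that the trusts are stored in a flat array; the example weights are illustrative, and the paper does not state whether the surviving weights are renormalized afterwards:

```python
import numpy as np

def redistribute_trust(w, n):
    """Zero out the trust of failed modality n and boost the survivors
    by the factor (1 + |w_n - w_m| / W) of Eq. (14), with W = sum(w)."""
    w = np.asarray(w, dtype=float)
    W = w.sum()
    w_star = w * (1.0 + np.abs(w[n] - w) / W)
    w_star[n] = 0.0
    return w_star

# e.g. the pressure mat (index 2) fails; its trust moves to RGB and depth.
print(redistribute_trust([0.5, 0.3, 0.2], n=2))  # approx. [0.65, 0.33, 0.0]
```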

5. Experiments

Validation of modalities and views for sleep-pose classification substantiates the need for a multiview and multimodal system. The cc-LS method is tested on the MpM, MM, PMM, and PMpM Eye-CU configurations and on data collected from scenes with various illumination levels and occlusion types. The labels are estimated using multi-class linear SVC (C = 0.5) and LDA classifiers from [12]. A validation set is used to tune the SVC's C parameter and the Ada parameters. Classification accuracies are computed using five-fold cross validation, using in-house implementations of the competing methods, and reported as percent-accuracy values inside color-scaled cells.
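The evaluation protocol above can be sketched with scikit-learn [12]; the feature matrix and labels here are synthetic stand-ins for the extracted Eye-CU features:

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

# Synthetic stand-ins for the multimodal features and pose labels.
rng = np.random.RandomState(0)
X = rng.rand(100, 16)
y = rng.randint(0, 3, 100)

# Multi-class linear SVC (C=0.5) and LDA, scored with five-fold CV.
for clf in (LinearSVC(C=0.5), LinearDiscriminantAnalysis()):
    acc = cross_val_score(clf, X, y, cv=5).mean()
    print(type(clf).__name__, round(100 * acc, 1))  # percent accuracy
```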

5.1. Modality and View Assessment

Classification results obtained using unimodal and multimodal data without modality trust are shown in Figure 8. The cell values indicate the classification percent accuracy for each individual modality and each modality combination with three common classification methods. The labels of the column blocks at the top of the figure indicate the modalities used. The labels at the bottom of the figure show which classifier is used. The labels on the left and right indicate scene illumination level and type of occlusion. The figure only shows classification results for the top camera view because variation across the tested views was not statistically significant.

5.2. Performance of Reduced Eye-CUs

The complete MM configuration achieves the best classification performance, followed closely by the MpM, PMM, and PMpM configurations, as summarized in Figure 9. The values inside the cells represent the classification percent accuracy of the cc-LS method combined with the various Eye-CU system configurations. The top row indicates the configuration. The second row indicates the views. The labels on the bottom of the figure identify the modalities. The labels on the left and right indicate illumination level and occlusion type. The red scale ranges from light red (worst) to dark red (best). The figure shows that the complete MM system in combination with the cc-LS method performs best across all scenes. However, it requires information from a pressure mat. The PMM and PMpM configurations do not require the pressure mat and still perform reliably, with only a slight drop in performance. For example, in dark and occluded scenes, the PMM and PMpM configurations reach 77% and 80% classification rates, respectively (see row DARK-Blanket & Pillow).

5.3. Comparison with Existing Methods

Performance of cc-LS and the in-house implementations of the competing methods from [7], [18], and Ada [4] is shown in Figure 10. The figure shows results using the MpM configuration, which most closely resembles the configurations used in the competing methods. All the methods use a multimodal system with a top camera view and a pressure mat. The values inside the cells are the classification percent accuracy. The green scale goes from light green (worst) to dark green (best). The top row divides the methods into competing and proposed. The second row cites the methods. The bottom row indicates which classifier and, in parentheses, which modalities are used. The labels on the left and right indicate illumination level and occlusion type. The results are obtained using the four methods on the MM dataset.

Confusion Matrices. The confusion matrices in Figure 11 show how the indices of the estimated labels l match the actual labels l*. The top three matrices are from a scene with bright and clear ICU conditions (Figure 11a). The bottom three matrices illustrate the performance of the methods in a dim and occluded ICU scenario (Figure 11b). A dark blue diagonal in a confusion matrix indicates perfect classification. In the selected scenes, all methods achieved 100% classification for the bright and clear scene. However, their
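The tabulation behind such a confusion matrix can be sketched as follows; the label arrays are illustrative, not the paper's data:

```python
import numpy as np

def confusion(l_true, l_pred, n_labels):
    """Rows: ground-truth indices l*; columns: estimated indices l."""
    C = np.zeros((n_labels, n_labels), dtype=int)
    for t, p in zip(l_true, l_pred):
        C[t, p] += 1
    return C

# Illustrative labels: one of the four datapoints is misclassified.
C = confusion([0, 1, 2, 1], [0, 1, 2, 2], n_labels=3)
# A dark (large) main diagonal means estimates match the ground truth.
print(C.trace(), "of", C.sum(), "correct")  # prints: 3 of 4 correct
```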


Figure 7. Block diagram for testing of a single-view multimodal trusted classifier. Observations (R, D, P) are collected from the scene. Features are extracted from the observations and sent to the unimodal classifiers, which provide a set of score-ranked pose candidate labels. The set of candidates is trusted and combined into one multimodal set, from which the label with the highest score is selected.

Figure 8. Performance evaluation of modalities and modality combinations using SVC, LDA, and Ada-Boosted SVC (Ada), based on their classification percent accuracy (cell values). The evaluation is performed over all the scene conditions considered in this study. The results indicate that no single modality (R, D, P) or combination of concatenated modalities (RD, RP, DP, RDP), in combination with one of the three classification techniques, can be directly used to recognize poses in all scenes. The top row indicates which modality or combination of modalities is used. The labels on the bottom indicate which classifier is used. The labels to the left and right indicate the scene's illumination level and occlusion types. The gray-scaled boxes range from worst (white) to best (black) performance.

performance varies greatly in dim and occluded scenes. The matrix generated using [7] achieves 7% classification accuracy (bottom left), the matrix generated using [18] achieves 55% accuracy (bottom center), and the matrix generated with the cc-LS method achieves 86.7% accuracy (bottom right). The MpM configuration with the cc-LS method outperforms the competing methods by approximately 30%.

Performance of Ada-Boost. The system is tested using the Ada-Boost (Ada) algorithm [4] to improve the decisions of weak unimodal SVCs. The results in Figure 8 show a slight improvement over SVC. The comparison in Figure 10


Figure 9. Classification performance in red scale (dark: best, light: worst) of the various Eye-CU configurations using LDA. The PMpM has the lowest performance of 76.7%, using sh views of a dark and occluded scene. The method from [18] performs below 50%, and the method from [7] is not suited for such conditions. The top row identifies the configuration. The second row indicates the views used. The bottom labels indicate the modalities used (in parentheses). The labels on the left and right indicate scene illumination and occlusion type. A similar pattern is observed with SVC.

shows that Ada's improvement is small. It barely outperforms the reduced MpM configuration with the cc-LS method in some scenes (see row MID-Blanket). Overall, Ada is outperformed by the combination of cc-LS and MpM.

6. Discussion

The results in Figure 10 show performance disparities between the results obtained with the in-house implementation and those reported in [7]. The data and code from [7] were not released, so the findings and implementation details reported in this paper cannot be compared at the finest level. Nevertheless, the accuracy variations observed are most likely due to differences in data resolutions, sensor capacities, scene properties, and tuning parameters.

The performance of the MM and MpM configurations, which use a pressure mat, is slightly better. However, the deployment and maintenance of such systems in the real world can be very difficult and perhaps logistically impossible. The cc-LS method in combination with the PMM or PMpM configurations, which do not use a pressure mat, matches and outperforms the competing techniques in ideal and challenging scenarios (see Figure 10).

Figure 10. Mean classification performance in green scale (dark: best, light: worst) of MaVL: Huang's [7], Torres' [18], Freund's [4], and the cc-LS method, using SVC and LDA. The combination of cc-LS and MpM matches the performance of competing methods in bright and clear scenes. Classification is improved with cc-LS by 70% with SVC and by 30% with LDA in dark and occluded scenes. The top row distinguishes between competing and proposed methods; the second row cites them. The bottom row indicates the classifier and modalities (in parentheses) used. The labels on the left and right indicate scene illumination and occlusion type. N/A indicates not suitable.

7. Conclusion and Future Work

This work introduced a new modality trust estimation method based on cc-LS optimization. The trust values approximate the difference between the multimodal candidate labels A and the expected oracle labels b. The Eye-CU system uses the trusts to weight the label propositions of the available modalities and views. The cc-LS method with the MM Eye-CU system outperforms three competing methods. Two reduced Eye-CU variations reliably classify sleep poses without pressure data. The MM properties allow the system to handle occlusions and to avoid problems associated with a pressure mat (e.g., sanitation and sensor integrity).

Reliable pose classification methods and systems enable clinical researchers to design, enhance, and evaluate pose-related healthcare protocols and therapies. Given that the Eye-CU system is capable of reliably classifying human sleep poses in an ICU environment, an expansion of the system and methods is under investigation to include temporal information. Future analysis will seek to quantify and typify pose sequences (i.e., duration and transitions). Future work


(a) Bright scene, clear of occlusions

(b) Dark scene with pillow and blanket occlusions

Figure 11. Confusion matrices generated in blue scale (dark: best, light: worst) using a top camera view and applying the methods of Huang [7], Torres [18], and cc-LS with MpM. The top matrices show that all methods achieve perfect classification in ideal scenes (i.e., a dark main diagonal). The bottom matrices are [7] with 7%, [18] with 55%, and cc-LS with 86.7%, for dark and occluded scenes. The matrices show the matches between estimated (l) and ground-truth (l*) indices.

will investigate removing the constraints that clearly define the set of sleep poses and will explore tools from novelty detection to identify other (e.g., helpful and harmful) patient poses that occur in an ICU. Recent studies indicate that deep features might improve the classification performance of the Eye-CU system in the most challenging healthcare scenarios. Hence, future work will investigate the performance and integration of deep features into the cc-LS method and the Eye-CU system.

Acknowledgements. This project was supported in part by the Institute for Collaborative Biotechnologies (ICB) through grant W911NF-09-0001 from the U.S. Army Research Office and by the U.S. Office of Naval Research (ONR) through grant N00014-12-1-0503. The content of the information does not necessarily reflect the position or the policy of the Government, and no official endorsement should be inferred.

References
[1] M. A. R. Ahad, J. K. Tan, H. Kim, and S. Ishikawa. Motion history image: its variants and applications. Machine Vision and Applications, 2012.
[2] S. Bihari, R. D. McEvoy, E. Matheson, S. Kim, R. J. Woodman, and A. D. Bersten. Factors affecting sleep quality of patients in intensive care unit. Journal of Clinical Sleep Medicine, 2012.
[3] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2005.
[4] Y. Freund and R. E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 1997.
[5] R. I. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, 2nd edition, 2004.
[6] M.-K. Hu. Visual pattern recognition by moment invariants. IRE Trans. on Information Theory, 1962.
[7] W. Huang, A. A. P. Wai, S. F. Foo, J. Biswas, C.-C. Hsia, and K. Liou. Multimodal sleeping posture classification. In Proc. of the IEEE Int'l Conf. on Pattern Recognition (ICPR), 2010.
[8] C. Idzikowski. Sleep position gives personality clue. BBC News, September 16, 2003.
[9] C.-H. Kuo, F.-C. Yang, M.-Y. Tsai, and L. Ming-Yih. Artificial neural networks based sleep motion recognition using night vision cameras. Biomedical Engineering: Applications, Basis and Communications, 2004.
[10] W.-H. Liao and C.-M. Yang. Video-based activity and movement pattern analysis in overnight sleep studies. In Proc. of the IEEE Int'l Conf. on Pattern Recognition (ICPR), 2008.
[11] S. Morong, B. Hermsen, and N. de Vries. Sleep position and pregnancy. In Positional Therapy in Obstructive Sleep Apnea. Springer, 2015.
[12] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 2011.
[13] T. Penzel and R. Conradt. Computer based sleep recording and analysis. Sleep Medicine Reviews, 2000.
[14] J. Platt et al. Fast training of support vector machines using sequential minimal optimization. Advances in Kernel Methods: Support Vector Learning, 1999.
[15] S. Ramagiri, R. Kavi, and V. Kulathumani. Real-time multi-view human action recognition using a wireless camera network. In Proc. of the ACM/IEEE Int'l Conf. on Distributed Smart Cameras (ICDSC), 2011.
[16] C. Sahlin, K. A. Franklin, H. Stenlund, and E. Lindberg. Sleep in women: normal values for sleep stages and position and the effect of age, obesity, sleep apnea, smoking, alcohol and hypertension. Sleep Medicine, 2009.
[17] J. Shotton, R. Girshick, A. Fitzgibbon, T. Sharp, M. Cook, M. Finocchio, R. Moore, P. Kohli, A. Criminisi, and A. Kipman. Efficient human pose estimation from single depth images. IEEE Trans. on Pattern Analysis and Machine Intelligence (PAMI), 2013.
[18] C. Torres, S. D. Hammond, J. C. Fried, and B. S. Manjunath. Multimodal pose recognition in an ICU using multimodal data and environmental feedback. In Proc. of the Springer Int'l Conf. on Computer Vision Systems (ICVS), 2015.
[19] G. L. Weinhouse and R. J. Schwab. Sleep in the critically ill patient. Sleep, New York Then Westchester, 2006.
[20] Y. Yang and D. Ramanan. Articulated human detection with flexible mixtures of parts. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2013.


[17] J Shotton R Girshick A Fitzgibbon T Sharp M Cook M Finoc-chio R Moore P Kohli A Criminisi and A Kipman Efficienthuman pose estimation from single depth images IEEE Trans onPattern Analysis and Machine Intelligence (PAMI) 2013

[18] C Torres S D Hammond J C Fried and B S Manjunath Mul-timodal pose recognition in an icu using multimodal data and envi-ronmental feedback In Proc of Springer Intrsquol Conf on ComputerVision Sys (ICVS) 2015

[19] G L Weinhouse and R J Schwab Sleep in the critically ill patientSleep-New York Then Westchester 2006

[20] Y Yang and D Ramanan Articulated human detection with flexiblemixtures of parts IEEE Trans on Pattern Analysis and MachineIntelligence 2013

10

  • 1 Introduction
    • 11 Related Work
    • 12 Proposed Work
      • 2 Eye-CU System Description
      • 3 Data Collection
        • 31 Modalities
        • 32 Feature Extraction
          • 4 Multimodal-Multiview Formulation
            • 41 Problem Statement
            • 42 Multimodal Construction
            • 43 Coupled Constrained Least-Squares (cc-LS)
            • 44 Multiview Formulation
            • 45 Testing
              • 5 Experiments
                • 51 Modality and View Assessment
                • 52 Performance of Reduced Eye-CUs
                • 53 Comparison with Existing Methods
                  • 6 Discussion
                  • 7 Conclusion and Future Work
Page 9: f g.ucsb.edu victor.fragoso@mail.wvu.edu jfried@sbch.org ......1.1. Related Work Computer vision methods using RGB data to detect body configurations of patients on beds are discussed

Figure 9. Classification performance in red scale (dark best, light worst) of the various Eye-CU configurations using LDA. The PMpM has the lowest performance of 76.7% using sh views of a dark and occluded scene. The method from [18] performs below 50%, and the method from [7] is not suited for such conditions. The top row identifies the configuration. The second row indicates the views used. The bottom labels indicate the modalities used (in parenthesis). The labels on the left and right indicate scene illumination and occlusion type. A similar pattern is observed with SVC.

shows that Ada's improvement is small. It barely outperforms the reduced MpM configuration with the cc-LS method in some scenes (see row MID-Blanket). Overall, Ada is outperformed by the combination of cc-LS and MpM.

6. Discussion

The results in Figure 10 show performance disparities between the results obtained with the in-house implementation and those reported in [7]. The data and code from [7] were not released, so the findings and implementation details reported in this paper cannot be compared at the finest level. Nevertheless, the accuracy variations observed are most likely due to differences in data resolutions, sensor capacities, scene properties, and tuning parameters.

The performance of the MM and MpM configurations, which use a pressure mat, is slightly improved. However, the deployment and maintenance of such systems in the real world can be very difficult and perhaps logistically impossible. The cc-LS method in combination with the PMM or PMpM configurations, which do not use a pressure mat, matches and outperforms the competing techniques in ideal and challenging scenarios (see Figure 10).

Figure 10. Mean classification performance in green scale (dark best, light worst) of MaVL, Huang's [7], Torres' [18], Freund's [4], and the cc-LS method using SVC and LDA. The combination of cc-LS and MpM matches the performance of competing methods in bright and clear scenes. Classification is improved with cc-LS by 70% with SVC and by 30% with LDA in dark and occluded scenes. The top row distinguishes between competing and proposed methods; the second row cites them. The bottom row indicates the classifier and modalities (in parenthesis) used. The labels on the left and right indicate scene illumination and occlusion type. NA indicates not suitable.

7. Conclusion and Future Work

This work introduced a new modality trust estimation method based on cc-LS optimization. The trust values approximate the difference between the multimodal candidate labels A and the expected oracle labels b. The Eye-CU system uses the trusts to weight the label propositions of the available modalities and views. The cc-LS method with the MM Eye-CU system outperforms three competing methods. Two reduced Eye-CU variations reliably classify sleep poses without pressure data. The MM properties allow the system to handle occlusions and avoid problems associated with a pressure mat (e.g., sanitation and sensor integrity).
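The trust estimation summarized above can be illustrated with a small sketch. The function below is illustrative only (hypothetical names, not the authors' released code): it solves the constrained least-squares problem min ||Aw - b||^2 subject to w >= 0 and sum(w) = 1, where the columns of A hold unimodal label scores and b holds the oracle labels.

```python
import numpy as np
from scipy.optimize import minimize

def estimate_trusts(A, b):
    """Fit modality trust values w by constrained least squares:
    minimize ||A w - b||^2 subject to w >= 0 and sum(w) = 1.
    A: (n_samples, n_modalities) unimodal label scores.
    b: (n_samples,) oracle (expected) labels."""
    m = A.shape[1]
    w0 = np.full(m, 1.0 / m)                      # start from uniform trusts
    res = minimize(
        lambda w: np.sum((A @ w - b) ** 2),       # squared residual
        w0,
        method="SLSQP",
        bounds=[(0.0, 1.0)] * m,                  # trusts are non-negative
        constraints=[{"type": "eq",
                      "fun": lambda w: w.sum() - 1.0}],  # trusts sum to one
    )
    return res.x

# Toy example: modality 0 agrees with the oracle on every sample,
# modalities 1 and 2 do not, so modality 0 should receive nearly
# all of the trust.
A = np.array([[1.0, 0.0, 0.0],
              [1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 1.0]])
b = np.array([1.0, 1.0, 0.0, 1.0])
w = estimate_trusts(A, b)
```

At test time, a multimodal label would then be inferred by weighting each modality's label proposition by its trust, as the paragraph above describes.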

Reliable pose classification methods and systems enable clinical researchers to design, enhance, and evaluate pose-related healthcare protocols and therapies. Given that the Eye-CU system is capable of reliably classifying human sleep poses in an ICU environment, expansion of the system and methods is under investigation to include temporal information. Future analysis will seek to quantify and typify pose sequences (i.e., duration and transition). Future work


(a) Bright scene, clear of occlusions

(b) Dark scene with pillow and blanket occlusions

Figure 11. Confusion matrices generated in blue scale (dark best, light worst) using a top camera view and applying the methods from Huang's [7], Torres' [18], and cc-LS with MpM. The top matrices show all methods have perfect classification in ideal scenes (i.e., main diagonal). The bottom matrices are [7] with 7%, [18] with 55%, and cc-LS with 86.7% for dark and occluded scenes. The matrices show the matches between the estimated (l) and ground-truth (l*) indices.
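The estimated-versus-ground-truth tabulation shown in Figure 11 is a standard confusion matrix. A minimal sketch, using made-up pose indices rather than the paper's data: counts[i, j] tallies how often ground-truth class i was estimated as class j, so a perfect classifier fills only the main diagonal.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """counts[i, j] = number of samples whose ground-truth index is i
    and whose estimated index is j."""
    counts = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        counts[t, p] += 1
    return counts

# Made-up labels for three hypothetical pose classes.
l_star = [0, 0, 1, 1, 2, 2]   # ground-truth indices (l*)
l_hat  = [0, 1, 1, 1, 2, 0]   # estimated indices (l)
cm = confusion_matrix(l_star, l_hat, n_classes=3)
accuracy = cm.trace() / cm.sum()   # fraction on the main diagonal
```

The per-class diagonal fractions give the per-pose accuracies reported in the caption.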

will investigate removing the constraints that clearly define the set of sleep poses and explore tools from novelty detection to identify other (e.g., helpful and harmful) patient poses that occur in an ICU. Recent studies indicate that deep features might improve the classification performance of the Eye-CU system in the most challenging healthcare scenarios. Hence, future work will investigate the performance and integration of deep features into the cc-LS method and the Eye-CU system.

Acknowledgements. This project was supported in part by the Institute for Collaborative Biotechnologies (ICB) through grant W911NF-09-0001 from the U.S. Army Research Office and by the U.S. Office of Naval Research (ONR) through grant N00014-12-1-0503. The content of the information does not necessarily reflect the position or the policy of the Government, and no official endorsement should be inferred.

References

[1] M. A. R. Ahad, J. K. Tan, H. Kim, and S. Ishikawa. Motion history image: its variants and applications. Machine Vision and Applications, 2012.

[2] S. Bihari, R. D. McEvoy, E. Matheson, S. Kim, R. J. Woodman, and A. D. Bersten. Factors affecting sleep quality of patients in intensive care unit. Journal of Clinical Sleep Medicine, 2012.

[3] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2005.

[4] Y. Freund and R. E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 1997.

[5] R. I. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, 2nd edition, 2004.

[6] M.-K. Hu. Visual pattern recognition by moment invariants. IRE Trans. on Information Theory, 1962.

[7] W. Huang, A. A. P. Wai, S. F. Foo, J. Biswas, C.-C. Hsia, and K. Liou. Multimodal sleeping posture classification. In Proc. of the IEEE Int'l Conf. on Pattern Recognition (ICPR), 2010.

[8] C. Idzikowski. Sleep position gives personality clue. BBC News, September 16, 2003.

[9] C.-H. Kuo, F.-C. Yang, M.-Y. Tsai, and L. Ming-Yih. Artificial neural networks based sleep motion recognition using night vision cameras. Biomedical Engineering: Applications, Basis and Communications, 2004.

[10] W.-H. Liao and C.-M. Yang. Video-based activity and movement pattern analysis in overnight sleep studies. In Proc. of the IEEE Int'l Conf. on Pattern Recognition (ICPR), 2008.

[11] S. Morong, B. Hermsen, and N. de Vries. Sleep position and pregnancy. In Positional Therapy in Obstructive Sleep Apnea. Springer, 2015.

[12] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 2011.

[13] T. Penzel and R. Conradt. Computer based sleep recording and analysis. Sleep Medicine Reviews, 2000.

[14] J. Platt et al. Fast training of support vector machines using sequential minimal optimization. Advances in Kernel Methods: Support Vector Learning, 1999.

[15] S. Ramagiri, R. Kavi, and V. Kulathumani. Real-time multi-view human action recognition using a wireless camera network. In Proc. of the ACM/IEEE Int'l Conf. on Distributed Smart Cameras (ICDSC), 2011.

[16] C. Sahlin, K. A. Franklin, H. Stenlund, and E. Lindberg. Sleep in women: normal values for sleep stages and position and the effect of age, obesity, sleep apnea, smoking, alcohol and hypertension. Sleep Medicine, 2009.

[17] J. Shotton, R. Girshick, A. Fitzgibbon, T. Sharp, M. Cook, M. Finocchio, R. Moore, P. Kohli, A. Criminisi, and A. Kipman. Efficient human pose estimation from single depth images. IEEE Trans. on Pattern Analysis and Machine Intelligence (PAMI), 2013.

[18] C. Torres, S. D. Hammond, J. C. Fried, and B. S. Manjunath. Multimodal pose recognition in an ICU using multimodal data and environmental feedback. In Proc. of the Springer Int'l Conf. on Computer Vision Systems (ICVS), 2015.

[19] G. L. Weinhouse and R. J. Schwab. Sleep in the critically ill patient. Sleep-New York then Westchester, 2006.

[20] Y. Yang and D. Ramanan. Articulated human detection with flexible mixtures of parts. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2013.

