
Balancing Privacy and Safety: Protecting Driver Identity in Naturalistic Driving Video Data

Sujitha Martin
Laboratory of Intelligent and Safe Automobiles
UCSD - La Jolla, CA, USA
[email protected]

Ashish Tawari
Laboratory of Intelligent and Safe Automobiles
UCSD - La Jolla, CA, USA
[email protected]

Mohan M. Trivedi
Laboratory of Intelligent and Safe Automobiles
UCSD - La Jolla, CA, USA
[email protected]

ABSTRACT
Naturalistic driving datasets are at the heart of automotive user interface research, driver distraction detection and measurement, and many other driver safety studies. Recent advances in the collection of large-scale naturalistic driving data include the second Strategic Highway Research Program (SHRP2), covering more than 3000 subjects, and the 100-Car study. Public access to such data, however, is made difficult by personally identifiable information and the need to protect privacy. We propose de-identification filters that protect the privacy of drivers while preserving sufficient detail to infer driver behavior, such as gaze direction, in naturalistic driving videos. Driver gaze estimation is of particular interest because it is a good indicator of the driver's visual attention and a good predictor of the driver's intent. We implement and compare de-identification filters composed of a combination of preserving eye regions, superimposing a head pose encoded face mask, and replacing the background with black pixels, and show promising results.

Author Keywords
De-Identification; Driver Safety; Privacy; Human Factors

ACM Classification Keywords
K.4.1 Computers and Society: Privacy; I.4.0 Image Processing and Computer Vision: Image Processing Software

INTRODUCTION
The 100-Car Naturalistic Driving Study (NDS) collected data for more than one year, yielding nearly 2,000,000 miles and 42,300 hours driven, 82 crashes, 761 near-crashes and 8,295 critical incidents [10]. Preliminary analysis of the data has shown that nearly 80% of crashes and 65% of near-crashes involved some form of driver inattention within three seconds prior to the event [10], and that there is a strong correlation between accumulated off-road eye glances and the risk of crashes and near-crashes [18]. Similarly, the SHRP2 naturalistic driving study (NDS) has been collecting data for the past two years, which will result in approximately 4 petabytes of data, 1 million hours of video, 3000 subjects, 5 million trips, 33 million miles driven and 4 billion GPS points [2]. Raw data from neither the 100-Car study nor the SHRP2 NDS, however, is publicly available, due to personally identifiable information and the protection of privacy. There is, therefore, a need to protect the privacy of individuals while preserving sufficient details to infer driver behavior in naturalistic driving videos.

Camera sensors looking at the driver, an integral part of intelligent vehicles [12], are of particular concern for invasion of privacy, as they can be used to recognize the driver's identity. Typical protections of individual privacy in a video sequence include blacking out or blurring faces or people, commonly referred to as "de-identification". While this helps to protect the identities of individual drivers, it impedes the purpose of sensorizing vehicles to observe the driver and his or her behavior. In an ideal situation, a de-identification algorithm applied to video looking at the driver would protect the privacy of the driver while preserving sufficient details to infer driver behavior (e.g., eye gaze, head pose, hand activity).

Any form of de-identification on video sequences looking at the driver can only degrade the performance of detecting or measuring driver behavior. However, the degradation in performance can be minimized by using appropriate methods of de-identification. Therefore, we propose de-identification filters which preserve disjoint regions of the face depending on the type of driver behavior under study. For example, a de-identification filter which preserves only the mouth region can be used for monitoring a driver's yawning [21] or talking, and a de-identification filter which preserves the eye regions can be used for detecting driver fatigue [13] or gaze direction. Since an ensemble of facial features is often used to identify a person, de-identification filters which preserve only a subset of facial features, and in a disjoint manner at that, are promising for privacy protection. In addition to preserving a subset of disjoint facial regions, we explore the advantages of superimposing a head pose encoded face mask to provide spatial context.

In this paper, we choose the driver's gaze as the driver behavior to preserve. We implement and compare de-identification filters, which are made up of a combination of preserving eye regions for fine gaze estimation, superimposing head pose encoded face masks for providing spatial context, and replacing the background with black pixels for ensuring privacy protection, as shown in Figure 1c.


(a) Preserving scene [6] (b) Preserving action [1] (c) Preserving gaze direction (this paper)
Figure 1. Comparison of selected works in de-identification from different applications: (a) Google street view: removing pedestrians and preserving the scene using multiple views, (b) surveillance: obscuring the identity of the actor and preserving the action, and (c) intelligent vehicles: protecting the driver's identity and preserving the driver's gaze (this paper).

The eye-region-preserving filters will also be useful for other work, such as understanding the relationship between eyes-off-road glances and lane keeping ability [14]. Furthermore, with similarly designed filters that match a designer's criteria for what to preserve, institutions may be more inclined to publicly share de-identified naturalistic driving data. The research community can then benefit tremendously from large amounts of naturalistic driving data and focus on the design and evaluation of intelligent vehicles.

RELATED WORKS
The term "de-identification" has been popularly used in the literature to address the concern of privacy invasion from sensorized environments. A sensorized environment can be as simple as a photographer capturing a moment in time, where the camera is the sensor and the scene is the environment. The need to de-identify people in sensorized environments typically arises for one of two reasons: either the person is not intended to be in the image, or the presence and action of the person is intended but not their identity. The former is a prevalent reason for de-identification in applications like Google street view [7, 6], while the latter has applications in surveillance [1], as illustrated in Figure 1.

In its 360° panoramic view, the Google street view car captures not only the appearance of location-specific objects such as buildings, billboards and street signs, but also privacy-invading material such as people and license plates. Google protects individual privacy with a system that automatically blurs faces and license plates [7]. However, recent studies have shown that the face is only one of many identifiable features associated with people, alongside silhouette [3], gait and articles of clothing. To this end, many researchers have proposed to remove persons from Google street view and replace them with background pixels using multiple views of the same scene [6], or with similar-looking pedestrians from a controlled dataset [11].

As consumer vehicles are instrumented with an increasing number of sensors, looking inside at the driver is raising concern over individual privacy. The first work on de-identification of drivers in naturalistic driving videos can be found in [9], but it lacks the spatial context we present in this paper with the face mask, which yields a significant improvement in driver gaze-zone estimation. To the best of our knowledge, no other work has explicitly proposed or evaluated de-identification algorithms for inside the vehicle. We say explicitly because some researchers have used a derivative of raw camera sensory data for further analysis. In [20], data from multiple camera sensors is represented as voxel data for occupant posture analysis. Similarly, [19] uses EXtremity Movement OBservation (XMOB) for 3D upper body pose tracking to determine whether the driver's hand is on the wheel. In this study, we intentionally apply de-identification filters to data from a camera looking at the driver, and quantify both the level of de-identification and its effects on estimating the driver's gaze direction.

DE-IDENTIFICATION FILTER
A de-identification filter takes an image, or a sequence of images, in which a person could be recognizable and makes the person unrecognizable, while preserving the information for which the image was captured. In the driving context, a de-identification filter applied to a raw image looking at the driver will ideally output an image in which, semantically speaking, the driver's identity is protected and the driver's behavior is preserved.

Challenges of De-Identification in Naturalistic Driving
De-identification of drivers inside the vehicle cockpit presents a few questions, issues and challenges. The first is whether, given an image containing the driver's face, de-identification should be applied locally inside a region of interest (e.g., the driver's face) or over the entire image.


Figure 2. Three different de-identification filters which semantically share the same goal of obscuring the driver's identity and preserving the driver's behavior, but to different degrees: (a) the first filter output is on the lower end of privacy because it masks only parts of the face and leaves spatially contextual information (i.e., hair color/length, body shape/posture), (b) the second filter provides more privacy while still preserving gaze, and (c) the third filter preserves only deduced information and therefore provides the highest privacy of the three.

The former is not recommended because if the face is detected or tracked incorrectly, de-identification applied to the wrong portion of the image leaves the driver's face vulnerable to identification. Robust face tracking is made difficult by the changing illumination conditions and large spatial head movements present in typical naturalistic driving data.

The latter proposition, applying de-identification uniformly to the entire image, is therefore preferable for its limited or absent dependence on face detection or tracking modules. It would, however, mean sacrificing resolution or key details in the background which do not reveal the driver's identity. For example, distorting the whole image can deteriorate performance not only in recognizing the driver's identity, but also in inferring whether the driver is wearing a seat belt or in inferring the driver's hand movements.

This leads to the second issue: what should be preserved in the process of de-identification. Driver fatigue monitoring systems would benefit greatly from preserving mouth behavior, such as the number of times the driver yawned [21], and eye behavior, such as the proportion of time the eyes are closed (PERCLOS) [13]. Beyond fatigue monitoring, preserved eye gaze behavior can serve as a proxy for what information the driver is processing [17]. For example, coarse gaze direction estimation is a good indicator of a driver's intent to change lanes [4, 5]. Figure 2 illustrates three different de-identification filters which share the same goal of obscuring driver identity and preserving driver behavior, but to varying degrees. Depending on the driver behavior under study, therefore, specific de-identification algorithms can be designed.

Evaluating the degree of de-identification is a key part of designing the filter, because failure to provide adequate protection of privacy is unacceptable. Evaluation can occur in one of two ways: a human user study or algorithmic face recognition. While the latter is more objective, it does not compare to a human's ability to recognize faces. In a user study for face recognition, participants are typically asked to match the driver in a de-identified image against a pool of possible candidate photographs. One of the main issues in designing such a user study is choosing the candidates. Ideally, not all individuals represented in the de-identified images should be in the list of candidates, and vice versa. This represents a realistic situation of person identification, where a subject in real life encounters many unknown faces to match with known (candidate) faces.

De-Identification by Parts, With and Without Spatial Context
In this study, we are interested in protecting the driver's identity while preserving the driver's gaze direction. There is a trade-off, however, between accurately and robustly estimating the driver's gaze and protecting the driver's identity, because the same facial features that are useful for gaze estimation play a key role in recognizing a person's identity. We explore methods to de-identify the driver yet allow for gaze estimation by preserving key facial regions in the foreground and obscuring other regions in the background.

Gaze estimation can be accomplished using any one of the combinations of facial regions in the first row of Figure 3, where some are more robust than others to large head movements, facial deformations, lighting conditions, etc. While presenting these facial regions in a disjoint manner helps to hamper face recognition, it also hampers gaze estimation. However, superimposing these disjoint facial regions onto a generic face model, as shown in the second row of Figure 3, may provide the necessary spatial context for gaze estimation.

In this study, we explore the benefits of preserving the region around the eyes while replacing the rest of the face region with either black pixels or an appropriate face mask. To extract the location of the eyes, facial features are reliably detected using the supervised descent method [22]. These facial landmarks are also used to estimate head pose.
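Concretely, once a landmark detector returns per-frame facial points, the eye regions can be cropped as fixed-margin boxes around the eye-corner landmarks. The sketch below is a minimal illustration, not the authors' code: the landmark indexing constants and the margin value are assumptions, and any detector that produces eye corners (the paper uses the supervised descent method [22]) could supply the input.

```python
import numpy as np

# Hypothetical landmark indexing: each pair holds the indices of the
# outer and inner corners of one eye in the detector's output array.
LEFT_EYE = (0, 1)
RIGHT_EYE = (2, 3)

def eye_patch(landmarks, eye, margin=1.5):
    """Return (x0, y0, x1, y1) of a box around one eye.

    landmarks: (N, 2) array of detected 2D facial landmarks.
    eye:       pair of landmark indices for the eye's corners.
    margin:    box padding as a fraction of eye width (assumed value).
    """
    corners = landmarks[list(eye)]                 # (2, 2) eye-corner coords
    center = corners.mean(axis=0)
    half_w = margin * np.linalg.norm(corners[0] - corners[1]) / 2.0
    half_h = 0.6 * half_w                          # eyes are wider than tall
    x0, y0 = center - (half_w, half_h)
    x1, y1 = center + (half_w, half_h)
    return int(x0), int(y0), int(x1), int(y1)
```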


Figure 3. Different combinations of patches around facial landmarks used to estimate or predict driver behavior (e.g., the driver's gaze, whether the driver is talking) while protecting the driver's identity. The second row is the same as the first but with an underlying face mask to provide spatial context. Some combinations are more susceptible than others to face recognition.

Head pose is computed using seven facial landmarks (eye corners, nose corners and nose tip) and their corresponding points on a generic 3D face model [8]. Using the head pose, an appropriate face mask is generated for each image. To generate a face mask, a 3D mean face model is rotated using the head pose, scaled using the distance between the detected eye corners in the 2D image plane, aligned using the nose tip, and finally projected onto the image by discarding the axis perpendicular to the image plane. Furthermore, using head pose, a de-identification scheme which preserves only one eye can determine which eye is more visible from the camera's perspective.
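These rotate-scale-align-project steps can be summarized in code. The following is a schematic sketch under stated assumptions: OpenCV's solvePnP stands in for the pose estimator of [8], `mean_face` is a generic 3D face model whose eye-corner and nose-tip vertex indices (hypothetical values below) are known, and the projection is orthographic, i.e., the depth axis is simply dropped.

```python
import numpy as np
import cv2

# Assumed vertex indices on the 3D mean face model (hypothetical values).
MODEL_L_EYE, MODEL_R_EYE, MODEL_NOSE = 36, 45, 30

def head_pose(pts_3d, pts_2d, camera_matrix):
    """Estimate head rotation from the 7 landmark correspondences
    (eye corners, nose corners, nose tip) via PnP."""
    ok, rvec, tvec = cv2.solvePnP(
        pts_3d.astype(np.float64), pts_2d.astype(np.float64),
        camera_matrix, None)
    R, _ = cv2.Rodrigues(rvec)        # 3x3 rotation matrix
    return R

def face_mask_points(mean_face, R, eye_corners_2d, nose_tip_2d):
    """Rotate, scale, align, and orthographically project the mean face."""
    rotated = mean_face @ R.T                                 # rotate by head pose
    model_eye = np.linalg.norm(rotated[MODEL_L_EYE, :2]
                               - rotated[MODEL_R_EYE, :2])
    image_eye = np.linalg.norm(eye_corners_2d[0] - eye_corners_2d[1])
    scaled = rotated * (image_eye / model_eye)                # scale to the image
    shift = nose_tip_2d - scaled[MODEL_NOSE, :2]              # align the nose tips
    return scaled[:, :2] + shift                              # drop the depth axis
```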

When facial feature tracking becomes unreliable and de-identification is applied only to the estimated face region, parts of the true face region become exposed and vulnerable to recognition. Some alternatives include diffusing [15] or scrambling the background (i.e., the non-face region). Visual querying, however, raises the concern that any form of de-identification on the background, other than pixel replacement, could provide some information towards the driver's identity. For this reason, we replace everything around the detected face region with black pixels. Removing background information, however, can remove contextual information that is often helpful, e.g., in determining the driver's gaze zone.

Therefore, a thorough experimental analysis is conducted using three de-identification methods, in all of which the background (i.e., the non-face region) is replaced with black pixels: the first preserves the region around one eye, with black-pixel replacement for the rest of the face region; the second preserves the region around both eyes, with black-pixel replacement for the rest of the face region; and the third preserves the region around both eyes, with a face mask superimposed on the rest of the face region. These de-identification methods will henceforth be referred to as One-Eye, Two-Eyes and Mask with Two-Eyes, respectively.
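To illustrate how the three filters compose, the sketch below blacks out the whole frame, optionally draws the projected mask points, and then restores only the eye patches from the original image. The helper names (`eye_patch`, `face_mask_points`) refer to the hypothetical sketches above; the gray mask color and argument conventions are assumptions, not the authors' released implementation.

```python
import numpy as np

def deidentify(frame, landmarks, eyes, mask_pts=None):
    """Compose the de-identification filters described above.

    frame:     HxWx3 image looking at the driver.
    landmarks: (N, 2) detected facial landmarks.
    eyes:      eye-corner index pairs to preserve; one pair for One-Eye
               (chosen by head-pose visibility), two pairs for Two-Eyes.
    mask_pts:  (M, 2) projected mean-face points; if given, the result is
               Mask with Two-Eyes, otherwise the face region stays black.
    """
    out = np.zeros_like(frame)                      # everything -> black pixels

    if mask_pts is not None:                        # head-pose-encoded face mask
        for x, y in mask_pts.astype(int):
            if 0 <= y < out.shape[0] and 0 <= x < out.shape[1]:
                out[y, x] = (128, 128, 128)         # neutral gray mask point

    for eye in eyes:                                # restore only the eye regions
        x0, y0, x1, y1 = eye_patch(landmarks, eye)
        out[y0:y1, x0:x1] = frame[y0:y1, x0:x1]
    return out
```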

EXPERIMENTAL EVALUATION
In the following sections, we describe the collected dataset and present participants' responses to de-identified images of drivers in two user studies: face recognition and gaze zone estimation.

Figure 4. The five gaze zones of interest: Left, Front, Right, Rear Mirror, and Inside.

Experiment Design
The LISA-A testbed [16] is used to collect data from four drivers driving on urban roadways and freeways around the University of California, San Diego (UCSD) campus. Among the collected sensory inputs is a video feed from a camera mounted to the left of the driver on the front windshield near the A-pillar. While each drive lasted approximately 20 minutes, we chose sample images from the video sequence where the driver is looking at particular gaze zones in order to conduct the face recognition and gaze zone estimation user studies. Figure 4 illustrates the five gaze zones considered: Left, Front, Right, Rear Mirror and Inside.

A total of 20 human subjects participated in this two-part user study: 10 participants for the face recognition study and the other 10 for the gaze-zone estimation study. In the user study on recognition of faces in de-identified images, we used five images of each driver, one for each of the five gaze zones, times two for the two types of de-identification, One-Eye and Two-Eyes. In addition, 38 random images from four other drivers were used to increase variability in the dataset. There was thus a total of 5 × 4 × 2 + 38 = 78 de-identified images presented to each of the ten participants. Figure 5 shows the layout of the user study as seen by participants.

Given a test sample, participants choose the candidate who best matches the person in the de-identified image. The candidate images are random pictures, not taken from the collected dataset. Among the 12 candidates, the image with a question mark is available for use when the participant cannot conjecture who is in the de-identified image. Participants were instructed that people in the de-identified images may not necessarily be present among the candidates, and that not all candidates are necessarily represented in the de-identified images. This represents a realistic situation of person identification, where a subject in real life encounters many unknown faces to match with known (available) faces.

The second user study comprises nine expert participants classifying the gaze of the driver in each de-identified image into one of six categories: Left, Front, Right, Rear Mirror, Inside, and Unknown.


Figure 5. Layout of the face recognition toolbox for the user study. Given a de-identified image, participants choose the one of 12 candidates that best matches the driver in the de-identified image.

Table 1. Face Recognition User Study: Statistics on Participants' Responses

De-Identification Method | Driver | Samples | Recognition Rate | Unknown Rate
One-Eye                  | 1      | 50      | 0%               | 34%
One-Eye                  | 2      | 50      | 8%               | 38%
One-Eye                  | 3      | 50      | 2%               | 38%
One-Eye                  | 4      | 50      | 10%              | 46%
One-Eye                  | All    | 200     | 5%               | 39%
Two-Eyes                 | 1      | 50      | 0%               | 46%
Two-Eyes                 | 2      | 50      | 8%               | 40%
Two-Eyes                 | 3      | 50      | 8%               | 30%
Two-Eyes                 | 4      | 50      | 16%              | 54%
Two-Eyes                 | All    | 200     | 8%               | 43%

These experts are researchers who are familiar with the camera perspective. By watching unperturbed videos of drivers not in this study, but captured from the same camera perspective, the expert participants were familiarized with estimating gaze zones. This is especially challenging because the camera position is biased to the left. Testing samples contain five images of two drivers for each of the five gaze zones, times three for the three types of de-identification. A total of 5 × 2 × 5 × 3 = 150 de-identified images was presented to each participant.

Face Recognition Performance
As a first step of evaluation, one participant took part in the user study for recognition of faces in images before they were de-identified. With a 100% recognition rate, we claim that the candidate pictures satisfactorily represent the raw images looking at the driver. The second step of evaluation tested the level of face recognition with the two types of de-identification: One-Eye and Two-Eyes.


Table 2. Gaze Zone Estimation User Study: Statistics on Participants' Responses

De-Identification Method | Gaze Zone   | Samples | Accuracy | Unknown Rate
One-Eye                  | Left        | 90      | 67%      | 1%
One-Eye                  | Front       | 90      | 82%      | 0%
One-Eye                  | Right       | 90      | 76%      | 9%
One-Eye                  | Rear Mirror | 90      | 67%      | 0%
One-Eye                  | Inside      | 90      | 32%      | 7%
One-Eye                  | All         | 450     | 65%      | 3%
Two-Eyes                 | Left        | 90      | 92%      | 2%
Two-Eyes                 | Front       | 90      | 87%      | 2%
Two-Eyes                 | Right       | 90      | 77%      | 1%
Two-Eyes                 | Rear Mirror | 90      | 61%      | 1%
Two-Eyes                 | Inside      | 90      | 39%      | 3%
Two-Eyes                 | All         | 450     | 71%      | 2%
Mask with Two-Eyes       | Left        | 40      | 98%      | 0%
Mask with Two-Eyes       | Front       | 40      | 85%      | 0%
Mask with Two-Eyes       | Right       | 40      | 88%      | 0%
Mask with Two-Eyes       | Rear Mirror | 40      | 88%      | 0%
Mask with Two-Eyes       | Inside      | 40      | 65%      | 0%
Mask with Two-Eyes       | All         | 200     | 85%      | 0%

De-identified images in which a face mask is superimposed, with the actual eyes of the driver, on a black background are not part of the face recognition study, because they reveal no more information about the driver's identity than the de-identified images with two eyes only. On the contrary, fusing facial features from two different sources, one being the driver and the other the mean face model, can only introduce confusion.

Table 1 details, for de-identification with One-Eye and Two-Eyes, the number of drivers, the number of samples accumulated over all participants per driver, the recognition rate, and the percentage of times participants responded with Unknown. Given that there are 12 possible candidates to choose from, the random chance of recognition is 1/12 ≈ 8.3%. Table 1 shows that the mean recognition rate is less than or equal to chance for most of the drivers considered. On average, the recognition rate is higher with the Two-Eyes de-identification method than with the One-Eye method, as expected; however, both are below chance level.

It’s important to mention that the nature of the experimentwhere the choices are given to pick one from is very con-servative and subjects could use elimination tactics withoutactually identifying the driver. For example, one participantnoted that eye color (e.g. dark or light) was one of his criteriafor choosing a candidate. Despite this, recognition rate of anyparticular driver as well as overall is at most at chance level.

Gaze Zone Estimation Performance
The gaze zones are as illustrated in Figure 4: Left, Front, Right, Rear Mirror and Inside. A total of nine expert participants classified the driver's gaze direction as perceived in the de-identified images. Table 2 lists the number of samples over all participants, the classification accuracy, and the percentage of times participants responded with Unknown, for each method of de-identification and each gaze zone.


(a) One-Eye:
            Left   Front  Right  Rear-view  Down
Left        0.67   0.31   0.01   0          0
Front       0.09   0.82   0.04   0          0.04
Right       0      0.08   0.76   0.01       0.07
Rear-view   0      0.02   0.31   0.67       0
Down        0      0.33   0.28   0          0.32

(b) Two-Eyes:
            Left   Front  Right  Rear-view  Down
Left        0.92   0.02   0      0          0.03
Front       0.08   0.87   0.03   0          0
Right       0      0.19   0.77   0.01       0.02
Rear-view   0      0.06   0.32   0.61       0
Down        0      0.33   0.24   0          0.39

(c) Mask with Two-Eyes:
            Left   Front  Right  Rear-view  Down
Left        0.98   0.03   0      0          0
Front       0.1    0.85   0.05   0          0
Right       0      0.03   0.88   0.03       0.08
Rear-view   0      0      0.13   0.88       0
Down        0      0.1    0.25   0          0.65

Figure 6. Confusion matrices for five-gaze-zone classification by participants on de-identified images with (a) one eye, (b) two eyes and (c) two eyes superimposed on a face mask. Gaze zones are as depicted in Figure 4. Each row represents the true gaze zone and each column the participants' estimate. On average, gaze zone estimation accuracy is 65%, 71% and 85% for de-identification with one eye, two eyes, and two eyes superimposed on a face mask, respectively.
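As a sanity check, the per-filter accuracies quoted in the caption are simply the means of the matrix diagonals. A few lines of Python, using the Mask with Two-Eyes matrix above as an array, reproduce the reported 85% (the other two matrices follow the same pattern):

```python
import numpy as np

# Confusion matrix for Mask with Two-Eyes (Figure 6c);
# rows = true gaze zone, columns = participants' responses.
mask_two_eyes = np.array([
    [0.98, 0.03, 0.00, 0.00, 0.00],
    [0.10, 0.85, 0.05, 0.00, 0.00],
    [0.00, 0.03, 0.88, 0.03, 0.08],
    [0.00, 0.00, 0.13, 0.88, 0.00],
    [0.00, 0.10, 0.25, 0.00, 0.65],
])
print(np.mean(np.diag(mask_two_eyes)))   # ~0.85, matching the caption
```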

Given that there are five gaze zones, the random chance of gaze zone accuracy is 1/5 = 20%. As shown in Table 2, gaze zone accuracy is above chance for all three de-identification methods across all of the gaze zones considered. On average, gaze zone estimation accuracy was 65%, 71% and 85% with One-Eye, Two-Eyes and Mask with Two-Eyes, respectively.

The confusion matrices in Figure 6 give insight into the misclassification of gaze zones. For instance, the Rear Mirror gaze zone is significantly confused with the Right gaze zone, because it is not unreasonable to assume a rightward gaze when a driver is gazing at the rear-view mirror. Similarly, the Inside gaze zone is significantly confused with the Front and Right gaze zones. This is expected, since gazing at the gauges and the instrument panel invokes gazes similar to Front and Right, respectively. Some of these misclassifications, however, decreased when more spatial context was presented. For instance, Left gaze-zone misclassification is high with One-Eye but decreased significantly with Two-Eyes, both with and without the mask. On the other hand, Rear-view gaze-zone misclassification is consistently large with One-Eye and Two-Eyes, but decreased with Mask with Two-Eyes.

CONCLUDING REMARKS
In the design of driver assistance systems that look at the driver, the driver's identity is irrelevant to understanding and predicting driver behavior. We explored three de-identification schemes made up of a combination of preserving eye regions, superimposing a head pose encoded face mask, and replacing the background with black pixels. We preserve the eye regions in particular because they can provide finer detail for gaze zone estimation. A two-part user study with human participants showed face recognition to be below chance and gaze zone estimation accuracy to be 65%, 71% and 85% for One-Eye, Two-Eyes and Mask with Two-Eyes, respectively.

Acknowledgments
We acknowledge the support of the UC Discovery Program and associated industry partners. We also thank our UCSD LISA colleagues, who helped in a variety of important ways in our research studies. Finally, we thank the reviewers for their constructive comments.

REFERENCES
1. Agrawal, P., and Narayanan, P. Person de-identification in videos. IEEE Transactions on Circuits and Systems for Video Technology 21, 3 (2011), 299–310.
2. Campbell, K. Analyzing driver behavior using data from the SHRP 2 naturalistic driving study, May 2013.
3. Collins, R. T., Gross, R., and Shi, J. Silhouette-based human identification from body shape and gait. In Fifth IEEE International Conference on Automatic Face and Gesture Recognition (2002).
4. Doshi, A., and Trivedi, M. Investigating the relationships between gaze patterns, dynamic vehicle surround analysis, and driver intentions. In IEEE Intelligent Vehicles Symposium (2009), 887–892.
5. Doshi, A., and Trivedi, M. M. Head and eye gaze dynamics during visual attention shifts in complex environments. Journal of Vision 12, 2 (2012), 9.
6. Flores, A., and Belongie, S. Removing pedestrians from Google street view images. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) (2010).
7. Frome, A., Cheung, G., Abdulkader, A., Zennaro, M., Wu, B., Bissacco, A., Adam, H., Neven, H., and Vincent, L. Large-scale privacy protection in Google street view. In IEEE 12th International Conference on Computer Vision (2009), 2373–2380.
8. Martin, S., Tawari, A., Murphy-Chutorian, E., Cheng, S. Y., and Trivedi, M. On the design and evaluation of robust head pose for visual user interfaces: algorithms, databases, and comparisons. In Automotive User Interfaces and Interactive Vehicular Applications (2012).
9. Martin, S., Tawari, A., and Trivedi, M. M. Toward privacy-protecting safety systems for naturalistic driving videos. IEEE Transactions on Intelligent Transportation Systems (2014).
10. Neale, V. L., Dingus, T. A., Klauer, S. G., Sudweeks, J., and Goodman, M. An overview of the 100-car naturalistic study and findings. National Highway Traffic Safety Administration, Paper 05-0400 (2005).
11. Nodari, A., Vanetti, M., and Gallo, I. Digital privacy: Replacing pedestrians from Google street view images. In 21st International Conference on Pattern Recognition (ICPR) (2012), 2889–2893.
12. Ohn-Bar, E., Tawari, A., Martin, S., and Trivedi, M. Vision on wheels: Looking at driver, vehicle, and surround for on-road maneuver analysis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (2014), 185–190.
13. Pei, Z., Zhenghe, S., and Yiming, Z. PERCLOS-based recognition algorithms of motor driver fatigue. Journal of China Agricultural University 7, 2 (2002), 104–109.
14. Peng, Y., Boyle, L. N., and Hallmark, S. L. Driver's lane keeping ability with eyes off road: Insights from a naturalistic study. Accident Analysis & Prevention 50 (2013), 628–634.
15. Perona, P., and Malik, J. Scale-space and edge detection using anisotropic diffusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 12, 7 (1990), 629–639.
16. Tawari, A., and Trivedi, M. M. Head dynamic analysis: A multi-view framework. In New Trends in Image Analysis and Processing – ICIAP 2013. Springer, 2013, 536–544.
17. Taylor, T., Pradhan, A., Divekar, G., Romoser, M., Muttart, J., Gomez, R., Pollatsek, A., and Fisher, D. The view from the road: The contribution of on-road glance-monitoring technologies to understanding driver behavior. Accident Analysis & Prevention (2013).
18. Tian, R., Li, L., Chen, M., Chen, Y., and Witt, G. Studying the effects of driver distraction and traffic density on the probability of crash and near-crash events in naturalistic driving environment. IEEE Transactions on Intelligent Transportation Systems (2013).
19. Tran, C., and Trivedi, M. M. 3-D posture and gesture recognition for interactivity in smart spaces. IEEE Transactions on Industrial Informatics 8, 1 (2012), 178–187.
20. Trivedi, M. M., Cheng, S. Y., Childers, E. M. C., and Krotosky, S. J. Occupant posture analysis with stereo and thermal infrared video: Algorithms and experimental evaluation. IEEE Transactions on Vehicular Technology 53, 6 (2004), 1698–1712.
21. Wang, Q., Yang, J., Ren, M., and Zheng, Y. Driver fatigue detection: a survey. In The Sixth World Congress on Intelligent Control and Automation (WCICA), vol. 2 (2006), 8587–8591.
22. Xiong, X., and De la Torre, F. Supervised descent method and its applications to face alignment. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2013).
