Machine Learning project: Identify a Car’s Driver from ...cs229.stanford.edu/proj2013/MachineLearning_Yang_Jia.pdf · Machine Learning project: Identify a Car’s Driver from Driving


Machine Learning project: Identify a Car’s Driver from Driving Behavior

Fan Yang, Chunjing Jia

December 12, 2013

1 Introduction

Each individual has his or her own driving behavior, which could be used as an identifying characteristic, similar to handwriting. Under this hypothesis, we propose a learning study of the connection between a driver’s identity and the vehicle’s characteristics, such as accelerometer readings, heading, and speed, which can usually be collected from the vehicle’s electronic system or by adding other measurements. The dataset includes real-time, high-frequency accelerometer, heading, speed, odometer, and gas-usage data. We first convert the time-dependent data into a large number of time-independent features, which can then be used to train vehicle-against-vehicle classifiers. We aim to obtain a reliable supervised learning algorithm for the case of a single driver driving the same car, as well as unsupervised clustering to detect when vehicles have multiple drivers.

2 Data Collection

The data were collected by MetroMile, Inc. and saved in csv (comma-separated values) format, which can be viewed and manipulated in Microsoft Excel and Matlab. Each csv file holds the information for one car, collected over a number of trips. Each trip covers a continuous period of time, with samples usually recorded every second or every few seconds. The recorded information includes the velocity in mph, the orientation of the car, the acceleration in three dimensions, and the instantaneous gas usage. See Figure 1 for the first trip collected for car #133000249. The typical number of trips collected for each car is a few thousand; for example, car #133000249 has 2281 trips when any two consecutive data points separated by a time interval greater than 60 seconds are treated as belonging to two different trips. This gives us a lot of information with which to study the driving behavior of each driver. Further, under the assumption that driving behavior is unique to each person, we can identify the driver just by looking at the way he or she drives. We note that we assume each driver’s driving behavior is independent of the car’s make, model, and condition, just as the kind of pen used is ignored when people recognize a signature. The same data collection procedure was performed for 18 different cars. We know from the data provider that some of the cars are driven by a single driver, while others are driven by multiple people in a family. Table 1 lists the car models, the corresponding car numbers used in the study, and the number of drivers.
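The trip-splitting rule described above (a gap of more than 60 seconds between consecutive samples starts a new trip) can be sketched as follows. This is only a minimal illustration: the function name `split_into_trips`, the assumption that timestamps are in seconds and already sorted, and the example values are ours and are not taken from the MetroMile files.

```python
from typing import List, Sequence


def split_into_trips(timestamps: Sequence[float],
                     max_gap_s: float = 60.0) -> List[List[int]]:
    """Group sample indices into trips.

    A new trip starts whenever the gap to the previous sample is
    strictly greater than max_gap_s. Timestamps are assumed to be in
    seconds and sorted in increasing order.
    """
    trips: List[List[int]] = []
    for i, t in enumerate(timestamps):
        if i == 0 or t - timestamps[i - 1] > max_gap_s:
            trips.append([])  # start a new trip
        trips[-1].append(i)
    return trips


# Hypothetical timestamps (seconds), for illustration only.
example = [0, 1, 2, 4, 70, 71, 73, 200]
trips = split_into_trips(example)
print(len(trips), trips)
```

Returning index lists rather than copies of the rows keeps the sketch independent of how the csv columns are stored; each index list can then be used to slice the speed, heading, and acceleration columns for that trip.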

[Figure 1 plot, three panels sharing a horizontal axis from 0 to 800: heading (degrees); speed (mph) and gas (mpg); accel x, accel y, and accel z (g).]

Figure 1: The information of the first trip/section that has been collected for car #133000249.

3 Feature selection

Extracting the key features from the large amount of data that we have obtained is one of the key questions of this study. We treat each trip as one data point, so that we can extract a vector x containing all the useful features to represent this data point. For car #249, for example, this yields 2281 data points. This provides a large enough data set for either regression in the single-driver cases or multi-class classification in the multiple-driver cases.

Finding good and useful features turns out to be a hard problem, especially considering the complexity of the collected data and of the problem itself. The features that we propose are:

(1) average speed in each section, $x_1$
(2) max speed in each section, $x_2$
(3) average speed on the ramp when entering the highway, $x_3$
(4) average speed on the ramp when leaving the highway, $x_4$
(5) frequency of lane changing, $x_5$
(6) speed at 1 second before stop, $x_6$
(7) speed at 2 seconds before stop, $x_7$
(8) speed at 3 seconds before stop, $x_8$
(9) speed at 1 second after start, $x_9$
(10) speed at 2 seconds after start, $x_{10}$
(11) speed at 3 seconds after start, $x_{11}$

The feature vector is $x = [x_1, x_2, x_3, x_4, x_5, x_6, x_7, x_8, x_9, x_{10}, x_{11}]^T$. We find that with these features we neither oversimplify the model nor make it so complicated that it overfits.

Table 1: The list of car models, with the car number and the number of drivers, that have been used for the data collection.

| Car model and make | Car number | Driver(s) |
|---|---|---|
| 2005 Volkswagen GTI 2-door Hatchback, 4-cylinder | 133000249 | 2 |
| 2004 Honda Pilot 6-cylinder, 4WD | 133000250 | 2 |
| 2012 Toyota Prius v 4-door Wagon, 4-cylinder | 133000251 | 2 |
| 2003 Toyota Corolla 4-door Sedan, 4-cylinder | 133000252 | 1 |
| 2011 Infiniti G37 4-door Sedan, 6-cylinder | 133000253 | Family of 3 drivers |
| 2011 Mercedes-Benz GL450 8-cylinder, 4WD | 133000254 | same as 254 |
| 2008 Subaru Outback 4-door Wagon, 4-cylinder | 133000257 | Family of 3 drivers |
| 2003 Honda Accord 4-door Sedan, 4-cylinder | 133000258 | same as 257 |
| 2005 Toyota Camry 4-door Sedan, 4-cylinder | 133000259 | 1 |
| 2012 Subaru Impreza 4-door Wagon, 4-cylinder | 133000261 | 1 |
| 2011 Volkswagen Jetta 4-door Sedan, 5-cylinder | 133000263 | 1 |
| 2011 Nissan Versa 4-door Hatchback, 4-cylinder | 133000265 | 2 |
| 2007 Acura MDX 6-cylinder, 4WD | 133000284 | 1 |
| 2000 Toyota Camry 4-door Sedan, 4-cylinder | 133000374 | 1 |
| 2007 BMW 335 4-door Sedan, 6-cylinder | 133000381 | 2 |
| 2001 BMW X5 8-cylinder, 4WD | 133000386 | 1 |
| 2006 Honda Civic 2-door Coupe, 4-cylinder | 133000485 | 1 |
| 2003 Cadillac CTS 4-door Sedan, 6-cylinder | 133000623 | 1 |
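Some of the per-trip features proposed in Section 3 can be computed from the speed trace alone. The sketch below covers only x1, x2, and x6 through x11; the ramp and lane-changing features (x3 to x5) need heading or road information and are left out. It rests on several assumptions of ours that the text does not state: one speed sample per second, a "stop" defined as the first zero-speed sample after motion, a "start" defined as the last zero-speed sample before motion, and averaging over all stops and starts within the trip. The function name `trip_speed_features` is hypothetical.

```python
from statistics import mean
from typing import Dict, List, Sequence


def trip_speed_features(speed_mph: Sequence[float]) -> Dict[str, float]:
    """Compute x1, x2 and x6..x11 for one non-empty trip (1 Hz assumed)."""
    n = len(speed_mph)
    # Stop: a zero-speed sample preceded by a moving sample.
    stops = [i for i in range(1, n)
             if speed_mph[i] == 0 and speed_mph[i - 1] > 0]
    # Start: a zero-speed sample followed by a moving sample.
    starts = [i for i in range(n - 1)
              if speed_mph[i] == 0 and speed_mph[i + 1] > 0]

    def mean_at_offset(events: List[int], offset: int) -> float:
        # Average speed `offset` samples away from each event,
        # skipping events too close to the ends of the trip.
        values = [speed_mph[i + offset] for i in events
                  if 0 <= i + offset < n]
        return mean(values) if values else float("nan")

    features = {"x1": mean(speed_mph), "x2": max(speed_mph)}
    for k in (1, 2, 3):
        features[f"x{5 + k}"] = mean_at_offset(stops, -k)   # x6, x7, x8
        features[f"x{8 + k}"] = mean_at_offset(starts, k)   # x9, x10, x11
    return features


# Hypothetical speed trace in mph, for illustration only.
f = trip_speed_features([0, 5, 12, 20, 20, 14, 8, 3, 0, 0, 4, 10])
print(f)
```

Trips without any stop or start yield NaN for the corresponding features, so such trips would need to be filtered or imputed before fitting a model.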

4 Supervised learning

We performed supervised learning for the cars with a single driver. The internal relations among the features can be modeled as:

$$
\begin{aligned}
x_2 &\sim \mathcal{N}(a_1 x_1^2 + a_2 x_1 + a_3,\; a_4 x_1 + a_5), \\
x_3 &\sim \mathcal{N}(a_6,\; a_7), \\
x_4 &\sim \mathcal{N}(a_8,\; a_9), \\
x_5 &\sim \mathcal{N}(a_{10},\; a_{11}), \\
x_6 &\sim \mathcal{N}(a_{12} x_7 + a_{13},\; a_{14} x_7 + a_{15}), \\
x_7 &\sim \mathcal{N}(a_{16} x_8 + a_{17},\; a_{18} x_8 + a_{19}), \\
x_9 &\sim \mathcal{N}(a_{20} x_{10} + a_{21},\; a_{22} x_{10} + a_{23}), \\
x_{10} &\sim \mathcal{N}(a_{24} x_{11} + a_{25},\; a_{26} x_{11} + a_{27}).
\end{aligned}
$$

For the cars with a single driver, we fit the features with the model described above and find the parameter vector $a = [a_1, a_2, \dots, a_{27}]^T$. The parameter vector $a$ can be used as the identifier for the driver. A model fit of the features for car #133000259 is shown in Figure 2.
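To make the linear relations in the model concrete, the sketch below estimates the four parameters of one relation of the form y ~ N(a·x + b, c·x + d), such as x6 given x7. The text does not say how the parameters were fitted, so the two-stage procedure here is purely our assumption: the mean is fitted by ordinary least squares, and the second argument of N, which we take to be the variance, is fitted by regressing the squared residuals on x. The function names are hypothetical.

```python
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx  # requires at least two distinct x values
    return slope, my - slope * mx


def fit_linear_gaussian(x: Sequence[float],
                        y: Sequence[float]) -> Tuple[float, float, float, float]:
    """Estimate (a, b, c, d) in y ~ N(a*x + b, c*x + d).

    Assumed two-stage procedure, not the authors' stated method:
    1. fit the mean a*x + b by least squares;
    2. fit the variance c*x + d by least squares on squared residuals.
    """
    a, b = fit_line(x, y)
    sq_resid = [(yi - (a * xi + b)) ** 2 for xi, yi in zip(x, y)]
    c, d = fit_line(x, sq_resid)
    return a, b, c, d


# Hypothetical data whose spread grows with x, for illustration only.
xs = [1.0, 1.0, 3.0, 3.0]
ys = [1.0, 3.0, 3.0, 9.0]
a, b, c, d = fit_linear_gaussian(xs, ys)
print(a, b, c, d)
```

A fitted variance line can go negative for some x, as the intercept does in this toy data; a joint maximum-likelihood fit with a positivity constraint would be a more careful alternative.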

5 Unsupervised learning

For cars with multiple drivers, we use the k-means clustering algorithm to separate the different drivers. For example, for car #133000249, shown in Figure 3, the frequency of lane changing (highlighted by the dotted circles) forms two clusters that can be used directly to separate the two drivers. This approach is very useful for separating drivers whose lane-changing frequencies differ markedly, but it may not work well when different drivers have similar lane-changing frequencies.
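The clustering step can be illustrated with a small k-means (Lloyd's algorithm) on a single feature, here the lane-changing frequency. This is a sketch under our own assumptions rather than the setup used in the study: clustering is done on one scalar feature only, centroids are initialized evenly between the minimum and maximum values, and the example frequencies are invented.

```python
from typing import List, Sequence, Tuple


def kmeans_1d(values: Sequence[float], k: int = 2,
              iters: int = 100) -> Tuple[List[float], List[int]]:
    """Lloyd's algorithm on scalar values; requires k >= 2."""
    lo, hi = min(values), max(values)
    # Deterministic initialization: evenly spaced between min and max.
    centroids = [lo + (hi - lo) * j / (k - 1) for j in range(k)]
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: nearest centroid for each value.
        new_labels = [min(range(k), key=lambda j: abs(v - centroids[j]))
                      for v in values]
        if new_labels == labels:
            break  # assignments stable, so the algorithm has converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members;
        # an empty cluster keeps its previous centroid.
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            if members:
                centroids[j] = sum(members) / len(members)
    return centroids, labels


# Hypothetical lane-changing frequencies per trip, for illustration only.
freqs = [1.0, 1.5, 2.0, 8.0, 9.0, 10.0]
centroids, labels = kmeans_1d(freqs, k=2)
print(centroids, labels)
```

When the two drivers' frequencies overlap, the two centroids end up close together and the labels become unreliable, which matches the limitation noted above.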


[Figure 2 plot, four panels: max speed (mph) vs. average speed (mph); average (accel) speed on ramp (mph), average (deaccel) speed on ramp (mph), and frequency of lane changing (#/5000 s); speed last second vs. speed this second, at 1 second and 2 seconds before stop; speed next second vs. speed this second, at 1 second and 2 seconds after stop.]

Figure 2: Features and the model parameters for car #133000259 (1 driver).

[Figure 3 plot, four panels: max speed (mph) vs. average speed (mph); average (accel) speed on ramp (mph), average (deaccel) speed on ramp (mph), and frequency of lane changing (#/5000 s); speed last second vs. speed this second, at 1 second and 2 seconds before stop; speed next second vs. speed this second, at 1 second and 2 seconds after stop.]

Figure 3: Features and the model parameters for car #133000249 (2 drivers).
