GPS and Travel Behaviour

GPS Data Collection

Harry Timmermans, Eindhoven University of Technology

04/12/2023

The Survey Method

• Conventional survey methods for activity-travel diary data

• Application of new data collection methods
  – GPS logger (original traces)
  – User participation
      • Socio-demographic information
      • Personal profile
      • Downloading and uploading data
      • Validating activity-trip agendas
  – Web-based prompt recall survey
      • Embedded in TraceAnnotator

The Prompt Recall

Validation of Activities/Trips

Survey Management

• Time horizon
  – 4 waves, each lasting 3 months
  – Each individual is invited to participate for 3 consecutive months

• Location
  – Rijnmond and Eindhoven regions

• Respondents
  – People living in the area
  – Companies recruit their own panels

• The statistics that follow use data from the Rijnmond region as an example

User Participation (# of days)

Figure: User participation, Rijnmond area. Percentage of participants by number of days: 0–7 days 19%; 8–14 days 6%; 15–31 days 11%; 32–60 days 5%; more than 60 days 59%.

• 300 of 434 respondents are fully or partly involved in the survey
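The day distribution in the chart can be reproduced from per-participant day counts. The sketch below assumes a hypothetical input list with one day count per participating respondent; the function and constant names are illustrative, not part of the survey software. With the figures above, 300 participants out of 434 invited respondents gives a participation rate of about 69.1%.

```python
from collections import Counter

# Day categories used in the chart: 0-7, 8-14, 15-31, 32-60 and more than 60 days.
DAY_BINS = [(0, 7, "0-7"), (8, 14, "8-14"), (15, 31, "15-31"), (32, 60, "32-60")]
DAY_LABELS = ["0-7", "8-14", "15-31", "32-60", ">60"]


def day_category(days: int) -> str:
    """Map a participant's number of survey days to its chart category."""
    for low, high, label in DAY_BINS:
        if low <= days <= high:
            return label
    return ">60"


def participation_summary(days_per_participant: list[int],
                          invited: int) -> tuple[float, dict[str, float]]:
    """Return the participation rate (%) and the percentage of participants per day category.

    `days_per_participant` is a hypothetical input: one day count per participating respondent.
    """
    counts = Counter(day_category(d) for d in days_per_participant)
    n = len(days_per_participant)
    shares = {label: 100 * counts[label] / n for label in DAY_LABELS}
    return 100 * n / invited, shares
```

For the Rijnmond figures, `round(100 * 300 / 434, 1)` evaluates to 69.1.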

Figure: Age of respondents. Percentage by age group: up to 16 years 0%; 17–30 years 10%; 31–55 years 41%; 56–65 years 26%; 66 years and older 23%.

The percentage of respondents who are older than 55 is almost 50%.

No children

Frequency of Activities/Trips

Missing days

The high frequency is caused by short events, which need to be filtered out.
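A minimal sketch of such a filter, assuming each episode is a (start, end, type) record in minutes and using a hypothetical 5-minute threshold: episodes below the threshold are dropped and the remaining neighbours of the same type are merged. The `Episode` record and the threshold are illustrative assumptions, not the rules used in the study.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    start_min: float  # start time, minutes since midnight
    end_min: float    # end time, minutes since midnight
    kind: str         # activity type or transport mode label

    @property
    def duration(self) -> float:
        return self.end_min - self.start_min


def filter_short_events(episodes: list[Episode], min_duration: float = 5.0) -> list[Episode]:
    """Drop episodes shorter than `min_duration` and merge neighbours of the same kind.

    The default of 5 minutes is a hypothetical threshold, not a value from the survey.
    """
    kept = [e for e in episodes if e.duration >= min_duration]
    merged: list[Episode] = []
    for e in kept:
        if merged and merged[-1].kind == e.kind:
            # Consecutive episodes of the same kind become one longer episode.
            merged[-1] = Episode(merged[-1].start_min, e.end_min, e.kind)
        else:
            merged.append(e)
    return merged
```

For example, a 2-minute walking blip between two work episodes is removed and the two work episodes collapse into one.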

Single activity

Frequency of Activity Type

Ave. Activity Duration by Type

Frequency of Transport Mode

Many short walking trips

Approach

• Classification of transport modes and activity episodes
  – Bayesian Belief Network (BBN)

• Replaces ad hoc rules

• A graphical representation of probabilistic causal information incorporating sets of conditional probability tables

• Represents the interrelationships between spatial and temporal factors (input) and the activity-travel pattern (output), i.e. transportation modes and activity episodes

• Learning-based: accuracy improves as consistent evidence is obtained over time from more samples
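To make the BBN idea concrete, the sketch below performs exact inference in a deliberately tiny network: a personal-data node (car availability) is the parent of the mode node, and two evidence nodes, a GPS speed class and a geographical "near a rail line" flag, are children of the mode. The network structure, node names and every probability are hypothetical placeholders, not the network or tables estimated in this study.

```python
# Hypothetical conditional probability tables, for illustration only.
MODES = ["walking", "cycling", "car", "train"]

# P(mode | car_available): parent node taken from personal data.
P_MODE_GIVEN_CAR = {
    True:  {"walking": 0.25, "cycling": 0.20, "car": 0.45, "train": 0.10},
    False: {"walking": 0.35, "cycling": 0.35, "car": 0.05, "train": 0.25},
}

# P(speed_class | mode): child node derived from GPS data.
P_SPEED_GIVEN_MODE = {
    "walking": {"low": 0.90, "medium": 0.09, "high": 0.01},
    "cycling": {"low": 0.30, "medium": 0.65, "high": 0.05},
    "car":     {"low": 0.15, "medium": 0.35, "high": 0.50},
    "train":   {"low": 0.10, "medium": 0.20, "high": 0.70},
}

# P(near_rail = True | mode): child node derived from geographical data.
P_NEAR_RAIL_GIVEN_MODE = {"walking": 0.10, "cycling": 0.10, "car": 0.05, "train": 0.95}


def posterior_mode(car_available: bool, speed_class: str, near_rail: bool) -> dict[str, float]:
    """P(mode | evidence), by enumeration over the single hidden node 'mode'."""
    scores = {}
    for m in MODES:
        p_near = P_NEAR_RAIL_GIVEN_MODE[m]
        scores[m] = (P_MODE_GIVEN_CAR[car_available][m]
                     * P_SPEED_GIVEN_MODE[m][speed_class]
                     * (p_near if near_rail else 1.0 - p_near))
    total = sum(scores.values())
    return {m: s / total for m, s in scores.items()}
```

With only one unobserved node, inference reduces to multiplying the relevant table entries and normalising; larger networks with several hidden nodes would need a general inference algorithm.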

Framework

Feng & Timmermans

Figure: framework. Personal data, GPS data and geographical data feed the conditional probabilities, which yield the outputs: transportation mode and activity episode.
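The conditional probabilities in the framework have to be estimated from validated data. One simple way to do this, and to let accuracy improve as more samples arrive, is to keep counts per parent value and re-normalise with additive (Laplace) smoothing. The class `CountCPT` and the smoothing constant are illustrative assumptions, not the estimation procedure used in the study.

```python
from collections import defaultdict


class CountCPT:
    """Conditional probability table P(child | parent) estimated from counts.

    Add-alpha (Laplace) smoothing keeps unseen combinations above zero;
    alpha = 1 is an illustrative default, not a value from the study.
    """

    def __init__(self, child_values: list[str], alpha: float = 1.0):
        self.child_values = list(child_values)
        self.alpha = alpha
        self.counts = defaultdict(lambda: defaultdict(float))

    def update(self, parent: str, child: str) -> None:
        """Add one validated observation, e.g. an epoch confirmed in the prompted recall."""
        if child not in self.child_values:
            raise ValueError(f"unknown child value: {child}")
        self.counts[parent][child] += 1

    def prob(self, child: str, parent: str) -> float:
        """Smoothed estimate of P(child | parent)."""
        row = self.counts[parent]
        total = sum(row.values())
        k = len(self.child_values)
        return (row[child] + self.alpha) / (total + self.alpha * k)
```

For instance, after eight "high" and one "low" speed observation for train, P(high | train) = (8 + 1) / (9 + 3) = 0.75; each further validated epoch shifts the estimate towards the observed frequencies.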

Theoretical support and applications

• Accuracy of the algorithm
  – Limited sample and limited set of transportation modes
  – Full sample and full set of transportation modes

• Comparison of different imputation algorithms
• Improving the imputed activity/trip sequence
• Map matching between GPS traces and road networks
• Equity impacts of travel time uncertainty

Accuracy of the Algorithm

Source: Moiseeva, A., et al. (2010) Semi-Automatic Imputation of Activity-Travel Diaries Using GPS Traces, Prompted Recall and Context-Sensitive Learning Algorithms. Transportation Research Record, 2183.

Accuracy of the Algorithm

           | Activity | Walking | Running | Cycling | Bus | Motorcycle | Car | Train | Metro | Tram | Light rail
Activity   | 84% | 4% | 0% | 0% | 0% | 0% | 1% | 9% | 2% | 0% | 0%
Walking    | 2% | 97% | 0% | 0% | 1% | 0% | 0% | 0% | 0% | 0% | 0%
Running    | 0% | 0% | 98% | 0% | 1% | 0% | 1% | 0% | 0% | 0% | 0%
Cycling    | 0% | 0% | 0% | 100% | 0% | 0% | 0% | 0% | 0% | 0% | 0%
Bus        | 1% | 0% | 0% | 0% | 87% | 0% | 0% | 0% | 0% | 12% | 0%
Motorcycle | 0% | 0% | 0% | 0% | 0% | 100% | 0% | 0% | 0% | 0% | 0%
Car        | 0% | 0% | 0% | 0% | 1% | 0% | 98% | 0% | 0% | 0% | 1%
Train      | 0% | 0% | 0% | 0% | 0% | 0% | 5% | 58% | 36% | 0% | 0%
Metro      | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 1% | 98% | 0% | 0%
Tram       | 0% | 0% | 0% | 0% | 0% | 0% | 2% | 0% | 0% | 98% | 0%
Light rail | 0% | 0% | 0% | 0% | 2% | 0% | 0% | 0% | 0% | 0% | 98%

GPS only: Activity 84%, Walking 97%, Running 98%, Cycling 100%, Bus 87%, Motorcycle 100%, Car 98%, Train 58%, Metro 98%, Tram 98%, Light rail 98%
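The "GPS only" figures are the diagonal of the confusion matrix. The sketch below recomputes them from the matrix rows as transcribed above and reports the most frequent confusion per class. `hit_ratios` and `main_confusion` are hypothetical helper names, and treating each row as one reference class is an assumption based on the diagonal matching the "GPS only" list. Because the Train row sums to 99% after rounding, the row-normalised value (58/99 ≈ 0.586) differs slightly from the reported 58%.

```python
MODE_LABELS = ["Activity", "Walking", "Running", "Cycling", "Bus", "Motorcycle",
               "Car", "Train", "Metro", "Tram", "Light rail"]

# Percentages transcribed from the table above, one row per class.
MATRIX = [
    [84, 4, 0, 0, 0, 0, 1, 9, 2, 0, 0],
    [2, 97, 0, 0, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 98, 0, 1, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 100, 0, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 87, 0, 0, 0, 0, 12, 0],
    [0, 0, 0, 0, 0, 100, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0, 98, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 5, 58, 36, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, 1, 98, 0, 0],
    [0, 0, 0, 0, 0, 0, 2, 0, 0, 98, 0],
    [0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 98],
]


def hit_ratios(labels: list[str], matrix: list[list[int]]) -> dict[str, float]:
    """Diagonal share of each row: the fraction of that class recovered correctly."""
    return {lab: row[i] / sum(row) for i, (lab, row) in enumerate(zip(labels, matrix))}


def main_confusion(labels: list[str], matrix: list[list[int]], label: str) -> tuple[str, int]:
    """Largest off-diagonal entry in the row of `label`, as (other class, value)."""
    i = labels.index(label)
    off_diagonal = [(v, labels[j]) for j, v in enumerate(matrix[i]) if j != i]
    value, other = max(off_diagonal)
    return other, value
```

Applied to the Train row, `main_confusion` points to Metro (36%), which accounts for most of the Train misclassifications; for Bus it points to Tram (12%).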

Source: Feng, T. and Timmermans, H. (2012) Recognition of transportation mode using GPS and accelerometer data. International Conference of IATBR, Toronto, Canada, 15–20 July 2012.

Comparison of Imputation Algorithms

Id | Algorithm
1 | Bayesian Network (BN)
2 | Naive Bayesian (NB)
3 | Logistic regression (LR)
4 | Multilayer Perceptron (MP)
5 | Decision Table (DT)
6 | Support Vector Machine (SVM)
7 | C4.5 (C45)
8 | CART (CART)

Algorithm | Training CCI (%) | Training ICI (%) | Training Kappa | Test CCI (%) | Test ICI (%) | Test Kappa
BN  | 99.805 | 0.195 | 0.997 | 99.474 | 0.526 | 0.993
NB  | 86.966 | 13.034 | 0.822 | 86.648 | 13.352 | 0.818
LR  | 94.865 | 5.135 | 0.926 | 94.510 | 5.490 | 0.921
MP  | 97.118 | 2.882 | 0.958 | 96.816 | 3.184 | 0.954
DT  | 98.886 | 1.114 | 0.984 | 98.100 | 1.900 | 0.973
SVM | 94.667 | 5.333 | 0.923 | 94.458 | 5.542 | 0.920
C45 | 99.825 | 0.175 | 0.998 | 99.309 | 0.691 | 0.990

Table 3 Prediction accuracy and model performance

• Training data and test data

• We use the indicators of correctly classified instances (CCI), incorrectly classified instances (ICI) and the Kappa value (Kappa).

• Data are evaluated per time epoch.

- WCTRS 2013

              | Count  | Percentage
Training data | 39,942 | 75%
Test data     | 13,316 | 25%
Total         | 53,258 | 100%

Training and test datasets
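CCI, ICI and Kappa can all be derived from a confusion matrix of counts. The sketch below assumes Kappa is Cohen's kappa (observed agreement corrected for chance agreement), the usual definition for this indicator; `classification_scores` is a hypothetical helper, and the toy matrix in the usage note is made up, not survey data.

```python
def classification_scores(matrix: list[list[int]]) -> tuple[float, float, float]:
    """Return CCI (%), ICI (%) and Cohen's kappa for a square confusion matrix of counts."""
    n = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    p_observed = correct / n  # share of correctly classified instances
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    # Agreement expected by chance if rows and columns were independent.
    p_chance = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    kappa = (p_observed - p_chance) / (1 - p_chance)
    return 100 * p_observed, 100 * (1 - p_observed), kappa
```

For example, `classification_scores([[45, 5], [10, 40]])` gives CCI = 85%, ICI = 15% and kappa = 0.7 (up to floating-point rounding). The split in the table is consistent with 75/25: 39,942 / 53,258 ≈ 0.750.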

Comparison of Imputation Algorithms

Table 4 Hit ratios by transportation mode and activity episode

Note: A = Activity episode; B = Train; C = Walking; D = Bike; E = Car; F = Bus; G = Motorbike; H = Running; I = Tram; J = Metro

• BN and C45 appear to perform more stably than the other algorithms.

• Except for BN and C45, the hit ratios for the test data are not necessarily lower than those for the training data.

• The hit ratios of the BN model are comparable with those of the other methods.

Training data | A | B | C | D | E | F | G | H | I | J
BN  | 0.997 | 0.997 | 0.999 | 1 | 0.999 | 0.999 | 1 | 0.999 | 1 | 1
NB  | 0.848 | 0.969 | 0.934 | 0.799 | 0.836 | 0.926 | 0.949 | 0.98 | 1 | 0.983
LR  | 0.989 | 0.991 | 0.818 | 0.928 | 0.891 | 0.758 | 0.947 | 0.76 | 1 | 1
MP  | 0.998 | 0.974 | 0.916 | 0.926 | 0.965 | 0.743 | 0.989 | 0.985 | 1 | 1
DT  | 0.999 | 0.971 | 0.958 | 0.985 | 0.979 | 0.99 | 0.991 | 0.974 | 0.982 | 0.98
SVM | 0.987 | 0.999 | 0.76 | 0.925 | 0.876 | 0.888 | 0.971 | 0.654 | 1 | 1
C45 | 1 | 0.999 | 0.993 | 0.997 | 0.997 | 0.994 | 0.998 | 0.999 | 0.996 | 0.99

Test data | A | B | C | D | E | F | G | H | I | J
BN  | 0.996 | 0.993 | 0.988 | 0.997 | 0.994 | 0.977 | 0.999 | 1 | 1 | 0.983
NB  | 0.849 | 0.964 | 0.942 | 0.789 | 0.826 | 0.9 | 0.946 | 0.963 | 1 | 0.975
LR  | 0.99 | 0.994 | 0.815 | 0.915 | 0.882 | 0.733 | 0.935 | 0.752 | 1 | 1
MP  | 0.998 | 0.976 | 0.896 | 0.926 | 0.962 | 0.708 | 0.987 | 0.974 | 1 | 1
DT  | 0.998 | 0.948 | 0.939 | 0.973 | 0.97 | 0.973 | 0.982 | 0.963 | 0.892 | 0.959
SVM | 0.987 | 0.998 | 0.763 | 0.931 | 0.869 | 0.844 | 0.968 | 0.641 | 0.985 | 1
C45 | 0.998 | 0.998 | 0.974 | 0.992 | 0.987 | 0.98 | 0.991 | 0.956 | 1 | 0.992
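One way to quantify how stable each algorithm is, assuming stability means a small gap between training and test hit ratios, is to compute each algorithm's largest drop from training to test across the ten categories. The sketch below does this on the values of Table 4; the stability measure and the `stability` function are our illustrative choice, not part of the original comparison.

```python
CATEGORIES = "ABCDEFGHIJ"

# Hit ratios transcribed from Table 4 (categories A to J).
TRAIN = {
    "BN":  [0.997, 0.997, 0.999, 1, 0.999, 0.999, 1, 0.999, 1, 1],
    "NB":  [0.848, 0.969, 0.934, 0.799, 0.836, 0.926, 0.949, 0.98, 1, 0.983],
    "LR":  [0.989, 0.991, 0.818, 0.928, 0.891, 0.758, 0.947, 0.76, 1, 1],
    "MP":  [0.998, 0.974, 0.916, 0.926, 0.965, 0.743, 0.989, 0.985, 1, 1],
    "DT":  [0.999, 0.971, 0.958, 0.985, 0.979, 0.99, 0.991, 0.974, 0.982, 0.98],
    "SVM": [0.987, 0.999, 0.76, 0.925, 0.876, 0.888, 0.971, 0.654, 1, 1],
    "C45": [1, 0.999, 0.993, 0.997, 0.997, 0.994, 0.998, 0.999, 0.996, 0.99],
}
TEST = {
    "BN":  [0.996, 0.993, 0.988, 0.997, 0.994, 0.977, 0.999, 1, 1, 0.983],
    "NB":  [0.849, 0.964, 0.942, 0.789, 0.826, 0.9, 0.946, 0.963, 1, 0.975],
    "LR":  [0.99, 0.994, 0.815, 0.915, 0.882, 0.733, 0.935, 0.752, 1, 1],
    "MP":  [0.998, 0.976, 0.896, 0.926, 0.962, 0.708, 0.987, 0.974, 1, 1],
    "DT":  [0.998, 0.948, 0.939, 0.973, 0.97, 0.973, 0.982, 0.963, 0.892, 0.959],
    "SVM": [0.987, 0.998, 0.763, 0.931, 0.869, 0.844, 0.968, 0.641, 0.985, 1],
    "C45": [0.998, 0.998, 0.974, 0.992, 0.987, 0.98, 0.991, 0.956, 1, 0.992],
}


def stability(train: dict[str, list[float]],
              test: dict[str, list[float]]) -> dict[str, tuple[float, float, str, float]]:
    """Per algorithm: mean train hit ratio, mean test hit ratio, and the largest train-to-test drop."""
    out = {}
    for alg in train:
        drops = [tr - te for tr, te in zip(train[alg], test[alg])]
        worst = max(range(len(drops)), key=drops.__getitem__)
        out[alg] = (sum(train[alg]) / len(train[alg]),
                    sum(test[alg]) / len(test[alg]),
                    CATEGORIES[worst],
                    drops[worst])
    return out
```

On these values BN has the smallest worst-case drop (0.022, category F), while DT has the largest (0.090, category I).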

Superimposing the activity/trip sequence

Figure: example activity/trip sequence. Locations L1 = L4 (Home), L2 (Work), L3 (Shop), Sport and L5 (Restaurant) are connected by Trips 1 to 6.

• Method 1

  – For each trip episode separately, the frequency of the transportation mode with the highest probability is determined. The transportation mode with the highest frequency across all trips is then selected.

• Method 2
  – The frequencies of all transportation modes over all trip episodes belonging to the same tour are pooled. The mode with the highest frequency and highest probabilities is then selected and replaces the others.

• Method 3
  – For tours with three or more trips, the transportation mode is identified with Method 1 for all trips except the first and the last. The confirmed mode then replaces the modes of the first and last trips.
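The three rules can be sketched as follows, assuming each trip is a list of GPS epochs and each epoch carries a probability per mode (for example a BBN posterior). The data layout, the function names and the tie-breaking in Method 2 (summed probability) are hypothetical. Method 1 is read here as a majority vote over trip-level modes, and in Method 3 the inner trips keep their own trip-level modes; both are one reading of the description, not confirmed details.

```python
from collections import Counter

Epoch = dict[str, float]  # mode -> probability for one GPS epoch (hypothetical layout)
Trip = list[Epoch]


def epoch_modes(trip: Trip) -> list[str]:
    """Most probable mode of each epoch in a trip."""
    return [max(e, key=e.get) for e in trip]


def trip_mode(trip: Trip) -> str:
    """Most frequent epoch-level mode within one trip."""
    return Counter(epoch_modes(trip)).most_common(1)[0][0]


def method1(tour: list[Trip]) -> list[str]:
    """Majority vote over trip-level modes; the winner is assigned to every trip."""
    winner = Counter(trip_mode(t) for t in tour).most_common(1)[0][0]
    return [winner] * len(tour)


def method2(tour: list[Trip]) -> list[str]:
    """Pool epoch-level frequencies over the whole tour; ties broken by summed probability."""
    freq, prob = Counter(), Counter()
    for trip in tour:
        for e in trip:
            m = max(e, key=e.get)
            freq[m] += 1
            prob[m] += e[m]
    winner = max(freq, key=lambda m: (freq[m], prob[m]))
    return [winner] * len(tour)


def method3(tour: list[Trip]) -> list[str]:
    """Tours with 3+ trips: Method 1 on the inner trips, result copied to first and last."""
    modes = [trip_mode(t) for t in tour]
    if len(tour) < 3:
        return modes  # rule not applicable; keep per-trip modes
    inner_mode = method1(tour[1:-1])[0]
    modes[0] = modes[-1] = inner_mode
    return modes
```

In a three-trip tour whose first and last trips each have two walking-like epochs and one car-like epoch, and whose middle trip has three car-like epochs, Method 1 returns walking for all three trips, while Methods 2 and 3 return car.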

- NTTS 2013

                 | Morning peak | Evening peak
Original imputed | 60.5% | 71.1%
Method 1         | 65.8% | 76.3%
Method 2         | 76.3% | 65.4%
Method 3         | 63.2% | 68.4%

• Hit ratios of car mode (AM vs. PM)

Column percentages; each column totals 100% within each block.

                 | BIKE | BUS | CAR | METRO | TRAIN | TRAM | WALKING
Original imputed
BIKE    | 4.3% | - | 6.4% | 4.8% | - | 5.6% | 20.9%
BUS     | 4.3% | - | 34.6% | 9.5% | - | - | 21.3%
CAR     | 4.3% | 42.9% | 2.3% | 6.3% | 57.1% | - | 24.4%
METRO   | - | - | 0.5% | 27.0% | - | - | 2.2%
RUNNING | 48.9% | - | 0.3% | - | - | - | 12.5%
TRAIN   | - | 4.8% | 42.7% | 34.9% | 28.6% | - | 17.2%
TRAM    | - | 47.6% | 1.8% | - | - | 79.6% | 0.9%
WALKING | 38.3% | 4.8% | 11.5% | 17.5% | 14.3% | 14.8% | 0.6%

Method 1
BIKE    | 34.0% | - | 2.8% | 4.8% | - | 1.9% | 14.1%
BUS     | 4.3% | 4.8% | 22.6% | 9.5% | - | - | 9.7%
CAR     | - | 28.6% | 26.2% | 11.1% | 85.7% | - | 44.4%
METRO   | - | 4.8% | 0.8% | 23.8% | - | - | 1.9%
RUNNING | 34.0% | - | 0.3% | - | - | - | 5.3%
TRAIN   | - | 9.5% | 28.8% | 33.3% | - | - | 14.4%
TRAM    | - | 38.1% | 1.3% | - | - | 72.2% | 3.4%
WALKING | 27.7% | 14.3% | 17.3% | 17.5% | 14.3% | 25.9% | 6.9%

Method 2
BIKE    | 19.1% | - | 3.1% | 4.8% | - | 1.9% | 15.0%
BUS     | 4.3% | - | 19.6% | 9.5% | - | - | 7.8%
CAR     | 2.1% | 33.3% | 26.7% | 11.1% | 71.4% | - | 44.1%
METRO   | - | - | 0.8% | 20.6% | - | - | 1.6%
RUNNING | 34.0% | - | 0.3% | - | - | - | 6.3%
TRAIN   | - | 9.5% | 31.6% | 36.5% | 14.3% | - | 14.4%
TRAM    | - | 47.6% | 2.0% | - | - | 77.8% | 2.5%
WALKING | 40.4% | 9.5% | 16.0% | 17.5% | 14.3% | 20.4% | 8.4%

Method 3
BIKE    | 17.0% | - | 4.8% | 4.8% | - | 1.9% | 13.8%
BUS     | 4.3% | - | 23.2% | 9.5% | - | - | 14.4%
CAR     | 2.1% | 38.1% | 13.7% | 6.3% | 57.1% | - | 29.7%
METRO   | - | - | 1.3% | 27.0% | - | - | 1.6%
RUNNING | 29.8% | - | 0.3% | - | - | 5.6% | 10.6%
TRAIN   | - | 9.5% | 34.4% | 36.5% | 28.6% | - | 16.3%
TRAM    | - | 38.1% | 0.8% | - | - | 68.5% | 2.5%
WALKING | 46.8% | 14.3% | 21.6% | 15.9% | 14.3% | 24.1% | 11.3%

Total   | 100.0% | 100.0% | 100.0% | 100.0% | 100.0% | 100.0% | 100.0%

• Confusion matrix of original imputed data and new methods

• The confusion matrix shows that the suggested algorithms can substantially improve the accuracy of the imputation.

• As the hit ratios show, all improved methods increase the accuracy for morning-peak trips relative to the originally imputed data.

• Method 1 is better than the other two methods, especially for the prediction of motorized commute trips during peak times.

Feedback from Respondents

• Problems during the survey
  – Problems using BT747
      • Different Windows systems (64-bit systems)
      • Internet browser (Firefox sometimes has problems)
      • Cannot download data (complex causes)
      • Cannot upload data (wrong data file or data format)
  – Problems with the website
      • Small bugs in the website program (since improved)
      • Multiple persons in the same household (user accounts are person-specific)
      • Long processing time (data not cleaned)
  – Missing days
      • Forgotten GPS logger or problematic data (view as a schedule)

Other Issues

• Sufficient number of respondents
• Monitoring and reminding respondents
• Completeness of personal profile data (socio-demographics)
• Post-processing of data

Thanks for your attention.