Improving efficiency and accuracy in data management for naturalistic driving studies

Rusan Chen

Georgetown University

Overview

• Naturalistic driving studies involve complicated, dynamic datasets

• Efficient data management is essential for replicable analysis results

• Based on my experience working on the 40-car Naturalistic Driving Study

Sound familiar?

You have multiple versions of the same file and don’t know which is which.

You cannot find an important file and think you may have deleted it.

There are two versions of the ‘latest’ draft of a paper, both named ‘final.doc’.

An efficient workflow requires proper:

• Organizing

• Documenting

• Automating

• Archiving

Organizing

• \Work and \Post directories are critical

• Once a file is posted, it is never changed!

Example:

C:\40Car

\ADS

\Work

\Post

40carAnalysis.doc
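
One way to enforce the "never changed once posted" rule is to let a small script do the posting. The sketch below is only an illustration in Python, not the tooling used in the study: the function name `post_file` is hypothetical, and it assumes that "posting" means copying a finished file from \Work into \Post, refusing to overwrite an existing posted file, and marking the copy read-only.

```python
import shutil
import stat
from pathlib import Path


def post_file(work_file: Path, post_dir: Path) -> Path:
    """Copy a finished file from Work into Post and make the copy read-only.

    Hypothetical helper: the slides state the rule, not an implementation.
    """
    post_dir.mkdir(parents=True, exist_ok=True)
    target = post_dir / work_file.name
    if target.exists():
        # A posted file is never changed, so refuse to overwrite it.
        raise FileExistsError(f"{target} is already posted; post a new version instead")
    shutil.copy2(work_file, target)  # copy2 also keeps the modification time
    # Clear the write bits so that accidental edits fail.
    target.chmod(stat.S_IREAD | stat.S_IRGRP | stat.S_IROTH)
    return target
```

A revised analysis would then be posted under a new name or in a new folder rather than replacing the earlier one.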

Organizing folders

• \Post \2009

\012710 survey questionnaire analysis

\013110 personality related to risky driving

\031110 predicting C/NC from g-force

\032710 SAS Glimmix

\033010 risky friends interaction

\052110 speeding analysis

\052410 perception of risk as mediators

\060610 high vs low risky drivers
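
The folder names above appear to follow a "date prefix plus short description" pattern. As a sketch, assuming the six-digit prefix is MMDDYY (an assumption, since the slide does not state the format), names like these can be generated and parsed with two hypothetical helpers:

```python
from datetime import date, datetime


def posted_folder_name(posted_on: date, description: str) -> str:
    """Build a folder name such as '012710 survey questionnaire analysis'.

    Assumes the prefix is MMDDYY; the slide does not say so explicitly.
    """
    return f"{posted_on.strftime('%m%d%y')} {description.strip()}"


def posting_date(folder_name: str) -> date:
    """Recover the date from the MMDDYY prefix of a folder name."""
    prefix = folder_name.split()[0]
    return datetime.strptime(prefix, "%m%d%y").date()


# Sorting by parsed date stays correct across years, unlike sorting by name.
folders = ["052110 speeding analysis", "012710 survey questionnaire analysis"]
in_time_order = sorted(folders, key=posting_date)
```

Parsing the prefix matters only when folders from different years sit side by side; within one year, MMDDYY names already sort chronologically.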

Documenting

It is always better to document today than tomorrow

What to document?

• Date

• Purpose

• Data sources

• How to form new composite scores

• Steps of analysis

• Where to save the results

• To whom you sent the results
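
The checklist above can be turned into a fixed template so that no item is forgotten. The following Python sketch is one possible layout, not a format prescribed by the talk; the class name `LogEntry`, its field names, and the idea of writing entries as plain text are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class LogEntry:
    """One research-log entry with a field for each item on the checklist."""
    entry_date: date
    purpose: str
    data_sources: List[str]
    composite_scores: str          # how new composite scores were formed
    analysis_steps: List[str]
    results_location: str          # where the results were saved
    sent_to: List[str] = field(default_factory=list)

    def to_text(self) -> str:
        """Render the entry as plain text that can be appended to a log file."""
        lines = [
            f"Date: {self.entry_date.isoformat()}",
            f"Purpose: {self.purpose}",
            "Data sources: " + ", ".join(self.data_sources),
            f"Composite scores: {self.composite_scores}",
            "Steps of analysis:",
        ]
        lines += [f"  {i}. {step}" for i, step in enumerate(self.analysis_steps, start=1)]
        lines.append(f"Results saved in: {self.results_location}")
        lines.append("Sent to: " + (", ".join(self.sent_to) or "nobody yet"))
        return "\n".join(lines)
```

Because every field except `sent_to` is required, an entry cannot be created while leaving out the purpose, the data sources, or the analysis steps.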

Automating

• Data management involves doing the same task multiple times.

• Automating these tasks can save time and prevent errors

What to automate? (using macros and loops)

• To update, merge, and subset datasets

• To create and label new variables

• To check outliers

• To define and report missing values

• To fit a sequence of similar models

• To save analysis results
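
To illustrate two of the items above (reporting missing values and checking outliers), here is a minimal loop written in Python rather than as a statistical-package macro. It is a sketch under stated assumptions: the function name `summarize`, the use of `None` for missing values, the z-score rule, and the variable names in the usage note are all hypothetical.

```python
import statistics


def summarize(rows, variables, z_cutoff=3.0):
    """For each variable, count missing values and flag rows with extreme z-scores.

    Assumes each row is a dict, missing values are stored as None, and every
    variable has at least two non-missing values.
    """
    report = {}
    for var in variables:  # the same checks run for every variable
        values = [row[var] for row in rows]
        present = [v for v in values if v is not None]
        mean = statistics.mean(present)
        sd = statistics.stdev(present)
        outlier_rows = [
            i for i, v in enumerate(values)
            if v is not None and sd > 0 and abs(v - mean) / sd > z_cutoff
        ]
        report[var] = {
            "n_missing": len(values) - len(present),
            "outlier_rows": outlier_rows,
        }
    return report
```

Calling `summarize(rows, ["speed", "gforce"])` on a list of dictionaries runs identical checks on both variables, so adding a third variable means adding one name to the list instead of copying and editing a block of code.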

Archiving: to protect your files

• Short-term: mirror

• Mid-term: backup

• Long-term: archive
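
The slide does not say how the copies are made. As one possible sketch of the short-term "mirror" step, the Python function below copies a folder and verifies every copied file by checksum. The function names are hypothetical, and this simple version never deletes files from the copy that have since been removed from the source.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file as a hex string."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def mirror(source_dir: Path, mirror_dir: Path) -> dict:
    """Copy source_dir into mirror_dir and verify each copied file.

    Returns a mapping from relative path to checksum, which can be saved
    next to the copy and used later to confirm the files are unchanged.
    """
    shutil.copytree(source_dir, mirror_dir, dirs_exist_ok=True)
    checksums = {}
    for src in sorted(source_dir.rglob("*")):
        if src.is_file():
            rel = src.relative_to(source_dir)
            digest = sha256_of(src)
            if sha256_of(mirror_dir / rel) != digest:
                raise IOError(f"copy of {rel} does not match the original")
            checksums[rel.as_posix()] = digest
    return checksums
```

The same checksum list could be reused for the mid-term backup and the long-term archive to confirm that files remain identical to the posted originals.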

Thank you!

Reference

• Long, J. S. (2009). The Workflow of Data Analysis Using Stata. College Station, TX: Stata Press.