How to Make Cars Smarter: A Step Towards Self-Driving Cars


Kaushik K. Das, Esther Vasiete, Pivotal Data Science

October 2016


Pivotal Data Science Perspectives

Today’s presenters

Kaushik K. Das, Head of Data Science, Pivotal

Esther Vasiete, Data Scientist, Pivotal

Agenda

• What do we mean by “smarter cars”?

• How do we apply data science to build smarter cars?

Example 1: Predictive Maintenance

Example 2: Understanding Driver Behavior Patterns

• Demo

• Next Steps

Autonomous Cars will offer many advantages

Call a car whenever you want to go somewhere – sit and relax – and you are there!

● No stress for you: no need to drive in traffic or maintain a car

● Better utilization of cars, leading to a lower environmental impact

● Fewer accidents and injuries

BUT some issues still need to be solved, e.g. California law requires a driver who is ready to take over in an emergency

We need to get from Manually Driven Cars to Autonomous Cars

Why not go Manually Driven Cars → Smart “Augmented” Cars* → Autonomous Cars?

* Some people refer to smart augmented cars as semi-autonomous vehicles

Augmentation – a situation in which humans and computers combine to create effective and efficient outcomes*

● You get reduced stress and fewer accidents

● Fewer regulatory / legal barriers

● Easier to implement

* Thomas H. Davenport, “Augmentation or Automation?”, WSJ, Feb 25, 2015.

Smart Cars offer many of the advantages of automation

Smart System = Sensors + Digital Brain + Actuators
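A minimal sketch of the sense, decide, act loop behind "Sensors + Digital Brain + Actuators". Everything in it is hypothetical and illustrative: the `SensorReading` fields, the 2-second time-to-collision threshold and the `brake_assist` command are assumptions, not part of any real vehicle system. In keeping with the augmented-car idea, the brain only requests assistance; the driver stays in control.

```python
from dataclasses import dataclass


@dataclass
class SensorReading:
    speed_kmh: float               # hypothetical sensor: vehicle speed
    distance_to_obstacle_m: float  # hypothetical sensor: range to the vehicle ahead


def digital_brain(reading: SensorReading, min_ttc_s: float = 2.0) -> str:
    """Hypothetical rule: request brake assist when time-to-collision drops below min_ttc_s."""
    speed_ms = reading.speed_kmh / 3.6
    if speed_ms <= 0:
        return "none"
    time_to_collision = reading.distance_to_obstacle_m / speed_ms
    return "brake_assist" if time_to_collision < min_ttc_s else "none"


def actuate(command: str, actuator_log: list) -> None:
    """Stand-in actuator: record the command instead of driving real hardware."""
    actuator_log.append(command)


def control_loop(readings, actuator_log: list) -> None:
    # Sense -> decide -> act, once per sensor reading.
    for reading in readings:
        actuate(digital_brain(reading), actuator_log)
```

For example, at 72 km/h (20 m/s) an obstacle 30 m ahead gives a 1.5 s time-to-collision and triggers `brake_assist`, while one 50 m ahead (2.5 s) does not.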

Problem Formulation

Data Step

Modeling Step

Application Step

Data Science For Building Models

Sensors & Data

Data Lake

Big Data Platform

Phase 1: Problem Formulation

Make sure you formulate a problem that is relevant to the goals and pain points of the stakeholders

Phase 2: Data Step

Build the right feature set, making full use of the volume, variety and velocity of all available data

Phase 3: Modeling Step

This is where you move from answering what, where and when to answering why and what if?

Phase 4: Application

Create a framework for integrating the model with decision-making processes and taking action using the Internet of Things

Technology Selection

Select the right platform and the right set of tools for solving the problem at hand

Iterative Approach

Perform each phase in an agile manner, team up with domain experts and SMEs, and iterate as required

Creativity

Take the opportunity to innovate at every phase

Building a Narrative

Create a fact-based narrative that clearly communicates insights to stakeholders

The Eightfold Path of Data Science – four phases and four differentiating factors

KEY LANGUAGES

PLATFORM

KEY TOOLS

MLlib

PL/X

Modeling Tools

Visualization Tools

Platform

Pivotal HDB

Pivotal Greenplum

Spring Cloud Data Flow

Apache Spark

PivotalHDP

Data Science Toolkit

Apache MADlib: Scalable, In-Database Machine Learning

• Open source: https://github.com/apache/incubator-madlib
• Downloads and docs: http://madlib.incubator.apache.org/
• Wiki: https://cwiki.apache.org/confluence/display/MADLIB/

Functions

Linear Systems
• Sparse and Dense Solvers
• Linear Algebra

Matrix Factorization
• Singular Value Decomposition (SVD)
• Low Rank

Generalized Linear Models
• Linear Regression
• Logistic Regression
• Multinomial Logistic Regression
• Ordinal Regression
• Cox Proportional Hazards Regression
• Elastic Net Regularization
• Robust Variance (Huber-White), Clustered Variance, Marginal Effects

Other Machine Learning Algorithms
• Principal Component Analysis (PCA)
• Association Rules (Apriori)
• Topic Modeling (Parallel LDA)
• Decision Trees
• Random Forest
• Support Vector Machines (SVM)
• Conditional Random Field (CRF)
• Clustering (K-means)
• Cross Validation
• Naïve Bayes
• Prediction Metrics

Descriptive Statistics
• Sketch-Based Estimators: CountMin (Cormode-Muthukrishnan), FM (Flajolet-Martin), MFV (Most Frequent Values)
• Correlation and Covariance
• Summary

Utility Modules
• Array and Matrix Operations
• Sparse Vectors
• Random Sampling
• Probability Functions
• Data Preparation
• PMML Export
• Conjugate Gradient
• Stemming
• Sessionization
• Pivot

Inferential Statistics
• Hypothesis Tests

Time Series
• ARIMA

Sept 2016

Path Functions
• Operations on Pattern Matches

Data Science Use-Cases

● Smarter Car
‒ Is the car functioning well?
‒ Do any of the parts need servicing or replacement?
‒ How are the new parts functioning? Are they better than the old parts? How is their performance relative to tests?

● Smarter Driver Response
‒ Understand drivers’ driving patterns and typical routes, and customize for a better driving experience (Advanced Driver Assistance Systems)

● Smarter Response to Surroundings
‒ How do we improve congestion forecasting and optimize routes?
‒ How do we improve traffic management?
‒ How can city planning be improved by using very granular driving and traffic information?

Data categories: Consumer Data, Car Data, External

Sources: Initial Sales, Web/Apps Logs, Demographics, CRM, Surveys, Driving Behavior, Sales & Leasing, Dealership, Service Data, Parts, Manufacturing, Telemetry Data, Weather, Traffic, Economic, Special Events

(Note: not an exhaustive list)

There’s a lot of data available

Example 1 - Smarter Car

Preventive Maintenance for Connected Cars

Diagnostic Trouble Codes (DTC)

Unscheduled repairs

AB1029 – Power steering pump replacement

CT3408 – Wheel alignment

Data Sources for Predictive Maintenance

Vehicle Data

VIN, Timestamp, DTC Code, Odometer, Speed, Acceleration, Engine Temperature, Engine Torque, GPS Coordinates, etc.

Car Repairs Data

VIN, Date vehicle in, Date vehicle out, Repair code, Parts replaced, Warranty claims, Repair Comments, etc.
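A sketch of how the two sources could be linked into training examples, assuming each repair visit is paired with the DTCs its vehicle logged since it last left the shop. The record types and field names (`DtcEvent`, `RepairVisit`, `date_in`, `date_out`) are hypothetical stand-ins for the Vehicle Data and Car Repairs Data columns, and the DTC values used with it are made up; in the talk's setting this join would more likely run as SQL inside the database.

```python
from datetime import datetime
from typing import List, NamedTuple, Tuple


class DtcEvent(NamedTuple):
    vin: str
    timestamp: datetime
    dtc_code: str


class RepairVisit(NamedTuple):
    vin: str
    date_in: datetime
    date_out: datetime
    repair_code: str


def dtcs_before_each_visit(
    events: List[DtcEvent], visits: List[RepairVisit]
) -> List[Tuple[RepairVisit, List[str]]]:
    """Pair every repair visit with the DTCs logged for the same VIN between the
    previous visit's date_out (or the start of history) and this visit's date_in."""
    visits_by_vin = {}
    for visit in sorted(visits, key=lambda v: (v.vin, v.date_in)):
        visits_by_vin.setdefault(visit.vin, []).append(visit)

    events_sorted = sorted(events, key=lambda e: e.timestamp)
    pairs = []
    for vin, vin_visits in visits_by_vin.items():
        window_start = datetime.min
        for visit in vin_visits:
            codes = [
                e.dtc_code
                for e in events_sorted
                if e.vin == vin and window_start <= e.timestamp < visit.date_in
            ]
            pairs.append((visit, codes))
            window_start = visit.date_out  # the next window opens when the car leaves the shop
    return pairs
```

Starting each window at the previous `date_out` is a design choice: codes raised while the car is in the shop are attributed to neither visit.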

Predicting Job Type from Diagnostic Trouble Codes (DTCs)

[Figure: timeline of one vehicle's history. DTCs from categories B, P, C and U are logged between successive repair visits, whose job types include Transmission, Engine and Regular check. For each visit the question is: can the DTCs observed in the preceding interval predict this Job Type?]
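One way to turn that question into a supervised-learning problem, sketched under assumptions: the B, P, C and U labels are presumably the standard OBD-II code families (body, powertrain, chassis, network), and each visit is represented by how many codes of each family appeared before it, with its job type as the label. A real feature set would add mileage, vehicle model and repair history; the function names here are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Tuple

# The first character of a standard OBD-II trouble code names the system:
# B = body, P = powertrain, C = chassis, U = network/communication.
DTC_CATEGORIES = ("B", "P", "C", "U")


def dtc_category_counts(dtc_codes: Iterable[str]) -> List[int]:
    """Count how many DTCs of each category were observed in one interval."""
    counts = Counter(code[0].upper() for code in dtc_codes if code)
    return [counts[cat] for cat in DTC_CATEGORIES]


def build_training_rows(
    visits: Iterable[Tuple[str, Sequence[str]]]
) -> Tuple[List[List[int]], List[str]]:
    """visits: (job_type, dtc_codes_seen_before_the_visit) pairs.
    Returns a feature matrix X and the job-type labels y."""
    X, y = [], []
    for job_type, codes in visits:
        X.append(dtc_category_counts(codes))
        y.append(job_type)
    return X, y
```

Counting families rather than individual codes keeps the feature vector small; individual codes could be one-hot encoded instead when there is enough data.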

Hierarchical Classification Framework

Vehicle Features

Repair codes: DF1210, DF1215, DF2980, AB1029, AB1622, AB1625, AB8622, CT3402, CT3408, CT3560, CT2409

DTC codes + other features (e.g. mileage, vehicle model, previous repairs, ...)

1st stage: N one-vs-rest logistic regression models

2nd stage: N random forest models
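The slide does not spell out how the two stages connect, so this is a minimal sketch under one plausible reading, not the authors' implementation: stage 1 routes a vehicle to a repair category using one-vs-rest logistic regression, and stage 2 picks the specific repair code within that category. Assumptions, all hypothetical: the two-letter prefix of a repair code (DF, AB, CT) is its category, the features are DTC family counts, and a nearest-centroid classifier stands in for the per-category random forest so the example needs only the standard library. In practice the stages could use MADlib's logistic regression and random forest, or scikit-learn's `LogisticRegression` and `RandomForestClassifier`.

```python
import math
from collections import defaultdict


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, epochs=500, lr=0.1):
    """Binary logistic regression (labels 0/1) fitted by batch gradient descent."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def fit_one_vs_rest(X, labels):
    """Stage 1: one binary model per class (that class vs. all the others)."""
    return {c: fit_logistic(X, [int(l == c) for l in labels]) for c in sorted(set(labels))}


def predict_one_vs_rest(models, x):
    scores = {c: sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) for c, (w, b) in models.items()}
    return max(scores, key=scores.get)


def fit_centroids(X, labels):
    """Stage 2 stand-in for a random forest: mean feature vector per repair code."""
    sums, counts = {}, defaultdict(int)
    for x, label in zip(X, labels):
        prev = sums.get(label, [0.0] * len(x))
        sums[label] = [s + v for s, v in zip(prev, x)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def predict_centroid(centroids, x):
    return min(centroids, key=lambda c: sum((a - v) ** 2 for a, v in zip(centroids[c], x)))


def fit_hierarchy(X, codes):
    # Assumption: the two-letter prefix of a repair code (e.g. "AB") is its category.
    categories = [code[:2] for code in codes]
    stage1 = fit_one_vs_rest(X, categories)
    stage2 = {}
    for cat in set(categories):
        idx = [i for i, c in enumerate(categories) if c == cat]
        stage2[cat] = fit_centroids([X[i] for i in idx], [codes[i] for i in idx])
    return stage1, stage2


def predict_hierarchy(model, x):
    stage1, stage2 = model
    category = predict_one_vs_rest(stage1, x)
    return predict_centroid(stage2[category], x)
```

Splitting the problem this way lets each second-stage model specialize in codes that share symptoms, at the cost of errors in stage 1 propagating to stage 2.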

Your car will be repaired before you have a problem!

Example 2 - Smarter Driver Response

Unsupervised driving behavior analysis

Segmentation: From raw sensor data to driving scenes using HMM.

Feature Distribution: Quantization of physical features observed in each scene

Driving topics: Scenes are represented as a combination of driving topics, which explain driving patterns.

Parallelism using:

PL/Python (HMM inference from a pre-trained model)

PL/Python

[T. Bando, K. Takenaka, S. Nagasaka, T. Taniguchi, Unsupervised drive topic finding from driving behavioral data, IEEE Intelligent Vehicles Symposium, 2013]

HMM inference using PL/Python

Note: HMM parameters had been provided to us and loaded into the database.

hmmlearn library installed in every segment!
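Given pre-trained parameters, segmentation amounts to finding the most likely hidden-state path (Viterbi decoding) and cutting the trip wherever the state changes. The pure-Python sketch below assumes discrete emissions over already-quantized symbols, which is a simplification; hmmlearn models such as `GaussianHMM` handle continuous features and expose decoding through `predict`. Inside Greenplum, a PL/Python function wrapping code like this could run per trip on each segment, which is where the parallelism comes from; that wiring is not shown.

```python
import math
from typing import List, Sequence, Tuple


def _log(p: float) -> float:
    return math.log(p) if p > 0 else float("-inf")


def viterbi(obs: Sequence[int], start_p, trans_p, emit_p) -> List[int]:
    """Most likely hidden-state path of a discrete-emission HMM, computed in log space.
    start_p[s], trans_p[r][s] and emit_p[s][symbol] are probabilities."""
    n_states = len(start_p)
    # delta[s]: best log-probability of any state path ending in s at the current step
    delta = [_log(start_p[s]) + _log(emit_p[s][obs[0]]) for s in range(n_states)]
    backpointers = []
    for symbol in obs[1:]:
        step_ptr, step_delta = [], []
        for s in range(n_states):
            best_prev = max(range(n_states), key=lambda r: delta[r] + _log(trans_p[r][s]))
            step_ptr.append(best_prev)
            step_delta.append(delta[best_prev] + _log(trans_p[best_prev][s]) + _log(emit_p[s][symbol]))
        backpointers.append(step_ptr)
        delta = step_delta
    # Trace the best path backwards from the most likely final state.
    state = max(range(n_states), key=lambda s: delta[s])
    path = [state]
    for step_ptr in reversed(backpointers):
        state = step_ptr[state]
        path.append(state)
    return path[::-1]


def to_scenes(path: Sequence[int]) -> List[Tuple[int, int, int]]:
    """Collapse runs of the same hidden state into (state, start, end) scenes; end is exclusive."""
    scenes, start = [], 0
    for i in range(1, len(path) + 1):
        if i == len(path) or path[i] != path[start]:
            scenes.append((path[start], start, i))
            start = i
    return scenes
```

With two "sticky" states (0.9 probability of staying) that each mostly emit their own symbol, the observation sequence `[0, 0, 0, 1, 1, 1, 0, 0]` decodes to three scenes: steps 0 to 2 in state 0, 3 to 5 in state 1, and 6 to 7 back in state 0.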

From time-series driving behavior into natural language

Latent Dirichlet Allocation (LDA)

Document → Scene

Word → Quantized sensor value

[D. Blei, Probabilistic topic models, Communications of the ACM, 2012]
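A minimal sketch of the scene-as-document, quantized-value-as-word mapping: each sensor reading in a scene is binned and emitted as a word such as `speed_2`, and the scene becomes a bag of word counts. The feature names and bin edges are hypothetical. The resulting documents are the kind of input an LDA implementation (for example MADlib's topic modeling module) would fit driving topics on; the LDA step itself is not shown.

```python
from bisect import bisect_right
from collections import Counter
from typing import Dict, List, Sequence


def quantize(value: float, edges: Sequence[float]) -> int:
    """Bin index of value given sorted bin edges (len(edges) + 1 bins)."""
    return bisect_right(edges, value)


def scene_to_document(
    scene_rows: List[Dict[str, float]], bin_edges: Dict[str, Sequence[float]]
) -> Counter:
    """Turn one driving scene (a list of sensor readings) into a bag of words.
    Each word is "<feature>_<bin>", e.g. "speed_2"."""
    words = Counter()
    for row in scene_rows:
        for feature, value in row.items():
            words[f"{feature}_{quantize(value, bin_edges[feature])}"] += 1
    return words
```

With speed edges of 20, 60 and 100 km/h, a reading of 45 km/h becomes `speed_1` and 70 km/h becomes `speed_2`; the choice of edges controls the vocabulary size and therefore how fine-grained the learned topics can be.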

Live Demo

Data Lake

Business Levers

Apps

MLlib

PL/X

Model Building

Model Tuning

Continuous Model Improvement

Data Feeds

Ingest → Filter → Enrich → Sink (Spring Cloud Data Flow)

Greenplum

Operationalization - Pipeline of a Data Science Driven App
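The Ingest, Filter, Enrich, Sink stages can be illustrated with plain Python generators. This is only an analogy: Spring Cloud Data Flow deploys each stage as a separate source, processor or sink application connected through a message broker, and none of its APIs are used here. The CSV layout, plausibility thresholds and `hypothetical_overheat_score` model are invented for illustration; a Python list stands in for the Greenplum sink.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List

Record = Dict[str, Any]


def ingest(lines: Iterable[str]) -> Iterator[Record]:
    """Source: parse hypothetical CSV telemetry lines 'vin,speed_kmh,engine_temp_c'."""
    for line in lines:
        vin, speed, temp = line.strip().split(",")
        yield {"vin": vin, "speed_kmh": float(speed), "engine_temp_c": float(temp)}


def filter_valid(records: Iterable[Record]) -> Iterator[Record]:
    """Processor: drop physically implausible readings (thresholds are illustrative)."""
    for r in records:
        if r["speed_kmh"] >= 0 and -40 <= r["engine_temp_c"] <= 150:
            yield r


def hypothetical_overheat_score(r: Record) -> float:
    """Stand-in for a trained model: scales engine temperature into [0, 1]."""
    return min(1.0, max(0.0, (r["engine_temp_c"] - 90) / 40))


def enrich(records: Iterable[Record], score: Callable[[Record], float]) -> Iterator[Record]:
    """Processor: attach a model score and an alert flag to each record."""
    for r in records:
        s = score(r)
        yield {**r, "score": s, "alert": s >= 0.5}


def sink(records: Iterable[Record], store: List[Record]) -> None:
    """Sink: persist records; a list stands in for a database table."""
    store.extend(records)
```

Wiring it up as `sink(enrich(filter_valid(ingest(lines)), hypothetical_overheat_score), store)` mirrors the stream: because each stage is a generator, records flow through one at a time, much as they would through a deployed stream.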

We will be able to improve your driving experience by preparing your car for the exact conditions you are about to encounter.

It’s easy to make cars smarter - let’s make it happen!

Questions?

Additional resources & next steps

Read: Pivotal Data Science Blog https://blog.pivotal.io/channels/data-science-pivotal

Strategic: Pivotal Data Science Analytics Roadmapping Engagement https://pivotal.io/contact

Tune in: Next data science webinar “How Data Science can help with Fraud Detection and Cybersecurity” - Q1 2017 (Date TBD) https://pivotal.io/resources/1/webinars

Hands on:

HDB Sandbox on HDP VM https://network.pivotal.io/products/pivotal-hdb

Greenplum Sandbox https://network.pivotal.io/products/pivotal-gpdb

Apache MADlib (incubating) http://madlib.incubator.apache.org/
