APPLIED DATA SCIENCE Giovanni Lanzani – Chief Science Officer GoDataDriven @gglanzani

APPLIED DATA SCIENCE

Giovanni Lanzani – Chief Science Officer

GoDataDriven

@gglanzani

WHO AM I

Italy

01

Leiden

University

02

KPMG

03

GoDataDriven

04

WHAT IS MACHINE LEARNING

LEARNING FROM DATA

• You have some (or lots of) data

• You need to generalize

BEST MODEL

• Which one would you choose here?

• It’s about making a tradeoff

• This tradeoff is the most important job of the PO

• A 100% correct answer might not exist!!!

WHAT’S DATA SCIENCE

ULTIMATELY

• It’s about creating value from data

• Using Machine Learning, Advanced Analytics, and visualization

WHEN YOU SAY DATA SCIENCE, COMPANIES UNDERSTAND

• All the things big data

• Predictive modeling & Advanced Analytics

• More money

• Do all the cool things the others are doing

HOW TO GET THERE

TRADITIONAL DATA WAREHOUSE

ARCHITECTURE

EDW

Data consumer

Web app

Dashboard / Reporting

Traditional Business app

AND NOW?

?

Data consumer

Web app

Dashboard / Reporting

Traditional Business app

API

WHAT COMPANIES GOT

• A lot of POCs

• A lot of screenshots/presentations/dashboards on a laptop

• Nice stories to tell to their network, about those screenshots and especially those dashboards

• Headaches, with data and infra even more scattered than before

BUT…

• We got a data scientist working on trees, and forests

• Neural networks!

• Deep learning!!!

WHAT DO COMPANIES ACTUALLY NEED

• Put things into production

• They don’t teach that in any data science course or MOOC (that I know of)

THE THREE HURDLES

Credit to Jon Shave gdd.li/lavaredo

OVERSIMPLIFYING

Requirements

Data Sources

Exploration / Modeling

Products

Feedback

Data scientist / ML engineer

Data engineer

Data engineer

🤦🤦♀️🤦🤦

Customers

KAGGLE CURSE

• gdd.li/toldYouSo

• Many data scientists approach the problem at hand with a Kaggle-like mentality: delivering the best model in absolute terms, no matter what the practical implications are.

• In reality it's not the best model that we implement, but the one that combines quality and practicality: a continuous balancing act

• Netflix competition

SOLVING THEM

BUSINESS CASE

Business case for

• True Positives

• True Negatives

Cost of

• False Positives

• False Negatives
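To make the business case concrete, here is a minimal sketch that puts a price on each of the four outcomes and compares two models on total value rather than accuracy. Everything in it is an assumption for illustration: the names `Outcomes`, `UnitValues` and `business_value`, the euro amounts, and the confusion-matrix counts are all hypothetical and not taken from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcomes:
    """Counts of each prediction outcome on a labelled evaluation set."""
    tp: int
    tn: int
    fp: int
    fn: int


@dataclass(frozen=True)
class UnitValues:
    """Money gained (positive) or lost (negative) per single outcome."""
    tp: float
    tn: float
    fp: float
    fn: float


def accuracy(counts: Outcomes) -> float:
    """Fraction of all predictions that were correct."""
    total = counts.tp + counts.tn + counts.fp + counts.fn
    return (counts.tp + counts.tn) / total


def business_value(counts: Outcomes, values: UnitValues) -> float:
    """Total value: each outcome count weighted by its unit value."""
    return (counts.tp * values.tp + counts.tn * values.tn
            + counts.fp * values.fp + counts.fn * values.fn)


# Hypothetical unit values: a missed positive is far more expensive
# than a false alarm.
values = UnitValues(tp=50.0, tn=0.0, fp=-5.0, fn=-100.0)

# Two hypothetical models scored on the same 1000 cases (90 positives).
accurate_model = Outcomes(tp=60, tn=900, fp=10, fn=30)
cautious_model = Outcomes(tp=85, tn=850, fp=60, fn=5)

for name, counts in [("accurate", accurate_model), ("cautious", cautious_model)]:
    print(name, accuracy(counts), business_value(counts, values))
```

With these made-up numbers the model with the lower accuracy comes out well ahead on value, because it misses far fewer of the expensive false negatives. Which unit values to plug in is a business decision, not a modeling one.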

DATA

Data {insert something here}

should be pro grade

SKILLS

• Can they participate in actually building production-quality systems, OR are they just proficient enough in R or Python to hack together a prototype on a very small dataset?

• Supply of the second group keeps growing while demand is flat or shrinking

• Especially as executives get burned by “data scientists” who don't know how to help them build things of value

HIRING

• Companies that are not engineering-driven often have trouble hiring good technical people

• The “IQ” test is not really representative of applied data science

• At GoDataDriven we do an “at home, at your convenience” assessment

• Real dataset, real business question, real product

• Models are software: treat them as such
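One way to read "treat them as such" is that a model gets the same automated tests as any other code. The sketch below is only an illustration under stated assumptions: `predict_churn_probability` is a hypothetical stand-in with hand-set coefficients, not a real trained model, and the three checks are examples of the kind of behaviour one might pin down, not a complete test suite.

```python
import math


def predict_churn_probability(months_inactive: int, support_tickets: int) -> float:
    """Hypothetical stand-in for a trained model: a hand-set logistic score."""
    if months_inactive < 0 or support_tickets < 0:
        raise ValueError("inputs must be non-negative")
    score = -2.0 + 0.6 * months_inactive + 0.3 * support_tickets
    return 1.0 / (1.0 + math.exp(-score))


def test_output_is_a_probability() -> None:
    # Whatever the inputs, the model must return a value in [0, 1].
    for months in range(0, 25):
        for tickets in range(0, 10):
            p = predict_churn_probability(months, tickets)
            assert 0.0 <= p <= 1.0


def test_longer_inactivity_does_not_lower_risk() -> None:
    # Domain expectation: more inactive months never lowers the churn risk.
    previous = predict_churn_probability(0, 2)
    for months in range(1, 25):
        current = predict_churn_probability(months, 2)
        assert current >= previous
        previous = current


def test_rejects_invalid_input() -> None:
    # Bad data should fail loudly instead of producing a silent score.
    try:
        predict_churn_probability(-1, 0)
    except ValueError:
        return
    raise AssertionError("negative input was accepted")


if __name__ == "__main__":
    test_output_is_a_probability()
    test_longer_inactivity_does_not_lower_risk()
    test_rejects_invalid_input()
    print("all model checks passed")
```

The functions are written as plain `test_*` functions with bare asserts, so a test runner such as pytest could collect them, and they can also run in a build pipeline on every change to the model or its features.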

TAKEAWAYS

• POs should know “their stuff”

• Automate all the data movements

• Hire data scientists who are good at programming (or hire machine learning engineers)

QUESTIONS?

• We’re hiring

• Data & Machine Learning Engineers!

[email protected]