data history / data science @ NYT

Page 1: data history / data science @ NYT

data science @ The New York Times

[email protected] [email protected] @chrishwiggins

references: bit.ly/icerm

Page 6: data history / data science @ NYT

“data science” jobs, jobs, jobs

Page 9: data history / data science @ NYT

data science: mindset & toolset

drew conway, 2010

Page 10: data history / data science @ NYT

modern history: 2009

Page 12: data history / data science @ NYT

“data science” blogs, blogs, blogs

The first time I heard "data science" was in 2007 while reading a proposal that my adviser had passed along, outlining an academic program similar to what we think of as data science.

Page 14: data history / data science @ NYT

“data science” ancient history: 2001

Page 16: data history / data science @ NYT

data science context

Page 17: data history / data science @ NYT

home schooled

Page 18: data history / data science @ NYT

PhD in topology

Page 19: data history / data science @ NYT

“By the end of late 1945, I was a statistician rather than a topologist”

Page 20: data history / data science @ NYT

invented: “bit”

Page 21: data history / data science @ NYT

invented: “software”

Page 22: data history / data science @ NYT

invented: “FFT”

Page 23: data history / data science @ NYT

“the progenitor of data science.” - @mshron

Page 24: data history / data science @ NYT

“The Future of Data Analysis,” 1962, John W. Tukey

Page 25: data history / data science @ NYT

introduces: “Exploratory data analysis”

Page 26: data history / data science @ NYT

Tukey 1965, via John Chambers

Page 27: data history / data science @ NYT

TUKEY BEGAT S WHICH BEGAT R

Page 28: data history / data science @ NYT

Tukey 1972

Page 29: data history / data science @ NYT

? 1972

Page 30: data history / data science @ NYT

Jerome H. Friedman

Page 31: data history / data science @ NYT

Tukey 1975

In 1975, while at Princeton, Tufte was asked to teach a statistics course to a group of journalists who were visiting the school to study economics. He developed a set of readings and lectures on statistical graphics, which he further developed in joint seminars he subsequently taught with renowned statistician John Tukey (a pioneer in the field of information design). These course materials became the foundation for his first book on information design, The Visual Display of Quantitative Information.

Page 32: data history / data science @ NYT

TUKEY BEGAT VDQI

Page 33: data history / data science @ NYT

Tukey 1977

Page 34: data history / data science @ NYT

TUKEY BEGAT EDA

Page 35: data history / data science @ NYT

fast forward -> 2001

Page 36: data history / data science @ NYT

“The primary agents for change should be university departments themselves.”

Page 37: data history / data science @ NYT

data science @ The New York Times

histories

1. in academia -> Bell: as heretical statistics (see also Breiman)

2. in industry: as job description

historical rant: bit.ly/data-rant

Page 38: data history / data science @ NYT

data science @ The New York Times

Page 39: data history / data science @ NYT

biology: 1892 vs. 1995

biology changed for good.

Page 42: data history / data science @ NYT

genetics: 1837 vs. 2012

ML toolset; data science mindset

arxiv.org/abs/1105.5821 ; github.com/rajanil/mkboost

Page 43: data history / data science @ NYT

data science: mindset & toolset

Page 44: data history / data science @ NYT

1851

Page 45: data history / data science @ NYT

news: 20th century

church state

Page 46: data history / data science @ NYT

church

Page 50: data history / data science @ NYT

news: 21st century

church state

engineering

Page 51: data history / data science @ NYT

newspapering: 1851 vs. 1996

Page 52: data history / data science @ NYT

example:

millions of views per hour, 2015

Page 55: data history / data science @ NYT

data science: the web

is your “online presence”

Page 56: data history / data science @ NYT

data science: the web

is a microscope

Page 57: data history / data science @ NYT

data science: the web

is an experimental tool

Page 58: data history / data science @ NYT

data science: the web

is an optimization tool

Page 59: data history / data science @ NYT

newspapering: 1851 vs. 1996 vs. 2008

Page 60: data history / data science @ NYT

“a startup is a temporary organization in search of a repeatable and scalable business model” —Steve Blank

Page 61: data history / data science @ NYT

every publisher is now a startup

Page 63: data history / data science @ NYT

news: 21st century

church state

engineering

Page 67: data history / data science @ NYT

learnings

- supervised learning
- unsupervised learning
- reinforcement learning

cf. modelingsocialdata.org

Page 68: data history / data science @ NYT

stats.stackexchange.com

Page 69: data history / data science @ NYT

from “are you a bayesian or a frequentist” —michael jordan

$$L = \sum_{i=1}^{N} \varphi\big(y_i f(x_i; \beta)\big) + \lambda \|\beta\|$$
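The objective on this slide is a sum over training examples of a margin loss applied to y_i times the model output, plus a weighted penalty on the norm of the parameters. A minimal sketch of evaluating it follows. The slide fixes neither the loss, the model, nor the norm, so the logistic loss, the linear model, and the Euclidean norm are assumptions made for illustration, and every function name here is hypothetical.

```python
import math


def logistic_phi(margin):
    """Assumed margin loss: phi(z) = log(1 + exp(-z)), written to avoid overflow."""
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def linear_f(x, params):
    """Assumed model: f(x; params) is the dot product of x and params."""
    return sum(xj * pj for xj, pj in zip(x, params))


def objective(params, xs, ys, penalty_weight, phi=logistic_phi, f=linear_f):
    """Sum of phi(y_i * f(x_i; params)) plus penalty_weight * ||params||_2."""
    data_term = sum(phi(y * f(x, params)) for x, y in zip(xs, ys))
    penalty = penalty_weight * math.sqrt(sum(p * p for p in params))
    return data_term + penalty


# Toy data with labels in {-1, +1}, as the margin form y * f(x) requires.
xs = [[1.0, 2.0], [-1.0, 0.5], [0.5, -1.5]]
ys = [1.0, -1.0, -1.0]
print(objective([0.0, 0.0], xs, ys, penalty_weight=0.1))
print(objective([0.5, 0.5], xs, ys, penalty_weight=0.1))
```

Passing a different `phi` (a hinge loss, say) or a different penalty changes which familiar method the same template describes, which appears to be the point of quoting the formula.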

Page 71: data history / data science @ NYT

supervised learning, e.g.,

“the funnel”

cf. modelingsocialdata.org

Page 73: data history / data science @ NYT

interpretable supervised learning

super cool stuff

cf. modelingsocialdata.org

arxiv.org/abs/q-bio/0701021

Page 74: data history / data science @ NYT

optimization & learning, e.g.,

“How The New York Times Works,” popular mechanics, 2015

Page 75: data history / data science @ NYT

optimization & prediction, e.g.,

“How The New York Times Works,” popular mechanics, 2015

(some models)

(some moneys)

Page 76: data history / data science @ NYT

recommendation as supervised learning

Page 77: data history / data science @ NYT

recommendation as predictive modeling

bit.ly/AlexCTM

Page 78: data history / data science @ NYT

unsupervised learning, e.g.,

cf. daeilkim.com ; import bnpy

Page 79: data history / data science @ NYT

modeling your audience

bit.ly/Hughes-Kim-Sudderth-AISTATS15

Page 80: data history / data science @ NYT

modeling your audience

(optimization, ultimately)

Page 81: data history / data science @ NYT

modeling your audience

also allows recommendation as inference

Page 82: data history / data science @ NYT

prescriptive modeling, e.g.,
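The example shown on this slide did not survive in the text. As a stand-in, here is a minimal sketch of one common prescriptive technique, an epsilon-greedy multi-armed bandit, which chooses among alternatives (say, candidate headlines) while learning which one performs best. This is an assumed illustration, not the method the talk describes; the class name, parameters, and click rates are all hypothetical.

```python
import random


class EpsilonGreedy:
    """Epsilon-greedy bandit: explore with probability epsilon, otherwise exploit."""

    def __init__(self, n_arms, epsilon=0.1, seed=None):
        self.epsilon = epsilon
        self.counts = [0] * n_arms    # number of pulls per arm
        self.means = [0.0] * n_arms   # running mean reward per arm
        self.rng = random.Random(seed)

    def choose(self):
        """Return the index of the arm to try next."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.counts))  # explore at random
        # exploit: arm with the highest estimated mean (ties go to the lowest index)
        return max(range(len(self.means)), key=self.means.__getitem__)

    def update(self, arm, reward):
        """Fold one observed reward into that arm's running mean."""
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


# Simulated use: two options with made-up click probabilities.
true_rates = [0.05, 0.10]
bandit = EpsilonGreedy(n_arms=2, epsilon=0.1, seed=0)
env = random.Random(1)
for _ in range(5000):
    arm = bandit.choose()
    reward = 1.0 if env.random() < true_rates[arm] else 0.0
    bandit.update(arm, reward)
print(bandit.counts, bandit.means)
```

The design choice is the trade-off the `epsilon` parameter encodes: a larger value spends more traffic testing, a smaller one spends more traffic on whatever currently looks best.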

Page 84: data history / data science @ NYT

Reporting

Learning

Test

Optimizing

Explore

unsupervised:

supervised:

reinforcement:

Page 87: data history / data science @ NYT

common requirements in data science:

1. people
2. ideas
3. things

cf. USAF

Page 89: data history / data science @ NYT

things: what does DS team deliver?

- build data prototypes
- build APIs
- impact roadmaps

Page 91: data history / data science @ NYT

- build data prototypes

cf. daeilkim.com

Page 93: data history / data science @ NYT

- build APIs

- in puppet, w/ python2.7
- collaboration w/ pers. team

Page 94: data history / data science @ NYT

- impact roadmaps

flickr/McJex

Page 95: data history / data science @ NYT

data science: ideas

Page 96: data history / data science @ NYT

data skills

- data engineering
- data science
- data visualization
- data product
- data multiliteracies
- data embeds

cf. “data scientists at work”, ch 1

Page 98: data history / data science @ NYT

data science: people

- new mindset > new toolset

Page 100: data history / data science @ NYT

summary: pay attention to:

1. people
2. ideas
3. things

cf. USAF

Page 101: data history / data science @ NYT

thanks to the data science team!
