[Databeers] 18-09-2014 Models: pets and herds. Carlos J. Gil Bellosta

Models: pets and herds

Carlos J. Gil Bellosta
[email protected]

September 2014

Carlos J. Gil Bellosta – datanalytics Models: pets and herds September 2014 1 / 21

This is a pet...

Source: http://jessfalcone.wordpress.com

... and this is a herd

Source: http://bonfirehealth.com/negative-influences-comparisons-social-cues-herd/

Some people treat computers as pets...

Source: aliexpress.com

... and others as herds

Source: Failure Trends in a Large Disk Drive Population, Pinheiro et al.

This is a statistical model treated as a pet

Source: http://www.ats.ucla.edu/stat/stata/seminars/interaction_sem/interaction_sem.htm

Pets are very demanding and require...

1 variable selection,

2 checks for outliers,

3 assessment of the goodness of fit,

4 finding confidence intervals,

5 calculating p-values,

6 interpreting the results,

7 discussing how well they generalize,

8 ...

Models... as herds?

Source: http://www.gotmedieval.com

Model construction: population

Source: http://timyeo.wordpress.com/

Model construction: data enrichment (aka left join)
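
A minimal sketch of the enrichment step, assuming a hypothetical population table keyed by `subject_id` and a hypothetical table of extra attributes (neither comes from the talk). Every subject is kept; attributes with no match become `None`, as in a SQL left join.

```python
# Hypothetical data: the population (one row per subject) and extra attributes.
population = [
    {"subject_id": 1, "segment": "a"},
    {"subject_id": 2, "segment": "b"},
    {"subject_id": 3, "segment": "a"},
]
attributes = [
    {"subject_id": 1, "age": 34},
    {"subject_id": 3, "age": 51},
]


def left_join(left, right, key, right_columns):
    """Keep every row of `left`; add `right_columns` from the matching row of
    `right`, or None when there is no match. Assumes `key` is unique in `right`."""
    lookup = {row[key]: row for row in right}
    joined = []
    for row in left:
        match = lookup.get(row[key], {})
        enriched_row = dict(row)
        for column in right_columns:
            enriched_row[column] = match.get(column)
        joined.append(enriched_row)
    return joined


enriched = left_join(population, attributes, "subject_id", ["age"])
```

The result still has exactly one row per subject, which is what a single "pet" model expects as input.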

But subject data is often messy...

Source: http://arquitectolegista.com.ar/

... and contains temporal data...

Source: http://thirdorderscientist.org/

... that is difficult to fit in a box (table)

Source: http://cutestcatpics.com/cat-trying-to-fit-into-tiny-box/

We have a whole dataset per subject!

... and a model per subject?

Source: http://www.unc.edu/

(Most) models are sophisticated summaries of data

Source: http://xkcd.r-forge.r-project.org/

Do you seek α? Build a model per stock!

Fitting a million models in the nineties was quite an achievement (for some)

Beyond recency and frequency: self-exciting processes

Source: Bursting transition in a linear self-exciting point process, Onaga, T. et al.
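
For orientation, a linear self-exciting (Hawkes) process is commonly written with conditional intensity λ(t) = μ + Σ α·exp(−β(t − tᵢ)), summed over past events tᵢ < t. The sketch below evaluates that intensity for one subject's event history; the exponential kernel, the parameter values and the event times are illustrative assumptions, not taken from the talk or the cited paper.

```python
import math


def hawkes_intensity(t, event_times, mu, alpha, beta):
    """Conditional intensity at time t: baseline mu plus an exponentially
    decaying bump of height alpha for every earlier event."""
    return mu + sum(
        alpha * math.exp(-beta * (t - s)) for s in event_times if s < t
    )


# Hypothetical event history for one subject (e.g. purchase times in days).
events = [1.0, 1.5, 1.7, 9.0]

just_after_burst = hawkes_intensity(2.0, events, mu=0.1, alpha=0.8, beta=1.0)
long_after_burst = hawkes_intensity(8.0, events, mu=0.1, alpha=0.8, beta=1.0)
```

Unlike recency and frequency counts, the intensity reacts to how events cluster in time: it is high right after a burst and decays back towards `mu` when nothing happens.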

One logistic regression per Gmail user...

Challenges: statistical, computational,... and more!

This approach faces many challenges:

1 Computational: how do you fit so many models? (but Spark rocks!)

2 Statistical: how do you...

    1 perform variable selection?
    2 evaluate the fit?
    3 deal with outliers?
    4 ...

3 And finally, how do you sell these approaches to business people (ex-Google)?
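
The computational point can be sketched without a cluster: fitting one model per subject is a map of a fit function over independent groups. The example below uses Python's standard `concurrent.futures` as a stand-in for Spark (it is not Spark code), with hypothetical per-subject data and a deliberately trivial "model".

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import fmean

# Hypothetical per-subject datasets: subject id -> observed values.
datasets = {
    "s1": [1.0, 2.0, 3.0],
    "s2": [10.0, 10.0],
    "s3": [4.0],
}


def fit_one(item):
    """Fit a trivial 'model' (the mean) to one subject's data."""
    subject, values = item
    return subject, {"mean": fmean(values), "n": len(values)}


def fit_all(data, max_workers=4):
    """Apply fit_one to every subject independently and collect the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(fit_one, data.items()))


fitted = fit_all(datasets)
```

Threads keep the sketch short; for CPU-bound fits a process pool or a cluster would be the realistic choice. The shape of the computation stays the same: no fit depends on any other subject's data.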
