Demystifying Data Science with an introduction to Machine Learning

DESCRIPTION

Demystifying Data Science is the slide deck that accompanies @brightsparc's presentation to SEEK.

Page 1: Demystifying Data Science with an introduction to Machine Learning

Demystifying Data Science

with an Intro to Machine Learning

Page 2: Demystifying Data Science with an introduction to Machine Learning

Data science is everywhere

Page 3: Demystifying Data Science with an introduction to Machine Learning

Sexiest job of the 21st century*

A McKinsey Global Institute report estimates that by 2018, “the United States alone could face a shortage of 140,000 to 190,000 people with deep analytical skills as well as 1.5 million managers and analysts with the know-how to use the analysis of big data to make effective decisions”

*Source: Harvard Business Review, October 2012

Page 4: Demystifying Data Science with an introduction to Machine Learning

So what is Data Science?

Page 5: Demystifying Data Science with an introduction to Machine Learning

Source: Hilary Mason, ex-chief data scientist at bit.ly

Page 6: Demystifying Data Science with an introduction to Machine Learning

Who are these unicorns?

Page 7: Demystifying Data Science with an introduction to Machine Learning

A bit about me

@brightsparc  

Page 8: Demystifying Data Science with an introduction to Machine Learning

I thought it was all about stats?

Page 9: Demystifying Data Science with an introduction to Machine Learning

It’s a broader skillset

Source: http://blogs.wsj.com/cio/2014/02/14/it-takes-teams-to-solve-the-data-scientist-shortage/

Page 10: Demystifying Data Science with an introduction to Machine Learning

Data science pipeline

Source: http://cacm.acm.org/blogs/blog-cacm/169199-data-science-workflow-overview-and-challenges/fulltext

Page 11: Demystifying Data Science with an introduction to Machine Learning

Where does Kaggle fit in?

[Charts: degree breakdown in top 100; areas of study]

Page 12: Demystifying Data Science with an introduction to Machine Learning

What’s the deal with big data?

Page 13: Demystifying Data Science with an introduction to Machine Learning

Apache Hadoop Ecosystem

Page 14: Demystifying Data Science with an introduction to Machine Learning

It’s like MapReduce, you know
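For readers who have not met MapReduce, here is a minimal single-machine sketch of the idea in plain Python, using the classic word-count example. The function names and the sample documents are illustrative assumptions; a real Hadoop job distributes the map, shuffle and reduce steps across a cluster.

```python
from collections import defaultdict
from functools import reduce


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, reduce(lambda a, b: a + b, values)


def word_count(documents):
    # Map every document, then shuffle by word, then reduce each group.
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Hypothetical input documents, for illustration only.
counts = word_count(["big data is big", "data science is everywhere"])
```

Because each map call looks at one document and each reduce call looks at one key, both phases can run in parallel on separate machines.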

Page 15: Demystifying Data Science with an introduction to Machine Learning

So what about machine learning?

Pioneer in machine learning, created a checkers program that played against itself

“Give machines the ability to learn without explicitly programming them.” Arthur L. Samuel (1959)

Page 16: Demystifying Data Science with an introduction to Machine Learning

Types of algorithms

Page 17: Demystifying Data Science with an introduction to Machine Learning

Some examples

Page 18: Demystifying Data Science with an introduction to Machine Learning

Machine learning process

Page 19: Demystifying Data Science with an introduction to Machine Learning

Build a model

Underfit

Overfit

Linear regression: solve for the values of θ in the hypothesis function h_θ(x)

Page 20: Demystifying Data Science with an introduction to Machine Learning

Gradient descent algorithm

Minimize the cost function, which is ½ of the average squared error of the predictions against the training data.
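The cost function and update rule can be sketched in a few lines of plain Python. This assumes a single input feature, so the hypothesis is h_θ(x) = θ0 + θ1·x; the learning rate, iteration count and toy data are illustrative assumptions, not values from the demo.

```python
def hypothesis(theta, x):
    """Linear hypothesis h_theta(x) = theta0 + theta1 * x."""
    return theta[0] + theta[1] * x


def cost(theta, xs, ys):
    """Half of the average squared error over the training data."""
    m = len(xs)
    return sum((hypothesis(theta, x) - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


def gradient_descent(xs, ys, alpha=0.05, iterations=5000):
    """Repeatedly step both parameters against the gradient of the cost."""
    theta = [0.0, 0.0]
    m = len(xs)
    for _ in range(iterations):
        errors = [hypothesis(theta, x) - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Update both parameters simultaneously.
        theta = [theta[0] - alpha * grad0, theta[1] - alpha * grad1]
    return theta


# Toy data generated from y = 1 + 2x (an assumption for illustration).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
theta = gradient_descent(xs, ys)
final_cost = cost(theta, xs, ys)
```

On this noise-free data the parameters should approach θ0 = 1 and θ1 = 2. If the learning rate alpha is set too high the updates overshoot and the cost grows instead of shrinking.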

Page 21: Demystifying Data Science with an introduction to Machine Learning

Demo: House prices

Page 22: Demystifying Data Science with an introduction to Machine Learning

Cross validation – split training/test
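As a rough sketch of what the split looks like in code, the helpers below do a single shuffled train/test split and a k-fold variant in plain Python. The function names, the 25% test fraction and the fixed seed are assumptions for illustration; scikit-learn ships its own utilities for this.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and hold out a fraction for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def k_fold_indices(n_rows, k):
    """Yield (train_idx, test_idx) pairs; each row is tested exactly once."""
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_rows) if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size


rows = list(range(20))  # stand-in for 20 training examples
train, test = train_test_split(rows)
folds = list(k_fold_indices(len(rows), k=5))
```

The model is fitted on the training portion only and scored on the held-out portion, which gives an estimate of how it behaves on data it has not seen.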

Page 23: Demystifying Data Science with an introduction to Machine Learning

Supervised learning model

Page 24: Demystifying Data Science with an introduction to Machine Learning

Recommender systems

Collaborative filtering – predict ratings for similar items given other users’ behavior
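A minimal user-based sketch of this idea: predict a rating a user has not given as the similarity-weighted average of other users' ratings for that item. The ratings dictionary is made up, and cosine similarity over co-rated items is an assumption chosen for brevity; other similarity measures can be swapped in.

```python
from math import sqrt

# Hypothetical ratings (user -> item -> score), for illustration only.
ratings = {
    "alice": {"A": 5, "B": 3, "C": 4},
    "bob": {"A": 4, "B": 2, "C": 5, "D": 4},
    "carol": {"A": 1, "B": 5, "D": 2},
}


def cosine_similarity(a, b):
    """Cosine similarity computed over the items both users have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b)


def predict_rating(ratings, user, item):
    """Similarity-weighted average of other users' ratings for the item."""
    weighted_sum = 0.0
    total_weight = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        weight = cosine_similarity(ratings[user], other_ratings)
        weighted_sum += weight * other_ratings[item]
        total_weight += weight
    return weighted_sum / total_weight if total_weight else None


prediction = predict_rating(ratings, "alice", "D")
```

Alice's tastes are closer to Bob's than to Carol's, so the prediction for item D lands nearer Bob's rating of 4 than Carol's rating of 2.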

Page 25: Demystifying Data Science with an introduction to Machine Learning

Collaborative filtering method

Source: http://cran.r-project.org/web/packages/recommenderlab/vignettes/recommenderlab.pdf

Page 26: Demystifying Data Science with an introduction to Machine Learning

Similar users based on distance

Manhattan distance

Euclidean distance
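Both distances are easy to express in plain Python. The sketch below compares users over the items they have both rated and picks the nearest neighbour; the ratings and the helper names are hypothetical.

```python
from math import sqrt

# Hypothetical ratings (user -> item -> score), for illustration only.
ratings = {
    "alice": {"A": 5, "B": 3, "C": 4},
    "bob": {"A": 4, "B": 2, "C": 5, "D": 4},
    "carol": {"A": 1, "B": 5, "D": 2},
}


def manhattan_distance(a, b):
    """Sum of absolute rating differences over co-rated items."""
    shared = set(a) & set(b)
    return sum(abs(a[i] - b[i]) for i in shared)


def euclidean_distance(a, b):
    """Square root of the summed squared differences over co-rated items."""
    shared = set(a) & set(b)
    return sqrt(sum((a[i] - b[i]) ** 2 for i in shared))


def nearest_user(ratings, user, distance):
    """Return the other user with the smallest distance to `user`."""
    others = [name for name in ratings if name != user]
    return min(others, key=lambda name: distance(ratings[user], ratings[name]))


closest = nearest_user(ratings, "alice", euclidean_distance)
```

A smaller distance means more similar taste. Note that users with few co-rated items can look deceptively close, so practical systems usually require a minimum overlap.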

Page 27: Demystifying Data Science with an introduction to Machine Learning

Demo: Music recommender system

Pearson Correlation Coefficient
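The Pearson correlation coefficient measures whether two users' ratings rise and fall together, which makes it robust to one user scoring everything higher than another. A plain-Python sketch follows; the sample users and track names are made up and are not the demo's data.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation over the items both users have rated."""
    shared = sorted(set(a) & set(b))
    n = len(shared)
    if n < 2:
        return 0.0  # not enough overlap to measure a trend
    xs = [a[i] for i in shared]
    ys = [b[i] for i in shared]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a user who rates everything the same has no trend
    return cov / sqrt(var_x * var_y)


# Hypothetical listeners: same preferences, different rating scales.
harsh = {"track_a": 1, "track_b": 2, "track_c": 3}
generous = {"track_a": 3, "track_b": 4, "track_c": 5}
opposite = {"track_a": 5, "track_b": 4, "track_c": 3}

same_taste = pearson(harsh, generous)
opposite_taste = pearson(harsh, opposite)
```

The harsh and generous listeners correlate perfectly even though their raw scores differ by two points on every track, whereas a distance measure would treat them as far apart.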

Page 28: Demystifying Data Science with an introduction to Machine Learning

Visualization frameworks

Tableau

D3.js

Processing

Raphaël.js

Page 29: Demystifying Data Science with an introduction to Machine Learning

What about online experimentation?

Page 30: Demystifying Data Science with an introduction to Machine Learning

What will the future look like?

• Online collaboration

• Open Data

Page 31: Demystifying Data Science with an introduction to Machine Learning

Next-gen distributed computing

100x faster in memory, and 10x faster even when running on disk.

Page 32: Demystifying Data Science with an introduction to Machine Learning

Deep learning, a new frontier?

Geoffrey Hinton @Google

Page 33: Demystifying Data Science with an introduction to Machine Learning

How can I get started?

• MOOCs
  – Coursera Machine Learning (Andrew Ng - Stanford)
  – Learning from Data (Abu-Mostafa - Caltech)

• Other references
  – Collective Intelligence
  – Mining of Massive Datasets
  – Open-Source Data Science Masters

• Frameworks
  – Python – scikit-learn
  – Java – WEKA and Cascading

Page 34: Demystifying Data Science with an introduction to Machine Learning

Questions