
Deep Learning For Practitioners, Lecture 2: Selecting the Right Applications for Deep Learning



Deep Learning For Practitioners
Lecture 2: Which applications benefit from deep learning?

Anantharaman Narayana Iyer [email protected]

17th June 2014

Note: Notes that contain code examples for these slides and detailed analysis will be published separately later.


Review of previous lecture

• Deep learning, as a major machine learning discipline, has received phenomenal attention of late due to:
  – Breakthrough results reported by the research community for certain classes of applications, bettering the current state of the art
  – Substantial investments by technology companies such as Google, Facebook, Microsoft and IBM
• While there is no single unique architecture, deep networks are typically built using some variant of Autoencoders or Restricted Boltzmann Machines, with these key characteristics:
  – Deep architecture: multiple layers performing complex, nonlinear computations, cascading the layerwise outputs.
  – Automated feature extraction: each layer produces as its output an abstracted form of its inputs (e.g. edges from raw pixels). One may add a classifier layer (e.g. an SVM) on top of the abstracted features and view the classification as being done on the most abstract features automatically generated by the system. (An example with code is illustrated in the next lecture.)
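As a rough illustration of "cascading the layerwise outputs", the sketch below pushes an input vector through a stack of fully connected sigmoid layers. It is a minimal sketch and not the lecture's own code: the layer sizes, the random untrained weights and the names `make_layer`, `layer_forward` and `deep_forward` are all assumptions made for this example.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def make_layer(n_in, n_out, rng):
    """Create random (untrained) weights and zero biases for one layer."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def layer_forward(inputs, weights, biases):
    """One layer: a weighted sum followed by a nonlinearity."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def deep_forward(inputs, layers):
    """Cascade the layers: each layer consumes the previous layer's output."""
    activations = [inputs]
    for weights, biases in layers:
        activations.append(layer_forward(activations[-1], weights, biases))
    return activations

rng = random.Random(0)
sizes = [16, 8, 4, 2]  # hypothetical sizes: 16 raw inputs down to 2 abstract features
layers = [make_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]
raw_pixels = [rng.random() for _ in range(sizes[0])]

activations = deep_forward(raw_pixels, layers)
print([len(a) for a in activations])  # [16, 8, 4, 2]
```

The last entry of `activations` plays the role of the "most abstract features"; a separate classifier would take that vector as its input.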


Looking through the practitioner’s prism

• To address real-world problems, practitioners need to be aware of where deep learning yields the best results, its practical considerations and limitations, and when not to use it.
• This requires looking at research results and other claims from a practical perspective and staying clear of common misconceptions.


“If all you have is a hammer, everything looks like a nail”

• Deep learning has proved its potential in some application domains (e.g. computer vision, speech recognition) and holds early promise in several other areas (e.g. natural language processing), but it is not a universal tool that provides the best result for “any” AI task.
• When does it have the potential to perform best?
  – When the structure of the problem being solved naturally maps to a multi-layer architecture
    • If the problem we are trying to solve can be decomposed into processing hierarchical abstract features, and these features are derivable from the input data through a set of potentially nonlinear transformations, a deep learning based solution might be effective.
    • As a corollary, problems that don’t exhibit a multi-layer structure may not see much incremental benefit compared to traditional methods.
  – Data availability
    • While traditional architectures require expert-designed features, deep learning systems automatically learn these features, given the raw input.
    • In order to learn the features, extensive unsupervised pretraining using large volumes of data is often required. Hence any advanced solution based on deep learning is likely to require the availability of such data.
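To make "learning features from unlabeled data" concrete, here is a minimal sketch of a one-hidden-layer autoencoder trained with plain stochastic gradient descent. It only illustrates the idea and is not the method used in the lecture: the toy data, the hidden size, the learning rate and the function name `train_autoencoder` are all assumptions. Note that no labels appear anywhere; the network is trained only to reconstruct its own input.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_autoencoder(data, n_hidden, epochs=300, lr=0.1, seed=0):
    """Train a tiny autoencoder; return the average reconstruction loss per epoch."""
    rng = random.Random(seed)
    n_in = len(data[0])
    enc_w = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    enc_b = [0.0] * n_hidden
    dec_w = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_in)]
    dec_b = [0.0] * n_in
    losses = []
    for _ in range(epochs):
        total = 0.0
        for x in data:
            # Encode: the hidden code h is the learned feature vector.
            h = [sigmoid(sum(w * v for w, v in zip(row, x)) + b)
                 for row, b in zip(enc_w, enc_b)]
            # Decode: linear reconstruction of the input from h.
            y = [sum(w * v for w, v in zip(row, h)) + b
                 for row, b in zip(dec_w, dec_b)]
            err = [yi - xi for yi, xi in zip(y, x)]
            total += sum(e * e for e in err)
            # Backpropagate the reconstruction error (computed before any update).
            grad_h = [sum(err[i] * dec_w[i][j] for i in range(n_in)) * h[j] * (1.0 - h[j])
                      for j in range(n_hidden)]
            for i in range(n_in):
                for j in range(n_hidden):
                    dec_w[i][j] -= lr * err[i] * h[j]
                dec_b[i] -= lr * err[i]
            for j in range(n_hidden):
                for k in range(n_in):
                    enc_w[j][k] -= lr * grad_h[j] * x[k]
                enc_b[j] -= lr * grad_h[j]
        losses.append(total / len(data))
    return losses

# Hypothetical unlabeled data: two repeating patterns, no class labels.
unlabeled = [[1, 1, 0, 0], [0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 1, 1]]
losses = train_autoencoder(unlabeled, n_hidden=2)
print(losses[0], losses[-1])  # reconstruction loss at the first and last epoch
```

In a real system the same principle is applied to far larger inputs and datasets, which is why data availability becomes a deciding factor.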


“More data or better models?”

• Data vs. algorithm: research shows that as a system is trained with more data, its performance asymptotically approaches the same level regardless of the model.
• One may be led to believe that shallow networks trained with huge data might equal the performance of deep networks.
  – Unfortunately, much of the data available on the web is unlabeled, and without an effective unsupervised training model that data is not useful. Deep networks with an unsupervised pretraining phase can leverage the data better.
• Another notion could be that any algorithm or model selection for a deep network is good enough if you give it a huge volume of data.
  – Choosing an optimal algorithm and design is critical, as deep networks are resource heavy due to their multiple layers and weights. A good intuition for the problem structure is important for making the right model choices.


Automated feature learning and data preprocessing

Though deep learning systems extract features automatically, the task of data preprocessing is still non-trivial.

– The input data should be complete enough that the features relevant to the given problem can be extracted.
  • Consider the example of detecting anomalies in the operation of a nuclear reactor. The input given to a deep learning system should include signals from all the relevant sensors; missing any of them may result in inadequate performance.
– The optimum size of the input data adequate for the job needs to be determined.
  • Suppose we need to perform face detection, given the input images. What should be the right input size? Should it be 10 x 10 or 100 x 100 pixels? High dimensionality increases the number of model parameters substantially, requiring more compute resources.
– The input vector representation must be determined.
  • E.g. for an NLP problem, words from a vocabulary V may be represented in “one-hot” form, where each word in V is represented by a position. Here, the number of features for a given word w equals the size of the vocabulary |V|, and a sentence with k words will be represented as an input vector of size k * |V|. When the vocabulary becomes large (say over 10000 words), this representation increases the dimensionality substantially.
– For many problems, data cleaning and preprocessing are still required.
  • E.g. for many NLP problems, better performance may be obtained more easily through some preprocessing steps (such as stopword removal, stemming etc.) than by letting the deep learning system handle the data in its raw form.
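The dimensionality arithmetic in the face detection and one-hot examples above can be checked with a short sketch. The fully connected first layer, the value of `hidden_units` and the four-word toy vocabulary are assumptions chosen purely for illustration.

```python
def first_layer_weights(height, width, hidden_units):
    """Weights in a fully connected first layer over a height x width image (biases excluded)."""
    return height * width * hidden_units

def one_hot(word, vocab_index):
    """Represent a word as a |V|-sized vector with a single 1 at the word's position."""
    vec = [0] * len(vocab_index)
    vec[vocab_index[word]] = 1
    return vec

def encode_sentence(words, vocab_index):
    """Concatenate the one-hot vectors of k words into one k * |V| input vector."""
    vec = []
    for word in words:
        vec.extend(one_hot(word, vocab_index))
    return vec

hidden_units = 100  # hypothetical size of the first hidden layer
print(first_layer_weights(10, 10, hidden_units))    # 10000
print(first_layer_weights(100, 100, hidden_units))  # 1000000

vocab = ["deep", "learning", "needs", "data"]        # toy vocabulary, |V| = 4
vocab_index = {word: i for i, word in enumerate(vocab)}
x = encode_sentence(["deep", "learning", "needs"], vocab_index)  # k = 3 words
print(len(x))  # 12, i.e. k * |V| = 3 * 4
```

Going from 10 x 10 to 100 x 100 pixels multiplies the first-layer weights by 100 in this setup, and with |V| = 10000 the same three-word sentence would need a 30000-dimensional input vector.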