A Lightning Introduction To Clouds & HLT - Human Language Technology Conference

What’s all this cloud stuff, anyway? What kinds of problems do organizations set out to solve with ‘a cloud,’ or even ‘the cloud’? What are a few of the major government initiatives involving this technology? How does HLT in general, and Search in particular, fit? This talk will take a tour of the technology behind clouds and the sometimes-foggy ambitions of the projects that use them, and look in particular detail at the challenges of applying cloud technologies to Text Analytics. View more slides from the Human Language Technology Conference 2012 here: http://info.basistech.com/hlt-2012-slides

Basis Technology – Human Language Technology Conference 2012

Clouds, Search or HLT: The 'forecast'?

Benson Margulies, Executive Vice President and Chief Technology Officer

Meteorology, or Why Clouds?

• Lie on the grass and look up at the clouds
• Everyone sees something different

• Computerized clouds are no different
• Applications always available
• Data always available
• Tools for processing big data

Big Data and Clouds =~ Hadoop

• It's not just a matter of size
• Hadoop ...
  o Takes in structured data sets
  o Optimizes stateless, batch processes
  o Moves computation to data
• All of which is great if that's what you have
• The world is more complicated than that
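
To make "stateless, batch" concrete, here is a toy, single-process imitation of the MapReduce pattern that Hadoop implements, using word count. It does not use Hadoop's API; `map_phase`, `shuffle`, `reduce_phase`, and `run_job` are illustrative names. It only shows that each mapper and reducer depends on nothing but its own input, and that the whole input is known before processing starts.

```python
from collections import defaultdict
from itertools import chain
from typing import Iterable, Iterator


def map_phase(doc: str) -> Iterator[tuple[str, int]]:
    # Stateless mapper: its output depends only on the one record it is given.
    for word in doc.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Group intermediate values by key, as the framework does between phases.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Stateless reducer: sees one key and all of that key's values.
    return key, sum(values)


def run_job(docs: list[str]) -> dict[str, int]:
    # Batch: the complete input is known before any processing starts.
    intermediate = chain.from_iterable(map_phase(d) for d in docs)
    return dict(reduce_phase(k, vs) for k, vs in shuffle(intermediate).items())


print(run_job(["the cloud", "the search engine"]))
```

Because no mapper or reducer keeps state between records, a framework is free to run them on whichever node holds the data.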

What it Doesn't Do So Easily

• On-the-fly (non-batch) processing
• Stateful, non-local processing
• For example, consider a search engine
  o All about online: a document arrives, users want to find it.
  o All about global state: relevancy involves global data across the whole index.
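
Why relevance counts as "global state": a term weight such as inverse document frequency (IDF) depends on the whole collection. The sketch below assumes one common smoothed IDF variant and two made-up shards. It shows that each shard, scoring on its own, assigns a different weight to the same term than a single undivided index would.

```python
import math


def idf(term: str, docs: list[list[str]]) -> float:
    # Smoothed IDF (one common variant): rarer terms get larger weights.
    df = sum(1 for doc in docs if term in doc)
    return math.log((1 + len(docs)) / (1 + df)) + 1.0


# Two hypothetical shards of one logical index.
shard_a = [["cloud", "search"], ["cloud", "hadoop"], ["cloud"]]
shard_b = [["search", "engine"], ["index"], ["relevance"]]

local_a = idf("cloud", shard_a)               # "cloud" looks common here
local_b = idf("cloud", shard_b)               # ...and is absent here
global_idf = idf("cloud", shard_a + shard_b)  # what one undivided index would use

print(f"{local_a:.3f} {local_b:.3f} {global_idf:.3f}")
```

Some distributed engines offer an option to gather term statistics from all shards before scoring; the sketch only illustrates why such an extra step is needed.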

More on Search-in-a-Cloud

• Good news: 'conventional' technologies scale to very large indices.
  o Solr
  o SolrCloud
  o Elasticsearch
  o ...
• How? Shards.
  o 'Hash' to split docs across shards
  o Queries go to every shard
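
The sharding scheme in the bullets above can be sketched in a few lines: hash each document ID to pick exactly one shard, then scatter every query to all shards and gather the results. This is not SolrCloud's routing code; MD5 plus modulo is a stand-in hash, and `shard_for`, `index_doc`, and `search` are hypothetical names over in-memory dicts.

```python
import hashlib

NUM_SHARDS = 3
# One dict per shard (doc_id -> text); a hypothetical stand-in for index nodes.
shards: list[dict[str, str]] = [{} for _ in range(NUM_SHARDS)]


def shard_for(doc_id: str, num_shards: int = NUM_SHARDS) -> int:
    # Stable hash of the document ID picks the shard. Python's built-in hash()
    # is randomized per process for strings, so a digest is used instead.
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def index_doc(doc_id: str, text: str) -> None:
    # Each document is stored on exactly one shard.
    shards[shard_for(doc_id)][doc_id] = text


def search(term: str) -> list[str]:
    # Scatter the query to every shard, then gather and merge the hits.
    hits: list[str] = []
    for shard in shards:
        hits.extend(doc_id for doc_id, text in shard.items() if term in text.split())
    return sorted(hits)


index_doc("d1", "clouds and search")
index_doc("d2", "hadoop batch jobs")
index_doc("d3", "search in a cloud")
print(search("search"))
```

Indexing touches one node per document, but every query touches all of them, which is why "queries go everywhere."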

Search-in-a-Cloud: Less Good News

• Alternatives are still:
  o Limited
  o Research
  o Or both
• Solandra
  o Scaling via Cassandra
  o 'Just another sharded solution'
  o Just the thing if you like Cassandra
• Or Accumulo
  o So far, a very basic inverted index
  o Better things coming
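
For reference, a "very basic inverted index" in the generic sense maps each term to the documents that contain it and answers Boolean queries by intersecting those posting lists. The sketch below is a generic in-memory version, not the Accumulo table layout the slide refers to; `build_index` and `query_and` are hypothetical names. It deliberately omits relevance scoring, which is the part that needs global state.

```python
from collections import defaultdict


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    # Map each term to the set of document IDs that contain it (its posting list).
    postings: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            postings[term].add(doc_id)
    return dict(postings)


def query_and(index: dict[str, set[str]], *terms: str) -> set[str]:
    # Boolean AND: intersect the posting lists of all query terms. No scoring.
    if not terms:
        return set()
    result = set(index.get(terms[0].lower(), set()))
    for term in terms[1:]:
        result &= index.get(term.lower(), set())
    return result


docs = {
    "d1": "solr on cassandra",
    "d2": "accumulo tables",
    "d3": "inverted index on accumulo",
}
idx = build_index(docs)
print(query_and(idx, "accumulo", "index"))
```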

Other HLT tasks ...

• 'Extraction' is 'straightforward'
• Text comes in, entities or relationships come out.
• Results end up in a graph DB or Bigtable or ...
• Scale via Hadoop or whatever
• The challenge of mixing and matching
• But ... what if you want a feedback loop?
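
The "text in, entities out" shape of extraction is what makes it easy to scale: the extractor is a pure function of its input. The toy sketch below assumes a fixed, hypothetical gazetteer (`GAZETTEER`) and exact string matching; real extractors use trained models, and nothing here reflects any product's API.

```python
import re
from typing import NamedTuple


class Entity(NamedTuple):
    text: str
    kind: str
    start: int  # character offset in the input text


# Hypothetical toy gazetteer; real extractors use trained models, not a fixed list.
GAZETTEER = {"Acme Corp": "ORGANIZATION", "Boston": "LOCATION"}


def extract(text: str) -> list[Entity]:
    # Stateless: the result depends only on this text and the fixed gazetteer,
    # so documents can be processed in any order, on any node.
    found = [
        Entity(name, kind, m.start())
        for name, kind in GAZETTEER.items()
        for m in re.finditer(re.escape(name), text)
    ]
    return sorted(found, key=lambda e: e.start)


print(extract("Acme Corp opened an office in Boston."))
```

A feedback loop, such as adding newly extracted names back into `GAZETTEER`, would make each call depend on what other nodes have already processed, which is what breaks the simple scale-out model.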

Interoperation

• Lots of focus on applications
  o e.g. Ozone Widgets
• Not so much on backend processes
• What good is 'data everywhere' if:
  o you can't deploy processing to exploit it?
  o you can't fit together the pieces of the puzzle?
• A stovepipe in a cloud is still ...
• A stovepipe

Harder Unstructured Problems

• Imagine you wanted to cluster ...
• New items show up
• Need to find the 'best' existing cluster
  o It could be 'anywhere'
• Need to update to reflect each new item
• (If you're wondering what we're clustering ...)
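
The slide does not name a clustering method, so this sketch substitutes a simple threshold-based online ("leader"-style) clustering over numeric vectors. Each new item is compared against every centroid, joins the nearest one if it is within `threshold` (a hypothetical parameter), and that centroid is updated as a running mean; otherwise the item starts a new cluster.

```python
import math


class OnlineClusterer:
    """Toy incremental ("leader"-style) clustering over numeric vectors."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold  # max distance for joining an existing cluster
        self.centroids: list[list[float]] = []
        self.sizes: list[int] = []

    def add(self, item: list[float]) -> int:
        # The best cluster could be anywhere, so every centroid is compared.
        best, best_dist = -1, math.inf
        for i, centroid in enumerate(self.centroids):
            d = math.dist(item, centroid)
            if d < best_dist:
                best, best_dist = i, d
        if best == -1 or best_dist > self.threshold:
            # Nothing close enough: the item starts a new cluster.
            self.centroids.append([float(x) for x in item])
            self.sizes.append(1)
            return len(self.centroids) - 1
        # Update the chosen centroid as a running mean to reflect the new item.
        n = self.sizes[best] + 1
        self.centroids[best] = [c + (x - c) / n for c, x in zip(self.centroids[best], item)]
        self.sizes[best] = n
        return best


clusterer = OnlineClusterer(threshold=1.0)
print([clusterer.add(v) for v in ([0.0, 0.0], [0.5, 0.0], [10.0, 10.0])])
```

On one machine this is a simple loop. Spread across a cloud, the scan over all centroids and the shared, mutable centroid list are exactly the parts that do not shard cleanly.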

Rosette: Concrete Examples

• Straight search
  o Rosette Solr plugins work all the same
  o SolrCloud hashes/shards
  o Rosette runs on the target node
• Extraction and similar processes
  o Same story, using a Solr UpdateRequestProcessor

Rosette and Hadoop

• Stateless APIs lead to a simple implementation
• Non-code resources lead to some issues
• Stateful processes (e.g. RNI) ... back to Solr
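
One common way to keep a per-record API stateless while still depending on a non-code resource (a dictionary or model file) is to load that resource once per worker process and reuse it. The sketch below assumes a hypothetical file name `lexicon.txt` and does not actually read from disk; `load_lexicon` and `map_record` are illustrative names, not part of any Rosette or Hadoop API.

```python
from functools import lru_cache

load_count = 0  # how often the resource is actually loaded in this process


@lru_cache(maxsize=None)
def load_lexicon(path: str) -> frozenset[str]:
    # Stand-in for reading a dictionary or model file that must exist on the node.
    # "path" is a hypothetical name; nothing is read from disk in this sketch.
    global load_count
    load_count += 1
    return frozenset({"cloud", "hadoop", "solr"})


def map_record(record: str, lexicon_path: str = "lexicon.txt") -> list[str]:
    # The per-record logic stays stateless; only the resource load is cached.
    lexicon = load_lexicon(lexicon_path)
    return [w for w in record.lower().split() if w in lexicon]


print(map_record("Solr on Hadoop"), map_record("search in a cloud"), load_count)
```

The code stays trivial; the operational issue is making sure the resource file is present on every node that runs it.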