What’s all this cloud stuff, anyway? What kinds of problems do organizations set out to solve with ‘a cloud,’ or even ‘the cloud’? What are a few of the major government initiatives involving this technology? How does HLT in general, and Search in particular, fit? This talk will take a tour of the technology behind clouds and the sometimes-foggy ambitions of the projects that use them, and look in particular detail at the challenges of applying cloud technologies to Text Analytics. View more slides from the Human Language Technology Conference 2012 here: http://info.basistech.com/hlt-2012-slides
Basis Technology – Human Language Technology Conference 2012 1
Clouds, Search or HLT The 'forecast'?
Benson Margulies Executive Vice President and Chief Technology Officer
Meteorology - or - Why Clouds
• Lie on the grass and look up at the clouds
• Everyone sees something different
• Computerized Clouds are no different
• Applications Always Available
• Data Always Available
• Tools for Processing Big Data
Big Data and Clouds =~ Hadoop
• It's not just a matter of size
• Hadoop ...
  o Takes in structured data sets
  o Optimizes stateless, batch processes
  o Moves computation to data
• All of which is great if that's what you have
• The world is more complicated than that
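To make "stateless, batch" concrete, here is a minimal sketch in plain Python rather than the Hadoop API. The function names (`map_record`, `reduce_counts`, `run_batch`) and the word-count task are illustrative assumptions, not something from the talk. The point is that each map call depends only on its own record, so the work can be shipped to wherever the data lives and the partial results merged in any order.

```python
from collections import Counter
from functools import reduce


def map_record(line: str) -> Counter:
    """Stateless map step: the result depends only on this one record."""
    return Counter(line.lower().split())


def reduce_counts(left: Counter, right: Counter) -> Counter:
    """Associative merge, so partial results can be combined in any order."""
    return left + right


def run_batch(records):
    """Run the whole batch: map every record, then fold the partial counts."""
    return reduce(reduce_counts, map(map_record, records), Counter())


if __name__ == "__main__":
    batch = ["clouds over Boston", "clouds and more clouds"]
    print(run_batch(batch).most_common(1))
```

Nothing in `map_record` needs to know about any other record, which is exactly the property that online, stateful tasks such as search relevancy lack.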
What it Doesn't Do So Easily
• On-the-fly (non-batch) processing
• Stateful, non-local processing
• For example, consider a search engine
  o All about online: a document arrives, users want to find it.
  o All about global state: relevancy involves global data across the whole index.
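One way to see the "global state" point is inverse document frequency, a common ingredient of relevancy scoring. The sketch below is an assumption-laden illustration: the `idf` helper uses one smoothed variant of the formula, and the two tiny "shards" are made-up data. It shows that the same term gets a different weight depending on whether the statistic is computed over one slice of the collection or over the whole index.

```python
import math


def idf(term, docs):
    """Smoothed inverse document frequency (one common variant, chosen for illustration)."""
    containing = sum(1 for doc in docs if term in doc.split())
    return math.log((1 + len(docs)) / (1 + containing)) + 1.0


# Hypothetical data: the same collection split across two nodes.
shard_a = ["cloud search", "cloud storage", "cloud index"]
shard_b = ["text analytics", "entity extraction", "cloud"]

if __name__ == "__main__":
    print("shard A only:", idf("cloud", shard_a))
    print("shard B only:", idf("cloud", shard_b))
    print("whole index :", idf("cloud", shard_a + shard_b))
```

A node that only sees its own slice computes a different weight for "cloud" than the whole index would, so scores from different nodes are not directly comparable without sharing statistics.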
More on Search-in-a-Cloud
• Good News: 'conventional' technologies scale to very large indices.
  o Solr
  o SolrCloud
  o Elastic Search
  o ...
• How? Shards.
  o 'hash' to split docs
  o queries go everywhere
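The two sub-bullets under "Shards" can be sketched in a few lines. This is a toy in-memory model, not how Solr, SolrCloud, or Elastic Search are implemented; the class name `ShardedIndex`, the choice of SHA-256 as the hash, and the whitespace tokenizer are all assumptions made for illustration. Documents are routed to exactly one shard by hashing their ID, while a query has to visit every shard and merge the answers.

```python
import hashlib
from collections import defaultdict


class ShardedIndex:
    """Toy inverted index split across N in-memory shards (illustrative only)."""

    def __init__(self, num_shards):
        # Each shard maps term -> set of document IDs.
        self.shards = [defaultdict(set) for _ in range(num_shards)]

    def shard_for(self, doc_id):
        """'hash' to split docs: a stable hash of the ID picks exactly one shard."""
        digest = hashlib.sha256(doc_id.encode("utf-8")).digest()
        return int.from_bytes(digest[:4], "big") % len(self.shards)

    def add(self, doc_id, text):
        postings = self.shards[self.shard_for(doc_id)]
        for term in text.lower().split():
            postings[term].add(doc_id)

    def search(self, term):
        """Queries go everywhere: ask every shard, then merge the hits."""
        hits = set()
        for postings in self.shards:
            hits |= postings.get(term.lower(), set())
        return sorted(hits)


if __name__ == "__main__":
    index = ShardedIndex(num_shards=4)
    index.add("doc-1", "clouds and search")
    index.add("doc-2", "search in a cloud")
    index.add("doc-3", "text analytics")
    print(index.search("search"))
```

Writes stay cheap because each document touches one shard. Reads pay the fan-out cost, and any score that depends on collection-wide statistics needs extra coordination between shards.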
Search-in-a-Cloud less good news
• Alternatives are still:
  o Limited
  o Research
  o or both
• Solandra
  o Scaling via Cassandra
  o 'just another sharded solution'
  o Just the thing if you like Cassandra
• or Accumulo
  o So far, very basic inverted index
  o better things coming
Other HLT tasks ...
• 'Extraction' is 'straightforward'
• Text comes in, entities or relationships come out.
• Results end up in graph DB or bigtable or ...
• Scale via Hadoop or whatever
• The Challenge of Mixing and Matching
• But ... what if you want a feedback loop?
Interoperation
• Lots of focus on applications
  o e.g. Ozone Widgets
• Not so much on backend processes
• What good is 'data everywhere' if:
  o you can't deploy processing to exploit it?
  o you can't fit together pieces of the puzzle?
• A stovepipe in a cloud is still........
• A stovepipe
Harder Unstructured Problems
• Imagine you wanted to cluster ...
• New items show up
• Need to find 'best' existing cluster
  o It could be 'anywhere'
• Need to update to reflect each new item
• (If you're wondering what we're clustering ...)
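The loop described above (item arrives, find the best existing cluster, update it) might look like the following single-machine sketch. The talk does not say which algorithm is meant, so everything here is an assumption: items are plain numeric vectors, "best" means highest cosine similarity to a centroid, and the `OnlineClusters` class and its `threshold` parameter are hypothetical names.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class OnlineClusters:
    """Incremental centroid clustering: assign each new item, then update state."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold  # minimum similarity needed to join a cluster
        self.centroids = []         # running mean vector per cluster
        self.sizes = []             # number of items per cluster

    def add(self, vector):
        # The 'best' cluster could be anywhere, so every centroid is compared.
        best, best_sim = None, -1.0
        for i, centroid in enumerate(self.centroids):
            sim = cosine(vector, centroid)
            if sim > best_sim:
                best, best_sim = i, sim
        if best is None or best_sim < self.threshold:
            # Nothing close enough: start a new cluster with this item.
            self.centroids.append(list(vector))
            self.sizes.append(1)
            return len(self.centroids) - 1
        # Update the chosen centroid to reflect the new item (running mean).
        n = self.sizes[best]
        self.centroids[best] = [
            (c * n + x) / (n + 1) for c, x in zip(self.centroids[best], vector)
        ]
        self.sizes[best] = n + 1
        return best


if __name__ == "__main__":
    clusters = OnlineClusters(threshold=0.8)
    for item in ([1.0, 0.0], [0.9, 0.1], [0.0, 1.0]):
        print(item, "-> cluster", clusters.add(item))
```

Both hard parts of the slide show up in `add`: the scan touches every centroid (non-local), and the update mutates shared state that the next item depends on (stateful). Neither fits a stateless batch job comfortably.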
Rosette Concrete Examples
• Straight Search
  o Rosette Solr Plugins work all the same
  o SolrCloud hashes/shards
  o Rosette runs on the target node
• Extraction and similar processes
  o Same story, using Update Request Processor
Rosette and Hadoop
• Stateless APIs lead to simple implementation
• Non-code resources lead to some issues
• Stateful processes (e.g. RNI) ... back to Solr