The Case for Lucene/Solr: A Manager’s Guide to Real World Open Source Search Applications By Lucid Imagination


The Case for Lucene/Solr: Real World Search Applications A Lucid Imagination White Paper • January 2010   Page ii

Abstract

In today’s information-driven environment, search is a critical solution to problems when it slashes the time and effort separating end users from the data they value. Search spans the range of business models and use cases, from driving direct customer sales to analytics and business intelligence, employee productivity, and reduced administrative overhead. Making the best use of search requires two perspectives: a look at the business requirements for a search application, and a view to new business opportunities created by using search to leverage the organization’s content resources.

Thousands of organizations across different sectors and business models have harnessed Apache Lucene/Solr to search their rapidly growing and diversifying content resources. Underlying this broad adoption is the extraordinary power, scalability, and versatility of open source search technologies.

This paper provides an overview of both the requirements and the opportunities for search applications. It then explores how real world organizations are successfully using Lucene/Solr search applications to meet those opportunities, presenting how the technology is used for specific business models and use cases across industries. In addition, it offers a baseline for setting search requirements that managers and architects can use to adopt Lucene/Solr, and adapt this open source search technology to the unique needs of their business.

© 2010, Lucid Imagination


Table of Contents

Introduction
Understanding Search Opportunities and Requirements
    What Data and Documents Are You Searching?
    Who Needs the Results and Why?
    Where Is Search Integrated with IT Infrastructure?
    How Is the Search Interface Presented to the User?
The Real World: Applications and Case Studies
    Yellow Pages, Local Search, and Searching Classifieds
    Media
    E-commerce
    Job and Career Sites
    Libraries, Archives, and Museums (LAMs) Search
    Social Media Search
    Enterprise (Intranet) Search
Business Use Case Matrix
Appendix: Lucene/Solr Features and Benefits


Introduction

As fast as companies, communities, and consumers produce data (about each other, products, opinions, research, and everything else imaginable), they need faster, more versatile search capabilities to find the information they need to create opportunities for competitive advantage. In today’s information-driven environment, search addresses the critical problems created by the explosive growth of content by slashing the time and effort users expend in finding data they value. Search spans the range of business models and use cases: from driving direct customer sales, to analytics and business intelligence, employee productivity, and reduced administrative overhead.

Apache Lucene/Solr [1] open source search technology has been implemented across the broadest range of applications and business models, and likely in ways that can fit the needs of your organization. In successful operation today at thousands of enterprises, Lucene/Solr technology scales from tens of thousands to hundreds of millions, and even billions, of documents; searches data that is structured, unstructured, or a combination of both; reaches data inside and outside the firewall; and ranges in use from a simple website search box through sophisticated faceted navigation. It addresses equally diverse business processes and mission critical applications. Across the spectrum, Lucene/Solr helps users find, make sense of, and act upon information quickly and efficiently.

In this white paper, we’ll review real-world case studies for Lucene/Solr functionality across business sectors to demonstrate its versatility and varied applicability. The diversity of examples provides strong evidence of Lucene/Solr’s flexibility and power as a search technology. The examples also attest to the innovation and transparency inherent to the open source development model. Our focus is on familiarizing the audience of business managers and application owners with existing Lucene/Solr applications; the substantial technical advantages to developers are covered elsewhere.

[1] Lucene and Solr are complementary technologies that offer very similar underlying capabilities; Solr is the Lucene Search Server. Since Lucene serves as the core of Solr’s search capabilities, this paper refers to the two as Lucene/Solr. For more information, see the Appendix.


We’ll first survey the key requirements and business use cases of search and then look at where they are built into search applications. Our objective is to provide business managers and application owners with a broad perspective on how Lucene/Solr search technology is used to build solutions to compelling business problems. In the Appendix, we provide an overview of Lucene/Solr’s key features and benefits, with a basic outline of the capabilities offered to meet the broadest range of business needs.

Understanding Search Opportunities and Requirements

Search technology has come a long way from its roots in matching keywords with appearance in documents and obtaining undifferentiated results. Search today empowers users by delivering actionable information quickly and efficiently, across multiple, diverse sources of data. The business use cases range from executing mission critical commercial transactions (e.g., e-commerce sites) to unlocking employee and end-user productivity in the search for a single relevant document (e.g., enterprise search).

Given the breadth of the problem domain, it’s useful to ask two fundamental questions about search: “How can it solve my business problems?” and “What new business opportunities can it create?”

In considering how search technology solves business problems, it is useful to start with an elucidation of the requirements you’ll need to consider for your search application. At the same time, be sure to look more broadly at the capabilities that Lucene/Solr offers, as it can help open up new frontiers for incorporating search and leveraging more value from data repositories.

Starting with some basic questions (what, who, how, and where), you can clarify the high-level business requirements specific to your business needs, which in turn allow you to make the best decisions for your search application. The process of looking at the fundamentals also raises new questions about how and where the search technology offered by Lucene and Solr can create new business opportunities.

Let’s look at four fundamental questions you should address in understanding search opportunities and requirements:

• What data and documents are you searching?
• Who needs the results and why?
• Where is search integrated with IT infrastructure?
• How is the search interface presented to the user?


What Data and Documents Are You Searching?

Business today is driven more than ever by the end-users’ creation and consumption of real-time information. A key differentiating capability of search technology is ingesting a broad range of content types and processing large collections of diverse data in real time in order to deliver actionable information. Two aspects to consider:

• Types of Content
Content comes in multiple formats: HTML pages, XML files, PDFs, images, PowerPoint presentations, Excel spreadsheets, Word documents, log files, multimedia content, and more. Content resides in various repositories, including databases, file servers, content management systems, archiving systems, collaboration applications, and employee desktops and laptops. Search technology must be able to locate, organize, and aggregate data whatever its form or location.

• Frequency of Updating Content
Organizations update content at varying intervals, driven by differing business processes and models. Social media or news applications have real-time content needs, whereas an e-commerce application might re-index on a batch basis in response to new inventory, and a research institution might add to its collection less often still. Search applications need to be adaptable to these differences in content change frequency.

Who Needs the Results and Why?

Business search puts a high priority on end user experience and on results in which the searched content is tuned to the unique needs of each user. After all, the human dimension (the usefulness of results and the efficacy of interaction) is the acid test of a search application. Internet search applications like Google, Yahoo, and Bing are now common and mature. They have raised user expectations about key qualities of the search experience, but they solve a very different problem.

While Internet searches can produce millions of results in milliseconds, they rely on measures like website popularity or URLs and domain names, which are neither relevant nor generally applicable to purpose-built applications for businesses. What’s more, they rely on generalizing relevancy for a global population of all Internet users, without being tied to business rules, business process logic, or the opportunity cost of improved precision for a specific set of data or search users.

Business search applications cannot rely on such coarse, brute-force approaches to tune their results. They need far more control and precision. They have to be able to deliver highly useful results while matching, if not exceeding, the levels of user experience that people have come to expect by virtue of their daily interactions with commercial search engines. Key points of consideration from a business perspective are:


• Relevance
Relevance is entirely a function of the goals of the search application’s users. The application must have the mechanisms to recognize the subjective needs of users and tune results accordingly. It must also provide easier ways to narrow search criteria without requiring users to come up with perfect query terms. Flexibility for drilling deeper will make results richer and more valuable. Mechanisms to apply filters, proximity values, and sorting parameters to narrow search scope can also lead to a richer set of more useful results, with less time and effort.

• Cost of Relevance
As business goals are driven by revenue opportunities and cost savings, it is critical to tie relevance to the economics of the business. For example, a public-facing retail site should focus on matching merchandise to search, site stickiness, and customer loyalty. It requires search technology that streamlines and simplifies the shopping experience, with relevant results directly contributing to sales revenue. For knowledge workers, internal search applications should help make employees more productive by reducing the amount of time and effort to find documents they need to do their jobs. Multiple studies show that information workers can spend 20–30% of their time searching for information.

• Precision Ranking
Result accuracy, sorted by attributes like relevance, date, field, or any document property, makes the search process better. End users generally abandon a search before tackling the fine points of Boolean logic or scrolling for a result buried too far down.

• Query Response Speed
Today, 5–7 seconds is the typical threshold for end-user patience. Too much wait time for search results frustrates users and causes them to abandon pages. Search technology must not let data influx or query overload stand in the way of fast, relevant results. Query response time should also work hand-in-hand with the refinement of multiple search attributes, so that increasingly complex queries do not exact a performance penalty.
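To make the filter, proximity, and sorting mechanisms mentioned under Relevance more concrete, here is a minimal Python sketch that assembles a Solr-style search request. The parameters `q`, `fq`, `sort`, and `rows`, and the `"phrase"~N` proximity syntax, are standard Solr/Lucene query features; the endpoint URL, field names, and the helper function itself are assumptions made up for illustration and would differ in any real deployment.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Hypothetical endpoint; a real installation will have its own host and path.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_query_url(phrase, slop, filters, sort_field, descending=True, rows=10):
    """Build a search URL with a proximity query, filters, and a sort order.

    phrase/slop -> q='"phrase"~slop' (terms must occur within `slop` positions)
    filters     -> one fq parameter per field:value pair (narrows the scope)
    sort_field  -> sort='<field> asc|desc'
    """
    params = [
        ("q", f'"{phrase}"~{slop}'),
        ("rows", str(rows)),
        ("sort", f"{sort_field} {'desc' if descending else 'asc'}"),
    ]
    for field, value in filters.items():
        params.append(("fq", f"{field}:{value}"))
    return SOLR_SELECT_URL + "?" + urlencode(params)


# Field names below (category, in_stock, price) are illustrative assumptions.
url = build_query_url(
    "running shoes", 3,
    {"category": "footwear", "in_stock": "true"},
    "price", descending=False,
)
print(url)
print(parse_qs(urlsplit(url).query))
```

The user only typed a phrase; the filters and sort order come from the application, which is one way to narrow results without asking users for perfect query terms.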


Where Is Search Integrated with IT Infrastructure?

Useful, valuable search technology rarely exists in isolation. Searched data is transformed into actionable information when it is integrated with the organization’s information infrastructure, from business processes to business intelligence to content management systems. A robust search technology must be customizable to integrate with the existing systems seamlessly.

• Application Integration
A key requirement for a search application is its extensibility for integration with existing infrastructure and applications like content management systems, databases, and the full range of business processes and applications. It should have interfaces that support ingestion of data as well as delivery of results in readily consumable formats, because in many cases results are consumed by other applications, not a human.

• Scalability
We can assume that data will change and grow, so scalability is a key factor for a search application. Applications should grow to address future needs without penalties for the breadth of data or for the count of documents indexed. The search application should be able to grow with the requirements of the organization, without needing additional large investments in hardware to match the pace of growth. Proprietary search vendors often charge for search by the number of documents indexed. In a world where constantly expanding content is the norm, such costs can be a real and substantial drag on the cost of ownership for search applications, many times resulting in negative return.

• Security
Every organization has its own security requirements and access controls. Search technologies need to comply with the security policies of the enterprise, controlling results that have restricted access. The search technology should also be able to make use of document-level security from other sources.

How Is the Search Interface Presented to the User?

The user interface is where search delivers on findability and presents actionable results. The search application is only as good as the convenience of submitting queries, reviewing and refining results, and finding information. Key aspects to consider:


• Navigation
Users benefit from guidance that makes their queries more productive. Techniques such as faceted search with result clustering, advance hinting (“did you mean”), “more like this,” and drop down menus for setting search scope help users achieve desired results faster, making a search application both user- and information-friendly. It is also important to allow users to draw associative connections between results, using the technology to uncover relationships and discover more about what they were seeking than they knew at the outset.

 

The Netflix search application is powered by Solr; it adds a fuzzy dimension to search, with auto-completion of movie names, correction of misspelled names of actors, and suggestion of the titles closest to the query. As a result, 85% of users have found the movie they were looking for ranked at the #1 spot in the results.

• Discovery
Search application functionality should extend beyond the generic presentation of a result list of documents that contain a keyword. Highlighting keywords in searched results, expanding searches with synonyms and spell checking, and offering users ways to learn a bit more about documents in the results without having to load the document are great ways to significantly improve usability.

• Intuitive Intelligence
Search applications must go beyond keyword search to help users retrieve accurate information even when they are not sure of the best keywords. Additionally, they should reduce misinterpretations where homonyms, spelling errors, and ambiguous keywords are involved (e.g., is “apple” a fruit or a computer company?).
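The navigation aids described above (facets, drill-down, and “did you mean” hints) can be illustrated with a small, self-contained Python sketch. It is a toy model of the concepts using only the standard library, not how Lucene/Solr implements them; the sample records, field names, and function names are all hypothetical.

```python
from collections import Counter
from difflib import get_close_matches

# Hypothetical catalog records; titles and field names are made up for this sketch.
DOCS = [
    {"title": "The Matrix", "genre": "sci-fi", "year": 1999},
    {"title": "The Matrix Reloaded", "genre": "sci-fi", "year": 2003},
    {"title": "Casablanca", "genre": "drama", "year": 1942},
    {"title": "Blade Runner", "genre": "sci-fi", "year": 1982},
]


def facet_counts(docs, field):
    """Count how many results fall under each value of a field (a facet)."""
    return Counter(doc[field] for doc in docs)


def drill_down(docs, field, value):
    """Narrow the result set to one facet value, as a user clicking a facet would."""
    return [doc for doc in docs if doc[field] == value]


def did_you_mean(query, vocabulary, cutoff=0.6):
    """Suggest the closest known term for a possibly misspelled query, or None."""
    matches = get_close_matches(query.lower(), vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None


vocabulary = [doc["title"].lower() for doc in DOCS]
print(facet_counts(DOCS, "genre"))
print([doc["title"] for doc in drill_down(DOCS, "genre", "drama")])
print(did_you_mean("casablanka", vocabulary))
```

The facet counts tell users how many results sit behind each choice before they click, and the spelling hint recovers from a near-miss query instead of returning an empty page.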


The Real World: Applications and Case Studies

With an understanding of the fundamentals of search business applications in hand, it is helpful to gain additional context on business usage through a survey of organizations that have successfully used Lucene/Solr for powerful search applications.

All of these cases were built on the capability of Lucene/Solr to provide innovative, high-performance, cross-platform, feature-rich search technology suitable for nearly every application. By powering diverse search applications for thousands of organizations such as AT&T, Zappos, McClatchy, Smithsonian, MTV Networks, LinkedIn, MySpace, Comcast, Monster, Netflix, and many more, Lucene/Solr has provided mission critical capability that turns search into a robust competitive advantage.

For these organizations, Lucene/Solr solutions regularly index and search hundreds of millions of documents with subsecond response time, unencumbered by costly licensing or vendor lock-in. Together they represent a compelling argument for the broad applicability of Lucene/Solr across the full range of business opportunities and search needs. Business use case studies we’ll review include:

• Yellow Pages, Local Search, and Searching Classifieds
• Media
• E-commerce
• Job and Career Sites
• Libraries, Archives, and Museums (LAMs) Search
• Social Media Search
• Enterprise (Intranet) Search


Yellow Pages, Local Search, and Searching Classifieds

In the business of online local search, geographic (location-based) relevance generates competitive advantage. Online directories need to provide a rich, interactive search experience to users to increase site views and stickiness, which in turn translates into increased advertising revenue. Simplified location-based search, intuitive faceted query response, and data mashups are a few features that define search functionality for an online directory.

Lucene/Solr solutions offer accurate search results, factoring in location, users’ reviews, and ratings, alongside paid advertising. By taking advantage of Solr’s open source model, in which the search algorithms are completely transparent, companies can invest in configuring their search solutions to match their business logic, rather than trying to infer proprietary back-end logic or paying for exposure to it.

   

 

 

Success Stories

• YP.com, a division of AT&T Interactive
• Zvents.com, local event search service
• Yelp.com, the community local search site

Requirements

• Intelligent results going beyond keyword search
• Deeper, faceted navigation
• Seamless integration with latest Web 2.0 tools
• Lower IT-related costs
• Geocentric user experience
• Search numeric values

Solr Solution

• Customizable search index that can be tuned transparently to account for key findability drivers
• Drop down filters for narrowing or widening the scope of search
• Seamless integration with existing technologies
• Native numeric encoding and search capabilities
• Reduced server footprint for lower TCO than most commercial vendors

Internet Yellow Pages and local online search is forecast to grow to $27.8 billion in 2011.
Source: The Kelsey Report (The Kelsey Group’s Global Print Yellow Pages, Internet Yellow Pages and Local Search Five Year Outlook)


Case Study 1: yp.com by AT&T Interactive

AT&T Interactive is an online and mobile search and advertising company. Their leading-edge portal, yp.com (an online business listing and advertising site), was originally implemented with a commercial proprietary search application. It faced issues of scalability, vendor lock-in, and performance. With help from Lucid Imagination, AT&T successfully migrated to a Solr-based search solution that leveraged the flexibility of open source without compromising features and functionality. And they did so with a much smaller budget.

Business Needs

• Addressing the need to factor in location to support geographic search, and include relevant comments
• Striking a balance between organic search and advertised content
• Indexing highly unstructured content such as user comments
• Increasing relevancy of results and boosting paid search results for preferential placement of advertisers
• Linguistic support to improve the search experience, such as spellchecking, synonyms, and find-similar
• Integrating with latest Web 2.0 tools
• Reducing server footprint

The Solr Solution

• Context-specific relevancy, geographic proximity, ad placement, and user comments
• Faceting and drop down filters to narrow or widen the scope of search
• Functional support for creating new features
• Spell-correction and location-optimized search results that show users the businesses nearest to them first
• Seamless integration with many Web 2.0 tools to create innovative features and mashups
• Lower TCO, with the number of search servers reduced from 120 to two dozen
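As an illustration of what “businesses nearest to them first” means in practice, the following Python sketch orders listings by great-circle distance from the user. This is a generic haversine calculation written for this example, not the method used by yp.com or by Solr’s own spatial features; the listing names, coordinates, and function names are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_first(listings, user_lat, user_lon):
    """Return listings sorted by distance from the user, closest first."""
    return sorted(
        listings,
        key=lambda item: haversine_km(user_lat, user_lon, item["lat"], item["lon"]),
    )


# Hypothetical business listings with approximate coordinates.
LISTINGS = [
    {"name": "San Jose Plumbing", "lat": 37.3382, "lon": -121.8863},
    {"name": "Downtown SF Plumbing", "lat": 37.7750, "lon": -122.4195},
    {"name": "Oakland Plumbing", "lat": 37.8044, "lon": -122.2712},
]

# A user located in San Francisco.
ranked = nearest_first(LISTINGS, 37.7749, -122.4194)
print([item["name"] for item in ranked])
```

A production directory would combine distance with the other signals listed above, such as text relevance, ratings, and paid placement, rather than sorting on distance alone.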


Media

Brand reinforcement, premium content, and easy accessibility are the main business motivators for online media and publishing companies. Relevant information improves time on the site and encourages users to explore related content, boosting subscription rates and site views. These translate into a virtuous cycle of additional revenue generation. Given that content is the business, the need for a robust search application ties directly to competitive advantage.

Lucene/Solr provides a customized, function-rich solution for the media and publishing industry. It addresses the dynamic challenges of content diversity, content freshness, and content acquisition, and gives companies a platform on which to build a world-class, innovative search experience to differentiate themselves in a highly competitive marketplace.

 

 

 

 

       

 

Success Stories

• McClatchy Newspapers
• Netflix
• Comcast Interactive
• MTV Networks, a division of Viacom
• The Motley Fool, fool.com
• Fanfeedr.com, personalized sports aggregator

Requirements

• Real-time indexing of petabytes of structured and unstructured data
• Deeper search capability
• Improved query response time
• Reduced infrastructure and customization costs

Solr Solution

• Reverse indexing
• Intelligent, faceted search to enable contextual and linguistic relevance
• Easy configuration for parsing structured and unstructured data
• Easy and seamless installation for lower TCO
• Customization with open source code

“Solr has done wonders for us. It is easy to understand and deploy, and has reduced our costs drastically.”
Doug Steigerwald, McClatchy Interactive


Case Study 2: McClatchy, Leading Newspaper Publisher

The third largest newspaper publisher in the United States, McClatchy Company owns 30 daily newspapers in 29 markets across the country. To win online, McClatchy knew it had to have a robust search solution to empower the McClatchy audience with the information they wanted, and to secure loyalty from readers and sponsorships from advertisers. Working with Lucid Imagination, McClatchy migrated from proprietary search software to open source and chose Solr for its high performance, comprehensive capabilities, and superior value.

Requirements

• Proliferating content and data sources (text, video, audio, images), with real-time streaming
• Empowering end users with ease of use
• Supporting peak traffic and popular search spikes with consistent performance
• Providing scalability for a database growing by orders of magnitude annually
• Providing flexibility to support customization
• Controlling IT costs while exceeding performance benchmarks of competition

The Lucene/Solr Solution

• Deeper content by indexing both structured and unstructured data in real time, effortlessly
• Indexes millions of documents, with search results delivered in milliseconds
• User-friendly navigation with drop down filters, faceted navigation, linguistic corrections, etc.
• Excellent performance, even in peak hours, by load-balancing search requests across servers
• Scalability without impact on performance
• High degree of customization, since it’s open source
• Integration with existing IT infrastructure and elimination of associated license fees to cut costs
• 8-fold reduction in server footprint


E-commerce

E-commerce businesses must provide a compelling shopping experience in order to maintain brand equity and thrive in a highly competitive market. By reducing the time and effort customers need to navigate available merchandise and find what they want, superior search contributes directly to a satisfying buying experience, which in turn translates into higher revenues and customer loyalty. Instant, intuitively organized results; advanced faceting for easy browsing; results synchronized with images; and integration with user ratings are among the must-have features of an e-commerce search application.

Lucene/Solr gives companies the ability to build their sites around the concept of “searchendizing” (putting the desired merchandise at the top of the results list), which can make the difference between sales made and sales lost. Faceting, database integration, real-time indexing, and query monitoring all help users find the products they want, driving conversion rates and enabling a winning online experience. 2

 

 

Success Stories

• Buy.com
• Sears.com
• Macys.com
• Zappos.com
• Advanceautoparts.com
• Dollardays.com

2 “Consumers will spend more than $340 billion online by 2013, says Forrester,” Internet Retailer, 27 November 2009, http://www.internetretailer.com/dailyNews.asp?id=32630.

Online retail sales in the B2C market are expected to reach $340 billion by 2013. 2

Forrester Research

Requirements

• Multidimensional, dynamic search
• Faster results
• Real-time indexing of products
• Faceting and browsing capabilities
• Seamless integration with existing IT infrastructure

Solr Solution

• Faceted search for deeper drill-down and browsing
• Intuitive search capabilities for a cross-channel shopping experience
• System administration tools for data loading, index replication, monitoring, logging, and cache management
• Query monitoring for better highlighting of popular products
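As a rough illustration of the faceted search listed above, the following Python sketch builds the URL for a Solr select request that asks for facet counts alongside the results. The host, the path, and the field names (brand, color, size) are assumptions made for this example only; q, fq, rows, facet, facet.field, and facet.mincount are standard Solr request parameters.

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint; adjust host and path for a real deployment.
SOLR_SELECT = "http://localhost:8983/solr/select"

def faceted_query_url(keywords, facet_fields, filters=None, rows=10):
    """Build a Solr select URL that returns results plus facet counts."""
    params = [
        ("q", keywords),
        ("rows", str(rows)),
        ("facet", "true"),
        ("facet.mincount", "1"),  # hide facet values that match nothing
        ("wt", "json"),
    ]
    # One facet.field parameter per attribute the shopper can drill into.
    params += [("facet.field", name) for name in facet_fields]
    # Each filter query (fq) narrows the result set without affecting scoring.
    params += [("fq", f"{field}:{value}") for field, value in (filters or {}).items()]
    return SOLR_SELECT + "?" + urlencode(params)

url = faceted_query_url("running shoes", ["brand", "color", "size"], {"color": "red"})
print(url)
```

In a storefront, each facet value in the response would typically be rendered as a clickable filter, and clicking one would add a matching fq to the next request.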

 


                 

Case Study 3

Zappos

Zappos is the premier destination for online shoe shopping. Its mission is excellent online customer service: customers should be able to browse shoe styles, sizes, shapes, and colors more easily than at any other shoe store, online or off. To achieve this, Zappos wanted a robust, flexible, multifunctional search solution. After evaluating many commercial search technologies, Zappos settled on Solr, working with Lucid Imagination to ensure a continued, successful deployment.

Requirements

• Simplified, attractive user experience that makes it easy to find and buy
• Relevant results, fast
• Navigation across attributes such as size, color, and style, for broader and deeper results
• Indexing products as they are entered in the catalogs
• Cross-functional navigation to give customers a realistic shopping experience
• Intuitive intelligence to provide alternate suggestions
• Analytical capabilities to drive business strategy
• Control over results
• Integration with existing IT infrastructure

The Solr Solution

• Subsecond search results across categories
• Faceting, for easy browsing and discovery and a compelling user experience
• Real-time indexing of products
• Synchronization of visuals, specs, filters, and promotions to make the shopping experience true to life
• Information on user activity to help build strategy for product promotions
• Controls to rank popular or high-stock products in results where users are more likely to buy them
• Integration with a heterogeneous open source environment
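One common way to express ranking controls of this kind in Solr is the boost query (bq) parameter of the DisMax query parser. The sketch below is illustrative only and does not describe Zappos's actual configuration; the field names (name, description, in_stock) and the boost values are hypothetical.

```python
from urllib.parse import urlencode

def boosted_params(user_query):
    """Request parameters that favor in-stock products for a DisMax query."""
    return {
        "q": user_query,
        "defType": "dismax",           # use the DisMax query parser
        "qf": "name^2.0 description",  # search both fields; name matches weigh double
        "bq": "in_stock:true^5.0",     # boost query: lift products that are in stock
    }

query_string = urlencode(boosted_params("leather boots"))
print(query_string)
```

Because the boost is a request parameter rather than part of the index, merchandisers could adjust it without reindexing the catalog.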


   

Job and Career Sites

Job portals are countercyclical to the economy. When the economy flourishes, posted jobs grow in number; when it sags, candidates flock in to post their résumés. Success for an online job portal is tied to the efficiency of its search capability, which must match résumés to job listings and vice versa so that both employers and prospective employees can zero in on just the right opportunity.

For example, an employer may want to narrow the scope of a candidate search with filters such as education, previous employer, salary history, and skill sets. A job seeker may want to expose these attributes but keep a current employer’s name confidential, or may want to apply only to jobs within a particular geographic area.

Lucene/Solr not only provides such flexibility but also addresses other complexities of this industry: linguistic intelligence (handling identical acronyms that correspond to different entities, variations in spelling, and imperfectly constructed search queries), indexing of unstructured data (résumés), and management of ever-growing data.
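To make the filtering and spelling tolerance described above more concrete, here is a minimal Python sketch that assembles the parameters for one such search. It relies on standard Lucene query syntax (a trailing ~ for fuzzy matching and [a TO b] for ranges) and on Solr's fq filter parameter; the field names skills, location, and salary are hypothetical.

```python
from urllib.parse import urlencode

def job_search_params(skill, city, min_salary, max_salary):
    """Combine a fuzzy keyword match with location and salary filters."""
    return [
        # A trailing ~ requests a fuzzy match, tolerating small misspellings.
        ("q", f"skills:{skill}~"),
        # Filter queries restrict the result set; quotes keep multi-word cities intact.
        ("fq", f'location:"{city}"'),
        # Inclusive range; assumes a numeric field type that supports range queries.
        ("fq", f"salary:[{min_salary} TO {max_salary}]"),
    ]

params = job_search_params("javascrpt", "San Francisco", 80000, 120000)
print(urlencode(params))
```

Privacy rules, such as hiding a current employer's name, would be handled separately, for instance by not returning that field to employer-facing searches.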

           

Success Stories

• Monster
• The Big Jobs
• eBharatJobs
• Careerjet

Requirements

• Linguistic intelligence for more relevant results
• Control over search results to maintain privacy
• Deeper search capability
• Numeric search
• Faster query response
• Reduced infrastructure and customization costs

Solr Solution

• Intelligent, faceted search to enable contextual and linguistic relevance
• Easy configuration for parsing structured and unstructured data
• Easy and seamless installation for lower TCO
• Business process integration and customization with open source code

“I think the breakthrough was when we tried it, and we realized, wow, this thing could really scale.”

Peter Keegan, Monster.com

 

Case Study 4

Monster.com

Monster is the largest job search engine in the world, with over a million jobs posted at any one time. By 2008 it had 150 million résumés in its database and was serving over 63 million job seekers per month; it now runs an average of 300 to 400 queries per second with an average response time of 40 milliseconds. To provide the highest level of service and support to its customers, both employers and job seekers, Monster maintains an unmatched marketplace for employment opportunities, with Lucene-based search at the heart of its business model.

The Requirements

• Managing high volumes of data, increasing by double-digit percentages annually
• Maintaining constant inventory updates and providing faster results
• Removing technological barriers that limit the scope of information
• Enabling end users to refine searches and drill deeper without any performance impact
• Providing security controls to ensure end user privacy
• Facilitating scalability and flexibility in tandem with the company’s vision and growth plans

The Lucene Solution

• Handles high volumes of data by clustering data to reduce the index size
• Real-time indexing for fresher, faster query results
• Intuitive search to enable in-depth, cross-functional job and résumé browsing
• Faceted search and “single click” filters for search refinement
• Security controls to manage user information
• Unlimited scalability and customization, leveraging open source licensing

Libraries, Archives, and Museums (LAMs) Search

The core asset of educational and research institutions is knowledge archived and accumulated over decades. In the world of academic search, the diversity of information returned for any query (text, illustration, audio/video media, or data in any other format) makes unstructured formats a key aspect of the searchable archive.

Lucene/Solr gives academic and research institutions the power to turn information into knowledge by going beyond keyword-driven search to expose a rich variety of results and exploration. Based on the open source model, it not only integrates with the existing IT infrastructure but also leverages existing classification hierarchies to give structure to terabytes of information spread across disparate collections, significantly reducing overhead and enabling flexible and scalable deployment.

               

Success Stories

• Smithsonian Institution
• Europeana, the European Union online cultural archive
• The US Library of Congress and World Digital Library
• Stanford University Library
• University of Michigan Graduate Library

Requirements

• Management of multiple formats of data and documents
• Customization and scalability
• Linguistic support in queries
• Faster results

Solr Solution

• Optimized index infrastructure limits size without compromising speed or flexibility
• Easy customization for implementing taxonomy rules
• Faceted search to narrow results to a specific source across diverse sets of data
• Instant results
• Seamless integration with IT infrastructure for lower TCO

“With Solr, you can do so many things without writing a lick of code. I hadn’t realized how easy it is to extend our custom request handler, response writer, and update handler. Just move it all to Solr and let it do the heavy lifting.”

Sjoerd Siebinga, Europeana


 

Case Study 5

Smithsonian

The Smithsonian Institution is the flagship museum collection of the United States, supporting a research institute that provides “one-stop” searching across 2 million records, including nearly a quarter of a million media files (images, online journals, and other resources), distributed across dozens of archives, databases, museums, and libraries. To make this treasure of information easily accessible to people, the Smithsonian needed an efficient search solution that could overcome the following challenges.

The Challenges

• Managing a complicated taxonomy that could no longer accommodate a growing data index
• Indexing disparate types of content, including documents, videos, and images
• Making information available from a large database
• Providing access controls to restrict information
• Integrating with existing legacy tools

The Smithsonian chose Lucene/Solr and worked with Lucid Imagination to create an optimized, well-designed solution.

The Solr Solution

• Efficient index strategy to manage a mix of structured and unstructured data
• Holistic search, with the configuration optimized to reduce the number of servers and handle query requests better
• Filtering of information through faceted search
• Access controls to restrict information based on membership profiles
• Integration with the existing IT infrastructure
• Guidance and assistance on setting up a replicated search environment

   


Social Media Search

Search solutions must support differentiated business models matching Web 2.0 innovations, including user-generated content and mashups, without compromising scalability. That is a challenge, given the virtually limitless content on the Internet. Success and differentiation are measured by how well a site provides relevant results to grow its user base and keep users engaged.

Increasingly, the technological factors driving Web 2.0 application paradigms are finding their way into the enterprise, unlocking collaboration and productivity in new ways that challenge conventional organizational bounds. These new ways of working rely in equal measure on search to create the connections between employees that enable discovery, cross-pollination, and more efficient collective effort.

Lucene/Solr not only provides fast results but also facilitates flexible, intuitive navigation to help end users connect with others. It boosts the reach and performance of search while cutting implementation costs and lowering barriers to innovation.

Success Stories

• Digg
• MySpace
• LinkedIn
• Reddit
• Technorati
• Scout Labs
• Xmarks.com

Requirements

• Deliver search results as soon as content is available
• Deeper drill-down capabilities
• Intuitive interface

Lucene/Solr Solution

• Near-instant results with segmentable indexing
• Intuitive search
• Data-driven spellchecking based on user search histories
• Linguistic support through “Did you mean” functionality
• Highlighting keywords
• Deeper drill-down with faceting
• Real-time content updating
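The idea behind data-driven “Did you mean” suggestions can be sketched in a few lines of Python. This example uses the standard library's difflib as a stand-in for a search engine's spellchecking component; the function name, the similarity cutoff, and the sample query history are all assumptions made for illustration.

```python
from collections import Counter
from difflib import get_close_matches

def did_you_mean(query, past_queries, cutoff=0.8):
    """Suggest the most frequent past query that closely resembles `query`."""
    counts = Counter(q.lower() for q in past_queries)
    query = query.lower()
    if query in counts:
        return None  # already a known query; no suggestion needed
    candidates = get_close_matches(query, list(counts), n=5, cutoff=cutoff)
    if not candidates:
        return None
    # Prefer the candidate that users have actually searched for most often.
    return max(candidates, key=lambda candidate: counts[candidate])

history = ["obama", "obama", "iphone", "iphone", "iphone", "twitter"]
print(did_you_mean("iphon", history))
```

Drawing suggestions from real search histories, rather than from a fixed dictionary, lets the suggestions track new names and slang as users begin searching for them.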

 

“With Solr, we really treat it as kind of a platform where we can build other kind of things on top of it… We have a very valuable set of data, and we really want to explore new ways of building new features from that data set.”

Sammy Yu, Digg.com


 

Case Study 6

Digg.com

Digg displays the wisdom of the crowds. Everything on Digg is submitted by the public community for the public community; by leveraging this mass collaboration of readers distributed across the Internet, Digg builds on the easy findability of information valued by the marketplace of readers and consumers.

Digg realized early on that to succeed in the business of information, it needed to make information available to its audience as effortlessly as possible. It identified the following requirements for a base search application:

Requirements

• Managing unstructured data (13 million documents and growing) in real time
• Providing results faster
• Facilitating smart navigation to provide information in digestible portions
• Recognizing and eliminating duplicate content
• Providing semantic and linguistic intelligence
• Facilitating scalability while containing costs

Digg selected Solr for its unmatched flexibility and functionality.

The Solr Solution

• Highly customizable and flexible
• Subsecond results, with simple-to-use pull-downs to refine results
• Fuzzy duplicate detection (through custom coding)
• Unlimited scalability and seamless integration with the heterogeneous environment
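Fuzzy duplicate detection can be approached in several ways. The Python sketch below shows one widely used, generic technique: compare documents by the overlap (Jaccard similarity) of their word shingles. It is not a description of Digg's implementation; the shingle size, threshold, and sample texts are illustrative assumptions.

```python
def shingles(text, size=3):
    """Return the set of overlapping word n-grams ("shingles") in `text`."""
    words = text.lower().split()
    if len(words) < size:
        return {tuple(words)}
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}

def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def is_near_duplicate(doc_a, doc_b, threshold=0.6):
    """Treat two documents as duplicates when their shingle sets mostly overlap."""
    return jaccard(shingles(doc_a), shingles(doc_b)) >= threshold

a = "Apache Solr is an open source search server built on Lucene"
b = "Apache Solr is an open source search server built on top of Lucene"
c = "Digg lets readers submit and vote on stories from across the web"
print(is_near_duplicate(a, b), is_near_duplicate(a, c))
```

Comparing every pair of documents this way does not scale to millions of submissions; production systems usually reduce each document to a compact signature first and compare signatures instead.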


Case Study 7

LinkedIn

Connecting 50 million registered users from 200 countries across 170 industries and matching them to the right professional contacts is what LinkedIn is all about. LinkedIn’s business is premised on an intelligent search application that could overcome the following challenges.

The Challenges

• Managing an ever-growing database, with one new member joining and creating a profile every second
• Indexing unstructured data in real time
• Giving instant query responses, even in peak traffic hours
• Providing intuitive navigation and intelligent linguistic support
• Integrating with other Web 2.0 tools to build user profiles that integrate data from multiple sources

LinkedIn chose Lucene to implement the search function at the core of its business model.

The Lucene Solution

• Used index segmentation for faster results and to limit index size
• Provided faceted search and intelligent support features, such as changing the view of search results and auto-completion of contacts
• Calculated relative relevance, ranking results on the fly based on the relationship between the user’s profile and the other profiles being searched
• Integrated with the latest web tools; for example, incorporating videos in search results
• Provided a “scale as you grow” facility through the flexibility of the open source model
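The notion of relative relevance mentioned above, where the same query ranks differently for different searchers, can be illustrated with a small Python sketch. This is a simplified assumption about how such ranking might work, not LinkedIn's algorithm; the boost values and sample data are hypothetical.

```python
def rerank_by_relationship(hits, degree_of, boosts=None):
    """Re-order search hits so that closer connections rank higher.

    hits      -- list of (profile_id, text_score) pairs from the search engine
    degree_of -- dict mapping profile_id to connection degree (1 = direct contact)
    boosts    -- multiplier applied per degree; the defaults are illustrative only
    """
    boosts = boosts or {1: 2.0, 2: 1.5, 3: 1.1}
    scored = [
        (profile_id, score * boosts.get(degree_of.get(profile_id), 1.0))
        for profile_id, score in hits
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

hits = [("alice", 1.0), ("bob", 0.9), ("carol", 0.8)]
degrees = {"bob": 2, "carol": 1}  # alice is outside the searcher's network
ranked = rerank_by_relationship(hits, degrees)
print([profile_id for profile_id, _ in ranked])
```

Here the direct contact moves to the top even though her text score is the lowest, which is the behavior a relationship-aware ranking is meant to produce.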


 

Enterprise (Intranet) Search

Enterprises today have a global footprint, which leads to the creation of multiple content types and the use of disparate applications and content management systems across business centers. The result is often silos of unmanaged data spread across the enterprise intranet, a situation in which information is omnipresent but cannot be used.

To achieve a competitive advantage, enable intelligent decision making, eliminate duplication of work, and lower the cost of ownership, enterprises need a search application that gives structure to unstructured data and provides a single gateway to search across multiple enterprise repositories with speed, flexibility, and intuitive intelligence.

Lucene/Solr is a solid match for enterprise search. As a customizable and multifunctional search application, Lucene/Solr provides robust search features at minimal cost. The open source development model behind Lucene/Solr makes for seamless integration with legacy tools and brings down the total cost of ownership significantly.

Given the sensitive nature of enterprise content, Lucene/Solr facilitates document-level, role-based security. And with transparent search algorithms and configurable relevancy, Lucene/Solr enables intranet search with the precise control enterprise content owners require, ensuring that results consistently deliver the right documents to the right people.
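One common pattern for document-level, role-based security is to index the roles allowed to read each document and then add a filter to every query. The Python sketch below assumes a hypothetical multi-valued field named acl; it shows the pattern only and is not a complete security design.

```python
from urllib.parse import urlencode

def secured_params(user_query, user_roles):
    """Add a filter query so only documents readable by the user's roles match."""
    if not user_roles:
        raise ValueError("refusing to search without at least one role")
    # Assumes each indexed document carries a multi-valued `acl` field listing
    # the roles allowed to read it (a hypothetical schema choice).
    role_clause = " OR ".join(f'"{role}"' for role in sorted(user_roles))
    return [("q", user_query), ("fq", f"acl:({role_clause})")]

params = secured_params("quarterly forecast", {"finance", "executives"})
print(urlencode(params))
```

The filter must be added on the server side by the application, never accepted from the browser, since a user who can edit the request could otherwise widen their own access.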

 

 

 

 

Requirements

• Single interface to access enterprise data
• Faster results
• Control over search results
• Ready integration with existing content management software

Solr Solution

• Single gateway for all types of data
• Dynamic boosting of content
• Transparent search algorithms and relevancy tuning
• Customization and easy integration with open source code

“The search and discovery software market grew 19 percent in 2008 to $2.1 billion.”

Sue Feldman, IDC

 


Case Study 8

Food and Drug Administration

The Food and Drug Administration (FDA) is a U.S. government agency responsible for regulating and supervising the safety of foods, medications, veterinary products, tobacco, and cosmetics. The FDA has a large repository of information that dates back multiple decades and exists in formats ranging from early optical character recognition to recent electronic formats. To mine this knowledge base, the FDA is developing a semantic mining framework using open source tools such as Apache Lucene and Solr.

Requirements

• Integrating petabytes of data highly distributed across the enterprise intranet
• Managing multiple indices for documents stored in distributed repositories
• Managing and maintaining archival data and evolving vocabularies
• Indexing unstructured data in real time
• Recognizing and eliminating duplicate content
• Handling concurrent queries and delivering fast and relevant results
• Restricting search results according to agency access control policies
• Integrating with existing infrastructure without additional overhead

The Lucene Solution

• A single gateway to search across multiple enterprise repositories
• Duplicate detection
• Fast and relevant results with content analysis and query interpretation algorithms
• Filtering of results based on the enterprise’s access controls and security policies
• Integration with existing enterprise infrastructure to reduce TCO

 

 


Business Use Case Matrix

To simplify mapping your search needs to existing search applications in the real world, the matrix below compares business use cases against key search requirements. While not an exhaustive list, the matrix highlights the different business use cases across sectors and business models, reflecting the adaptability of Lucene/Solr across the various domains of search applications and use cases.

 

Verticals | Users (Internal, Customer Facing) | Content (Original, Aggregated) | Content Update Frequency (High, Medium, Low) | Access Control

Enterprise (Intranet) √ √ √ √

Schools/ Universities

√ √ √ √ √ √

Education Libraries √ √ √ √ √

Job Portals √ √ √ √

Social Networks √ √ √ √ √

News √ √ √ √

Media Media √ √ √ √

E-Commerce Sites √ √ √ √ √ √

Financial Services √ √ √ √ √

Yellow Pages √ √ √

Horizontal Portals √ √ √ √

 


Appendix: Lucene/Solr Features and Benefits

Lucene and Solr are complementary technologies that offer very similar underlying capabilities. In choosing the search solution best suited to your requirements, the key factors to consider are application scope, development environment, and software development preferences.

Lucene is a Java-based search library that offers speed, relevancy ranking, complete query capabilities, portability, scalability, low-overhead indexes, and rapid incremental indexing.

Solr is the Lucene search server. It presents a web service layer built atop the Lucene search library, extending it to provide application users with a ready-to-use search platform. Solr brings with it operational and administrative capabilities such as web services, faceting, configurable schema, caching, replication, and administrative tools for configuration, data loading, statistics, logging, cache management, and more.

Lucene presents a collection of directly callable Java libraries and requires coding and solid information retrieval experience. Solr extends the capabilities of Lucene to provide an enterprise-ready search platform, eliminating the need for extensive programming.

Solr provides the starting point for most developers who are building a Lucene-based search application. It comes ready to run in a servlet container such as Tomcat or Jetty, making it ready to scale in a production Java environment.

With convenient REST-like web service interfaces callable over HTTP, and transparent XML-based configuration files, Solr can greatly accelerate application development and maintenance. In fact, Lucene programmers have often reported that they find Solr contains “the same features I was going to build myself as a framework for Lucene, but already very well implemented.” Using Solr, enterprises can customize the search application according to their requirements without the cost and risk of writing the code from scratch.
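To give a sense of what these HTTP interfaces look like in practice, the following Python sketch prepares the two most basic requests: an XML message that adds a document to the index, and a URL that searches for it. The host, port, and field names are assumptions for illustration; the XML follows Solr's add/doc/field update format, and the sketch only builds the requests rather than sending them.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

def solr_add_xml(doc):
    """Serialize a dict as a Solr XML <add> message for the update handler."""
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        field = ET.SubElement(doc_el, "field", name=name)
        field.text = str(value)
    return ET.tostring(add, encoding="unicode")

def solr_select_url(base, query, rows=10):
    """Build the GET URL for a keyword search against the select handler."""
    return f"{base}/select?" + urlencode({"q": query, "rows": rows})

xml_body = solr_add_xml({"id": "doc-1", "title": "The Case for Lucene/Solr"})
print(xml_body)
print(solr_select_url("http://localhost:8983/solr", "title:lucene"))
```

In a typical setup the XML would be sent as an HTTP POST to the update handler, followed by a commit, after which the document becomes visible to searches. Any language that can issue HTTP requests can integrate with Solr in the same way.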

Lucene provides greater control of your source code and works best in development environments where resources need to be controlled exclusively by Java API calls. It works best when constructing and embedding a state-of-the-art search engine, allowing programmers to assemble and compile it inside a native Java application. While working with Lucene, programmers can directly control its large set of sophisticated features, with low-level access and data or state manipulation.

Enterprises that do not require strict control of low-level Java libraries generally prefer Solr, as it provides ease of use and scalable search power out of the box.


As functional siblings, Lucene and Solr have become popular alternatives for search applications; the two differ mainly in the style of application development used. Key benefits of search with Lucene/Solr include:

• Search Quality: Speed, Relevance, and Precision. Lucene/Solr provides near-real-time search and strong relevance ranking to deliver contextually relevant and accurate results very quickly. Tailor-made coding for relevancy ranking and sophisticated search capabilities such as faceted search help users sort, organize, classify, and structure retrieved information so that search delivers the desired results. Search with Lucene/Solr also provides proximity operators, wildcards, fielded searching, term/field/document weights, find-similar functions, spell checking, multilingual search, and much more.

• Lower Cost and Greater Flexibility, Plug-and-Play Architecture. Lucene/Solr reduces recurring and nonrecurring costs, lowering your TCO. As open source software, it does not require purchase of a license and is freely available for use. The open source code can be used as is, modified, customized, and updated as appropriate to your needs. Solr is easily embedded in your enterprise’s existing infrastructure, reducing the costs of installation, configuration, and management.

• Open Source Platform for Portability and Easy Deployment. Because Lucene/Solr is an open source software solution, it is based on open standards and community-driven development processes. It is highly portable and can run on any platform that supports Java. For instance, you can build an index on Linux, copy it to a Microsoft Windows machine, and search there. This portability enables you to keep your search application in step with your company’s evolving infrastructure. Lucene has also been implemented in other environments, including C#, C, Python, and PHP. At deployment time, Solr offers very flexible options; it can be easily deployed on a single server as well as on distributed, multiserver systems.

• Largest Installed Base of Applications, Increasing Customer Base. Lucene/Solr is the most widely used open source search system and is installed in around 4,000 organizations worldwide. Publicly visible search sites that use Lucene/Solr include CNET, LinkedIn, Monster, Digg, Zappos, MySpace, Netflix, and Wikipedia. Lucene/Solr is also in use at Apple, HP, IBM, Iron Mountain, and Los Alamos National Laboratory.


• Large Developer Base and Adaptability. As community-developed software, Lucene/Solr provides transparent development and easy access to updates and releases. Developers can work with open source code and customize the software according to business-specific needs and objectives. Its open source paradigm gives developers the freedom and flexibility to evolve the software with changing requirements, liberating them from the constraints of commercial vendors.

• Commercial-Grade Support for Mission-Critical Search Applications from Lucid Imagination. Lucid Imagination provides the expertise, resources, and services needed to help enterprises deploy and develop Lucene-based search solutions efficiently and cost-effectively. Lucid helps enterprises achieve optimal search performance and accuracy with its broad range of expertise, which includes indexing and metadata management, content analysis, business rule application, and natural language processing. Lucid Imagination also offers certified distributions of Lucene and Solr, commercial-grade SLA-based support, training, high-level consulting, and value-added software extensions to enable customers to create powerful and successful search applications.
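Several of the query features named under Search Quality above (proximity operators, wildcards, fielded searching, and term weights) map directly onto standard Lucene query syntax. The short Python sketch below assembles one example of each as a plain string; the helper names and the title field are hypothetical, chosen only for illustration.

```python
def phrase_within(words, distance):
    """Proximity search: the words must occur within `distance` positions."""
    return f'"{words}"~{distance}'

def fielded(field, term, boost=None):
    """Fielded search, optionally weighting the term with ^boost."""
    clause = f"{field}:{term}"
    return f"{clause}^{boost}" if boost is not None else clause

examples = {
    "proximity": phrase_within("open source", 5),
    "wildcard": "luc*",  # matches any term starting with "luc"
    "fielded": fielded("title", "solr"),
    "weighted": fielded("title", "solr", 2),
}
for name, query in examples.items():
    print(f"{name:10} {query}")
```

Strings like these can be passed as the q parameter of a Solr request or handed to Lucene's query parser in an embedded Java application.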