Is Enterprise Search Ripe for Open Source Disruption?

Presentation given by Larry Cannell, Senior Analyst at Burton Group, and Brian Pinkerton, Chief Architect at Lucid Imagination, at Enterprise 2.0 San Francisco 2009.

Page 1: Is Enterprise Search Ripe for Open Source Disruption?

Is Enterprise Search Ripe for Open Source Disruption?

Larry Cannell

Senior Analyst

Burton Group

www.burtongroup.com

Brian Pinkerton

Chief Architect

Lucid Imagination

www.lucidimagination.com

Page 2: Is Enterprise Search Ripe for Open Source Disruption?

Agenda

•Why Open Source and Search?

•Enterprise Opportunities to Use Open Source Search

•Market Analysis

•Lucid Imagination

Page 4: Is Enterprise Search Ripe for Open Source Disruption?

Can You Tell the Difference?

Page 5: Is Enterprise Search Ripe for Open Source Disruption?

Can You Tell the Difference? Netflix

Page 6: Is Enterprise Search Ripe for Open Source Disruption?

Can You Tell the Difference? CNET

Page 7: Is Enterprise Search Ripe for Open Source Disruption?

Can You Tell the Difference? Best Buy

Page 8: Is Enterprise Search Ripe for Open Source Disruption?

Can You Tell the Difference? Wikipedia

Page 9: Is Enterprise Search Ripe for Open Source Disruption?

Can You Tell the Difference? Monster

Page 10: Is Enterprise Search Ripe for Open Source Disruption?

Which Site Uses Open Source Search?

Netflix

Best Buy

CNET

Wikipedia

Monster

Page 11: Is Enterprise Search Ripe for Open Source Disruption?

Which Site Uses Open Source Search?

Best Buy

Monster

Page 12: Is Enterprise Search Ripe for Open Source Disruption?

Why Open Source and Search?

Lucene and Solr Get Funded

Page 17: Is Enterprise Search Ripe for Open Source Disruption?

Basic Website/Intranet Search

Vertical Search

No compelling reason to use open source

Only consider if you have more headcount than budget

Best opportunities for open source search

Enterprise Opportunities

Page 19: Is Enterprise Search Ripe for Open Source Disruption?

Numerous Options

• Beagle

• DataparkSearch

• egothor

• Htdig

• Hounder

• Lemur

• MG4J

• Minion

• Mnogosearch

• Namazu

• OpenFTS

• regain

• Red Piranha

• Simplexo

• Sphinx

• Swish-e

• Swish ++

• Terrier

• Wumpus

• Zettair

Page 20: Is Enterprise Search Ripe for Open Source Disruption?

Honorable Mention

Page 21: Is Enterprise Search Ripe for Open Source Disruption?

The Short List

Page 25: Is Enterprise Search Ripe for Open Source Disruption?

Lucene Family Tree

Lucene (2000)

Lucene Ports

Nutch (2002)

Hadoop (2005)

Solr (2005)

Page 26: Is Enterprise Search Ripe for Open Source Disruption?

Content Set

User Interface

Search Engine

Search Repository

Content Ingestion

Administration

Page 27: Is Enterprise Search Ripe for Open Source Disruption?

Content Set

User Interface

Search Engine

Search Repository

Content Ingestion

Administration

Lucene

Page 28: Is Enterprise Search Ripe for Open Source Disruption?

Content Set

User Interface

Search Engine

Search Repository

Content Ingestion

Administration

Solr

Page 30: Is Enterprise Search Ripe for Open Source Disruption?

Solr’s Potential to Disrupt

The MySQL of search servers?

• Search server based on Lucene

• Easy initial setup

• Web services-like interface (XML over HTTP)

• Support for non-Java clients

• Caching, performance tuning, high-availability, load balancing

• Faceted browsing, similar documents

• Commoditizes vertical search

• Could have similar impact on application development as ODBC/JDBC

• Consider the 1000s of applications enabled by ODBC/JDBC

• Vertical search can now be applied to almost any application
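
The "XML over HTTP" and "non-Java clients" points can be illustrated with a small sketch: any language that can build a URL can query Solr. The example below only constructs a request URL and does not contact a server. The host, port, path prefix, and the field names `category` and `author` are assumptions for illustration; `q`, `rows`, `wt`, `facet`, and `facet.field` are standard Solr query parameters, but the exact handler path depends on how a given Solr instance is deployed.

```python
from urllib.parse import urlencode


def build_solr_query_url(base_url, text, facet_fields=(), rows=10):
    """Build a Solr select URL for `text`, optionally asking for facet counts.

    `base_url` is a hypothetical Solr root such as http://localhost:8983/solr.
    """
    params = [("q", text), ("rows", str(rows)), ("wt", "xml")]
    if facet_fields:
        # facet=true switches faceting on; each facet.field names one field.
        params.append(("facet", "true"))
        params.extend(("facet.field", field) for field in facet_fields)
    return base_url.rstrip("/") + "/select?" + urlencode(params)


# Hypothetical host and field names, for illustration only.
url = build_solr_query_url(
    "http://localhost:8983/solr",
    "open source search",
    facet_fields=["category", "author"],
)
print(url)
```

Because the interface is plain HTTP, the same request could be issued from a browser, a shell script, or any client library, which is the property the slide compares to ODBC/JDBC.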

Page 31: Is Enterprise Search Ripe for Open Source Disruption?

Open Source Search

References

• Burton Group’s Collaboration and Content Strategies

• Open Source Search: Bringing Enterprise Search Out into the Open

• Enterprise Information Search: Transforming Search into an Insight Engine (January 2010)

• A Complex Query: What’s the Right Enterprise Search Engine?

• Open Source Communication, Collaboration, and Content Management: Cutting-Edge Innovation, Low-Cost Imitation, or Both?

Page 32: Is Enterprise Search Ripe for Open Source Disruption?

Open Source Search

Brian Pinkerton, Chief Architect

Page 33: Is Enterprise Search Ripe for Open Source Disruption?

Lucid Imagination, Inc.

Why Open Source for Search?

Large scale: billions of documents; hundreds of cluster nodes

Uses modern architectures to achieve massive scalability

Some of the biggest search indexes are on open source software

High Performance

Fast response time

Flexible relevance

Use built-in relevance (on par with others) or augment

Stand-alone, integrated, or embedded

Mature, yet not stuck in time

Continued momentum on all facets of the products

Great support from the community

Page 34: Is Enterprise Search Ripe for Open Source Disruption?

Example: Searching Social Media

Everyone collaborates with everyone on everything everywhere

You’ve heard the hype

Much is probably just that

But it’s changing Web habits

And it’s pushing the state of the art in search

Enterprise adoption is trailing the wide Web, but it’s coming

Will you be ready?

Page 35: Is Enterprise Search Ripe for Open Source Disruption?

Search is Essential

Too much content to navigate without filtering

Sometimes, only analytics can do the job

Other times, users expect to search, not navigate

Used for surfacing more than just plain old search results

Page 36: Is Enterprise Search Ripe for Open Source Disruption?

How is Social Media Transforming Search?

20th Century | Web 1.0 | Web 2.0
Business-generated content | Power-user content; HTML only | User-generated content
Searches the attributes | Searches the content | Both, plus the interaction
Normalized data model | Flat data model | Ad hoc normalization
Transactional models | Batch processing | Powered by now
Batch analytics | Few analytics | User-driven analysis

Page 37: Is Enterprise Search Ripe for Open Source Disruption?

Examples of Searching Social Media

Example | Search technology
Pioneer in blog searching: Technorati | Lucene → Solr
Analyzing the interaction: Scout Labs | Lucene
Bottom-up relevance: digg | Solr
People are the content: LinkedIn | Lucene
People and places: Yelp | Lucene
Patterns from the people: Xmarks | Lucene
Searching the social universe: MySpace | Lucene.NET

Page 38: Is Enterprise Search Ripe for Open Source Disruption?

Technorati: Blog Search

Technorati is a blog-discovery engine

300,000 new posts per day

Surge of posts in the morning

Separate indexes for blog and post data

Noisy, user-generated content

Search used behind the scenes to build the user interface

New index keeps only a limited time span available

Page 39: Is Enterprise Search Ripe for Open Source Disruption?

Scout Labs: Analyzing the Interaction

Scout Labs is a social-media monitoring tool

Mines the stream of interaction across many forms of social media: blogs, comments, tweets, forums, mailing lists

The interaction can be messy, so Scout Labs provides summaries

Analytics provide comparisons

Sentiment summarizes attitudes

Because of the analytics, must keep more data online; this can get expensive

Page 40: Is Enterprise Search Ripe for Open Source Disruption?

digg: Bottom-up Relevance

Digg shows user-submitted links in real time

Users vote up or down on submissions

Content is indexed in near-real time

Results are scored by a combination of factors (recency, number of diggs, etc.)
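
The slide says results are scored by a combination of factors such as recency and number of diggs, without giving a formula. The sketch below shows one way such a blend could work, assuming a multiplicative combination of a text-match score, a logarithmically damped vote count, and an exponential time decay. The function name, the 24-hour half-life, and the formula itself are illustrative assumptions, not Digg's actual ranking.

```python
import math


def blended_score(text_score, diggs, age_hours, half_life_hours=24.0):
    """Combine text relevance, vote count, and recency into one score.

    Hypothetical formula for illustration only.
    """
    # log1p damps the vote count so a few very popular stories do not dominate.
    popularity = 1.0 + math.log1p(diggs)
    # The recency factor halves every `half_life_hours`.
    recency = 0.5 ** (age_hours / half_life_hours)
    return text_score * popularity * recency


# Made-up stories: (title, text_score, diggs, age_hours)
stories = [
    ("old hit", 2.0, 500, 72),
    ("fresh", 2.0, 20, 1),
]
ranked = sorted(stories, key=lambda s: blended_score(s[1], s[2], s[3]), reverse=True)
print([title for title, *_ in ranked])
```

With these made-up numbers the newer story outranks the older, more popular one, which is the kind of trade-off a recency-aware ranking has to tune.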

Page 41: Is Enterprise Search Ripe for Open Source Disruption?

LinkedIn: People are the Content

LinkedIn is a business social network

50 million members

Faceted search

facets on location, industries, companies, relationship, etc.

not all are easy to implement

Sorting by relevance + relationship

requires significant query-time work
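
Faceted search, as described on this slide, returns counts per field value alongside the matching results so users can narrow by location, industry, and so on. A minimal sketch of the counting step, assuming the matching documents are already in memory as dictionaries; the field names and sample profiles are made up, and a real engine would compute these counts inside the index rather than in application code.

```python
from collections import Counter


def facet_counts(results, fields):
    """Count how often each value of each facet field occurs in `results`."""
    counts = {field: Counter() for field in fields}
    for doc in results:
        for field in fields:
            value = doc.get(field)
            if value is not None:  # skip documents missing this field
                counts[field][value] += 1
    return counts


# Hypothetical profile documents matching some query.
profiles = [
    {"name": "A", "industry": "Software", "location": "San Francisco"},
    {"name": "B", "industry": "Software", "location": "Boston"},
    {"name": "C", "industry": "Finance", "location": "San Francisco"},
]
counts = facet_counts(profiles, ["industry", "location"])
print(counts["industry"].most_common())
print(counts["location"].most_common())
```

A facet such as "relationship" depends on who is searching, so it cannot be precomputed per document in this simple way, which is one reason the slide notes that not all facets are easy to implement.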

Page 42: Is Enterprise Search Ripe for Open Source Disruption?

Yelp: People and Places

Yelp facilitates user reviews

Searches business meta-data plus review content

Heavy geographic component

Results are structured by establishment, but searchable by review
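
"Structured by establishment, but searchable by review" suggests matching at the review level and then grouping hits under the business they belong to. The sketch below illustrates that grouping step under stated assumptions: review hits arrive as `(business_id, review_id, score)` tuples, and each business is ranked by its best-scoring review. The identifiers and the ranking rule are hypothetical and do not describe Yelp's implementation.

```python
from collections import defaultdict


def group_hits_by_business(review_hits):
    """Group review-level hits by business, ranking businesses by best review."""
    grouped = defaultdict(list)
    for business_id, review_id, score in review_hits:
        grouped[business_id].append((score, review_id))

    ranked = []
    for business_id, hits in grouped.items():
        hits.sort(reverse=True)  # best-scoring review first
        best_score = hits[0][0]
        ranked.append((best_score, business_id, [rid for _, rid in hits]))

    ranked.sort(key=lambda item: item[0], reverse=True)
    return [(business_id, reviews) for _, business_id, reviews in ranked]


# Made-up hits: (business_id, review_id, score)
hits = [
    ("cafe-1", "r1", 0.4),
    ("diner-2", "r2", 0.9),
    ("cafe-1", "r3", 0.7),
]
results = group_hits_by_business(hits)
print(results)
```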

Page 43: Is Enterprise Search Ripe for Open Source Disruption?

Xmarks: Patterns from the People

Xmarks provides bookmark sync and Web discovery

First provided bookmark sync; adopted by millions of users

Aggregates bookmark folder structure and meta-data by URL

This descriptive content is mined to provide a searchable index

Needed new ranking algorithms to provide good relevance and filter out the noise

Page 44: Is Enterprise Search Ripe for Open Source Disruption?

MySpace: Searching it all

MySpace does it all:

Many content types from all over the site

User generated content + user interactions

Near Real Time

New content and users arriving 24x7

Both end-user and administrative functions

admin functions include log file searching

automated tasks help identify spam, other problems

Massive scale: billions of records, petabytes of source data

new content at the rate of 1TB every week

Page 45: Is Enterprise Search Ripe for Open Source Disruption?

Social Media is Pushing Search In New Directions

Searches the product of interaction among users, not just content

Aggregates data from multiple sources at search time

Operates in real-time, as data is produced

Extends the traditional notions of relevance

Builds analytics on top of search

and... you can build all of this on open source products!