The Enterprise Search Market in a Nutshell

  • Published on
    12-Apr-2017

Transcript

  • 1

    The Enterprise Search Market in a Nutshell

    Iain Fletcher

    ifletcher@searchtechnologies.com

    October 19, 2015

    ICIC 2015, Nice

  • 2

    Agenda

    About Search Technologies (30 seconds)

    The enterprise search market

    Likely future architectures for supporting important search applications

  • 3

    Search Technologies: Background

    Founded 2005

    180 employees

    600+ customers

    Independent consulting company

    Focus on enterprise search

    Working with all leading platforms

    Locations: San Diego; San Francisco; Cincinnati; Washington (HQ); San Jose, CR; London, UK; Frankfurt, DE; Prague, CZ

  • 4

    600+ Customers

    [Customer logos, including Lenovo, Chick-fil-A, US Federal Trade Commission, Petco, Teleflora]

  • 5

    The Enterprise Search Market

  • 6

    High-level Search Engine Classifications

    1. Part of a portfolio; many are recently acquired technologies

    E.g. SharePoint/FAST, HP Autonomy, IBM/Vivisimo, Dassault/Exalead, Oracle/Endeca

    2. Stand-alone specialists, often deployed to address specific apps or challenges

    E.g. Google Search Appliance (GSA), Coveo, Attivio, Sinequa, Recommind

    3. Open source, with or without support or proprietary add-ons

    Raw: Lucene, Solr, Elasticsearch

    With support/add-ons: LucidWorks, Cloudera Search, Elastic ELK

    4. Cloud-based services, typically based on open source technology

    E.g. Amazon CloudSearch (Solr), Microsoft Azure Search (Elasticsearch)

  • 7

    Market Observations

    The dominant market share is currently with SharePoint, open source, and the GSA

    SharePoint 2013 search is credible, and bundled

    Search teams are under pressure to use it, or to provide a compelling reason to do otherwise

    Solr and Elasticsearch are robust and reliable

    Thanks to very widespread deployment

    The Google brand sells, and a lot of GSAs have been shipped during the past few years

  • 8

    Functional Observations

    Core indexing / searching is generally fast and reliable

    Search is a maturing / converging technology

    Key differences remain in peripheral functionality, such as content processing prior to indexing, and query processing

    Coveo, Attivio, Sinequa etc. have well-developed indexing pipelines, UI tools, and a range of data connectors

    SharePoint and GSA are delivered with limited content processing functionality and limited connectivity

    Solr, Elasticsearch, AWS CloudSearch and Azure Search don't provide a formal indexing pipeline, UI, or connectors

  • 9

    Further Observations

    The search engines with less focus on peripheral issues such as content processing and connectivity have dominant market share

    Connectivity is often challenging, especially when combined with continual data growth and document-level security requirements

    The movement of data sets to the cloud adds further complexity for enterprise search systems

    Hybrid indexing environments will be with us for some years

    Some content sets in the cloud, some behind the firewall

  • 10

    Great Search requires Attention to Detail

    E.g. in content processing prior to indexing:

    Normalization (names, dates, synonyms)

    Entity identification and resolution

    Categorization

    Document vector extraction

    Document splitting and concatenation

    Link & popularity analysis

    Dupe & near-dupe detection

    [Diagram labels: Index; security; category; metadata]
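
    Two of the content processing steps listed on this slide, normalization and near-duplicate detection, can be illustrated with a minimal Python sketch. The synonym table, the list of date layouts, the word-shingle approach, and the 0.8 threshold are all assumptions chosen for illustration; they are not taken from the presentation or from any particular search platform.

```python
import re
from datetime import datetime

# Hypothetical synonym table; a real deployment would load a curated thesaurus.
SYNONYMS = {"ibm corp": "ibm", "international business machines": "ibm"}

# Date layouts this sketch recognizes (an assumption, not an exhaustive list).
DATE_FORMATS = ("%d-%b-%Y", "%B %d, %Y", "%Y-%m-%d")


def normalize_date(text):
    """Return an ISO 8601 date string, or None if no known layout matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def normalize_name(text):
    """Lower-case, collapse whitespace, and map known synonyms to one form."""
    key = re.sub(r"\s+", " ", text.strip().lower())
    return SYNONYMS.get(key, key)


def shingles(text, size=3):
    """Set of overlapping word n-grams used to compare two documents."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a, b):
    """Jaccard similarity of two shingle sets (1.0 means identical sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def is_near_duplicate(doc_a, doc_b, threshold=0.8):
    """Flag two texts as near-duplicates when their shingle sets overlap enough."""
    return jaccard(shingles(doc_a), shingles(doc_b)) >= threshold
```

    In this sketch, normalized values would be written to index fields before indexing, and near-duplicates would be flagged or collapsed at the same stage. Large collections would typically need a hashing scheme rather than pairwise comparison.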

  • 11

    Future Directions for Search

    So what will search architectures look like in the future?

    Important influences:

    The business need for organizational and analytical agility

    The convergence of search and (big data) analytics

    Continual growth in data volumes, and evolution in repository / storage fashions

  • 12

    Converging Architectures

    Background Info

    Let's take a brief look at:

    1. The Big Data Architecture, as evangelized by IBM, Cloudera, etc.

    2. Recent Search Architectures

  • 13

    The Big Data Architecture

    Designed for Structured Data

  • 14

    The Traditional Search Architecture

    [Diagram labels: Content Sources (Employee Directory, CMS, File Share, etc.); Integrated Search Engine (Connectors, Index Pipeline, Search Index, UI)]

    Designed for Unstructured Content

  • 15

    The Traditional Search Architecture

    [Diagram: traditional search architecture, as on slide 14]

    As data volumes grow, re-indexing becomes challenging

    The rate at which content can be acquired from repositories is usually the bottleneck

    Designed for Unstructured Content

  • 16

    The Traditional Search Architecture

    [Diagram: traditional search architecture, as on slide 14, with a RE-INDEX label]

    A few documents per second?

    There are only 2.6 million seconds in a month
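
    The arithmetic behind this slide can be made concrete with a short Python sketch. The 30-day month matches the 2.6 million seconds quoted above; the corpus size and acquisition rate in the usage note are hypothetical figures chosen only to show the scale of the problem.

```python
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_MONTH = 30 * SECONDS_PER_DAY  # 2,592,000, i.e. about 2.6 million


def docs_per_month(docs_per_second):
    """Documents that can be acquired in a 30-day month at a steady rate."""
    return docs_per_second * SECONDS_PER_MONTH


def reindex_days(doc_count, docs_per_second):
    """Days needed to re-acquire doc_count documents at a steady rate."""
    if docs_per_second <= 0:
        raise ValueError("docs_per_second must be positive")
    return doc_count / docs_per_second / SECONDS_PER_DAY
```

    For example, at an assumed 3 documents per second, `docs_per_month(3)` gives 7,776,000 documents, and `reindex_days(50_000_000, 3)` comes to roughly 193 days for a hypothetical 50-million-document corpus.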

  • 17

    A Better Search Architecture

    Re-indexing rates greatly improved

    Touch-time with repositories can be managed autonomously

    [Diagram labels: Content Sources (Employee Directory, CMS, etc.); Connectors; Content Processing; Secure Cache; Iterative Development; RE-INDEX; Search Engine (Index Pipeline, Search Index)]
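
    One way to read this slide is that a secure cache decouples content acquisition from indexing, so that a re-index reads from the cache instead of going back to the source repositories. The Python sketch below illustrates that idea only. The class names, the in-memory dictionary, and the `fetch` and `process` callables are hypothetical and do not describe any specific product.

```python
from dataclasses import dataclass, field


@dataclass
class CachedDocument:
    doc_id: str
    raw_text: str
    allowed_groups: frozenset  # document-level security travels with the content


@dataclass
class StagingCache:
    """In-memory stand-in for a secure cache between connectors and the index."""
    _docs: dict = field(default_factory=dict)
    repository_fetches: int = 0

    def acquire(self, doc_id, fetch):
        """Fetch from the source repository only if the document is not cached."""
        if doc_id not in self._docs:
            raw_text, groups = fetch(doc_id)
            self.repository_fetches += 1
            self._docs[doc_id] = CachedDocument(doc_id, raw_text, frozenset(groups))
        return self._docs[doc_id]

    def reindex(self, process):
        """Rebuild index entries from cached content, with no repository access."""
        return {d.doc_id: process(d) for d in self._docs.values()}
```

    Under these assumptions, changing the content processing logic and calling `reindex` again leaves `repository_fetches` unchanged, which is what would make iterative development of the pipeline cheap.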

  • 18

    The Future Architecture?

    [Diagram labels: Hadoop; Content Sources (Employee Directory, CMS, etc.); Connectors; Content Processing; Secure Cache; Iterative Development; RE-INDEX; Search Engine (Index Pipeline, Search Index)]

    This environment will encourage ever more sophisticated text analytics

    We expect to see much innovation in text analytics during the next few years

    The deliverable is a better and richer search index

  • 19

    An Established Architecture

    [Diagram labels: Hadoop; Content Sources (Employee Directory, CMS, etc.); Connectors; Content Processing; Secure Cache; Iterative Development; RE-INDEX; Search Engine (Index Pipeline, Search Index)]

    Google.com has worked something like this since 2004

  • 20

    An Integrated Search/Analytics Architecture

    [Diagram labels: Hadoop; Content Sources (CMS, File system, etc.); Connectors; Data Sources (Data Warehouse, Logfiles, etc.); ETL; Content Processing; Secure Cache; Iterative Development; Rapid Indexing; Search App. (x2); Analysis App. (x2)]

    Encourages agile exploitation of data and content resources

  • 21

    Summary 1

    Search and Big Data applications are tending towards the same architecture

    Autonomous connectivity and content processing simplify and de-risk search systems, if you can get them right

    The foundation of great search is still a clean, rich and detailed index

    The search index itself is a mature technology, almost a commodity

    Much of the innovation during the next few years will be in text analytics, and other methods of preparing content prior to indexing

  • 22

    And finally.

    The compulsory analyst quote.

    Enterprise Search Can Bring Big Data Within Reach

    Multiple, purpose-built indexes that are derived from enriched content are necessary.

    * Darin Stewart, Enterprise Search Can Bring Big Data Within Reach, April 2014 Blog

    http://blogs.gartner.com/darin-stewart/2014/04/01/enterprise-search-can-bring-big-data-within-reach/

  • 23

    The Enterprise Search Market in a Nutshell

    Iain Fletcher

    ifletcher@searchtechnologies.com

    October 20, 2015

    Questions?

  • 24

    Spare Slides

  • 25

    Reference Architecture

    Content sources

    Connectors

    Indexes

    Semantics

    Text Mining

    Quality Metrics

    Content Processing Pipelines

    Big Data Framework

    Indexes

    Query parsing

    Search Engine

    Web Browser

    Staging Repository

  • 26

    Where is the Focus?

    The Business View

    [Diagram labels: Application; Content Capture & Preparation; Data Store / Index]

    The Implementation View

    [Diagram labels: Application; Content Capture & Preparation; Data Store / Index]