Enterprise Search 8/12/2011 – Damien Dewitte

Enterprise Search - Introduction

DESCRIPTION

Need a good introduction to Enterprise Search?

Citation preview

Page 1: Enterprise Search - Introduction

Enterprise Search 8/12/2011 – Damien Dewitte

Page 2: Enterprise Search - Introduction

2.

Enterprise Search: Setting the scene

Damien Dewitte

Lead ECM consultant

Page 3: Enterprise Search - Introduction

3.

Contents

- The enterprise search promise
- Some thoughts on search scenarios
- Make your content “findable”
- Search: How it works
- The enterprise search market

Page 4: Enterprise Search - Introduction

4.

Page 5: Enterprise Search - Introduction

5.

While on the Intranet …

Page 6: Enterprise Search - Introduction

6.

the Enterprise Search promise

Page 7: Enterprise Search - Introduction

7.

The Enterprise Search Promise

IDC 2001: “The High Cost of Not Finding Information”

Cost =
- Poor decisions based on faulty or poor information
- Duplicated efforts within different divisions/projects
- Lost sales due to customers’ inability to find products and services
- Lost productivity due to employees’ inability to find information

Page 8: Enterprise Search - Introduction

8.

The Enterprise Search Promise

Google (2008)

Page 9: Enterprise Search - Introduction

9.

The Enterprise Search Promise

Page 10: Enterprise Search - Introduction

10.

The Invisible Intranet

Using search on an intranet usually leaves a huge portion of existing valuable information ‘invisible’, because:

- Some information silos are not indexed:
  – Databases with structured content
  – External sources
  – Isolated departmental content repositories
  – Individual desktops
  – Content applications ‘in the cloud’
  – Digital archives
- Some information is “over-secured”
- Some information is trapped in proprietary file formats, which cannot be indexed
- Some information cannot be extracted as text:
  – Rich media files (audio, video)
  – Badly scanned documents

Page 11: Enterprise Search - Introduction

11.

The Enterprise Search Promise

Page 12: Enterprise Search - Introduction

12.

The Enterprise Search Promise

[Diagram: an Enterprise Search Platform connecting structured, unstructured and real-time sources to search applications]

Sources (grouped in the diagram as STRUCTURED, UNSTRUCTURED and REAL-TIME):
- RDBMS (JDBC, ODBC, SQLNet, DW, DM)
- Applications (e.g. ERM, CRM, Help Desk)
- Legacy data (e.g. ISAM, VSAM, IMS)
- Message queues (e.g. TIBCO, MQ-Series)
- DMS (e.g. M’Soft CMS, Documentum)
- eMail systems (e.g. Notes, Exchange)
- Files (e.g. Word, Excel, pdf, images, mp3)
- Portals (e.g. WebSphere, WebLogic)
- WWW (HTML, XML, WML, JavaScript)
- Private webs (e.g. news feeds, intranets)
- Direct push

Search applications on top of the platform: site search, mail search, BI search, DMS search, corporate search, e-commerce search, …

Page 13: Enterprise Search - Introduction

13.

The Enterprise Search Promise

“There’s no reason to expect that search is going to get that much better. The basic algorithms by which search is done have not improved much since about 1975.

The only way to improve the situation is by enhancing search engines with more deterministic metadata.

If you look at the victory of Google, it wasn’t because they had better search techniques. It’s because they deployed one key metadata value – how many pages are linked to this one – to enhance the relevancy of their results. The same concepts need to be applied to the enterprise.”

(Tim Bray)

Page 14: Enterprise Search - Introduction

14.

Some thoughts on search scenarios

Page 15: Enterprise Search - Introduction

15.

Enterprise versus web search

Aspect | Web | Enterprise
Content | Mainly HTML and PDF | All formats and sources, including databases and legacy systems
Security | Focus on system security | Also restricting user access to specific content
Updates | Via (scheduled) crawling | Push updates to the index (near real time)
Volume | On average: 1000 files | Potentially: > 1.000.000 “records”
Metadata management | Centrally in e.g. Web CMS | Consolidate metadata from various source systems
Relevance | Popularity via hyperlinks | Popularity via “social” instruments?

Page 16: Enterprise Search - Introduction

16.

Enterprise versus web search

Probably the cheapest website search you can find

Page 17: Enterprise Search - Introduction

17.

Structured versus unstructured

Start by filtering

Start by typing

Page 18: Enterprise Search - Introduction

18.

Search versus research

“Meeting minutes social collaboration project”

“Amplexor proposal for Intranet”

“Timesheets april 2009”

“Ecm and Green IT in Europe”

“Does ECM have impact on governmental decisions in Spain?”

“I know you’re out there..”

“Life is like a box of chocolates … you never know what you’re gonna get”

“average time spent on searching for content”

Page 19: Enterprise Search - Introduction

19.

Search versus research

Search based on:
- Information type (meeting minutes, proposal, invoice, timesheet, …)
- Document format (PDF, DOC, PPT, e-mail, …)
- Organisational source
  – Projects
  – Products
  – Processes (HR, Compliance, Marketing, IT, …)
  – …
- Publication date, modification date
- Author

“Meeting minutes social collaboration project”

Search queries are more or less predictable (after analysis).
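Because such queries follow predictable metadata dimensions, they can be served by plain filtering. The sketch below is only an illustration: the `Document` fields and the sample data are assumptions chosen to mirror the criteria listed on this slide, not part of any specific product.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Document:
    # Hypothetical metadata fields mirroring the search criteria on the slide.
    title: str
    info_type: str    # e.g. "meeting minutes", "proposal"
    doc_format: str   # e.g. "PDF", "DOC"
    source: str       # organisational source, e.g. a project
    author: str
    published: date


def filter_documents(docs, **criteria):
    """Keep only documents whose fields equal every given criterion."""
    return [
        d for d in docs
        if all(getattr(d, field) == value for field, value in criteria.items())
    ]


# Invented sample data, for illustration only.
DOCS = [
    Document("Kick-off minutes", "meeting minutes", "DOC",
             "social collaboration project", "An", date(2011, 3, 1)),
    Document("Intranet proposal", "proposal", "PDF",
             "intranet project", "Bob", date(2011, 5, 12)),
    Document("Steering minutes", "meeting minutes", "PDF",
             "intranet project", "An", date(2011, 6, 20)),
]

hits = filter_documents(DOCS, info_type="meeting minutes",
                        source="social collaboration project")
print([d.title for d in hits])
```

A real engine would run such filters against an index rather than a Python list, but the query shape (field equals value, combined with AND) is the same.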

Page 20: Enterprise Search - Introduction

20.

Search versus research

Research based on:
- Entities:
  – People
  – Geographical locations
  – Companies & brands
  – …
- Source: internal or external
- Publication date range
- Natural language search

“Does ECM have impact on governmental decisions in Spain?”

Search queries are unpredictable. The system should be “taught” how to interpret a query (natural language search, entity extraction from content, …).
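One simple way to “teach” a system to interpret a query is a dictionary (gazetteer) lookup that tags known terms with an entity type. This is a minimal sketch under that assumption; the `GAZETTEER` entries and type names are hypothetical, and real products typically use far richer linguistic processing.

```python
import re

# Hypothetical gazetteer: lower-cased surface form -> entity type.
GAZETTEER = {
    "spain": "location",
    "europe": "location",
    "amplexor": "company",
    "ecm": "topic",
}


def extract_entities(query):
    """Return (term, entity_type) pairs for known terms, in query order."""
    tokens = re.findall(r"\w+", query.lower())
    return [(t, GAZETTEER[t]) for t in tokens if t in GAZETTEER]


print(extract_entities("Does ECM have impact on governmental decisions in Spain?"))
# [('ecm', 'topic'), ('spain', 'location')]
```

The recognized entities could then be turned into filters (for example, location = Spain) while the remaining words are handled as free text.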

Page 21: Enterprise Search - Introduction

21.

Metadata

What is metadata?
- Information about the information:
  – Descriptive
  – Structural
  – Administrative

Types of metadata:
- Implicit (e.g. creation date, publication date, URL, filename, file format, source system, …)
- Explicit (e.g. owner, topic, summary, expiry date, status, …)

Guiding metadata input with:
- Taxonomies
- Folksonomies
- Ontologies

Page 22: Enterprise Search - Introduction

22.

Taxonomies

Page 23: Enterprise Search - Introduction

23.

Folksonomies

http://taggalaxy.de

Page 24: Enterprise Search - Introduction

24.

Ontologies

Taxonomies extended to represent knowledge as a set of concepts within a domain, and the relationships between those concepts

http://en.wikipedia.org/wiki/Geopolitical_ontology

Page 25: Enterprise Search - Introduction

25.

Metadata

Statement 1: “A performant enterprise search engine should not require information workers to add metadata. It should just crawl all my information sources.”

But:
- Will users understand the results displayed (title, author, …)?
- How will they filter results?
- Does it really help to crawl 1.000.000 records if 900.000 have become irrelevant over time?

Page 26: Enterprise Search - Introduction

26.

Metadata

Statement 2: “Google doesn’t need metadata”

Are you sure?

Page 27: Enterprise Search - Introduction

27.

Metadata

So you think Google doesn’t need metadata?

Page 28: Enterprise Search - Introduction

28.

Simple example of the semantic web

Page 29: Enterprise Search - Introduction

29.

Metadata

Statement 3: “Adding metadata is so time-consuming that my information workers will never do it.”

Yes, but:
- In a structured ECM approach, much of the metadata input can be automated, because it can be deduced from business rules
- If you’re not 100% sure you will need a metadata field for a specific purpose, then don’t create it
- Convince users of the value of the metadata fields that remain
- Make it user-friendly for content contributors to add metadata
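As an illustration of deducing metadata from business rules, the sketch below derives implicit fields from a file path and fills in explicit defaults based on the top-level folder. The `FOLDER_RULES` table and its values are purely hypothetical; it assumes relative, slash-separated paths.

```python
from pathlib import PurePosixPath

# Hypothetical business rules: top-level folder -> default explicit metadata.
FOLDER_RULES = {
    "hr": {"owner": "HR department", "topic": "personnel"},
    "projects": {"owner": "Project office", "topic": "project"},
}


def deduce_metadata(path):
    """Derive metadata from a relative path without asking the contributor."""
    p = PurePosixPath(path)
    metadata = {
        "filename": p.name,                             # implicit
        "file_format": p.suffix.lstrip(".").upper(),    # implicit
    }
    # Apply the folder rule only when the file actually sits in a folder.
    top_folder = p.parts[0].lower() if len(p.parts) > 1 else ""
    metadata.update(FOLDER_RULES.get(top_folder, {}))   # explicit defaults
    return metadata


print(deduce_metadata("HR/policies/leave.pdf"))
```

Contributors would then only need to confirm or correct the pre-filled values instead of typing them from scratch.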

Page 30: Enterprise Search - Introduction

30.

Metadata

Avoid defining metadata around the document, if it should already be present IN the document.

Page 31: Enterprise Search - Introduction

31.

Make content findable

Page 32: Enterprise Search - Introduction

32.

Findability

Findability is not obtained just by implementing search technology

AIIM.org: “Information Organization and Access (IOA) refers to a collection of technologies to help you organize and find information”, which includes:
- enterprise search
- content classification
- categorization and clustering
- fact and entity extraction
- taxonomy creation and management
- information presentation (i.e., visualization)
- information governance

Page 33: Enterprise Search - Introduction

33.

Findability Tips & Tricks

The more value content has, the more effort should be spent in managing it (and making it findable)

Page 34: Enterprise Search - Introduction

34.

Findability Tips & Tricks

One search interface doesn’t solve it all. Keep in mind that:
- Specific content sources or lines of business might require specialized search screens

Page 35: Enterprise Search - Introduction

35.

Findability Tips & Tricks

Define specific search scopes, if your information governance permits …

Page 36: Enterprise Search - Introduction

36.

Findability Tips & Tricks

Landing pages are still “in”!
- Projects overview page
- Knowledge base page (links to knowledge bases)
- Practical guide (categorized hyperlinks to practical information)
- Tools
- Forms
- Filtered listings (e.g. automatic listing of all FAQ content types)

Page 37: Enterprise Search - Introduction

37.

How search works

Page 38: Enterprise Search - Introduction

38.

How it works

[Architecture diagram]
- Content sources: web content; files, documents; databases; custom applications; multimedia
- Connectors: web crawler, file traverser, database connector, content push
- Document processing pipeline, writing to the index files
- Search query & result processing pipeline with filter (query, results, alert)
- Front ends: vertical applications, portals, custom front-ends, mobile devices
- Tuning, administration

Page 39: Enterprise Search - Introduction

39.

How it works

Connect to content sources and get data:
- Web pages (e.g. XML, HTML, WML): crawler
- Files, documents (e.g. Word, Excel, pdf): file traverser
- Database content (e.g. Oracle, DB2): database connectors
- Applications (e.g. SharePoint, Documentum, Exchange, CMS/DMS): application connectors
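The connector idea can be sketched as a set of functions that all yield the same `(doc_id, text)` pairs, whatever the source. This is a simplified illustration, not any vendor's connector API: the function names are invented, the file traverser only reads `.txt` files, and the database connector assumes a SQLite table with `id` and `body` columns.

```python
import os
import sqlite3
from typing import Iterator, Tuple


def file_traverser(root: str) -> Iterator[Tuple[str, str]]:
    """Walk a directory tree and yield (doc_id, text) for each .txt file."""
    for folder, _dirs, files in os.walk(root):
        for name in sorted(files):
            if name.endswith(".txt"):
                path = os.path.join(folder, name)
                with open(path, encoding="utf-8") as handle:
                    yield path, handle.read()


def database_connector(conn: sqlite3.Connection, table: str) -> Iterator[Tuple[str, str]]:
    """Yield (doc_id, text) per row; assumes columns named `id` and `body`."""
    # Table names cannot be bound as SQL parameters: pass trusted names only.
    for row_id, body in conn.execute(f"SELECT id, body FROM {table} ORDER BY id"):
        yield f"{table}:{row_id}", body
```

Because every connector produces the same shape of output, the downstream processing pipeline does not need to know where a document came from.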

Page 40: Enterprise Search - Introduction

40.

How it works

Analyze and index content to make it searchable:
- Convert and process content through the pre-processing pipeline:
  – Lemmatization/stemming, entity extraction, taxonomy classification
  – Custom logic (e.g. adding special tags)
- Write content to index files
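The indexing step can be illustrated with a toy inverted index. This is a sketch only: the `stem` function is a deliberately crude suffix stripper standing in for real lemmatization/stemming, and the index lives in memory rather than in index files.

```python
import re
from collections import defaultdict


def stem(token):
    """Crude suffix stripping; a stand-in for real stemming/lemmatization."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def analyze(text):
    """Lowercase, tokenize and stem a piece of text."""
    return [stem(t) for t in re.findall(r"[a-z0-9]+", text.lower())]


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index


docs = {
    "d1": "Meeting minutes of the intranet project",
    "d2": "Proposals for searching the intranet",
}
index = build_index(docs)
print(sorted(index["intranet"]))  # ['d1', 'd2']
print(sorted(index["search"]))    # ['d2']
```

Note that “searching” is stored under “search”, so a query that is analyzed the same way will still find the document.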

Page 41: Enterprise Search - Introduction

41.

Search Engine: How It Works

Analyze query:
- Use query language or query API
- Convert and process query through the query pipeline:
  – Linguistic processing
  – Custom logic (e.g. query term modification/addition)
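A query pipeline can be pictured as a small function that normalizes the query and then adds terms. The sketch below assumes a hypothetical `SYNONYMS` table as the “custom logic”; it is not how any particular engine implements query expansion.

```python
import re

# Hypothetical synonym table used for query term addition.
SYNONYMS = {
    "minutes": ["notes"],
    "proposal": ["offer", "quote"],
}


def process_query(raw_query):
    """Lowercase and tokenize the query, then append synonyms of known terms."""
    terms = re.findall(r"[a-z0-9]+", raw_query.lower())
    expanded = list(terms)
    for term in terms:
        for synonym in SYNONYMS.get(term, []):
            if synonym not in expanded:
                expanded.append(synonym)
    return expanded


print(process_query("Meeting minutes Intranet"))
# ['meeting', 'minutes', 'intranet', 'notes']
```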

Page 42: Enterprise Search - Introduction

42.

How it works

Match query to content index:
- Query- and content-adaptive matching
- Exploit all information and structure in the data
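One way to read “exploit all information and structure in the data” is to weight matches by the field in which they occur. The sketch below assumes a simple scoring rule in which a hit in the title counts double; the `title_boost` value and the document layout are illustrative assumptions, not a description of a specific engine's relevance model.

```python
def score(doc, query_terms, title_boost=2.0):
    """Count query-term hits, weighting title hits more than body hits."""
    title = doc["title"].lower().split()
    body = doc["body"].lower().split()
    return sum(title_boost * title.count(t) + body.count(t) for t in query_terms)


def match(docs, query_terms):
    """Return the titles of documents with a non-zero score, best first."""
    scored = [(score(d, query_terms), d["title"]) for d in docs]
    ranked = sorted(scored, key=lambda pair: -pair[0])
    return [title for points, title in ranked if points > 0]


# Invented sample documents, for illustration only.
docs = [
    {"title": "Timesheets", "body": "timesheets for the intranet team"},
    {"title": "Intranet proposal", "body": "proposal for the new intranet"},
    {"title": "Holiday policy", "body": "rules for leave"},
]
print(match(docs, ["intranet", "proposal"]))
# ['Intranet proposal', 'Timesheets']
```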

Page 43: Enterprise Search - Introduction

43.

How it works

Return results to user:
- Convert and process results through the result pipeline:
  – Re-sort, filter for security, organize for dynamic drilldown
- Pass results on to application (generated or through API)
- Push results to alert engine and then to the external environment (e.g. mail, queue)
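The result pipeline steps (security filtering, re-sorting, preparing drilldown) can be sketched in a few lines. The hit structure, the `allowed_groups` field and the group names below are assumptions made for this example.

```python
from collections import Counter


def process_results(hits, user_groups):
    """Security-trim, re-sort by score, and count facet values for drilldown."""
    # Keep a hit only if the user shares at least one group with it.
    visible = [h for h in hits if h["allowed_groups"] & user_groups]
    visible.sort(key=lambda h: h["score"], reverse=True)
    facets = Counter(h["format"] for h in visible)
    return visible, facets


# Invented sample hits, for illustration only.
hits = [
    {"title": "Budget 2011", "score": 0.9, "format": "XLS",
     "allowed_groups": {"finance"}},
    {"title": "Intranet proposal", "score": 0.7, "format": "PDF",
     "allowed_groups": {"staff"}},
    {"title": "Meeting minutes", "score": 0.8, "format": "PDF",
     "allowed_groups": {"staff", "finance"}},
]
visible, facets = process_results(hits, user_groups={"staff"})
print([h["title"] for h in visible])  # ['Meeting minutes', 'Intranet proposal']
print(dict(facets))                   # {'PDF': 2}
```

Filtering before counting facets matters: otherwise the drilldown counts would reveal the existence of documents the user is not allowed to see.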

Page 44: Enterprise Search - Introduction

44.

Mediafin

Page 45: Enterprise Search - Introduction

45.

How it works

Federated Search: Relies on the indexes and the relevance algorithms of the underlying search engines
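Because each underlying engine ranks with its own algorithm, their scores are usually not directly comparable. A simple merge strategy, assumed here purely for illustration, is to interleave the ranked lists and drop duplicates. The two back-end functions are hypothetical stand-ins for real engines.

```python
from itertools import zip_longest


def federated_search(query, engines):
    """Send the query to every engine and interleave their ranked lists."""
    result_lists = [engine(query) for engine in engines]
    merged, seen = [], set()
    # Take the first hit of every engine, then the second, and so on.
    for rank_group in zip_longest(*result_lists):
        for hit in rank_group:
            if hit is not None and hit not in seen:
                seen.add(hit)
                merged.append(hit)
    return merged


# Hypothetical back ends, each returning its own ranked list of document ids.
def intranet_engine(query):
    return ["wiki/search-tips", "hr/leave-policy"]


def dms_engine(query):
    return ["dms/proposal-42", "wiki/search-tips", "dms/minutes-7"]


print(federated_search("intranet", [intranet_engine, dms_engine]))
# ['wiki/search-tips', 'dms/proposal-42', 'hr/leave-policy', 'dms/minutes-7']
```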

Page 46: Enterprise Search - Introduction

46.

the Enterprise Search market

Page 47: Enterprise Search - Introduction

47.

The Enterprise Search Market

What’s the vendor’s focus?
- Business intelligence
- Text mining (linguistic support!)
- E-commerce
- Image/video: visual information retrieval
- Audio/video: speech recognition
- eDiscovery
- …

Page 48: Enterprise Search - Introduction

48.

The Enterprise Search Market

Enterprise search products can be:
- Specialized: products that use search to address a need in a specific area like customer service or to supplement business intelligence platforms
- Integrated: products that merge search capabilities with other information management functions like content management, collaboration or analytics; the goal of these products is to become deeply ingrained in the technology portfolio so that the use of the tool becomes a ubiquitous part of the information workplace
- Detached: products like Google’s appliance focused on ease of deployment and flexibility

Page 49: Enterprise Search - Introduction

49.

The Enterprise Search Market

Forrester (September 2011) evaluated twelve vendors/products in its Market Overview (not including open source):
- Autonomy IDOL 7 (acquired by HP)
- Attivio AIE 1.3
- Coveo Platform 6.5
- Endeca Latitude 2 (acquired by Oracle)
- Exalead CloudView 5.1
- Fabasoft Mindbreeze 5.0
- Google Search Appliance 6.8
- IBM Content Analytics with Enterprise Search 2.2
- ISYS Enterprise Server v9.7
- Microsoft FAST Search for SharePoint Server 2010
- Sinequa ES 7
- Vivisimo Velocity 8.0

Page 50: Enterprise Search - Introduction

50.

The Enterprise Search Market

Important trends:
- Social and collaborative features
- Mobile support
- Audio/video
- Cloud
- Spatial support
- Semantics/text analytics
- Search Based Applications (“SBA”)

Page 51: Enterprise Search - Introduction

51.

Wrap up

Search technology platforms are mature and available on the market in abundance and in multiple flavors.

But make sure you are:
- Cost-effective (what’s the business case? Priorities?)
- Consistent in content classification and governance
- Continuously monitoring usage and improving relevance
- Clever & pragmatic
- Creative (user interface, multi-device)

Page 52: Enterprise Search - Introduction

52.

Thank you!