Good Information Is Hard to Find: Guidelines for Managers Considering Open Source Enterprise Search A Lucid Imagination White Paper

Good Information

Is Hard to Find:

Guidelines for Managers

Considering Open Source Enterprise Search

A Lucid Imagination White Paper

Good Information Is Hard to Find: Considering Open Source for Enterprise Search A Lucid Imagination White Paper • April 2009 Page 1

Abstract

Enterprise search helps employees, customers, and partners find the most relevant and

timely information, enabling them to make smart, efficient decisions about doing business

with your company. Open source, which has provided strong benefits in enterprise software such

as operating systems, databases, and middleware, now unleashes value in enterprise

search. Lucid Imagination brings market-leading expertise to open source enterprise

search, and can help any organization quickly design and optimize search solutions based

on Lucene and Solr.

Table of Contents

Abstract
Introduction and Overview
The Advantages of Open Source
  Lower Costs
  Pay at the Point of Value
  Transparent Development
  Re-tool the employees, retire the software
  Lower Overall Risk
About Lucid Imagination
Engagement Scenarios
  Considering Alternatives to Legacy Packaged Search Applications
  Building on In-house Lucene/Solr Expertise
Next Steps
Appendix: About Apache Lucene and Solr

Introduction and Overview

Raising the collective intelligence of company employees can make them smarter and more

efficient—but how do you enable them to keep up with the vast, ever-changing amount of

data your organization produces? Many operations seem to be better at creating data than

using it to operate more productively. Using search tools designed for the Web can make it

difficult to find relevant, timely corporate information, mostly because corporate data is

not much like Web data:

• Corporate data can be stored in a variety of different and unstructured formats,

including documents and database records.

• A document’s popularity is not necessarily what makes it useful to a specific search.

• Information may require controlled access, yet still be discoverable to those users

with the appropriate permissions.

Two state-of-the-art, open source search technologies—Lucene and Solr—are available for

free from the Apache Software Foundation. Lucene is a powerful search engine and library;

Solr provides a platform built on top of Lucene that makes it easy to build Lucene-based

applications.1 Rich, flexible text query tools and sophisticated ranking capabilities of

Lucene/Solr enable users to quickly find the most useful documents or records.

Either of these full-featured technologies delivers excellent performance, relevancy

ranking, and scalability. They are used today by thousands of organizations, powering

substantial and diverse search applications for AOL, CNET, Comcast Interactive Media, IBM,

Netflix, LinkedIn, MySpace, and many others. For these companies, Lucene/Solr solutions

regularly index and search hundreds of millions of documents with subsecond response

time, all without incurring any licensing fees.

These solutions excel at quickly and effectively searching large volumes of unstructured

text—documents or other records containing freeform text—and returning results based

1 Most organizations use Solr today as their search development platform. Because Lucene serves as the core of Solr’s search capabilities, this paper refers to them as Lucene/Solr. For more information about these technologies, see the Appendix.

on how well they match the user’s query. At most companies, this means digesting and

searching through dozens of different file formats—including documents, spreadsheets,

presentations, e-mail, and records stored in databases, to name just a few—and delivering

relevant results to authorized users. Incremental update capabilities mean that

Lucene/Solr searches can track document collections easily as they grow and change,

finding information nearly as fast as it is created.

Solr can speedily facet, or categorize, data and search results based on specific field values.

An excellent example of this function is Zappos.com, the popular shoe e-tailer, where users

can quickly refine searches based on product criteria such as price or features.
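As a rough illustration of what faceting computes, the sketch below counts how many matching records fall under each value of a chosen field, then drills down on one value. The product records, field names, and the `facet_counts` and `refine` helpers are hypothetical and written in plain Python; this is not Solr's implementation or API.

```python
from collections import Counter

# Hypothetical product records; the field names are illustrative only.
PRODUCTS = [
    {"name": "Trail Runner", "brand": "Acme", "price_band": "under $50"},
    {"name": "City Loafer", "brand": "Acme", "price_band": "$50-$100"},
    {"name": "Peak Boot", "brand": "Summit", "price_band": "$50-$100"},
    {"name": "Road Racer", "brand": "Summit", "price_band": "over $100"},
]


def facet_counts(records, field):
    """Count how many records carry each distinct value of `field`."""
    return Counter(r[field] for r in records if field in r)


def refine(records, field, value):
    """Keep only the records whose `field` equals `value` (a drill-down)."""
    return [r for r in records if r.get(field) == value]


if __name__ == "__main__":
    # Facet the full result set by brand, then drill into one brand.
    print(facet_counts(PRODUCTS, "brand"))
    acme_only = refine(PRODUCTS, "brand", "Acme")
    print(facet_counts(acme_only, "price_band"))
```

A search interface would typically show these counts next to each category so that users can see how many results remain before they click.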

For most application development teams, building a search application is not an everyday

project. By definition, enterprise search technology processes unstructured data, which can

change frequently. Expert guidance on architectural considerations, such as index

optimization, result relevance, deployment configuration, and retrieval performance can

make a tremendous difference in deploying a successful solution. By taking advantage of

expert, experienced personnel to assist with application design, development, and

deployment, organizations can leverage the full benefit of Lucene/Solr search technologies

without the cost of licensing proprietary software.

For these reasons, Lucid Imagination provides commercial-grade support, training, and

professional consulting services that are essential to designing and installing successful

enterprise applications.

This paper is intended for business decision makers who are considering options for

powerful, flexible enterprise search solutions. It provides guidelines for understanding:

• Advantages of open source software, including ways it can lower costs and risks,

• Why Lucid Imagination’s service and support is a key ingredient in achieving successful

Lucene/Solr solutions,

• Engagement scenarios—the types of situations where Lucid Imagination can help, and

• The capabilities of Lucene/Solr, which are provided in an appendix.

The Advantages of Open Source

Open Source has changed the IT landscape. Gartner says 85 percent of polled companies

are already using open source software, calling the use of open source software

“pervasive.”2 Most organizations are now familiar with free and open source products such

as Linux, MySQL, Apache, and SugarCRM, because of the many benefits, including:

• Lower costs

• Pay at the point of value

• Transparent development

• Control and flexibility – investing in people instead of licenses

• Lower overall risk

With Lucene/Solr’s broad, successful adoption across markets and deployments, these

advantages are now available for enterprise search applications. Let’s take a closer look at

how open source pays off.

Lower Costs

While proprietary software vendors must try to recover their development costs, this is not

the case with open source software, because it does not have capital costs associated with

source code IP. The cost of talent is less, too. Community development, adherence to

standards, and lower barriers to adoption all help increase the number of developers who

become proficient in the use of a product or technology. Together, these factors combine to

reduce upward pricing pressure.

The high license fees associated with proprietary and closed source development can

discourage developers and customers from adopting a product or technology. In contrast,

open source communities help lower costs by encouraging participation and allowing

anyone to download the source code and try it out. Most open source communities release

2 http://www.theregister.co.uk/2008/11/18/gartner_open_source/

updated binaries on a periodic basis, so users can easily try the software on their own

timetables.

Many commercial solutions combine proprietary software with service and support, and

customers may believe that buying a software license is sufficient to get a search

application up and running. In most cases, however, the technology’s purchase price makes

up less than half of the implementation cost, with the balance going to services. Both open

source and proprietary software usually require a significant amount of customization,

which means some service and support costs are inevitable.

Pay at the Point of Value

Open source project code is freely available for any use. If a company can become proficient

with the code, it can make productive use of the code at any phase from evaluation to

production. Only in those areas where an open source customer sees value—for support

and integration services, or for additional functionality or expertise—does money need to

be spent. There are no restrictions on when open source software can be used.

In contrast, proprietary products typically must be purchased before they can be used, or

in some cases, even evaluated. Some vendors offer evaluation or trial versions, but these

often have reduced functionality or restrictive licenses. Because the software must be

purchased before the customer can see any value from the product, return on investment is

delayed.

Transparent Development

Community-developed software enables everyone to see what is being built and which

features are included as early as possible. Developers and customers do not need to wait

for a vendor to publish a roadmap or product launch to know what is being readied for

release. As a result, prospective users can make better, faster, and more informed decisions

relating to their software infrastructure.

Compare this to proprietary software, where customers have little if any insight into

upcoming products until very late in the product life cycle. This is typically no sooner than

the software’s beta release, when it is too late to provide input on features and

functionality. This delays assessment and adoption of innovations.

Re-tool the employees, retire the software

In this tough economic climate, managers who own budgets need to review every expense

with a critical eye. Many software applications that made sense a few years back may have

out-lived their intended fit to business needs.

Any application development effort generates significant learning. The work of

development both requires the expertise of in-house developers and imbues them with deep knowledge

and understanding of the company, its IT infrastructure, culture, and usage requirements.

Given that software applications must keep up with an organization’s changing goals and

requirements as the needs of its market and constituents evolve, the expertise that the

technical staff develops becomes a vital competitive asset.

This is a key corollary benefit of the open source model: by retiring old software packages

and investing in staff expertise, companies combine innovative technology with their most

valuable asset – their people, establishing vital competitive advantage.

Companies that leverage savings from not purchasing software licenses to build

development talent in-house reduce the cost of addressing inevitable change. What’s more,

increasing a technical team’s ability to translate company business objectives into

technology solutions increases the likelihood that the software they build will continue to

fit that inevitable change. This is particularly true for an enterprise search solution. In

addition, unlike with closed source implementations, in-house developers can work directly with

open source code and supplement it with additional functions or expertise by relying on the

community and marketplace of readily available resources, again capturing unique

competitive advantage.

Supplementing open source development with training, consulting, and reliable support

from established industry experts reinforces a company’s competitive advantage – with the

control and flexibility needed to survive and thrive.

Lower Overall Risk

Vendors use proprietary interfaces and components to lock in customers. However, the

source code for open source software is freely available and widely supported by the

community, based on standardized, free public interfaces. If a commercial vendor goes out

of business (or is purchased by another), or tries to increase fees for a commercial product,

open source vendors may be able to step in to meet the needs of customers at market-

competitive prices.

Open source software can reduce security and operational risks, too. Widely used open

source software is essentially under constant peer review. Technical or security issues,

once exposed in the community, are readily addressed, resulting in a safer and more

reliable product.

About Lucid Imagination The benefits of open source have unlocked tremendous value in many software categories:

Red Hat’s Enterprise Linux in operating systems, MySQL in database software, Sugar in

CRM software—all have benefited from matching the efficiencies of open source with deep,

robust commercial resources to ensure successful applications. Today, Lucid Imagination’s

capabilities and expertise bring that same approach to unlocking enterprise search with

Lucene and Solr.

Lucid Imagination’s mission is to enable customers to achieve business objectives for

optimal search performance and accuracy, with lower total cost of ownership and faster

time to market. The company’s founding team consists of many key contributors and

committers to the Lucene/Solr project, as well as other experts in enterprise search

application development. Our skills, acquired across hundreds of deployments, including

best practices and technical know-how, can enhance and optimize any phase of an open

source search implementation.

Lucid Imagination’s team has a deep understanding of indexing, which is the foundation of

any search solution; it captures all the content and location of searched documents for

quick lookup, much as a book index does. We have broad experience indexing:

• Documents of widely varying sizes and formats within a very large collection,

• Documents with diverse metadata requirements, and

• Multilingual documents.
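The book-index analogy above can be sketched as a tiny inverted index: a mapping from each term to the documents and positions where it occurs. This is a minimal illustration in plain Python, assuming a naive lowercase tokenizer and made-up document ids; it is not how Lucene stores its indexes.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each term to {doc_id: [positions]}, as a book index maps words to pages."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for position, term in enumerate(tokenize(text)):
            index[term].setdefault(doc_id, []).append(position)
    return index


def lookup(index, term):
    """Return the sorted ids of the documents that contain `term`."""
    return sorted(index.get(term.lower(), {}))


if __name__ == "__main__":
    # Hypothetical documents, used only to exercise the sketch.
    docs = {
        "memo-1": "Quarterly sales report for the sales team",
        "memo-2": "Travel policy and expense report",
    }
    index = build_index(docs)
    print(lookup(index, "report"))
    print(index["sales"])
```

Because the lookup reads one entry of the mapping instead of scanning every document, query time depends mostly on how many documents contain the term, not on the size of the whole collection.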

The team is also skilled at applying business rules such as boosting documents and fields,

indexing dates, or other attributes of terms and data. Lucid Imagination has developed best

practices for indexing and metadata management, and can help establish and refine

policies to meet business and technical search requirements, such as:

• How and when to add documents to an index,

• Removing documents from an index,

• Results relevancy and document/data findability,

• Undeleting documents, and

• Batch and real-time updates.

The Lucid Imagination team has extensive experience with large-scale search applications, including engagements with:

• Large collections—more than one billion documents,

• High query volumes and large user populations,

• High document growth rates,

• Distributed indexing and searching,

• Replication and high availability, and

• Cloud environments.

In addition to fine-tuning search technology machinery, the Lucid Imagination team has

significant expertise in natural language processing, which optimizes the interaction of

compute resources with human-created content. Key considerations include:

• Developing structured methods for characterizing how well a set of results meets user needs,

• Establishing a tradeoff between overall net gain in the quality of results across the whole application, versus a single improvement for one query or user, and

• Improving the ability to find accurate answers by leveraging a balanced mix of content analysis and query interpretation algorithms.
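The paper does not say which measures are used to characterize result quality. As an assumption, the sketch below uses two standard information-retrieval measures, precision at k and recall, and averages precision over many judged queries to reflect the tradeoff between application-wide gains and a single query. The function names and sample judgments are hypothetical.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k results that are judged relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    return hits / k


def recall(ranked_ids, relevant_ids):
    """Fraction of all relevant documents that appear anywhere in the results."""
    if not relevant_ids:
        return 0.0
    found = sum(1 for doc_id in relevant_ids if doc_id in ranked_ids)
    return found / len(relevant_ids)


def mean_precision_at_k(judged_queries, k):
    """Average precision@k over (ranked results, relevant set) pairs."""
    scores = [precision_at_k(ranked, relevant, k) for ranked, relevant in judged_queries]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    ranked = ["a", "b", "c", "d"]      # results in the order the engine returned them
    relevant = {"a", "c", "x"}         # documents a human judged relevant
    print(precision_at_k(ranked, relevant, 2))
    print(recall(ranked, relevant))
```

Comparing the averaged score before and after a tuning change gives a view of the net effect across the application, while the per-query scores show which individual queries improved or regressed.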

The breadth of expertise offered by Lucid is available in a variety of forms suited to a range

of different business needs and deployment requirements. This enables customers to

create even more powerful and successful search applications.

Engagement Scenarios

Virtually every company and organization uses some form of enterprise search, to help

customers, employees, and partners find the information they need. Many companies use

packaged commercial software applications; but, over time, their requirements evolve

beyond the original platform’s limitations. Also, licensing or customization costs may grow

too high, or the number and type of documents may expand beyond the original design’s

capacity. As companies evaluate the ongoing fit of their current search applications to an

ever changing market and organizational landscape, they naturally ask “Is there a faster,

cheaper, more effective way to do this?”

Today, thousands of companies and organizations, each with unique search and retrieval

requirements, have answered this question with Lucene/Solr. The essential value of combining Lucid

Imagination with open source Lucene/Solr technology is commercial

support that adapts to specific requirements. Whether a company is evaluating

Lucene/Solr for a new implementation, considering replacement of a commercial search

product, or enhancing an existing Lucene/Solr implementation, Lucid Imagination offers

skills and resources to help at every phase of the project life cycle.

Considering Alternatives to Legacy Packaged Search Applications

Change happens quickly, but taking advantage of new opportunities can be limited by

existing applications and traditional ways of doing things. Organizations with legacy search

applications often realize that they are paying too much to align packaged enterprise

search applications with evolving business requirements. In other cases, they discover it is

too difficult to integrate existing software with new services, or it takes too long to meet

new corporate goals. With the power of Lucene/Solr, Lucid Imagination supplies the

expertise organizations need to produce successful search solution efforts, more quickly

and less expensively—now and going forward—than other solutions.

• Consulting services are highly customized and able to engage quickly to shorten

cycles and ramp times, minimize errors and design pitfalls, and improve production

results. Lucid Imagination’s consulting team consists of senior search technologists

who are intimately familiar with Lucene/Solr technologies and have extensive

experience in field-tested search solutions for diverse deployment scenarios.

Open source software is ideally suited to low-cost prototyping, because it can

reduce time to deployment and refine the user experience. For customers striving to

integrate a highly diverse base of data and documents, Lucid Imagination offers

prototyping services to assist with the process.

• Technical training can bring everyone in the IT department up to speed on best

practices and the elements of good search design—establishing a solid base of skills

before coding begins. This can greatly reduce downstream problems and reduce

overall costs. Lucid Imagination works with in-house application and system

administration teams to provide the knowledge transfer, guidance, training, and

support required to implement an enterprise search solution that fits the

organization’s specific needs.

• When dependable, predictable support is required to accompany an organization’s

efforts on a regular basis over time, Lucid Imagination’s support subscriptions

provide reliable access to domain experts during the entire application life cycle

process.

◦ Technical Support features the latest tested versions and timely,

predictable support turnaround times.

◦ Advanced Development Support provides expert architectural design,

development, and testing guidance for building search applications using

Lucene and Solr.

◦ Advanced Production Support provides expert advice on configuration,

performance tuning, and optimization for applications deployed to a

production operation environment with live users and service-level

attainment regimes.

◦ Search Health Check, included with Advanced Support, is a comprehensive

set of services that ensures applications are designed to meet recommended

best practices for search configuration, optimization, and effectiveness.

◦ Custom Support packages are also available for unique situations.

• Lucid Imagination’s free 30-Day Get Started Program is available with downloads of

LucidWorks, our certified distributions of Lucene and Solr. The Get Started Program

complements LucidWorks with added guidance for questions on first-time

installation, configuration, and basic usage, as well as evaluation of Lucene/Solr and

included utilities. LucidWorks for Solr is the logical starting point for most

developers building search applications with Lucene/Solr technology for websites,

products, or internal organizational use, because it bundles the most recent and

stable Apache/Solr capabilities, along with other tools and utilities.

Building on In-house Lucene/Solr Expertise

Many organizations with in-house Lucene/Solr expertise have achieved considerable

sophistication in their deployments. Still, they may reach a point where it is difficult to

move the architecture or implementation past a particular design, deployment, or

optimization constraint. There can be many reasons for this, such as limitations on staff

expertise, design, or architecture. Configurations and policies may not have kept pace with

current best practices. A dependent part of the IT environment may have changed—

anything from upgraded complementary applications to new middleware, or expanded

data volume and variety.

For organizations that are ready to gain the required knowledge to move ahead, address

the current situation, and make sure that a deployment stays at peak performance, Lucid

Imagination recommends an in-depth engagement. Typically in a consultative format,

engagement begins with an in-depth assessment and review followed by best practices

design recommendations, and ends with a strategy proposal for achieving long-term,

sustainable innovation for search solutions.

Another key area where Lucid Imagination stands ready to help is in optimizing

performance—both in application response time and its utilization of hardware/software

resources. Lucid Imagination experts work with in-house teams to diagnose and improve

search application efficiencies.

As mentioned earlier, a significant benefit of open source software is its ability to provide

fast, low-cost prototyping as a means to reduce time to deployment and refine the user

experience. For customers that seek to integrate highly diverse bases of data and

documents, or accelerate evaluations of open source search solutions, Lucid Imagination

offers prototyping services.

While community support has always been a significant benefit of open source projects,

tough issues may not always be answered in timely fashion or with the discretion

necessary to prevent exposure of confidential organizational knowledge. That’s when Lucid

Imagination’s expert teams can help.

Some companies are already skilled in open source technologies in general and

Lucene/Solr in particular. For these, Lucid Imagination offers Technical Support and

Advanced Support. Technical Support can provide answers within defined response times

for users encountering problems with Lucene/Solr projects or production

implementations.

Different levels of support address most situations. For example, an e-commerce startup

may find that community forums provide suitable answers, but not always as quickly as

needed. Basic Technical Support provides Web-based and e-mail support at competitive

rates for customers that do not require same-day response or direct telephone support.

Lucid Imagination also offers various levels of Technical Support for larger or mission-

critical installations, including fast turnaround, diagnosis, and bug fixes. Finally, Enterprise

Technical Support includes Search Health Checks by Lucid Imagination domain experts to

help ensure optimal runtime effectiveness.

Next Steps

For more information on how Lucid Imagination can help employees, customers, and

partners find the information they need, please visit http://www.lucidimagination.com to

access blog posts, articles, and reviews of dozens of successful implementations. Please e-

mail specific questions to:

Support and Service: [email protected]

Sales and Commercial: [email protected]

Consulting: [email protected]

Or call: 1.650.353.4057

Appendix: About Apache Lucene and Solr

Apache Lucene/Solr offers an attractive alternative to proprietary search and discovery

software vendors. Lucene is a Java technology-based search library and Solr is a platform

built atop Lucene that provides application builders with a ready-to-use search platform.

Both Lucene and Solr are free and open source. They are available under the Apache

Software License, which allows users to modify or embed the technology as they see fit, and

to keep, sell, and/or redistribute any resulting product.

Solr is the logical starting point for most developers building search applications with

Lucene/Solr technology for websites, products, or internal organizational use. Most users

building Lucene-based search applications will find it is quicker to start with Solr, since it

contains many of the capabilities needed to turn a core search capability into a full-fledged

search application.

The full-featured core Lucene search engine library offers:

• Speed: Sub-second performance for most queries.

• Relevancy ranking: Out-of-the-box rankings are as good as or better than the best commercial competitors.

• Complete query capabilities: Keyword, Boolean and +/- queries, proximity operators, wildcards, fielded searching, term/field/document weights, find-similar, spell checking, multilingual search, and much more.

• Full results processing: Sorting by relevancy, date or any field, dynamic summaries, hit highlighting, and more.

• Portability: Runs on any platform supporting Java and indexes are portable across platforms. Indexes built on Linux can be copied to a Microsoft Windows machine where they can be searched. Lucene and Solr are written entirely in Java; .NET and other versions are also available.

• Scalability: Production applications search collections ranging from hundreds of millions to billions of documents/records.

• Low-overhead indexes and rapid incremental indexing.
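To make the idea of relevancy ranking in the list above concrete, the sketch below orders documents by a simplified term-frequency and inverse-document-frequency (TF-IDF) score. This weighting is swapped in purely as an illustration and is not Lucene's actual scoring formula; the `rank` function and sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank(docs, query):
    """Return (doc_id, score) pairs sorted by a simple TF-IDF score, best first."""
    term_counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    n_docs = len(docs)
    scores = {}
    for doc_id, counts in term_counts.items():
        score = 0.0
        for term in tokenize(query):
            tf = counts[term]  # how often the term occurs in this document
            if tf == 0:
                continue
            # Terms that occur in fewer documents carry more weight.
            df = sum(1 for c in term_counts.values() if term in c)
            score += tf * math.log(1 + n_docs / df)
        if score > 0:
            scores[doc_id] = score
    # Highest score first; ties are broken by document id for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    docs = {
        "d1": "solr search search platform",
        "d2": "lucene search library",
        "d3": "expense policy",
    }
    for doc_id, score in rank(docs, "search"):
        print(doc_id, round(score, 3))
```

Documents that never mention a query term are left out of the results, and a document that mentions the term more often ranks higher than one that mentions it once.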

The Solr platform adds the following capabilities:

• Web services: Solr places Lucene over HTTP, allowing programs written in any language to invoke Lucene.

• Faceting: The dynamic clustering of items or search results into categories enables users to drill into search results (or even skip searching entirely) by any value in any field, as seen on popular e-commerce sites such as Amazon or Zappos.

• XML-based schema: Manages indexed fields and their characteristics.

• Admin tools: Configuration, data loading, index replication, statistics, logging and cache management, and more.

• Scalable: Distributed architecture enables large-scale distributed search.

• Configurable: Fixed/paid result list placement.
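Because Solr exposes search over HTTP, a client in any language only needs to build a URL and read the response. The sketch below assembles such a request with faceting enabled, using Solr's `q`, `fq`, `rows`, `wt`, `facet`, and `facet.field` request parameters. The host, port, and path are assumptions about a local installation, and the field names are hypothetical; the code only builds the URL and does not contact a server.

```python
from urllib.parse import urlencode

# Assumed endpoint: a local Solr instance; adjust host, port, and path to your installation.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_facet_query(keywords, facet_fields, filters=(), rows=10):
    """Build a Solr select URL that asks for results plus facet counts."""
    params = [
        ("q", keywords),      # the user's keyword query
        ("rows", str(rows)),  # number of results to return
        ("wt", "json"),       # response format
        ("facet", "true"),    # turn faceting on
    ]
    params += [("facet.field", field) for field in facet_fields]
    params += [("fq", f) for f in filters]  # filter queries narrow the result set
    return SOLR_SELECT_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical fields "brand" and "color"; the filter mimics a user clicking a facet value.
    print(build_facet_query("running shoes", ["brand", "color"], filters=["brand:Acme"]))
```

Each click on a facet value in a user interface can be translated into one more `fq` parameter, which is how drill-down navigation is commonly layered on top of a keyword search.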