How to Gain Greater Business Intelligence with Vanilla from Solr/Lucene

Patrick Beaucamp, Founder of the Vanilla Project
Mail: [email protected]

LuceneRevolution, Boston

How to Gain Greater Business Intelligence from Lucene/Solr

DESCRIPTION

Presented by Patrick Beaucamp | Bpm-Conseil. See the conference video: http://www.lucidimagination.com/devzone/events/conferences/lucene-revolution-2012

Vanilla, an open source business intelligence application by bpm-conseil.com, offers unique features such as report indexing through an embedded Lucene integration. Using Vanilla and Lucene, developers can manage both report indexing and external document indexing, which saves end users time when they search for specific keywords such as "product code" or "customer code". Vanilla can build upon an existing Solr/Lucene installation that takes care of all the indexing processes while Vanilla takes care of report and dashboard creation. In this presentation, attendees will learn how we moved from the embedded Lucene API to a Solr/Lucene platform, and the technical and business benefits of this architecture in terms of clustering, caching and access mode.


Presentation Agenda

Vanilla powered by Lucene
- Report indexation, search interface
- External document management
- Evolution & constraints

Steps to Solr/Lucene adoption
- Indexation, storage, search
- Embedded Solr/Lucene
- External Solr/Lucene platform

Key benefits for Vanilla powered by Solr/Lucene
- Cluster architecture
- Cache mechanism
- Support for enhanced search language


Some Vanilla Features

Flash maps and charts: reports, cubes and dashboards

Vanilla apps: Android and iPhone


Vanilla Powered by Lucene (1/6)

Vanilla is a full Business Intelligence platform that provides:
- Reporting, OLAP, dashboards, KPIs, map visualisation
- ETL, workflow, Document Management search engine


Vanilla Powered by Lucene (2/6)

Report indexation
- Search engine is Apache Lucene (summer 2010)
- External documents and Vanilla reports are indexed
- Different indexation strategies for documents:
  – No indexation
  – Real-time indexation
  – Late indexation

Two modules manage the indexation strategy:
- Enterprise Services, to set document properties
- Norparena, to manage indexation
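The three indexation strategies can be pictured with a small sketch. The slide does not define their exact semantics, so this assumes that "real time" means indexing on store and "late" means queueing for a later batch run. The names `IndexationStrategy` and `Indexer` are hypothetical and are not part of Vanilla or Norparena.

```python
from enum import Enum


class IndexationStrategy(Enum):
    NONE = "none"            # document is never indexed
    REAL_TIME = "real_time"  # assumed: indexed as soon as it is stored
    LATE = "late"            # assumed: queued and indexed by a later batch run


class Indexer:
    """Toy indexer: 'index' is the set of searchable document ids."""

    def __init__(self):
        self.index = set()
        self.pending = []  # documents waiting for the next batch run

    def store(self, doc_id, strategy):
        if strategy is IndexationStrategy.REAL_TIME:
            self.index.add(doc_id)
        elif strategy is IndexationStrategy.LATE:
            self.pending.append(doc_id)
        # IndexationStrategy.NONE: the document stays out of the index

    def run_batch(self):
        """Index everything queued under the LATE strategy."""
        self.index.update(self.pending)
        self.pending.clear()
```

In this reading, the per-document strategy would be the kind of property set through Enterprise Services, while the batch run would be the indexation module's job.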


Vanilla Powered by Lucene (3/6)

Search interface
- Search interface available from the Vanilla portal
- Search against the Lucene index (inside Vanilla)
- Search results are combined with security on documents:
  – List contains all documents
  – Documents are ordered by popularity
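One possible reading of the last two points is sketched below: every hit stays in the list, each hit is annotated with whether the current user may open it, and the list is ordered by popularity. The slide does not say how security is applied, so the group-based check, the `Hit` type and the `popularity` counter are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    doc_id: str
    popularity: int            # hypothetical counter, e.g. number of views
    allowed_groups: frozenset  # hypothetical: groups permitted to open the document


def present_results(hits, user_groups):
    """Return (doc_id, can_open) pairs for all hits, most popular first."""
    ordered = sorted(hits, key=lambda h: h.popularity, reverse=True)
    return [(h.doc_id, bool(h.allowed_groups & user_groups)) for h in ordered]
```

A stricter variant would drop the hits the user cannot open instead of flagging them; the slide is compatible with either.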


Vanilla Powered by Lucene (4/6)

External document management
- Various document formats are available (Lucene)
- Additional properties can be set on documents, for later use in search criteria
- Check-in / check-out on documents for versioning
- Search is run on the latest document version
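The check-in / check-out cycle and the "latest version only" search rule can be sketched as follows. This is a toy in-memory model, not Vanilla's document management code; `VersionedStore` and its methods are hypothetical names.

```python
class VersionedStore:
    """Toy document store with check-out / check-in versioning."""

    def __init__(self):
        self.versions = {}     # doc_id -> list of text versions, oldest first
        self.checked_out = {}  # doc_id -> user currently holding the lock

    def add(self, doc_id, text):
        self.versions[doc_id] = [text]

    def check_out(self, doc_id, user):
        if doc_id in self.checked_out:
            raise RuntimeError(f"{doc_id} is already checked out")
        self.checked_out[doc_id] = user

    def check_in(self, doc_id, user, text):
        if self.checked_out.get(doc_id) != user:
            raise RuntimeError(f"{doc_id} is not checked out by {user}")
        self.versions[doc_id].append(text)  # the new version becomes the latest
        del self.checked_out[doc_id]

    def search(self, keyword):
        """Match the keyword against the latest version of each document only."""
        return sorted(d for d, v in self.versions.items() if keyword in v[-1])
```

Older versions stay in the store for history, but a keyword that only appears in a superseded version no longer produces a hit.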


Vanilla Powered by Lucene (5/6)

Evolution and constraints
- No clustering available for the search engine (embedded API), as opposed to Vanilla Report Services
- Limitations in language and keywords (internal search)
- No cache to manage search result sets, as opposed to Vanilla datasets, powered by Memcached
- Requests from customers to be compliant with enterprise search engines → need to set up an external search architecture


Vanilla Powered by Lucene (6/6)

Embedded Lucene API inside Vanilla Platform - Video


Steps to Solr/Lucene Adoption (1/9)

Solr/Lucene is the natural evolution of any embedded Lucene platform

Solr version: 3.5

Indexation
A Vanilla Lucene index can be transferred to and read by Solr/Lucene (a Solr/Lucene index is not usable inside the Vanilla platform)

Storage
Vanilla search indexes can be managed by a Solr/Lucene platform

Search
Search language is compliant


Steps to Solr/Lucene Adoption (2/9)

Embedded Solr/Lucene inside Vanilla Platform

No need for any change in Vanilla code: use of the SolrJ API

Immediately provides additional features such as new keywords

Potential upgrade to Solr/Lucene Enterprise


Steps to Solr/Lucene Adoption (3/9)

From embedded Lucene to embedded Solr/Lucene inside Vanilla Platform


Steps to Solr/Lucene Adoption (4/9)

Embedded Solr/Lucene inside Vanilla Platform - Video


Steps to Solr/Lucene Adoption (5/9)

Solr/Lucene Platform with a Vanilla Platform

Changes are needed in Vanilla code, to separate the document management, indexation and search APIs → 10 man-days workload

Document Management API
Easy to move to CMIS compliance

Indexation & Search API
Solr/Lucene oriented and compliant, but now open to any other search platform
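The split described here can be sketched as two small interfaces, with the platform code depending only on them. This is an illustration of the design idea, not Vanilla's actual API: `DocumentStore`, `SearchBackend`, the in-memory implementations and `reindex_all` are hypothetical names.

```python
from abc import ABC, abstractmethod


class DocumentStore(ABC):
    """Document management side (could later be backed by a CMIS repository)."""

    @abstractmethod
    def read(self, doc_id): ...


class SearchBackend(ABC):
    """Indexation and search side (Solr/Lucene or any other search platform)."""

    @abstractmethod
    def index(self, doc_id, text): ...

    @abstractmethod
    def search(self, keyword): ...


class InMemoryStore(DocumentStore):
    def __init__(self, docs):
        self.docs = dict(docs)

    def read(self, doc_id):
        return self.docs[doc_id]


class InMemorySearch(SearchBackend):
    def __init__(self):
        self.texts = {}

    def index(self, doc_id, text):
        self.texts[doc_id] = text.lower()

    def search(self, keyword):
        return sorted(d for d, t in self.texts.items() if keyword.lower() in t)


def reindex_all(store, backend, doc_ids):
    """Platform code talks to the two interfaces, never to a concrete engine."""
    for doc_id in doc_ids:
        backend.index(doc_id, store.read(doc_id))
```

Swapping the search engine then means writing another `SearchBackend` implementation, while `reindex_all` and the rest of the platform code stay unchanged.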


Steps to Solr/Lucene Adoption (6/9)

Coding Before

Example of code (API) before the split
- Direct use of the Lucene API
- Parse the document content using Apache Tika
- Generate Lucene queries


Steps to Solr/Lucene Adoption (7/9)

Coding After

Example of code (API) after the split
- Easy-to-use SolrJ API
- Distributed search
- Indexation with automatic parsing (using Apache Tika)
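The code shown on the slide is not part of this transcript. As a rough illustration of "indexation with automatic parsing", the sketch below builds the URL for Solr's extracting request handler (Solr Cell), which runs Tika on the server side. It uses plain HTTP rather than SolrJ, and it assumes the handler is mapped at `/update/extract` in `solrconfig.xml`; the host and the document id are hypothetical.

```python
from urllib.parse import urlencode


def extract_url(solr_base, doc_id, commit=True):
    """Build the URL used to post a binary document to Solr Cell."""
    params = {
        "literal.id": doc_id,  # sets the 'id' field of the indexed document
        "commit": "true" if commit else "false",
    }
    return f"{solr_base.rstrip('/')}/update/extract?{urlencode(params)}"


# Assumed local Solr instance and document id, for illustration only.
url = extract_url("http://localhost:8983/solr", "report-42")
```

The document itself would then be sent as the body of an HTTP POST to this URL; parsing no longer happens in the client, which is the difference from the "before" slide.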


Steps to Solr/Lucene Adoption (8/9)

Solr/Lucene Platform with Vanilla Platform - Screenshot


Steps to Solr/Lucene Adoption (9/9)

Solr/Lucene Platform with Vanilla Platform - Video


Key Benefits for Vanilla Powered by Solr/Lucene (1/4)

Clustering Search Architecture, outside of Vanilla

Search results clustering implementation (CarrotClusteringEngine) is based on the Carrot2 framework.


Key Benefits for Vanilla Powered by Solr/Lucene (2/4)

Additional query language to perform search

Solr Uses the Lucene Search Library and Extends it!

- A Real Data Schema, with Numeric Types, Dynamic Fields, Unique Keys
- Powerful Extensions to the Lucene Query Language
- Faceted Search and Filtering
- Geospatial Search
- Advanced, Configurable Text Analysis
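As an illustration of faceted search and filtering, the sketch below builds a Solr select URL from the standard `q`, `fq`, `facet` and `facet.field` parameters. The host and the field names `type` and `author` are hypothetical; they depend on the schema of the target index.

```python
from urllib.parse import urlencode


def facet_query_url(solr_base, text, filters, facet_fields):
    """Build a select URL with filter queries and field facets."""
    params = [("q", text), ("wt", "json"), ("facet", "true")]
    params += [("fq", f) for f in filters]                 # restrict the result set
    params += [("facet.field", f) for f in facet_fields]   # count hits per field value
    return f"{solr_base.rstrip('/')}/select?{urlencode(params)}"


# Assumed host and schema fields, for illustration only.
url = facet_query_url(
    "http://localhost:8983/solr", "product code", ["type:report"], ["author"]
)
```

Filter queries narrow the hits without affecting scoring, while each facet field returns value counts that a search interface can show as drill-down links.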


Key Benefits for Vanilla Powered by Solr/Lucene (3/4)

New methods to manage result set (binary, Xml, Json)

Solr is an enterprise search server with a REST-like API. You put documents in it (called "indexing") via XML, JSON or binary over HTTP. You query it via HTTP GET and receive XML, JSON or binary results.

- Advanced Full-Text Search Capabilities
- Optimized for High Volume Web Traffic
- Standards Based Open Interfaces - XML, JSON and HTTP
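To make the JSON result format concrete, here is a minimal sketch that reads the hit count and document ids out of a `wt=json` select response. The `sample` body is hand-written in the shape Solr returns, not captured from a real server, and it assumes the schema's unique key field is called `id`.

```python
import json


def parse_select_response(body):
    """Return (total_hits, [doc ids]) from a Solr wt=json select response."""
    response = json.loads(body)["response"]
    return response["numFound"], [doc["id"] for doc in response["docs"]]


# Hand-written sample in the shape of a Solr JSON response (illustrative data).
sample = """
{"responseHeader": {"status": 0, "QTime": 3},
 "response": {"numFound": 2, "start": 0,
              "docs": [{"id": "report-42"}, {"id": "doc-7"}]}}
"""

total, ids = parse_select_response(sample)
```

`numFound` is the total number of matches, which can be larger than the number of documents in `docs` when the results are paged.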


Key Benefits for Vanilla Powered by Solr/Lucene (4/4)

Cache Mechanism

Solr caches are associated with an Index Searcher

Three cache implementations: solr.LRUCache (LRU = Least Recently Used, in memory), solr.FastLRUCache, solr.LFUCache (LFU = Least Frequently Used)

Many configuration parameters for cache optimisation
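The eviction policy behind an LRU cache can be shown in a few lines. This is only an illustration of the policy, not Solr's implementation; the `size` argument plays the same role as the `size` attribute of a Solr cache definition, and the keys and values are placeholders for queries and result sets.

```python
from collections import OrderedDict


class LRUCache:
    """Bounded cache that evicts the least recently used entry first."""

    def __init__(self, size):
        self.size = size
        self.entries = OrderedDict()  # oldest use first, most recent use last

    def get(self, key):
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)  # mark as most recently used
        return self.entries[key]

    def put(self, key, value):
        self.entries[key] = value
        self.entries.move_to_end(key)
        if len(self.entries) > self.size:
            self.entries.popitem(last=False)  # evict the least recently used entry
```

An LFU cache differs only in the eviction rule: it tracks how often each entry is read and evicts the one with the fewest reads, which favours queries that stay popular over time.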


Next Steps

Upgrade to Solr 4.0

New features for document life-cycle management

Roadmap for better internationalisation:
- 10 languages available (not Japanese)
- Search translation management

Documentation and tutorials are available on our web sites:

www.bpm-conseil.com and forge.bpm-conseil.com

Thanks for your attention
