Solr and ElasticSearch demo and speaker session, Feb 2014

Distributed Database Architecture

Search and Indexing

Nick Kabra

Presentation Agenda

• Team Introduction
• Basics and History
• Use Cases & Current Usage
• Highlights
• From the Web
• Migration
• Appendix

DISCLAIMER: This is a knowledge-sharing session and not a recommendation for any specific technology / product.

Team Introduction

• Name:
• Designation:
• Experience with search and indexing:
• How long have you been working with Solr or ElasticSearch?

Basics

• Used for indexing and searching.
• Built on top of the Lucene API.
• Lucene: a search engine library packaged as a set of JAR files.
• Solr and ES take the Lucene API and build features on top of it; the API is accessed through a web server.
• Each is like a smaller version of Google, which has indexed and ranked the web pages.
• Use as a search platform for websites, or as a search platform for an organization.
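
Lucene sits underneath both products. It answers searches from an inverted index, which maps each term to the documents that contain it. Below is a minimal Python sketch of that idea. It assumes a crude lowercase tokenizer in place of a real Lucene analyzer and ignores scoring and ranking. `build_index` and `search_all` are hypothetical names.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Lowercase and split on non-alphanumeric characters (a crude stand-in for a Lucene analyzer)
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search_all(index, query):
    """Return ids of documents containing every query term (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {
    1: "Solr is built on Lucene",
    2: "ElasticSearch is built on Lucene too",
    3: "ZooKeeper coordinates SolrCloud",
}
index = build_index(docs)
print(sorted(search_all(index, "built lucene")))  # [1, 2]
```

A lookup touches only the posting sets of the query terms, not every document. This is why the index stays fast as the collection grows.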

History

• Solr came first: it was open-sourced through the Apache Software Foundation in 2006.
• ElasticSearch was released in 2010, adding further features.
• The two differ in design and architecture.

Key Players: Solr and ElasticSearch

Solr
• Latest version: Solr 4.6.1, released on Jan 28, 2014
• Collection: the main logical structure in Solr
• Architecture: distributed; fault tolerant, with automatic replicas
• Coordination: Apache Solr + a ZooKeeper ensemble, so decisions need a quorum
• A leader per shard; automatic leader election

ElasticSearch (ES)
• Latest version: ElasticSearch 1.0.0, released on Feb 12, 2014
• Index: the main logical structure in ES
• Architecture: distributed; fault tolerant, with automatic replicas
• Coordination: only ElasticSearch nodes, using Zen discovery; exposed to the split-brain problem
• A single leader (the elected master node); automatic leader election
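
The coordination difference comes down to majority voting. A ZooKeeper ensemble acts only when a majority (quorum) of its servers agree. ES 1.x instead relies on the `discovery.zen.minimum_master_nodes` setting, commonly recommended as (master-eligible nodes / 2) + 1. If that setting is too low, the two halves of a partitioned cluster can each elect a master, which is split brain. The Python sketch below is a toy model of that arithmetic only, not either product's election protocol. The function names are hypothetical.

```python
def quorum(voters: int) -> int:
    """Smallest strict majority of `voters` (e.g. 2 of 3, 3 of 5)."""
    return voters // 2 + 1


def can_elect(partition_size: int, min_voters: int) -> bool:
    """A partition may elect a leader only if it sees at least `min_voters` eligible nodes."""
    return partition_size >= min_voters


def leaders_after_split(sizes, min_voters):
    """Count how many sides of a network partition could each elect their own leader."""
    return sum(1 for size in sizes if can_elect(size, min_voters))


# 5 master-eligible nodes split into groups of 3 and 2 (toy numbers)
sizes = [3, 2]
print(leaders_after_split(sizes, min_voters=1))          # 2 -> split brain
print(leaders_after_split(sizes, min_voters=quorum(5)))  # 1 -> only the majority side
```

With a majority requirement, at most one side of any partition can elect a leader. The trade-off is that a minority side stops accepting cluster changes until it rejoins.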

Resume Recommendations – Use Case 1

Challenge
• Company ABC helps other firms hire skilled developers and project managers. It empowers customers to find the right job candidate from a database of 8 million profiles.
• Needs fast and predictable performance.
• Must include geospatial search.
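
The geospatial requirement usually means filtering candidates by distance from a point. Below is a brute-force Python sketch using the haversine formula. It assumes a spherical Earth with a radius of 6,371 km and uses made-up profile data. A real engine would use a geo-aware index (ES offers a geo-distance filter) rather than scanning every profile. `within_distance` is a hypothetical name.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical-Earth assumption


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_distance(profiles, center, max_km):
    """Keep profile names whose location lies within max_km of center (a geo-distance filter)."""
    lat0, lon0 = center
    return [p["name"] for p in profiles
            if haversine_km(lat0, lon0, p["lat"], p["lon"]) <= max_km]


profiles = [  # hypothetical candidate profiles
    {"name": "alice", "lat": 40.7128, "lon": -74.0060},   # New York
    {"name": "bob", "lat": 40.7357, "lon": -74.1724},     # Newark
    {"name": "carol", "lat": 34.0522, "lon": -118.2437},  # Los Angeles
]
print(within_distance(profiles, center=(40.7128, -74.0060), max_km=50))  # ['alice', 'bob']
```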

Success
• Customers hire using company ABC.
• ABC stores the searches made by customers.
• ABC identifies candidates, skills and compensation structures to enhance the customer search experience with better matches.
• Make recommendations to customers on salaries, future market needs, etc.
• Eliminate duplicate profiles with real-time indexing and percolation.
• Provide an enhanced customer experience and faster responses.

Opportunity
• Use ES as the search engine, with real-time indexing and nested querying.
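
Percolation inverts normal search: queries are stored, and each incoming document is checked against them. This is how a new profile could be flagged as a likely duplicate at indexing time. Below is a toy Python sketch. It assumes a stored query matches when all of its terms appear in the document. The `Percolator` class and the query ids are hypothetical, and the ES percolator runs full queries rather than simple term sets.

```python
import re


def tokenize(text):
    # Crude analyzer: the set of lowercase alphanumeric tokens
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class Percolator:
    """Reverse search: store queries, then ask which stored queries a new document matches."""

    def __init__(self):
        self.queries = {}  # query id -> set of terms that must all appear

    def register(self, query_id, query_text):
        self.queries[query_id] = tokenize(query_text)

    def percolate(self, document_text):
        terms = tokenize(document_text)
        # A query matches when its required terms are a subset of the document's terms
        return sorted(qid for qid, required in self.queries.items() if required <= terms)


p = Percolator()
p.register("dup-check:jdoe", "john doe java")  # hypothetical fingerprint of an existing profile
p.register("alert:go-devs", "golang developer")
print(p.percolate("John Doe, senior Java engineer"))  # ['dup-check:jdoe']
```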

Integration – Use Case 2: The Full Circle

• Logstash: takes logs, then scrubs, parses and enriches the data
• ElasticSearch: searches and analyzes in real time
• Kibana: visualization engine for dynamic dashboards created in real time, on the fly
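
The Logstash step (take logs, scrub, parse, enrich) can be illustrated with a small Python sketch. The log format, the field names and the `is_error` enrichment are assumptions for illustration. Logstash itself does this with configurable filters such as grok, and Kibana would chart the indexed events rather than a `Counter`.

```python
import re
from collections import Counter

# Hypothetical access-log format: "<ip> <method> <path> <status> <millis>"
LINE = re.compile(
    r"(?P<ip>\S+) (?P<method>[A-Z]+) (?P<path>\S+) (?P<status>\d{3}) (?P<millis>\d+)"
)


def parse(line):
    """Parse one raw line into a structured event, or None if it does not match (scrub step)."""
    m = LINE.match(line.strip())
    if not m:
        return None
    event = m.groupdict()
    event["status"] = int(event["status"])
    event["millis"] = int(event["millis"])
    # Enrich: derive a field the raw line did not carry explicitly
    event["is_error"] = event["status"] >= 500
    return event


raw = [
    "10.0.0.1 GET /search 200 35",
    "10.0.0.2 GET /search 503 1200",
    "garbage line",
    "10.0.0.1 POST /index 201 80",
]
events = [e for e in (parse(line) for line in raw) if e is not None]
# A dashboard-style rollup, similar to a panel counting status codes
print(Counter(e["status"] for e in events))
```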

Chat Agent for 460 Million Documents – Use Case 3

Challenge
6,000 customers around the world use LiveChat daily to communicate with their own customers. They range from one-person businesses to international organizations such as LG, Apple and Adobe. LiveChat customers run 3.6 million queries and 220 million "get" operations per day on 460 million documents. LiveChat keeps these documents up to date with 70 million indexing operations every day.

Solution
• Reduce query time from 2 seconds to 100 ms
• Streamline updating from hours to seconds
• Guarantee maximum uptime
• Scale to meet the needs of 6,000 customers
• Store and search 460 million documents
• Process 3.6 million queries per day

Advantage
• Scalability and indexing: full-text search lets users search through chat archives.
• Faceting makes it possible to pull various statistics for LiveChat clients.
• ES acts as a single datastore, and data updates are available immediately. Each document in LiveChat is now updated on average 20 to 30 times, every 20 to 60 seconds.
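
To put the LiveChat figures in perspective, the daily totals can be converted to average per-second rates. Below is a short Python calculation that uses only the numbers quoted above.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

daily_load = {  # figures from the LiveChat use case
    "queries": 3_600_000,
    "get operations": 220_000_000,
    "indexing operations": 70_000_000,
}

for name, per_day in daily_load.items():
    per_second = per_day / SECONDS_PER_DAY
    print(f"{name}: {per_second:,.1f} per second on average")
```

This works out to roughly 41.7 queries, 2,546 gets and 810 indexing operations per second on average. Peak rates would likely be higher, since traffic is rarely spread evenly over a day.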

Current Uses

• Use Case 1
• Use Case 2
• Use Case 3
• Use Case 4
• Use Case X

Highlights

• Schema and config: solrconfig.xml (Solr) and elasticsearch.yml (ES); replicas can be changed live (in ES, the number of primary shards is fixed once an index is created)
• Scaling: nodes are auto-balanced in ES; Solr offers shard splitting (SOLR-3755); add a document
• Nesting: addresses, users and rights, booleans, parent-child
• Index: different types of documents and analyzers
• Node discovery and fault detection; ZooKeeper in Solr
• Multiple document types per schema, and parent-child
• Percolator
• Aggregations + facets in ES; facets in Solr
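
A facet (in ES, also available as a terms aggregation) counts how many matching documents share each value of a field. This is how per-client statistics or drilldown menus are built. Below is a minimal Python sketch over in-memory dicts; the data and the function names are hypothetical.

```python
from collections import Counter


def terms_facet(docs, field, size=10):
    """Count documents per distinct value of `field`, most frequent first (a terms facet)."""
    counts = Counter(doc[field] for doc in docs if field in doc)
    return counts.most_common(size)


def filtered_facet(docs, field, predicate, size=10):
    """Facet only over documents matching the query, as search engines do for drilldown."""
    return terms_facet([d for d in docs if predicate(d)], field, size)


chats = [  # hypothetical chat documents
    {"agent": "ann", "status": "resolved"},
    {"agent": "ann", "status": "open"},
    {"agent": "bo", "status": "resolved"},
    {"agent": "ann", "status": "resolved"},
]
print(terms_facet(chats, "agent"))  # [('ann', 3), ('bo', 1)]
print(filtered_facet(chats, "agent", lambda d: d["status"] == "resolved"))  # [('ann', 2), ('bo', 1)]
```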

Highlights (contd. 2)

• Automatic load balancing and auto-sharding
• Marvel metrics on 03/13/2014
• Split-brain problem in ES
• Structured Query DSL and query control
• Real-time indexing / near-real-time indexing
• Query routing; SOLR-5816 to be introduced
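
Auto-sharding and query routing rest on the same idea. A document's shard is derived from a hash of its routing value (by default its id in ES), modulo the number of primary shards. Below is a Python sketch that uses `zlib.crc32` as a stand-in for the engine's own hash function. `shard_for` and the keys are hypothetical.

```python
import zlib


def shard_for(routing_key: str, num_primary_shards: int) -> int:
    """Pick a shard as hash(routing) % number of primary shards.
    zlib.crc32 is a stand-in; real engines use their own hash functions."""
    return zlib.crc32(routing_key.encode("utf-8")) % num_primary_shards


def route_query(routing_key, num_primary_shards):
    """With a routing key, a query only needs the one shard that can hold matching docs."""
    return [shard_for(routing_key, num_primary_shards)]


def broadcast_query(num_primary_shards):
    """Without routing, the query fans out to every shard."""
    return list(range(num_primary_shards))


# Documents with the same routing key (e.g. a customer id) always land on the same shard
print(route_query("customer-42", 5), broadcast_query(5))

# Changing the shard count remaps most keys
moved = sum(shard_for(f"doc-{i}", 5) != shard_for(f"doc-{i}", 6) for i in range(1000))
print(f"{moved} of 1000 documents would change shard")
```

Routing by a shared key keeps related documents together, so a query can touch one shard instead of all of them. The last lines show why resizing is hard: once the shard count changes, most keys hash to a different shard.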

ElasticSearch / Solr Funnel

• UIMA
• Text analysis debugger, spell check
• Decision tree faceting / drilldown
• Cloudera, MapR and DataStax support Solr
• Filters for queries across nested documents
• Query handling: analyzer and language, term suggester, autocomplete
• Real-time GET with query routing
• Hortonworks and Couchbase support ElasticSearch
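
With near-real-time indexing, a newly indexed document becomes searchable only after the next refresh (ES refreshes about once per second by default). A real-time GET by id, by contrast, returns the latest version immediately. Below is a toy Python model of that split. The class and method names are hypothetical, and the model ignores segments, the transaction log and replication.

```python
class NearRealTimeIndex:
    """Toy model: writes land in a buffer and become searchable only after refresh(),
    while get() by id also sees buffered writes (like a real-time GET)."""

    def __init__(self):
        self.buffer = {}      # doc id -> text, written but not yet searchable
        self.searchable = {}  # doc id -> text, visible to search

    def index(self, doc_id, text):
        self.buffer[doc_id] = text

    def refresh(self):
        # Here refresh is manual; ES triggers it periodically
        self.searchable.update(self.buffer)
        self.buffer.clear()

    def search(self, term):
        return sorted(i for i, t in self.searchable.items() if term in t.lower().split())

    def get(self, doc_id):
        # Real-time GET: check the newest, unrefreshed write first
        if doc_id in self.buffer:
            return self.buffer[doc_id]
        return self.searchable.get(doc_id)


idx = NearRealTimeIndex()
idx.index(1, "Solr and ES")
print(idx.get(1), idx.search("solr"))  # Solr and ES []
idx.refresh()
print(idx.search("solr"))  # [1]
```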

FROM THE WEB

Web CPA

This is only an FYI: we found some customers moving from Solr to ElasticSearch, but could not find any article mentioning clients that moved from ES to Solr. Caveat: no prejudice intended, but it would be good to hear what customers say.

Let us also check this site: http://www.ymc.ch/en/why-we-chose-solr-4-0-instead-of-elasticsearch

http://www.mgt-commerce.com/magento-elasticsearch.html

• Foursquare: http://engineering.foursquare.com/2012/08/09/foursquare-now-uses-elastic-search-and-on-a-related-note-slashem-also-works-with-elastic-search/
• Jetwick: http://karussell.wordpress.com/2011/02/07/why-jetwick-moved-from-solr-to-elasticsearch/
• Netricos: http://www.netricos.com/blog/posts/how-we-are-using-elastic-search
• StumbleUpon: http://www.elasticsearch.org/case-study/stumbleupon/
• UK govt. site: https://gds.blog.gov.uk/2012/08/03/from-solr-to-elasticsearch/
• Wikimedia: http://thenextweb.com/insider/2014/01/06/wikimedia-will-replace-search-elasticsearch-beta-users-february-users-march-april/#!xDKnd

2 Parts of a Whole – The Math

• Solr: performs very well on small indexes that don't change very often.
• ElasticSearch: scalability, auto-sharding, GUI admin, schemaless operation, real-time indexing, nested queries, routing, and the way indexing and queries are handled. Together these give faster query execution and better indexing, a distinct advantage for ES.

Migration

Step 1: Use the river plugin to migrate from the existing Solr cluster to ES.

Step 2: The river pulls the content from the existing Solr cluster and indexes it in ES.

Step 3: When you decide to switch to Elasticsearch permanently, switch your indexing to send content directly from your sources to Elasticsearch. Keeping Solr in the middle is not a recommended setup.
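
The river plugin automates steps 1 and 2. As a swapped-in alternative for illustration (not the river plugin itself), the same pull-and-index flow can be sketched in Python. The sketch pages through Solr with the `start`/`rows` parameters of its `/select` handler, then builds a body for the Elasticsearch `_bulk` API, which takes one action line and one source line per document. The HTTP call is left out: `fetch_page` is a hypothetical callable, and the index name, type name and `id` unique key are assumptions.

```python
import json


def solr_pages(fetch_page, rows=500):
    """Yield Solr docs page by page using start/rows paging.
    fetch_page(start, rows) is hypothetical; it would issue
    GET /solr/<collection>/select?q=*:*&start=<start>&rows=<rows>&wt=json
    and return the parsed JSON response."""
    start = 0
    while True:
        response = fetch_page(start, rows)["response"]
        docs = response["docs"]
        yield from docs
        start += len(docs)
        if not docs or start >= response["numFound"]:
            break


def to_bulk_body(docs, index, doc_type, id_field="id"):
    """Build an Elasticsearch _bulk body: one action line plus one source line per doc."""
    lines = []
    for doc in docs:
        source = {k: v for k, v in doc.items() if k != "_version_"}  # drop Solr-internal field
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc[id_field]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    if not lines:
        return ""
    return "\n".join(lines) + "\n"  # the bulk API expects a trailing newline


# Usage with a fake in-memory Solr standing in for real HTTP calls
corpus = [{"id": str(i), "title": f"profile {i}", "_version_": 1} for i in range(5)]


def fake_fetch(start, rows):
    return {"response": {"numFound": len(corpus), "start": start,
                         "docs": corpus[start:start + rows]}}


body = to_bulk_body(solr_pages(fake_fetch, rows=2), index="profiles", doc_type="profile")
print(body.count("\n"))  # 10: two lines per document
```

In practice the body would be sent in batches rather than all at once. Once step 3 is done, the same `to_bulk_body` shape could be fed from the original sources instead of Solr.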

Conclusion

• If we have a small site and need search features without the distributed bells and whistles, both Solr and ElasticSearch are efficient.

• If we are planning a large installation that requires distributed search with nesting, scalability, sharding and real-time indexing, ElasticSearch can do a better job.

Question to Ask

Where do we go from here?

• Both products are trying to catch up with each other's capabilities.
• The best way to define this is: some possible next steps…

Thank you!

201-925-0488

Architecture – Global Head

Questions session

Appendix
