Battle of the Giants Round 2 - Apache Solr vs. Elasticsearch


Round 2

Battle of the Giants

Rafał Kuć – Sematext Group, Inc.
@kucrafal @sematext sematext.com

VS

I am a…

Sematext consultant & engineer
Solr Cookbook series author
"ElasticSearch Server" author
"Mastering ElasticSearch" author
Solr.pl co-founder
Father and husband

Copyright 2013 Sematext Group. Inc. All rights reserved

VS

Under the Hood

Both built on Lucene 4.3

Expectations

Scalability
Fault tolerance
High availability
Features
Manageability
Ease of installation
Tools support

Expectations vs Reality

ElasticSearch: only ElasticSearch nodes needed, single leader (the elected master)
Apache Solr: Solr plus ZooKeeper, a leader per shard

Both:
Distributed
Fault tolerant
Automatic leader election

All Time Top Committers

Active Contributors

The Code

The Mailing Lists

Trends

Collection vs Index

Collections and indices can be spread among different nodes in the cluster

Apache Solr: the collection is the main logical index
ElasticSearch: the index is the main logical structure

Apache Solr Index Structure

Fields and types defined in schema
Automatic value copying (copyField)
Dynamic fields
Custom similarity
Custom postings format
Multiple document types require a shared schema
Can be read using API

ElasticSearch Index Structure

Schema-less
Fields and types defined with HTTP API
Multi-field support
Nested and parent-child documents
Custom similarity
Custom postings format
Multiple documents with different structure
Can be read and written using API
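The Python sketch below shows what "fields and types defined with HTTP API" looks like in practice. It builds a put-mapping request but does not send it. The body uses the ElasticSearch 0.90-era format, including a `multi_field` that indexes one value in two ways. The host, the field names `title` and `tags`, and the `untouched` sub-field are assumptions. The index `sematext` and type `test` come from the deck's own examples.

```python
import json
import urllib.request

ES = "http://localhost:9200"  # assumed local node, as in the deck's curl examples


def put_mapping_request(index, doc_type, properties):
    """Build (but do not send) a put-mapping request for one document type."""
    body = {doc_type: {"properties": properties}}
    return urllib.request.Request(
        f"{ES}/{index}/{doc_type}/_mapping",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Hypothetical fields: "title" is indexed twice through multi_field
# (analyzed and untouched); "tags" is a single not_analyzed string.
properties = {
    "title": {
        "type": "multi_field",
        "fields": {
            "title": {"type": "string"},
            "untouched": {"type": "string", "index": "not_analyzed"},
        },
    },
    "tags": {"type": "string", "index": "not_analyzed"},
}

req = put_mapping_request("sematext", "test", properties)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Solr keeps the equivalent definitions in its schema file instead. ElasticSearch accepts them at runtime over HTTP.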

Shards and Replicas

Many shards
0 or more replicas
A replica can become leader
Replicas can be created on a live cluster
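Both systems take shard and replica counts when a collection or index is created. Solr uses the Collections API `CREATE` action with `numShards` and `replicationFactor`. ElasticSearch uses the index settings `number_of_shards` and `number_of_replicas`. The sketch below only builds the requests. The hosts and counts are assumptions.

```python
import json
from urllib.parse import urlencode

SOLR = "http://localhost:8983/solr"  # assumed Solr base URL
ES = "http://localhost:9200"         # assumed ElasticSearch node


def solr_create_collection_url(name, shards, copies_per_shard):
    # replicationFactor counts every copy of a shard, the leader included
    params = urlencode({"action": "CREATE", "name": name,
                        "numShards": shards,
                        "replicationFactor": copies_per_shard})
    return f"{SOLR}/admin/collections?{params}"


def es_create_index(name, shards, extra_replicas):
    # number_of_replicas counts copies in addition to the primary shard
    body = {"settings": {"number_of_shards": shards,
                         "number_of_replicas": extra_replicas}}
    return "PUT", f"{ES}/{name}", json.dumps(body)


print(solr_create_collection_url("collection1", 2, 2))
print(es_create_index("sematext", 2, 1))
```

The two counts are defined differently. Solr's `replicationFactor=2` and ElasticSearch's `number_of_replicas=1` both produce two copies of every shard.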

Configuration

Apache Solr:
Static in solrconfig.xml
Can be reloaded with a core reload

ElasticSearch:
Static in elasticsearch.yml
Changeable at runtime
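As a rough illustration of the difference, the sketch below builds a Solr CoreAdmin `RELOAD` URL. Solr needs this call to pick up edited configuration files. The sketch also builds an ElasticSearch index-settings update that takes effect on a live index. The hosts, the core name and the new replica count are assumptions.

```python
import json
from urllib.parse import urlencode

SOLR = "http://localhost:8983/solr"  # assumed Solr base URL
ES = "http://localhost:9200"         # assumed ElasticSearch node


def solr_core_reload_url(core):
    # Loads a new instance of the core from its current configuration files
    params = urlencode({"action": "RELOAD", "core": core})
    return f"{SOLR}/admin/cores?{params}"


def es_update_index_settings(index, settings):
    # Dynamic index settings are applied to the running index, no restart needed
    return "PUT", f"{ES}/{index}/_settings", json.dumps({"index": settings})


print(solr_core_reload_url("collection1"))
print(es_update_index_settings("sematext", {"number_of_replicas": 2}))
```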

Discovery

ElasticSearch: Zen Discovery
Apache Solr: Apache ZooKeeper

Solr & ZooKeeper

Requires additional software
Prevents split-brain situations
Holds collection configurations
ZooKeeper ensemble needed
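ZooKeeper prevents split-brain by only acting when a majority (quorum) of the ensemble agrees. That is why an ensemble, not a single server, is needed for fault tolerance. ElasticSearch's Zen discovery applies the same idea through the `discovery.zen.minimum_master_nodes` setting, which is commonly set to a majority of the master-eligible nodes. The arithmetic:

```python
def quorum(n):
    """Smallest strict majority of n voting members."""
    return n // 2 + 1


def tolerated_failures(n):
    """Members that can fail while a majority is still available."""
    return n - quorum(n)


for size in (1, 3, 4, 5):
    print(size, quorum(size), tolerated_failures(size))
```

A four-node ensemble tolerates no more failures than a three-node one, which is why odd ensemble sizes are the usual choice.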

ElasticSearch Zen Discovery

Automatic node discovery
Multicast and unicast discovery methods
Automatic master detection
Two-way failure detection (the master pings nodes, nodes ping the master)

HTTP FTW

ElasticSearch: HTTP REST API, or a query string for simple queries
Apache Solr: HTTP with a query string
Both provide a specialized Java API
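The sketch below puts the same simple query side by side:

- as Solr request parameters;
- as an ElasticSearch URI search;
- as an ElasticSearch JSON body.

The base URLs, the core name and the `title` field are assumptions. Nothing is sent; the functions only build the request shapes.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit

SOLR = "http://localhost:8983/solr/collection1"  # assumed core URL
ES_INDEX = "http://localhost:9200/sematext"       # assumed index URL


def solr_search_url(query, rows=10):
    # Query, paging and response format (wt) all travel in the query string
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{SOLR}/select?{params}"


def es_uri_search_url(query, size=10):
    # Simple queries can be passed in the query string too
    params = urlencode({"q": query, "size": size})
    return f"{ES_INDEX}/_search?{params}"


def es_body_search(query, size=10):
    # Anything richer is sent as a JSON body written in the Query DSL
    return json.dumps({"query": {"query_string": {"query": query}}, "size": size})


url = solr_search_url("title:solr")
print(url)
print(parse_qs(urlsplit(url).query))  # decoded parameters, as the server sees them
print(es_uri_search_url("title:solr"))
print(es_body_search("title:solr"))
```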

Results Grouping

Group on:
field value
query result
function query
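In Solr, result grouping is driven by request parameters. `group.field` groups on a field value, each `group.query` produces one group, and `group.func` groups on a function query value. The sketch below builds those parameters. The field names `category` and `price` are hypothetical.

```python
from urllib.parse import urlencode

SOLR = "http://localhost:8983/solr/collection1"  # assumed core URL


def grouping_params(query, field=None, group_queries=(), func=None, limit=1):
    """Build Solr result-grouping parameters for the three grouping modes."""
    params = [("q", query), ("group", "true"), ("group.limit", str(limit))]
    if field:
        params.append(("group.field", field))   # one group per distinct field value
    for gq in group_queries:
        params.append(("group.query", gq))      # one group per query
    if func:
        params.append(("group.func", func))     # one group per function result
    return params


by_field = grouping_params("*:*", field="category")
by_query = grouping_params("*:*", group_queries=["price:[0 TO 50]", "price:[50 TO *]"])
print(f"{SOLR}/select?{urlencode(by_field)}")
print(f"{SOLR}/select?{urlencode(by_query)}")
```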

Prospective Search

Called Percolator in ElasticSearch
Matches documents to stored queries
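Prospective search reverses the usual flow. Queries are stored ahead of time, and each incoming document is checked against them. The toy model below illustrates only that idea in plain Python. It is not ElasticSearch's percolator API; there, the stored queries are real Lucene queries registered over HTTP. The query IDs and the `term` helper are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Doc = Dict[str, object]


@dataclass
class ToyPercolator:
    """In-memory illustration: stored queries, documents matched against them."""
    queries: Dict[str, Callable[[Doc], bool]] = field(default_factory=dict)

    def register(self, query_id: str, predicate: Callable[[Doc], bool]) -> None:
        self.queries[query_id] = predicate

    def percolate(self, doc: Doc) -> List[str]:
        # Return the IDs of every stored query this document satisfies
        return sorted(qid for qid, pred in self.queries.items() if pred(doc))


def term(field_name: str, value: object) -> Callable[[Doc], bool]:
    # Exact match on one field, standing in for a real Lucene query
    return lambda doc: doc.get(field_name) == value


p = ToyPercolator()
p.register("solr-alerts", term("tags", "solr"))
p.register("es-alerts", term("tags", "elasticsearch"))
print(p.percolate({"title": "Solr 4.3 released", "tags": "solr"}))
```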

Full Text Search Capabilities

Variety of queries
Control over score calculation
Different query parsers
Advanced Lucene queries

Score Calculation

Leverage Lucene scoring
Control the importance of:
documents
queries
terms
phrases
Similarity configuration

Apache Solr and Score Influence

Index-time boosting
Query-time:
term boosts
field boosts
phrase boosts
function queries
sub-queries used for boosting
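Most of the query-time boosts listed for Solr map directly onto parameters of the edismax query parser:

- `^` for term boosts;
- `qf` for field boosts;
- `pf` for phrase boosts;
- `bf` for additive function queries;
- `bq` for boosting sub-queries.

The sketch assumes edismax. The field names (`title`, `body`, `popularity`, `category`) and the boost values are hypothetical.

```python
from urllib.parse import urlencode

SOLR = "http://localhost:8983/solr/collection1"  # assumed core URL


def boosted_search_params(user_query):
    """Query-time boosting with the edismax parser (field names are hypothetical)."""
    return {
        "defType": "edismax",
        "q": user_query,              # terms may carry their own boosts, e.g. solr^2
        "qf": "title^3 body",         # field boosts: title matches weigh three times more
        "pf": "title^5",              # extra score when the terms match title as a phrase
        "bf": "sqrt(popularity)",     # function query added to the score
        "bq": "category:featured^2",  # sub-query used only to boost, not to filter
    }


params = boosted_search_params("solr^2 elasticsearch")
print(f"{SOLR}/select?{urlencode(params)}")
```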

ElasticSearch and Score Influence

Index-time boosting
Query-time:
different queries provide different boost controls
can calculate distributed term frequencies (dfs_query_then_fetch search type)
negative and positive boosting queries
custom score filters
scripts
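Two of the ElasticSearch controls fit into one request:

- a `boosting` query, which demotes documents matching a negative clause instead of excluding them;
- the `dfs_query_then_fetch` search type, which gathers term statistics from all shards before scoring.

The sketch assumes a local node and the `sematext` index from the deck. The field names and the demotion factor are hypothetical.

```python
import json

ES_INDEX = "http://localhost:9200/sematext"  # assumed index URL


def boosting_query(positive, negative, negative_boost=0.2):
    # Documents matching `negative` stay in the results, but their
    # score is multiplied by negative_boost
    return {"boosting": {"positive": positive, "negative": negative,
                         "negative_boost": negative_boost}}


body = {"query": boosting_query({"term": {"title": "solr"}},
                                {"term": {"tags": "deprecated"}})}

# dfs_query_then_fetch collects term frequencies from every shard first,
# so scores use distributed rather than per-shard statistics
url = f"{ES_INDEX}/_search?search_type=dfs_query_then_fetch"
print(url)
print(json.dumps(body, indent=2))
```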

ElasticSearch Query Rescore

Reorders the top N hits using another query
Executed on each shard before results are returned to the node handling the request
Not executed with the scan and count search types
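A sketch of a rescore request body in the 0.90-era syntax follows. A cheap `match` query selects the hits. Then each shard re-scores its top `window_size` hits with a sloppier but more expensive phrase query, and blends the two scores using the given weights. The field name `body`, the query text and the weights are assumptions.

```python
import json


def rescored_search(main_query, phrase_field, phrase_text, window=50):
    """Search body whose top `window` hits per shard are re-scored with a
    sloppy phrase query (0.90-era rescore syntax; field names hypothetical)."""
    return {
        "query": main_query,
        "rescore": {
            "window_size": window,
            "query": {
                "rescore_query": {
                    "match": {phrase_field: {"query": phrase_text,
                                             "type": "phrase", "slop": 2}}
                },
                "query_weight": 0.7,          # weight of the original score
                "rescore_query_weight": 1.2,  # weight of the rescore query's score
            },
        },
    }


body = rescored_search({"match": {"body": "battle of the giants"}},
                       "body", "battle of the giants")
print(json.dumps(body, indent=2))
```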

ElasticSearch Nested Objects

Indexed as separate documents
Stored in the same part of the index as the root document
Hidden from standard queries and filters
Need appropriate queries and filters (nested)
Top-level documents can be sorted on the basis of nested ones
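Nested objects are declared in the mapping with `"type": "nested"` and reached only through a `nested` query that names the path. The sketch shows both as Python dicts. The `comments` object, its fields and the values searched for are hypothetical.

```python
import json

# Hypothetical mapping: each "test" document carries a list of "comments"
# objects, each stored as a separate hidden Lucene document
mapping = {
    "test": {
        "properties": {
            "title": {"type": "string"},
            "comments": {
                "type": "nested",
                "properties": {
                    "author": {"type": "string", "index": "not_analyzed"},
                    "stars": {"type": "integer"},
                },
            },
        }
    }
}

# Both conditions must hold for the same comment, which flat objects
# could not guarantee
query = {
    "query": {
        "nested": {
            "path": "comments",
            "query": {
                "bool": {
                    "must": [
                        {"term": {"comments.author": "kucrafal"}},
                        {"range": {"comments.stars": {"gte": 4}}},
                    ]
                }
            },
        }
    }
}

print(json.dumps(mapping))
print(json.dumps(query))
```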

Solr Parent-Child Relationship

Used at query time
Multi-core joins possible

select?q={!join from=parent to=id}color:Yellow
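The join query above works in three steps:

1. It finds the documents matching `color:Yellow`.
2. It collects their `parent` values.
3. It returns the documents whose `id` is one of those values, that is, the parents of yellow children.

The sketch URL-encodes the join and simulates the same semantics in memory. The optional `fromIndex` local param selects another core for a multi-core join. The core name `children` and the sample documents are hypothetical.

```python
from urllib.parse import urlencode

SOLR = "http://localhost:8983/solr/collection1"  # assumed core URL


def join_url(inner_query, from_field="parent", to_field="id", from_index=None):
    """Build a select URL using the join query parser; from_index switches
    to a cross-core join (core names are hypothetical)."""
    local = f"{{!join from={from_field} to={to_field}"
    if from_index:
        local += f" fromIndex={from_index}"
    return f"{SOLR}/select?{urlencode({'q': local + '}' + inner_query})}"


def simulate_join(docs, inner_match, from_field="parent", to_field="id"):
    # 1. find documents matching the inner query
    # 2. collect their `from_field` values
    # 3. return documents whose `to_field` is one of those values
    keys = {d[from_field] for d in docs if inner_match(d) and from_field in d}
    return [d for d in docs if d.get(to_field) in keys]


docs = [
    {"id": "p1", "name": "shirt"},
    {"id": "c1", "parent": "p1", "color": "Yellow"},
    {"id": "p2", "name": "trousers"},
    {"id": "c2", "parent": "p2", "color": "Blue"},
]
print(join_url("color:Yellow"))
print(simulate_join(docs, lambda d: d.get("color") == "Yellow"))
```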

ElasticSearch Parent-Child

Proper indexing required (child documents are indexed with their parent's ID)
Indexed as separate documents
Standard queries don't return child documents
Retrieve parent docs using queries and filters (has_child, has_parent, top_children)
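"Proper indexing" means three things in practice, sketched below under the following assumptions:

- the child type is mapped with a `_parent` type before any data is indexed;
- every child is indexed with a `parent` URL parameter;
- a `has_child` query then returns parents that have matching children.

The `post` and `comment` types and the field values are hypothetical.

```python
import json

ES_INDEX = "http://localhost:9200/sematext"  # assumed index URL

# Hypothetical types: "comment" documents are children of "post" documents
child_mapping = {"comment": {"_parent": {"type": "post"}}}


def index_child_request(child_id, parent_id, source):
    # The parent ID must be given at index time; it also routes the child
    # to the same shard as its parent
    return "PUT", f"{ES_INDEX}/comment/{child_id}?parent={parent_id}", json.dumps(source)


# Returns posts that have at least one comment matching the inner query
has_child = {"query": {"has_child": {"type": "comment",
                                     "query": {"term": {"author": "kucrafal"}}}}}

print(json.dumps(child_mapping))
print(index_child_request("c1", "p1", {"author": "kucrafal", "text": "Nice!"}))
print(json.dumps(has_child))
```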

Filters

Used to narrow down query results
Good candidates for caching and reuse

Apache Solr:
Additive (multiple filters are combined)
Can use different query parsers
Can use local params
Narrow down faceting results

ElasticSearch:
Defined using Query DSL
Can be used for score calculation
Don't narrow down faceting results
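The sketch contrasts how each system applies filters and how that affects facets:

- In Solr, every `fq` parameter is intersected with the results and cached separately. Local params such as `{!cache=false}` change how a single filter is handled.
- In ElasticSearch 0.90, a top-level `filter` is applied after facets are computed, so it leaves facet counts untouched.
- To narrow facets as well in ElasticSearch, the filter goes inside a `filtered` query.

All field names are hypothetical.

```python
import json
from urllib.parse import urlencode

# Solr: each fq is applied (intersected) and cached on its own;
# facet counts reflect every fq
solr_params = [
    ("q", "title:solr"),
    ("fq", "category:books"),
    ("fq", "{!cache=false}price:[10 TO 100]"),  # local param: don't cache this filter
    ("facet", "true"),
    ("facet.field", "author"),
]

# ElasticSearch 0.90: a top-level "filter" runs after faceting,
# so the "authors" counts ignore it
es_body = {
    "query": {"match": {"title": "solr"}},
    "filter": {"term": {"category": "books"}},
    "facets": {"authors": {"terms": {"field": "author"}}},
}

# Putting the filter inside a filtered query narrows the facets as well
es_filtered = {
    "query": {"filtered": {"query": {"match": {"title": "solr"}},
                           "filter": {"term": {"category": "books"}}}},
    "facets": {"authors": {"terms": {"field": "author"}}},
}

print(urlencode(solr_params))
print(json.dumps(es_body))
print(json.dumps(es_filtered))
```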

Faceting

Terms
Range & query
Terms statistics
Spatial distance
Pivot (Apache Solr)
Histograms (ElasticSearch)
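The two APIs are shaped differently:

- Solr switches faceting on per request and describes each facet with `facet.*` parameters, including pivot facets that nest one field inside another.
- ElasticSearch 0.90 names each facet in the JSON body, for example `terms`, `histogram` and `terms_stats` facets.

The sketch shows the request shapes only. The field names and ranges are hypothetical.

```python
import json
from urllib.parse import urlencode

# Solr: facet types are selected through request parameters
solr_params = [
    ("q", "*:*"),
    ("facet", "true"),
    ("facet.field", "category"),            # terms
    ("facet.query", "price:[* TO 100]"),    # query
    ("facet.range", "price"),               # range
    ("facet.range.start", "0"),
    ("facet.range.end", "1000"),
    ("facet.range.gap", "100"),
    ("facet.pivot", "category,author"),     # pivot: authors within each category
]

# ElasticSearch 0.90: each named facet is defined in the request body
es_body = {
    "query": {"match_all": {}},
    "facets": {
        "categories": {"terms": {"field": "category", "size": 10}},
        "price_histogram": {"histogram": {"field": "price", "interval": 100}},
        "price_per_author": {"terms_stats": {"key_field": "author",
                                             "value_field": "price"}},
    },
}

print(urlencode(solr_params))
print(json.dumps(es_body, indent=2))
```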

Real Time or Not?

Get documents not yet visible to search from the transaction log
No need to reopen the searcher

ElasticSearch: separate Get and Multi Get API
Apache Solr: separate Realtime Get handler
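The sketch builds the real-time lookups on both sides:

- Solr's `/get` handler, with `id` for one document or a comma-separated `ids` list;
- ElasticSearch's get-by-ID URL;
- ElasticSearch's `_mget` body for several documents in one round trip.

The hosts, the core name and the document IDs are assumptions.

```python
import json
from urllib.parse import urlencode

SOLR = "http://localhost:8983/solr/collection1"  # assumed core URL
ES = "http://localhost:9200"                     # assumed node


def solr_realtime_get_url(*ids):
    # The /get handler can return documents that are not yet committed
    if len(ids) == 1:
        return f"{SOLR}/get?{urlencode({'id': ids[0]})}"
    return f"{SOLR}/get?{urlencode({'ids': ','.join(ids)})}"


def es_get_url(index, doc_type, doc_id):
    # A single document by ID; the get API is real time by default
    return f"{ES}/{index}/{doc_type}/{doc_id}"


def es_multi_get(index, doc_type, ids):
    # Several documents in one request through _mget
    docs = [{"_index": index, "_type": doc_type, "_id": i} for i in ids]
    return "POST", f"{ES}/_mget", json.dumps({"docs": docs})


print(solr_realtime_get_url("12345"))
print(solr_realtime_get_url("1", "2"))
print(es_get_url("sematext", "test", "12345"))
print(es_multi_get("sematext", "test", ["1", "2"]))
```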

Data Handling

Single and batch indexing supported
ElasticSearch: JSON in / JSON out (and YAML)
Apache Solr: different formats allowed (XML, JSON, CSV, binary)
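Batch indexing in ElasticSearch goes through the `_bulk` endpoint. Its body is newline-delimited JSON: an action line followed by the document source, for each document, with a final trailing newline. The sketch builds such a body. The index, type and documents are illustrative. Solr accepts batches differently, for example as a JSON array or several `<doc>` elements in one XML `<add>`.

```python
import json


def bulk_body(index, doc_type, docs):
    """Build a bulk request body: one action line and one source line per
    document, newline-delimited and ending with a newline."""
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type,
                                           "_id": doc_id}}))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"


body = bulk_body("sematext", "test", [("1", {"title": "Solr"}),
                                      ("2", {"title": "ElasticSearch"})])
print(body, end="")
# The body would be POSTed to http://localhost:9200/_bulk (host assumed)
```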

Partial Document Updates

Not based on LUCENE-3837
Server-side document reindexing (the stored document is fetched, modified, and fully reindexed)
Both servers use versioning
Decreases network traffic
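The toy model below shows what "server-side document reindexing" amounts to:

- The stored document is loaded in full.
- The Solr-style atomic-update modifiers `set`, `add` and `inc` are applied.
- The whole document is written again as a new version.
- An optional expected version gives optimistic concurrency.

This is a simplified illustration, not either server's code. In particular, real Solr `_version_` values are not sequential integers; the simple counter here is an assumption for readability.

```python
import copy


class VersionConflict(Exception):
    """Raised when the caller's expected version differs from the stored one."""


def apply_partial_update(stored, changes, expected_version=None):
    """Toy server-side partial update with Solr-style modifiers."""
    if expected_version is not None and expected_version != stored["_version_"]:
        raise VersionConflict(f"expected {expected_version}, found {stored['_version_']}")
    doc = copy.deepcopy(stored)              # the full document is loaded...
    for field_name, op in changes.items():
        [(kind, value)] = op.items()         # exactly one modifier per field
        if kind == "set":
            doc[field_name] = value          # replace the value
        elif kind == "add":
            doc.setdefault(field_name, []).append(value)  # append to a multi-valued field
        elif kind == "inc":
            doc[field_name] = doc.get(field_name, 0) + value  # increment a number
        else:
            raise ValueError(f"unknown modifier {kind!r}")
    doc["_version_"] = stored["_version_"] + 1  # ...and rewritten as a new version
    return doc


stored = {"id": "12345", "enabled": False, "tags": ["solr"], "views": 1, "_version_": 1}
updated = apply_partial_update(
    stored,
    {"enabled": {"set": True}, "tags": {"add": "lucene"}, "views": {"inc": 1}},
    expected_version=1,
)
print(updated)
```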

Apache Solr Partial Doc Update

Sent to the standard update handler
Requires the _version_ field

curl 'localhost:8983/solr/update?commit=true' -H 'Content-type:application/json' -d '[ { "id" : "12345", "enabled" : { "set" : true } } ]'

ElasticSearch Partial Doc Update

Special end-point exposed: _update
Supports parameters like routing, parent, replication, percolate, etc. (similar to the Index API)
Uses scripts to perform document updates

curl -XPOST 'localhost:9200/sematext/test/12345/_update' -d '{ "script" : "ctx._source.enabled = enabled", "params" : { "enabled" : true }}'

Solr Collections API

Collection creation, reload, and deletion
Shard splitting
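Every Collections API operation is a GET against `/admin/collections` with an `action` parameter. The sketch builds the reload, delete and shard-split calls. The parameter names follow the Solr 4.x API. The host and names are assumptions, with `collection1` and `shard1` matching the deck's shard-splitting example.

```python
from urllib.parse import urlencode

SOLR = "http://localhost:8983/solr"  # assumed Solr base URL


def collections_url(action, **params):
    """Build a Collections API URL; the action always comes first."""
    query = urlencode({"action": action, **params})
    return f"{SOLR}/admin/collections?{query}"


reload_url = collections_url("RELOAD", name="collection1")
delete_url = collections_url("DELETE", name="collection1")
split_url = collections_url("SPLITSHARD", collection="collection1", shard="shard1")
for url in (reload_url, delete_url, split_url):
    print(url)
```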

ElasticSearch Indices REST API

Index creation and deletion
Closing and opening
Refreshing
Existence checking
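Each index operation in ElasticSearch is a plain HTTP method on the index URL, sometimes with a suffix such as `_close`, `_open` or `_refresh`. The mapping below is a sketch assuming a local node and the deck's `sematext` index. The existence check uses `HEAD` and is answered only with a status code.

```python
ES = "http://localhost:9200"  # assumed node


def index_operations(index):
    """HTTP method and URL for each index-level operation."""
    return {
        "create": ("PUT", f"{ES}/{index}"),
        "delete": ("DELETE", f"{ES}/{index}"),
        "close": ("POST", f"{ES}/{index}/_close"),
        "open": ("POST", f"{ES}/{index}/_open"),
        "refresh": ("POST", f"{ES}/{index}/_refresh"),
        "exists": ("HEAD", f"{ES}/{index}"),  # 200 if the index exists, 404 if not
    }


for name, (method, url) in index_operations("sematext").items():
    print(f"{name:8} {method:6} {url}")
```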

Apache Solr Shard Splitting

admin/collections?action=SPLITSHARD&collection=collection1&shard=shard1

Cluster State Monitoring

Apache Solr: multiple MBeans exposed by JMX
ElasticSearch: multiple REST end-points exposed to get different statistics

ElasticSearch Statistics API

Health and state check
Nodes information
Cache statistics
Segments information
Index information
Mappings information

SPM – "One to rule them all"

ElasticSearch Cluster Settings Update

Control rebalancing, recovery, and allocation
Change cluster configuration properties
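Cluster-wide settings are changed with a PUT to `/_cluster/settings`. The body can hold two groups: `transient` settings are lost on a full cluster restart, while `persistent` settings survive it. The sketch assumes a local node. It uses two settings that 0.90 documented as dynamically updatable, one for rebalancing and one for recovery. Their values are illustrative only.

```python
import json

ES = "http://localhost:9200"  # assumed node


def cluster_settings_update(transient=None, persistent=None):
    # transient: dropped on full cluster restart; persistent: survives it
    body = {}
    if transient:
        body["transient"] = transient
    if persistent:
        body["persistent"] = persistent
    return "PUT", f"{ES}/_cluster/settings", json.dumps(body)


method, url, body = cluster_settings_update(
    transient={"cluster.routing.allocation.cluster_concurrent_rebalance": 2},
    persistent={"indices.recovery.concurrent_streams": 4},
)
print(method, url)
print(body)
```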

ElasticSearch Custom Shard Allocation

Cluster level:
curl -XPUT localhost:9200/_cluster/settings -d '{ "persistent" : { "cluster.routing.allocation.exclude._ip" : "192.168.2.1" }}'

Index level:
curl -XPUT localhost:9200/sematext/_settings/ -d '{ "index.routing.allocation.include.tag" : "nodeOne,nodeTwo"}'
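The toy model below shows the intent of the two rules above:

- `include.tag` keeps only nodes whose `tag` attribute matches one of the listed values;
- `exclude._ip` removes nodes with the listed address.

It supports a single attribute per rule and is an illustration, not ElasticSearch's allocation decider. The node names, IPs and tags are hypothetical. A custom attribute like `tag` would be set per node in `elasticsearch.yml` (for example `node.tag: nodeOne`).

```python
def eligible_nodes(nodes, include=None, exclude=None):
    """Toy allocation filtering with a single attribute per rule.

    nodes:   {node_name: {attribute: value}}
    include: (attribute, "v1,v2")  keep nodes whose attribute equals a listed value
    exclude: (attribute, "v1,v2")  drop nodes whose attribute equals a listed value
    """
    def hit(attrs, rule):
        attr, values = rule
        return attrs.get(attr) in values.split(",")

    return sorted(
        name for name, attrs in nodes.items()
        if (include is None or hit(attrs, include))
        and (exclude is None or not hit(attrs, exclude))
    )


# Hypothetical three-node cluster
nodes = {
    "node1": {"tag": "nodeOne", "_ip": "192.168.2.1"},
    "node2": {"tag": "nodeTwo", "_ip": "192.168.2.2"},
    "node3": {"tag": "nodeThree", "_ip": "192.168.2.3"},
}
print(eligible_nodes(nodes, include=("tag", "nodeOne,nodeTwo")))  # ['node1', 'node2']
print(eligible_nodes(nodes, exclude=("_ip", "192.168.2.1")))      # ['node2', 'node3']
```

When both rules apply to the same index, only nodes passing both remain, which here leaves `node2`.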

Moving Shards and Replicas

Move shards between nodes and allocate unassigned shards on demand

curl -XPOST 'localhost:9200/_cluster/reroute' -d '{ "commands" : [ {"move" : {"index" : "sematext", "shard" : 0, "from_node" : "node1", "to_node" : "node2"}}, {"allocate" : {"index" : "sematext", "shard" : 1, "node" : "node3"}} ] }'


The Verdict

And the Winner Is?

The Users

We Are Hiring!

Dig Search?
Dig Analytics?
Dig Big Data?
Dig Performance?
Dig working with and in open source?
We're hiring worldwide!

http://sematext.com/about/jobs.html

Rafał Kuć @kucrafal

Sematext @sematext http://sematext.com http://blog.sematext.com

ElasticSearch Server 25% off: MREESS25

Thank You!