
Inside Solr 5 - Bangalore Solr/Lucene Meetup

October 13-15, 2015 • Austin, TX • http://lucenerevolution.org

Inside Apache Solr 5

COMMUNITY

CUSTOMERS PRODUCTS

Apache Solr + Lucidworks

Search is more than just a box.

Search makes data personal, contextual, actionable.

Search can be smarter.

location search history query security context

Personal, contextual, relevant results: consumer-like simplicity and power in the enterprise.

Product Offering

Compared on: Environment; Features; Support Level (Availability, Response Time, Number of Incidents, Pricing Model); Additional Support

Solr Enterprise

• Support Level: 24x7, SLA-backed, unlimited incidents, priced per node

• Additional Support: Dev Support (4 contacts), Operational Support, Regular Health Checks

• Features: Security, Log Analysis / SiLK Support, Dashboards & Reporting, Enhanced Admin UI

Fusion

• Support Level: 24x7, SLA-backed, unlimited incidents, priced per node

• Additional Support: Dev Support (4 contacts), Operational Support, Regular Health Checks

• Features: Security, Crawlers & Connectors, Log Analysis / SiLK Support, Enhanced Admin UI, Data Enrichment, Machine Learning, Recommendations, Advanced Relevancy Tuning

Developer Support

• Support Level: 9x5, SLA-backed, unlimited incidents, priced per named developer

• How-To Support, Knowledge Base, Fusion Support

Environments: Production, Development

• Get Started • Dig In • Go Big • Get Finished • Sneak Peek

Inside Apache Solr 5

• Easy to start/stop

./bin/solr {start|stop}

• Create collections:

./bin/solr create -c <COLL_NAME>

• No more WAR! Web container (Jetty) is now an implementation detail

• Scripts to support installing and running Solr as a service on Linux.

Get Started

JSON’s great:

• Solr 5 “does the right thing” for JSON out of the box

Except when it isn’t:

• Most data isn’t JSON

• Solr handles CSV, XML, Rich Content out of the box without having to install plugins

Your Content, Your Way
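As a rough illustration of the JSON and CSV points above, the sketch below builds (without sending) two POST requests to a collection's `/update` handler. The host, port, and the collection name `demo` are assumptions for the example, as are the sample documents; they are not taken from the slides.

```python
import json
import urllib.request

# Assumed for illustration: a local Solr on the default port.
SOLR_URL = "http://localhost:8983/solr"


def build_update_request(collection, body, content_type):
    """Build (but do not send) a POST to the collection's /update handler."""
    url = f"{SOLR_URL}/{collection}/update?commit=true"
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": content_type},
        method="POST",
    )


# JSON: a list of documents serialized as a JSON array.
docs = [{"id": "1", "title": "Inside Apache Solr 5"}]
json_request = build_update_request("demo", json.dumps(docs), "application/json")

# CSV: a header row naming the fields, then one row per document.
csv_body = "id,title\n2,Your Content Your Way\n"
csv_request = build_update_request("demo", csv_body, "application/csv")

# To index for real, pass either request to urllib.request.urlopen()
# while a Solr instance with the "demo" collection is running.
```

Only the `Content-Type` header and the body differ between the two requests; the handler path stays the same.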

• Solr 5 will ship Tika 1.7, adding:

• OCR support

• PST and Matlab

• Better Date Handling

• More flexibility with spatial units

Dig In

• Stats and Pivot faceting now work together

• Focused on accuracy of results

• First few steps in unification of all facet types with stats and aggregations

• http://lucidworks.com/blog/you-got-stats-in-my-facets/

Pivots and Stats
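A minimal sketch of how a request combining stats with pivot facets might be assembled, using the tag-and-reference local-params style described in the linked blog post. The field names `price`, `cat`, and `inStock` and the tag `piv1` are assumptions chosen for illustration.

```python
from urllib.parse import urlencode, parse_qs


def pivot_stats_params(stats_field, pivot_fields, tag="piv1"):
    """Return query parameters that attach stats on one field to a pivot facet."""
    return [
        ("q", "*:*"),
        ("rows", "0"),
        ("stats", "true"),
        # Tag the stats field so the pivot can refer to it by name.
        ("stats.field", f"{{!tag={tag}}}{stats_field}"),
        ("facet", "true"),
        # Ask for the tagged stats to be computed for each pivot bucket.
        ("facet.pivot", f"{{!stats={tag}}}{','.join(pivot_fields)}"),
    ]


params = pivot_stats_params("price", ["cat", "inStock"])
query_string = urlencode(params)
print(query_string)
```

The resulting query string would be appended to a collection's `/select` URL.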

• Schema API: REST API for adding field types and dynamic fields

• Managing Request Handlers through API

• Implicit registration of replication, Real Time Get and Administration Handlers

• Improved APIs for managing collections

API Goodness
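To make the Schema API bullet more concrete, here is a hedged sketch of a request body that adds one field type and one dynamic field in a single call. The names `text_demo` and `*_demo`, the collection `demo`, and the local URL are hypothetical, and the sketch assumes the collection uses a managed (API-editable) schema.

```python
import json
import urllib.request

# Hypothetical names for illustration only.
schema_commands = {
    "add-field-type": {
        "name": "text_demo",
        "class": "solr.TextField",
        "positionIncrementGap": "100",
        "analyzer": {
            "tokenizer": {"class": "solr.StandardTokenizerFactory"},
            "filters": [{"class": "solr.LowerCaseFilterFactory"}],
        },
    },
    "add-dynamic-field": {
        "name": "*_demo",
        "type": "text_demo",
        "indexed": True,
        "stored": True,
    },
}

# Build (but do not send) the POST; the host and collection are assumptions.
schema_request = urllib.request.Request(
    "http://localhost:8983/solr/demo/schema",
    data=json.dumps(schema_commands).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
```

Sending the request with `urllib.request.urlopen(schema_request)` requires a running Solr instance.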

Lucene 5 Highlights

• Stronger index safety guarantees

• Reduced memory usage in a number of areas

• No more FieldCache (replaced with UninvertingReader)

• Multi-valued sorting and suggesters

• Better IO defaults when using SSDs

• More efficient handling of merging stored fields

Go Big

• Many scaling improvements focused on interactions with Zookeeper:

• Split cluster state management reduces chattiness in large multi-tenant implementations

• Improved performance for Overseer operations (by more than 40%)

• Better timeout defaults based on real-world testing

• See my Lucene Revolution Keynote for more details: http://bit.ly/shalinRevKeynote

Distributed IDF

• IDF = Inverse Document Frequency = A measure of the relative importance of a word in a collection

• 4 implementations:

• LocalStatsCache: Local Stats

• ExactStatsCache: One time use aggregation

• ExactSharedStatsCache: Stats shared across requests

• LRUStatsCache: Stats shared in an LRU cache across requests
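The sketch below illustrates why distributed IDF matters: when a term is unevenly spread across shards, each shard's local IDF differs from the IDF computed over the whole collection. It uses the classic Lucene-style formula `1 + ln(numDocs / (docFreq + 1))` as an assumption, and the shard counts are made up; this is not the code of any of the StatsCache implementations listed above.

```python
import math


def idf(num_docs, doc_freq):
    """Classic TF-IDF style inverse document frequency (assumed formula)."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Made-up per-shard statistics for a single term.
shards = {
    "shard1": {"num_docs": 1000, "doc_freq": 9},    # term is rare here
    "shard2": {"num_docs": 1000, "doc_freq": 499},  # term is common here
}

# Local stats: each shard scores with only its own counts.
local_idf = {name: idf(s["num_docs"], s["doc_freq"]) for name, s in shards.items()}

# Global stats: counts are summed across shards before computing IDF.
total_docs = sum(s["num_docs"] for s in shards.values())
total_freq = sum(s["doc_freq"] for s in shards.values())
global_idf = idf(total_docs, total_freq)

for name, value in local_idf.items():
    print(f"{name}: local idf = {value:.3f}")
print(f"global idf = {global_idf:.3f}")
```

With local stats, the same term is weighted much more heavily on the shard where it is rare, so scores from different shards are not directly comparable. Aggregating the counts gives every shard the same weight for the term.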

• Ease of getting started means nothing if you can’t stay running in production

• Jepsen tests simulate network partitions, data loss, i.e. “The Real World”

• https://github.com/LucidWorks/jepsen/tree/solr-jepsen

• http://bit.ly/solr-jepsen

Get Finished

Stability Improvements

• Protection of ZK content

• ReplicationHandler now has an option to throttle the speed of replication

• More control over terminating long running queries

• Finite default timeouts for select and update requests
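One way to bound a long-running query is Solr's `timeAllowed` request parameter, shown here as an assumption about the mechanism the slide refers to. The sketch builds a query string with a time budget and checks a parsed JSON response for the `partialResults` flag in its `responseHeader`; the sample response is fabricated for illustration.

```python
from urllib.parse import urlencode, parse_qs


def time_limited_query(q, time_allowed_ms):
    """Build a query string that asks Solr to stop searching after a time budget."""
    return urlencode({"q": q, "wt": "json", "timeAllowed": str(time_allowed_ms)})


def is_partial(response):
    """Return True if a parsed JSON response reports partial results."""
    return bool(response.get("responseHeader", {}).get("partialResults", False))


query_string = time_limited_query("title:solr", 500)

# Fabricated example of a parsed response that hit the time limit.
sample_response = {
    "responseHeader": {"status": 0, "partialResults": True},
    "response": {"numFound": 0, "docs": []},
}
print(query_string, is_partial(sample_response))
```

A caller could retry with a larger budget or show the partial results, depending on the application.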

WELCOME TO THE FUTURE

• Facets and Analytics:

• Mix and match all facet types and stats (SOLR-6352, SOLR-6353, SOLR-4212)

• Percentiles via t-digest (SOLR-6350)

• Replication performance (SOLR-6816)

• Finish off Config APIs (various)

• Data location aware ValueSource implementation for fast changing distributed data

• First class support for more languages OOTB

Near Term Road Map

Resources

Release Notes:

• Solr: http://wiki.apache.org/solr/ReleaseNote50

• Lucene: https://wiki.apache.org/lucene-java/ReleaseNote50

Lucidworks: http://www.lucidworks.com

Shalin Shekhar Mangar

• [email protected]

• Twitter: https://twitter.com/shalinmangar

Credits

What’s new in Solr 5.0 - Anshum Gupta • http://www.slideshare.net/anshumg/solr-50

Lucidworks webinar “Inside Solr 5” - Grant Ingersoll • http://www.slideshare.net/lucidworks/webinar-inside-apache-solr-5