Solr/ElasticSearch for CF Developers By Mary Jo Sminkey

Solr/Elasticsearch for CF Developers (and others)


Page 1: Solr/Elasticsearch for CF Developers (and others)

Solr/ElasticSearch for CF Developers

By Mary Jo Sminkey

Page 2: Solr/Elasticsearch for CF Developers (and others)

Who Am I?

• Senior Web Developer at CFWebtools, LLC

• ColdFusion Developer since CF3/Allaire

• Cosplayer

• Dog Trainer

• Sewer/Baker/Knitter/Origamist/Ass. Crafts

• Cancer Survivor

• Fibromyalgia/Invisible Disabilities Advocate

Page 3: Solr/Elasticsearch for CF Developers (and others)

What is Solr?

• Standalone Full-Text Search Engine with Apache Lucene Backend

• Open-source, distributed, highly scalable, enterprise grade search

• http://lucene.apache.org/solr/

• Included in ColdFusion since CF9 replacing Verity (cfsearch/cfindex)

• Thru CF11 – Solr 3

• CF2016 – Solr 5 (5.1.2)

• Current Release of Solr is version 6.2.0

Page 4: Solr/Elasticsearch for CF Developers (and others)

Why use Solr/ES instead of CF tags?

• Any CF version prior to 2016 has ancient Solr 3.x versions

• Full Access to latest Solr/ES versions and patches

• Ability to use cloud-based distributed setups (essential for enterprise sites)

• Access to far more features and use of REST/JSON

• Code more easily converted to other search engines and languages

Page 5: Solr/Elasticsearch for CF Developers (and others)

Solr vs. ElasticSearch – What should I use?

• Solr has been around a lot longer, so it is more mature, very well documented and has strong backwards compatibility. Developers often mention that ES is not nearly as well documented, so plan on investing in other sources to really get a handle on it.

• ES, being younger, is built on modern standards and ideas (particularly REST) and was designed specifically for handling large indices and high query rates. Since it isn’t as strictly community driven, it can often move forward quicker with new features, bug fixes, etc. (although co

• Both have very active communities and are still being actively developed. Solr in particular has pretty much caught up to many of the advances that ElasticSearch brought to the marketplace, such as full REST support.

Page 6: Solr/Elasticsearch for CF Developers (and others)

Solr vs. ElasticSearch – What should I use? (cont)

• Solr excels at text-search applications, ElasticSearch at analytics (lots of monitoring and metrics exposed).

• In areas like log analysis, ES is by far the more common choice. This is due to its very advanced “aggregations” framework, which replaced earlier faceting.

• https://www.elastic.co/blog/out-of-this-world-aggregations

• Solr uses a terse syntax, while ES is much more verbose.

• This makes ES generally more readable, but the terse syntax of Solr makes more advanced relevancy possibilities easier to handle, which is of particular interest in text-search applications.

Page 7: Solr/Elasticsearch for CF Developers (and others)

Solr vs. ElasticSearch – What should I use? (cont)

• SearchComponents in Solr allow for much more easily customizable searches that can be reused across multiple applications or within an application.

• ES is generally considered a bit easier to get started with and to cluster. Solr generally requires a bit more work to get your head around; for instance, it forces you to read over and learn the config files to get running. This is not necessarily a BAD thing.

• If you are going to use REST, both now have excellent support, although ES is more REST-compliant. If you plan to go another route, Solr tends to have better support; for instance, it has excellent Java support via the SolrJ library.

Page 8: Solr/Elasticsearch for CF Developers (and others)

Amazon CloudSearch

• There are some other Lucene-based searches you can consider.

• The most popular of these is Amazon CloudSearch.

• Easy to set up, AWS managed service with automatic scaling

• Provides most commonly used text-search features like highlighting, autocomplete, simple faceting, grouping, geospatial search, etc.

• It is considerably more limited than Solr and ES when it comes to doing advanced search relevancy tuning and/or advanced metrics.

Page 9: Solr/Elasticsearch for CF Developers (and others)

Solr vs. ElasticSearch – More Reading

• http://solr-vs-elasticsearch.com/

• https://sematext.com/blog/2015/01/30/solr-elasticsearch-comparison/

• https://www.datanami.com/2015/01/22/solr-elasticsearch-question/

• http://opensourceconnections.com/blog/2015/12/15/solr-vs-elasticsearch-relevance-part-one/

• http://opensourceconnections.com/blog/2016/01/22/solr-vs-elasticsearch-relevance-part-two/

• http://harish11g.blogspot.com/2015/07/amazon-cloudsearch-vs-elasticsearch-vs-Apache-Solr-comparison-report.html

Page 10: Solr/Elasticsearch for CF Developers (and others)

So let’s look at the features of Solr (particularly Solr 4+ versions)

• Full REST API for schema management, indexing, searching, etc. (Solr 5+)

• Wide variety of built in tokenizers and analyzers

• Grouping, faceting, highlighting, spelling suggestions, autocomplete

• Filtering, document and field boosting, custom ranking, etc.

• Near Real-Time Indexing

• Extensible Plugin Architecture

Page 11: Solr/Elasticsearch for CF Developers (and others)

Solr 6 Features

• Parallel SQL – The big WOW feature of version 6 is SQL support that works across SolrCloud collections. This is done by a SQL parser that converts SQL queries to Solr streaming expressions.

• SQL Request Handler – SolrCloud collections can be queried with standard SQL language using the /sql request handler.

• JDBC Driver – Connect to the SolrCloud collections with any tool that supports JDBC and query the collection directly

• Still somewhat experimental and not quite ready for primetime usage but improving rapidly.

Page 12: Solr/Elasticsearch for CF Developers (and others)

Solr 6 Features (cont)

• Many other improvements and advancements with streaming expressions (merging search with parallel computing, across multiple sources)

• Push/Pull streaming, request/response streaming

• Solr collections are able to auto-update themselves via these kinds of streaming commands.

• https://sematext.com/blog/tag/streaming-expressions/

Page 13: Solr/Elasticsearch for CF Developers (and others)

Let’s Focus on Text Searches!

• This is primarily what CF developers would have been using the Solr integration for – cfsearch/cfindex

• While a number of things we’re going to look at are included in the CF integration, you can do a lot more once you move to standalone Solr.

Page 14: Solr/Elasticsearch for CF Developers (and others)

Our Target Site and Objectives

• Classic Industries (classicindustries.com)

• Ecommerce Site for Classic Car Parts

• Customers can select a car model (catalog) and year to filter their search

• Single text box search that needs to search across multiple fields but return the best possible matches

• Search pages need to also include data like breadcrumb trail, category menus, nested structure of category totals, etc.

• We would like to add additional elements like spelling suggestions and highlighting.

Page 15: Solr/Elasticsearch for CF Developers (and others)

Step 1 - Schema

• The schema defines the fields that will be indexed for searching, and their types.

• Solr and ES can both be used with a schema or schema-less, support dynamic field types, etc. Typically you would only use schema-less for development and then switch to the managed schema for production.

• Solr 5 and up can handle most schema changes via the REST service.

• You can also make schema changes via the Solr admin console

Page 16: Solr/Elasticsearch for CF Developers (and others)

Sample REST – Add Field Type

POST /schema
Content-type: application/json

{
  "add-field-type": {
    "name": "simpleTextSpell",
    "class": "solr.TextField",
    "positionIncrementGap": 100,
    "indexAnalyzer": {
      "tokenizer": { "class": "solr.StandardTokenizerFactory" },
      "filters": [
        { "class": "solr.LowerCaseFilterFactory" },
        { "class": "solr.RemoveDuplicatesTokenFilterFactory" }
      ]
    }
  }
}

Page 17: Solr/Elasticsearch for CF Developers (and others)

Sample REST – Add Field

POST /schema
Content-type: application/json

{
  "add-field": {
    "name": "simpleSpell",
    "type": "simpleTextSpell",
    "indexed": true,
    "stored": true
  }
}
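As a rough sketch of how a client might assemble the "add-field" request body shown above, here is a small Python helper. The function name `add_field_command` is made up for illustration; the body it returns would be POSTed to your collection's schema endpoint, which is not done here.

```python
import json


def add_field_command(name, field_type, indexed=True, stored=True):
    """Return the JSON body for a Schema API "add-field" command."""
    return json.dumps({
        "add-field": {
            "name": name,
            "type": field_type,
            "indexed": indexed,
            "stored": stored,
        }
    })


# Same field as the sample slide: a spelling field using our custom type.
body = add_field_command("simpleSpell", "simpleTextSpell")
print(body)
```

Building the body from a native structure and serializing it at the end avoids hand-written quoting mistakes in the JSON.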

Page 18: Solr/Elasticsearch for CF Developers (and others)

Schema - Analyzers, Tokenizers and Filters

• These are used to tell Solr how to prepare the text string for indexing (and/or querying).

• Proper handling of this step is essential for good search results.

• While Solr has a lot of built-in field types for text fields, you may often need to add your own field types to get the best results.

• With simple text fields, you often will use the same analyzer for the indexing and the query steps. The more complex handling your field needs, the more likely you may need different analyzers for the indexing vs. the querying.

Page 19: Solr/Elasticsearch for CF Developers (and others)

Schema - Sample Analyzers

<fieldType name="nametext" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeepWordFilterFactory" words="keepwords.txt"/>
    <filter class="solr.SynonymFilterFactory" synonyms="syns.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Page 20: Solr/Elasticsearch for CF Developers (and others)

Schema - Tokenizers

• Tokenizers determine how the text string will be split up into “tokens”. A common first step is to split the string on whitespace and/or punctuation (sentences, etc. split into the individual words so we can search on them instead of the entire string).

• Solr includes a whole variety of tokenizers, including ones designed for specific kinds of data, like file paths or email addresses, as well as ones that handle multi-lingual text.

• You can also process your tokens using regular expressions, or return the entire field as a single token.

Page 21: Solr/Elasticsearch for CF Developers (and others)

Schema - Filters

• Filters are used after tokenizing to further manipulate your data.

• Some common filter actions are converting everything to lowercase so searches aren’t case-sensitive and discarding common words that aren’t useful in searches (a, and, the, etc.)

• A dictionary filter might be used on a field that you intend to use for spelling corrections.

• You can also use a synonym filter to create word mappings to match on.

Page 22: Solr/Elasticsearch for CF Developers (and others)

Schema - CharFilters

• Unlike regular filters, charFilters are used PRIOR to tokenizing your data.

• You might use these to do things like strip out HTML tags, comments, or any other text you don’t want your search to find.

• Solr includes both a charFilter specifically for removing HTML markup as well as a Regex style replace filter.
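To make the order of the analysis stages concrete (charFilter, then tokenizer, then filters), here is a toy Python model. It is only an illustration of the sequence, not how Solr implements analysis; the function names and the tiny stopword list are assumptions made up for the example.

```python
import re

STOPWORDS = {"a", "and", "the"}  # tiny illustrative list, not Solr's default


def strip_html(text):
    # charFilter stage: works on the raw string, before any tokenizing
    return re.sub(r"<[^>]+>", " ", text)


def tokenize(text):
    # tokenizer stage: split on anything that is not a letter or digit
    return [token for token in re.split(r"[^A-Za-z0-9]+", text) if token]


def analyze(text):
    tokens = tokenize(strip_html(text))
    tokens = [token.lower() for token in tokens]               # lowercase filter
    return [token for token in tokens if token not in STOPWORDS]  # stopword filter


print(analyze("<b>The Front Bumper</b> and Bracket"))
```

Running the same `analyze` function on both the indexed text and the query text mirrors the simple case where index and query analyzers are identical.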

Page 23: Solr/Elasticsearch for CF Developers (and others)

Schema - Copy Fields

• Your schema can include fields which just copy data from other fields to index. Typically they will have a different set of analyzers to manipulate the data.

• For example, I may want my search to return matches on the original words higher than the ones that match on a synonym. To do this, I would use a field copy for the synonym matches which will give lower ranking.

• Another common use is for spell checking in which you may want to copy text fields that use different field types to the one you use for spellchecking.

Page 24: Solr/Elasticsearch for CF Developers (and others)

Schema in Place, Let’s Index Our Data!

• In the past most CF code would use the SolrJ library for standalone Solr work.

• This is still an option but now we have REST as an alternative. The REST libraries are generally much easier to work with since you don’t have to figure out all the nested methods that are holding your data; everything is returned in a simple JSON object.

• There is very little now in Solr that you cannot do through REST, including adding or modifying cores, making all schema changes, and of course indexing and searching.

Page 25: Solr/Elasticsearch for CF Developers (and others)

Schema in Place, Let’s Index Our Data! (cont)

• Solr’s REST integration has been continually improved but you may still run into some gotchas. For instance, the handling of multiple documents in an indexing request when you need to include additional parameters like a custom boost makes it impossible in most languages to simply convert a native object to the necessary JSON object (multiple name-value pairs in an object with the same name).

• ColdFusion has its own quirks (bugs) that you have to watch out for. The most common one you’ll run into is it trying to treat a string that is all numbers as a numeric value and not wrapping it in quotes. A typical hack is to add some string in front of any such field prior to CF serializing and then doing a search and replace to remove it prior to the REST request.

Page 26: Solr/Elasticsearch for CF Developers (and others)

Sample REST – Indexing Data

POST //update?wt=json
Content-type: application/json

{
  "add": {
    "doc": {
      "productname": "Sample AC Part",
      "sku": "AC234",
      "catalogs": ["Camaro", "Impala", "Firebird"],
      "id": 8753
    },
    "boost": 2.0
  },
  "add": {
    "doc": {
      "productname": "Sample Body Part",
      "sku": "BD495",
      "catalogs": ["Camaro"],
      "id": 968944
    }
  }
}
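Because the request body repeats the "add" key, a native map or struct cannot represent it directly. One possible workaround, sketched here in Python, is to serialize each command separately and join the pieces by hand. The helper name `build_update_body` is hypothetical.

```python
import json


def build_update_body(entries):
    """Build an update body from (doc, boost) pairs; boost may be None."""
    commands = []
    for doc, boost in entries:
        command = {"doc": doc}
        if boost is not None:
            command["boost"] = boost
        # Each command becomes its own "add": {...} member of the outer object.
        commands.append('"add": ' + json.dumps(command))
    return "{" + ", ".join(commands) + "}"


body = build_update_body([
    ({"id": 8753, "sku": "AC234", "catalogs": ["Camaro", "Impala"]}, 2.0),
    ({"id": 968944, "sku": "BD495", "catalogs": ["Camaro"]}, None),
])
print(body)
```

The same join-the-fragments idea applies in CFML: serialize each document on its own, then concatenate the strings into the final body.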

Page 27: Solr/Elasticsearch for CF Developers (and others)

More on Indexing

• Adding and updating data uses the same request format. If you include a key that is already in the index, Solr will update it.

• However, keep in mind that if you are making schema or major data changes, updating won’t REMOVE old keys. To do so, you need to either locate those and send delete requests, or purge your data and then do a clean re-indexing (an advantage of SolrCloud over using the same server to index and search).

• You can delete either by key, or by query. For example, purge all data with deleteByQuery('*:*')
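For the REST route, the delete commands are small JSON bodies sent to the update handler. A minimal sketch in Python follows; the helper names are made up, and the bodies would still need to be POSTed and committed.

```python
import json


def delete_by_id(doc_id):
    """Body that removes a single document by its unique key."""
    return json.dumps({"delete": {"id": str(doc_id)}})


def delete_by_query(query="*:*"):
    """Body that removes every document matching the query.

    The default "*:*" matches all documents, so it purges the whole index.
    """
    return json.dumps({"delete": {"query": query}})


print(delete_by_id(8753))
print(delete_by_query())
```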

Page 28: Solr/Elasticsearch for CF Developers (and others)

More on Indexing (cont.)

• You can also index bulk data using a data import handler, which can be done in a number of formats.

• By default Solr doesn’t have any security on the admin or managed schema, so you will want to lock it down for production servers.

• The Solr config allows you to specify auto-commit times, replication to slave servers, etc.

• Soft auto-commits can be used so updates can be made live almost immediately without the overhead of doing a hard commit (near real-time search).

Page 29: Solr/Elasticsearch for CF Developers (and others)

We Have Data, Now Let’s Search!

• Solr comes with some built-in request handlers you can use or customize, or you can add your own.

• The request handler configuration determines what settings, defaults and components (like spellcheck) are available for requests to that handler.

• The simplest search is just to send the query parameter “q” to the search request: /select?q=front+bumper

• Search results can be returned in a variety of formats, including json, xml, csv and language-specific formats like php and ruby.

Page 30: Solr/Elasticsearch for CF Developers (and others)

Query Parsers

• The parser used determines what parameters you can use.

• For text searching you generally will use the Dismax or Extended Dismax parsers, which allow for improving the relevance of your search results.

• Dismax includes term boosting, phrase boosting and minimum-should-match parameters among others.

• Extended Dismax extends this with even more boosting options including field boosting, more phrase boosting options, proximity boosting, and ignoring stopwords at query time.

Page 31: Solr/Elasticsearch for CF Developers (and others)

Filters

• All types of Solr query parsers support filters.

• This is the most basic way of restricting what documents to search.

• In our sample site application, we add filters based on things like the catalog and year the user selects, if they are looking for new or outlet products, if they have drilled down into a category, etc.

• You can have any number of filters and they can include complex boolean expressions.

Page 32: Solr/Elasticsearch for CF Developers (and others)

Filter Examples

• fq=catalogid:1

• fq=year:1967

• fq=newproduct:true

• fq=catalogid:[1 TO 15]

• fq=(discontinueflg:N OR availablecount:[1 TO *])
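Each filter is sent as its own "fq" parameter, so the query string repeats the name. Here is a short Python sketch of assembling such a request; `build_query` is a made-up helper and the field names are the ones from the examples above.

```python
from urllib.parse import parse_qs, urlencode


def build_query(q, filters):
    """Return a URL-encoded query string with one fq parameter per filter."""
    params = [("q", q)] + [("fq", f) for f in filters]
    return urlencode(params)


qs = build_query("front bumper", [
    "catalogid:1",
    "year:1967",
    "(discontinueflg:N OR availablecount:[1 TO *])",
])
print(qs)
```

Using a list of pairs rather than a dictionary is what lets the same parameter name appear more than once.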

Page 33: Solr/Elasticsearch for CF Developers (and others)

Search Relevance

• This is a topic we could spend an entire day on.

• Many enterprise sites have regular search audits and do extensive analysis to look at their relevancy scores and how to improve them.

• We’ll take a quick look at our example site and some things that Solr allows us to do in order to improve our search relevancy.

• We are using Extended Dismax for the maximum possible options for controlling search relevancy.

Page 34: Solr/Elasticsearch for CF Developers (and others)

Search Relevance (cont)

• By default Solr is scoring documents by how many times the search terms are found.

• What we want to do is “boost” fields and documents, etc. that we want Solr to place more emphasis on.

• We want to also look at how to handle searches that include multiple terms to search on (phrases).

Page 35: Solr/Elasticsearch for CF Developers (and others)

Search Relevance – Example App Fields

• Product Number (SKU) – we want to put matches on the SKU right at the top in searches.

• Product Name – next most important is matches on the product name. All the most relevant search terms are included in the product name.

• Keywords – custom keywords have additional search terms and abbreviations we want to match for the product, so they are fairly relevant.

• Product Info – this is the full description of the product. It can be used for searches, but due to the extensive amount of text and non-related words it can have, it’s of fairly low importance.

Page 36: Solr/Elasticsearch for CF Developers (and others)

Search Relevance – Synonyms

• Solr has support for synonyms, which allow you to map words to similar ones that you want it to also consider a match.

• Synonyms can be one-way or bi-directional. For instance, if there is a common misspelling people use in a search, you would map that in one direction only, to the correct spelling.

• Solr does not properly handle multi-term synonyms (see the ‘sea biscuit’ problem). This is a long-standing bug and there are some plugins to try and correct for it, but they often result in issues with more complex relevancy setups.

Page 37: Solr/Elasticsearch for CF Developers (and others)

Search Relevance – Sample App Synonyms

• Since we want matches on the original search term to always appear higher than matches for synonyms, we need to copy the fields used so we can boost them separately. These fields only need to be indexed, not stored.

• prodnamesynonym – Product Name Synonym Field. This will get a boost high enough to help matches appear above most of the other fields, but not as high as the original product name field.

• proddatasynonym – Additional Product Data Synonym Field. We’ll copy all the other text fields to this one and give it the lowest boost score.

Page 38: Solr/Elasticsearch for CF Developers (and others)

Search Relevance – Boosting

• The default value that Solr gives for boosts is 1.0

• Solr does not support negative boosts but anything below 1.0 is basically a negative boost based on the default.

• Keep in mind as well that Solr is going to score documents on how often the search terms appear as well. You can use a filter in your schema to remove duplicate tokens if you don’t want it to do this.

• The boosts for your query are set on the “qf” parameter which tells Solr which fields you want to query.

Page 39: Solr/Elasticsearch for CF Developers (and others)

Sample App Boosting

prodnumbertext^20.0
prodname^10.0
prodnamesynonym^5.0
keywords^2.0
productinfo^1.0
proddatasynonym^0.25

http://localhost:8983/solr/classic/select?q=front+bumper&defType=edismax&qf=proddatasynonym^0.25+productinfo^1.0+keywords^2.0+prodnamesynonym^5.0+prodname^10.0+prodnumbertext^20.0
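Keeping the boosts in one structure and generating the "qf" string from it makes them easier to tune than editing a long URL. A possible Python sketch, where `FIELD_BOOSTS` and `build_qf` are names invented for this example:

```python
from urllib.parse import urlencode

# Field boosts from the sample app, highest priority first.
FIELD_BOOSTS = {
    "prodnumbertext": 20.0,
    "prodname": 10.0,
    "prodnamesynonym": 5.0,
    "keywords": 2.0,
    "productinfo": 1.0,
    "proddatasynonym": 0.25,
}


def build_qf(boosts):
    """Join field^boost pairs into the space-separated qf format."""
    return " ".join(f"{field}^{boost}" for field, boost in boosts.items())


params = {"q": "front bumper", "defType": "edismax", "qf": build_qf(FIELD_BOOSTS)}
print(urlencode(params))
```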

Page 40: Solr/Elasticsearch for CF Developers (and others)

Search Relevance – Phrase Boosting

• Phrase boosting is used for multi-term searches.

• By default, Solr will score documents the same no matter where the search terms appear in the documents.

• Phrase boosting allows you to score higher the documents where the search terms are appearing next to, or close to, each other.

• The original phrase boost from the Dismax query parser boosts only for all search terms being close together. Edismax adds options for 2 and 3-word phrases in your search terms.

Page 41: Solr/Elasticsearch for CF Developers (and others)

Search Relevance – Phrase Boosting (cont)

• The phrase slop setting is used to set how far apart terms can be in order to be considered a match for phrase boost.

• If you set 2 and 3 word phrase boosting, you can use different slop settings for them.

• Phrase boosting doesn’t have any effect on what documents are returned by the search, ONLY how they get scored.

Page 42: Solr/Elasticsearch for CF Developers (and others)

Sample App Phrase Boosting

prodname^50.0
prodnamesynonym^25.0
keywords^10.0
productinfo^5.0
proddatasynonym^0.25

http://localhost:8983/solr/classic/select?q=front+bumper&defType=edismax&pf=proddatasynonym^0.25+productinfo^5+keywords^10+prodnamesynonym^25+prodname^50&pf2=proddatasynonym^0.25+productinfo^5+keywords^10+prodnamesynonym^25+prodname^50&pf3=proddatasynonym^0.25+productinfo^5+keywords^10+prodnamesynonym^25+prodname^50&ps=1
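Since the sample request uses the same boost list for "pf", "pf2" and "pf3", the three parameters can be generated from one structure. A small Python sketch under that assumption; the helper name `phrase_boost_params` is hypothetical.

```python
def phrase_boost_params(boosts, slop=1):
    """Return pf, pf2 and pf3 with identical boosts, plus the phrase slop."""
    spec = " ".join(f"{field}^{boost}" for field, boost in boosts.items())
    return {"pf": spec, "pf2": spec, "pf3": spec, "ps": slop}


phrase_params = phrase_boost_params({
    "prodname": 50,
    "prodnamesynonym": 25,
    "keywords": 10,
    "productinfo": 5,
    "proddatasynonym": 0.25,
})
print(phrase_params["pf"])
```

These would be merged into the same parameter set as "q", "defType" and "qf" before the request is sent.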

Page 43: Solr/Elasticsearch for CF Developers (and others)

Search Relevance – More Boosting

• There are other boosting options, such as boosting a specific term in the search, telling Solr to boost the documents that match that particular term over other terms in the search.

• You can also create complex functions for boosting documents.

• Another common boost is to apply one during indexing to specific documents. For instance in our classic car site, we apply a boost to the products that are our best sellers, so that all things being equal, those products will appear higher in searches.

Page 45: Solr/Elasticsearch for CF Developers (and others)

Minimum-Should-Match

• Solr supports both AND and OR searches which you can set with the q.op parameter.

• When doing an OR search with multiple search terms, you can also set the minimum number of terms that have to be matched to return a document.

• For instance if mm=75% and the user enters 4 search terms, then only 3 have to match to return the document.

• This helps ensure that as users enter more search terms that have a higher chance of not matching, you can make sure the results still require matching on more than just a single term (typical OR search) without causing a high percentage of failed searches.

• There are various options for how to set the minimum match, and you can customize it based on the number of search terms that were entered.

Page 46: Solr/Elasticsearch for CF Developers (and others)

Example Minimum-Should-Match

• Match 75% of the terms:
mm=75%

• 25% of the terms can be missing (for 4 terms, the same result as the previous example):
mm=-25%

• For more than 1 term, allow 1 to be missing. For more than 4 terms, allow 2 missing. For more than 6 terms, allow 33% missing:
mm=1<-1 4<-2 6<-33%
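To see how these settings translate into a required number of matching terms, here is a simplified Python reimplementation of the rule. It is my own sketch based on the documented behavior (percentages are rounded down, conditions apply when the term count is greater than the bound), not Solr's actual code.

```python
def min_should_match(total, spec):
    """Number of optional terms that must match for a given mm spec."""

    def simple(expr, count):
        if expr.endswith("%"):
            percent = int(expr[:-1])
            part = count * abs(percent) // 100  # percentages round down
            result = count - part if percent < 0 else part
        else:
            value = int(expr)
            result = count + value if value < 0 else value
        return max(0, min(count, result))

    if "<" not in spec:
        return simple(spec, total)

    result = total  # at or below the first bound, every term is required
    for condition in spec.split():
        bound, expr = condition.split("<", 1)
        if total <= int(bound):
            break
        result = simple(expr, total)
    return result


for terms in (1, 3, 5, 9):
    print(terms, min_should_match(terms, "1<-1 4<-2 6<-33%"))
```

Note that because of the rounding, 75% and -25% only agree for some term counts: with 5 terms this sketch gives 3 for "75%" but 4 for "-25%".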

Page 47: Solr/Elasticsearch for CF Developers (and others)

Result Grouping

• Allows you to group results with a similar value in a specific field together.

• This is something you may commonly have to do for something like an ecommerce site where a single product appears in multiple categories but you only want to show (or count) one copy of each product.

• Solr gives you a very wide range of options over how to handle the groups.

• You can only group on a single field, which should be an indexed text field.

• The best way to show off what grouping can do is look at our example site.

Page 48: Solr/Elasticsearch for CF Developers (and others)

Result Grouping - Example

• Our classic car parts site has products that appear in multiple car model catalogs as well as in multiple categories (3 tiers of categories).

• We’ve indexed them based on this combination of unique product id, catalog id and 3 category ids.

• So when we search, we may get multiples of a specific product and need to group on the product id to get accurate product counts.

• We also want to simplify the groups as much as possible so that we don’t have to do a lot of extra work to get to the data.

Page 49: Solr/Elasticsearch for CF Developers (and others)

Group Parameters for Search

<cfscript>
var params = {};
params["group"] = true;
params["group.field"] = 'product_id';

// tells Solr to return the total number of groups found
// this will be our total number of products found
params["group.ngroups"] = true;

// this gives us the grouped documents in a flat list
params["group.format"] = 'simple';

// since the groups are just copies of the same product,
// we only need one document in each group
params["group.limit"] = 1;
</cfscript>

Page 50: Solr/Elasticsearch for CF Developers (and others)

Result Grouping – Cont.

• This is just one way to use groups. You can create groups based on functions and do much, much more with them.

• You can sort both the list of groups (based on the most relevant document in each) as well as inside the groups themselves.

• Likewise paging can be done both inside and external to the grouping.

Page 51: Solr/Elasticsearch for CF Developers (and others)

Collapse and Expand Results

• These are an alternative to result grouping that are useful particularly for displaying collapsed search results.

• You provide the field to “collapse” on and Solr will provide results with a single document per group for that field.

• The expand is then used to tell Solr to return the same query but this time with an “expanded” section that includes all the documents in the groups.
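A collapse request is just a filter using the collapse query parser, with expand switched on as an extra parameter. A minimal Python sketch follows; `collapse_params` is a made-up helper and product_id is the field from our example site.

```python
from urllib.parse import parse_qs, urlencode


def collapse_params(q, field, expand=True):
    """Query string that collapses results on a field, optionally expanded."""
    params = [("q", q), ("fq", "{!collapse field=%s}" % field)]
    if expand:
        # Ask Solr to also return the documents behind each collapsed result.
        params.append(("expand", "true"))
    return urlencode(params)


qs = collapse_params("front bumper", "product_id")
print(qs)
```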

Page 52: Solr/Elasticsearch for CF Developers (and others)

Faceting

• Faceting is widely used in commerce sites to show the customer result counts for numerous search criteria.

• This is where a search engine like Solr really shines over using database-only solutions that require additional queries and often extensive processing to come up with the same data.

• You can request any number of facets as part of your query, and get back result counts for each of them.

• Solr provides a number of different ways to do faceting, we’ll look at some of the most common.

Page 53: Solr/Elasticsearch for CF Developers (and others)

Faceting – Field/Value

• This facets on a single field based on the value.

• For text fields, you typically don’t want to do any stemming, etc. but just tokenize the value in the field as-is. In cases where you want to search on the field and do more analysis as well, a copy field can be used for the faceting.

• You can restrict matches based on criteria like a specific prefix or ones that contain a specific string.

• You can set a specific number of matches needed to include the facet, and even whether to include a count of documents that are missing a value for the field.

Page 54: Solr/Elasticsearch for CF Developers (and others)

Faceting – Field/Value Example Site

• Back to our classic car site. When viewing our product results, we also want to get a list of the current categories the product appears in and use that for a side menu.

• For this example we’ll just look at how we do the top-level category menu.

• As with groups, you generally want to facet on a field that has very little tokenizing, etc. on it so that Solr returns the original, unmodified values in the field.

• You can sort the facets by the count (highest first) or alphabetically (index sort).

Page 55: Solr/Elasticsearch for CF Developers (and others)

Faceting – Field/Value Example

?q=front+hood&json.facet={
  categories: { type: terms, field: categoryname }
}

Sample Return:

categories = [
  { val = 'Body Components', count = 50 },
  { val = 'Hood Components', count = 10 }
]

Page 56: Solr/Elasticsearch for CF Developers (and others)

Faceting – Range

• Another common facet method is range facets. You can use ranges on any date or numeric field.

• A common use case is to return counts in various price ranges.

• In addition to start and end parameters, you can further configure how Solr will facet the ranges by setting a gap to divide by and how to handle edge cases.

• You can also configure it to include counts for values that fall outside the range.

Page 57: Solr/Elasticsearch for CF Developers (and others)

Faceting – Range Example

json.facet={
  price_ranges: { type: range, field: price, start: 0, end: 1000, gap: 250 }
}

Result:

price_ranges = [
  { val = 0, count = 50 },
  { val = 250, count = 125 },
  { val = 500, count = 72 },
  { val = 750, count = 52 }
]
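The start, end and gap settings fully determine which buckets come back, which is handy to know when building the price menu. A short Python sketch; the helper names are made up, and the facet definition mirrors the example above.

```python
import json


def range_facet(field, start, end, gap):
    """Definition of a range facet in the JSON facet format."""
    return {"type": "range", "field": field, "start": start, "end": end, "gap": gap}


def bucket_starts(start, end, gap):
    """Lower bound ("val") of each bucket the range facet will return."""
    return list(range(start, end, gap))


facet_param = json.dumps({"price_ranges": range_facet("price", 0, 1000, 250)})
print(facet_param)
print(bucket_starts(0, 1000, 250))
```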

Page 58: Solr/Elasticsearch for CF Developers (and others)

Faceting – Use With Filters

• One issue you may run into with facets is when you use them to provide filters for the customer.

• When you apply the filter, the other options drop out of your search, as well as your facets.

• For example, if I drill down into a category, the result set only has that category in it (and so the facet for categories has only that one category as well). But what if I want to still show a menu of ALL available categories for that search so the user can change categories?

• Your first thought might be that we’ll have to do another search without the filter… but WAIT! We actually can do this in the same search request.

Page 59: Solr/Elasticsearch for CF Developers (and others)

Faceting – Domain

• Solr 5.2+ allows you to include the domain for the facet. This allows you to expand your faceting outside the “domain” of the main search.

• To do this, we’ll first add a name (tag) to the filter that selects the category:

fq={!tag=cat}categoryid:100

• Now when we set the facet for the category, we can tell it to ignore the filter for the category:

json.facet={
  categories: {
    type: terms,
    field: categoryname,
    domain: { excludeTags: cat }
  }
}
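Putting the tagged filter and the excluding facet together in one request might look like the following Python sketch. The function name `category_menu_params` is an assumption for illustration; the tag name and fields come from the example above.

```python
import json
from urllib.parse import parse_qs, urlencode


def category_menu_params(q, category_id):
    """Filter on one category but still facet across all categories."""
    facet = {
        "categories": {
            "type": "terms",
            "field": "categoryname",
            # Ignore the filter tagged "cat" when counting this facet.
            "domain": {"excludeTags": "cat"},
        }
    }
    return urlencode([
        ("q", q),
        ("fq", "{!tag=cat}categoryid:%d" % category_id),
        ("json.facet", json.dumps(facet)),
    ])


qs = category_menu_params("front hood", 100)
print(qs)
```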

Page 60: Solr/Elasticsearch for CF Developers (and others)

Faceting – Pivot Facets

• Also known as decision trees, these are multi-level facets.

• Return counts for field ‘foo’ for each different field ‘bar’ and so forth.

• Pivot facets can be requested fairly simply in a query, just by passing the list of fields to pivot on:

facet.pivot=category,subcategory,subsubcategory

Page 61: Solr/Elasticsearch for CF Developers (and others)

Faceting – Subfacets

• Subfacets are a newer version of pivot facets (Solr 5+).

• With subfacets, you can nest any kind of facet under any other kind of facet, with completely separate settings and criteria of its own.

• Designed specifically with JSON in mind.

• Response format easier to handle.

• Lots of other improvements to allow for more advanced sorting, calculations, etc. on all levels of the facets.

Page 62: Solr/Elasticsearch for CF Developers (and others)

Faceting – Example Subfacets

json.facet={
  categories: {
    type: terms,
    field: categoryname,
    domain: { excludeTags: "cat,subcat" },
    facet: {
      subcategories: {
        type: terms,
        field: subcategoryname,
        domain: { excludeTags: "cat,subcat" }
      }
    }
  }
}
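To illustrate why the subfacet response is easier to handle, here is a small Python sketch that turns the `facets` section of a JSON Facet response into a menu structure. The function name `category_menu` and the sample data are assumptions; the `buckets`, `val` and `count` keys follow the JSON Facet API's response layout for terms facets.

```python
def category_menu(facets):
    """Return [(category, count, [(subcategory, count), ...]), ...]."""
    menu = []
    for cat in facets.get("categories", {}).get("buckets", []):
        # Each bucket carries its nested facet under the name given in the request.
        subs = [
            (sub["val"], sub["count"])
            for sub in cat.get("subcategories", {}).get("buckets", [])
        ]
        menu.append((cat["val"], cat["count"], subs))
    return menu

# Hand-written sample shaped like the "facets" part of a response (not real data).
sample_facets = {
    "count": 7,
    "categories": {"buckets": [
        {"val": "Tools", "count": 5, "subcategories": {"buckets": [
            {"val": "Hammers", "count": 3},
            {"val": "Saws", "count": 2},
        ]}},
        {"val": "Paint", "count": 2, "subcategories": {"buckets": [
            {"val": "Primer", "count": 2},
        ]}},
    ]},
}

menu = category_menu(sample_facets)
```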

Page 64: Solr/Elasticsearch for CF Developers (and others)

Spellchecking

• You can specify in your search that you want to receive spelling suggestions in the results.

• Generally the field you want to use for spellchecking will have minimal tokenizing and stemming on it. You often will want to use a copy field that can be used specifically for the spellchecking.

• There are a lot of parameters to set for spellchecking, which I won’t go over here, but you’ll probably need to play around with them to see what will work best for your application. There is some performance hit for returning spell suggestions, so you may want to turn them off when not needed.

Page 65: Solr/Elasticsearch for CF Developers (and others)

Spellchecking (cont)

• You need to make sure the spellcheck component is enabled for the request handler you are using. If you don’t intend to change the spellcheck parameters at all during searches, you may want to just set them as defaults in the request handler.

• Be aware that the spellchecking dictionary is not built automatically. You can send the parameter spellcheck.build=true on the URL to rebuild it, or in the Solr config you can set it to be rebuilt automatically on commit and/or optimize. Building on commit is generally not a good idea in production systems.
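A minimal sketch of the request side, assuming the spellcheck component is already attached to the handler you query: the Python below builds the parameters for a search that asks for suggestions and, optionally, a dictionary rebuild. The function name and the default values are illustrative only; `spellcheck`, `spellcheck.count`, `spellcheck.collate` and `spellcheck.build` are standard Solr parameters.

```python
from urllib.parse import urlencode

def spellcheck_params(term, suggestions=5, rebuild=False):
    """Build query params for a search that also returns spelling suggestions."""
    params = {
        "q": term,
        "spellcheck": "true",                  # turn suggestions on for this request
        "spellcheck.count": str(suggestions),  # max suggestions per misspelled term
        "spellcheck.collate": "true",          # also return a rewritten whole query
    }
    if rebuild:
        # Rebuilds the dictionary; better sent from a maintenance job than per search.
        params["spellcheck.build"] = "true"
    return params

normal = spellcheck_params("hamer")
maintenance = spellcheck_params("*:*", rebuild=True)
query_string = urlencode(normal)
```

Leaving `spellcheck` out of the parameters (or setting it to `false`) is the simple way to skip the performance hit on requests that do not need suggestions.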

Page 66: Solr/Elasticsearch for CF Developers (and others)

Highlighting

• You can also have Solr highlight the terms it matched in the search results.

• Again, there are a lot of ways to customize the highlighter component, both what it highlights and what it wraps the matches in.

• Typically you will want to enable termVectors, termPositions, and termOffsets in your schema definition for the field(s) you will highlight terms in. This allows you to use the FastVectorHighlighter component and, with the standard highlighter, improves performance.

• With the FastVectorHighlighter you can customize it to highlight matches with different colors or html classes. It also supports Unicode.

Page 67: Solr/Elasticsearch for CF Developers (and others)

Highlighting (cont)

• The highlighting does not use the same tokenizer/stemming components as the search.

• Part of what you are configuring for the highlighting is how many highlighted “snippets” to return and how Solr is to find them and pull them out of your text fields.

• Solr returns the snippets separately so you’ll generally have to do a search-and-replace on the original field to put the highlighted terms into it.

• With the FastVectorHighlighter you will want to be sure to include a Boundary Scanner which ensures that it doesn’t truncate words.
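The search-and-replace step mentioned above can be sketched roughly as follows. This assumes the snippets are wrapped in `<em>` tags and that each snippet, once the tags are stripped, appears verbatim in the stored field; the function name `apply_snippets` and the sample text are made up.

```python
def apply_snippets(original, snippets, pre="<em>", post="</em>"):
    """Replace each snippet's plain text in `original` with its highlighted form."""
    result = original
    for snippet in snippets:
        # Recover the un-highlighted text so we can find it in the stored field.
        plain = snippet.replace(pre, "").replace(post, "")
        # Only the first occurrence is replaced; a snippet whose plain text
        # is not found in the field leaves the text unchanged.
        result = result.replace(plain, snippet, 1)
    return result

# Hand-written sample shaped like the "highlighting" section of a response.
highlighting = {"doc1": {"description": ["builds on <em>Lucene</em> for fast"]}}
stored = "Solr is a search server. It builds on Lucene for fast search."

merged = apply_snippets(stored, highlighting["doc1"]["description"])
```

If the highlighter is configured to escape HTML or trim whitespace, the stripped snippet may not match the stored text exactly, so treat this as a starting point rather than a complete solution.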

Page 68: Solr/Elasticsearch for CF Developers (and others)

Suggester

• Used to provide automatic suggestions for query terms (auto-suggest search box).

• While technically you could use the spellchecker for this, the suggester was developed specifically for this use.

• As with the spellchecker, you can configure when the suggester’s dictionary is built, and you typically will want to copy fields to a field type specifically set up for this purpose that has minimal analysis on it.

• There are multiple kinds of dictionaries available to use for the suggester, and you can get suggestions from more than one in a single request.
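As a hedged sketch of querying more than one dictionary in a single request, the Python below builds the parameters and merges the terms from a response. The dictionary names (`titleSuggester`, `brandSuggester`), the helper names and the sample response are assumptions; `suggest`, `suggest.q`, `suggest.count` and the repeatable `suggest.dictionary` are the standard parameters.

```python
def suggest_params(prefix, dictionaries, count=5):
    """Build params for a suggest request against one or more dictionaries."""
    params = [("suggest", "true"), ("suggest.q", prefix), ("suggest.count", str(count))]
    # suggest.dictionary may be repeated, once per dictionary to consult.
    params += [("suggest.dictionary", name) for name in dictionaries]
    return params

def extract_terms(response, prefix):
    """Collect unique suggested terms across all dictionaries, in response order."""
    terms = []
    for per_dictionary in response.get("suggest", {}).values():
        for suggestion in per_dictionary.get(prefix, {}).get("suggestions", []):
            if suggestion["term"] not in terms:
                terms.append(suggestion["term"])
    return terms

params = suggest_params("ham", ["titleSuggester", "brandSuggester"])

# Hand-written sample shaped like a suggest response (not real data).
sample = {"suggest": {
    "titleSuggester": {"ham": {"numFound": 2, "suggestions": [
        {"term": "hammer", "weight": 10, "payload": ""},
        {"term": "hammock", "weight": 4, "payload": ""},
    ]}},
    "brandSuggester": {"ham": {"numFound": 1, "suggestions": [
        {"term": "hammer", "weight": 7, "payload": ""},
    ]}},
}}

terms = extract_terms(sample, "ham")
```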

Page 69: Solr/Elasticsearch for CF Developers (and others)

MoreLikeThis

• Enables users to search for other documents similar to one in their current results list.

• You can customize how this component works in many ways, from which fields to use, number of documents to return, term frequency requirements, minimum and maximum word lengths to use, boosting and more.
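To make those options concrete, here is a sketch of the parameters for a MoreLikeThis request using the search component. The function name, the `id` field and every numeric value are assumptions chosen for illustration, not recommended settings; the `mlt.*` parameter names are the standard ones.

```python
def more_like_this_params(doc_id, fields, rows=5):
    """Build params asking Solr for documents similar to the one with this id."""
    return {
        "q": "id:%s" % doc_id,      # assumes the unique key field is named "id"
        "mlt": "true",
        "mlt.fl": ",".join(fields),  # fields to compare for similarity
        "mlt.count": str(rows),      # similar documents to return per result
        "mlt.mintf": "2",            # min times a term must occur in the source doc
        "mlt.mindf": "5",            # min number of docs a term must occur in
        "mlt.minwl": "3",            # ignore words shorter than this
        "mlt.maxwl": "20",           # ignore words longer than this
        "mlt.boost": "true",         # boost by the relevance of interesting terms
    }

params = more_like_this_params("1234", ["title", "description"])
```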

Page 70: Solr/Elasticsearch for CF Developers (and others)

But Wait, There’s More!

This covers only a portion of the features of Solr. Some things we didn’t look at include:

• Pagination and Cursors

• Query Re-Ranking

• Transforming Results

• Result Clustering (tag cloud)

• Spatial/Geospatial Searches

• Term and Term Vectors Components

• Stats Component

• Caching

• Query Elevation

• RealTime Get

• Exporting Result Sets

• Distributed Search and Index Sharding

• Content Streams

Page 71: Solr/Elasticsearch for CF Developers (and others)

Need More Help?

CFWebtools, LLC
11204 Davenport, Ste. 100
Omaha, NE 68154
402 408 3733 ext. 2
https://www.cfwebtools.com