Lifecycle of a Solr Search Request
Chris "Hoss" Hostetter - 2017-09-14
https://home.apache.org/~hossman/rev2017/
https://twitter.com/_hossman
https://www.lucidworks.com/
Abstract:
This intermediate session for existing Solr users will provide a Deep Dive look into the lifecycle of a Solr Search Request. We will drill down through each layer of code, discussing what happens at each stage -- including when & how inter-node communication takes place in a multi-node SolrCloud cluster. Along the way, we will also review the various places where users can configure existing (or custom written) plugins to override or amend the default behavior.
Lifecycle of a Solr Search Request https://people.apache.org/~hossman/rev2017/
Agenda
Deep Dive look into the lifecycle of 4 Solr Search Requests...
Single Node: Single SolrCore
1. Simple Query
2. Facet Query
SolrCloud: 2 Shards + 2 Replicas
3. Simple Query
4. Facet Query
...and where various types of Plugins can be used.
Simple Query
Single Node: Single SolrCore
bin/solr -e techproducts
http://localhost:8983/solr/techproducts/select ? q = ipod & sort = inStock desc, score desc & fl = id, name & rows = 10
This sample paginated query is based on the techproducts example configs & data that have been included in every release of Solr since it was first open sourced.
I have a nostalgic affection for this silly little dataset.
HTTP (Jetty)
...:8983/solr/techproducts/select?...
Solr Webapp
UI: HTML, Javascript, Images, CSS
SolrDispatchFilter
/solr ➔
/techproducts ➔ SolrCore
/select? ➔ RequestHandler
wt=json ➔ ResponseWriter
CoreContainer
SolrCore: techproducts
SolrCore: foo
SolrCore: etc...
Purple: The HTTP layer, currently implemented by Jetty
Blue: Solr runs as a "webapp" inside the Jetty Servlet container (but that's just an implementation detail)
Black: The key pieces of the Solr webapp: misc "flat files" that power the Solr UI, and the SolrDispatchFilter, which is responsible for mapping all HTTP requests/responses into their internal Solr representations and executing them
Red: CoreContainer is a singleton responsible for managing the lifecycle of SolrCores
Green: Each SolrCore encapsulates the configs & data for a single "index" (which in a SolrCloud configuration would be a replica of some shard of some collection)
SolrCore: techproducts
SolrRequestHandlers
SearchHandler: /select - initParams - df = text (default) - components (implicit) - query - etc...
SearchHandler: /etc...
UpdateRequestHandler: /etc...
SearchComponents
QueryComponent: query - prepare() - df=text&q=ipod ➔ Query - etc... - process() - etc...
FacetComponent: facet
etc...
Green: The SolrCore used for this (HTTP) request
Black: Named instances of (pluggable) SolrRequestHandlers. SearchHandler is the most common, and it uses a configurable list of SearchComponents
Red: Named instances of (pluggable) SearchComponents. QueryComponent is the only one used in this simple request
All SearchComponents implement prepare() & process() methods, which are called by SearchHandler
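The handler/component relationship described above can be sketched in a few lines of Python. This is only an illustration, not Solr's Java code: the class names mirror the slide, but the `state` dict, the `calls` log and the `handle_request` function are assumptions made up for this sketch.

```python
class SearchComponent:
    """Hypothetical stand-in for a Solr SearchComponent plugin."""
    name = "component"

    def prepare(self, params, state):
        state["calls"].append(self.name + ".prepare")

    def process(self, params, state):
        state["calls"].append(self.name + ".process")


class QueryComponent(SearchComponent):
    name = "query"

    def prepare(self, params, state):
        super().prepare(params, state)
        # Resolve the default field (df) and keep a parsed form of q.
        state["query"] = (params.get("df", "text"), params["q"])

    def process(self, params, state):
        super().process(params, state)
        state["response"] = {"executed": state["query"]}


class FacetComponent(SearchComponent):
    name = "facet"  # inherits the base behavior: it only logs its calls here


def handle_request(components, params):
    """Sketch of a SearchHandler: prepare every component, then process every component."""
    state = {"calls": []}
    for component in components:
        component.prepare(params, state)
    for component in components:
        component.process(params, state)
    return state


state = handle_request([QueryComponent(), FacetComponent()], {"q": "ipod"})
```

The point of the two separate loops is that every component gets to validate and parse the request before any component starts executing it.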
SolrIndexSearcher
query
IndexSchema - SchemaFields ➔ FieldTypes
QueryComponent.prepare()
rows=10 ➔ ok?
fl=id,name ➔ ok?
q ➔ LuceneQParser
LuceneQParser + (df=text ➔ text) + "ipod" ➔ TermQuery
( "inStock desc" ➔ bool ➔ BoolField.getSortField(inStock,desc) + "score desc" ➔ SortField.SCORE ) ➔ Sort
TextField: text - Analyzer - Similarity - etc...
TextField: etc.. - Analyzer - Similarity - etc...
BoolField: bool - Analyzer - Similarity - getSortField - etc...
LuceneQParser
DismaxQParser
etc...
Red: QueryComponent.prepare() and its basic logic for validating & parsing the basic request params
Green: Named instances of (pluggable) QParserPlugins for parsing query strings (q & fq params). Here the (implicit) default LuceneQParser is used
Orange: The IndexSchema, which contains named SchemaFields (or dynamicFields) which map to...
Purple: Named instances of (pluggable) FieldTypes which dictate how the field names mapped to them are parsed, indexed, sorted, queried, etc...
Blue: The SolrIndexSearcher is ultimately what will be queried with these parsed queries & sort objects
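As a rough illustration of the sort parsing step, the sketch below turns the sort param from the sample request into a list of clauses by looking each field up in a schema. The `SCHEMA` dict and `parse_sort` function are hypothetical names for this example, and the simple comma split would not handle function sorts such as div(popularity,price).

```python
# Hypothetical slice of a schema: field name -> field type name.
SCHEMA = {"inStock": "bool", "name": "text", "id": "string"}


def parse_sort(spec, schema):
    """Turn 'inStock desc, score desc' into [(field, type, descending), ...]."""
    clauses = []
    for clause in spec.split(","):
        parts = clause.split()
        if len(parts) != 2 or parts[1] not in ("asc", "desc"):
            raise ValueError("bad sort clause: %r" % clause)
        field, direction = parts
        if field == "score":
            field_type = "score"  # relevance score, not a schema field
        elif field in schema:
            field_type = schema[field]
        else:
            raise ValueError("unknown sort field: %r" % field)
        clauses.append((field, field_type, direction == "desc"))
    return clauses


sort = parse_sort("inStock desc, score desc", SCHEMA)
```

In this sketch the field type is only recorded; in the slide it is the FieldType itself (BoolField) that produces the sort field.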
SolrIndexSearcher.search(...)
window(start, rows, windowSize)
(queryResultCache? | Index) ➔ DocList
query
QueryComponent.process()
search(Query,filters[],start,rows,Sort,...) ➔ DocList
JsonResponseWriter
DocList { + searcher.doc(#) ➔ Stored Fields } ➔ Bytes ➔ HTTP...
documentCache
queryResultCache
filterCache
IndexReader - InvertedIndex - Stored Fields
XmlResponseWriter
etc...
Red: QueryComponent.process(), which uses the SolrIndexSearcher to execute the Query created by its prepare() method
Blue: The SolrIndexSearcher includes several caches in addition to the InvertedIndex, and when executing a query, first evaluates the start/rows requested to fit a configured "window size" so that "page #2" type requests can result in a cache hit & re-use the results computed for "page #1"
Orange: The low level InvertedIndex & the queryResultCache that can be used in its place when executing basic searches & the DocList containing a sorted list of (internal) doc#s and their scores for the requested start+rows of this query
Purple: The Stored Fields of the documents in the index & the documentCache used by SolrIndexSearcher to reduce disk reads when popular documents are frequently matched by searches
Green: Named instances of (pluggable) QueryResponseWriters which dictate how the data structures produced once a request is processed get serialized into bytes (for the HTTP response returned to the original client by Jetty)
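The "window size" idea can be shown with a small Python sketch. It assumes the requested range is rounded up to a multiple of the window size and that the collected prefix is cached per query. The `WindowedSearcher` class and its fields are hypothetical, and the real cache key also covers filters and sort.

```python
def window_end(start, rows, window_size):
    """Round start+rows up to the next multiple of window_size."""
    needed = start + rows
    return -(-needed // window_size) * window_size  # ceiling division


class WindowedSearcher:
    def __init__(self, ranked_docs, window_size):
        self.ranked_docs = ranked_docs   # query -> full ranked list of doc ids
        self.window_size = window_size
        self.cache = {}                  # query -> cached prefix of the ranked list
        self.index_searches = 0

    def search(self, query, start, rows):
        needed = start + rows
        cached = self.cache.get(query)
        if cached is None or len(cached) < needed:
            # Cache miss: collect a whole window, not just the requested page.
            self.index_searches += 1
            end = window_end(start, rows, self.window_size)
            cached = self.ranked_docs[query][:end]
            self.cache[query] = cached
        return cached[start:needed]


searcher = WindowedSearcher({"ipod": list(range(100))}, window_size=20)
page1 = searcher.search("ipod", start=0, rows=10)
page2 = searcher.search("ipod", start=10, rows=10)  # served from the cached window
```

With a window of 20 and pages of 10, the second page costs no extra index search; the third page would trigger a new one.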
More Complex Query
Single Node: Single SolrCore
http://localhost:8983/solr/techproducts/select ? q = ipod & fq = price:[* TO 1000] & sort = div(popularity,price) asc, score desc & fl = id, name, why:[explain style=nl] & facet = true & facet.field = cat
This slightly more interesting query builds on the previous example by:
Adding a "filter query" on the (numeric) price field
Changing the primary sort criteria to be a mathematical function against 2 fields
Requesting an additional pseudo-field explaining the score of each document
Faceting on the "cat" (aka: category) field
HTTP (Jetty)
...:8983/solr/techproducts/select?...
Solr Webapp
UI: HTML, Javascript, Images, CSS
SolrDispatchFilter
/solr ➔
/techproducts ➔ SolrCore
/select? ➔ RequestHandler
wt=json ➔ ResponseWriter
CoreContainer
SolrCore: techproducts
SolrCore: foo
SolrCore: etc...
The HTTP, Webapp, DispatchFilter, CoreContainer, SolrCore, and RequestHandler layers all function exactly as in our previous (simpler) example. It's only once the SearchHandler starts looping over the components that things get more interesting....
query
IndexSchema - SchemaFields ➔ FieldTypes
QueryComponent.prepare()
etc...
"price:[* TO 1000]" ➔ float ➔ PointRangeQuery(...) ➔ filters[]
div(popularity,price) ➔ ValueSource(IntFieldSource,...)
FloatPointField: float - ValueSource - getRangeQuery() - etc...
IntPointField: int - ValueSource - etc...
FacetComponent.prepare()
facet=true ✔
facet.field=cat ➔ ok?
needDocSet = true
SolrIndexSearcher
div()
sum()
etc...
Most items identical to those shown in the "simple" query are omitted for brevity. Of the new items shown here...
Red: In addition to some additional logic in the QueryComponent.prepare() method (to parse the filter query and more complex sort) we now also see the FacetComponent.prepare() method, which does its own validation & sets a flag indicating that it needs extra info (the DocSet) once SolrIndexSearcher is asked to execute the Query
Green: Named instances of (pluggable) ValueSourceParsers for parsing function strings -- used here in our sort, but could also be used in queries
Orange: As before the IndexSchema, now showing that FieldTypes are also responsible for providing the range query (filter) and ValueSources (used by the functions)
SolrIndexSearcher
query
QueryComponent.process()
search(...) ➔〈DocList,DocSet〉
etc...
JsonResponseWriter
DocList { + searcher.doc(#) ➔ Stored Fields + [explain ...]}
+ Facet Counts
➔ Bytes ➔ HTTP...
ExplainAugmenter
ChildDocTransformer
query
FacetComponent.process()
For Each "cat" Index Terms: ➔ Intersect with DocSet
SubQueryAugmenter
etc...
searcher.explain(#)
documentCache
queryResultCache
filterCache
IndexReader - InvertedIndex - Stored Fields
Most items identical to those shown in the "simple" query are omitted for brevity. Of the new items shown here...
Red: Now when QueryComponent.process() executes the search, the "needDocSet" flag set by FacetComponent.prepare() is also used. FacetComponent.process() can then use the resulting DocSet (an unordered set of all matching doc#s -- regardless of sort) to compute the facet counts.
Olive: Named instances of (pluggable) DocTransformers (or Augmenters) which can be used to annotate individual documents returned in the results. For this query in particular we see the ExplainAugmenter, which uses the SolrIndexSearcher to get a (debugging) data structure "explaining" how the score of each document was computed.
Green: The JsonResponseWriter not only returns the Stored Fields of each document, but also the results of any DocTransformers. It also serializes the Facet Counts.
Simple Query
SolrCloud: 4 Nodes, 2 Shards, 2 Replicas
bin/solr -e cloud...
http://localhost:8983/solr/techproducts/select ? q = ipod & sort = inStock desc, score desc & fl = id, name & rows = 10
This is the same as our original simple query, still using the techproducts sample configs & data, but from here on we'll assume we're using a 4 node SolrCloud cluster, with the techproducts collection configured to have 2 shards, with a replication factor of 2.
SolrDispatchFilter: /techproducts ➔ tech_s1_r2
Jetty: http://host1:8983
SolrDispatchFilter: /techproducts ?➔ host4
Jetty: http://host3:8983
SolrDispatchFilter: /techproducts ?➔ tech_s2_r2
Jetty: http://host2:8983
SolrDispatchFilter: /techproducts ➔ tech_s2_r1
Jetty: http://host4:8983
techproducts: tech_s1_r2
foo: foo_s1_r1
foo: foo_s2_r1
techproducts: tech_s1_r1
techproducts: tech_s2_r1
foo: foo_s1_r2
techproducts: tech_s2_r2
foo: foo_s2_r2
Purple: 4 Jetty instances, running on (the same port 8983 of) 4 different hosts
Black: The 4 SolrDispatchFilters running inside each of these 4 Jetty instances, and how each of them resolves requests for the techproducts collection.
Green: The individual SolrCores (which are each a replica of some shard of a collection) running in each Solr node. Note that for the purposes of illustrating the different possible ways a Solr request may be routed, host3 does not contain any SolrCores that are part of the techproducts collection.
(Other layers such as the Solr webapp and the CoreContainer have been omitted to save space)
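A minimal sketch of the routing decision follows, assuming one possible reading of the diagram: host1 holds tech_s1_r2, host2 holds two techproducts replicas, host4 holds tech_s2_r1, and host3 holds none. The `LAYOUT` dict and the `route` function are hypothetical; how Solr actually chooses among candidates is not shown here.

```python
import random

# Assumed placement of techproducts replicas (cores of other collections omitted).
LAYOUT = {
    "host1": ["tech_s1_r2"],
    "host2": ["tech_s1_r1", "tech_s2_r2"],
    "host3": [],
    "host4": ["tech_s2_r1"],
}


def route(host, layout, choose=random.choice):
    """Resolve a collection request on `host` to a local core or a remote host."""
    local = layout[host]
    if local:
        # Any local replica of the collection can act as the entry point.
        return ("local", choose(local))
    # No local replica: forward the request to a host that has one.
    remote_hosts = sorted(h for h, cores in layout.items() if cores)
    return ("forward", choose(remote_hosts))


decision = route("host3", LAYOUT)
```

Passing a deterministic `choose` function makes the sketch easy to exercise in a test, for example `route("host3", LAYOUT, choose=lambda xs: xs[-1])`.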
coordinator
SearchHandler: /select
Repeat until done: query.distributedProcess ➔ ShardRequests (α,β)
Loop: ShardRequests query.handleResponse
QueryComponent: distributedProcess()
α: shard top10 + sort values
β: full fl for final top10 ids
FacetComponent
shard1
QueryComponent: prepare() + process()
α: q=ipod&fl=id&fsv=true ➔ top ids + sort values
β1: ids=X,Y,Z&fl=name ➔ ...
shard2
QueryComponent: prepare() + process()
α: q=ipod&fl=id&fsv=true ➔ top ids + sort values
β2: ids=A,..,G&fl=name ➔ ...
Purple: The HTTP Layer showing 3 hosts: an arbitrary 'coordinator' node, and 2 nodes each hosting a replica of the 2 shards for the collection
Black: SearchHandler. On the coordinator node, SearchHandler executes new logic to send the sub-requests created by its SearchComponents to arbitrarily selected replicas of each shard. On the replicas handling these sub-requests, the SearchHandler processes these requests just as if they were simple (single node) queries.
Red: SearchComponent methods. On the coordinator node, SearchHandler loops over every component calling SearchComponent.distributedProcess() to create/modify sub-requests for the individual shards, and then calls SearchComponent.handleResponse() to merge the results from each shard and decide if/when/what additional information may be needed. This process repeats until all calls to distributedProcess() on all SearchComponents indicate that they are finished.
Green & Blue: The 2 stages (α & β) of shard sub-requests needed to process this simple query. Note that the α-requests are identical for both shards, but the β-requests are slightly different to request the fl fields for the matches specific to that shard.
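The "repeat until every component is finished" loop might look roughly like this in Python. Everything here is a simplified assumption: a single hypothetical component, a `send` function standing in for the HTTP sub-requests, invented sample data, and no top-N merging between the two stages.

```python
# Hypothetical per-shard data: doc id -> name.
SHARD_DATA = {
    "shard1": {"A": "Apple", "B": "Boat"},
    "shard2": {"X": "X-Ray"},
}


def send(shard, request):
    """Stand-in for an HTTP sub-request to one replica of a shard."""
    docs = SHARD_DATA[shard]
    if "ids" in request:                      # beta: fetch fields for known ids
        return {i: docs[i] for i in request["ids"].split(",")}
    return sorted(docs)                       # alpha: ids only


class QueryStages:
    """Hypothetical component that needs two stages of sub-requests."""

    def __init__(self):
        self.stage = "alpha"
        self.ids = {}      # shard -> ids seen in the alpha response
        self.names = {}    # id -> name, filled in by the beta responses

    def distributed_process(self, shards):
        if self.stage == "alpha":
            self.stage = "beta"
            return {s: {"q": "ipod", "fl": "id"} for s in shards}
        if self.stage == "beta":
            self.stage = "done"
            return {s: {"ids": ",".join(ids), "fl": "name"}
                    for s, ids in self.ids.items()}
        return None  # finished: nothing more to ask the shards

    def handle_response(self, shard, request, response):
        if "ids" in request:
            self.names.update(response)
        else:
            self.ids[shard] = response


def coordinate(component, shards):
    """Keep issuing sub-requests until the component says it is finished."""
    rounds = 0
    while True:
        requests = component.distributed_process(shards)
        if requests is None:
            return rounds
        for shard, request in requests.items():
            component.handle_response(shard, request, send(shard, request))
        rounds += 1


component = QueryStages()
rounds = coordinate(component, ["shard1", "shard2"])
```

Note how the α requests are the same for every shard while the β requests differ per shard, because each shard is only asked about its own ids.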
Shard Request α
q=ipod&fl=id&fsv=true&rows=10
sort=inStock desc, score desc
Shard 1 (numFound=42): F〈true,6〉 B〈true,6〉 D〈true,5〉 C〈true,3〉 G〈true,2〉 A〈true,1〉 E〈false,5〉
Shard 2 (numFound=314): Z〈true,6〉 X〈true,3〉 Y〈false,9〉

Shard Request β
q=ipod&ids=...&fl=name
Shard 1: A, Apple / B, Boat / C, Car / D, Deer / E, Ear / F, Frog / G, Gong
Shard 2: X, X-Ray / Y, Yo-Yo / Z, Zebra

Merged (numFound=42+314=356)
Z, Zebra
F, Frog
B, Boat
D, Deer
C, Car
X, X-Ray
G, Gong
A, Apple
Y, Yo-Yo
E, Ear
Here we see hypothetical α requests+responses, hypothetical β requests+responses, & the final Merged results from both -- showing how the IDs and sort values from the α request are used to determine which documents will be in the final results, and in which order. For these specific documents, the β requests+responses fill in the fl fields for the final client.
Red & Blue: The responses from shard1 & shard2 for the α request
Green & Purple: The responses from shard1 & shard2 for the β request
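The α merge can be reproduced with the hypothetical values from the slide. The sketch below sorts the combined per-shard entries by (inStock desc, score desc), sums numFound, and groups the winning ids by shard for the β requests. The `merge_alpha` function is an illustrative assumption, and the order among documents with identical sort values is arbitrary here, so it may differ from the slide.

```python
# Alpha responses from the slide: (id, inStock, score), already sorted per shard.
RESPONSES = {
    "shard1": {"numFound": 42,
               "docs": [("F", True, 6), ("B", True, 6), ("D", True, 5),
                        ("C", True, 3), ("G", True, 2), ("A", True, 1),
                        ("E", False, 5)]},
    "shard2": {"numFound": 314,
               "docs": [("Z", True, 6), ("X", True, 3), ("Y", False, 9)]},
}


def merge_alpha(responses, rows):
    """Merge per-shard ids + sort values into the final top `rows` ids."""
    tagged = [(doc, shard)
              for shard, resp in responses.items()
              for doc in resp["docs"]]
    # sort = inStock desc, score desc
    tagged.sort(key=lambda item: (not item[0][1], -item[0][2]))
    top = tagged[:rows]
    num_found = sum(resp["numFound"] for resp in responses.values())
    beta_ids = {}
    for (doc_id, _in_stock, _score), shard in top:
        beta_ids.setdefault(shard, []).append(doc_id)
    return [doc[0] for doc, _shard in top], num_found, beta_ids


merged, num_found, beta_ids = merge_alpha(RESPONSES, rows=10)
```

Only ids and sort values travel in the α stage; the names (Apple, Boat, ...) are fetched afterwards, and only for the ids that survive the merge.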
Complex Query*
SolrCloud: 4 Nodes, 2 Shards, 2 Replicas
http://localhost:8983/solr/techproducts/select ? q = ipod & sort = inStock desc, score desc & fl = id, name & facet = true & facet.field = cat
In the interest of time, this query is not as "Complex" as the "Complex" Single Core query we looked at before. I've omitted things like fq params, sorting on functions, and the use of DocTransformers in the fl because nothing about how those are handled in a Single Core query changes when they are requested by a coordinator node in a SolrCloud query.
coordinator
SearchHandler: /select ➔ ShardRequests (α, β)
QueryComponent: distributedProcess()
α: shard top10 + sort values
β: full fl for final top10 ids
FacetComponent: distributedProcess()
α: facet.field=cat w/facet.limit overrequest
β: request missing counts for final top terms
shard1
QueryComponent: prepare() + process()
α: q=ipod&fl=id&fsv=true ➔ top ids + sort values
β1: ids=X,Y,Z&fl=name ➔ ...
FacetComponent: prepare() + process()
α: facet.limit=N + extra ➔ top terms w/counts
β1: ..._terms=aa,qq,... ➔ ...
shard2
Purple: The HTTP Layer showing 3 hosts: an arbitrary 'coordinator' node, and 2 nodes each hosting a replica of the 2 shards for the collection. To save space, the (largely redundant) details of the requests to shard2 are not shown.
Black: SearchHandler. To save space, the details (shown in previous diagrams) regarding how SearchHandler processes requests when acting as a coordinator have been omitted -- the key thing to note is that even with the added complexity of the FacetComponent, there are still only 2 stages of sub-requests to each shard (α & β)
Red: SearchComponent methods:
QueryComponent behaves exactly as before
Now that FacetComponent is in use, it can modify the sub-requests created by QueryComponent to "piggy back" on them and request additional information from each shard.
Green & Blue: The 2 stages (α & β) of shard sub-requests needed to process this query. Although the details of the requests to shard2 are omitted for brevity, the α-requests are identical for both shards, and (as before) the β-requests are slightly different to request both the fl fields for the document matches specific to that shard, as well as the facet counts for any "candidate" terms that were not included in the α response from that shard.
Shard Request α
facet.field=cat
facet.limit=N+OVERREQUEST

Shard 1 (α response)
games: 40
...
lawn: 20
books: 10
DVD: 5
...
beach: 4
toys: 3

Shard 2 (α response)
auto: 250
lawn: 170
...
food: 100
DVD: 97
...
books: 90
clothing: 90

Merge α
auto: 250-253 (? + 250)
lawn: 190 (20 + 170)
...
games: 40-130 (40 + ?)
food: 100-103 (? + 100)
DVD: 102 (5 + 97)
...

Shard Request β
facet.field={!_terms=...}cat

Shard 1 (β response)
auto: 3
food: 0

Shard 2 (β response)
games: 45

Final (Merge α+β)
auto: 253 (3 + 250)
lawn: 190 (20 + 170)
...
DVD: 102 (5 + 97)
Here we see the additional information involved in α & β requests+responses+merging for our more complex queries compared to what we looked at before. The information requested & merged by QueryComponent is omitted for brevity, and we focus solely on how FacetComponent modifies those requests to "overrequest" the original facet.limit and what it does with the results.
In the α request, over-request additional terms from each shard beyond what the user asked for. In the β request, ask each shard for the details about any terms that are "candidates" for the final results but were NOT already returned by this shard in the α response.
Each term that is a candidate for the final response is shown in a unique color. Black/Grey is used to indicate terms where incomplete information is available to the coordinator, but enough is known to be confident that they can't possibly be candidates for the final results. Faded terms (in italics) show at what stage the coordinating FacetComponent knows that particular term can be eliminated from consideration.
(While the "..." ellipses are used to denote the possibility of many additional terms depending on the value of facet.limit=N (which defaults to 100), viewers may find the easiest way to understand how these results are merged & refined is to assume N=3 and imagine the ellipses do not exist in the diagram)
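Following the suggestion to assume N=3 and ignore the ellipses, the refinement logic can be sketched with the counts from the slide. This is a simplified model, not Solr's implementation: a term missing from a shard's α response is assumed to have at most that shard's smallest returned count, and the function names (`merge_alpha`, `refinements`, `final_top`) are hypothetical.

```python
from collections import defaultdict

# Alpha and beta responses per shard, copied from the slide (ellipses ignored).
ALPHA = {
    "shard1": {"games": 40, "lawn": 20, "books": 10, "DVD": 5, "beach": 4, "toys": 3},
    "shard2": {"auto": 250, "lawn": 170, "food": 100, "DVD": 97, "books": 90, "clothing": 90},
}
BETA = {"shard1": {"auto": 3, "food": 0}, "shard2": {"games": 45}}


def merge_alpha(alpha):
    """Return {term: (lower_bound, upper_bound, shards_missing_the_term)}."""
    floor = {shard: min(counts.values()) for shard, counts in alpha.items()}
    merged = {}
    for term in set().union(*alpha.values()):
        lower = sum(counts.get(term, 0) for counts in alpha.values())
        missing = [s for s, counts in alpha.items() if term not in counts]
        merged[term] = (lower, lower + sum(floor[s] for s in missing), missing)
    return merged


def refinements(merged, limit):
    """Which terms must each shard be asked about in the beta request?"""
    lowers = sorted((lower for lower, _, _ in merged.values()), reverse=True)
    threshold = lowers[limit - 1]  # a candidate must be able to reach this count
    wanted = defaultdict(list)
    for term, (_lower, upper, missing) in merged.items():
        if missing and upper >= threshold:
            for shard in missing:
                wanted[shard].append(term)
    return {shard: sorted(terms) for shard, terms in wanted.items()}


def final_top(alpha, beta, limit):
    """Combine alpha + beta counts, keeping only terms with complete information."""
    complete = {}
    for term, (lower, _upper, missing) in merge_alpha(alpha).items():
        if all(term in beta.get(shard, {}) for shard in missing):
            complete[term] = lower + sum(beta[shard][term] for shard in missing)
    return sorted(complete.items(), key=lambda kv: (-kv[1], kv[0]))[:limit]


merged = merge_alpha(ALPHA)
to_refine = refinements(merged, limit=3)
top = final_top(ALPHA, BETA, limit=3)
```

With these inputs the sketch asks shard 1 about auto and food and shard 2 about games, matching the β requests in the diagram, while beach, toys and clothing are eliminated because even their upper bounds cannot reach the third-best known count.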
Q & A
Me: https://twitter.com/_hossman
My Company: https://www.lucidworks.com/
These Slides: https://home.apache.org/~hossman/rev2017/
Solr Docs & Mailing List: https://lucene.apache.org/solr/resources.html