Solr Troubleshooting - TreeMap Approach: Presented by Alexandre Rafalovitch, United Nations

02 The magic of creation

Any sufficiently advanced technology is indistinguishable from magic

—Arthur C. Clarke

bin/solr start -e techproducts

Pure magic

1.  Start the server
2.  Set up the collection
3.  Populate with documents
4.  Commit
5.  Profit!

03 The price of magic

bin/solr start … ???

1.  What port is the server running on?
2.  What is the collection name?
3.  Is it static or dynamic schema? Or schemaless?
4.  Which directory is schema configuration in? Data?
5.  What documents have we populated already?
6.  Is everything committed?
7.  WHY DOES MY QUERY NOT WORK!!! :(

03 Troubleshooting process

1.  Troubleshooting is not a linear process
2.  It is not taught often or well
3.  Book is coming soon(-ish...)
4.  Based on my experience as:
    Solr-based project developer and popularizer
    Senior (WebLogic) tech-support for 3 years
5.  Hard to explain the book in 40 minutes
6.  TreeMap is a (slightly) faster mental model
7.  Adaptation of Root Cause Analysis
8.  Top-level concepts described in "The New Rational Manager" by Kepner and Tregoe (1997)

03 Troubleshooting TreeMap

1.  Establish the boundaries
2.  Split the problem
3.  Identify the relevant part
4.  Zoom in
5.  Re-formulate the boundaries
6.  Repeat 2-5 until fixed
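The loop above can be sketched in a few lines of Python. This is only a mental-model illustration, not a tool: the `Part` class, the `shows_problem` checks, and the example map are all hypothetical, and in real life each check is a manual test rather than a lambda.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Part:
    """One box in the TreeMap: a component plus a check for the problem."""
    name: str
    shows_problem: Callable[[], bool]          # hypothetical stand-in for a manual test
    children: List["Part"] = field(default_factory=list)


def zoom_in(root: Part) -> List[str]:
    """Split, identify the relevant part, zoom in; repeat until nothing is left to split."""
    path = [root.name]
    current = root
    while current.children:
        suspects = [child for child in current.children if child.shows_problem()]
        if not suspects:
            break  # no child reproduces the problem: time to re-formulate the boundaries
        current = suspects[0]
        path.append(current.name)
    return path


# Hypothetical map for a "record not found" problem.
solr = Part("Solr", lambda: True, [
    Part("Indexing", lambda: True, [
        Part("Request handler", lambda: False),
        Part("URP chain", lambda: True),
        Part("Schema mapping", lambda: False),
    ]),
    Part("Searching", lambda: False),
])

print(" -> ".join(zoom_in(solr)))
```

The sketch follows only the first suspect at each level; when several parts show the problem, that is usually a sign the boundaries need another pass.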

03 Establishing the boundaries – Root Cause Analysis

Identity

Location

Timing

Magnitude

03 Boundaries - Identity

Identity – action we want to accomplish/problem to solve

Initial (black-box) identity – "echoParams is duplicated with example config, sometimes"

Zoomed-in – "Any query parameter that is also in request handler's defaults is duplicated"

See SOLR-6780 for full story, a.k.a. "an evil freaking bug"

Gets easier with practice

03 Boundaries - Location

Location – place (component) where the problem happens

Problem: Solr cannot find customer records

Could be indexing
•  Record was never sent to Solr
•  Wrong handler
•  Invalid schema definition
•  Incorrect URP pipeline
•  ...

Could be searching
•  Query too restrictive
•  Query too permissive
•  Searching wrong fields
•  Searching against catch-all field
•  ...

Cloud adds many more locations
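One way to split indexing from searching is to look the record up directly by its unique key and compare that with the query that fails. The sketch below assumes the unique key field is called `id` (as in the techproducts example), that the `/select` handler is available, and that both responses were requested as JSON; the function names are made up for illustration.

```python
from urllib.parse import urlencode


def lookup_url(base, collection, doc_id, id_field="id"):
    """Build a URL that asks only: is this exact record in the index?"""
    params = {"q": f'{id_field}:"{doc_id}"', "rows": 0, "wt": "json"}
    return f"{base}/{collection}/select?{urlencode(params)}"


def likely_location(lookup_response, search_response):
    """Compare the direct lookup with the user's query (both parsed JSON responses)."""
    indexed = lookup_response["response"]["numFound"] > 0
    found = search_response["response"]["numFound"] > 0
    if not indexed:
        return "indexing"   # never sent, rejected, or not yet committed
    if not found:
        return "searching"  # record is there, the query does not match it
    return "neither"        # re-check the problem identity


print(lookup_url("http://localhost:8983/solr", "techproducts", "3007WFP"))
```

A record that was sent but not yet committed will also look like an indexing problem here, which is still the right half of the map to zoom into.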

03 Boundaries - Timing

Timing – when/how often the problem shows itself

Reproducibility
1.  Always – ideal, reproducible with debugger on, logs on/off
2.  Seemingly intermittent (a.k.a. sometimes) – useless
3.  On trigger X (e.g. on commit) – nearly as good as always

Onset
1.  Did the system work at time point X but not at time point Y? => What did you change in the meantime?
2.  Problem exists != Problem noticed, may have been shadowed

03 Boundaries - Magnitude

Magnitude – WHAT is the extent of the problem

•  Latest Solr or a single old version (or a range of them)?
•  Standard example configuration or only with custom schema?
•  A single node or a whole cluster?

•  The more standard/recent the config is => the easier it is to troubleshoot

03 Boundaries – through negation and comparison

“I choose a block of marble and chop off whatever I don’t need”

— (sculptor) Auguste Rodin

Clarify the problem by saying what it is NOT as well

1.  Example: "This affects Solr 5.1, BUT not Solr 5.2"

2.  The BUT part requires testing and may prove to be untrue

3.  Thinking of the negative condition simplifies/purifies the test case

4.  Also gives a parallel use-case that works – great for debugging

03 Practical boundaries – what does the start script do?

bin/solr start … ???

1.  Do not try to read the script – look at the ground truth
2.  In Admin UI
    Dashboard -> Versions -> solr-spec (version)
    Dashboard -> JVM -> Args (command line params, abbrev.)
    Collection -> Overview -> Instance (all the directories)
3.  On command line (Unix, Mac, and the like):
    ps -aef | grep java
    /usr/bin/java -server -Xss256k -Xms512m -Xmx512m -XX:NewRatio=3 -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThreshold=8 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:ParallelGCThreads=4 -XX:+CMSScavengeBeforeRemark -XX:PretenureSizeThreshold=64m -XX:CMSInitiatingOccupancyFraction=50 -XX:CMSMaxAbortablePrecleanTime=6000 -XX:+CMSParallelRemarkEnabled -XX:+ParallelRefProcEnabled -verbose:gc -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime -Xloggc:/Users/arafalov/SearchEngines/solr-5.3.1/example/techproducts/solr//../logs/solr_gc.log …
4.  On Windows: use Microsoft/Sysinternals Process Explorer
5.  Example: SOLR-8073
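A long `ps` line like the one above is easier to read once it is grouped. Here is a minimal Python sketch that does the grouping; `summarize_jvm_args` is a made-up helper name, and the sample command line is a shortened copy of the one on the slide.

```python
import shlex


def summarize_jvm_args(command_line):
    """Group the tokens of a java command line by kind."""
    summary = {"memory": {}, "xx_flags": {}, "system_properties": {}, "other": []}
    for token in shlex.split(command_line)[1:]:  # skip the java binary itself
        if token.startswith(("-Xms", "-Xmx", "-Xss")):
            summary["memory"][token[:4]] = token[4:]
        elif token.startswith("-XX:"):
            body = token[4:]
            if body[:1] in ("+", "-"):           # boolean flag: -XX:+Name / -XX:-Name
                summary["xx_flags"][body[1:]] = body[0] == "+"
            else:                                # valued flag: -XX:Name=value
                name, _, value = body.partition("=")
                summary["xx_flags"][name] = value
        elif token.startswith("-D"):             # system property: -Dname=value
            name, _, value = token[2:].partition("=")
            summary["system_properties"][name] = value
        else:
            summary["other"].append(token)
    return summary


# Shortened copy of the command line shown above.
sample = ("/usr/bin/java -server -Xss256k -Xms512m -Xmx512m "
          "-XX:NewRatio=3 -XX:+UseConcMarkSweepGC -verbose:gc")
summary = summarize_jvm_args(sample)
print(summary["memory"])
print(summary["xx_flags"])
```

Any `-D` system properties that appear on the full command line end up in `system_properties`, which is usually the part worth comparing between a working and a broken instance.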

03 TreeMap – black box

Indexing Searching

03 TreeMap – black box

Indexing

03 TreeMap – indexing - details

1.  Choose Request Handler (e.g. /update)
    UpdateRequestHandler
    ExtractingRequestHandler – Tika
2.  Calculate all parameters
    URL explicit
    Handler params (defaults, appends, invariants)
    Global defaults (initParams)
    Shared param blocks (useParams)
    Hardcoded
    REST-driven overrides
3.  Execute Request Handler
    Generates standard Solr document
4.  UpdateRequestProcessors (URPs)
    Explicit chain
    Parameter-supplied chain
    Built-in chain
    URPs are where the work actually happens
5.  Mapping to schema fields
    Explicit field
    Dynamic fields
    CopyFields
6.  Commit
    Manual
    Delayed (commitWithin)
    Soft
    Hard
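Step 2 is where many surprises hide, so it helps to have a model of how the handler-level parameter sets combine with the URL. The sketch below is a simplified model, not Solr's code: it covers only defaults, appends, and invariants, ignores initParams, useParams, and REST overrides, and the function name and sample values are invented for illustration.

```python
def effective_params(request, defaults=None, appends=None, invariants=None):
    """Simplified model of handler parameter merging.

    request maps each name to a list of values; the other three map a name to one value.
    """
    merged = {}
    for name, value in (defaults or {}).items():
        merged[name] = [value]                          # used only if the request is silent
    for name, values in request.items():
        merged[name] = list(values)                     # explicit URL params beat defaults
    for name, value in (appends or {}).items():
        merged.setdefault(name, []).append(value)       # added on top of whatever was sent
    for name, value in (invariants or {}).items():
        merged[name] = [value]                          # cannot be changed by the request
    return merged


result = effective_params(
    request={"q": ["ipod"], "rows": ["5"], "fq": ["inStock:true"], "wt": ["json"]},
    defaults={"df": "text", "rows": "10"},
    appends={"fq": "cat:electronics"},
    invariants={"wt": "xml"},
)
for name in sorted(result):
    print(name, result[name])
```

In this model the request asked for `wt=json` but gets `xml`, and picks up an extra `fq` it never sent. Both are typical "why does my query not work" causes that are invisible from the URL alone.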

03 Boundaries – example - discovering parameters

INFO - [ x:techproducts] ...LogUpdateProcessor; [techproducts] webapp=/solr path=/update params={} {add=[3007WFP (1515103857103863808)]}

DEBUG - [ x:techproducts] ...LogUpdateProcessor; PRE_UPDATE add{,id=3007WFP} {{params(df=text),defaults(wt=xml)}}

solr.log

http://localhost:8983/solr/techproducts/config/

"secret" API to get current config

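The config endpoint can also be read from a script, which makes it easy to compare a working instance with a broken one. This is a hedged sketch: the response layout (a top-level `config` object with a `requestHandler` map) is an assumption based on typical Solr 5.x output, the sample payload is hypothetical, and both function names are invented.

```python
import json
from urllib.request import urlopen


def fetch_config(base_url, collection):
    """Fetch the live configuration as parsed JSON (needs a running Solr)."""
    with urlopen(f"{base_url}/{collection}/config?wt=json") as response:
        return json.load(response)


def handler_param_sets(config_response, handler_name):
    """Pull out defaults/appends/invariants for one request handler."""
    handlers = config_response.get("config", {}).get("requestHandler", {})
    handler = handlers.get(handler_name, {})
    return {kind: handler.get(kind, {}) for kind in ("defaults", "appends", "invariants")}


# Hypothetical payload, shaped like the assumed response layout.
SAMPLE = {
    "config": {
        "requestHandler": {
            "/select": {
                "name": "/select",
                "class": "solr.SearchHandler",
                "defaults": {"echoParams": "explicit", "rows": 10, "df": "text"},
            }
        }
    }
}

print(handler_param_sets(SAMPLE, "/select"))
# Against a live server:
# handler_param_sets(fetch_config("http://localhost:8983/solr", "techproducts"), "/select")
```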

03 Boundaries – example - schemaless magic

<updateRequestProcessorChain name="add-unknown-fields-to-the-schema">
  <processor class="solr.UUIDUpdateProcessorFactory" />
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.DistributedUpdateProcessorFactory"/>
  <processor class="solr.RemoveBlankFieldUpdateProcessorFactory"/>
  <processor class="solr.FieldNameMutatingUpdateProcessorFactory">
    <str name="pattern">[^\w-\.]</str>
    <str name="replacement">_</str>
  </processor>
  <processor class="solr.ParseBooleanFieldUpdateProcessorFactory"/>
  <processor class="solr.ParseDateFieldUpdateProcessorFactory">
    <arr name="format">
      <str>yyyy-MM-dd'T'HH:mm:ss.SSSZ</str>
      <str>yyyy-MM-dd</str>
    </arr>
  </processor>
  <processor class="solr.AddSchemaFieldsUpdateProcessorFactory">
    <str name="defaultFieldType">strings</str>
    <lst name="typeMapping">
      <str name="valueClass">java.lang.Boolean</str>
      <str name="fieldType">booleans</str>
    </lst>
    <lst name="typeMapping">
      <str name="valueClass">java.util.Date</str>
      <str name="fieldType">tdates</str>
    </lst>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
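When a chain like this grows, the order of processors is the first thing to check. Below is a small sketch using Python's standard XML parser to list them in execution order; `CHAIN_XML` is a shortened copy of the chain above and `list_processors` is a made-up helper name.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the chain above; paste the real one from solrconfig.xml.
CHAIN_XML = """
<updateRequestProcessorChain name="add-unknown-fields-to-the-schema">
  <processor class="solr.UUIDUpdateProcessorFactory" />
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.ParseDateFieldUpdateProcessorFactory">
    <arr name="format">
      <str>yyyy-MM-dd</str>
    </arr>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
"""


def list_processors(chain_xml):
    """Return the chain name and its processor classes in document order."""
    chain = ET.fromstring(chain_xml)
    classes = [processor.get("class") for processor in chain.findall("processor")]
    return chain.get("name"), classes


name, classes = list_processors(CHAIN_XML)
print(name)
for position, cls in enumerate(classes, start=1):
    print(position, cls)
```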

03 TreeMap – black box

Searching

03 TreeMap – searching - details

1.  Choose Request Handler (SearchHandler)
    /query
    /export
    /browse
2.  Calculate all parameters
    URL explicit
    Handler params (defaults, appends, invariants)
    Global defaults (initParams)
    Shared param blocks (useParams)
    Hardcoded
    REST-driven overrides
3.  Search Components
    <arr name="components">
      <str>query</str>
      <str>facet</str>
      <str>mlt</str>
      <str>highlight</str>
      <str>stats</str>
      <str>debug</str>
    </arr>
4.  Query Parsers
    standard dismax edismax switch block join surround … (>20 parsers)
5.  Response writers
    xml json python ruby php velocity csv schema.xml xsort

03 TreeMap – searching - example

http://localhost:8983/solr/techproducts/browse?
    q=THIS+is+a+TEST&
    wt=xml&
    echoParams=all&
    debugQuery=true

<str name="parsedquery_toString">
+(((features:this | keywords:this^5.0 | author:this^2.0 | cat:THIS^1.4 | name:this^1.2 | manu:this^1.1 | description:this^5.0 | text:this^0.5 | id:THIS^10.0 | resourcename:this | title:this^10.0)
(features:is | keywords:is^5.0 | author:is^2.0 | cat:is^1.4 | name:is^1.2 | manu:is^1.1 | description:is^5.0 | text:is^0.5 | id:is^10.0 | resourcename:is | title:is^10.0)
(features:a | keywords:a^5.0 | author:a^2.0 | cat:a^1.4 | name:a^1.2 | manu:a^1.1 | description:a^5.0 | text:a^0.5 | id:a^10.0 | resourcename:a | title:a^10.0)
(features:test | keywords:test^5.0 | author:test^2.0 | cat:TEST^1.4 | name:test^1.2 | manu:test^1.1 | description:test^5.0 | text:test^0.5 | id:TEST^10.0 | resourcename:test | sku:test^1.5 | title:test^10.0))~4)
</str>
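The parsed query is dense, but it shows which fields changed the term (for example lowercased it) and which kept it as typed. Here is a minimal Python sketch that builds the debug URL and groups fields by the term they ended up searching. The regular expression only handles simple `field:term^boost` clauses like the ones above, and both function names are invented for illustration.

```python
import re
from urllib.parse import urlencode


def debug_url(base, collection, handler, query):
    """Build the same kind of debug request as shown above."""
    params = {"q": query, "wt": "xml", "echoParams": "all", "debugQuery": "true"}
    return f"{base}/{collection}/{handler}?{urlencode(params)}"


# Matches simple clauses such as "cat:THIS^1.4" or "features:this".
CLAUSE = re.compile(r"(\w+):(\w+)(?:\^([\d.]+))?")


def fields_by_term(parsed_query):
    """Group (field, boost) pairs by the term actually searched in that field."""
    result = {}
    for field_name, term, boost in CLAUSE.findall(parsed_query):
        result.setdefault(term, []).append((field_name, float(boost) if boost else 1.0))
    return result


print(debug_url("http://localhost:8983/solr", "techproducts", "browse", "THIS is a TEST"))

# A fragment of the first clause of the parsed query above.
fragment = "(features:this | cat:THIS^1.4 | id:THIS^10.0 | title:this^10.0)"
print(fields_by_term(fragment))
```

Fields that kept `THIS` in upper case did not lowercase the input, while the others did. That difference is a useful split point when the same query matches in one field and not in another.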

03 TreeMap – searching - tools

03 TreeMap – Troubleshooting Solr cloud

1.  Good luck with exponential complexity increase.
2.  Try to reproduce in a standalone instance!
3.  Tools exist, but they are themselves complex (e.g. Jepsen)
4.  But the TreeMap process is the same overall

Cloud adds many more locations

03 Troubleshooting – closing notes and review

1.  Troubleshooting is both art (intuition) and science
2.  The more you apply the science, the better you become at the art
3.  Remember the overall process
    Establish the boundaries
    Split the problem
    Identify the relevant part
    Zoom in
    Re-formulate the boundaries
    Repeat until fixed/problem identified
4.  Remember the boundaries
    Identity
    Location
    Timing
    Magnitude

03 Troubleshooting – next step

1.  My resources and mailing list: http://www.solr-start.com/
2.  Solr-users mailing list and archives
    Identify your boundary in the email
3.  Books, current and upcoming
4.  Google/Bing/DDG – use good keywords
5.  Share what you learned
