OCTOBER 11-14, 2016 • BOSTON, MA

Rebuilding Solr 6 Examples - Layer by Layer: Presented by Alexandre Rafalovitch, Search Stack Solutions

Rebuilding Solr 6 examples - layer by layer
Alexandre Rafalovitch
www.solr-start.com

Who am I
• Software developer with 20+ years of experience
  – Including 3 years as Senior Tech Support (BEA Weblogic)
• Solr popularizer
• Published book author on Solr Indexing (for Solr 4.3)
• Run http://www.solr-start.com resource site
• Solr committer (since August 2016)
• Past and present Solr focus on onboarding, usability, tooling, information sharing

Example catch-22
• Search is a (surprisingly) complex expertise
• Solr is a complex product
  – Wide
  – Deep
  – History-rich
• And so are its many examples

Fasten the seatbelt
• Review all of the (Solr 6.2) out-of-the-box (OOTB) examples
• Make a small one from scratch
• Deconstruct a real shipped example
• Next learning action...

OOTB examples: how many?
bin/solr start -e <example>

  -e <example>  Name of the example to run; available examples:
      cloud:         SolrCloud example
      techproducts:  Comprehensive example illustrating many of Solr's core capabilities
      dih:           Data Import Handler
      schemaless:    Schema-less example

techproducts example
• Used to be collection1
• solr.home: example/techproducts/solr
  – Can restart with: bin/solr start -s example/techproducts/solr
  – Actual core at example/techproducts/solr/techproducts

techproducts example (cont.)
• Source configuration: server/solr/configsets/sample_techproducts_configs
  – Not actually used as a configset (it is copied, not shared)
• Can be rebuilt: rm -rf example/techproducts, then run bin/solr start -e techproducts again
• Has data (14 files of products, money, UTF-8 tests):
  bin/post -c techproducts example/exampledocs/*.xml

schemaless example
• solr.home: example/schemaless/solr
• Actual core: example/schemaless/solr/gettingstarted
• Source configuration: server/solr/configsets/data_driven_schema_configs
  – The configuration you get when you do not specify one: bin/solr create -c newcore
• No data, but can take (nearly) anything: bin/post -c <name> example/exampledocs/*.xml

schemaless mode?
• "Let us guess what you mean"
  – Auto-guesses the field type from the first value seen for a field
  – Creates explicit field definitions
• booleans, dates, numbers, strings
• Always multiValued (because: who knows?!?)
• Can be configured (update request processor, or URP, chain in solrconfig.xml)
  – Rewrites managed-schema (comments begone!)
  – Makes search work with <copyField source="*" dest="_text_"/>
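The guessing step can be pictured with a small Python sketch. It is a simplified approximation, not Solr's implementation: in Solr the work is done by the URP chain in solrconfig.xml. The type names mirror the ones the data_driven configuration maps guessed values to (booleans, tlongs, tdoubles, tdates, strings); the helper names and the ISO-8601-only date check are assumptions made to keep the example short.

```python
from datetime import datetime


def guess_type(value: str) -> str:
    """Guess a field type from one string value (hypothetical, simplified helper)."""
    if value.lower() in ("true", "false"):
        return "booleans"
    try:
        int(value)
        return "tlongs"
    except ValueError:
        pass
    try:
        float(value)
        return "tdoubles"
    except ValueError:
        pass
    try:
        # Assumption: only ISO-8601 UTC timestamps such as 2006-02-13T15:26:37Z count as dates
        datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
        return "tdates"
    except ValueError:
        return "strings"


def add_unknown_fields(schema: dict, doc: dict) -> dict:
    """Create a definition for each field not yet in the schema (hypothetical helper).
    The first value seen decides the type, and every guessed field is multiValued."""
    for name, value in doc.items():
        if name not in schema:
            schema[name] = {"type": guess_type(str(value)), "multiValued": True}
    return schema


# Made-up sample document
schema = {}
add_unknown_fields(schema, {"inStock": "true", "price": "0.0", "popularity": "6",
                            "released": "2006-02-13T15:26:37Z", "name": "Test"})
for field, definition in schema.items():
    print(field, definition["type"])
```

Because the first value wins, a field first seen as "6" is typed tlongs, and a later "6.5" for the same field would typically be rejected at index time.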

techproducts vs schemaless
• Configured techproducts vs auto-detecting schemaless
• Strings
  – techproducts: "name":"Test with some GB18030 encoded characters",
  – schemaless:   "name":["Test with some GB18030 encoded characters"],
• Numbers
  – techproducts: "price":0.0, "price_c":"0.0,USD",
  – schemaless:   "price":[0.0],
• Booleans
  – techproducts: "inStock":true,
  – schemaless:   "inStock":[true],

cloud example
• Highly configurable (unless using -noprompt)
• solr.home: example/cloud/nodeX/solr
• Source configuration is a choice:
  Please choose a configuration for the gettingstarted collection, available options are:
  basic_configs, data_driven_schema_configs, or sample_techproducts_configs [data_driven_schema_configs]
• Can be rebuilt:
  bin/solr stop -all
  rm -rf example/cloud
• Demonstrates the Config API (configoverlay.json)

dih example(s)
• Data Import Handler: legacy, but still kicking
• solr.home: example/example-DIH/solr
• Has 5 (five!) different cores
  – db: database import (example/example-DIH/hsqldb/ex.*)
  – solr: import from another Solr core (configured for the db core)
  – mail: import from IMAP (needs some configuration)
  – tika: import rich content (example/exampledocs/solr-word.pdf)
  – rss: external XML feed (very broken right now)
• Cannot be rebuilt, only emptied:
  bin/post -c db -type 'application/json' -d '{delete: {query:"*:*"}}'
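The -d argument is a JSON update command. For reference, a Python sketch that builds an equivalent request with the standard library; it assumes Solr on localhost:8983 and the core's /update handler, quotes the "delete" key so the body is strict JSON, and only constructs the request without sending it. The helper name is hypothetical.

```python
import json
import urllib.request


def delete_all_request(core: str, base: str = "http://localhost:8983/solr") -> urllib.request.Request:
    """Build (but do not send) a JSON delete-by-query that empties a core (hypothetical helper)."""
    body = json.dumps({"delete": {"query": "*:*"}}).encode("utf-8")
    return urllib.request.Request(
        f"{base}/{core}/update?commit=true",  # commit so the deletion becomes visible
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = delete_all_request("db")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To actually send it to a running Solr: urllib.request.urlopen(req)
```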

What about: bin/solr start?
• solr.home: server/solr
• No initial collections/cores; you have to create them explicitly:
  – With the script (see bin/solr create_core -h for details):
    bin/solr create -c <corename> -d <name or path>
  – With the Core Admin API for non-SolrCloud:
    http://localhost:8983/solr/admin/cores?action=CREATE&…
  – With the Collections API for SolrCloud:
    http://localhost:8983/solr/admin/collections?action=CREATE&…
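The elided query strings take the usual Core Admin and Collections API parameters. A hedged Python sketch that only assembles the URLs; the name "demo", the configset and config names, and the shard/replica counts are illustrative assumptions, and the exact parameters you need depend on your setup. Both APIs are assumed to live under the default /solr context path.

```python
from urllib.parse import urlencode

BASE = "http://localhost:8983/solr"  # assumed default port and context path


def core_create_url(name: str, config_set: str) -> str:
    # Core Admin API (standalone Solr); hypothetical helper, instanceDir set to the core name
    params = {"action": "CREATE", "name": name, "instanceDir": name, "configSet": config_set}
    return f"{BASE}/admin/cores?{urlencode(params)}"


def collection_create_url(name: str, config_name: str, shards: int = 1, replicas: int = 1) -> str:
    # Collections API (SolrCloud); hypothetical helper, the config must already be in ZooKeeper
    params = {"action": "CREATE", "name": name, "numShards": shards,
              "replicationFactor": replicas, "collection.configName": config_name}
    return f"{BASE}/admin/collections?{urlencode(params)}"


print(core_create_url("demo", "basic_configs"))
print(collection_create_url("demo", "demo-config", shards=2))
```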

basic_configs configuration
• Available for the cloud example and explicit creation
• Schemaless mode is configured, not enabled
• “Minimal Solr configuration” !?!
  – managed-schema: 1005 lines
  – solrconfig.xml: 1484 lines

files example
• Specifically tuned for file indexing
  – Augmented schemaless mode with language and content-type guessing
  – Custom /browse end-point
  – Source configuration: example/files/conf
  – Setup instructions: example/files/README.txt
  – Bring your own data

films example
• Schemaless (based on data_driven_schema_configs)
  – Uses the Schema API to add custom fields
  – Uses schemaless for the rest of the fields
• Comes with its own data (1100 film records)
• Uses Velocity (/browse), Schema API, Request Parameters API (params.json)
• Setup instructions: example/films/README.txt

That was the good news
• Many examples
• Easy to get one running
• Some come with data
• Some you can throw your own data into
• Lots of comments

This is the bad news

Config               | Files in conf | Types | Fields | Dynamic fields | managed-schema (lines) | solrconfig.xml (lines)
basic                |       46      |   71  |    4   |       73       |          1005          |          1484
data_driven          |       46      |   71  |    4   |       73       |          1005          |          1482
techproducts         |      101      |   66  |   33   |       28       |          1149          |          1701
dih db               |       62      |   62  |   31   |       28       |          1129          |          1490
dih tika             |        6      |   61  |    3   |       27       |           901          |          1466
files                |       69      |   73  |    9   |       73       |           517          |          1508
films (data_driven+) |       46      |   71  |    8   |       73       |           481          |          1482

Tip: getting these numbers
• XML extraction with XMLStarlet (XSLT CLI)
  – xml sel -t -m "//fieldType" -v @name -n managed-schema
  – xml sel -t -m "//copyField" -c . -n managed-schema | wc -l
  – xml sel -t -m "//*[@docValues]" -v "concat(local-name(), ' ', @name, ' docValues:', @docValues)" -n managed-schema
  – xml sel -t -m "//requestHandler" -v "@name" -n solrconfig.xml
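The same four queries can be reproduced without XMLStarlet. A Python sketch with the standard library's xml.etree.ElementTree, whose limited XPath support is enough here; the inline SCHEMA and SOLRCONFIG strings are made-up stand-ins, so in practice you would call ET.parse() on conf/managed-schema and conf/solrconfig.xml instead.

```python
import xml.etree.ElementTree as ET

# Made-up stand-ins for conf/managed-schema and conf/solrconfig.xml
SCHEMA = """<schema name="demo" version="1.6">
  <field name="id" type="string" indexed="true" stored="true" docValues="true"/>
  <field name="text" type="text_general" indexed="true" stored="false"/>
  <copyField source="*" dest="text"/>
  <fieldType name="string" class="solr.StrField" docValues="true"/>
  <fieldType name="text_general" class="solr.TextField"/>
</schema>"""
SOLRCONFIG = """<config>
  <requestHandler name="/select" class="solr.SearchHandler"/>
  <requestHandler name="/update/extract" class="solr.extraction.ExtractingRequestHandler"/>
</config>"""

schema = ET.fromstring(SCHEMA)

# xml sel -t -m "//fieldType" -v @name -n managed-schema
type_names = [ft.get("name") for ft in schema.iter("fieldType")]

# xml sel -t -m "//copyField" -c . -n managed-schema | wc -l
copy_field_count = len(schema.findall(".//copyField"))

# xml sel -t -m "//*[@docValues]" -v "concat(local-name(), ' ', @name, ' docValues:', @docValues)" ...
doc_values = [f"{el.tag} {el.get('name')} docValues:{el.get('docValues')}"
              for el in schema.iter() if el.get("docValues") is not None]

# xml sel -t -m "//requestHandler" -v "@name" -n solrconfig.xml
handlers = [rh.get("name") for rh in ET.fromstring(SOLRCONFIG).iter("requestHandler")]

print(type_names, copy_field_count, doc_values, handlers, sep="\n")
```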

Why is it like this?
• Many examples predate the Solr Reference Guide
• So people grep the examples for options, possibilities, defaults
• Each example is a kitchen sink

“Too much of a good thing is also a bad thing”
Source: 1980s Soviet joke about Virtual Reality

Go small: managed-schema
<schema name="demo" version="1.6">
  <dynamicField name="*" type="string" indexed="true" stored="true" multiValued="true"/>
  <field name="text" type="text_basic" indexed="true" stored="false" multiValued="true"/>
  <copyField source="*" dest="text"/>

Go small: managed-schema (2)
  …
  <fieldType name="string" class="solr.StrField"/>
  <fieldType name="text_basic" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.LowerCaseTokenizerFactory"/>
    </analyzer>
  </fieldType>
</schema>

 

Go small: solrconfig.xml
<config>
  <luceneMatchVersion>6.2.0</luceneMatchVersion>
  <requestHandler name="/select" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="df">text</str>
    </lst>
  </requestHandler>
</config>

Go small: load and test
• bin/solr create -c demo -d .../demo-config/
• bin/post -c demo example/exampledocs/*.xml
• Test it works, using HTTPie (HTTP CLI)

 

Go small: review
• A minimal example can be very minimal
• Some things will not work:
  – No uniqueKey: no way to update documents, no SolrCloud
  – No _version_: no SolrCloud
  – Everything is multiValued: no sorting
  – copyField * => text: no meaningful relevancy, no specialized analyzer chain processing
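The review points can be turned into a quick check. A hedged Python sketch: review_schema is a hypothetical helper, not a Solr tool, and it only looks at attributes written directly on fields, so multiValued inherited from a fieldType (such as strings) is not detected.

```python
import xml.etree.ElementTree as ET


def review_schema(schema_xml: str) -> list:
    """Flag the gaps listed above in a managed-schema string (hypothetical, rough heuristic)."""
    root = ET.fromstring(schema_xml)
    problems = []
    if root.find("uniqueKey") is None:
        problems.append("no uniqueKey: documents cannot be updated, no SolrCloud")
    field_names = {f.get("name") for f in root.iter("field")}
    if "_version_" not in field_names:
        problems.append("no _version_ field: no SolrCloud")
    for el in root.iter():
        # Only explicit multiValued="true" on the field itself is checked
        if el.tag in ("field", "dynamicField") and el.get("multiValued") == "true":
            problems.append(f"{el.get('name')} is multiValued: cannot sort on it")
    return problems


DEMO = """<schema name="demo" version="1.6">
  <dynamicField name="*" type="string" indexed="true" stored="true" multiValued="true"/>
  <field name="text" type="text_basic" indexed="true" stored="false" multiValued="true"/>
  <copyField source="*" dest="text"/>
</schema>"""

for problem in review_schema(DEMO):
    print(problem)
```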

Deconstructing the films example
• bin/solr create -c films
• curl http://localhost:8983/solr/films/schema ... (add name, initial_release_date)
• Index 1100 records from
  – (Solr) XML,
  – (generic) JSON (doc), or
  – CSV format
• Search for batman
• Use the /browse end-point and search for batman
• Enable highlighting in results
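The elided schema call adds the two explicit fields before indexing. A sketch of an equivalent Schema API request in Python; the field definitions are assumed to follow example/films/README.txt (name as single-valued text_general, initial_release_date as tdate), and both add-field commands are grouped in one JSON array because a Python dict cannot repeat the "add-field" key. It builds the request without sending it.

```python
import json
import urllib.request

# Assumed field definitions (see lead-in); types depend on the configset in use
commands = {
    "add-field": [
        {"name": "name", "type": "text_general", "multiValued": False, "stored": True},
        {"name": "initial_release_date", "type": "tdate", "stored": True},
    ]
}

req = urllib.request.Request(
    "http://localhost:8983/solr/films/schema",
    data=json.dumps(commands).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
print(req.full_url)
print(json.dumps(commands, indent=2))
# urllib.request.urlopen(req) would send it to a running Solr
```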

Initial stats for films core
Sizes (line counts)
  managed-schema*   481
  solrconfig.xml    1482
  params.json       20
File count in conf
  .txt                   41
  .xml                   3
  .json                  1
  managed-schema (xml)   1
* already has no comments

Deconstructing: just straight tags
• managed-schema lost its comments during construction
• Let's remove comments from solrconfig.xml
• xml ed -L -d "//comment()" solrconfig.xml
  – -L: edit in place
  – -d: delete nodes matching the XPath
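Without XMLStarlet, the same clean-up can be approximated in Python: ElementTree's default parser drops comments, so a parse-and-serialize round trip removes them. This is a sketch with a hypothetical helper; unlike xml ed, it also drops the XML declaration and rewrites CDATA sections as escaped text, so review the diff before replacing a real solrconfig.xml.

```python
import xml.etree.ElementTree as ET


def strip_comments(xml_text: str) -> str:
    """Rough equivalent of: xml ed -L -d "//comment()" solrconfig.xml (hypothetical helper).
    The default parser discards comments, so serializing the tree leaves them out."""
    root = ET.fromstring(xml_text)
    return ET.tostring(root, encoding="unicode")


SAMPLE = """<config>
  <!-- A long explanatory comment, like the ones in shipped configs -->
  <luceneMatchVersion>6.2.0</luceneMatchVersion>
</config>"""

cleaned = strip_comments(SAMPLE)
print(cleaned)

# On a real file (overwrites it, no backup):
# tree = ET.parse("solrconfig.xml")
# tree.write("solrconfig.xml", encoding="UTF-8", xml_declaration=True)
```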

solrconfig.xml without comments
Sizes (line counts)
  managed-schema   481
  solrconfig.xml   1482 → 278
  params.json      20
File count in conf
  .txt                   41
  .xml                   3
  .json                  1
  managed-schema (xml)   1

Deconstructing: what to clean
• Currently
  – (explicit) fields: 8
  – dynamic fields: 73
    xml sel -t -m "//dynamicField" -v @name -n managed-schema | wc -l
  – types: 71
  – copyFields: 1
• Let's start with dynamic fields

Deconstructing: dynamic fields
• Used dynamic fields
  – do NOT modify the schema
  – DO show up in the Admin UI, if used
  – Example from a different schema:
    • Used/matched fields
    • Generic definitions

Deconstructing: in-use dynamic fields
• NO dynamic fields are used
  – * is a copyField instruction
• Can remove them all:
  xml ed -L -d "//dynamicField" managed-schema
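The xml ed deletion can be mirrored in Python. ElementTree elements do not know their parent, so this hypothetical remove_elements helper walks every element and removes matching children; the inline schema is an illustrative stand-in, not the films file.

```python
import xml.etree.ElementTree as ET


def remove_elements(root: ET.Element, tag: str) -> int:
    """Delete every element with the given tag, like xml ed -d "//tag" (hypothetical helper).
    Returns how many elements were removed."""
    removed = 0
    for parent in list(root.iter()):      # snapshot, since the tree changes while we walk
        for child in list(parent):
            if child.tag == tag:
                parent.remove(child)
                removed += 1
    return removed


# Illustrative schema fragment
schema = ET.fromstring("""<schema name="films" version="1.6">
  <field name="id" type="string"/>
  <dynamicField name="*_i" type="int"/>
  <dynamicField name="*_s" type="string"/>
  <copyField source="*" dest="_text_"/>
</schema>""")

print(remove_elements(schema, "dynamicField"))
print([el.tag for el in schema])
```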

Remove dynamicFields
Sizes (line counts)
  managed-schema   481 → 409
  solrconfig.xml   278
  params.json      20
File count in conf
  .txt                   41
  .xml                   3
  .json                  1
  managed-schema (xml)   1

Deconstructing: field types
• How many of the 71 types do we use?
  – xml sel -t -m "//field|//dynamicField" -v "@type" -n conf/managed-schema | sort -u
  – long, string, strings, tdate, text_general
• But some are also used in solrconfig.xml
  – booleans, string, strings, tdates, tdoubles, text_general, tlongs
• Combined total: 9 field type definitions
• Delete the rest (by hand)
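The type audit can be scripted too. A Python sketch with hypothetical helpers: one collects the types referenced by fields, one guesses the types named in solrconfig.xml (a heuristic assumption that they appear as <str> values whose parameter name ends in fieldType), and one lists the <fieldType> definitions nothing refers to. The first lines double-check the slide's arithmetic; the inline XML is an illustrative fragment.

```python
import xml.etree.ElementTree as ET


def types_used_by_schema(schema_xml: str) -> set:
    """Field types referenced by <field> and <dynamicField> definitions."""
    root = ET.fromstring(schema_xml)
    return {el.get("type") for el in root.iter() if el.tag in ("field", "dynamicField")}


def types_used_by_solrconfig(config_xml: str) -> set:
    """Heuristic (assumption): types are named in <str> parameters whose name ends
    in fieldType, e.g. defaultFieldType in the add-unknown-fields chain."""
    root = ET.fromstring(config_xml)
    return {s.text.strip() for s in root.iter("str")
            if (s.get("name") or "").lower().endswith("fieldtype") and s.text}


def deletable_types(schema_xml: str, used: set) -> list:
    """<fieldType> definitions that nothing references."""
    root = ET.fromstring(schema_xml)
    return sorted(ft.get("name") for ft in root.iter("fieldType") if ft.get("name") not in used)


# The two lists from the slide combine into the 9 definitions to keep
from_schema = {"long", "string", "strings", "tdate", "text_general"}
from_solrconfig = {"booleans", "string", "strings", "tdates", "tdoubles", "text_general", "tlongs"}
print(len(from_schema | from_solrconfig))  # 9

SCHEMA = """<schema name="films" version="1.6">
  <field name="id" type="string"/>
  <fieldType name="string" class="solr.StrField"/>
  <fieldType name="tint" class="solr.TrieIntField"/>
  <fieldType name="booleans" class="solr.BoolField" multiValued="true"/>
</schema>"""
CONFIG = """<config><updateRequestProcessorChain name="add-unknown-fields-to-the-schema">
  <processor class="solr.AddSchemaFieldsUpdateProcessorFactory">
    <str name="defaultFieldType">strings</str>
    <lst name="typeMapping"><str name="valueClass">java.lang.Boolean</str><str name="fieldType">booleans</str></lst>
  </processor></updateRequestProcessorChain></config>"""

used = types_used_by_schema(SCHEMA) | types_used_by_solrconfig(CONFIG)
print(sorted(used))                   # ['booleans', 'string', 'strings']
print(deletable_types(SCHEMA, used))  # ['tint']
```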

Remove no-longer used types
Sizes (line counts)
  managed-schema   409 → 34 (!!!)
  solrconfig.xml   278
  params.json      20
File count in conf
  .txt                   41
  .xml                   3
  .json                  1
  managed-schema (xml)   1

Deconstructing: support files
• Inside the lang directory (38 files)
  – find lang -name 'stopwords_*.txt' | wc -l
• stopwords_*.txt: 30 files
• contractions_*.txt: 4 files
  – find lang -type f | egrep -v 'stopwords_|contractions_'
• hyphenations_ga.txt, stemdict_nl.txt, stoptags_ja.txt, userdict_ja.txt

Support files: still in use?
• Check for usage
  – grep -o 'stopwords_.*.txt' managed-schema solrconfig.xml
  – grep -o 'contractions_.*.txt' ...
  – ...
• NO matches (we no longer have the related types)
  – Delete the whole lang directory
• What about files directly inside the config directory?
  – Don't need currency.xml, protwords.txt
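The grep-based check generalizes to every support file. A Python sketch: unreferenced_files is a hypothetical helper that reports .txt and .xml files in a conf directory whose names never appear in managed-schema or solrconfig.xml. It is a plain substring test, so a name that only appears in a comment still counts as used; the demo directory contents are made up.

```python
import tempfile
from pathlib import Path


def unreferenced_files(conf_dir: str) -> list:
    """Support files whose file name never appears in managed-schema or
    solrconfig.xml (hypothetical helper; substring match, like grep)."""
    conf = Path(conf_dir)
    config_text = "".join(
        (conf / name).read_text(encoding="utf-8")
        for name in ("managed-schema", "solrconfig.xml")
        if (conf / name).exists()
    )
    return sorted(
        p.relative_to(conf).as_posix()
        for p in conf.rglob("*")
        if p.is_file() and p.suffix in (".txt", ".xml")
        and p.name != "solrconfig.xml" and p.name not in config_text
    )


# Demo on a throwaway conf directory with made-up contents
with tempfile.TemporaryDirectory() as tmp:
    conf = Path(tmp)
    (conf / "lang").mkdir()
    (conf / "lang" / "stopwords_en.txt").write_text("a\nthe\n", encoding="utf-8")
    (conf / "stopwords.txt").write_text("", encoding="utf-8")
    (conf / "protwords.txt").write_text("", encoding="utf-8")
    (conf / "managed-schema").write_text(
        '<schema><fieldType name="text_general" class="solr.TextField"><analyzer>'
        '<filter class="solr.StopFilterFactory" words="stopwords.txt"/></analyzer>'
        '</fieldType></schema>', encoding="utf-8")
    (conf / "solrconfig.xml").write_text("<config/>", encoding="utf-8")
    result = unreferenced_files(tmp)

print(result)  # ['lang/stopwords_en.txt', 'protwords.txt']
```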

Remove no-longer used types
Sizes (line counts)
  managed-schema   34
  solrconfig.xml   278
  params.json      20
File count in conf
  .txt                   41 → 2
  .xml                   3 → 2
  .json                  1
  managed-schema (xml)   1

Deconstructing: actual field usage

Actual field usage: _root_

The mystery of _root_
• In the original schema: no explanations
• Documentation: used for nested documents
  "To support nested documents, the schema must include an indexed/non-stored field _root_. The value of that field is populated automatically and is the same for all documents in the block, regardless of the inheritance depth."
• We are not using nested documents
• And neither does any other shipped example...

Remove _root_
Sizes (line counts)
  managed-schema   34 → 33
  solrconfig.xml   278
  params.json      20
File count in conf
  .txt                   2
  .xml                   2
  .json                  1
  managed-schema (xml)   1

Deconstructing: text_general type
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" multiValued="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.SynonymFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

text_general support files
• stopwords.txt
  # Licensed to the Apache Software Foundation (ASF) under one or more
  # contributor license agreements.  See the NOTICE file distributed with
  # this work for additional information regarding copyright ownership.
  # The ASF licenses this file to You under the Apache License, Version 2.0
  # (the "License"); you may not use this file except in compliance with
  # the License.  You may obtain a copy of the License at
  #
  #     http://www.apache.org/licenses/LICENSE-2.0
  #
  # Unless required by applicable law or agreed to in writing, software
  # distributed under the License is distributed on an "AS IS" BASIS,
  # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  # See the License for the specific language governing permissions and
  # limitations under the License.
• synonyms.txt
  # The ASF licenses this file to You under the Apache License, Version 2.0
  # (the "License"); you may not use this file except in compliance with
  # the License.  You may obtain a copy of the License at
  # ......
  #-----------------------------------------------------------------------
  #some test synonym mappings unlikely to appear in real input text
  aaafoo => aaabar
  bbbfoo => bbbfoo bbbbar
  cccfoo => cccbar cccbaz
  fooaaa,baraaa,bazaaa

  # Some synonym groups specific to this example
  GB,gib,gigabyte,gigabytes
  MB,mib,megabyte,megabytes
  Television, Televisions, TV, TVs
  #notice we use "gib" instead of "GiB" so any WordDelimiterFilter coming
  #after us won't split it into two words.

  # Synonym mappings can be used for spelling correction too
  pixima => pixma
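The synonyms.txt lines use two forms: explicit mappings with => and comma-separated equivalence groups. A simplified Python reader makes the difference visible; parse_synonyms is a hypothetical sketch, not Lucene's parser (no escaping, no multi-word analysis), and its expand flag only imitates the expand attribute on SynonymFilterFactory.

```python
def parse_synonyms(text: str, expand: bool = True, ignore_case: bool = True) -> dict:
    """Simplified reader for the synonyms.txt format (hypothetical sketch).
    'a, b => c' maps a and b to c; 'a, b, c' is an equivalence group that maps
    every term to the whole group (expand=True) or to its first term (expand=False)."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):  # full-line comments only
            continue
        if ignore_case:
            line = line.lower()
        if "=>" in line:
            lhs, rhs = line.split("=>", 1)
            sources = [s.strip() for s in lhs.split(",") if s.strip()]
            targets = [t.strip() for t in rhs.split(",") if t.strip()]
        else:
            sources = [t.strip() for t in line.split(",") if t.strip()]
            targets = sources if expand else sources[:1]
        for source in sources:
            entry = mapping.setdefault(source, [])
            for target in targets:
                if target not in entry:
                    entry.append(target)
    return mapping


SAMPLE = """#some test synonym mappings unlikely to appear in real input text
aaafoo => aaabar
bbbfoo => bbbfoo bbbbar
GB,gib,gigabyte,gigabytes
# Synonym mappings can be used for spelling correction too
pixima => pixma
"""

synonyms = parse_synonyms(SAMPLE)
print(synonyms["gib"])     # ['gb', 'gib', 'gigabyte', 'gigabytes']
print(synonyms["bbbfoo"])  # ['bbbfoo bbbbar']
```

None of these entries has anything to do with film titles, which is why dropping the synonym filter costs the films example nothing.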

text_general's empty stopwords
• No words file configured => default stopwords => English
• Empty file => stopwords disabled
• Currently: NOT used (stopwords.txt holds only the license comment)

text_general simplified definition
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" multiValued="true">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
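To see why the simplified chain behaves like the original one here, a rough Python model of the analysis helps. The \w+ regex is only a stand-in for StandardTokenizer (which implements Unicode word-break rules), analyze is a hypothetical helper, and the stopword sets are illustrative: an empty set, like the comment-only stopwords.txt, leaves the token stream unchanged.

```python
import re

# Rough approximation of StandardTokenizer; the real tokenizer follows Unicode
# UAX#29 word breaking, which this regex only imitates for simple text.
TOKEN = re.compile(r"\w+")


def analyze(text: str, stopwords: frozenset = frozenset()) -> list:
    """Simplified text_general chain: tokenize, lowercase, then drop stopwords."""
    tokens = [t.lower() for t in TOKEN.findall(text)]
    return [t for t in tokens if t not in stopwords]


print(analyze("The Dark Knight Rises"))                           # ['the', 'dark', 'knight', 'rises']
print(analyze("The Dark Knight Rises", frozenset({"the"})))       # ['dark', 'knight', 'rises']
```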

Remove stopwords and synonyms
Sizes (line counts)
  managed-schema   33 → 26
  solrconfig.xml   278
  params.json      20
File count in conf
  .txt                   2 → 0
  .xml                   2
  .json                  1
  managed-schema (xml)   1

How far did we get
Sizes (line counts)
  managed-schema*   481 → 26
  solrconfig.xml    1482 → 278
  params.json       20
File count in conf
  .txt                   41 → 0
  .xml                   3 → 2
  .json                  1
  managed-schema (xml)   1
* already has no comments

Deconstructing: solrconfig.xml
• solrconfig.xml is more complex than the schema
• Heterogeneous sections
• Nested definitions
• Alternative implementations (e.g. highlighter)
• Also remember
  – configoverlay.json: overrides solrconfig.xml
  – params.json: additional configuration parameters

solrconfig.xml: feature counts
  11 requestHandler
   8 lib
   5 searchComponent
   3 queryResponseWriter
   2 initParams
   1 updateRequestProcessorChain
   1 updateHandler
   1 requestDispatcher
   1 query
   1 luceneMatchVersion
   1 jmx
   1 indexConfig
   1 directoryFactory
   1 dataDir
   1 codecFactory
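Counts like these come from tallying the direct children of <config>. A Python sketch with collections.Counter; top_level_features is a hypothetical helper and the inline config is an illustrative snippet, so for the real file pass the root from ET.parse("solrconfig.xml").getroot() instead.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def top_level_features(config_xml: str) -> Counter:
    """Count the direct children of <config> by tag name (comments are ignored by the parser)."""
    return Counter(child.tag for child in ET.fromstring(config_xml))


SAMPLE = """<config>
  <luceneMatchVersion>6.2.0</luceneMatchVersion>
  <lib dir="../lib"/>
  <requestHandler name="/select" class="solr.SearchHandler"/>
  <requestHandler name="/query" class="solr.SearchHandler"/>
  <searchComponent name="terms" class="solr.TermsComponent"/>
</config>"""

counts = top_level_features(SAMPLE)
for tag, n in counts.most_common():
    print(n, tag)
```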

solrconfig.xml: line counts
  55: <updateRequestProcessorChain name="add-unknown-fields-to-the-schema">
  52: <searchComponent class="solr.HighlightComponent" name="highlight">
  18: <query>
  17: <requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
  15: <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  13: <updateHandler class="solr.DirectUpdateHandler2">
   9: <requestHandler name="/terms" class="solr.SearchHandler" startup="lazy">
   8: <requestHandler name="/elevate" class="solr.SearchHandler" startup="lazy">
   8: <requestHandler name="/tvrh" class="solr.SearchHandler" startup="lazy">
   7: <requestHandler name="/update/extract" startup="lazy" class="solr.extraction.ExtractingRequestHandler">
   7: <requestHandler name="/query" class="solr.SearchHandler">
   6: <requestHandler name="/debug/dump" class="solr.DumpRequestHandler">
  ......

Remember, this works!
<config>
  <luceneMatchVersion>6.2.0</luceneMatchVersion>
  <requestHandler name="/select" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="df">text</str>
    </lst>
  </requestHandler>
</config>

add-unknown-fields-to-the-schema
• The famous "schemaless" mode
• Generic, but fully configurable
• Far from perfect
  – Remember, we had to manually pre-add fields
  – Development, not production
  – Has normalization side-effects (normalizes dates)
• Cannot remove it in our example

solrconfig.xml: highlighter
<searchComponent class="solr.HighlightComponent" name="highlight">
  <highlighting>
    <fragmenter name="gap" default="true" class="solr.highlight.GapFragmenter">
      <lst name="defaults">
        <int name="hl.fragsize">100</int>
      </lst>
    </fragmenter>
    <fragmenter name="regex" class="solr.highlight.RegexFragmenter">
      <lst name="defaults">
        <int name="hl.fragsize">70</int>
        <float name="hl.regex.slop">0.5</float>
        <str name="hl.regex.pattern">[-\w ,/\n\"']{20,200}</str>
      </lst>
    </fragmenter>
    <formatter name="html" default="true" class="solr.highlight.HtmlFormatter">
      <lst name="defaults">
        <str name="hl.simple.pre"><![CDATA[<em>]]></str>
        <str name="hl.simple.post"><![CDATA[</em>]]></str>
      </lst>
    </formatter>
    <encoder name="html" class="solr.highlight.HtmlEncoder"/>
    <fragListBuilder name="simple" class="solr.highlight.SimpleFragListBuilder"/>
    <fragListBuilder name="single" class="solr.highlight.SingleFragListBuilder"/>
    .......

• fragmenters
• encoders
• fragListBuilders
• fragmentsBuilders
• boundaryScanners
• ....

highlighter: the truth
• The highlighter searchComponent is in the default stack
• The params are a mix of standard highlighter and alternative FastVector highlighter settings
• Cannot use the FastVector version, as the schema fields are missing termVectors, etc.
• And the standard highlighter params are the same as the implicit values
• Therefore, we can remove the WHOLE definition

Remove highlighter
Sizes (line counts)
  managed-schema   26
  solrconfig.xml   278 → 226
  params.json      20
File count in conf
  .txt                   0
  .xml                   2
  .json                  1
  managed-schema (xml)   1

Other searchComponents
• Not on the default stack
  – spellcheck
  – terms
  – termVector
  – elevator
• Have dedicated requestHandlers
• Inception (example within example)
• Can be deleted
  – also delete elevate.xml

  15: <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  17: <requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
   1: <searchComponent name="terms" class="solr.TermsComponent"/>
   9: <requestHandler name="/terms" class="solr.SearchHandler" startup="lazy">
   1: <searchComponent name="tvComponent" class="solr.TermVectorComponent"/>
   8: <requestHandler name="/tvrh" class="solr.SearchHandler" startup="lazy">
   4: <searchComponent name="elevator" class="solr.QueryElevationComponent">
   8: <requestHandler name="/elevate" class="solr.SearchHandler" startup="lazy">
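Deleting the four component/handler pairs can be scripted by name. A Python sketch: drop_named is a hypothetical helper, the name sets come from the listing above, and the inline config is an illustrative fragment. elevate.xml still has to be deleted separately.

```python
import xml.etree.ElementTree as ET

# Names from the listing above, grouped by element tag
UNUSED = {
    "requestHandler": {"/spell", "/terms", "/tvrh", "/elevate"},
    "searchComponent": {"spellcheck", "terms", "tvComponent", "elevator"},
}


def drop_named(root: ET.Element, unused: dict) -> list:
    """Remove top-level elements whose (tag, name) pair is listed in `unused` (hypothetical helper)."""
    removed = []
    for child in list(root):
        if child.get("name") in unused.get(child.tag, set()):
            root.remove(child)
            removed.append(f"{child.tag} {child.get('name')}")
    return removed


config = ET.fromstring("""<config>
  <requestHandler name="/select" class="solr.SearchHandler"/>
  <searchComponent name="terms" class="solr.TermsComponent"/>
  <requestHandler name="/terms" class="solr.SearchHandler" startup="lazy"/>
</config>""")

print(drop_named(config, UNUSED))
print([c.get("name") for c in config])
```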

Remove custom searchComponents
Sizes (line counts)
  managed-schema   26
  solrconfig.xml   226 → 163
  params.json      20
File count in conf
  .txt                   0
  .xml                   2 → 1
  .json                  1
  managed-schema (xml)   1

solrconfig.xml: more stuff
• There is more that can be taken out
  – query section, since you have to tune it anyway
  – updateHandler, and revert to basic commits
  – jmx
  – enableRemoteStreaming: definitely take that out
• But keep Velocity, /browse, search support

Next action
• Join the (virtual) Solr Example Reading Group
  – Starts November 2016
  – Register at http://bit.ly/SolrERG
• Join the mailing list at http://www.solr-start.com
  – Get the link to the presentation source
  – Learn about other similar projects
  – Get news of Solr articles and projects on the web
