Make Plone Search Act Like Google Using Solr
Clayton Parker | Senior Web Developer
PLONE CONFERENCE 2011

DESCRIPTION

Solr is a powerful open source search engine server that has become a popular choice for extending the search capabilities of Plone sites. The default configuration works well, but how do you answer the client's request to "Make my search just like Google's"? In this talk we will look at the options available in Solr's schema and configuration files. We will discuss how to set up stop words, spell checking, n-grams and alternate query handlers. We will see what effect these settings have on the search results and find out how to debug problems when they arise.

Who Am I

What will we learn?

• Intro to Solr

• Brief overview of Plone integration points

• Solr configuration

• Solr schema setup

• Debugging tips and tricks

What is Solr?

Version Madness

• 1.x (up to 1.4)

• 1.5 (number abandoned)

• 3.x (merge of Lucene and Solr)

Books

Integration

alm.solrindex

collective.solr

Solr Configuration

Query Handlers

• Standard

• Disjunction Max (DisMax)

• Extended DisMax (experimental)

DisMax

• Searches several indexes in one query

• Per-index boosting

• Friendlier to end users

DisMax

qf=SearchableText^1.0 substring^0.2

Each entry is an index name, a caret, and a weight: matches in SearchableText count at full weight (1.0), matches in the substring index at 0.2.
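A minimal sketch of building such a DisMax request from Python. It assumes a Solr instance at `http://localhost:8983/solr` with the default `/select` handler (host and port are placeholders); `q`, `defType`, `qf`, `mm` and `fl` are standard Solr request parameters, while `dismax_url` is a hypothetical helper. The code only builds the URL and does not contact a server.

```python
from urllib.parse import urlencode

# Assumption: a local Solr instance with the default /select handler.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"

def dismax_url(user_query, qf, mm="100%"):
    """Build a DisMax search URL; qf maps index name -> weight."""
    params = {
        "q": user_query,
        "defType": "dismax",  # use the DisMax query parser
        "qf": " ".join(f"{field}^{weight}" for field, weight in qf.items()),
        "mm": mm,             # minimum should match
        "fl": "*,score",      # return stored fields plus the relevance score
    }
    return SOLR_SELECT_URL + "?" + urlencode(params)

url = dismax_url("narrow escape", {"SearchableText": 1.0, "substring": 0.2})
print(url)
```

Adding `debugQuery=on` to the parameters asks Solr to include its scoring explanation in the response.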

MinShouldMatch

• mm=100%: all terms required

• mm=50%: half of the terms required (rounded down)

• mm=-2: all but two terms required

MinShouldMatch

mm=2<-25% 9<-3

• 2 or fewer terms: all are required

• 3 to 9 terms: all but 25% are required

• More than 9 terms: all but three are required
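The mm rules on these slides can be sketched in Python to check how many terms a given query must match. This is a simplified re-implementation written for illustration, not Solr's own code; it assumes well-formed specs with no spaces around `<`, truncates fractional results toward zero (Solr rounds percentages down), and never returns less than zero.

```python
def min_should_match(num_clauses, spec):
    """Return how many optional clauses must match for a Solr-style mm spec.

    Handles plain integers ("3", "-2"), percentages ("50%", "-25%") and
    space-separated conditional specs such as "2<-25% 9<-3".
    """
    spec = spec.strip()
    if "<" in spec:
        result = num_clauses  # at or below the first bound, every clause is required
        for condition in spec.split():
            bound, value = condition.split("<")
            if num_clauses <= int(bound):
                return result
            result = min_should_match(num_clauses, value)
        return result
    if spec.endswith("%"):
        calc = num_clauses * int(spec[:-1]) / 100
        # int() truncates toward zero, so fractional clauses are dropped
        result = num_clauses + int(calc) if calc < 0 else int(calc)
    else:
        calc = int(spec)
        result = num_clauses + calc if calc < 0 else calc
    return max(result, 0)

for n in (2, 4, 9, 12):
    print(n, "terms ->", min_should_match(n, "2<-25% 9<-3"), "required")
```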

Spelling Component

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="classname">solr.IndexBasedSpellChecker</str>
    <str name="buildOnCommit">true</str>
    <str name="spellcheckIndexDir">path/to/spellcheck</str>
    <!-- The field that will contain the dynamic spelling data -->
    <str name="field">spell</str>
    <str name="accuracy">0.5</str>
  </lst>
  <!-- Control indexing and query of spelling data -->
  <str name="queryAnalyzerFieldType">spell-text</str>
</searchComponent>
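The `accuracy` setting is a similarity threshold between 0 and 1 that a suggestion must reach. As an assumption for illustration, this sketch scores similarity as 1 minus the Levenshtein edit distance divided by the longer word's length, which is how Lucene's default Levenshtein distance measure is commonly described; the helper names and the sample word list are hypothetical.

```python
def levenshtein(a, b):
    """Classic edit distance: insertions, deletions and substitutions cost 1."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]

def similarity(a, b):
    """Edit distance scaled to 0..1, where 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest

def suggest(word, dictionary, accuracy=0.5):
    """Return dictionary terms at least `accuracy` similar to `word`, best first."""
    scored = [(similarity(word, term), term) for term in dictionary]
    return [term for score, term in sorted(scored, reverse=True)
            if score >= accuracy and term != word]

print(suggest("ploen", ["plone", "phone", "zope"]))  # ['plone']
```

Raising `accuracy` gives fewer but closer suggestions; "phone" scores 0.4 here and is filtered out at 0.5.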

Spelling Schema

<fieldType name="spell-text" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.ICUTokenizerFactory"/>
    <filter class="solr.ICUFoldingFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
  </analyzer>
</fieldType>

Solr Schema

Index vs Query

http://www.cominvent.com/2011/04/04/solr-architecture-diagram/

Character Filters

Tokenizer

Filters
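The three stages of an analyzer can be sketched as plain Python functions chained in Solr's order: character filters rewrite the raw text, the tokenizer splits it, and token filters transform the resulting list. Every helper below is a hypothetical stand-in for a Solr factory, for illustration only.

```python
import re

def analyze(text, char_filters, tokenizer, token_filters):
    """Run text through an analyzer chain in the order Solr applies it."""
    for char_filter in char_filters:    # 1. rewrite the raw character stream
        text = char_filter(text)
    tokens = tokenizer(text)            # 2. split the text into tokens
    for token_filter in token_filters:  # 3. transform the list of tokens
        tokens = token_filter(tokens)
    return tokens

# Hypothetical stand-ins for Solr factories, for illustration only.
def strip_punctuation(text):
    return re.sub(r"[^a-zA-Z0-9 _-]", "", text)

def whitespace_tokenizer(text):
    return text.split()

def lower_case(tokens):
    return [t.lower() for t in tokens]

def stop_words(tokens):
    return [t for t in tokens if t not in {"a", "that", "was"}]

result = analyze("'That WAS a narrow escape!' said Alice",
                 [strip_punctuation], whitespace_tokenizer, [lower_case, stop_words])
print(result)  # ['narrow', 'escape', 'said', 'alice']
```

Order matters: running the stop filter before lower-casing would let "That" and "WAS" through.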

Complete Field

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping.txt"/>
    <tokenizer class="solr.ICUTokenizerFactory"/>
    <filter class="solr.ICUFoldingFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="1" splitOnNumerics="1"
            stemEnglishPossessive="1" generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0" catenateAll="0" preserveOriginal="1"/>
  </analyzer>

  <analyzer type="query">
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping.txt"/>
    <tokenizer class="solr.ICUTokenizerFactory"/>
    <filter class="solr.ICUFoldingFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="1" splitOnNumerics="1"
            stemEnglishPossessive="1" generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0" catenateAll="0" preserveOriginal="1"/>
    <filter class="solr.PositionFilterFactory"/>
  </analyzer>
</fieldType>

Copy Field

<copyField source="SearchableText" dest="spell"/>
<copyField source="SearchableText" dest="substring"/>

Character Filters

• Process text before tokenizing

• Remove irrelevant characters

Pattern Replace

<charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[^a-zA-Z0-9 _-]" replacement=""/>

'That WAS a narrow escape!' said Alice, a good deal frightened

That WAS a narrow escape said Alice a good deal frightened
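A minimal sketch for trying a character-class pattern in Python's `re` module before putting it in the schema, assuming the class `[^a-zA-Z0-9 _-]` (keep letters, digits, space, underscore and hyphen). The space belongs in the class because a char filter sees the raw text before tokenizing; without it the words would run together. `pattern_replace` is a hypothetical helper.

```python
import re

# Assumed character class: drop anything that is not a letter, digit,
# space, underscore or hyphen.
PATTERN = re.compile(r"[^a-zA-Z0-9 _-]")

def pattern_replace(text, replacement=""):
    """Approximate a pattern-replace char filter applied to the whole raw string."""
    return PATTERN.sub(replacement, text)

print(pattern_replace("'That WAS a narrow escape!' said Alice, a good deal frightened"))
# That WAS a narrow escape said Alice a good deal frightened
```

For a simple character class like this one, Java and Python regex syntax behave the same, so the pattern can be copied into the schema unchanged.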

Mapping

<charFilter class="solr.MappingCharFilterFactory" mapping="mapping.txt"/>

# œ => oe
"\u0153" => "oe"
# ß => ss
"\u00DF" => "ss"
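The two mapping rules can be mimicked with Python's `str.translate`. This is a sketch only: `translate` replaces single characters, whereas the Solr mapping file can also map multi-character sequences. The names `MAPPING` and `map_chars` are hypothetical.

```python
# Assumed rules, mirroring the two mapping.txt entries: œ -> oe, ß -> ss.
MAPPING = {"\u0153": "oe", "\u00DF": "ss"}
TABLE = str.maketrans(MAPPING)

def map_chars(text):
    """Replace each mapped character with its replacement string."""
    return text.translate(TABLE)

print(map_chars("cœur straße"))  # coeur strasse
```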

HTML Strip

<charFilter class="solr.HTMLStripCharFilterFactory"/>
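A rough Python approximation of HTML stripping using the standard library's `html.parser`: tags are dropped and character references such as `&amp;` are decoded. Unlike Solr's filter, this sketch keeps the contents of `<script>` and `<style>` elements and does not preserve character offsets for highlighting; `strip_html` is a hypothetical helper.

```python
from html.parser import HTMLParser

class _TextCollector(HTMLParser):
    """Collects text nodes; character references are decoded automatically."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

def strip_html(markup):
    """Approximate HTML stripping: drop tags, keep the decoded text."""
    collector = _TextCollector()
    collector.feed(markup)
    collector.close()  # flush any text still buffered by the parser
    return "".join(collector.parts)

print(strip_html("<p>Hello <b>Plone</b> &amp; Solr</p>"))  # Hello Plone & Solr
```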

Tokenizers

• Split raw text into tokens / terms

• Usually the first step; only character filters run before it

Whitespace Tokenizer

<tokenizer class="solr.WhitespaceTokenizerFactory"/>

'That WAS a narrow escape!' said Alice

'That | WAS | a | narrow | escape!' | said | Alice

ICU Tokenizer

<tokenizer class="solr.ICUTokenizerFactory"/>

'That WAS a narrow escape!' said Alice

That | WAS | a | narrow | escape | said | Alice
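The difference between the two tokenizers can be sketched in Python. Splitting on whitespace leaves punctuation attached to the words; `re.findall(r"\w+")` is only a rough stand-in for ICU word segmentation (ICU follows the much richer Unicode word-boundary rules and treats cases such as contractions differently), but it gives the same tokens for this sentence. Both helper names are hypothetical.

```python
import re

def whitespace_tokenize(text):
    """Split on runs of whitespace only; punctuation stays attached."""
    return text.split()

def word_tokenize(text):
    """Rough stand-in for ICU word segmentation: keep runs of word characters."""
    return re.findall(r"\w+", text)

sentence = "'That WAS a narrow escape!' said Alice"
print(whitespace_tokenize(sentence))
# ["'That", 'WAS', 'a', 'narrow', "escape!'", 'said', 'Alice']
print(word_tokenize(sentence))
# ['That', 'WAS', 'a', 'narrow', 'escape', 'said', 'Alice']
```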

Pattern Tokenizer

<tokenizer class="solr.PatternTokenizerFactory" pattern=";\s*" />

one; two; three

one | two | three
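In its default mode the pattern tokenizer treats the regex as a separator. A minimal Python equivalent uses `re.split` with the same pattern; this sketch also drops empty strings so that doubled separators do not produce blank tokens. `pattern_tokenize` is a hypothetical helper.

```python
import re

def pattern_tokenize(text, pattern=r";\s*"):
    """Split text on the regex and drop empty pieces."""
    return [token for token in re.split(pattern, text) if token]

print(pattern_tokenize("one; two; three"))  # ['one', 'two', 'three']
```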

Path Hierarchy

<tokenizer class="solr.PathHierarchyTokenizerFactory"/>

/usr/local/etc/nginx

/usr | /usr/local | /usr/local/etc | /usr/local/etc/nginx
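The path hierarchy tokenizer emits each ancestor of a path plus the full path, which is what makes "everything under this folder" queries cheap. A minimal Python sketch of that output, assuming `/` as the delimiter; `path_hierarchy` is a hypothetical helper.

```python
def path_hierarchy(path, delimiter="/"):
    """Emit every ancestor path plus the full path, shortest first."""
    tokens = []
    current = ""
    for i, part in enumerate(path.split(delimiter)):
        current = part if i == 0 else current + delimiter + part
        if current:  # skip the empty piece before a leading delimiter
            tokens.append(current)
    return tokens

print(path_hierarchy("/usr/local/etc/nginx"))
# ['/usr', '/usr/local', '/usr/local/etc', '/usr/local/etc/nginx']
```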

Token Filters

• Process tokens after tokenizing

• Normalization of terms

Lower Case

<filter class="solr.LowerCaseFilterFactory"/>

FoobArBAZ

foobarbaz

ASCII Folding

<filter class="solr.ASCIIFoldingFilterFactory"/>

idée | bête | grüßen

idee | bete | grussen

ICU Folding

<filter class="solr.ICUFoldingFilterFactory"/>

Idée | BÊTE | GrüßeN

idee | bete | grussen
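Both folding filters can be approximated with Python's `unicodedata` module: decompose each character (NFKD) and drop the combining accent marks. The sketch assumes that ASCII folding keeps case while ICU folding also case-folds, as the two slides show; Python's `str.casefold` happens to expand ß to "ss" as well. The real filters cover far more characters (ligatures such as œ, for example), so treat this as an illustration only.

```python
import unicodedata

def _strip_accents(text):
    """Decompose characters (NFKD), then drop the combining accent marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

def ascii_fold(token):
    """Rough ASCII folding: remove accents, expand ß, keep the original case."""
    return _strip_accents(token).replace("ß", "ss")

def icu_fold(token):
    """Rough ICU folding: case-fold first (this also expands ß), then strip accents."""
    return _strip_accents(token.casefold())

print([ascii_fold(t) for t in ["idée", "bête", "grüßen"]])  # ['idee', 'bete', 'grussen']
print([icu_fold(t) for t in ["Idée", "BÊTE", "GrüßeN"]])    # ['idee', 'bete', 'grussen']
```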

Pattern Replace

<filter class="solr.PatternReplaceFilterFactory" pattern="[^a-zA-Z0-9_-]" replacement="" replace="all"/>

'That | WAS | a | narrow | escape!' | said | Alice

That | WAS | a | narrow | escape | said | Alice

Word Delimiter

<filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="1" splitOnNumerics="1"
        stemEnglishPossessive="1" generateWordParts="1" generateNumberParts="1"
        catenateWords="0" catenateNumbers="0" catenateAll="0" preserveOriginal="1"/>

StudlyCaps1234-5678

StudlyCaps1234-5678 | Studly | Caps | 1234 | 5678
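A rough Python sketch of what these settings produce: split on case changes, on letter/digit boundaries and on punctuation, strip a trailing English possessive, keep the original token, and do no catenation. The regex is an assumption chosen for illustration and does not reproduce every rule of the real filter; `word_delimiter` is a hypothetical helper.

```python
import re

# Word parts: a capitalised or lowercase run, an all-caps run, or a digit run.
_PARTS = re.compile(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+")

def word_delimiter(token):
    """Split a token into word and number parts, keeping the original first."""
    base = token[:-2] if token.endswith("'s") else token  # stemEnglishPossessive
    parts = _PARTS.findall(base)
    if parts == [token]:
        return [token]      # nothing to split
    return [token] + parts  # preserveOriginal="1"

print(word_delimiter("StudlyCaps1234-5678"))
# ['StudlyCaps1234-5678', 'Studly', 'Caps', '1234', '5678']
```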

Edge N Gram

<filter class="solr.EdgeNGramFilterFactory" minGramSize="4" maxGramSize="100" side="front"/>

Conqueror

Conqueror | Conquero | Conquer | Conque | Conqu | Conq
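Front edge n-grams are simply the token's prefixes within the configured length range. A minimal Python sketch with the same `minGramSize`/`maxGramSize` values; `edge_ngrams` is a hypothetical helper, and tokens shorter than the minimum produce no grams.

```python
def edge_ngrams(token, min_gram=4, max_gram=100):
    """Front edge n-grams: every prefix whose length is in [min_gram, max_gram]."""
    longest = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, longest + 1)]

print(edge_ngrams("Conqueror"))
# ['Conq', 'Conqu', 'Conque', 'Conquer', 'Conquero', 'Conqueror']
```

Indexed into a field such as the substring index, these prefixes let a partial word like "conq" match; applying the filter only at index time keeps the user's query itself intact.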

Stop Words

<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>

That | WAS | a | narrow | escape | said | Alice | a | good | deal | frightened

narrow | escape | said | Alice | good | deal | frightened
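With `ignoreCase="true"`, a token is dropped when its lower-cased form appears in the word list. A minimal Python sketch, assuming a small hypothetical excerpt of stopwords.txt rather than the full file:

```python
# Hypothetical excerpt of a stopwords.txt word list.
STOP_WORDS = {"a", "an", "and", "that", "the", "was"}

def remove_stop_words(tokens, stop_words=STOP_WORDS, ignore_case=True):
    """Drop tokens found in the stop word list (case-insensitively by default)."""
    return [t for t in tokens
            if (t.lower() if ignore_case else t) not in stop_words]

tokens = "That WAS a narrow escape said Alice a good deal frightened".split()
print(remove_stop_words(tokens))
# ['narrow', 'escape', 'said', 'Alice', 'good', 'deal', 'frightened']
```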

Synonyms

<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>

# synonyms.txt

# add multiple terms
foozball, foosball, baby-foot

# merge into one
tv, t.v., tele => television

foosball

foozball | foosball | baby-foot

tele | t.v. | tv

television | television | television
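The two rule styles in synonyms.txt can be sketched in Python: a comma-separated line lists equivalent terms (with `expand="true"`, each one expands to all of them), while `=>` maps the left-hand terms onto the right-hand ones. This simplified sketch handles single-word tokens only and returns a flat list, whereas Solr stacks synonyms at the same token position and also supports multi-word synonyms; both function names are hypothetical.

```python
def parse_synonyms(lines):
    """Parse synonyms-file lines into a lower-cased token -> replacements map."""
    rules = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        if "=>" in line:  # explicit mapping: left-hand terms become right-hand terms
            left, right = line.split("=>", 1)
            targets = [t.strip() for t in right.split(",")]
            for term in left.split(","):
                rules[term.strip().lower()] = targets
        else:  # equivalent terms: each one expands to the whole group
            terms = [t.strip() for t in line.split(",")]
            for term in terms:
                rules[term.lower()] = terms
    return rules

def apply_synonyms(tokens, rules):
    """Replace each token with its synonyms, ignoring case when looking it up."""
    result = []
    for token in tokens:
        result.extend(rules.get(token.lower(), [token]))
    return result

rules = parse_synonyms([
    "# add multiple terms",
    "foozball, foosball, baby-foot",
    "# merge into one",
    "tv, t.v., tele => television",
])
print(apply_synonyms(["foosball"], rules))           # ['foozball', 'foosball', 'baby-foot']
print(apply_synonyms(["tele", "t.v.", "tv"], rules))  # ['television', 'television', 'television']
```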

Language Stemming

<filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>

dry | drying | dried

dri | dri | dri

Language Stemming

<filter class="solr.ElisionFilterFactory" articles="stopwordarticles.txt"/>

<filter class="solr.SnowballPorterFilterFactory" language="French"/>

considere | consideres | considerent

consider | consider | consider

qu'il | ne | comprend | pas | l'anglais

il | ne | comprend | pas | anglais
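Elision removal strips a leading article that is joined to the next word by an apostrophe. A minimal Python sketch, assuming a hypothetical excerpt of the articles file; it only recognises the ASCII apostrophe, while real French text may also use the typographic one (’).

```python
# Hypothetical excerpt of an elision articles list (stopwordarticles.txt).
ARTICLES = {"l", "d", "j", "m", "n", "s", "t", "qu", "c"}

def remove_elision(token, articles=ARTICLES):
    """Strip a leading elided article such as l' or qu' from a token."""
    prefix, apostrophe, rest = token.partition("'")
    if apostrophe and rest and prefix.lower() in articles:
        return rest
    return token

print([remove_elision(t) for t in "qu'il ne comprend pas l'anglais".split()])
# ['il', 'ne', 'comprend', 'pas', 'anglais']
```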

Solr Debugging

Schema Browser

Analysis

Search Interface

Verbose XML*

* like there is any other kind

<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">2</int>
  <lst name="params">
    <str name="explainOther">True</str>
    <str name="fl">*,score</str>
    <str name="debugQuery">on</str>
    <str name="indent">true</str>
    <str name="q">test</str>
    <str name="qf">SearchableText^1.0</str>
    <str name="rows">10</str>
    <str name="defType">dismax</str>
  </lst>
</lst>

<result name="response" numFound="2" start="0" maxScore="0.70710677">
  <doc>
    <float name="score">0.70710677</float>
    <int name="docid">-643919099</int>
  </doc>
  <doc>
    <float name="score">0.3788861</float>
    <int name="docid">-643919097</int>
  </doc>
</result>

<lst name="debug">
  <str name="rawquerystring">test</str>
  <str name="querystring">test</str>
  <str name="parsedquery">+DisjunctionMaxQuery((SearchableText:test)) ()</str>
  <str name="parsedquery_toString">+(SearchableText:test) ()</str>
  <lst name="explain">
    <str name="-643919099">
0.70710677 = (MATCH) sum of:
  0.70710677 = (MATCH) fieldWeight(SearchableText:test in 4), product of:
    1.4142135 = tf(termFreq(SearchableText:test)=2)
    1.0 = idf(docFreq=5, maxDocs=6)
    0.5 = fieldNorm(field=SearchableText, doc=4)
    </str>
    <str name="-643919097">
0.3788861 = (MATCH) sum of:
  0.3788861 = (MATCH) fieldWeight(SearchableText:test in 0), product of:
    1.7320508 = tf(termFreq(SearchableText:test)=3)
    1.0 = idf(docFreq=5, maxDocs=6)
    0.21875 = fieldNorm(field=SearchableText, doc=0)
    </str>
  </lst>
</lst>
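The explain output can be checked by hand. Assuming the classic TF-IDF formulas of Lucene 3.x's `DefaultSimilarity` (tf is the square root of the term frequency, idf is 1 + ln(maxDocs / (docFreq + 1))), multiplying tf, idf and fieldNorm reproduces both scores above. The function names are hypothetical; the numbers come from the explain output.

```python
import math

def tf(term_freq):
    """Classic Lucene tf: square root of the term frequency."""
    return math.sqrt(term_freq)

def idf(doc_freq, max_docs):
    """Classic Lucene idf: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def field_weight(term_freq, doc_freq, max_docs, field_norm):
    """fieldWeight = tf * idf * fieldNorm, as listed in the explain output."""
    return tf(term_freq) * idf(doc_freq, max_docs) * field_norm

print(round(field_weight(2, 5, 6, 0.5), 6))      # 0.707107
print(round(field_weight(3, 5, 6, 0.21875), 6))  # 0.378886
```

The second document contains "test" more often (3 vs 2) but ranks lower because its much smaller fieldNorm (a longer field) outweighs the higher tf.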

Links

• Solr (http://lucene.apache.org/solr)

• Solr Wiki (http://wiki.apache.org/solr)

• Books (http://www.packtpub.com/books/all?keys=solr)

• SolrIndex (http://pypi.python.org/pypi/alm.solrindex/)

• collective.solr (http://pypi.python.org/pypi/collective.solr)

Flickr Credits

• http://www.flickr.com/photos/naturegeak/5642083189/ (who)

• http://www.flickr.com/photos/eklektikos/2541408630/ (schema)

• http://www.flickr.com/photos/sidelong/13954593/ (char filter)

• http://www.flickr.com/photos/benimoto/2214240119/ (tokenizers)

• http://www.flickr.com/photos/chaunceydavis/3264077445/ (filters)

• http://www.flickr.com/photos/comedynose/3271760209/ (configuration)

• http://www.flickr.com/photos/nicksart/4821509371/ (debugging)

Thanks to

Check out

sixfeetup.com/demos

Questions?