Full Text Search with Apache Solr Pittaya Sroilong [email protected]

Using Apache Solr

DESCRIPTION

intro to full text search solution, Apache Solr

Page 1: Using Apache Solr

Full Text Search with Apache Solr

Pittaya [email protected]

Page 2: Using Apache Solr

Who am I?

Page 3: Using Apache Solr
Page 4: Using Apache Solr

Solr?

Page 5: Using Apache Solr
Page 6: Using Apache Solr

Not her!

Page 7: Using Apache Solr

But a search server

Page 8: Using Apache Solr

based on Lucene

Page 9: Using Apache Solr

Lucene?

Page 10: Using Apache Solr

Full-text search library

Page 11: Using Apache Solr

100% Java :-(

Page 12: Using Apache Solr

Solr is based on Lucene

Page 13: Using Apache Solr

XML/HTTP, JSON interface

Page 14: Using Apache Solr

Open Source

Page 15: Using Apache Solr

Shields us from using Java

:-)

Page 16: Using Apache Solr

Who uses Solr/Lucene?

Page 17: Using Apache Solr

Who uses Solr/Lucene?

Page 18: Using Apache Solr

What is our problem?

Page 19: Using Apache Solr

How do we implement this?

Page 20: Using Apache Solr

SELECT * FROM post
WHERE topic LIKE '%aoi%' OR author LIKE '%aoi%'
ORDER BY id DESC

Page 21: Using Apache Solr

SELECT * FROM post
WHERE (topic LIKE '%aoi%' OR author LIKE '%aoi%')
   OR (topic LIKE '%miyabi%' OR author LIKE '%miyabi%')
ORDER BY id DESC

Page 22: Using Apache Solr

Full table scan = performance killer
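
One way to check the full-table-scan claim is to ask a database for its query plan. The sketch below uses SQLite from Python's standard library as a stand-in, since the slides do not name the RDBMS. The table layout, index names, and sample rows are assumptions made up for illustration.

```python
import sqlite3

# In-memory database with a hypothetical `post` table (layout assumed, not from the slides).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE post (id INTEGER PRIMARY KEY, topic TEXT, author TEXT)")
conn.execute("CREATE INDEX idx_post_topic ON post (topic)")
conn.execute("CREATE INDEX idx_post_author ON post (author)")
conn.executemany(
    "INSERT INTO post (topic, author) VALUES (?, ?)",
    [("hello aoi", "somchai"), ("solr intro", "aoi"), ("lucene tips", "miyabi")],
)

sql = (
    "SELECT * FROM post "
    "WHERE topic LIKE '%aoi%' OR author LIKE '%aoi%' "
    "ORDER BY id DESC"
)

# A pattern with a leading wildcard cannot use the B-tree indexes,
# so the planner falls back to scanning the table.
plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]
rows = conn.execute(sql).fetchall()

print(plan)  # plan details; expect a SCAN step rather than an index SEARCH
print(rows)  # matching rows ordered by id, with no relevance score attached
```

Even with indexes on both columns, the plan reports a scan, and the only ordering available is by a column such as `id`, not by how well a row matches.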

Page 23: Using Apache Solr

No search scoring

Page 24: Using Apache Solr

RDBMS isn’t designed to do this

Page 25: Using Apache Solr

Use the right tool!

Page 26: Using Apache Solr

[Diagram: the Indexer sends index updates to Solr (built on Lucene); the Web App sends a query to Solr and gets a result back.]

Page 27: Using Apache Solr

1

Page 28: Using Apache Solr

Define schema.xml

Page 29: Using Apache Solr

<field name="id" type="string" indexed="true" stored="true" />
<field name="fullname" type="string" indexed="true" stored="true" />
<field name="position" type="string" indexed="true" stored="true" />
<field name="tag" type="stringi" indexed="true" stored="true" multiValued="true" />

Page 30: Using Apache Solr

2

Page 31: Using Apache Solr

Deploy on any J2EE container

Page 32: Using Apache Solr

Tomcat, Jetty, etc.

Page 33: Using Apache Solr

3

Page 34: Using Apache Solr

Index documents

Page 35: Using Apache Solr

Document format

<add>
  <doc>
    <field name="id">555</field>
    <field name="fullname">Kaka</field>
    <field name="position">Midfielder</field>
    <field name="tag">AC Milan</field>
    <field name="tag">Brazil</field>
  </doc>
</add>
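
The document format above can be generated instead of written by hand. This is a minimal sketch using Python's standard `xml.etree.ElementTree`. The helper name `build_add_xml` is hypothetical, and treating list values as repeated fields is an assumption that mirrors the multi-valued `tag` field.

```python
import xml.etree.ElementTree as ET

def build_add_xml(doc):
    """Build an <add> message from a dict; list values become repeated fields."""
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        values = value if isinstance(value, list) else [value]
        for v in values:
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(v)  # ElementTree escapes &, < and > when serializing
    return ET.tostring(add, encoding="unicode")

player = {
    "id": "555",
    "fullname": "Kaka",
    "position": "Midfielder",
    "tag": ["AC Milan", "Brazil"],
}
add_xml = build_add_xml(player)
print(add_xml)
```

Building the message through an XML library rather than string concatenation keeps special characters in field values from breaking the document.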

Page 36: Using Apache Solr

Post to Solr
http://<host>/solr/update

Page 37: Using Apache Solr

Any language that can do HTTP POST

Page 38: Using Apache Solr

PHP, Perl, Python

Page 39: Using Apache Solr

cURL

Page 40: Using Apache Solr

Commit
<commit />
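
Posting and committing can be sketched with nothing but Python's standard library. The base URL below (`localhost:8983`) and the helper names are assumptions; substitute your own `<host>`. The requests are only prepared here, and the actual sending is left commented out so the snippet runs without a server.

```python
import urllib.request

SOLR_URL = "http://localhost:8983/solr"  # assumed host and port; adjust to your setup

def make_update_request(xml_body, base_url=SOLR_URL):
    """Prepare (but do not send) an HTTP POST carrying an XML update message."""
    return urllib.request.Request(
        base_url + "/update",
        data=xml_body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )

def send(request):
    """Send a prepared request and return the response body as text."""
    with urllib.request.urlopen(request) as resp:
        return resp.read().decode("utf-8")

add_req = make_update_request(
    '<add><doc><field name="id">555</field>'
    '<field name="fullname">Kaka</field></doc></add>'
)
commit_req = make_update_request("<commit />")

# With a Solr server running, index first and then commit:
# send(add_req)
# send(commit_req)
```

Added documents become searchable only after the commit, so the two requests go together.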

Page 41: Using Apache Solr

4

Page 42: Using Apache Solr

Search

Page 43: Using Apache Solr

Query from
http://<host>/solr/select

Page 44: Using Apache Solr

Use Solr query syntax
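
A query is just an HTTP GET with the query string in the `q` parameter. The sketch below builds such a URL. The helper name `build_select_url` and the `localhost:8983` base URL are assumptions; `start` and `rows` are the usual paging parameters.

```python
from urllib.parse import urlencode

def build_select_url(query, base_url="http://localhost:8983/solr", **params):
    """Return a /select URL; extra keyword arguments become query parameters."""
    all_params = {"q": query}
    all_params.update(params)
    return base_url + "/select?" + urlencode(all_params)

url = build_select_url("tag:Spain", start=0, rows=10)
print(url)
```

The resulting URL can be opened in a browser or fetched with any HTTP client.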

Page 46: Using Apache Solr

Response in XML or JSON (configurable)

Page 47: Using Apache Solr

<response>
  <result numFound="46" start="0">
    <doc>
      <str name="fullname">Sergio Ramos</str>
      <str name="position">Defender</str>
      <str name="tag">Real Madrid</str>
      <str name="tag">Spain</str>
    </doc>
    <doc>
      <str name="fullname">Diego Forlan</str>
      <str name="position">Striker</str>
      <str name="tag">Atletico Madrid</str>
      <str name="tag">Uruguay</str>
    </doc>
  </result>
</response>
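
Reading such a response takes a few lines with `xml.etree.ElementTree`. The sketch below parses the simplified shape shown on the slide. A real server's response carries more structure (for example a response header, and multi-valued fields wrapped in their own element), so treat the parsing logic as an assumption to adapt.

```python
import xml.etree.ElementTree as ET

# Response text in the simplified shape from the slide.
response_xml = """
<response>
  <result numFound="46" start="0">
    <doc>
      <str name="fullname">Sergio Ramos</str>
      <str name="position">Defender</str>
      <str name="tag">Real Madrid</str>
      <str name="tag">Spain</str>
    </doc>
    <doc>
      <str name="fullname">Diego Forlan</str>
      <str name="position">Striker</str>
      <str name="tag">Atletico Madrid</str>
      <str name="tag">Uruguay</str>
    </doc>
  </result>
</response>
"""

def parse_response(xml_text):
    """Return (numFound, docs); each doc maps a field name to a list of values."""
    root = ET.fromstring(xml_text.strip())
    result = root.find("result")
    docs = []
    for doc_el in result.findall("doc"):
        doc = {}
        for str_el in doc_el.findall("str"):
            doc.setdefault(str_el.get("name"), []).append(str_el.text)
        docs.append(doc)
    return int(result.get("numFound")), docs

num_found, docs = parse_response(response_xml)
print(num_found, docs[0]["fullname"], docs[0]["tag"])
```

Note that `numFound` is the total number of matches, while the response holds only the current page of documents.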

Page 48: Using Apache Solr

&wt=json

Page 49: Using Apache Solr

{
  "result": {
    "numFound": 46,
    "start": 0,
    "docs": [
      {
        "fullname": "Sergio Ramos",
        "position": "Defender",
        "tag": ["Real Madrid", "Spain"]
      },
      {
        "fullname": "Diego Forlan",
        "position": "Striker",
        "tag": ["Atletico Madrid", "Uruguay"]
      }
    ]
  }
}
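
The JSON form maps directly onto native data structures, which is what makes it convenient for scripting languages. A minimal sketch with Python's `json` module follows. The top-level key `result` is taken from the slide; a Solr server typically names this key `response`, so check it against your own output.

```python
import json

# Response text in the shape shown on the slide.
response_json = """
{
  "result": {
    "numFound": 46,
    "start": 0,
    "docs": [
      {"fullname": "Sergio Ramos", "position": "Defender",
       "tag": ["Real Madrid", "Spain"]},
      {"fullname": "Diego Forlan", "position": "Striker",
       "tag": ["Atletico Madrid", "Uruguay"]}
    ]
  }
}
"""

data = json.loads(response_json)
result = data["result"]
names = [doc["fullname"] for doc in result["docs"]]

print(result["numFound"], names)
```

Multi-valued fields such as `tag` arrive as lists, with no extra parsing needed.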

Page 50: Using Apache Solr

Query examples

Page 51: Using Apache Solr

• David Pizzarro

• Equiv: David OR Pizzarro

• Default operator is “OR” (configurable)

• Result: David Villa, David Pizzarro, Claudio Pizzarro, David Seaman

Page 52: Using Apache Solr

• +David +tag:Roma

• Equiv: David AND tag:Roma

• Result: David Pizzarro

Page 53: Using Apache Solr

• +David +position:(Striker OR Midfielder)

• Result: David Villa, David Pizzarro
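
One practical detail when sending these queries over HTTP: a literal `+` in a URL is read as a space, so the required-term operator has to be percent-encoded. The sketch below encodes the three example queries with `urllib.parse.urlencode`. The `q.op` parameter in the last line is an assumption about how to override the default operator per request; verify it against your Solr version.

```python
from urllib.parse import urlencode

queries = [
    "David Pizzarro",
    "+David +tag:Roma",
    "+David +position:(Striker OR Midfielder)",
]

# urlencode writes "+" as %2B and a space as "+",
# so the required-term operator survives the trip through the URL.
encoded = [urlencode({"q": q}) for q in queries]
for line in encoded:
    print(line)

# Assumed per-request override of the default operator (OR -> AND).
and_query = urlencode({"q": "David Pizzarro", "q.op": "AND"})
print(and_query)
```

Pasting `+David` unencoded into a URL would silently turn the query into ` David`, which changes its meaning from required to optional.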

Page 54: Using Apache Solr

Updating

Page 55: Using Apache Solr

Post new document to
http://<host>/solr/update

Page 56: Using Apache Solr

Deleting

Page 57: Using Apache Solr

<delete> <id>345</id> </delete>

Page 58: Using Apache Solr

<delete><query>tag:Brazil</query></delete>

Page 59: Using Apache Solr

<delete> <query>*:*</query> </delete>
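
The three delete messages above differ only in their payload, so they are easy to generate. This is a small sketch with hypothetical helper names. The values are XML-escaped so that a query containing `<` or `&` still produces a well-formed message.

```python
from xml.sax.saxutils import escape

def delete_by_id(doc_id):
    """Build a message that deletes the document with the given id."""
    return "<delete><id>%s</id></delete>" % escape(str(doc_id))

def delete_by_query(query):
    """Build a message that deletes every document matching the query."""
    return "<delete><query>%s</query></delete>" % escape(query)

print(delete_by_id(345))
print(delete_by_query("tag:Brazil"))
print(delete_by_query("*:*"))  # matches all documents, so this empties the index
```

Each message is posted to the same update URL as new documents and takes effect after a commit.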

Page 60: Using Apache Solr

Thai support

Page 61: Using Apache Solr

fwdder.com

Page 62: Using Apache Solr

Sharing forward mails

Page 63: Using Apache Solr
Page 64: Using Apache Solr
Page 65: Using Apache Solr

Use a customized field type in schema.xml

Page 66: Using Apache Solr

<fieldType name="html_th" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.HTMLStripStandardTokenizerFactory"/>
    <filter class="solr.ThaiWordFilterFactory" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

Page 67: Using Apache Solr

<field name="id" type="string" indexed="true" stored="true" />
<field name="title" type="html_th" indexed="true" stored="true" />
<field name="detail" type="html_th" indexed="true" stored="true" />
<field name="tag" type="stringi" indexed="true" stored="true" multiValued="true" />
<field name="userid" type="integer" indexed="false" stored="true" />

Page 68: Using Apache Solr

Index analyzer

Page 69: Using Apache Solr

Debugging

Page 70: Using Apache Solr

&debugQuery=on

Page 72: Using Apache Solr

Q & A