Small wins in a small time with Apache Solr = Upayavira =

Dev8d Apache Solr Tutorial

Page 1: Dev8d Apache Solr Tutorial

Small wins in a small time with Apache Solr

= Upayavira =

Page 2: Dev8d Apache Solr Tutorial

Who am I?

My (Buddhist) name is Upayavira

Consultant with Sourcesense, specialising in search and operational technologies

A member of the Apache Software Foundation

Page 3: Dev8d Apache Solr Tutorial

Who are Sourcesense?

Open Source integrator, specialising in:

Search

Business Intelligence

Content Management

Application Lifecycle Management

Offices in London, Amsterdam, Milan and Rome

Page 4: Dev8d Apache Solr Tutorial

Committers and Contributors

Search:

Lucene/Solr – contributor

Hibernate Search – committer

Lucene Infinispan integration – lead developer

Apache UIMA – committer

CMS:

Apache Chemistry – contributor

Apache Jackrabbit – contributor

JBoss GateIn Portal – committer

OpenSSO-Alfresco - contributor

Page 5: Dev8d Apache Solr Tutorial

What is Lucene?

Lucene is a Java information retrieval library

Provides free text search facilities

Started in 2000, by Doug Cutting

A project of the Apache Software Foundation

It is designed to be embedded in Java apps

Page 6: Dev8d Apache Solr Tutorial

What is Solr?

Solr is an enterprise search server based on Lucene

Wraps Lucene with a RESTful web interface

Provides configurable schema

Provides replication functionality

Page 7: Dev8d Apache Solr Tutorial

Solr Design

Solr instance

UpdateRequestHandler

SearchHandler

User queries

Lucene index

content application

Page 8: Dev8d Apache Solr Tutorial

Prerequisites

Java, preferably Java 6

Latest Apache Solr, currently 3.3

http://www.sourcesense.com/dev8d-solr.zip

Page 9: Dev8d Apache Solr Tutorial

Prerequisites

Extract your Solr distribution

At a command prompt:

cd into the unzipped distribution directory

cd into the example directory

Enter: java -jar start.jar

Visit http://localhost:8983/solr/ in a browser. If you see a welcome message, your Solr works

Unpack your dev8d-solr.zip file

At another command prompt, cd into your dev8d-solr directory

Page 10: Dev8d Apache Solr Tutorial

Checking Solr Works

Visit http://localhost:8983/solr/admin/

You should see the Solr admin page.

Click the statistics link

You'll see NumDocs: 0

There's nothing in the index, so searches won't show much

So we need to index some sample content

Page 11: Dev8d Apache Solr Tutorial

Indexing Sample Content

In your dev8d-solr directory (extracted from the zip), at a command prompt:

java -jar post.jar wikipedia-basic.xml
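post.jar sends files written in Solr's XML update format: an <add> element wrapping one <doc> per document, each holding <field name="..."> elements. The sketch below generates such a message in Python. The helper name solr_add_xml and the sample field names are illustrative assumptions; the actual fields in wikipedia-basic.xml are not shown in these slides.

```python
import xml.etree.ElementTree as ET


def solr_add_xml(docs):
    """Serialise a list of dicts as a Solr <add> update message.

    A list or tuple value becomes several <field> elements with the same
    name, which is how multi-valued fields are sent.
    """
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            values = value if isinstance(value, (list, tuple)) else [value]
            for item in values:
                field = ET.SubElement(doc_el, "field", name=name)
                field.text = str(item)  # ElementTree escapes &, < and >
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical document, not taken from the tutorial's sample files.
    print(solr_add_xml([{"id": "1", "title": "Computer"}]))
```

Saved to a file, the output could be posted with java -jar post.jar in the same way as the sample content.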

Page 12: Dev8d Apache Solr Tutorial

Searching

http://localhost:8983/solr/select?q=*:*

Page 13: Dev8d Apache Solr Tutorial

Searching

http://localhost:8983/solr/select?q=computers

Page 14: Dev8d Apache Solr Tutorial

Searching

http://localhost:8983/solr/select?q=computer systems
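A browser quietly encodes the space in this query, but a script calling Solr has to do it itself. A minimal sketch using only the Python standard library follows; select_url is a hypothetical helper name, and the base URL is the example instance from these slides.

```python
from urllib.parse import urlencode

# Base URL of the example Solr instance used throughout these slides.
SOLR_SELECT = "http://localhost:8983/solr/select"


def select_url(query, **params):
    """Build a select URL, percent-encoding the query and any extra parameters."""
    pairs = [("q", query)] + sorted(params.items())
    return SOLR_SELECT + "?" + urlencode(pairs)


if __name__ == "__main__":
    print(select_url("computer systems"))
    print(select_url('"computer systems"~10'))
    print(select_url("computers", fl="title"))
```

The same helper covers the phrase and proximity queries on the following slides, since the quotes are encoded along with the space.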

Page 15: Dev8d Apache Solr Tutorial

Searching

http://localhost:8983/solr/select?q=computers OR systems

Page 16: Dev8d Apache Solr Tutorial

Searching

http://localhost:8983/solr/select?q=computers AND systems

Page 17: Dev8d Apache Solr Tutorial

Searching

http://localhost:8983/solr/select?q="computer systems"

Page 18: Dev8d Apache Solr Tutorial

Searching

http://localhost:8983/solr/select?q="computer systems"~10

Page 19: Dev8d Apache Solr Tutorial

Searching

http://localhost:8983/solr/select?q=computers NOT data

Page 20: Dev8d Apache Solr Tutorial

Searching

http://localhost:8983/solr/select?q=computers -data

Page 21: Dev8d Apache Solr Tutorial

Searching

http://localhost:8983/solr/select/?q=computers&fl=title

Page 22: Dev8d Apache Solr Tutorial

Searching

http://localhost:8983/solr/select/?q=computers&fq=author:yobot

Page 23: Dev8d Apache Solr Tutorial

Searching

http://localhost:8983/solr/select/?q=computers&fq=author:yobot&fl=title,author

Page 24: Dev8d Apache Solr Tutorial

Searching

http://localhost:8983/solr/select/?q=computers&rows=10&start=10&fl=title
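Here start is a zero-based offset into the result list, so rows=10&start=10 asks for the second page of ten results. A small sketch of that arithmetic, with page_params as a hypothetical helper name:

```python
def page_params(page, per_page=10):
    """Translate a 1-based page number into Solr's rows/start parameters."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive")
    # start counts documents to skip, so page 1 starts at offset 0.
    return {"rows": per_page, "start": (page - 1) * per_page}


if __name__ == "__main__":
    print(page_params(2))
```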

Page 25: Dev8d Apache Solr Tutorial

Searching

http://localhost:8983/solr/select/?q=title:system&fl=title

Page 26: Dev8d Apache Solr Tutorial

Searching

http://localhost:8983/solr/select/?q=computers&fl=title,author&sort=author+desc

Page 27: Dev8d Apache Solr Tutorial

Advanced Searching

http://localhost:8983/solr/select/?q=computers&facet=true&facet.field=author

Page 28: Dev8d Apache Solr Tutorial

Advanced Searching

http://localhost:8983/solr/select/?q=computers&facet=true&facet.field=author

Page 29: Dev8d Apache Solr Tutorial

Advanced Searching

http://localhost:8983/solr/select/?q=computers&facet=true&facet.field=author&rows=0&facet.sort=lex

Page 30: Dev8d Apache Solr Tutorial

Advanced Searching

http://localhost:8983/solr/select/?q=computers&facet=true&facet.field=author&rows=0&facet.sort=count

Page 31: Dev8d Apache Solr Tutorial

Advanced Searching

http://localhost:8983/solr/select/?q=computers&facet=true&facet.field=author&rows=0&facet.sort=count&facet.mincount=2

Page 32: Dev8d Apache Solr Tutorial

Advanced Searching

http://localhost:8983/solr/select/?q=computers&facet=true&facet.field=author&rows=0&facet.sort=count&facet.limit=3

Page 33: Dev8d Apache Solr Tutorial

Advanced Searching

http://localhost:8983/solr/select/?q=computers&facet=true&facet.field=author&rows=0&facet.sort=count&facet.limit=3&debugQuery=true

Page 34: Dev8d Apache Solr Tutorial

Advanced Searching

http://localhost:8983/solr/select?q=computer&wt=json
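With wt=json added to one of the facet queries from the earlier slides, the facet counts are easy to consume from code. The sketch below assumes Solr's default flat JSON layout, where each facet field is a single list alternating value and count. The function name facet_counts and the sample counts are made up for illustration.

```python
import json


def facet_counts(response_text, field):
    """Return (value, count) pairs for one facet field of a JSON response."""
    data = json.loads(response_text)
    flat = data["facet_counts"]["facet_fields"][field]
    # Assumed layout: [value1, count1, value2, count2, ...]
    return list(zip(flat[0::2], flat[1::2]))


if __name__ == "__main__":
    # Hypothetical response fragment, not real output from the sample index.
    sample = '{"facet_counts": {"facet_fields": {"author": ["yobot", 4, "smackbot", 2]}}}'
    for value, count in facet_counts(sample, "author"):
        print(value, count)
```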

Page 35: Dev8d Apache Solr Tutorial

Advanced Searching

http://localhost:8983/solr/select?q=computer&wt=javabin

Page 36: Dev8d Apache Solr Tutorial

Advanced Searching

http://localhost:8983/solr/select?q=computer&hl=true&hl.fl=text

Page 37: Dev8d Apache Solr Tutorial

Advanced Searching

Look for the highlighting list after the main response

Nothing there.

Edit the 'text' field in schema.xml, changing it to stored="true"

Restart Solr so the schema change takes effect, then reindex (java -jar post.jar wikipedia-enhanced.xml)

Page 38: Dev8d Apache Solr Tutorial

Advanced Searching

http://localhost:8983/solr/select?q=computer&hl=true&hl.fl=text

You should now see highlighted content

Page 39: Dev8d Apache Solr Tutorial

Advanced Searching

http://localhost:8983/solr/select?q=computer&hl=true&hl.fl=text&hl.simple.pre=<b>&hl.simple.post=</b>
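Highlighted snippets come back in a separate highlighting section keyed by document id, not inside the documents themselves, so a client has to join the two. A minimal sketch over an already-parsed JSON response (wt=json) follows. It assumes the unique key field is called id, and attach_snippets is a hypothetical helper name.

```python
def attach_snippets(data, field="text", key="id"):
    """Pair each returned document id with its highlight snippets, if any."""
    highlighting = data.get("highlighting", {})
    results = []
    for doc in data["response"]["docs"]:
        # Documents with no matching snippet get an empty list.
        snippets = highlighting.get(str(doc[key]), {}).get(field, [])
        results.append((doc[key], snippets))
    return results


if __name__ == "__main__":
    # Hypothetical parsed response, shaped like Solr's JSON output.
    sample = {
        "response": {"docs": [{"id": "12"}, {"id": "34"}]},
        "highlighting": {"12": {"text": ["a <b>computer</b> is"]}, "34": {}},
    }
    print(attach_snippets(sample))
```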

Page 40: Dev8d Apache Solr Tutorial

Indexing

Page 41: Dev8d Apache Solr Tutorial

Indexing

Load wikipedia-basic.xml into a text editor or web browser

Load wikipedia-enhanced.xml into a text editor or browser

Load example/solr/conf/schema.xml into a text editor

Page 42: Dev8d Apache Solr Tutorial

Indexing

schema.xml defines field types and fields used in Solr

Equivalent to your database schema in an RDBMS

Page 43: Dev8d Apache Solr Tutorial

Indexing

Change this field in schema.xml to be of type "string" and add multiValued="true":

<field name="category" type="string" indexed="true" stored="true" multiValued="true"/>

Page 44: Dev8d Apache Solr Tutorial

Indexing

Now add this to the <fields> section of schema.xml:

<field name="source" type="string" indexed="true" stored="true" multiValued="false"/>

<field name="text_general" type="text_general" indexed="true" stored="true" multiValued="true"/>

Now search for the “text_general” field type definition, further up in the file.

Page 45: Dev8d Apache Solr Tutorial

Indexing

At the bottom of schema.xml, inside the <schema> element, add the following:

<copyField source="text" dest="text_general"/>

Page 46: Dev8d Apache Solr Tutorial

Indexing

In the window where Solr is running, press CTRL+C to stop Solr, and then restart it with:

java -jar start.jar

Page 47: Dev8d Apache Solr Tutorial

Indexing

At your command prompt, in the dev8d-solr directory, execute:

java -jar post.jar wikipedia-enhanced.xml

Page 48: Dev8d Apache Solr Tutorial

More Advanced Searching

http://localhost:8983/solr/select?q=computer%20AND%20babbage&facet=true&facet.field=category&facet.mincount=1

Page 49: Dev8d Apache Solr Tutorial

More Advanced Searching

http://localhost:8983/solr/terms?terms.fl=text&terms=true&terms.limit=20

Page 50: Dev8d Apache Solr Tutorial

More Advanced Searching

http://localhost:8983/solr/terms?terms.fl=text_general&terms=true&terms.limit=20

Page 51: Dev8d Apache Solr Tutorial

More Advanced Searching

http://localhost:8983/solr/terms?terms.fl=text_general&terms=true&terms.limit=20&terms.prefix=at

Page 52: Dev8d Apache Solr Tutorial

Indexing

Index segmentation: merge factor

Index optimisation: <optimize/>
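The <optimize/> message is sent to the same update handler that post.jar uses. As a sketch, the request can be built in Python like this. optimize_request is a hypothetical helper, and the URL assumes the example instance from these slides.

```python
from urllib.request import Request

# Update handler of the example Solr instance used in these slides.
UPDATE_URL = "http://localhost:8983/solr/update"


def optimize_request(url=UPDATE_URL):
    """Build (but do not send) a POST request carrying an <optimize/> message."""
    return Request(
        url,
        data=b"<optimize/>",  # supplying a body makes urllib use POST
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )


if __name__ == "__main__":
    req = optimize_request()
    print(req.get_method(), req.full_url)
```

Passing the request to urllib.request.urlopen would send it to a running Solr. Optimising rewrites the index into fewer segments, so it can take a while on a large index.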

Page 53: Dev8d Apache Solr Tutorial

schema.xml

Equivalent to RDBMS schema

Seen it before!

Let's look through it in more detail...

Page 54: Dev8d Apache Solr Tutorial

solrconfig.xml

Configures the components available to a Solr system

Specific to a Solr 'core', as is schema.xml

In same directory as schema.xml

Let's look through it in more detail...

Page 55: Dev8d Apache Solr Tutorial

Hints and Tips

Page 56: Dev8d Apache Solr Tutorial

Hints and Tips: Prototyping

Velocity response writer (/browse)

Data Import Handler (DIH)

XSLTUpdateRequestHandler (Solr 3.4)

Page 57: Dev8d Apache Solr Tutorial

Hints and Tips: Architecture

A RESTful service

An index, not a data store: keep ability to re-index

Don't make Solr do things you wouldn't have MySQL do

Page 58: Dev8d Apache Solr Tutorial

Hints and Tips: Security

There is none

So use a firewall

Beware what Solr internals you expose:

Query syntax

qt= parameter (e.g. qt=update)

Fake document level security with role fields and filter queries
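One way to sketch the role-field approach: the application, never the end user, appends a filter query built from the roles of the logged-in user. The field name role and the helper role_filter are assumptions for illustration; the schema would need a matching multi-valued field on every document.

```python
import re

# Characters with special meaning in the Lucene query syntax, plus whitespace.
_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\/&|\s])')


def role_filter(roles, field="role"):
    """Build an fq value restricting results to documents visible to these roles."""
    if not roles:
        # Never fall back to an unfiltered query for a user without roles.
        raise ValueError("at least one role is required")
    escaped = [_SPECIAL.sub(r"\\\1", role) for role in roles]
    return "%s:(%s)" % (field, " OR ".join(escaped))


if __name__ == "__main__":
    print(role_filter(["staff", "admin"]))
```

The result is passed as fq alongside the user's own q, which is why the query syntax and the qt= parameter should not be exposed directly to users.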

Page 59: Dev8d Apache Solr Tutorial

Hints and Tips: Scaling

Index too large: distributed search

Too much traffic: replicated search

How much is too much: unanswerable!
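For distributed search, the instance receiving the query is given a shards parameter listing every shard as host:port/path, and it merges their results. A minimal sketch of building such a URL follows. The host names are hypothetical and distributed_select_url is an illustrative helper name.

```python
from urllib.parse import urlencode


def distributed_select_url(coordinator, shards, query):
    """Build a select URL asking one Solr instance to search across several shards.

    coordinator and each shard are given as host:port/path, without a scheme.
    """
    params = [("q", query), ("shards", ",".join(shards))]
    # Keep ':', '/' and ',' readable; Solr accepts them unencoded here.
    return "http://%s/select?%s" % (coordinator, urlencode(params, safe=":/,"))


if __name__ == "__main__":
    print(distributed_select_url(
        "coordinator:8983/solr",
        ["shard1:8983/solr", "shard2:8983/solr", "shard3:8983/solr"],
        "computers",
    ))
```

Replication addresses the other problem on this slide: identical copies of the index behind a load balancer, each queried with an ordinary select URL.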

Page 60: Dev8d Apache Solr Tutorial

Time for Questions

And your questions are...

Page 61: Dev8d Apache Solr Tutorial

Thank you

[email protected]

Page 62: Dev8d Apache Solr Tutorial

Solr Host Configuration

[Diagram labels: searches; shard 1, shard 2, shard 3]

Page 63: Dev8d Apache Solr Tutorial

Solr Host Configuration

[Diagram labels: co-ordinator; shard 1, shard 2, shard 3]

Page 64: Dev8d Apache Solr Tutorial

Solr Host Configuration

[Diagram labels: load balancer; co-ordinator; shard 1, shard 2, shard 3]

Page 65: Dev8d Apache Solr Tutorial

Solr Host Configuration

[Diagram labels: load balancer; two sets of shard 1, shard 2, shard 3, each set with its own co-ordinator]

Page 66: Dev8d Apache Solr Tutorial

Solr Host Configuration

[Diagram labels: load balancer; two sets of shard 1, shard 2, shard 3, each set with its own co-ordinator]