
Solr introduction

A basic overview about Solr

Page 1: Solr introduction

SOLR Introduction  

Lucene / SOLR

Page 2: Solr introduction

SOLR Introduction  

• Why do we need a Search Engine?
• What is Lucene/SOLR?
• Advantages of SOLR
• SOLR Architecture
• Query Syntax
• Working with SOLR: feed data, query data
• SOLR installation
• SOLR configuration

Page 3: Solr introduction

Why do we need a Search Engine?

• Google, Bing, Yahoo, …: cannot access our own data
• Database: yes, that's the normal way, but the problem is response time

So we need a search engine of our own: Lucene / SOLR

Page 4: Solr introduction

What is Lucene/SOLR?

Lucene

Apache Lucene is a free/open source information retrieval software library.

Lucene is just an indexing and search library.

Lucene is written in Java and has been ported to other languages, including Delphi, Perl, C#, C++, Python, Ruby, and PHP.
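To make "indexing and search library" concrete, here is a toy inverted index in plain Java: it maps each term to the set of document ids that contain it, then answers single-term and AND queries by looking up and intersecting those sets. This is only a sketch of the core idea, assuming a naive tokenizer that splits on anything that is not a letter or digit. The class ToyInvertedIndex is hypothetical and is not Lucene's API; Lucene adds analyzers, relevance scoring and an on-disk segment format on top of this idea.

```java
import java.util.*;

// Toy inverted index (hypothetical, not Lucene's API): term -> sorted set of document ids.
class ToyInvertedIndex {
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    // Naive tokenizer: lowercase, split on non-letter/non-digit characters, record each term.
    void add(int docId, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (token.isEmpty()) continue;
            postings.computeIfAbsent(token, t -> new TreeSet<>()).add(docId);
        }
    }

    // Documents containing one term (returns a copy so callers cannot modify the index).
    SortedSet<Integer> search(String term) {
        SortedSet<Integer> docs = postings.get(term.toLowerCase(Locale.ROOT));
        return docs == null ? new TreeSet<Integer>() : new TreeSet<Integer>(docs);
    }

    // Documents containing all terms: intersect the posting lists.
    SortedSet<Integer> searchAll(String... terms) {
        SortedSet<Integer> result = null;
        for (String term : terms) {
            SortedSet<Integer> docs = search(term);
            if (result == null) result = docs; else result.retainAll(docs);
        }
        return result == null ? new TreeSet<Integer>() : result;
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.add(1, "Solr is built on Lucene");
        index.add(2, "Lucene is a search library");
        index.add(3, "Tomcat is a servlet container");
        System.out.println(index.search("lucene"));              // [1, 2]
        System.out.println(index.searchAll("lucene", "search")); // [2]
    }
}
```

Lookups cost one map access per term instead of a scan over every document, which is the response-time advantage over searching a database column with LIKE.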

Page 5: Solr introduction

What is Lucene/SOLR?

Solr

Solr is a search server written in Java that wraps Lucene.

Solr is a web application (WAR) that can be deployed in any servlet container, e.g. Jetty or Tomcat.

Solr is used as a REST-like service: clients talk to it over HTTP (XML/JSON).
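Because Solr is reached over HTTP, any client can query it by sending a GET request to a request handler such as /select. The sketch below only builds such a URL with the JDK's URLEncoder, so it runs without a Solr server. The base URL http://localhost:8080/solr and the core name collection1 are taken from the installation slides of this deck; the class SolrSelectUrl and its selectUrl method are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds a Solr /select URL from query parameters.
class SolrSelectUrl {
    static String selectUrl(String baseUrl, String core, Map<String, String> params) {
        StringBuilder url = new StringBuilder(baseUrl).append('/').append(core).append("/select");
        char sep = '?';
        for (Map.Entry<String, String> p : params.entrySet()) {
            url.append(sep).append(encode(p.getKey())).append('=').append(encode(p.getValue()));
            sep = '&';
        }
        return url.toString();
    }

    // URLEncoder uses form encoding: ':' -> %3A, '"' -> %22, space -> '+'.
    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "title:\"foo bar\"");
        params.put("wt", "json");
        // prints http://localhost:8080/solr/collection1/select?q=title%3A%22foo+bar%22&wt=json
        System.out.println(selectUrl("http://localhost:8080/solr", "collection1", params));
    }
}
```

The resulting URL can be opened in a browser or fetched with any HTTP client; that is all "Solr is a REST service" requires of the caller.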

Page 6: Solr introduction

SOLR Introduction

Advantages of SOLR

• Open source/free
• Administration interface
• Rich document parsing and indexing (PDF, Word, HTML, etc.)
• Full-text search
• Faceted search and filtering
• Multi-server support

A comparison of search engines: http://blog.sematext.com/2012/08/23/solr-vs-elasticsearch-part-1-overview/

Page 7: Solr introduction

SOLR architecture  

Page 8: Solr introduction

SOLR Shard  

Page 9: Solr introduction

Query Syntax

Keyword matching

title:foo - Search for the word "foo" in the title field.

title:"foo bar" - Search for the phrase "foo bar" in the title field.

-title:bar - Match all documents except those with "bar" in the title field.
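When the keyword comes from user input, characters such as :, " or - have query-syntax meaning and should be escaped before building a query like title:foo. SolrJ provides ClientUtils.escapeQueryChars for this; the JDK-only sketch below is a hypothetical stand-in that puts a backslash before each special character and whitespace. The exact character set is an assumption modeled on the Lucene query syntax, and the class QueryEscaper is hypothetical.

```java
// Hypothetical stand-in for SolrJ's ClientUtils.escapeQueryChars (JDK only).
class QueryEscaper {
    // Assumed set of query-syntax characters, modeled on the Lucene query syntax.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&;/";

    // Backslash-escape special characters and whitespace so they are matched literally.
    static String escape(String input) {
        StringBuilder sb = new StringBuilder();
        for (char c : input.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0 || Character.isWhitespace(c)) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Builds a field:term keyword query from raw user input.
    static String fieldQuery(String field, String userInput) {
        return field + ":" + escape(userInput);
    }

    public static void main(String[] args) {
        System.out.println(fieldQuery("title", "C++ (2nd ed)")); // title:C\+\+\ \(2nd\ ed\)
    }
}
```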

Page 10: Solr introduction

Query Syntax

Wildcard matching

title:foo* - Search for any word that starts with "foo" in the title field.

title:foo*bar - Search for any word that starts with "foo" and ends with "bar" in the title field.

*:* - Match every document.
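Wildcard queries are matched against individual indexed terms, not whole field values. The sketch below mimics that by translating a pattern such as foo*bar into a regular expression and filtering a list of terms, where * stands for any run of characters (possibly empty) and ? for exactly one character, as in the Lucene syntax. The class WildcardDemo is hypothetical; real Solr applies the pattern to the terms in the index after analysis.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical demo of Lucene-style wildcards: '*' = any run of chars, '?' = exactly one char.
class WildcardDemo {
    static Pattern toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*' || c == '?') {
                // quote the literal part collected so far, then emit the wildcard
                if (literal.length() > 0) {
                    regex.append(Pattern.quote(literal.toString()));
                    literal.setLength(0);
                }
                regex.append(c == '*' ? ".*" : ".");
            } else {
                literal.append(c);
            }
        }
        if (literal.length() > 0) regex.append(Pattern.quote(literal.toString()));
        return Pattern.compile(regex.toString());
    }

    // Returns the terms that the whole pattern matches.
    static List<String> matchingTerms(String wildcard, List<String> terms) {
        Pattern p = toRegex(wildcard);
        List<String> out = new ArrayList<>();
        for (String t : terms) {
            if (p.matcher(t).matches()) out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("foo", "football", "foobar", "foo_bar", "barfoo");
        System.out.println(matchingTerms("foo*", terms));    // [foo, football, foobar, foo_bar]
        System.out.println(matchingTerms("foo*bar", terms)); // [foobar, foo_bar]
    }
}
```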

Page 11: Solr introduction

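Query Syntax

Proximity matching

"foo bar"~number

number = 0: exact phrase match ("foo bar" only); this is the default

number = 1: one other word may sit between the terms, e.g. "foo big bar"

number = 2 or more: also matches the reversed order "bar foo", because swapping two words costs a slop of 2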
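For a two-word phrase the required slop can be worked out from term positions: the query expects "bar" exactly one position after "foo", so with positions p_foo and p_bar the slop needed is |(p_bar - p_foo) - 1|, minimized over all occurrences. The sketch applies that rule to whitespace-tokenized text. It is a simplification for two terms only (the class SlopDemo is hypothetical and is not Lucene's implementation, which handles longer phrases and repeated terms).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical two-term slop calculator for the phrase "first second" (terms given in lowercase).
class SlopDemo {
    // Minimum slop needed to match; -1 if either term is missing.
    static int requiredSlop(String text, String first, String second) {
        String[] tokens = text.toLowerCase().split("\\s+");
        List<Integer> firstPos = new ArrayList<>();
        List<Integer> secondPos = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            if (tokens[i].equals(first)) firstPos.add(i);
            if (tokens[i].equals(second)) secondPos.add(i);
        }
        int best = -1;
        for (int a : firstPos) {
            for (int b : secondPos) {
                // distance from the expected layout: 'second' one position after 'first'
                int slop = Math.abs((b - a) - 1);
                if (best < 0 || slop < best) best = slop;
            }
        }
        return best;
    }

    // "first second"~slop matches when the required slop does not exceed the given slop.
    static boolean matches(String text, String first, String second, int slop) {
        int needed = requiredSlop(text, first, second);
        return needed >= 0 && needed <= slop;
    }

    public static void main(String[] args) {
        System.out.println(requiredSlop("foo bar", "foo", "bar"));     // 0
        System.out.println(requiredSlop("foo big bar", "foo", "bar")); // 1
        System.out.println(requiredSlop("bar foo", "foo", "bar"));     // 2
    }
}
```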

Page 12: Solr introduction

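Query Syntax

Range searches

field:[a TO z] - Match documents whose field value lies between a and z (string values compare lexicographically)

field:[* TO 100] - Match all values less than or equal to 100

field:[100 TO *] - Match all values greater than or equal to 100

field:[* TO *] - Match all documents that have any value in the field

Square brackets include the endpoints; curly braces, e.g. field:{a TO z}, exclude them.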
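An inclusive range query such as field:[lo TO hi] behaves like a comparison with optional bounds, where * leaves that side open. The sketch below models * as null and covers square-bracket (inclusive) ranges only. It uses Java's natural ordering, so strings compare lexicographically: "zebra" falls outside [a TO z] because it sorts after "z". The class RangeDemo is hypothetical, and Solr's actual ordering depends on the field type.

```java
// Hypothetical inclusive range check mirroring field:[lo TO hi]; a null bound plays the role of '*'.
class RangeDemo {
    static <T extends Comparable<T>> boolean inRange(T value, T lo, T hi) {
        if (value == null) return false; // document has no value in this field
        if (lo != null && value.compareTo(lo) < 0) return false;
        if (hi != null && value.compareTo(hi) > 0) return false;
        return true;
    }

    public static void main(String[] args) {
        System.out.println(inRange(100, null, 100));    // true  ([* TO 100])
        System.out.println(inRange(99, 100, null));     // false ([100 TO *])
        System.out.println(inRange("apple", "a", "z")); // true  ([a TO z])
        System.out.println(inRange("zebra", "a", "z")); // false (sorts after "z")
        System.out.println(inRange("x", null, null));   // true  ([* TO *])
    }
}
```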

Page 13: Solr introduction

Query Syntax

Nested query

_query_:"field:*lap" OR _query_:"field:*tran"

_query_:"{!dismax qf=somefield}cat dog"
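Writing nested queries by hand gets error-prone once a sub-query contains its own double quotes, because they sit inside the outer quotes of _query_:"…" and must be backslash-escaped. A small sketch that builds the nested form; the class NestedQueryDemo and its methods are hypothetical, and the escaping rule (a backslash before \ and ") is the usual Lucene-syntax escaping inside a quoted string.

```java
// Hypothetical builder for Solr's _query_:"..." nested-query syntax.
class NestedQueryDemo {
    // Wrap one sub-query, escaping backslashes first, then double quotes.
    static String nested(String subQuery) {
        return "_query_:\"" + subQuery.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Combine several nested sub-queries with OR.
    static String or(String... subQueries) {
        StringBuilder sb = new StringBuilder();
        for (String q : subQueries) {
            if (sb.length() > 0) sb.append(" OR ");
            sb.append(nested(q));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(or("field:*lap", "field:*tran"));
        // _query_:"field:*lap" OR _query_:"field:*tran"
        System.out.println(nested("{!dismax qf=somefield}\"cat dog\""));
        // _query_:"{!dismax qf=somefield}\"cat dog\""
    }
}
```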

Page 14: Solr introduction

Query Syntax

Join

{!join from=inner-id to=outer-id}zzz:vvv

Equivalent SQL:

SELECT xxx, yyy FROM collection1 WHERE outer-id IN (SELECT inner-id FROM collection1 WHERE zzz = 'vvv')
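The SQL form can be mirrored in memory to show what the join computes: first run the inner query zzz:vvv and collect the inner-id values of the matching documents, then return the documents whose outer-id is one of those values. This sketches the semantics only. It assumes single-valued fields and represents documents as plain maps; the class JoinDemo is hypothetical, and Solr evaluates the join inside the index rather than like this.

```java
import java.util.*;

// Hypothetical in-memory illustration of {!join from=inner-id to=outer-id}zzz:vvv.
class JoinDemo {
    static List<Map<String, String>> join(List<Map<String, String>> docs, String from, String to,
                                          String queryField, String queryValue) {
        // Step 1: inner query (queryField:queryValue) -> collect the 'from' values.
        Set<String> fromValues = new HashSet<>();
        for (Map<String, String> d : docs) {
            if (queryValue.equals(d.get(queryField)) && d.get(from) != null) {
                fromValues.add(d.get(from));
            }
        }
        // Step 2: keep documents whose 'to' value is one of the collected values.
        List<Map<String, String>> result = new ArrayList<>();
        for (Map<String, String> d : docs) {
            if (d.get(to) != null && fromValues.contains(d.get(to))) result.add(d);
        }
        return result;
    }

    // Builds a document from alternating field names and values.
    static Map<String, String> doc(String... kv) {
        Map<String, String> m = new LinkedHashMap<>();
        for (int i = 0; i + 1 < kv.length; i += 2) m.put(kv[i], kv[i + 1]);
        return m;
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = Arrays.asList(
                doc("id", "1", "inner-id", "A", "zzz", "vvv"),
                doc("id", "2", "outer-id", "A"),
                doc("id", "3", "outer-id", "B"));
        for (Map<String, String> d : join(docs, "inner-id", "outer-id", "zzz", "vvv")) {
            System.out.println(d.get("id")); // prints 2
        }
    }
}
```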

Page 15: Solr introduction

Query Syntax

Faceted Search

q=inStock:true&facet=true&facet.field=cat&facet.limit=5

<response>
  <responseHeader>
    <status>0</status>
    <QTime>4</QTime>
  </responseHeader>
  <result numFound="12" start="0"/>
  <lst name="facet_counts">
    <lst name="facet_queries"/>
    <lst name="facet_fields">
      <lst name="cat">
        <int name="electronics">10</int>
        <int name="memory">3</int>
        <int name="drive">2</int>
        <int name="hard">2</int>
        <int name="monitor">2</int>
      </lst>
    </lst>
  </lst>
</response>
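Faceting boils down to counting how many matching documents carry each value of the facet field and keeping the top facet.limit values. The sketch does that in memory for a multi-valued cat field, counting each value at most once per document. The ordering (count descending, ties alphabetical) is an assumption that appears to agree with the drive, hard, monitor order in the response above; the class FacetDemo and the sample documents are hypothetical.

```java
import java.util.*;

// Hypothetical in-memory sketch of facet.field + facet.limit over the matching documents.
class FacetDemo {
    static List<Map.Entry<String, Integer>> facet(List<List<String>> valuesPerDoc, int limit) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> values : valuesPerDoc) {
            // count each value at most once per document
            for (String v : new HashSet<>(values)) counts.merge(v, 1, Integer::sum);
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue()); // higher counts first
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey()); // ties: alphabetical
        });
        return entries.subList(0, Math.min(limit, entries.size()));
    }

    public static void main(String[] args) {
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("electronics", "memory"),
                Arrays.asList("electronics", "monitor"),
                Arrays.asList("electronics"),
                Arrays.asList("drive", "hard"));
        System.out.println(facet(docs, 3)); // [electronics=3, drive=1, hard=1]
    }
}
```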

Page 16: Solr introduction

SolrJ  

Feed data

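import java.util.ArrayList;
import java.util.Collection;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

// make a connection to the Solr server
SolrServer server = new HttpSolrServer("http://localhost:8080/solr/");
// prepare a doc
final SolrInputDocument doc1 = new SolrInputDocument();
doc1.addField("id", "1");
doc1.addField("firstName", "First Name");
doc1.addField("lastName", "Last Name");
final Collection<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
docs.add(doc1);
// add docs to Solr, then commit so they become searchable
// (add() and commit() throw SolrServerException and IOException)
server.add(docs);
server.commit();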

Page 17: Solr introduction

SolrJ  

Query data

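import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;
SolrServer server = new HttpSolrServer("http://localhost:8080/solr/");
// match all documents, sorted by firstName ascending
final SolrQuery query = new SolrQuery();
query.setQuery("*:*");
query.addSortField("firstName", SolrQuery.ORDER.asc);
final QueryResponse rsp = server.query(query);
final SolrDocumentList solrDocumentList = rsp.getResults();
for (final SolrDocument doc : solrDocumentList) {
    final String firstName = (String) doc.getFieldValue("firstName");
    final String id = (String) doc.getFieldValue("id");
    System.out.println(id + ": " + firstName);
}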

Page 18: Solr introduction

SOLR Introduction  

SOLR installation

Ref:
http://wiki.apache.org/solr/SolrInstall
http://wiki.apache.org/solr/SolrTomcat
http://lucene.apache.org/solr/4_2_1/tutorial.html

Prerequisites:
• Tomcat 7: http://tomcat.apache.org/
• JDK 1.6
• SOLR 4.2.1: http://lucene.apache.org/solr/

Page 19: Solr introduction

SOLR Introduction  

• Extract solr-4.2.1.zip to D:\Project\solr_web\solr-4.2.1
• Copy D:\Project\solr_web\solr-4.2.1\example\solr to D:\Project\solr_web\solr; this folder is SOLR_HOME
• Copy D:\Project\solr_web\solr-4.2.1\dist\solr-4.2.1.war to SOLR_HOME and rename it to solr.war
• Open SOLR_HOME\collection1\conf\solrconfig.xml and modify the <dataDir>:
<dataDir>${solr.data.dir:D:/Project/solr_web/solr/collection1/data}</dataDir>
• Create a Tomcat context file (solr.xml) like this:
<?xml version="1.0" encoding="utf-8"?>
<Context docBase="D:/Project/solr_web/solr/solr.war" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String" value="D:/Project/solr_web/solr" override="true"/>
</Context>
• Copy this file (solr.xml) to tomcat.7.0.35\conf\Catalina\localhost; its file name gives the web application the context path /solr (this Tomcat context file is not the solr.xml that already sits in SOLR_HOME)
• Start Tomcat
• Open the SOLR dashboard at http://localhost:8080/solr/#/

Page 20: Solr introduction

SOLR Introduction  

SOLR Configuration

Ref:
http://wiki.apache.org/solr/SolrConfigXml
http://wiki.apache.org/solr/SchemaXml

A Solr core needs at least two XML configuration files: solrconfig.xml and schema.xml.

solrconfig.xml: the general configuration of a core, e.g. memory-related settings (index buffer, caches), the data directory, update handling and the transaction log, …

schema.xml: the definition of the data: field names, field types, and how each field is indexed and stored, …

Page 21: Solr introduction

SOLR Introduction  

SOLR Configuration

Schema.xml

field: declares a named field; indexed="true" makes it searchable and stored="true" makes its value retrievable in results
<field name="firstName" type="string" indexed="true" stored="true"/>

dynamicField: like a field, but its name is a pattern with a leading or trailing * instead of a fixed name
<dynamicField name="*_i" type="int" indexed="true" stored="true"/>

name="*_i" matches any field name ending in _i (e.g. myid_i, z_i)
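How a field name finds its definition can be sketched as a lookup: an explicitly declared field wins; otherwise the name is tested against the dynamic-field patterns (a * only at the start or end), longer patterns first, with schema order breaking ties. That precedence follows the comments in the example schema.xml shipped with Solr 4.x; the class DynamicFieldDemo, its map-based "schema", and the *_price_i/tint entry are hypothetical.

```java
import java.util.*;

// Hypothetical resolver: explicit <field> first, then <dynamicField> patterns.
class DynamicFieldDemo {
    private final Map<String, String> fields = new HashMap<>();              // name -> type
    private final Map<String, String> dynamicFields = new LinkedHashMap<>(); // pattern -> type, schema order

    void field(String name, String type) { fields.put(name, type); }
    void dynamicField(String pattern, String type) { dynamicFields.put(pattern, type); }

    // A pattern has '*' only at the start ("*_i") or at the end ("attr_*").
    static boolean matches(String pattern, String name) {
        if (pattern.startsWith("*")) return name.endsWith(pattern.substring(1));
        if (pattern.endsWith("*")) return name.startsWith(pattern.substring(0, pattern.length() - 1));
        return pattern.equals(name);
    }

    // Returns the field type, or null when the schema does not cover the name.
    String typeOf(String name) {
        if (fields.containsKey(name)) return fields.get(name);
        List<String> patterns = new ArrayList<>(dynamicFields.keySet());
        // longer patterns first; List.sort is stable, so equal lengths keep schema order
        patterns.sort((a, b) -> Integer.compare(b.length(), a.length()));
        for (String p : patterns) {
            if (matches(p, name)) return dynamicFields.get(p);
        }
        return null;
    }

    public static void main(String[] args) {
        DynamicFieldDemo schema = new DynamicFieldDemo();
        schema.field("firstName", "string");
        schema.dynamicField("*_i", "int");
        schema.dynamicField("*_price_i", "tint");
        System.out.println(schema.typeOf("myid_i"));       // int
        System.out.println(schema.typeOf("book_price_i")); // tint (longer pattern wins)
        System.out.println(schema.typeOf("firstName"));    // string
    }
}
```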