SOLR Introduction
Lucene / SOLR
• Why do we need a search engine?
• What is Lucene/SOLR?
• Advantages of SOLR
• SOLR architecture
• Query syntax
• Working with SOLR: feed data, query data
• SOLR installation
• SOLR configuration
Why do we need a search engine?
• Database: yes, that is the normal way, but the problem is response time.
• Google, Bing, Yahoo, …: they cannot access our data.
• So we need our own search engine: Lucene / SOLR
What is Lucene/SOLR?
• Apache Lucene is a free, open source information retrieval software library.
• Lucene is just an indexing and search library.
• Lucene is written in Java and has been ported to other languages, including Delphi, Perl, C#, C++, Python, Ruby, and PHP.
Lucene
What is Lucene/SOLR?
• Solr is a search server that wraps the Lucene Java library.
• Solr is a web application (WAR) which can be deployed in any servlet container, e.g. Jetty or Tomcat.
• Solr exposes its features as a REST-like HTTP service.
Solr
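Because Solr is exposed over HTTP, a search request is essentially a URL with query parameters. The sketch below only illustrates that idea: the helper name buildSelectUrl, the host and port, and the core name collection1 are assumptions made for this example, and the /select path is the request handler of the default example configuration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

final class SolrSelectUrl {

    // Hypothetical helper: appends URL-encoded parameters to <coreUrl>/select.
    static String buildSelectUrl(String coreUrl, Map<String, String> params) {
        StringBuilder url = new StringBuilder(coreUrl).append("/select");
        char separator = '?';
        for (Map.Entry<String, String> p : params.entrySet()) {
            url.append(separator)
               .append(encode(p.getKey()))
               .append('=')
               .append(encode(p.getValue()));
            separator = '&';
        }
        return url.toString();
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<String, String>();
        params.put("q", "title:foo"); // the query itself
        params.put("wt", "xml");      // response format
        // Assumed server address and core name, for illustration only.
        System.out.println(buildSelectUrl("http://localhost:8080/solr/collection1", params));
    }
}
```

The resulting URL can be opened in a browser or fetched with any HTTP client, which is what makes Solr usable from languages other than Java.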
Advantages of SOLR
• Open source / free
• Administration interface
• Rich document parsing and indexing (PDF, Word, HTML, etc.)
• Full-text search
• Faceted search and filtering
• Multi-server support
A comparison of search engines: http://blog.sematext.com/2012/08/23/solr-vs-elasticsearch-part-1-overview/
SOLR architecture
SOLR Shard
Query Syntax
title:foo - Search for the word "foo" in the title field.
title:"foo bar" - Search for the phrase "foo bar" in the title field.
-title:bar - Match every document except those with "bar" in the title field.
Keyword matching
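When the search text comes from a user, quotes and backslashes inside a phrase have to be escaped before it is placed in the query. The sketch below shows one way to build the keyword queries above as strings; the class and method names are hypothetical and are not part of Solr or SolrJ.

```java
final class KeywordQuery {

    // Hypothetical helper: builds field:"text", escaping backslashes and
    // double quotes so the text cannot end the quoted phrase early.
    static String phrase(String field, String text) {
        String escaped = text.replace("\\", "\\\\").replace("\"", "\\\"");
        return field + ":\"" + escaped + "\"";
    }

    // Hypothetical helper: excludes documents matching the clause, as in -title:bar.
    static String not(String clause) {
        return "-" + clause;
    }

    public static void main(String[] args) {
        System.out.println(phrase("title", "foo bar"));
        System.out.println(not("title:bar"));
    }
}
```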
Query Syntax
title:foo* - Search for any word that starts with "foo" in the title field.
title:foo*bar - Search for any word that starts with "foo" and ends with "bar" in the title field.
*:* - Match all documents.
Wildcard matching
Query Syntax
"foo bar"~number
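number = 0: exact phrase match
number = 1: the terms may be one position apart, e.g. "foo baz bar"
number = 2: the terms may also appear in reverse order, e.g. "bar foo"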
Proximity matching
Query Syntax
field:[a TO z] - Match documents whose field value lies between a and z (inclusive)
field:[* TO 100] - Match all values less than or equal to 100
field:[100 TO *] - Match all values greater than or equal to 100
field:[* TO *] - Match all documents that have a value in the field
Range searches
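The four range forms above differ only in which bound is replaced by *. A minimal sketch of a string builder for them follows; the class and method names are hypothetical, and a null bound is taken to mean "open-ended".

```java
final class RangeQuery {

    // Hypothetical helper: builds an inclusive range clause.
    // A null bound is rendered as *, meaning "no limit on this side".
    static String inclusive(String field, Object lower, Object upper) {
        String lo = (lower == null) ? "*" : lower.toString();
        String hi = (upper == null) ? "*" : upper.toString();
        return field + ":[" + lo + " TO " + hi + "]";
    }

    public static void main(String[] args) {
        System.out.println(inclusive("field", "a", "z"));
        System.out.println(inclusive("field", null, 100));
        System.out.println(inclusive("field", 100, null));
        System.out.println(inclusive("field", null, null));
    }
}
```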
Query Syntax
_query_:"field:*lap" OR _query_:"field:*tran"
_query_:"{!dismax qf=somefield} cat dog"
Nested query
Query Syntax
Join
{!join from=inner-id to=outer-id}zzz:vvv
Equivalent SQL:
SELECT xxx, yyy FROM collection1 WHERE outer-id IN (SELECT inner-id FROM collection1 WHERE zzz = "vvv")
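To make the join semantics concrete, the sketch below reproduces the same two-step logic in memory: first collect the inner-id values of the documents matching zzz:vvv, then return the documents whose outer-id is one of those values. It is only an illustration with plain Java collections and single-valued fields, not how Solr executes the join; all class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class JoinSketch {

    // Hypothetical helper: builds a document from alternating field names and values.
    static Map<String, String> doc(String... keyValues) {
        Map<String, String> d = new LinkedHashMap<String, String>();
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            d.put(keyValues[i], keyValues[i + 1]);
        }
        return d;
    }

    // In-memory imitation of {!join from=fromField to=toField}matchField:matchValue.
    static List<Map<String, String>> join(List<Map<String, String>> docs,
                                          String fromField, String toField,
                                          String matchField, String matchValue) {
        // Step 1 (the SQL subquery): collect "from" values of matching documents.
        Set<String> keys = new HashSet<String>();
        for (Map<String, String> d : docs) {
            String from = d.get(fromField);
            if (matchValue.equals(d.get(matchField)) && from != null) {
                keys.add(from);
            }
        }
        // Step 2 (the outer SELECT): keep documents whose "to" value is in that set.
        List<Map<String, String>> result = new ArrayList<Map<String, String>>();
        for (Map<String, String> d : docs) {
            String to = d.get(toField);
            if (to != null && keys.contains(to)) {
                result.add(d);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = Arrays.asList(
                doc("id", "1", "inner-id", "A", "zzz", "vvv"),
                doc("id", "2", "outer-id", "A"),
                doc("id", "3", "outer-id", "B"));
        System.out.println(join(docs, "inner-id", "outer-id", "zzz", "vvv"));
    }
}
```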
Query Syntax
q=inStock:true&facet=true&facet.field=cat&facet.limit=5
Faceted Search
<response>
  <responseHeader>
    <status>0</status>
    <QTime>4</QTime>
  </responseHeader>
  <result numFound="12" start="0"/>
  <lst name="facet_counts">
    <lst name="facet_queries"/>
    <lst name="facet_fields">
      <lst name="cat">
        <int name="electronics">10</int>
        <int name="memory">3</int>
        <int name="drive">2</int>
        <int name="hard">2</int>
        <int name="monitor">2</int>
      </lst>
    </lst>
  </lst>
</response>
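A client usually needs the facet counts as a map rather than as raw XML. The sketch below reads a response shaped like the one above using only the JDK's DOM parser; the class name, the parse method, and the SAMPLE constant are hypothetical, and SolrJ users would normally read facets from the QueryResponse object instead.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class FacetCounts {

    // Shortened copy of the response shown above, used as sample input.
    static final String SAMPLE =
            "<response><lst name=\"facet_counts\"><lst name=\"facet_queries\"/>"
          + "<lst name=\"facet_fields\"><lst name=\"cat\">"
          + "<int name=\"electronics\">10</int><int name=\"memory\">3</int>"
          + "<int name=\"drive\">2</int><int name=\"hard\">2</int>"
          + "<int name=\"monitor\">2</int>"
          + "</lst></lst></lst></response>";

    // Hypothetical helper: returns value -> count for one facet field, in response order.
    static Map<String, Integer> parse(String xml, String facetField) {
        Map<String, Integer> counts = new LinkedHashMap<String, Integer>();
        try {
            Document dom = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList lists = dom.getElementsByTagName("lst");
            for (int i = 0; i < lists.getLength(); i++) {
                Element lst = (Element) lists.item(i);
                Node parent = lst.getParentNode();
                // Only accept <lst name="facetField"> directly under <lst name="facet_fields">.
                boolean underFacetFields = parent instanceof Element
                        && "facet_fields".equals(((Element) parent).getAttribute("name"));
                if (!underFacetFields || !facetField.equals(lst.getAttribute("name"))) {
                    continue;
                }
                NodeList ints = lst.getElementsByTagName("int");
                for (int j = 0; j < ints.getLength(); j++) {
                    Element e = (Element) ints.item(j);
                    counts.put(e.getAttribute("name"), Integer.parseInt(e.getTextContent().trim()));
                }
            }
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse facet response", e);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(parse(SAMPLE, "cat"));
    }
}
```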
SolrJ
Feed data
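import java.util.ArrayList;
import java.util.Collection;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

// make a connection to the Solr server
SolrServer server = new HttpSolrServer("http://localhost:8080/solr/");

// prepare a doc
final SolrInputDocument doc1 = new SolrInputDocument();
doc1.addField("id", 1);
doc1.addField("firstName", "First Name");
doc1.addField("lastName", "Last Name");
final Collection<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
docs.add(doc1);

// add docs to Solr and make them searchable
server.add(docs);
server.commit();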
SolrJ
Query data
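import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;

// match all documents, sorted by firstName in ascending order
final SolrQuery query = new SolrQuery();
query.setQuery("*:*");
query.addSortField("firstName", SolrQuery.ORDER.asc);

// run the query on the server connection created in the feed example
final QueryResponse rsp = server.query(query);
final SolrDocumentList solrDocumentList = rsp.getResults();

// read the stored fields of each returned document
for (final SolrDocument doc : solrDocumentList) {
    final String firstName = (String) doc.getFieldValue("firstName");
    final String id = (String) doc.getFieldValue("id");
}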
SOLR installation
Ref:
http://wiki.apache.org/solr/SolrInstall
http://wiki.apache.org/solr/SolrTomcat
http://lucene.apache.org/solr/4_2_1/tutorial.html
Prerequisites:
• Tomcat (7) http://tomcat.apache.org/
• JDK 1.6
• SOLR 4.2.1 http://lucene.apache.org/solr/
• Extract solr-4.2.1.zip to D:\Project\solr_web\solr-4.2.1
• Copy resource\solr-4.2.1\examples\solr to D:\Project\solr_web\solr (this directory is SOLR_HOME)
• Copy resource\solr-4.2.1\dist\solr-4.2.1.war to SOLR_HOME and rename it to solr.war
• Open SOLR_HOME\collection1\conf\solrconfig.xml and modify the <dataDir> element:
<dataDir>${solr.data.dir:D:/Project/solr_web/solr/collection1/data}</dataDir>
• Create a Tomcat Context file (solr.xml) like this:
<?xml version="1.0" encoding="utf-8"?>
<Context docBase="D:/Project/solr_web/solr/solr.war" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String" value="D:/Project/solr_web/solr" override="true"/>
</Context>
• Copy this file (solr.xml) to tomcat.7.0.35\conf\Catalina\localhost
• Start Tomcat
• Open the SOLR dashboard at http://localhost:8080/solr/#/
SOLR Configuration
Ref:
http://wiki.apache.org/solr/SolrConfigXml
http://wiki.apache.org/solr/SchemaXml
A Solr server needs at least two XML configuration files: solrconfig.xml and schema.xml
solrconfig.xml: contains the general configuration of a core: memory size, data path, transaction settings, …
schema.xml: contains the data definitions: structure, data types, field names, …
SOLR Configuration
Schema.xml
field: a field that will be indexed by Solr
<field name="firstName" type="string" indexed="true" stored="true"/>
dynamicField: like a field, but the exact name is not specified in advance
<dynamicField name="*_i" type="int" indexed="true" stored="true"/>
name="*_i" will match any field ending in _i (like myid_i, z_i)
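The matching rule for dynamic field names can be sketched in a few lines. This is only an illustration of the rule, assuming a pattern with a single * at the start or at the end of the name; the class and method names are hypothetical and this is not Solr's own code.

```java
final class DynamicFieldMatch {

    // Hypothetical helper: checks a field name against a dynamicField pattern
    // such as "*_i" (suffix match) or "attr_*" (prefix match).
    static boolean matches(String pattern, String fieldName) {
        if (pattern.startsWith("*")) {
            return fieldName.endsWith(pattern.substring(1));
        }
        if (pattern.endsWith("*")) {
            return fieldName.startsWith(pattern.substring(0, pattern.length() - 1));
        }
        return pattern.equals(fieldName); // no wildcard: exact name only
    }

    public static void main(String[] args) {
        System.out.println(matches("*_i", "myid_i"));
        System.out.println(matches("*_i", "z_i"));
        System.out.println(matches("*_i", "firstName"));
    }
}
```

With such a rule in the schema, a document can carry fields like myid_i or z_i without each one being declared individually.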