DESCRIPTION
An event conducted at DBG about Apache Solr as part of Coffee at DBG program.
Apache Solr
Prepared by Nithin S, Sajin TM
Digital Brand Group
Apache Solr is a search server written in Java that uses the Java search library "Lucene".
Open source
Results are returned through a web service as JSON or XML
UTF-8 support
Introduction
eBay, HP, Guardian, Cisco, AT&T, Intuit, Ford
Full list: http://wiki.apache.org/solr/PublicServers
Who uses Solr?
Text search library written in Java
Fast and feature rich, with an active Apache development community
Inverted index mechanism - each term/word is indexed against the content that contains it
What is Lucene?
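The inverted index idea can be sketched in a few lines of Python. This is only an illustration of the concept, not how Lucene is implemented: the function names and the sample documents are made up, and real analyzers do far more than lowercase and split on whitespace.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase term to the sorted list of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Naive tokenization (assumption): lowercase, split on whitespace.
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search(index, term):
    """Return the ids of documents containing the term, or an empty list."""
    return index.get(term.lower(), [])


# Hypothetical sample documents, for illustration only.
docs = {
    1: "Solr is a search server",
    2: "Lucene is a search library",
    3: "Jetty is a server container",
}
index = build_inverted_index(docs)
print(search(index, "search"))  # [1, 2]
print(search(index, "server"))  # [1, 3]
```

A lookup goes from the term straight to the matching documents, which is why searching does not require scanning every document.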
Server
Solr 4.3.0
Java server container (Tomcat or Jetty)
Java 1.6 or above
Client
Any system that can post and get data over HTTP
Requirements
Solr Model
Schema - can be thought of as a database table definition
Core - schema container
Collection - handles multiple cores
DIH - Data Import Handler
Request handler - StandardRequestHandler, DisMaxRequestHandler (multiple fields), IndexInfoRequestHandler
Response handler - XML, JSON, Python, Ruby
Common terms
Start Solr: java -jar start.jar
This starts the Jetty application server on port 8983 and uses your terminal to display the logging information from Solr.
Index your data: java -jar post.jar *.xml
Interface: http://localhost:8983/solr
Start server
The Solr Home directory typically contains the following sub-directories...
conf/ This directory is mandatory and must contain your solrconfig.xml and schema.xml. Any other optional configuration files would also be kept here.
data/ This directory is the default location where Solr will keep your index, and is used by the replication scripts for dealing with snapshots. You can override this location in the conf/solrconfig.xml. Solr will create this directory if it does not already exist.
lib/ This directory is optional. If it exists, Solr will load any JARs found in this directory and use them to resolve any "plugins" specified in your solrconfig.xml or schema.xml (e.g. Analyzers, Request Handlers). Alternatively, you can use the <lib> syntax in conf/solrconfig.xml to direct Solr to your plugins. See the example conf/solrconfig.xml file for details.
Basic Directory Structure
solr-php-client
PECL extension for Solr
PHP Clients
Structuring Solr schema
Field options
indexed, stored, multiValued, compressed
add/update - allows you to add or update a document to Solr. Additions and updates are not available for searching until a commit takes place.
commit - tells Solr that all changes made since the last commit should be made available for searching.
optimize - restructures Lucene's files to improve performance for searching. Optimization is generally good to do when indexing has completed. If there are frequent updates, you should schedule optimization for low-usage times. An index does not need to be optimized to work properly. Optimization can be a time-consuming process.
delete - can be specified by id or by query. Delete by id deletes the document with the specified id; delete by query deletes all documents returned by a query.
Indexing options
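The four operations above correspond to small XML messages posted to the update handler. The following sketch builds those messages with Python's standard library. The helper names and the sample field names (id, name, cat) are assumptions for illustration; they are not taken from any particular schema.

```python
import xml.etree.ElementTree as ET


def add_message(doc):
    """Build an <add> message for one document given as a field -> value dict."""
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        # A list value becomes repeated <field> elements (a multiValued field).
        values = value if isinstance(value, list) else [value]
        for v in values:
            ET.SubElement(doc_el, "field", name=name).text = str(v)
    return ET.tostring(add, encoding="unicode")


def delete_by_id(doc_id):
    """Build a message that deletes the document with the given id."""
    delete = ET.Element("delete")
    ET.SubElement(delete, "id").text = str(doc_id)
    return ET.tostring(delete, encoding="unicode")


def delete_by_query(query):
    """Build a message that deletes every document matching the query."""
    delete = ET.Element("delete")
    ET.SubElement(delete, "query").text = query
    return ET.tostring(delete, encoding="unicode")


# Commit and optimize are empty elements.
COMMIT = "<commit/>"
OPTIMIZE = "<optimize/>"

# Hypothetical document, for illustration only.
print(add_message({"id": "book1", "name": "Example title", "cat": ["a", "b"]}))
print(delete_by_id("book1"))
print(delete_by_query("*:*"))
```

Building the XML with ElementTree rather than string concatenation means characters such as & and < in field values are escaped automatically.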
Supported formats: XML, JSON, CSV, or javabin. Supported document types include Microsoft Office documents and PDFs.
curl http://localhost:8983/solr/collection1/update/csv -H "Content-type:text/csv; charset=utf-8" --data-binary @D:/Projects/solr-4.3.0/example/exampledocs/books.csv
Commit the upload: http://localhost:8983/solr/collection1/update?stream.body=%3Ccommit/%3E
Upload schema data
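The same CSV upload can be prepared from Python. This is a minimal sketch using only the standard library: the helper name and the two-column CSV body are hypothetical, and the request is built but not sent, so the snippet runs without a Solr server.

```python
import urllib.request
from urllib.parse import urlencode

BASE_URL = "http://localhost:8983/solr"  # host and port used in the slides


def csv_update_request(core, csv_text):
    """Prepare a POST of CSV data to the core's /update/csv handler."""
    return urllib.request.Request(
        f"{BASE_URL}/{core}/update/csv",
        data=csv_text.encode("utf-8"),
        headers={"Content-Type": "text/csv; charset=utf-8"},
        method="POST",
    )


def commit_url(core):
    """URL that asks Solr to commit, passing <commit/> via stream.body."""
    return f"{BASE_URL}/{core}/update?" + urlencode({"stream.body": "<commit/>"})


# Hypothetical CSV content, for illustration only.
req = csv_update_request("collection1", "id,name\nbook1,Example title\n")
print(req.get_method(), req.full_url)
print(commit_url("collection1"))

# To actually send the request against a running Solr instance:
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)
```

Here urlencode also percent-encodes the slash in <commit/>, which is equivalent to the hand-written URL above.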
Query parameters
q - The query to search with in Solr. See "Lucene QueryParser Syntax" in Resources for a full description of the syntax. Sorting information can be included by appending a semicolon and the name of an indexed, non-tokenized field (explained below). The default sort is score desc, which means sort by descending score. Newer Solr releases take the sort in a separate sort parameter instead, e.g. sort=date asc.
Example: q=myField:Java AND otherField:developerWorks; date asc
This query searches the two fields specified and sorts the results based on a date field.
start - Specifies the starting offset into the result set. Useful for paging through results. The default value is 0.
Example: start=15
The offset is zero-based, so this skips the first 15 results and returns results starting with the sixteenth ranked result.
rows - The maximum number of documents to return. The default value is 10.
Example: rows=25
fq - Provides an optional filter query. Results of the query are restricted to searching only those results returned by the filter query. Filter queries are cached by Solr. They are very useful for improving the speed of complex queries.
Example: any valid query that could be passed in the q parameter, not including sort information.
hl - When hl=true, highlight snippets in the query response. Default is false. See the Solr Wiki section on highlighting parameters for more options (in Resources).
Example: hl=true
fl - Specifies, as a comma-separated list, the set of fields that should be returned in the document results. "*" is the default and means all fields. "score" indicates the score should be returned as well.
Example: fl=*,score
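As a rough sketch of how these parameters combine, the helper below turns a page number into start and rows and appends the optional fq, fl and hl parameters. The function name, its arguments and the manu:canon filter are assumptions made for this example; only the parameter names come from the list above.

```python
from urllib.parse import urlencode

SELECT_URL = "http://localhost:8983/solr/select"  # URL used in the slides


def select_url(q, page=1, per_page=10, fq=None, fl=None, hl=False):
    """Build a select URL; page is 1-based, start is the 0-based offset."""
    params = [("q", q), ("start", (page - 1) * per_page), ("rows", per_page)]
    if fq:
        params.append(("fq", fq))
    if fl:
        params.append(("fl", ",".join(fl)))
    if hl:
        params.append(("hl", "true"))
    # urlencode percent-encodes reserved characters such as ':' and ','.
    return SELECT_URL + "?" + urlencode(params)


# Second page of 25 results, returning only id and score.
print(select_url("video", page=2, per_page=25, fl=["id", "score"]))
# First page, restricted by a hypothetical filter query, with highlighting.
print(select_url("camera", fq="manu:canon", hl=True))
```

Page 2 with 25 rows per page yields start=25, that is, the 26th result onward.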
Full text search: http://localhost:8983/solr/select?q=Searchtext
Search only within a field: http://localhost:8983/solr/select?q=fieldname:searchtext
Control which fields are displayed in the result: http://localhost:8983/solr/select?q=video&fl=id,category
Provide ranges to fields: http://localhost:8983/solr/select?q=price:[0 TO 400]&fl=id,name,price
More like this (MLT): http://localhost:8983/solr/select?q=Searchtext&mlt=true&mlt.fl=headline&mlt.mindf=1&mlt.mintf=1&fl=id,score&rows=100
More information on how this works and the options available can be found at http://wiki.apache.org/solr/MoreLikeThis
Search
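Range queries contain brackets and spaces, which must be percent-encoded when the URL is built by a program rather than typed into a browser. The sketch below shows one way to do that; range_clause is a hypothetical helper and the price field is simply the one from the range example.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def range_clause(field, low, high):
    """Inclusive range clause such as price:[0 TO 400]; '*' leaves an end open."""
    return f"{field}:[{low} TO {high}]"


query = urlencode({"q": range_clause("price", 0, 400), "fl": "id,name,price"})
url = "http://localhost:8983/solr/select?" + query
print(url)

# Decoding the query string recovers the original clause.
print(parse_qs(urlsplit(url).query)["q"][0])  # price:[0 TO 400]
```

The space around TO is encoded as +, so the server still receives the clause with TO as a separate keyword.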
Sample search result
Faceted search: http://localhost:8983/solr/query?q=camera&facet=true&facet.field=manu
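A facet response can be turned into a value-to-count mapping with a short helper. This sketch assumes a JSON response in which each facet field is a flat list of alternating values and counts; the sample response below is invented for illustration and is not real output.

```python
def facet_counts(response, field):
    """Convert a flat [value, count, value, count, ...] list into a dict."""
    flat = response["facet_counts"]["facet_fields"][field]
    # Even positions hold the facet values, odd positions hold their counts.
    return dict(zip(flat[0::2], flat[1::2]))


# Made-up response fragment (assumption), shaped like a JSON facet response.
sample = {
    "response": {"numFound": 3, "start": 0, "docs": []},
    "facet_counts": {
        "facet_queries": {},
        "facet_fields": {"manu": ["canon", 2, "nikon", 1]},
    },
}

print(facet_counts(sample, "manu"))  # {'canon': 2, 'nikon': 1}
```

If the server is configured to return facets in a different layout, the slicing step would need to change accordingly.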
Features: hit highlighting, auto-suggest, spelling suggestions, spatial search
Removing Data from Index
curl http://localhost:8983/solr/collection1/update -H "Content-Type: text/xml" --data-binary "<delete><query>*:*</query></delete>"
Commit afterwards so that the deletion takes effect for searches.
Thank you