SphinxSE with MySQL

Introduction to Sphinx.

Sphinx Searching and Sorting Features.

Sphinx Implementation.

Demo.

Open source search engine.

Developed by Andrew Aksyonoff.

Integrates well with MySQL.

Provides greatly improved full-text search compared with searching in the database itself.

Specially designed for indexing databases.

1. The search runs over 500 MB of documents.

2. The collection holds 3,000,000 documents.

3. The query is “internet web design” in match-any mode.

4. The query returns 134,000 documents.

Sphinx has two standalone programs:
o indexer: pulls data from the DB and builds the indexes.
o searchd: uses the indexes and answers queries.

Clients interact with searchd in two ways:
o Via native APIs: PHP, Python, Perl, Ruby, and Java.
o Via SphinxSE.

indexer periodically rebuilds the indexes:
o Typically from cron jobs.
o Searching keeps working during rebuilds (live updates).

Sphinx documents = records in the DB.

I. A document is just like a row in the DB and has its own unique ID.

II. Each document consists of fields and attributes.

III. Fields are the columns we want to search on.

IV. Attributes may be used for filtering, sorting, and grouping.

1. The Sphinx search engine returns only unique document IDs.

2. This means that if we search for “Dominos”, we get the unique IDs of the rows that contain it.

3. Hence, after a search returns results, you will most likely still need to fetch the details of those documents for your final results page.
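The two-step lookup described above can be sketched as follows. This is a minimal illustration, not the presenter's code: the restaurants table, its columns, and the sample rows are hypothetical, and sqlite3 stands in for the MySQL table that was indexed so the snippet runs on its own.

```python
import sqlite3


def fetch_details(conn, doc_ids):
    """Fetch full rows for the IDs a search returned, keeping the search order."""
    if not doc_ids:
        return []
    placeholders = ",".join("?" for _ in doc_ids)
    rows = conn.execute(
        f"SELECT id, name, city FROM restaurants WHERE id IN ({placeholders})",
        list(doc_ids),
    ).fetchall()
    by_id = {row[0]: row for row in rows}
    # IN (...) does not preserve order, so re-order rows to match the ranking.
    return [by_id[i] for i in doc_ids if i in by_id]


# Hypothetical data; sqlite3 is only a stand-in for the real MySQL table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE restaurants (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO restaurants VALUES (?, ?, ?)",
    [(1, "Dominos", "Pune"), (2, "Subway", "Mumbai"), (3, "Dominos", "Delhi")],
)

matched_ids = [3, 1]  # pretend the search returned these IDs, best match first
details = fetch_details(conn, matched_ids)
```

Re-ordering by the returned ID list matters because the search engine decides the ranking, while the database returns rows in whatever order it likes.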

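SELECT id
FROM sphinx_table
WHERE query = 'dominos;mode=ext2;weights=1000,100,10;sort=attr_asc:group_id';

-- dominos                 : the text you want to search for
-- mode=ext2               : searching (matching) mode
-- weights=1000,100,10     : weight distribution across fields
-- sort=attr_asc:group_id  : sorting type and attribute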
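Since the search text and all options travel in one semicolon-separated string, it can help to assemble that string in application code. The helper below is a hypothetical sketch (build_sphinxse_query is not part of any Sphinx API); it simply rejects search text containing a semicolon rather than attempting to escape it.

```python
def build_sphinxse_query(text, **options):
    """Join the search text and name=value options into one SphinxSE query string."""
    if ";" in text:
        # Simplification for this sketch: a semicolon would be read as an option separator.
        raise ValueError("search text must not contain ';'")
    parts = [text]
    for name, value in options.items():
        if isinstance(value, (list, tuple)):
            value = ",".join(str(v) for v in value)
        parts.append(f"{name}={value}")
    return ";".join(parts)


query = build_sphinxse_query(
    "dominos",
    mode="ext2",
    weights=[1000, 100, 10],
    sort="attr_asc:group_id",
)

# Pass the string as a bound parameter instead of pasting it into the SQL text.
# The %s placeholder style is an assumption about the MySQL driver in use.
sql = "SELECT id FROM sphinx_table WHERE query = %s"
params = (query,)
```

Binding the query string as a parameter keeps user-supplied search text out of the SQL statement itself.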

SPH_MATCH_ALL : matches all keywords.

SPH_MATCH_ANY : matches any of the keywords.

SPH_MATCH_BOOLEAN : matches the query as a Boolean expression; no relevance ranking, and an implicit Boolean AND between keywords unless specified otherwise.

1. hello & world

2. hello | world

3. hello -world

SPH_MATCH_PHRASE : treats the query as a phrase and requires a perfect match.

SPH_MATCH_EXTENDED : superseded by SPH_MATCH_EXTENDED2.

SPH_MATCH_EXTENDED2 : matches the query as an expression in the Sphinx extended query syntax, which adds further operators.

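FIELD SEARCH OPERATOR : @title hello @body world
(each keyword is matched only in the named field)

QUORUM MATCHING OPERATOR : "world is wonderful place"/3
(at least 3 of the quoted words must match)

PROXIMITY SEARCH OPERATOR : "hello world"~10
(the quoted words must occur close together, within a window of about 10 words)

STRICT ORDER OPERATOR : black << cat
(black must occur before cat in the document)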
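To make the quorum operator concrete, here is a toy model of its semantics. It is only an illustration under simplifying assumptions (lowercased whitespace tokenization, no stemming) and is not how Sphinx implements matching; quorum_match is a hypothetical name.

```python
def quorum_match(document, phrase, threshold):
    """Return True if at least `threshold` distinct query words occur in the document."""
    doc_words = set(document.lower().split())
    query_words = set(phrase.lower().split())
    return len(query_words & doc_words) >= threshold


# Models "world is wonderful place"/3 against two sample documents.
phrase = "world is wonderful place"
hit = quorum_match("this world is a wonderful home", phrase, 3)   # world, is, wonderful
miss = quorum_match("the world is big", phrase, 3)                # only world, is
```

With a threshold equal to the number of query words the check behaves like match-all, and with a threshold of 1 it behaves like match-any.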

Phrase Ranking : documents that contain the query as a matching phrase, such as “hello world”, are ranked higher.

Statistical Ranking : preference is given to word frequency, i.e. a document containing more occurrences of “hello” and/or “world” is given more weight.

SPH_MATCH_BOOLEAN : no weighting is performed.

SPH_MATCH_ALL and SPH_MATCH_PHRASE : use phrase ranking.

SPH_MATCH_ANY : phrase rank * big value + statistical ranking (multiplying by a big value guarantees that a higher phrase rank wins even if its field weight is low).

SPH_MATCH_EXTENDED : (phrase rank + BM25) * 1000.

Personalized Weighting : done with the “weights” keyword in your Sphinx query. It is generally used when some of the searched columns should count for more than others, e.g. weights=1,2,3; this is possible in mode=ext2.

SPH_SORT_RELEVANCE : sorts by relevance in DESC order.

SPH_SORT_ATTR_DESC : sorts by an attribute in DESC order.

SPH_SORT_ATTR_ASC : sorts by an attribute in ASC order.

SPH_SORT_TIME_SEGMENTS : sorts by time segment (last hour/day/week/month) in DESC order, then by relevance in DESC order.

SPH_SORT_EXTENDED : sorts by an SQL-like combination of columns, each in ASC or DESC order.

SPH_SORT_EXPR : sorts by an arithmetic expression involving columns.

Installation is usually straightforward:

REQUIREMENTS:

A working C++ compiler.
A good make program.

STEPS:
$ ./configure --prefix=/path --with-mysql --with-pgsql
$ make
$ make install

Checking SphinxSE Installation

Run SHOW ENGINES in the mysql client; SPHINX should be listed among the storage engines.

There are two components that we need to set up before Sphinx is ready for searching:

• Sphinx table

• Configuration file (e.g. file_name.conf)

Requirements:

1. The data types of the first 3 columns must be INT, INT, VARCHAR; they are mapped to the document ID, the match weight, and the search query.

2. The query column must be indexed, and no other column may be indexed.

3. Any further columns correspond to attributes declared in the source.

CREATE TABLE sphinx_table
(
    id     INT NOT NULL,
    weight INT NOT NULL,
    query  VARCHAR(255) NOT NULL,
    KEY (query)
) ENGINE=SPHINX CONNECTION='sphinx://localhost:3313/city_search_cust_mess';
-- host and port must point at the running searchd; the last path part is the index name

The configuration file has four sections to configure:

• source (multiple)

• index (multiple)

• indexer

• searchd

The following are some of the options available in the source section of the configuration file:

Type:

type : data source type.

Possible values: mysql, pgsql, xmlpipe, xmlpipe2.

Connection Info:

sql_host : SQL server host to connect to (mandatory).

sql_port : SQL server port to connect to (default 3306).

sql_user : SQL user to use when connecting to sql_host (mandatory).

sql_pass : SQL user password to use when connecting to sql_host (mandatory).

sql_db : SQL database to use.

sql_sock : socket name to connect to for local SQL servers.

Queries Info:

sql_query_pre : pre-fetch query, or pre-query.

e.g.: sql_query_pre = SET NAMES utf8

sql_query : main document fetch query.

sql_query_post : post-fetch query.

e.g.: sql_query_post = DROP TABLE my_tmp_table

sql_query_info : document info query, used by the command-line search tool to fetch and display document details.

Attributes Info:

sql_attr_xxx : attribute declaration (xxx is one of uint, bigint, float, str2ordinal, timestamp).

Setting Configuration File: Index Section

type : index type (optional; possible values: local, distributed).

source : adds a document source to a local index. Multi-value.

path : index files path and file name (without extension).

docinfo : document attribute values storage mode (inline, extern).

mlock : memory locking for cached data (optional, default 0).

min_word_len : minimum indexed word length (optional, default 1).

charset_type : character set encoding type.

Stemming Options:

morphology : a list of morphology preprocessors to apply.

e.g.: stemming reduces “cars” to “car” and “running” to “run”.

stopwords : list of stopword files (space separated).

e.g.: stop words are common words such as the, is, are, an, a, etc.

Setting Configuration File: Indexer Section

mem_limit : indexing RAM usage limit (optional, default is 32 MB).

max_iops : maximum I/O operations per second.

max_iosize : maximum allowed I/O operation size.

Setting Configuration File: Searchd Section

address : IP address to bind on; the default 0.0.0.0 listens on all interfaces.

port : searchd TCP port number (mandatory, default is 3312).

log : log file name (optional, default is empty).

query_log : query log file name (optional, default is empty).

pid_file : searchd process ID file name (mandatory).

max_matches : maximum number of matches that the daemon keeps in RAM for each index and can return to the client (optional, default 1000).

preopen_indexes : whether to forcibly preopen all indexes on startup (optional, default 0, i.e. don't preopen).
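To show how the four sections fit together in one file, the sketch below renders a minimal configuration as text. The directive names come from the slides above; every value (the source name city_src, the credentials, the file paths, the SQL query, and the documents table) is a hypothetical placeholder. The index name and port were chosen to match the CONNECTION string of the Sphinx table shown earlier.

```python
def render_section(kind, name, options):
    """Render one config section as 'kind name { key = value ... }' text."""
    header = f"{kind} {name}" if name else kind
    body = "\n".join(f"    {key} = {value}" for key, value in options)
    return f"{header}\n{{\n{body}\n}}\n"


# All values below are illustrative assumptions, not taken from the presentation.
sections = [
    ("source", "city_src", [
        ("type", "mysql"),
        ("sql_host", "localhost"),
        ("sql_user", "sphinx_user"),
        ("sql_pass", "secret"),
        ("sql_db", "city_db"),
        ("sql_port", "3306"),
        ("sql_query_pre", "SET NAMES utf8"),
        ("sql_query", "SELECT id, group_id, title, body FROM documents"),
        ("sql_attr_uint", "group_id"),
    ]),
    ("index", "city_search_cust_mess", [
        ("source", "city_src"),
        ("path", "/var/data/sphinx/city_search_cust_mess"),
        ("docinfo", "extern"),
        ("morphology", "stem_en"),
        ("min_word_len", "1"),
        ("charset_type", "utf-8"),
    ]),
    ("indexer", None, [
        ("mem_limit", "32M"),
    ]),
    ("searchd", None, [
        ("port", "3313"),
        ("log", "/var/log/sphinx/searchd.log"),
        ("query_log", "/var/log/sphinx/query.log"),
        ("pid_file", "/var/run/searchd.pid"),
        ("max_matches", "1000"),
    ]),
]

config_text = "\n".join(render_section(kind, name, opts) for kind, name, opts in sections)
print(config_text)
```

In the query, the first selected column (id) serves as the document ID, group_id is declared as an attribute, and the remaining columns (title, body) become searchable fields.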
