Sphinx - High performance full-text search for MySQL
Nguyen Van Vuong - Framgia
Agenda
❖ Full-text search
❖ What’s Sphinx ?
❖ Why Sphinx ?
❖ Sphinx workflow
  ➢ Indexing
  ➢ Searching
  ➢ Query syntax
❖ How does it scale ?
❖ More about Sphinx
❖ References
Full-text search
❖ Full-text search is a technique for searching the text of stored documents or database records
  ➢ Examines all of the words in every stored document
  ➢ Tries to match them against the search query
❖ Example table
  ➢ Table: Articles
  ➢ Columns: id (integer), title (varchar), content (text), tag (varchar)
❖ Example query
SELECT * FROM articles
WHERE MATCH (title, content) AGAINST ('database' IN NATURAL LANGUAGE MODE);
Full-text search - Term search vs Full-text search
❖ Search keywords: “I ate pizza yesterday”
❖ Term search
  ➢ No analysis phase
  ➢ Operates on a single term
❖ Full-text search
  ➢ Tokenizer/analyzer
    ■ Breaks keywords down by whitespace and punctuation
    ■ Charset table
  ➢ Morphology preprocessors
    ■ Normalize both "dogs" and "dog" to "dog"
      ● eat, eating, eaten, ate
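The difference between the two approaches can be sketched in a few lines of Python. This is a toy illustration, not how Sphinx or MySQL implement analysis: the `LEMMAS` table and the function names `analyze`, `term_match` and `full_text_match` are made up for this example, and a real engine would use a stemmer or morphology dictionary instead of a hand-written map.

```python
import re

# Toy lemma table (an assumption for illustration; real engines use stemmers
# or morphology dictionaries instead of a hand-written map).
LEMMAS = {"ate": "eat", "eating": "eat", "eaten": "eat", "dogs": "dog"}


def analyze(text):
    """Lowercase, split on whitespace/punctuation, then normalize each token."""
    tokens = re.findall(r"\w+", text.lower())
    return [LEMMAS.get(token, token) for token in tokens]


def term_match(query, document):
    """Term search: the query is compared as-is, with no analysis phase."""
    return query in document.split()


def full_text_match(query, document):
    """Full-text search: both sides are analyzed before comparison."""
    doc_terms = set(analyze(document))
    return all(term in doc_terms for term in analyze(query))


document = "I ate pizza yesterday"
print(term_match("eating", document))             # False: no literal term "eating"
print(full_text_match("Eating pizza", document))  # True: both sides normalize to eat, pizza
```

The term search misses the document because "eating" never appears literally, while the analyzed query and the analyzed document share the normalized terms.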
What’s Sphinx ?
❖ Sphinx is a mythical creature with the head of a human and the body of a lion
❖ Full-text search engine
❖ Free and open source (GPL v2)
❖ Development began about 10 years ago
❖ High performance
❖ Integrates well with SQL databases
❖ APIs exist for Perl, C#, Ruby, Java, PHP
❖ Available for Linux, Windows, Mac OS
Why Sphinx ?
❖ Quick to learn
❖ Easy to use
❖ Simple to maintain
❖ Speed
  ➢ 50x-100x faster than MySQL Fulltext
  ➢ Up to 1000x faster than MySQL in extreme cases (e.g. large result set with GROUP BY)
❖ Feature-rich
  ➢ Relevancy (BM25)
  ➢ Synonyms
  ➢ Stopwords
  ➢ Real-time index
  ➢ ...
❖ Scalable
  ➢ Aggregates search results from many sources
  ➢ Fully transparent to calling application
  ➢ Built-in load balancing
❖ Easy to integrate
  ➢ SphinxAPI
  ➢ SphinxQL
Sphinx workflow
[Diagram: Application, Database, Sphinx Daemon, Sphinx Indexer and Sphinx Index, connected by the steps below]
1. Search query
2. Search results (IDs)
3. Fetch doc by ID
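The three steps can be walked through with stand-ins for each component. In this sketch an in-memory SQLite table plays the Database and a plain dictionary plays the Sphinx index. The `articles` schema, the sample rows and the helper names `search` and `fetch_by_ids` are all assumptions for illustration, and no real Sphinx daemon is involved.

```python
import sqlite3

# Stand-in for the Database: an in-memory SQLite table (hypothetical schema).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT)")
db.executemany(
    "INSERT INTO articles (id, title) VALUES (?, ?)",
    [(1, "Intro to MySQL"), (2, "Sphinx search basics"), (3, "Scaling search")],
)

# Stand-in for the Sphinx index: an inverted index mapping word -> document IDs.
# In a real deployment the Sphinx indexer builds the index from the database.
index = {}
for doc_id, title in db.execute("SELECT id, title FROM articles").fetchall():
    for word in title.lower().split():
        index.setdefault(word, set()).add(doc_id)


def search(keyword):
    """Steps 1-2: send a search query, receive matching document IDs only."""
    return sorted(index.get(keyword.lower(), set()))


def fetch_by_ids(ids):
    """Step 3: fetch the full rows from the database by ID."""
    if not ids:
        return []
    placeholders = ",".join("?" for _ in ids)
    sql = f"SELECT id, title FROM articles WHERE id IN ({placeholders}) ORDER BY id"
    return db.execute(sql, ids).fetchall()


ids = search("search")
print(ids)                # [2, 3]
print(fetch_by_ids(ids))  # [(2, 'Sphinx search basics'), (3, 'Scaling search')]
```

The point of the split is that the search side returns only IDs, so the application always reads the document bodies from the database.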
Sphinx workflow - Indexing
❖ Configuration
  ➢ sphinx.conf
❖ Data sources
❖ Character level
  ➢ charset_table
    ■ Use ranges: a..z, U+410..U+42F
  ➢ ngram_chars
    ■ Hieroglyphs as separate tokens
      ● Chinese, Japanese, …
      ● Unicode charset CJKV
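A rough Python model of the two character-level settings is shown below. It is only a sketch: Sphinx reads these tables from sphinx.conf. The case-folding of U+410..U+42F to U+430..U+44F, the choice of the U+4E00..U+9FFF block as n-gram characters, and the helper names are assumptions made for this example.

```python
# Hypothetical, simplified stand-ins for charset_table and ngram_chars.
# Sphinx's real tables are configured in sphinx.conf, not in Python.

def build_charset_table():
    """Map accepted characters to their folded (lowercase) form."""
    table = {}
    for code in range(ord("a"), ord("z") + 1):   # a..z kept as-is
        table[chr(code)] = chr(code)
    for code in range(ord("A"), ord("Z") + 1):   # A..Z folded to a..z
        table[chr(code)] = chr(code).lower()
    for code in range(0x410, 0x42F + 1):         # U+410..U+42F folded to U+430..U+44F
        table[chr(code)] = chr(code + 0x20)
    for code in range(0x430, 0x44F + 1):         # U+430..U+44F kept as-is
        table[chr(code)] = chr(code)
    return table


def is_ngram_char(ch):
    """Treat the CJK Unified Ideographs block (U+4E00..U+9FFF) as n-gram characters."""
    return 0x4E00 <= ord(ch) <= 0x9FFF


def tokenize(text, table):
    tokens, current = [], []
    for ch in text:
        if is_ngram_char(ch):
            if current:
                tokens.append("".join(current))
                current = []
            tokens.append(ch)            # each ideograph becomes its own token
        elif ch in table:
            current.append(table[ch])    # accepted character, folded
        elif current:                    # anything else acts as a separator
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return tokens


table = build_charset_table()
print(tokenize("Hello, \u041c\u0418\u0420", table))  # Latin word plus a Cyrillic word, both lowercased
print(tokenize("\u5168\u6587search", table))         # two single-ideograph tokens, then "search"
```

Characters outside the table act as separators, which is why the comma and the space in the first input never reach the token list.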
❖ Word level
  ➢ Stopwords
    ■ Avoid wasting index space
    ■ Example
      ● Words you don’t want to search for (like “I”, “Am”, “An”, etc.)
  ➢ Stemming
    ■ A single word can appear in many forms when used in different contexts
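Both word-level steps can be illustrated with a short sketch. The stopword list and the suffix rules below are deliberately naive assumptions for the demo. Sphinx ships proper stemmers and takes its stopwords from a configured file, so this only shows the idea of dropping noise words and collapsing word forms before they reach the index.

```python
# Illustrative only: a tiny stopword list and a naive suffix-stripping stemmer.
# Sphinx ships real stemmers; the rules below are assumptions for the demo.
STOPWORDS = {"i", "am", "an", "a", "the"}
SUFFIXES = ("ing", "ed", "s")


def stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def index_terms(text):
    """Return the terms that would actually be stored in the index."""
    words = text.lower().split()
    return [stem(word) for word in words if word not in STOPWORDS]


print(index_terms("I am walking the dogs"))  # ['walk', 'dog']
print(index_terms("An engineer walked"))     # ['engineer', 'walk']
```

Five input words shrink to two indexed terms in the first example, which is the index-space saving the stopword bullet refers to.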
❖ Building index
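$ sudo indexer --config <file> --all             # build every index defined in the config
$ sudo service sphinxsearch start                # start the search daemon (searchd)
$ sudo indexer --config <file> --all --rotate    # rebuild indexes while searchd is running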
Sphinx workflow - Searching
❖ Configuring search daemon
searchd {
    listen = localhost:9312
    listen = 9306:mysql41
    log = /var/log/sphinxsearch/searchd.log
    query_log = /var/log/sphinxsearch/query.log
    read_timeout = 5
    client_timeout = 300
    max_children = 30
    persistent_connections_limit = 30
    pid_file = /var/run/sphinxsearch/searchd.pid
    ...
}
❖ Sphinx API
  ➢ Perl, C#, Ruby, Java, PHP
  ➢ Example in PHP
❖ SphinxQL
  ➢ Connect via MySQL client
  ➢ Query like MySQL
$ mysql -h<ip> -P<port_of_sphinx>
SELECT * FROM myindex
WHERE MATCH ('@(title,content) find me fast');
Sphinx workflow - Query syntax
❖ Boolean search (AND, OR, NOT)
  ➢ AND: hello & world
  ➢ OR: hello | world
  ➢ NOT: hello -world
❖ Per-field search
  ➢ @title hello @body world
❖ Field combination
  ➢ @(title, body) hello world
❖ Search within first N words
  ➢ @body[50] hello
❖ Phrase search
  ➢ "hello world"
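When queries like these are assembled from user input, the operator characters need escaping so they are treated as text. The helper below is a minimal sketch, not part of any Sphinx client library: the names `escape`, `field_query` and `phrase` are invented here, and the list of special characters is an assumption modelled on the escaping done by the official API clients, so it should be checked against the documentation for the Sphinx version in use.

```python
import re

# Characters with special meaning in the extended query syntax. This list is an
# assumption based on the escaping done by the official API clients.
SPECIAL_CHARS = r'\()|-!@~"&/^$=<'
_ESCAPE_RE = re.compile("([" + re.escape(SPECIAL_CHARS) + "])")


def escape(text):
    """Backslash-escape operator characters in user-supplied keywords."""
    return _ESCAPE_RE.sub(r"\\\1", text)


def field_query(fields, keywords, limit=None):
    """Build '@field ...', '@(f1,f2) ...' or '@field[N] ...' query strings."""
    if len(fields) == 1:
        selector = "@" + fields[0]
    else:
        selector = "@(" + ",".join(fields) + ")"
    if limit is not None:
        selector += "[%d]" % limit
    return selector + " " + escape(keywords)


def phrase(keywords):
    """Wrap escaped keywords in double quotes for a phrase search."""
    return '"' + escape(keywords) + '"'


print(field_query(["title"], "hello"))                # @title hello
print(field_query(["title", "body"], "hello world"))  # @(title,body) hello world
print(field_query(["body"], "hello", limit=50))       # @body[50] hello
print(phrase("hello world"))                          # "hello world"
print(escape("e-mail"))                               # e\-mail
```

Escaping only the keywords, never the selector, keeps the field operators working while a stray `-` or `@` typed by a user is searched as plain text.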
❖ Per-field relevancy ranking weights
❖ Matching modes (SphinxAPI)
  ➢ SPH_MATCH_ALL
  ➢ SPH_MATCH_ANY
  ➢ SPH_MATCH_FULLSCAN
❖ Proximity search
  ➢ "people passion"~3
❖ GEO distance search (with syntax for mi/km/m)
  ➢ GEODIST(0.659298124, -2.136602399, latitude, longitude)
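The anchor values in the GEODIST call look like radians (0.659 rad is roughly 37.8 degrees), which is the default unit for that function. As a rough idea of what such a call computes, here is a great-circle distance sketch using the haversine formula. It is an assumption that this approximates the result: Sphinx's own implementation may use a different formula or Earth radius, and the function name `geodist` below is just a local Python helper.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters (approximation)


def geodist(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters; all four arguments are in radians."""
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = (
        math.sin(dlat / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    )
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


anchor_lat, anchor_lon = 0.659298124, -2.136602399

# Hypothetical document located 0.001 rad further north than the anchor point.
distance_m = geodist(anchor_lat, anchor_lon, anchor_lat + 0.001, anchor_lon)
print(round(distance_m))         # distance in meters
print(round(distance_m / 1000))  # the same distance in kilometers
```

Coordinates stored in degrees would have to be converted with `math.radians` before being passed to a function like this one.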
How does it scale ?
❖ Distribution is done horizontally
  ➢ Search is performed across different nodes
❖ Set up an index on multiple servers
❖ Adding distributed index configuration
  ➢ First server (192.168.1.1)
index master {
    type = distributed
    # Local index to be searched
    local = items
    # Remote agent (index) to be searched
    agent = 192.168.1.2:9312:items-2
}
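Conceptually, a distributed index queries each node and merges the partial results into one ranked list. The sketch below imitates only that merge step in plain Python. The result tuples, their weights and the `merge_results` helper are invented for illustration, and the real searchd also handles networking, timeouts and sorting modes that are not modelled here.

```python
import heapq

# Hypothetical per-node results: (document ID, relevance weight) pairs.
# In the configuration above these would come from the local index "items"
# and from the remote agent at 192.168.1.2:9312 serving "items-2".
local_results = [(101, 2.5), (102, 1.2), (103, 0.4)]
agent_results = [(201, 3.1), (202, 1.9)]


def merge_results(result_sets, limit):
    """Combine result sets from several nodes and keep the top `limit` by weight."""
    combined = [hit for results in result_sets for hit in results]
    return heapq.nlargest(limit, combined, key=lambda hit: hit[1])


print(merge_results([local_results, agent_results], limit=3))
# [(201, 3.1), (101, 2.5), (202, 1.9)]
```

The calling application sees a single result list and does not need to know how many nodes contributed to it.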
More about Sphinx
❖ Biggest known Sphinx cluster
  ➢ Indexes 25+ billion documents
  ➢ Over 9TB of data
  ➢ 1+ million searches/day
❖ Busiest known Sphinx cluster
  ➢ 300+ million search queries/day
❖ Books
References
❖ Sphinx documentation (v2.2.1)
❖ Sphinx Search Beginner's Guide - Abbas Ali
❖ Meet the Sphinx - Andrew Aksyonoff
❖ Advanced fulltext search with Sphinx - Adrian Nuta
❖ Search Big Data with MySQL and Sphinx - Mindaugas Zukas
Thank you
Time for action
https://github.com/euclid1990/php-sphinx-search