Sphinx - High performance full-text search for MySQL

Nguyen Van Vuong - Framgia

Agenda

❖ Full-text search
❖ What’s Sphinx ?
❖ Why Sphinx ?
❖ Sphinx workflow
  ➢ Indexing
  ➢ Searching
  ➢ Query syntax
❖ How does it scale ?
❖ More about Sphinx
❖ References

Full-text search

❖ Full-text search is a technique for searching the text of stored documents or database records
  ➢ Examines all of the words in every stored document
  ➢ Tries to match them against the search query

❖ Example: an Articles table

  id (integer) | title (varchar) | content (text) | tag (varchar)

❖ Example query

SELECT * FROM articles
WHERE MATCH (title, content) AGAINST ('database' IN NATURAL LANGUAGE MODE);
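
The idea of examining every word and matching it against the query can be sketched with a tiny inverted index. This is only an illustration in Python: the sample articles and the names `build_index` and `search` are assumptions made for this sketch, and it does not describe how MySQL or Sphinx implement full-text search internally.

```python
import re
from collections import defaultdict

# Hypothetical sample rows standing in for the articles table.
ARTICLES = {
    1: "Choosing a database for your web application",
    2: "Ten pizza recipes for busy developers",
    3: "Database indexing explained",
}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Map every word to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index

def search(index, query):
    """Return the sorted IDs of documents that contain all query words."""
    words = tokenize(query)
    if not words:
        return []
    matches = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(matches)

index = build_index(ARTICLES)
print(search(index, "database"))           # [1, 3]
print(search(index, "database indexing"))  # [3]
```

The index is built once and then reused for every query, which is the main reason a dedicated search engine can answer faster than scanning each row.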

Full-text search - Term search vs Full-text search

❖ Search keywords: “I ate pizza yesterday”
❖ Term search
  ➢ No analysis phase
  ➢ Operates on a single, exact term

❖ Full-text search
  ➢ Tokenizer/analyzer
    ■ Breaks keywords down by whitespace and punctuation
    ■ Charset table
  ➢ Morphology preprocessors
    ■ Normalize both "dogs" and "dog" to "dog"
      ● Eat, eating, eaten, ate
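
The difference between the two approaches can be sketched in a few lines of Python. The `LEMMAS` lookup table below is a hypothetical stand-in for a morphology preprocessor, and the function names are made up for this example; a real engine would use a proper stemmer or lemmatizer.

```python
import re

# Hypothetical lookup table standing in for a morphology preprocessor;
# a real engine would use a stemmer or lemmatizer instead.
LEMMAS = {"ate": "eat", "eating": "eat", "eaten": "eat", "dogs": "dog"}

def analyze(text):
    """Tokenize on whitespace/punctuation, lowercase, then normalize word forms."""
    tokens = re.findall(r"\w+", text.lower())
    return [LEMMAS.get(token, token) for token in tokens]

def term_match(document, term):
    """Term search: no analysis, the exact term must appear in the raw text."""
    return term in document.split()

def fulltext_match(document, query):
    """Full-text search: analyze both sides, then require every query token."""
    doc_tokens = set(analyze(document))
    return all(token in doc_tokens for token in analyze(query))

doc = "I ate pizza yesterday"
print(term_match(doc, "eat"))               # False
print(fulltext_match(doc, "eating pizza"))  # True
```

Because the document and the query pass through the same analysis step, "eating" and "ate" end up as the same token and the match succeeds.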

What’s Sphinx ?

❖ Sphinx is a mythical creature with the head of a human and the body of a lion

❖ Full-text search engine
❖ Free and open source (GPL v2)
❖ Started 10 years ago
❖ High performance
❖ Integrates well with SQL databases
❖ APIs exist for Perl, C#, Ruby, Java, PHP
❖ Available for Linux, Windows, Mac OS

Why Sphinx ?

❖ Quick to learn

❖ Easy to use

❖ Simple to maintain

❖ Speed
  ➢ 50x-100x faster than MySQL full-text search
  ➢ Up to 1000x faster than MySQL in extreme cases (e.g. large result sets with GROUP BY)

❖ Feature-rich
  ➢ Relevance ranking (BM25)
  ➢ Synonyms
  ➢ Stopwords
  ➢ Real-time indexes
  ➢ ...

❖ Scalable
  ➢ Aggregates search results from many sources
  ➢ Fully transparent to the calling application
  ➢ Built-in load balancing

❖ Easy to integrate
  ➢ SphinxAPI
  ➢ SphinxQL

Sphinx workflow

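[Diagram: Application, Database, Sphinx Daemon, Sphinx Indexer, Sphinx Index]

1. Search query (Application to Sphinx Daemon)
2. Search results (IDs) (Sphinx Daemon to Application)
3. Fetch doc by ID (Application to Database)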
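
The three numbered steps can be sketched as follows. This is a minimal illustration, not the deck's own code: an in-memory SQLite database stands in for MySQL, and `search_ids` is a hypothetical placeholder for the round trip to the search daemon.

```python
import sqlite3

# In-memory SQLite stands in for the MySQL database (assumption for this sketch).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT)")
db.executemany(
    "INSERT INTO articles (id, title) VALUES (?, ?)",
    [(1, "Intro to Sphinx"), (2, "Cooking pizza"), (3, "Sphinx at scale")],
)

def search_ids(query):
    """Hypothetical stand-in for steps 1-2: the search daemon returns
    matching document IDs in relevance order, not the documents themselves."""
    fake_results = {"sphinx": [3, 1]}
    return fake_results.get(query.lower(), [])

def fetch_documents(ids):
    """Step 3: fetch the full rows by ID and restore the relevance order."""
    if not ids:
        return []
    placeholders = ",".join("?" for _ in ids)
    rows = db.execute(
        f"SELECT id, title FROM articles WHERE id IN ({placeholders})", ids
    ).fetchall()
    by_id = {row[0]: row for row in rows}
    return [by_id[i] for i in ids if i in by_id]

print(fetch_documents(search_ids("sphinx")))
# [(3, 'Sphinx at scale'), (1, 'Intro to Sphinx')]
```

A `WHERE id IN (...)` lookup does not guarantee row order, so the sketch reorders the rows to match the order in which the IDs came back from the search step.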

Sphinx workflow - Indexing

❖ Configuration
  ➢ sphinx.conf

❖ Data sources

❖ Character level
  ➢ charset_table
    ■ Use ranges: a..z, U+410..U+42F
  ➢ ngram_chars
    ■ CJK characters (hieroglyphs) are indexed as separate tokens
      ● Chinese, Japanese, …
      ● Unicode charset CJKV
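
To show what indexing CJK characters as separate tokens means in practice, here is a rough Python sketch. It only imitates the effect of single-character n-grams; the Unicode ranges checked in `is_cjk` are a simplification chosen for this example, not the ranges a real `ngram_chars` setting would list.

```python
def is_cjk(ch):
    """Rough check for the main CJK Unified Ideographs, Hiragana and Katakana blocks."""
    code = ord(ch)
    return (0x4E00 <= code <= 0x9FFF) or (0x3040 <= code <= 0x30FF)

def tokenize(text):
    """Emit each CJK character as its own token; group other letters and digits into words."""
    tokens, word = [], []
    for ch in text.lower():
        if is_cjk(ch):
            if word:
                tokens.append("".join(word))
                word = []
            tokens.append(ch)
        elif ch.isalnum():
            word.append(ch)
        else:
            if word:
                tokens.append("".join(word))
                word = []
    if word:
        tokens.append("".join(word))
    return tokens

print(tokenize("Sphinx 全文検索 rocks"))
# ['sphinx', '全', '文', '検', '索', 'rocks']
```

Languages written without spaces between words give a whitespace tokenizer nothing to split on, which is why each character is treated as its own token here.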

❖ Word level
  ➢ Stopwords
    ■ Avoid wasting index space
    ■ Example
      ● Words you don’t want to search for (like “I”, “am”, “an”, etc.)
  ➢ Stemming
    ■ A single word can appear in many forms when used in different contexts
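
Both word-level steps can be illustrated with a short Python sketch. The stopword list and the suffix rules in `naive_stem` are assumptions made for this example; they are far cruder than the stemmers Sphinx ships with.

```python
STOPWORDS = {"i", "am", "an", "a", "the"}  # illustrative list, not Sphinx's

def naive_stem(word):
    """Strip a few common English suffixes; real stemmers (e.g. Porter) do far more."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def index_terms(text):
    """Drop stopwords, then stem what is left."""
    words = text.lower().split()
    return [naive_stem(w) for w in words if w not in STOPWORDS]

print(index_terms("I am walking the dogs"))  # ['walk', 'dog']
```

Three of the five input words never reach the index, and the two that remain are stored in a form that also matches "walked" or "dog".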

❖ Building index

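# Start the search daemon
$ sudo service sphinxsearch start

# Build every index defined in the config file
$ sudo indexer --config <file> --all

# Rebuild every index while searchd is running, then have it swap in the new files
$ sudo indexer --config <file> --all --rotate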

Sphinx workflow - Searching

❖ Configuring search daemon

searchd {
    listen = localhost:9312
    listen = 9306:mysql41
    log = /var/log/sphinxsearch/searchd.log
    query_log = /var/log/sphinxsearch/query.log
    read_timeout = 5
    client_timeout = 300
    max_children = 30
    persistent_connections_limit = 30
    pid_file = /var/run/sphinxsearch/searchd.pid
    ...
}

❖ Sphinx API
  ➢ Perl, C#, Ruby, Java, PHP
  ➢ Example in PHP

❖ SphinxQL
  ➢ Connect via MySQL client

      $ mysql -h<ip> -P<port_of_sphinx>

  ➢ Query like MySQL

      SELECT * FROM myindex
      WHERE MATCH('@(title,content) find me fast');

Sphinx workflow - Query syntax

❖ Boolean search (AND, OR, NOT)
    hello & world
    hello | world
    hello -world

❖ Per-field search
    @title hello @body world

❖ Field combination
    @(title,body) hello world

❖ Search within first N words
    @body[50] hello

❖ Phrase search
    "hello world"
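
An application usually assembles these query strings from user input. The helpers below are a hypothetical Python sketch of that step: `field_query`, `phrase` and `any_of` are names invented here, not part of any Sphinx client library, and they do not escape special characters.

```python
def field_query(fields, words, limit=None):
    """Build '@field words', '@(f1,f2) words' or '@field[N] words'."""
    if isinstance(fields, str):
        fields = [fields]
    target = fields[0] if len(fields) == 1 else "(" + ",".join(fields) + ")"
    if limit is not None:
        target += f"[{limit}]"
    return f"@{target} {words}"

def phrase(words):
    """Wrap the words in double quotes for an exact phrase match."""
    return f'"{words}"'

def any_of(*terms):
    """OR the terms together with the | operator."""
    return " | ".join(terms)

print(field_query("title", "hello"))                  # @title hello
print(field_query(["title", "body"], "hello world"))  # @(title,body) hello world
print(field_query("body", "hello", limit=50))         # @body[50] hello
print(phrase("hello world"))                          # "hello world"
print(any_of("hello", "world"))                       # hello | world
```

In real code, user-supplied words should be escaped before they are placed into a query, since characters such as `@`, `|` and `"` are operators in this syntax.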

❖ Per-field relevancy ranking weights

❖ Matching modes (SphinxAPI)
    SPH_MATCH_ALL
    SPH_MATCH_ANY
    SPH_MATCH_FULLSCAN

❖ Proximity search
    "people passion"~3

❖ Geo distance search (with syntax for mi/km/m)
    GEODIST(0.659298124, -2.136602399, latitude, longitude)
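
To make the geo distance example concrete, here is a haversine sketch in Python of the kind of great-circle distance such a function computes. It assumes coordinates in radians and a result in meters, which matches the look of the values above, but it is not Sphinx's actual formula; the document location is made up.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (approximation)

def geodist(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in radians."""
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def to_unit(meters, unit="m"):
    """Convert meters to 'm', 'km' or 'mi'."""
    factors = {"m": 1.0, "km": 1 / 1000, "mi": 1 / 1609.344}
    return meters * factors[unit]

# The anchor point from the slide, in radians.
origin = (0.659298124, -2.136602399)
# Hypothetical document location one degree of latitude further north.
doc = (origin[0] + math.radians(1), origin[1])
print(round(to_unit(geodist(*origin, *doc), "km"), 1))  # 111.2
```

One degree of latitude is roughly 111 km, which gives a quick sanity check on the result.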

How does it scale ?

❖ Distribution is done horizontally
  ➢ Search is performed across different nodes

❖ Set up an index on multiple servers

❖ Adding distributed index configuration
  ➢ First server (192.168.1.1)

index master
{
    type = distributed
    # Local index to be searched
    local = items
    # Remote agent (index) to be searched
    agent = 192.168.1.2:9312:items-2
}
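
Conceptually, a distributed index sends the query to every node and merges the partial results. The Python sketch below illustrates that scatter-gather idea only: two in-memory dictionaries stand in for the local index `items` and the remote index `items-2`, and the texts, weights and function names are invented for this example.

```python
import heapq

def search_shard(shard, query):
    """Hypothetical per-node search: return (weight, doc_id) pairs for matches."""
    return [(weight, doc_id) for doc_id, (text, weight) in shard.items() if query in text]

def search_distributed(shards, query, limit=3):
    """Query every shard, then merge the partial results by descending weight."""
    partial = []
    for shard in shards:
        partial.extend(search_shard(shard, query))
    return [doc_id for weight, doc_id in heapq.nlargest(limit, partial)]

# Two in-memory dicts stand in for the local index "items" and the
# remote index "items-2"; texts and weights are made up for illustration.
items = {1: ("sphinx search", 10), 2: ("mysql tips", 4)}
items_2 = {101: ("scaling sphinx", 7), 102: ("sphinx internals", 12)}

print(search_distributed([items, items_2], "sphinx"))  # [102, 1, 101]
```

The caller sees one merged result list and does not need to know which node each document came from.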

More about Sphinx

❖ Biggest known Sphinx cluster
  ➢ Indexes 25+ billion documents
  ➢ Over 9TB of data
  ➢ 1+ million searches/day

❖ Busiest known Sphinx cluster
  ➢ 300+ million search queries/day

❖ Books

References

❖ Sphinx documentation (v2.2.1)
❖ Sphinx Search Beginner's Guide - Abbas Ali
❖ Meet the Sphinx - Andrew Aksyonoff
❖ Advanced fulltext search with Sphinx - Adrian Nuta
❖ Search Big Data with MySQL and Sphinx - Mindaugas Zukas

Thank you

Time for action

https://github.com/euclid1990/php-sphinx-search