Page 1: Using Sphinx for Search in PHP

Using Sphinx for Search

Mike Lively, Slickdeals, LLC

Page 2

What is Sphinx?

• A full-text search engine

• Quickly get high quality (relevant) results

• Designed to integrate well with SQL RDBMS

• Can work with any data source

• Can be queried using either an API or SQL

Page 3

How do I know anything about Sphinx?

• Manager of Software Architecture for Slickdeals.net

• Alexa top 150 site (in the US)

• I have been working on improving our Sphinx search engine for the last two months or so.

• Over 7 million searches a month go directly through the search interface; many more happen indirectly.

Page 4

When should I use Sphinx?

• Site / Product / Document searches

• Auto-suggest / Auto-Correct functionality

• Finding relevant and related items

Page 5

Simple Architecture

• Often, search is offloaded straight to the database

• The search request goes to the backend, which runs queries directly against the database

• Obviously very easy to implement

Page 6

Simple Architecture

• Simple “starts with” searches on indexed fields can sometimes work: `city` LIKE 'Las%'

• Anything else (say, a leading wildcard like '%Las%') forces a full table scan, and with MyISAM's table-level locking a long scan can block writes to that table.

• MySQL is not a great or flexible full text engine

• It can sometimes be adequate
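Here is a minimal PDO sketch of the “starts with” case. The `cities` table, its columns, and both function names are hypothetical.

The helper escapes LIKE's own special characters (`%`, `_`, and the backslash escape character) so user input is matched literally. It then appends a single trailing `%`. Because the pattern stays anchored at the start, an index on `city` can still be used.

```php
<?php
// Build a "starts with" LIKE pattern: escape LIKE metacharacters
// (backslash first, then % and _) so they match literally, then append
// the one wildcard we actually want, at the end.
function likePrefix(string $term): string
{
    return str_replace(['\\', '%', '_'], ['\\\\', '\\%', '\\_'], $term) . '%';
}

// Hypothetical `cities` table. With no leading wildcard, an index on
// `city` lets MySQL do a range scan instead of a full table scan.
function citiesStartingWith(PDO $db, string $term, int $limit = 20): array
{
    $stmt = $db->prepare(
        'SELECT id, city FROM cities WHERE city LIKE :prefix ORDER BY city LIMIT :lim'
    );
    $stmt->bindValue(':prefix', likePrefix($term), PDO::PARAM_STR);
    // LIMIT needs a bare integer, so bind it as PARAM_INT.
    $stmt->bindValue(':lim', $limit, PDO::PARAM_INT);
    $stmt->execute();
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}
```

For example, `likePrefix('50%_off')` yields the pattern `50\%\_off%`, which matches only values that literally start with "50%_off".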

Page 7

Sphinx Architecture

• Searchd is responsible for receiving requests from clients and executing the searches against the Sphinx index.

• Indexer is responsible for getting data into the Sphinx index.

• This separation allows indexing and searching to be scaled separately.

Page 8

Sphinx Architecture

• Searchd has a binary protocol, for which there are several clients available in multiple languages.

• Searchd can also speak MySQL’s wire protocol (the protocol introduced in MySQL 4.1), so ordinary MySQL clients can connect to it.

• Searchd is a daemon that runs on your search servers

Page 9

Sphinx Architecture

• Indexer is a command-line program you can run to build any number of indexes.

• It can handle index rotation, so indexes can be rebuilt while searchd keeps serving queries.

Page 10

Not So Quick Side Note

MySQL IS SLOWWWWWWWWWWWWW

(at text matches)

Page 11

Still Not Quick Side Note

Indexes won’t help you… (a regular B-tree index can't be used for a LIKE pattern with a leading wildcard)

Page 12

Quicker Side Note

Full Text Search isn’t so bad

IF…

Page 13

Sphinx Concepts

• Sphinx Indexes “Documents”

• Each document has a unique, unsigned, non-zero integer ID (in either a 32-bit or 64-bit space)

• Each document has one or more fields

• Each document has zero or more attributes

Page 14

Indexes / Sources

• Sphinx indexes are created from one or more sources.

• A source can be a database, an XML stream, or a TSV stream.

• You can use multiple sources

• This is useful for maintaining updated indexes

• Also used to implement a Sphinx cluster

Page 15

Sphinx Fields

• Fields are what the full-text index is made of.

• When searching you can search against any number of fields.

• You can assign different relevancy weights to different fields.

• Sphinx does not store the original value of a field (unless you also make it a string attribute).

• You should always have at least one field.

Page 16

Sphinx Attributes

• Data that further describes the item being indexed

• Can be returned as part of the search results

• Useful for filtering and sorting results

• These are not a part of the full text index.

Page 17

MySQL Full Text Search

• You can get away with MyISAM tables, or with InnoDB as of MySQL 5.6.

• You don’t care about morphology (think plurals)

• You don’t need anything but the most basic of search operators
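If those conditions hold, a MySQL full-text query may be all you need. Below is a rough PDO sketch. It assumes a hypothetical `wiki_titles` table with a FULLTEXT index on `title`; the function names are also hypothetical.

The helper treats punctuation as a word separator, which also strips MySQL's boolean operators out of user input. It then requires every remaining word with `+`, which is about as basic as search operators get.

```php
<?php
// Turn free text into a MySQL BOOLEAN MODE query that requires every word,
// e.g. "php, sphinx!" becomes "+php +sphinx". Anything that is not a letter,
// digit or underscore is treated as a separator, which also drops MySQL's
// boolean operators (+ - < > ( ) ~ * " @) from user input.
function booleanAllTerms(string $input): string
{
    $words = preg_split('/[^\p{L}\p{N}_]+/u', $input, -1, PREG_SPLIT_NO_EMPTY);
    return implode(' ', array_map(fn(string $w): string => '+' . $w, $words));
}

// Hypothetical table: wiki_titles(id, title) with FULLTEXT(title),
// on MyISAM or (MySQL 5.6+) InnoDB.
function fullTextSearch(PDO $db, string $input, int $limit = 100): array
{
    $stmt = $db->prepare(
        'SELECT id, title FROM wiki_titles'
        . ' WHERE MATCH(title) AGAINST (:q IN BOOLEAN MODE) LIMIT :lim'
    );
    $stmt->bindValue(':q', booleanAllTerms($input), PDO::PARAM_STR);
    $stmt->bindValue(':lim', $limit, PDO::PARAM_INT);
    $stmt->execute();
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}
```

Note what this doesn't give you: morphology (a search for "test" won't match "tests") and any richer operators. Those are the points where Sphinx starts to pay off.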

Page 18

Creating An Index

• We are going to add an index whose source is a MySQL database.

• The data being sourced is a list of Wikipedia article titles.

Page 19

Creating An Index

Page 20

Indexer Configuration

• We are going to be peeking into a Sphinx configuration file now.

• You can rebuild the config file by concatenating each section into a single file.

• On my VM, this file is located at /usr/local/etc/sphinx.conf

Page 21

Source Definition

Page 22

Source Definition

Defines the connection information

Page 23

Connection information

• Ideally, you should create a separate MySQL account for Sphinx

• You can also connect via a Unix socket (sql_sock)

• I didn’t specify it here, but you can also add a port (sql_port).

Page 24

Source Definition

The query that pulls data to populate the index

Page 25

Source Index

• The index query MUST return the id field as the first column

• Remember, the id needs to be a unique, unsigned, non-zero integer of at most 64 bits

• The query must be on a single line, unless you escape the line breaks with backslashes.

• Notice that we converted the timestamp into a Unix timestamp. That is important, because Sphinx timestamp attributes hold Unix timestamps (integers), not MySQL DATETIME values.
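This hypothetical PHP helper illustrates those rules. It takes a readable multi-line query and prints it as a `sql_query` setting, with every line break escaped by a trailing backslash (the continuation syntax the config file accepts).

The table and column names are assumptions. The id comes first, and the DATETIME column goes through MySQL's `UNIX_TIMESTAMP()`.

```php
<?php
// Render a multi-line SQL query as a sphinx.conf `sql_query` setting,
// ending every line except the last with a backslash continuation.
function sqlQuerySetting(string $sql): string
{
    $lines = array_values(array_filter(
        array_map('trim', preg_split('/\R/', trim($sql))),
        fn(string $l): bool => $l !== ''
    ));
    return "sql_query = " . implode(" \\\n    ", $lines);
}

// Hypothetical query: unique unsigned id first, the full-text field
// (title), and a DATETIME converted to a Unix timestamp.
$query = <<<SQL
SELECT id,
       title,
       UNIX_TIMESTAMP(created_at) AS created_ts
FROM wiki_titles
SQL;

echo sqlQuerySetting($query), "\n";
// Prints:
// sql_query = SELECT id, \
//     title, \
//     UNIX_TIMESTAMP(created_at) AS created_ts \
//     FROM wiki_titles
```

With a query like this, `title` would be the only full-text field. `created_ts` would be declared as an attribute, e.g. `sql_attr_timestamp = created_ts`.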

Page 26

Source Definition

How data is stored in the index

Page 27

Source Fields

• The first column in the query is always the ID.

• You explicitly declare which columns are attributes.

• Remember, attributes are stored in the index as values you can filter and sort by.

• Any column besides the id that is not declared as an attribute is treated as a full-text field (here, title)

Page 28

Index Definition

Page 29

Index Definition

• An index includes one or more sources.

• Each source gets its own “source” line

• Multiple sources must all define the same fields and attributes.

• The ids need to be unique across sources

Page 30

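Index Definition

• path is not actually a directory; it’s a path ending in a filename with no extension (Sphinx adds the extensions for each index file).

• docinfo dictates whether attributes are stored inline with the index data or outside of it, in a separate file (extern).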

• dict is not really important now. It used to be either crc or keywords; crc is now deprecated.

• min_word_len is the minimum length of words to index

Page 31

Rest of the Index Configuration

Page 32

It’s time to build the index

indexer <index name>

Page 33

Searching the Index

• searchd is the daemon that searches the index

• Binary protocol

OR

• MySQL compatible too!

Page 34

searchd config

Included in the same config file as the rest

Page 35

Spinning up searchd

Page 36

“I know MySQL”

–Sphinx

Page 37

MySQL Compatible

Page 38

MySQL Compatible

• Tables == Indexes

• SHOW TABLES… shows indexes.

• SELECT * FROM <index> works too.
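Because searchd speaks the MySQL protocol, plain PDO is enough to connect. Below is a minimal sketch with hypothetical function names.

It assumes searchd has a MySQL listener such as `listen = 9306:mysql41`. Port 9306 is the customary SphinxQL port; use whatever your searchd section says. It also assumes searchd doesn't check MySQL credentials, so they are left empty.

```php
<?php
// Build the PDO DSN for searchd's MySQL-protocol listener. Use an IP
// address: with host=localhost, PDO's mysql driver tries a Unix socket
// instead of TCP.
function sphinxDsn(string $host = '127.0.0.1', int $port = 9306): string
{
    return sprintf('mysql:host=%s;port=%d', $host, $port);
}

// Connect to searchd with plain PDO. Empty credentials: assumes searchd
// does not authenticate MySQL-protocol clients.
function connectSphinx(string $host = '127.0.0.1', int $port = 9306): PDO
{
    return new PDO(sphinxDsn($host, $port), '', '', [
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
        // Keep placeholders client-side (the mysql driver's default); this
        // assumes searchd does not support server-side prepared statements.
        PDO::ATTR_EMULATE_PREPARES => true,
    ]);
}

// Usage once searchd is running; indexes show up as tables:
// $sphinx = connectSphinx();
// print_r($sphinx->query('SHOW TABLES')->fetchAll(PDO::FETCH_ASSOC));
```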

Page 39

Selecting from an index

Page 40

Querying Indexes

• Default limit of 20 rows

• Notice the text fields are not returned…

• They would be if we made them attributes (sql_field_string)

Page 41

Querying Indexes

• The magic function in SphinxQL is match()

• match() performs a full-text search against all fields in the index… usually

• The '@field' operator restricts the search to a specific field.
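User input passed to match() is parsed with Sphinx's query syntax, so characters like `-`, `@` or `|` act as operators. Here is a sketch of an escaping helper, modeled on the `EscapeString()` function in the `sphinxapi.php` client. It comes with a hypothetical `fieldQuery()` that uses `@field` to limit the search to one field.

```php
<?php
// Escape characters that Sphinx's extended query syntax treats as
// operators (modeled on EscapeString() in sphinxapi.php).
// The backslash must be replaced first so later escapes aren't doubled.
function escapeMatch(string $text): string
{
    $from = ['\\', '(', ')', '|', '-', '!', '@', '~', '"', '&', '/', '^', '$', '='];
    $to   = ['\\\\', '\\(', '\\)', '\\|', '\\-', '\\!', '\\@', '\\~', '\\"', '\\&', '\\/', '\\^', '\\$', '\\='];
    return str_replace($from, $to, $text);
}

// Restrict an escaped user query to a single full-text field,
// e.g. fieldQuery('title', 'php') returns '@title php'.
// $field is assumed to be trusted (not user input).
function fieldQuery(string $field, string $userInput): string
{
    return '@' . $field . ' ' . escapeMatch($userInput);
}
```

Pass the result as a bound parameter (for example `WHERE MATCH(:q)`) so PDO also quotes it as an SQL string literal.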

Page 42

Querying Indexes

• You can query against attributes

• You can sort results

• You can use the weight() function to determine relevancy.
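Putting those three together, here is a hedged sketch of a relevance-sorted SphinxQL query with an attribute filter, run through PDO. The index name `wiki` and the `created_ts` attribute are assumptions. The sketch relies on PDO's emulated prepares (the mysql driver's default).

```php
<?php
// Relevance-sorted search with an attribute filter. The index `wiki`
// and the `created_ts` timestamp attribute are hypothetical.
const SPHINX_SEARCH_SQL =
    'SELECT id, WEIGHT() AS relevance FROM wiki'
    . ' WHERE MATCH(:match) AND created_ts >= :since'
    . ' ORDER BY relevance DESC LIMIT :lim';

function searchSphinx(PDO $sphinx, string $match, int $since, int $limit = 100): array
{
    $stmt = $sphinx->prepare(SPHINX_SEARCH_SQL);
    $stmt->bindValue(':match', $match, PDO::PARAM_STR);
    // Bind integers as PARAM_INT so emulated prepares emit bare numbers,
    // which LIMIT and numeric attribute filters expect.
    $stmt->bindValue(':since', $since, PDO::PARAM_INT);
    $stmt->bindValue(':lim', $limit, PDO::PARAM_INT);
    $stmt->execute();
    // Map of document id => relevance, in relevance order.
    return array_map('intval', $stmt->fetchAll(PDO::FETCH_KEY_PAIR));
}
```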

Page 43

Querying Indexes

• The title with id 25387283 ranked as more relevant because it matched the term “testing”

Page 44

Getting PHP into the mix

• All we need? PDO.

• We will build a basic search page

• It accepts a query and displays up to 100 matching results, ordered by relevance, with the matching keywords highlighted.

Page 45
Page 46

Pulling data from Sphinx

Page 47

Fetching the data from MySQL
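Since the text fields are not stored in the index, the ids from Sphinx still have to be turned into rows from MySQL. Here is a minimal sketch of that step; the table and function names are hypothetical. `IN (...)` returns rows in whatever order MySQL chooses, so the rows are put back into Sphinx's relevance order in PHP.

```php
<?php
// Build "?, ?, ?" for a given number of ids.
function placeholders(int $count): string
{
    return implode(', ', array_fill(0, $count, '?'));
}

// Reorder rows (each with an 'id' key) to follow $ids; ids with no
// matching row are skipped.
function orderByIds(array $ids, array $rows): array
{
    $byId = array_column($rows, null, 'id');
    $ordered = [];
    foreach ($ids as $id) {
        if (isset($byId[$id])) {
            $ordered[] = $byId[$id];
        }
    }
    return $ordered;
}

// Hypothetical `wiki_titles` table; $ids are in relevance order from Sphinx.
function fetchTitles(PDO $db, array $ids): array
{
    if ($ids === []) {
        return [];
    }
    $stmt = $db->prepare(
        'SELECT id, title FROM wiki_titles WHERE id IN (' . placeholders(count($ids)) . ')'
    );
    $stmt->execute(array_values($ids));
    return orderByIds($ids, $stmt->fetchAll(PDO::FETCH_ASSOC));
}
```

If the Sphinx query returns an id => weight map, the ids are simply its `array_keys()`. MySQL's `ORDER BY FIELD(id, ...)` is an alternative to re-sorting in PHP.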

Page 48

Adding the fancy yellow highlighting
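Here is one simple way to get the highlighting, sketched in PHP with hypothetical function names. It escapes the title for HTML, then wraps each query keyword in `<mark>`, which browsers render with a yellow background by default.

Sphinx also has a built-in excerpt builder (`CALL SNIPPETS` in SphinxQL, `BuildExcerpts()` in the API) that uses the index's own settings. This sketch does not use it.

```php
<?php
// Split a search query into plain keywords (letters, digits, underscore).
function keywords(string $query): array
{
    return preg_split('/[^\p{L}\p{N}_]+/u', $query, -1, PREG_SPLIT_NO_EMPTY);
}

// HTML-escape $text, then wrap whole-word, case-insensitive keyword
// matches in <mark>. Caveats of this simple sketch: no stemming ("tests"
// won't highlight "testing"), and a keyword like "amp" can match inside
// an escaped HTML entity.
function highlight(string $text, array $keywords): string
{
    $safe = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
    $terms = array_filter($keywords, fn(string $k): bool => $k !== '');
    if ($terms === []) {
        return $safe;
    }
    $alternatives = array_map(
        fn(string $k): string => preg_quote(htmlspecialchars($k, ENT_QUOTES, 'UTF-8'), '/'),
        $terms
    );
    $pattern = '/\b(' . implode('|', $alternatives) . ')\b/iu';
    return preg_replace($pattern, '<mark>$1</mark>', $safe);
}
```

For example, `highlight('Unit testing in PHP', keywords('testing'))` returns `Unit <mark>testing</mark> in PHP`.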

Page 49

The rest is pretty basic…

Page 50

Cool things we would talk about if I had like…3 more hours

• Auto-suggest, Auto-correct

• More on lemmatization and stemming

• Distributed Sphinx Clustering

• Delta indexes

• Real Time Indexes

• The plethora of operators you can use

• Ranged Queries

• ………

Page 51

Additional Information

• The Sphinx documentation is actually pretty great

• http://sphinxsearch.com/docs/

• Slides are already on SlideShare

• I will link them from the meetup page shortly

Page 52

Questions?


Recommended