Using Sphinx for Search Mike Lively Slickdeals, LLC

Using Sphinx for Search in PHP

DESCRIPTION

This is an intro to Sphinx and PHP. It takes you through the very basics of how Sphinx works, how to set up an index, and how to use the mysql client to search that index. It culminates in a quick little PHP script that builds a small search interface around your index. I will be posting the example code to my GitHub account soon. This presentation was given to the LV PHP meetup on August 5th.

Page 1: Using Sphinx for Search in PHP

Using Sphinx for Search

Mike Lively Slickdeals, LLC

Page 2: Using Sphinx for Search in PHP

What is Sphinx?

• A full-text search engine

• Quickly get high quality (relevant) results

• Designed to integrate well with SQL RDBMS

• Can work with any data source

• Can be queried using either an API or SQL

Page 3: Using Sphinx for Search in PHP

How do I know anything about Sphinx?

• Manager of Software Architecture for Slickdeals.net

• Alexa top 150 site (in the US)

• Have been working at improving our Sphinx search engine for the last 2 months or so.

• Over 7 million searches a month go directly through the interface; many more happen indirectly.

Page 4: Using Sphinx for Search in PHP

When should I use Sphinx?

• Site / Product / Document searches

• Auto-suggest / Auto-Correct functionality

• Finding relevant and related items

Page 5: Using Sphinx for Search in PHP

Simple Architecture

• Often, search is offloaded straight to the database

• Search goes to the backend which performs queries on the database

• Obviously very easy to implement

Page 6: Using Sphinx for Search in PHP

Simple Architecture

• Simple “starts with” searches on indexed fields can sometimes work: `city` LIKE 'Las%'

• Anything else cannot use the index and has to scan the table; with MyISAM, a long-running read like that blocks writes to the table.

• MySQL is not a great or flexible full text engine

• It can sometimes be adequate

Page 7: Using Sphinx for Search in PHP

Sphinx Architecture

• Searchd is responsible for receiving requests from clients and executing the searches against the sphinx index.

• Indexer is responsible for getting data into the sphinx index.

• This separation allows indexing and searching to be scaled separately.

Page 8: Using Sphinx for Search in PHP

Sphinx Architecture

• Searchd has a binary protocol for which there are several clients available in multiple languages.

• Searchd also speaks MySQL’s wire protocol (the version in use since MySQL 4.1), so ordinary MySQL clients can connect to it.

• Searchd is a daemon that runs on your search servers

Page 9: Using Sphinx for Search in PHP

Sphinx Architecture

• Indexer is a shell program that you can execute to build any number of indexes.

• Can handle index rotation for live indexing

Page 10: Using Sphinx for Search in PHP

Not So Quick Side Note

MySQL IS SLOWWWWWWWWWWWWW

(at text matches)

Page 11: Using Sphinx for Search in PHP

Still Not Quick Side Note

Indexes won’t help you…

Page 12: Using Sphinx for Search in PHP

Quicker Side Note

Full Text Search isn’t so bad

IF….

Page 13: Using Sphinx for Search in PHP

Sphinx Concepts

• Sphinx Indexes “Documents”

• Each document has a unique unsigned, non-zero integer ID (either 32 bit or 64 bit space)

• Each document has one or more fields

• Each document has zero or more attributes

Page 14: Using Sphinx for Search in PHP

Indexes / Sources

• Sphinx indexes are created from one or more sources.

• The source can be a database, XML, or TSV stream.

• You can use multiple sources

• This is useful for maintaining updated indexes

• Also used to implement a sphinx cluster

Page 15: Using Sphinx for Search in PHP

Sphinx Fields

• Fields are what make up the full text index.

• When searching you can search against any number of fields.

• You can assign different relevancy weights to different fields.

• The original value of a field is never stored by Sphinx.

• You should always have at least one.

Page 16: Using Sphinx for Search in PHP

Sphinx Attributes

• Data that helps further describe the item being indexed

• Can be returned as a part of the search

• Useful for filtering and sorting results

• These are not a part of the full text index.
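The difference between fields and attributes can be sketched with a toy inverted index. This is only an illustration of the concepts above, not how Sphinx is implemented; the class and method names are made up for the example. Field text is broken into words and then discarded, while attributes are stored as-is next to the document ID.

```python
import re
from collections import defaultdict


class ToyIndex:
    """Toy model of fields vs. attributes (hypothetical, not Sphinx's code)."""

    def __init__(self):
        self.postings = defaultdict(set)  # (field name, word) -> document IDs
        self.attributes = {}              # document ID -> stored attribute values

    def add(self, doc_id, fields, attributes=None):
        # Document IDs must be unique, unsigned, non-zero integers.
        if not isinstance(doc_id, int) or doc_id <= 0:
            raise ValueError("document ID must be a non-zero unsigned integer")
        if doc_id in self.attributes:
            raise ValueError("document ID must be unique")
        # Fields feed the full text index; the original text is not kept.
        for name, text in fields.items():
            for word in re.findall(r"\w+", text.lower()):
                self.postings[(name, word)].add(doc_id)
        # Attributes are stored verbatim so they can be returned, filtered, sorted.
        self.attributes[doc_id] = dict(attributes or {})

    def search(self, word, field=None):
        word = word.lower()
        hits = set()
        for (name, indexed_word), ids in self.postings.items():
            if indexed_word == word and (field is None or name == field):
                hits |= ids
        return sorted(hits)
```

A search returns IDs and attributes only; to show the title itself you have to look it up somewhere else, because the index never kept it.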

Page 17: Using Sphinx for Search in PHP

MySQL Full Text Search

• You can get away with MyISAM tables or, as of MySQL 5.6, InnoDB.

• You don’t care about morphology (think plurals)

• You don’t need anything but the most basic of search operators

Page 18: Using Sphinx for Search in PHP

Creating An Index

• We are going to add an index that sources a mysql database.

• The data being sourced is a list of Wikipedia article titles.

Page 19: Using Sphinx for Search in PHP

Creating An Index

Page 20: Using Sphinx for Search in PHP

Indexer Configuration

• We are going to be peeking into a sphinx configuration file now.

• You can rebuild the config file by concatenating each section into a single file.

• On my VM this file is located in /usr/local/etc/sphinx.conf

Page 21: Using Sphinx for Search in PHP

Source Definition

Page 22: Using Sphinx for Search in PHP

Source Definition

Defines the connection information

Page 23: Using Sphinx for Search in PHP

Connection information

• Ideally, you should create a separate account for sphinx

• You can also connect via unix socket

• I didn’t specify it here, but you can also add a port.

Page 24: Using Sphinx for Search in PHP

Source Definition

The query that pulls data to populate the index

Page 25: Using Sphinx for Search in PHP

Source Index

• The index query MUST return the id field as the first column

• Remember, the id needs to be a unique, unsigned integer of 64 bits or fewer

• The query must be on a single line, unless you escape the newlines with backslashes.

• Notice that we converted the timestamp into a unix timestamp. That is important.

Page 26: Using Sphinx for Search in PHP

Source Definition

How data is stored in the index

Page 27: Using Sphinx for Search in PHP

Source Fields

• The first column in the query is always the ID.

• You specify any columns that are attributes.

• Remember, attributes are values stored with the index that you can filter and sort by.

• Any column besides the id that is not specified as an attribute is assumed to be a text field (title)
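The column rules above can be written down as a few lines of Python. This is a sketch of the rule only, not of anything Sphinx ships; the function name and the column names in the usage note are hypothetical.

```python
def classify_columns(columns, attribute_names):
    """Split the columns of an index query into ID, attributes, and fields.

    columns: column names in the order the query returns them.
    attribute_names: the columns declared as attributes in the source.
    """
    if not columns:
        raise ValueError("the query must return at least the ID column")
    # Rule 1: the first column is always the document ID.
    doc_id, rest = columns[0], columns[1:]
    # Rule 2: columns declared as attributes are stored as attributes.
    attributes = [name for name in rest if name in attribute_names]
    # Rule 3: everything else becomes a full text field.
    fields = [name for name in rest if name not in attribute_names]
    return {"id": doc_id, "attributes": attributes, "fields": fields}
```

For a query returning `id, title, updated_at` with `updated_at` declared as an attribute, `title` is the only column left over, so it becomes the full text field.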

Page 28: Using Sphinx for Search in PHP

Index Definition

Page 29: Using Sphinx for Search in PHP

Index Definition

• An index includes one or more sources.

• Each source gets its own “source” line

• Multiple sources must all define the same fields and attributes.

• The ids need to be unique across sources

Page 30: Using Sphinx for Search in PHP

Index Definition

• path is not actually a path; it’s a filename with no extension.

• docinfo dictates if attributes are stored in the index or outside of the index.

• dict is not really important now. It used to be either crc or keywords, but crc is now deprecated.

• min_word_len is the minimum length of words to index

Page 31: Using Sphinx for Search in PHP

Rest of the Index Configuration

Page 32: Using Sphinx for Search in PHP

It’s time to build the index

indexer <index name>
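If you want to drive the build from a script, the command line can be assembled like this. It is a minimal sketch: the helper name is made up, and the index name and config path in the usage note are placeholders. The `--config`, `--rotate`, and `--all` options are standard indexer options.

```python
def indexer_command(index_name=None, config_path=None, rotate=False):
    """Build the argument list for an indexer run (hypothetical helper)."""
    args = ["indexer"]
    if config_path:
        # Point indexer at a specific config file instead of the default one.
        args += ["--config", config_path]
    if rotate:
        # Needed when searchd is already serving the index being rebuilt.
        args.append("--rotate")
    # Build one named index, or every index in the config with --all.
    args.append(index_name if index_name else "--all")
    return args
```

Pass the result to `subprocess.run(args, check=True)` to actually run it, for example `indexer_command("wiki_titles", rotate=True)` for a live rebuild of a hypothetical `wiki_titles` index.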

Page 33: Using Sphinx for Search in PHP

Searching the Index

• searchd is the daemon that searches the index

• Binary Protocol

OR

• MySQL Compatible too!

Page 34: Using Sphinx for Search in PHP

searchd config

Included in the same config file as the rest
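Since the config file is just its sections concatenated, a complete minimal file can be sketched by rendering each section in turn. The directive names are real Sphinx settings, but every value here (index name, credentials, table, paths) is a hypothetical placeholder and not the configuration from the talk.

```python
# All names, credentials, and paths below are hypothetical placeholders.
SECTIONS = [
    ("source wiki_titles", [
        ("type", "mysql"),
        ("sql_host", "localhost"),
        ("sql_user", "sphinx"),
        ("sql_pass", "secret"),
        ("sql_db", "wikipedia"),
        # ID first, then the text field, then the timestamp as a unix timestamp.
        ("sql_query",
         "SELECT id, title, UNIX_TIMESTAMP(updated_at) AS updated_at FROM titles"),
        ("sql_attr_timestamp", "updated_at"),
    ]),
    ("index wiki_titles", [
        ("source", "wiki_titles"),
        ("path", "/var/lib/sphinx/wiki_titles"),  # filename without extension
        ("docinfo", "extern"),
        ("dict", "keywords"),
        ("min_word_len", "2"),
    ]),
    ("searchd", [
        ("listen", "9312"),          # binary protocol
        ("listen", "9306:mysql41"),  # MySQL-compatible protocol
        ("log", "/var/log/sphinx/searchd.log"),
        ("pid_file", "/var/run/sphinx/searchd.pid"),
    ]),
]


def render_section(header, settings):
    """Render one 'name { key = value ... }' block."""
    lines = [header, "{"]
    lines += [f"    {key} = {value}" for key, value in settings]
    lines.append("}")
    return "\n".join(lines)


def render_config(sections):
    """Concatenate the rendered sections into one config file body."""
    return "\n\n".join(render_section(h, s) for h, s in sections) + "\n"
```

Printing `render_config(SECTIONS)` gives a file with one source, one index, and one searchd section, with searchd listening on both protocols.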

Page 35: Using Sphinx for Search in PHP

Spinning up searchd

Page 36: Using Sphinx for Search in PHP

“I know MySQL”

–Sphinx

Page 37: Using Sphinx for Search in PHP

MySQL Compatible

Page 38: Using Sphinx for Search in PHP

MySQL Compatible

• Tables == Indexes

• SHOW TABLES shows indexes.

• SELECT * FROM <index> works too.

Page 39: Using Sphinx for Search in PHP

Selecting from an index

Page 40: Using Sphinx for Search in PHP

Querying Indexes

• Default limit of 20 rows

• Notice the text fields are not returned…

• They would be if we made them attributes (sql_field_string)

Page 41: Using Sphinx for Search in PHP

Querying Indexes

• The magic function in SphinxQL is match()

• match() performs a full text search against the entire index…usually

• The ‘@field’ operator can isolate which field is searched on.
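User input usually needs escaping before it goes into match(), because characters such as `@` and `(` are query operators. The sketch below builds the text for match(), optionally limited to one field. The helper name is made up, and the list of special characters is an assumption based on what the bundled API clients escape; check it against the documentation for your Sphinx version.

```python
import re

# Assumed set of characters with special meaning in the extended query syntax.
SPECIAL_CHARS = r'\()|-!@~"&/^$=<'


def match_expression(user_text, field=None):
    """Return the string to pass to MATCH(), optionally limited to one field."""
    escaped = "".join(
        "\\" + ch if ch in SPECIAL_CHARS else ch for ch in user_text
    )
    if field is None:
        return escaped  # searches every field in the index
    if not re.fullmatch(r"\w+", field):
        raise ValueError("field name must be a plain identifier")
    return f"@{field} {escaped}"
```

`match_expression("testing", field="title")` produces `@title testing`, which restricts the search to the title field. Bind the result as a query parameter so the client library handles the SQL quoting.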

Page 42: Using Sphinx for Search in PHP

Querying Indexes

• You can query against attributes

• You can sort results

• You can use the weight() function to determine relevancy.
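Filtering on an attribute, sorting by relevancy, and raising the row limit can be combined in one statement. This sketch only assembles the SQL text and its parameters. The index name `wiki_titles` and attribute `updated_at` in the usage note are hypothetical, and the `%s` placeholders assume a Python MySQL driver that uses that parameter style.

```python
import re


def build_search(index, match_expr, min_updated_at=None, limit=20):
    """Assemble a SphinxQL statement and its bound parameters (a sketch)."""
    if not re.fullmatch(r"\w+", index):
        raise ValueError("index name must be a plain identifier")
    # WEIGHT() exposes the relevancy score; alias it so we can sort on it.
    sql = f"SELECT id, WEIGHT() AS w FROM {index} WHERE MATCH(%s)"
    params = [match_expr]
    if min_updated_at is not None:
        # Attribute filter: only documents updated at or after this timestamp.
        sql += " AND updated_at >= %s"
        params.append(int(min_updated_at))
    # Without an explicit LIMIT only the default 20 rows come back.
    sql += f" ORDER BY w DESC LIMIT {int(limit)}"
    return sql, params
```

`build_search("wiki_titles", "@title testing", limit=100)` gives the query for a search page that shows up to 100 results, most relevant first.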

Page 43: Using Sphinx for Search in PHP

Querying Indexes

• The 25387283 title was more relevant because it matched on the term “testing”

Page 44: Using Sphinx for Search in PHP

Getting PHP into the mix

• All we need? PDO.

• We will build a basic search page

• Accepts a query, displays up to 100 matching results by relevancy with the matching keywords highlighted.

Page 46: Using Sphinx for Search in PHP

Pulling data from Sphinx

Page 47: Using Sphinx for Search in PHP

Fetching the data from MySQL
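Because Sphinx does not return the text fields, the IDs it returns have to be looked up in the database, and the rows put back into relevancy order, since an IN (...) lookup does not preserve it. The talk does this in PHP against MySQL; the sketch below shows the same idea in Python, with sqlite3 standing in for MySQL so it runs anywhere. The `titles` table and its columns are hypothetical.

```python
import sqlite3


def fetch_in_rank_order(conn, ranked_ids):
    """Fetch (id, title) rows for the given IDs, keeping the search ranking."""
    if not ranked_ids:
        return []
    # One placeholder per ID, so the values are bound rather than interpolated.
    placeholders = ",".join("?" for _ in ranked_ids)
    rows = conn.execute(
        f"SELECT id, title FROM titles WHERE id IN ({placeholders})",
        list(ranked_ids),
    ).fetchall()
    # The database returns rows in its own order; restore the ranking.
    by_id = {row[0]: row for row in rows}
    # Skip IDs that are in the index but no longer in the database.
    return [by_id[doc_id] for doc_id in ranked_ids if doc_id in by_id]
```

Skipping missing IDs matters in practice, because the index can lag behind the database between indexer runs.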

Page 48: Using Sphinx for Search in PHP

Adding the fancy yellow highlighting
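One simple way to do the highlighting in application code is to wrap each matching keyword in a tag and style that tag yellow. The sketch below is an assumption about the approach, not the code from the talk; the function name and the choice of the `<mark>` tag are illustrative. The title is HTML-escaped piece by piece so the highlighting cannot introduce markup from the data.

```python
import html
import re


def highlight(text, keywords):
    """Wrap case-insensitive keyword matches in <mark> tags, escaping the rest."""
    # Longest keywords first so "testing" wins over "test" where both match.
    words = sorted((re.escape(k) for k in keywords if k), key=len, reverse=True)
    if not words:
        return html.escape(text)
    pattern = re.compile("(" + "|".join(words) + ")", re.IGNORECASE)
    # With one capture group, split() alternates: plain text, match, plain text...
    parts = pattern.split(text)
    out = []
    for position, part in enumerate(parts):
        escaped = html.escape(part)
        out.append(f"<mark>{escaped}</mark>" if position % 2 == 1 else escaped)
    return "".join(out)
```

Browsers render `<mark>` with a yellow background by default, which gives the effect described on the slide.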

Page 49: Using Sphinx for Search in PHP

The rest is pretty basic…

Page 50: Using Sphinx for Search in PHP

Cool things we would talk about if I had like…3 more hours

• Auto-suggest, Auto-correct

• More on lemmatization and stemming

• Distributed Sphinx Clustering

• Delta indexes

• Real Time Indexes

• The plethora of operators you can use

• Ranged Queries

• ………

Page 51: Using Sphinx for Search in PHP

Additional Information

• The sphinx documentation is actually pretty great

• http://sphinxsearch.com/docs/

• Slides are already on Slideshare

• Will link them to the meet up shortly

Page 52: Using Sphinx for Search in PHP

Questions?