Installing Sphinx Search Server

I’ve always used MySQL fulltext indexes with MATCH queries for keyword searches, but I’ve never been happy with the results or the lack of configuration choices. Word interpretation is limited, and boolean searches were useless when visitors didn’t know how to use search operators. Fulltext indexes also require the MyISAM storage engine, yet we generally prefer InnoDB because of performance and foreign key constraints.

For a recent project I decided to finally (long overdue, I know) make the transition to an external search and indexing application called Sphinx.

Sphinx is a separate server that indexes content from a variety of sources (in our case, MySQL) and provides an API that allows you to search the content more effectively.

If you’re using it with MySQL, you need to ensure that the mysql-devel package has been installed. Since this project was on CentOS, this is a simple yum call:

$ sudo yum install mysql-devel

Once that’s installed you’re ready to install Sphinx:

$ wget http://sphinxsearch.com/files/sphinx-1.10-beta.tar.gz
$ tar xzvf sphinx-1.10-beta.tar.gz
$ cd sphinx*
$ ./configure --prefix=/usr/local/sphinx
$ make
$ sudo make install
$ sudo mkdir -p /var/data/sphinx
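
Before moving on, it can help to confirm that the build actually installed the three binaries used in the rest of this post. This is only a minimal sketch, assuming the --prefix from the configure step above. The function name missing_sphinx_binaries is made up for illustration and is not part of Sphinx.

```shell
#!/bin/sh
# Print the name of each expected Sphinx binary that is missing
# (or not executable) under the given install prefix.
# missing_sphinx_binaries is a hypothetical helper, not a Sphinx tool.
missing_sphinx_binaries() {
    prefix=$1
    for name in indexer search searchd; do
        if [ ! -x "$prefix/bin/$name" ]; then
            printf '%s\n' "$name"
        fi
    done
}

# Prints nothing when all three binaries are in place.
missing_sphinx_binaries /usr/local/sphinx
```

If anything is listed, the make install step most likely failed or used a different prefix.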

Once installed, you need to create a configuration file that will dictate where the data is indexed from, and how the search server will behave.

$ vi /usr/local/etc/sphinx.conf

Sphinx needs to know two things: where to find the data for indexing, and how to index it. The source block points to a database or other content source; in this case, our MySQL database.

source name_your_source
{
    type                            = mysql
    sql_host                        = 127.0.0.1
    sql_user                        = mysql_user
    sql_pass                        = mysql_pass
    sql_db                          = mysql_database
    sql_sock                        = /var/run/mysqld/mysqld.sock
    sql_port                        = 3306

    # indexer query
    # document_id MUST be the very first field
    # document_id MUST be positive (non-zero, non-negative)
    # document_id MUST fit into 32 bits
    # document_id MUST be unique
    sql_query = SELECT id, field1, field2 FROM ourtable

    # document info query
    # ONLY used by search utility to display document information
    # MUST be able to fetch document info by its id, therefore
    # MUST contain '$id' macro
    sql_query_info  = SELECT * FROM ourtable WHERE id=$id
}
index name_your_index
{
    source                  = name_your_source
    path                    = /var/data/sphinx/name_your_source
    morphology              = stem_en
    min_word_len            = 3
    min_prefix_len          = 0
#    min_infix_len           = 3
}
searchd
{
    port                    = 3312
    log                     = /var/log/searchd/searchd.log
    query_log               = /var/log/searchd/query.log
    pid_file                = /var/log/searchd/searchd.pid
}

The first few lines set up the database connection credentials. The sql_query statement pulls in the data for indexing, so everything you need indexed must be selected by this query. It’s also possible to pull fields that won’t be full-text indexed but will be used for filtering the results. These are called attributes.
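
To make the attribute idea concrete, here is a sketch of how the source block might declare them. The columns category_id and created_at are hypothetical and not part of the table used in this post. sql_attr_uint and sql_attr_timestamp are the Sphinx directives for unsigned integer and UNIX timestamp attributes.

```
source name_your_source
{
    # ... connection settings as in the source block of this post ...

    # category_id and created_at are assumed columns, for illustration only
    sql_query = \
        SELECT id, field1, field2, category_id, \
            UNIX_TIMESTAMP(created_at) AS created_ts \
        FROM ourtable

    # declared attributes are stored for filtering and sorting,
    # not full-text indexed
    sql_attr_uint      = category_id
    sql_attr_timestamp = created_ts
}
```

With something like this in place, a search could be restricted to a given category or date range without those values being matched as keywords.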

The index component configures the index itself and how searches will be processed. Features like the minimum word length and morphology (how variations of a word are matched) are defined here.

The searchd component sets options for the daemon itself: the port it listens on and where it writes its log, query log, and PID file.

Once configured, run the following command and Sphinx will build all of the indexes:

$ /usr/local/sphinx/bin/indexer --config /usr/local/etc/sphinx.conf --all 

If the configuration file is loaded properly and the database connection/queries work, you will see the indexer output:

using config file '/usr/local/etc/sphinx.conf'...
indexing index 'name_your_index'...
collected 1421 docs, 0.1 MB
sorted 0.0 Mhits, 100.0% done
total 1421 docs, 75457 bytes
total 0.188 sec, 400587 bytes/sec, 7543.82 docs/sec
total 2 reads, 0.000 sec, 22.6 kb/call avg, 0.0 msec/call avg
total 6 writes, 0.000 sec, 17.1 kb/call avg, 0.0 msec/call avg
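
If you run the indexer from a script, the summary line is a convenient sanity check that rows were actually picked up. This is a rough sketch, assuming the output format shown above, which may differ between Sphinx versions. indexed_doc_count is a hypothetical helper name.

```shell
#!/bin/sh
# Read indexer output on stdin and print the document count from each
# "total N docs, M bytes" summary line.
# indexed_doc_count is a hypothetical helper, not part of Sphinx.
indexed_doc_count() {
    awk '/^total [0-9]+ docs/ { print $2 }'
}

# Example using two lines copied from the indexer output in this post.
printf '%s\n' \
    "total 1421 docs, 75457 bytes" \
    "total 0.188 sec, 400587 bytes/sec, 7543.82 docs/sec" \
    | indexed_doc_count
# prints 1421
```

In practice you would pipe the real indexer command into the function instead of the sample lines, and treat a count of 0 as a sign that sql_query returned nothing.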

Once the index is complete, you can run a test search directly without starting the daemon:

$ /usr/local/sphinx/bin/search --config /usr/local/etc/sphinx.conf searchtermhere

You will see a dump of the matching documents. Once the search is working, it’s time to start the daemon. First, create a directory for the log files we asked for in the config:

$ sudo mkdir -p /var/log/searchd

Then, start the daemon:

$ /usr/local/sphinx/bin/searchd --config /usr/local/etc/sphinx.conf
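
To confirm the daemon actually came up, one option is to check the PID file named in the searchd block. This is a minimal sketch, assuming the pid_file path from the config above. searchd_running is a hypothetical helper name.

```shell
#!/bin/sh
# Succeed if the PID file exists and names a live process.
# searchd_running is a hypothetical helper, not part of Sphinx.
searchd_running() {
    pid_file=$1
    [ -f "$pid_file" ] || return 1
    pid=$(cat "$pid_file")
    [ -n "$pid" ] || return 1
    # kill -0 sends no signal; it only tests whether the process can be signalled
    kill -0 "$pid" 2>/dev/null
}

if searchd_running /var/log/searchd/searchd.pid; then
    echo "searchd is running"
else
    echo "searchd is not running"
fi
```

Note that kill -0 also fails when the process belongs to another user, so run the check as root or as the user that started searchd. If the daemon is not running, the searchd.log file configured above is the first place to look.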

Keep watching for more posts. I’ll discuss how to work with the search from PHP, how to run incremental index updates, and more.
