Introducing high performance photo gallery Remigijus Kiminas 2010-12-25 v6

Hppg r819 gallery presentation, search by color introduced




Who am I?

Author of

http://livehelperchat.com/

http://redmine.remdex.info (my projects :)

Currently working at

http://www.coralsolutions.com/

Freelancing and building open-source software in my free time


Purpose of the presentation 1

Present some architecture decisions that were applied while building the image gallery


What's new since last presentation

Support for mobile devices

The image gallery can be used as a shopping CMS

Credit-based buying

Checkout using the PayPal service

Uncached pages got a speed improvement after a bug in the paginator was found

Official nginx support


What's new since last presentation 2

Extensions

Kernel modules override

Kernel classes override

CSS compilation

Most popular images in 24 hours

Photo approval functionality

Image filtering by resolution


What's new since last presentation 3

Thumbnail recreation script

100% accurate duplicate management

More configurable system aspects, such as:

Max upload photo size

Max archive size

Max file queue size

Animated gif support


What's new since last presentation 4

Animated gif support

Completely fixed AJAX navigation usability: no more confusion about which images are available to the left or to the right.

Front-end design remake, thanks to http://pauliusc.lt

HTML output compression

HTML5 front-end changes that save bandwidth


What's new since last presentation 5

Some performance improvements regarding user permission settings

More things moved to the Memcached service


What's new since last presentation 5 V4

Sort by relevance was introduced

AddQuery (batched Sphinx queries) used in search

Refactored search page; it now runs one query fewer.

Paginator updates

Sphinx wildcard support

Script for deleting images without originals

SEO enhancements related to resolution and the user's current page


What's new since last presentation 5 V5

Refactored captcha: it is now AJAX/JavaScript based, performs well, and saves one request in the image preview window

Full image preview window cache! Cached windows are as fast as cached pagination, around 5 ms

Image view counter read from a log file, avoiding an INSERT on each image preview window
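A minimal sketch of the log-based counter idea, in Python for illustration. The log format (one image ID per line), batching the views into one UPDATE per image, and the `%s` placeholder style of MySQL DB-API drivers are all assumptions, not HPPG code. Only the table and column names (`lh_gallery_images`, `pid`, `hits`) come from elsewhere in these slides.

```python
from collections import Counter
from typing import Iterable

# Assumed statement; %s is the placeholder style of MySQL DB-API drivers.
UPDATE_SQL = "UPDATE lh_gallery_images SET hits = hits + %s WHERE pid = %s"


def aggregate_views(log_lines: Iterable[str]) -> Counter:
    """Count views per image ID from a log with one pid per line (hypothetical format)."""
    counts: Counter = Counter()
    for line in log_lines:
        line = line.strip()
        if line.isdigit():  # skip blank or malformed lines
            counts[int(line)] += 1
    return counts


def build_updates(counts: Counter) -> list:
    """One parameterised UPDATE per image instead of one INSERT per page view."""
    return [(UPDATE_SQL, (views, pid)) for pid, views in sorted(counts.items())]


if __name__ == "__main__":
    log = ["16679", "16679", "42", "", "16679"]
    for statement, params in build_updates(aggregate_views(log)):
        print(statement, params)
```

Run periodically (e.g. from cron), this turns thousands of per-view writes into a handful of updates.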


What's new since last presentation 5 V5

Last rated functionality

Cache status window

Recently top rated, in the last 24 hours

APC support as a cache engine

HTML5, SWF and FLV file support

Search suggest feature


What's new since last presentation 5 V5

MySQL query hint for album pagination, because the MySQL planner chose the wrong indexes

Smart selects in image preview window

Full multilanguage support, including translatable module URLs! None of the galleries/CMSs I know has this feature. E.g. gallery/search (English) or gallerie/recherche (French)

Full InnoDB support; it performs as well as MyISAM. The top process is PHP, not MySQL :)
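Translatable module URLs can be pictured as a per-language lookup table used in both directions: to generate links and to resolve an incoming path. The sketch below assumes a hypothetical `TRANSLATIONS` table and helper names; it is not the HPPG router, and only the two example URLs come from the slide.

```python
from typing import Dict, Optional, Tuple

# Hypothetical translation table: internal module/view name -> localized URL segment.
TRANSLATIONS: Dict[str, Dict[str, str]] = {
    "en": {"gallery": "gallery", "search": "search"},
    "fr": {"gallery": "gallerie", "search": "recherche"},
}


def build_url(lang: str, module: str, view: str) -> str:
    """Localized URL for an internal module/view pair."""
    table = TRANSLATIONS[lang]
    return f"{table[module]}/{table[view]}"


def resolve_url(path: str) -> Optional[Tuple[str, str, str]]:
    """Map a localized URL back to (language, module, view), or None if unknown."""
    parts = path.strip("/").split("/")
    if len(parts) != 2:
        return None
    for lang, table in TRANSLATIONS.items():
        # Reverse lookup: localized segment -> internal name.
        reverse = {localized: internal for internal, localized in table.items()}
        if parts[0] in reverse and parts[1] in reverse:
            return lang, reverse[parts[0]], reverse[parts[1]]
    return None
```

A real router would precompute the reverse tables once instead of rebuilding them per request.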


What's new since last presentation V6

Search by color, multicolor, and keyword at the same time.

For best performance this feature uses MySQL partitions. The biggest table has around 8M records for 270,000 images.

Multicolor search uses self inner joins. For better performance, a MEMORY table can be activated.
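The self-join approach can be sketched against the `lh_gallery_pallete_images` table from the database structure slide, using an in-memory SQLite database so it runs anywhere. One INNER JOIN is added per extra color, so only images containing all requested colors survive. Ordering by the summed pixel counts is an assumption, the sample rows are made up, and the MySQL partitioning and MEMORY table mentioned above are not modelled.

```python
import sqlite3
from typing import Sequence


def build_multicolor_query(n_colors: int) -> str:
    """One self INNER JOIN of the statistics table per additional color."""
    sql = "SELECT t0.pid FROM lh_gallery_pallete_images AS t0"
    for i in range(1, n_colors):
        sql += f" INNER JOIN lh_gallery_pallete_images AS t{i} ON t{i}.pid = t0.pid"
    conditions = " AND ".join(f"t{i}.pallete_id = ?" for i in range(n_colors))
    # Assumed ranking: images with more pixels of the requested colors come first.
    total = " + ".join(f"t{i}.count" for i in range(n_colors))
    return f"{sql} WHERE {conditions} ORDER BY ({total}) DESC, t0.pid DESC"


def make_demo_db() -> sqlite3.Connection:
    """In-memory stand-in for lh_gallery_pallete_images with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE lh_gallery_pallete_images ("
        "pid INTEGER NOT NULL, pallete_id INTEGER NOT NULL, count INTEGER NOT NULL, "
        "PRIMARY KEY (pallete_id, pid))"
    )
    rows = [(1, 10, 500), (1, 20, 300), (2, 10, 900), (3, 10, 100), (3, 20, 2000)]
    conn.executemany("INSERT INTO lh_gallery_pallete_images VALUES (?, ?, ?)", rows)
    return conn


def search_by_colors(conn: sqlite3.Connection, pallete_ids: Sequence[int]) -> list:
    """Return pids of images that contain every requested palette color."""
    query = build_multicolor_query(len(pallete_ids))
    return [row[0] for row in conn.execute(query, list(pallete_ids))]


if __name__ == "__main__":
    conn = make_demo_db()
    print(search_by_colors(conn, [10, 20]))  # prints [3, 1]
```

Each extra color adds a join, which is why the MySQL layer slows down as more color filters are combined.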

http://code.google.com/p/hppg/wiki/SearchByColor


What's new since last presentation V6

Sphinx can also be used as the search-by-color handler:

$cl->SetMatchMode(SPH_MATCH_EXTENDED2);

$cl->SetRankingMode(SPH_RANK_WORDCOUNT);

Much faster than the MySQL layer. It also takes keyword density into account. Results are almost the same as with the MySQL layer.
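Why a word-count ranker works for colors: the `colors` field repeats a `pld<id>` token once per unit of color density, so counting keyword occurrences ranks images by how much of the requested colors they contain. The sketch is a simplified Python model of SPH_RANK_WORDCOUNT (per-field keyword occurrence counts times field weight, here one field of weight 1) plus the implicit AND of the extended query syntax. It does not call Sphinx, and the sample documents are made up.

```python
from typing import Dict, List, Tuple


def wordcount_rank(docs: Dict[int, str], query: List[str]) -> List[Tuple[int, int]]:
    """Simplified model of SPH_RANK_WORDCOUNT on a single 'colors' field (weight 1).

    A document matches only if it contains every query keyword (implicit AND);
    its score is the total number of keyword occurrences in the field.
    """
    wanted = set(query)
    ranked = []
    for pid, colors in docs.items():
        tokens = colors.split()
        if not wanted.issubset(tokens):
            continue  # at least one requested color is missing
        score = sum(1 for token in tokens if token in wanted)
        ranked.append((pid, score))
    # Highest score first; pid DESC as an arbitrary tie-breaker.
    ranked.sort(key=lambda item: (-item[1], -item[0]))
    return ranked
```

Unlike the self-join query, the cost here does not grow with the number of colors, which matches the advantage claimed for the Sphinx layer.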


What's new since last presentation V6

A custom color_indexer was written using the OpenCV library. Yes, I know a little bit of C :)

It gives a 24x performance boost compared to the standard method using PHP and MySQL.

http://code.google.com/p/hppg/wiki/ColorIndexer


How does search by color work?

Some references first: http://opencv.willowgarage.com/wiki/

This library was used to write the color_indexer application.

http://mattmueller.me/blog/creating-piximilar-image-search-by-color

This is where I got my inspiration and the basic concept. However, the database structure is completely different.

http://www.compuphase.com/cmetric.htm: formula for finding the most similar color in our palette

http://en.wikipedia.org/wiki/Tag_cloud: formula for representing color density in the image Sphinx table
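The color_indexer itself is a C program built on OpenCV; the Python sketch below is not that program. It only illustrates the idea behind it, assuming each thumbnail pixel is mapped to the nearest palette color with the compuphase weighted RGB metric and matches are counted per palette entry, which is the shape of the (`pid`, `pallete_id`, `count`) rows in the statistics table. The three-color palette is made up.

```python
import math
from typing import Dict, Iterable, Tuple

Color = Tuple[int, int, int]


def color_distance(c1: Color, c2: Color) -> float:
    """Weighted RGB ("redmean") distance from compuphase.com/cmetric.htm."""
    rmean = (c1[0] + c2[0]) // 2
    r = c1[0] - c2[0]
    g = c1[1] - c2[1]
    b = c1[2] - c2[2]
    return math.sqrt((((512 + rmean) * r * r) >> 8) + 4 * g * g + (((767 - rmean) * b * b) >> 8))


def nearest_palette_id(pixel: Color, palette: Dict[int, Color]) -> int:
    """ID of the palette entry closest to the pixel."""
    return min(palette, key=lambda pal_id: color_distance(pixel, palette[pal_id]))


def color_histogram(pixels: Iterable[Color], palette: Dict[int, Color]) -> Dict[int, int]:
    """pallete_id -> number of pixels, i.e. the (pallete_id, count) pairs stored per image."""
    counts: Dict[int, int] = {}
    for pixel in pixels:
        pal_id = nearest_palette_id(pixel, palette)
        counts[pal_id] = counts.get(pal_id, 0) + 1
    return counts
```

Doing this per pixel in an interpreted language is slow, which fits the reported speedup from moving the indexer to C.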


Database structure 1

Two tables

Palette table

CREATE TABLE IF NOT EXISTS `lh_gallery_pallete` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `red` int(11) NOT NULL DEFAULT '0',
  `green` int(11) NOT NULL DEFAULT '0',
  `blue` int(11) NOT NULL DEFAULT '0',
  `position` int(11) NOT NULL DEFAULT '0',
  PRIMARY KEY (`id`),
  KEY `position` (`position`)
) ENGINE=MyISAM;

Image statistics table

CREATE TABLE IF NOT EXISTS `lh_gallery_pallete_images` (
  `pid` int(11) NOT NULL,
  `pallete_id` smallint(3) NOT NULL,
  `count` smallint(5) NOT NULL,
  PRIMARY KEY (`pallete_id`,`pid`),
  KEY `pid` (`pallete_id`,`count`,`pid`),
  KEY `pallete_id` (`pallete_id`),
  KEY `pid_2` (`pid`)
) ENGINE=MyISAM;


Database structure 2

Table for quickly fetching an image's top colors:

CREATE TABLE IF NOT EXISTS `lh_gallery_pallete_images_stats` (
  `pid` int(11) NOT NULL,
  `colors` varchar(100) NOT NULL,
  PRIMARY KEY (`pid`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;


Filling sphinx index

The Sphinx index table has a dedicated field “colors”, which is filled in the following way. 15400 is the maximum number of color matches for our 120x130 thumbnail size.

/**
 * This part was changed based on the formula from
 * http://en.wikipedia.org/wiki/Tag_cloud
 * It fits here better than just log.
 */
$max = 15400/2; // a little better distribution of colors
$min = 25;
$rmax = 50;
$rmin = 1;

$colorIndex = array();

foreach ($colorsMaximumImage as $color) {
    // How many times the 'pld<pallete_id>' token is repeated; clamped at 0 so that
    // colors with fewer than $min pixels never pass a negative count to str_repeat().
    $repeat = (int) max(0, round((($rmin * ($color['count'] - $min)) / ($max - $min)) * 100));
    $colorIndexString = trim(str_repeat(' pld' . $color['pallete_id'], $repeat));
    if ($colorIndexString != '') {
        $colorIndex[] = $colorIndexString;
    }
}
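The same token-repetition formula, ported to Python as a sketch so the numbers can be checked. It assumes PHP's round-half-up behaviour, clamps negative repeat counts to zero, and assumes the per-color strings are finally joined with spaces (the slide does not show that step).

```python
import math
from typing import Iterable, Tuple


def color_tokens(colors: Iterable[Tuple[int, int]], max_count: float = 15400 / 2,
                 min_count: int = 25, scale: int = 100) -> str:
    """Build the Sphinx 'colors' field: 'pld<id>' repeated in proportion to its pixel count.

    Mirrors the PHP formula: round((count - min) / (max - min) * 100).
    """
    parts = []
    for pallete_id, count in colors:
        # floor(x + 0.5) reproduces PHP's round-half-up for x >= 0; negatives clamp to 0.
        repeats = max(0, math.floor((count - min_count) / (max_count - min_count) * scale + 0.5))
        if repeats:
            parts.append(" ".join([f"pld{pallete_id}"] * repeats))
    return " ".join(parts)
```

For example, a color covering 7700 pixels (half the thumbnail) yields 100 tokens, while one covering 25 pixels or fewer yields none.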


Searching by color

Two options, as I wrote earlier:

Use MySQL as search engine

Advantages – activated by default; faster than Sphinx for single-color search

Disadvantages – slow when more than one color filter is used

Sphinx as search engine

Advantages – performance stays the same with one or multiple colors

Disadvantages – Sphinx must be installed; a little slower than MySQL with a single color filter.

Recommendation?

Definitely use Sphinx for color search.


Future work V5 (implemented)

Pagination sharding with an index filter shard table. It should speed up pagination of large sets by around 100% or more and keep a constant speed with millions of photos.

http://remdex.info/Optimising-mysql-limit-performance-99a.html

Backend redesign


Issues I had with previous image galleries

A lot of users = a lot of problems

No caching support

Unoptimized SQL queries

Resource hungry

No framework used (well, perhaps this is not a problem, but most of the time they just duplicate framework functionality, reinventing the wheel...)

No ETag-based caching, which saves bandwidth...


Requirements

Optimized SQL queries

Fulltext search engine

ETag-based caching

SQL query caching

Full-page caching

Low resource requirements


Adopted software

APC – opcode cache for PHP

Sphinx – free open-source SQL full-text search engine (http://sphinxsearch.com/)

Memcached – free & open source, high-performance, distributed memory object caching system (http://memcached.org/)

eZ Components – an enterprise-ready, general-purpose PHP library of components used independently or together for PHP application development. (http://ez.no/ezcomponents)

jQuery – a fast and concise JavaScript library that simplifies HTML document traversing, event handling, animating, and Ajax interactions for rapid web development. (http://jquery.com/)

Lighttpd – lightweight open-source web server. (http://www.lighttpd.net/)

MySQL – database engine (http://www.mysql.com)


Adopted software

Nginx – an HTTP and mail proxy server licensed under a 2-clause BSD-like license. (http://nginx.org/)

A fully working nginx config is provided, for both the e-shop and the standard setup.


Building process – core

Gallery core is based on eZ Components. Used components:

Authentication

Configuration

Database

Feed

ImageAnalysis

ImageConversion

PersistentObject

Translation

Cache

Url

UserInput


Fulltext search implementation

Why Sphinx?

Very very fast :)

Used features of 0.9.9

SetSelect – introduced in version 0.9.9, it makes fancy filtering possible.

Example on the next slide


Image full mode: the problem with previous and next images

The search condition in plain words: I need to find the 2 previous images based on the current image position, taking the search keyword and sorting mode into account.

The URL consists of:

Current image ID (16679)

Keyword (haposai)

Sort mode (popular)

How do I find out what to display in the first two thumbnails (the middle image is our current image)?


Solution

Use a SetSelect query:

// myfilter = 1 for images with more hits, or equal hits and a higher pid, than the current one
$cl->SetSelect('*, (hits > '.$Image->hits.' OR (hits = '.$Image->hits.' AND pid > '.$Image->pid.')) AS myfilter');
$cl->SetFilter('myfilter', array(1));

Something I did not know how to do until now: if sorting is based on relevance, how do I find the previous two images?

I know now. But:

SetSelect does not work with @weight in the expression.

I had to use two queries. SetFilter() works with @weight.

AddQuery helps here with performance. Much more relevant images now.
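The SetSelect filter is a keyset condition. Assuming the popular sort order is hits DESC with pid DESC as the tie-breaker (consistent with the filter expression, but not stated on the slide), the images before the current one are exactly those passing the filter, and the two nearest are the first two in ascending (hits, pid) order. A small Python model with a made-up `Image` type:

```python
from typing import List, NamedTuple, Tuple


class Image(NamedTuple):
    pid: int
    hits: int


def display_order(img: Image) -> Tuple[int, int]:
    """Assumed 'popular' order: hits DESC, then pid DESC."""
    return (-img.hits, -img.pid)


def previous_images(images: List[Image], current: Image, n: int = 2) -> List[Image]:
    """The n images shown just before `current`, nearest first."""
    # Same predicate as the SetSelect expression (myfilter = 1).
    before = [img for img in images
              if img.hits > current.hits
              or (img.hits == current.hits and img.pid > current.pid)]
    # Ascending (hits, pid) is the reverse of the display order, so the nearest come first.
    before.sort(key=lambda img: (img.hits, img.pid))
    return before[:n]


if __name__ == "__main__":
    gallery = [Image(1, 50), Image(2, 30), Image(3, 30), Image(4, 10), Image(5, 80)]
    print(previous_images(gallery, Image(2, 30)))
```

The pid tie-breaker matters: without it, images with equal hits could appear on both sides of the current one.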


Some search statistic

Around 190 K queries each day. There would be more if search result pages were not cached :)


MySQL performance tweaking

Just optimise queries (EXPLAIN is your friend)

Not a single slow query

Some tips:

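With large data sets use:

SELECT * FROM `lh_gallery_images`
INNER JOIN ( SELECT pid FROM lh_gallery_images ORDER BY comtime DESC, pid DESC LIMIT 20 OFFSET 20 ) AS items
ON lh_gallery_images.pid = items.pid
ORDER BY lh_gallery_images.comtime DESC, lh_gallery_images.pid DESC

This query is at least 5x faster than the plain SELECT with the same LIMIT/OFFSET (tested with 150 K records). The outer ORDER BY keeps the page order guaranteed after the join.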
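A runnable check of the deferred-join rewrite, using an in-memory SQLite database with made-up data. It only shows that both forms return the same page; the speedup reported above was measured on MySQL and is not reproduced here. The idea is that the inner query reads just `pid` values through the (`comtime`, `pid`) index while skipping OFFSET rows, and full rows are fetched only for the selected IDs. The index name and extra `title` column are assumptions.

```python
import sqlite3
from typing import List, Tuple

# Plain pagination: every skipped row is read in full.
PLAIN = ("SELECT * FROM lh_gallery_images "
         "ORDER BY comtime DESC, pid DESC LIMIT ? OFFSET ?")

# Deferred join: the subquery picks the page's pids, the outer query fetches full rows.
DEFERRED = ("SELECT lh_gallery_images.* FROM lh_gallery_images "
            "INNER JOIN (SELECT pid FROM lh_gallery_images "
            "ORDER BY comtime DESC, pid DESC LIMIT ? OFFSET ?) AS items "
            "ON lh_gallery_images.pid = items.pid "
            "ORDER BY lh_gallery_images.comtime DESC, lh_gallery_images.pid DESC")


def make_db(n: int = 1000) -> sqlite3.Connection:
    """Made-up table; comtime repeats, so pid acts as the tie-breaker."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE lh_gallery_images "
                 "(pid INTEGER PRIMARY KEY, comtime INTEGER NOT NULL, title TEXT)")
    conn.execute("CREATE INDEX comtime_pid ON lh_gallery_images (comtime, pid)")
    conn.executemany("INSERT INTO lh_gallery_images VALUES (?, ?, ?)",
                     ((pid, pid % 97, f"image {pid}") for pid in range(1, n + 1)))
    return conn


def page(conn: sqlite3.Connection, sql: str, limit: int, offset: int) -> List[Tuple]:
    return conn.execute(sql, (limit, offset)).fetchall()
```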

See - http://www.mysqlperformanceblog.com


Supported HTTP servers

Lighttpd

Apache

Nginx

With nginx I managed to reach around 1200 requests per second on a cached page, 30% more than with Lighttpd.


Caching objects

Version caching

http://www.bestechvideos.com/2009/03/21/railslab-scaling-rails-episode-8-memcached

http://www.infoq.com/presentations/lutke-rockstar-memcaching

Version caching was used in:

Album pages

Last uploaded

Last hits

Popular images and so on.

The most popular images in 24 hours

When is the cache cleared?

It is not. Only the version number is increased: the new cache key does not exist yet, so the content is regenerated, and the old entries simply expire on their own.
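A minimal in-process sketch of version caching in Python. A plain dict stands in for Memcached/APC, and the class and method names are hypothetical (loosely modelled on `getCacheVersion`). Bumping a namespace version changes every key built from it, so stale entries are never read again; in Memcached they would age out, while this stand-in simply keeps them.

```python
import hashlib
from typing import Any, Dict, Optional


class VersionedCache:
    """Dict-backed stand-in for Memcached/APC with per-namespace version numbers."""

    def __init__(self) -> None:
        self._store: Dict[str, Any] = {}
        self._versions: Dict[str, int] = {}

    def get_cache_version(self, namespace: str) -> int:
        return self._versions.get(namespace, 0)

    def increase_cache_version(self, namespace: str) -> None:
        # Nothing is deleted: keys built with the old version are simply never asked for again.
        self._versions[namespace] = self.get_cache_version(namespace) + 1

    def key(self, namespace: str, *parts: object) -> str:
        raw = f"version_{self.get_cache_version(namespace)}_{namespace}_" + "_".join(map(str, parts))
        return hashlib.md5(raw.encode()).hexdigest()

    def get(self, key: str) -> Optional[Any]:
        return self._store.get(key)

    def set(self, key: str, value: Any) -> None:
        self._store[key] = value


if __name__ == "__main__":
    cache = VersionedCache()
    k = cache.key("album_5", "popular", "album_view_url", 5, "page", 2)
    cache.set(k, "<html>page 2</html>")
    cache.increase_cache_version("album_5")  # e.g. after an upload to album 5
    print(cache.get(cache.key("album_5", "popular", "album_view_url", 5, "page", 2)))  # None: regenerate
```

The trade-off is one extra cache lookup (the version) per request in exchange for never having to enumerate and delete dependent keys.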


Some code with version cache

Cache key calculation in Album:

$cache = CSCacheAPC::getMem();

$cacheKey = md5('version_' . $cache->getCacheVersion('album_' . (int)$Params['user_parameters']['album_id'])
    . $mode . 'album_view_url' . (int)$Params['user_parameters']['album_id']
    . '_page_' . $Params['user_parameters_unordered']['page']);

Includes:

Album version

$mode – sorting mode (e.g. Popular)

Page

This combination gives a unique cache key for each page.

The same logic applies to all listing pages.


Some benchmarks

[root@ks310613 ~]# ab -n 500 -c 10 http://animeonly.org/Fantasy/Mix-16a.html
This is ApacheBench, Version 2.0.40-dev <$Revision: 1.146 $> apache-2.0
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright 2006 The Apache Software Foundation, http://www.apache.org/

Benchmarking animeonly.org (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Finished 500 requests

Server Software:        lighttpd
Server Hostname:        animeonly.org
Server Port:            80

Document Path:          /Fantasy/Mix-16a.html
Document Length:        26883 bytes

Concurrency Level:      10
Time taken for tests:   0.545137 seconds
Complete requests:      500
Failed requests:        0
Write errors:           0
Total transferred:      13593092 bytes
HTML transferred:       13441500 bytes
Requests per second:    917.20 [#/sec] (mean)
Time per request:       10.903 [ms] (mean)
Time per request:       1.090 [ms] (mean, across all concurrent requests)
Transfer rate:          24349.84 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:     5   10   2.9      9      23
Waiting:        4    9   3.1      9      23
Total:          5   10   2.9      9      23

Percentage of the requests served within a certain time (ms)
  50%      9
  66%     12
  75%     13
  80%     13
  90%     13
  95%     13
  98%     20
  99%     22
 100%     23 (longest request)


ETag-based caching

What is it?

An ETag (entity tag) is part of HTTP, the protocol for the World Wide Web. It is a response header that may be returned by an HTTP/1.1 compliant web server and is used to determine change in content at a given URL (http://en.wikipedia.org/wiki/HTTP_ETag)


How to use it?

$ExpireTime = 3600;
// ETag values are quoted strings in HTTP
$currentKeyEtag = '"' . md5($cacheKey . 'user_id_' . erLhcoreClassUser::instance()->getUserID()) . '"';
header('Cache-Control: max-age=' . $ExpireTime); // must-revalidate
header('Expires: ' . gmdate('D, d M Y H:i:s', time() + $ExpireTime) . ' GMT');
header('ETag: ' . $currentKeyEtag);

$iftag = isset($_SERVER['HTTP_IF_NONE_MATCH']) ? trim($_SERVER['HTTP_IF_NONE_MATCH']) == $currentKeyEtag : null;

if ($iftag === true) {
    header('HTTP/1.0 304 Not Modified');
    header('Content-Length: 0');
    exit;
}

$cacheKey – the cache key from the previous example.

The user ID is needed if the user is logged in.

Can be used for custom pages that do not change.

When an image is uploaded or deleted, we just increase the cache version, and the ETag expires automatically as well.
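A framework-free Python sketch of the same conditional GET logic, returning a status code and headers instead of calling `header()`. It assumes the ETag is sent as a quoted string and accepts a comma-separated or weak (`W/`) If-None-Match value, as HTTP allows. The function name and signature are hypothetical.

```python
import hashlib
import time
from email.utils import formatdate
from typing import Dict, Optional, Tuple


def etag_response(cache_key: str, user_id: int, if_none_match: Optional[str],
                  max_age: int = 3600) -> Tuple[int, Dict[str, str]]:
    """Return (status, headers) for a conditional GET on a cached page."""
    # Same ingredients as the PHP version: cache key plus user ID, hashed with MD5.
    etag = '"' + hashlib.md5(f"{cache_key}user_id_{user_id}".encode()).hexdigest() + '"'
    headers = {
        "Cache-Control": f"max-age={max_age}",
        "Expires": formatdate(time.time() + max_age, usegmt=True),
        "ETag": etag,
    }
    if if_none_match is not None:
        candidates = set()
        for tag in if_none_match.split(","):
            tag = tag.strip()
            # If-None-Match uses weak comparison, so W/"x" matches "x".
            candidates.add(tag[2:] if tag.startswith("W/") else tag)
        if etag in candidates or "*" in candidates:
            return 304, headers  # no body should be sent
    return 200, headers
```

Because the cache key embeds the version number, bumping the version changes the ETag, so clients holding the old tag get a fresh 200 response.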


Some MRTG screenshots 1

Hits per hour

MySQL queries


Some MRTG screenshots 2

Memcached status

Traffic stats


Conclusions

A single server running Sphinx, Memcached, MySQL and nginx handles around 180 K pageviews daily.

No performance issues at this time.

Gallery home page

http://code.google.com/p/hppg/


Thank you for your attention :)

Questions etc:

[email protected]