Gearman and asynchronous processing in PHP applications

Presentation at BarcampSaigon 2010

Pham Cong Dinh (a.k.a. pcdinh), @pcdinh on Twitter

BarCampSaiGon 2010

Skunkworks, @teamskunkworks on Twitter

The aim of my talk

Discuss a solution that helps scale your high-traffic PHP web applications

Introduction

PHP developer since 2002: 8 years in PHP development and counting

Presenter at Hanoi PHP Day in 2008 and 2009

Founder and maintainer of the PHPVietnam mailing list (Google Group) since 2004

Very interested in Linux, server farms, big data, databases, distributed processing, scalability and high-performance web systems

Involved in clip.vn development at Vega Corporation a year ago

Software developer at Skunkworks

Agenda

Challenges in developing large scale PHP applications for high traffic web sites

Resolve the challenge: How to distribute workload

Gearman: an open source high performance job server

Develop PHP clients and workers

Challenges in managing workers – a case study of Gearman Agent Manager

Challenges in developing large scale PHP applications for high traffic web sites

What is large scale? How high is high traffic?

Large Scale?

Traffic

Data graph

Storage

Code base

Development team

Typical challenges: limited resources

CPU

Disk speed

Memory

Bandwidth: router, NIC

Architecture: application and system

Major challenges

No preparation for growth

No idea how to scale your application beyond a certain point

No in-depth understanding of your system

No proper system capacity monitoring

Lack of proper skills

Resolve the challenge: How to distribute workload

Our challenge today

TOO MUCH WORKLOAD FOR A SINGLE SERVER

Many solutions

Load balancing:

Hardware: F5, Cisco Content Services Switch

Software: BIND (DNS round robin), LVS, HAProxy, Varnish ...

Precalculating data

Multi-tier application architecture

Our solution today

Queue up the workload

Categorize workload patterns

Optimize the processing model and security

Use a job server

Is queuing the final answer?

Can it keep up with peak workload?

Handle the backlog gracefully
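Whether a queue keeps up can be checked with simple arithmetic: each second the backlog grows by the jobs that arrive and shrinks by what the workers can process, and it never drops below zero, i.e. backlog_t = max(0, backlog_{t-1} + arrivals_t - capacity). The plain-PHP sketch below applies that rule to made-up numbers (a 3-second burst of 100 jobs/s, then 10 jobs/s, against workers that together handle 50 jobs/s); the function names are illustrative, not part of Gearman.

```php
<?php
declare(strict_types=1);

/**
 * Queue length at the end of each second, given the jobs arriving in each
 * second and a fixed capacity (jobs/second across all workers):
 * backlog_t = max(0, backlog_{t-1} + arrivals_t - capacity).
 *
 * @param int[] $arrivals jobs submitted in each second
 * @return int[] backlog after each second
 */
function backlog_per_second(array $arrivals, int $capacity): array
{
    $backlog = 0;
    $history = [];
    foreach ($arrivals as $arrived) {
        // Jobs that cannot be processed this second wait in the queue.
        $backlog = max(0, $backlog + $arrived - $capacity);
        $history[] = $backlog;
    }
    return $history;
}

/**
 * Seconds needed to drain a backlog once arrivals settle at a steady rate,
 * or null when capacity does not exceed that rate (the queue never drains).
 */
function seconds_to_drain(int $backlog, int $capacity, int $arrivalRate): ?int
{
    $netRate = $capacity - $arrivalRate;
    if ($netRate <= 0) {
        return null;
    }
    return (int) ceil($backlog / $netRate);
}

// Hypothetical traffic: burst of 100 jobs/s, then 10 jobs/s; capacity 50 jobs/s.
$history = backlog_per_second([100, 100, 100, 10, 10], 50);
echo implode(', ', $history), PHP_EOL;                 // 50, 100, 150, 110, 70
echo seconds_to_drain(end($history), 50, 10), PHP_EOL; // 2
```

Queuing absorbs the burst instead of dropping requests, but the backlog only drains when capacity exceeds the steady arrival rate; a null from seconds_to_drain() is the signal that more workers are needed.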

Gearman: an open source high performance job server

Concepts

Synchronous and asynchronous processing

Job, job queue and job server

Who uses it

Used at LiveJournal, Yahoo!, Digg, BackType and many more

Used at Vega (clip.vn, vega.com.vn) for sending emails

At Skunkworks?

Architecture

Client, worker and job server

Fail-over cluster

Features

Fast

Programming-language neutral

A bridge between a message queue server and a pub/sub engine

Enables applications to outsource tasks to other servers synchronously or asynchronously

Fault-tolerant

Poison message handling and retries

Persistent queues for background jobs

Timeouts
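A poison message is a job that fails every time it runs, for example by crashing the worker. Without a limit, the job server would keep handing it out; gearmand's `--job-retries` option caps the number of attempts for this reason. The sketch below is an in-process PHP simulation of that idea, not gearmand's code: run_with_retries() and its attempt counting are illustrative assumptions.

```php
<?php
declare(strict_types=1);

/**
 * In-process simulation of retries with a poison-message limit: a job whose
 * handler throws is re-queued until it has been attempted $maxAttempts times,
 * then it is set aside instead of being handed out again forever.
 *
 * @param string[] $workloads
 * @return array{done: string[], poisoned: string[]}
 */
function run_with_retries(array $workloads, callable $handler, int $maxAttempts): array
{
    $queue = new SplQueue();
    foreach ($workloads as $workload) {
        $queue->enqueue(['workload' => $workload, 'attempts' => 0]);
    }

    $done = [];
    $poisoned = [];
    while (!$queue->isEmpty()) {
        $job = $queue->dequeue();
        $job['attempts']++;
        try {
            $done[] = $handler($job['workload']);
        } catch (Throwable $e) {
            if ($job['attempts'] >= $maxAttempts) {
                $poisoned[] = $job['workload'];   // give up: poison message
            } else {
                $queue->enqueue($job);            // retry after the other jobs
            }
        }
    }
    return ['done' => $done, 'poisoned' => $poisoned];
}

// Hypothetical handler: any workload containing "bad" always fails.
$result = run_with_retries(
    ['a.jpg', 'bad.jpg', 'b.jpg'],
    function (string $workload): string {
        if (strpos($workload, 'bad') !== false) {
            throw new RuntimeException("cannot process {$workload}");
        }
        return strtoupper($workload);
    },
    3
);
echo 'done: ', implode(', ', $result['done']), PHP_EOL;         // done: A.JPG, B.JPG
echo 'poisoned: ', implode(', ', $result['poisoned']), PHP_EOL; // poisoned: bad.jpg
```

Re-queuing a failed job behind the others keeps one bad job from blocking healthy ones while it is being retried.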

How it works

Worker:

• Connects to all gearmand servers.
• Registers the functions it supports.
• Asks for jobs.
• If there are no jobs, sends the 'pre_sleep' command to every gearmand and sleeps.

Client:

• Connects to gearmand.
• Submits a job for a particular function name.

Gearmand:

• Acknowledges the job and finds all sleeping workers that can handle it.
• Sends each of them a 'noop' command to wake them up.
• The woken workers ask for jobs again, and one of them is assigned the job.
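The handshake above can be modelled in a few lines of PHP. ToyJobServer is a hypothetical in-process class with no networking and a single server (real workers connect to every gearmand); its log uses the packet names from the Gearman protocol (CAN_DO, GRAB_JOB, NO_JOB, PRE_SLEEP, SUBMIT_JOB, JOB_CREATED, NOOP, JOB_ASSIGN) so the sequence can be compared with the steps listed.

```php
<?php
declare(strict_types=1);

/**
 * Toy, in-process model of the gearmand wake-up handshake for one job server.
 * No networking; the log uses Gearman protocol packet names.
 */
final class ToyJobServer
{
    /** @var array<string, string[]> queued workloads per function name */
    private array $queues = [];
    /** @var array<string, string[]> functions each worker registered */
    private array $abilities = [];
    /** @var array<string, bool> workers that announced PRE_SLEEP */
    private array $sleeping = [];
    /** @var string[] "SENDER -> RECEIVER: PACKET" entries, in order */
    public array $log = [];

    public function canDo(string $worker, string $function): void
    {
        $this->log[] = "{$worker} -> server: CAN_DO {$function}";
        $this->abilities[$worker][] = $function;
    }

    /** Returns a workload (JOB_ASSIGN) or null (NO_JOB). */
    public function grabJob(string $worker): ?string
    {
        $this->log[] = "{$worker} -> server: GRAB_JOB";
        foreach ($this->abilities[$worker] ?? [] as $function) {
            if (!empty($this->queues[$function])) {
                $this->log[] = "server -> {$worker}: JOB_ASSIGN {$function}";
                return array_shift($this->queues[$function]);
            }
        }
        $this->log[] = "server -> {$worker}: NO_JOB";
        return null;
    }

    public function preSleep(string $worker): void
    {
        $this->log[] = "{$worker} -> server: PRE_SLEEP";
        $this->sleeping[$worker] = true;
    }

    /** Queues a job, acks it, and wakes every sleeping worker able to run it. */
    public function submitJob(string $client, string $function, string $workload): array
    {
        $this->log[] = "{$client} -> server: SUBMIT_JOB {$function}";
        $this->queues[$function][] = $workload;
        $this->log[] = "server -> {$client}: JOB_CREATED";

        $woken = [];
        foreach (array_keys($this->sleeping) as $worker) {
            if (in_array($function, $this->abilities[$worker] ?? [], true)) {
                $this->log[] = "server -> {$worker}: NOOP";
                unset($this->sleeping[$worker]);
                $woken[] = $worker;
            }
        }
        return $woken;
    }
}

// One worker registers, finds no work, sleeps; a client submission wakes it.
$server = new ToyJobServer();
$server->canDo('w1', 'send_mail');
if ($server->grabJob('w1') === null) {
    $server->preSleep('w1');
}
foreach ($server->submitJob('c1', 'send_mail', 'to=user@example.com') as $worker) {
    $server->grabJob($worker);   // a woken worker asks for work again
}
echo implode(PHP_EOL, $server->log), PHP_EOL;
```

Every sleeping worker able to run the job is woken, but only the first to send GRAB_JOB is assigned it; the others receive NO_JOB, and a real worker loop would then send PRE_SLEEP again.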

Use cases

Long-running processes: thumbnail generation, image resizing, order processing in e-commerce …

High CPU or memory requirements: high-volume data processing, MapReduce, log aggregation, video encoding

Distributed and parallel processing

Timed processing: incremental updates, data replication

Rate-limited FIFO processing

Separation of concerns or security reasons

Priority-aware system monitoring tasks: WonderProxy
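For the priority-aware case, Gearman clients can submit jobs at high, normal or low priority (the PECL extension exposes methods such as doHigh(), doNormal(), doLow() and their Background variants). The dispatch rule, serve the highest non-empty level first and keep FIFO order within a level, can be sketched in plain PHP; PriorityJobQueue and the job names are hypothetical and not part of gearmand or the extension.

```php
<?php
declare(strict_types=1);

/**
 * Sketch of priority-aware dispatch: jobs are taken from the highest
 * non-empty priority level, first-in first-out within a level.
 * Hypothetical class; not part of gearmand or the PECL extension.
 */
final class PriorityJobQueue
{
    public const HIGH = 0;
    public const NORMAL = 1;
    public const LOW = 2;

    /** @var array<int, SplQueue> one FIFO per level, in dispatch order */
    private array $levels;

    public function __construct()
    {
        $this->levels = [
            self::HIGH => new SplQueue(),
            self::NORMAL => new SplQueue(),
            self::LOW => new SplQueue(),
        ];
    }

    public function submit(string $job, int $priority = self::NORMAL): void
    {
        if (!isset($this->levels[$priority])) {
            throw new InvalidArgumentException("unknown priority {$priority}");
        }
        $this->levels[$priority]->enqueue($job);
    }

    /** Next job to hand to a worker, or null when every level is empty. */
    public function next(): ?string
    {
        foreach ($this->levels as $queue) {   // HIGH, then NORMAL, then LOW
            if (!$queue->isEmpty()) {
                return $queue->dequeue();
            }
        }
        return null;
    }
}

// Hypothetical jobs: a latency check must not wait behind routine work.
$queue = new PriorityJobQueue();
$queue->submit('rebuild-report', PriorityJobQueue::LOW);
$queue->submit('resize-avatar');
$queue->submit('check-proxy-latency', PriorityJobQueue::HIGH);
$queue->submit('send-welcome-mail');
// Prints, one per line: check-proxy-latency, resize-avatar, send-welcome-mail, rebuild-report
while (($job = $queue->next()) !== null) {
    echo $job, PHP_EOL;
}
```

Strict priority like this can starve low-priority jobs under sustained high-priority load, so it suits short monitoring tasks better than bulk work.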

Develop PHP clients and workers

PHP interface libraries for the Gearman server

PECL gearman: http://pecl.php.net/package/gearman or https://github.com/php/pecl-gearman

PEAR's Net_Gearman: http://pear.php.net/package/Net_Gearman

PHP Client = Job Sender
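A minimal client sketch using the PECL gearman extension, assuming gearmand listens on 127.0.0.1:4730 (its default port) and that workers have registered the hypothetical functions reverse and send_mail; encode_workload() is also an illustrative helper. doNormal() waits for the worker's result (synchronous), while doBackground() returns a job handle immediately (asynchronous). The Gearman calls sit inside an extension_loaded() check so the file still loads where the extension is absent.

```php
<?php
declare(strict_types=1);

// Minimal Gearman client sketch for the PECL gearman extension.
// Assumptions: gearmand listens on 127.0.0.1:4730 (its default port) and
// workers have registered the hypothetical functions "reverse" and "send_mail".

/** Gearman workloads are opaque strings, so structured data is JSON-encoded. */
function encode_workload(array $data): string
{
    return json_encode($data, JSON_THROW_ON_ERROR);
}

if (extension_loaded('gearman')) {
    $client = new GearmanClient();
    $client->addServer('127.0.0.1', 4730);

    // Synchronous: doNormal() blocks until a worker returns the result.
    $reversed = $client->doNormal('reverse', 'Hello BarCamp');
    if ($client->returnCode() === GEARMAN_SUCCESS) {
        echo "reverse returned: {$reversed}\n";
    }

    // Asynchronous: doBackground() queues the job and returns a job handle at once.
    $handle = $client->doBackground('send_mail', encode_workload([
        'to' => 'user@example.com',
        'subject' => 'Welcome',
    ]));
    if ($client->returnCode() === GEARMAN_SUCCESS) {
        // jobStatus() returns [known, running, numerator, denominator].
        $status = $client->jobStatus($handle);
        echo "queued {$handle}, known to server: " . ($status[0] ? 'yes' : 'no') . "\n";
    }
}
```

A web request would typically use doBackground() for work such as sending mail, so the page returns without waiting for the worker.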

PHP Worker = Job Executor
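A matching worker sketch for the PECL gearman extension, again assuming gearmand on 127.0.0.1:4730 and the hypothetical functions reverse and send_mail. The job logic (reverse_text() and parse_mail_job(), both illustrative) is kept outside the Gearman callbacks so it can be tested without a job server; actual mail delivery is omitted.

```php
<?php
declare(strict_types=1);

// Minimal Gearman worker sketch for the PECL gearman extension, serving the
// hypothetical functions "reverse" and "send_mail".
// Assumption: gearmand listens on 127.0.0.1:4730.

/** Job logic kept free of Gearman types so it can be tested on its own. */
function reverse_text(string $text): string
{
    return strrev($text);
}

/** Decodes a mail job; returns null if the workload is not usable. */
function parse_mail_job(string $workload): ?array
{
    $job = json_decode($workload, true);
    return is_array($job) && isset($job['to'], $job['subject']) ? $job : null;
}

if (extension_loaded('gearman')) {
    $worker = new GearmanWorker();
    $worker->addServer('127.0.0.1', 4730);

    // Register every function name this worker can execute.
    $worker->addFunction('reverse', function (GearmanJob $job): string {
        return reverse_text($job->workload());
    });
    $worker->addFunction('send_mail', function (GearmanJob $job): string {
        $mail = parse_mail_job($job->workload());
        if ($mail === null) {
            return 'rejected: invalid workload';
        }
        // Actual delivery is omitted in this sketch.
        return 'accepted for ' . $mail['to'];
    });

    // work() waits for and runs one job; loop until an error occurs.
    while ($worker->work()) {
        if ($worker->returnCode() !== GEARMAN_SUCCESS) {
            fwrite(STDERR, 'worker error: ' . $worker->error() . PHP_EOL);
            break;
        }
    }
}
```

Running several copies of this script gives several worker processes for the same functions; gearmand hands each job to one of them.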

Challenges in managing workers – a case study of Gearman Agent Manager

Ease of use

How to manage multiple worker processes for a single job: launch, reload, stop, add processes ...

Monitoring

Centralized management of a set of servers

Web API (RESTful)
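The list above names what a worker manager must handle but not how Gearman Agent Manager does it. One step any such manager needs is reconciling the configured number of processes per function with what is actually running. The sketch below is a hypothetical PHP function (plan_worker_changes() is not part of Gearman or Gearman Agent Manager) that computes how many processes to start or stop; forking and signalling the processes are left out.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical reconciliation step for a worker process manager: compare the
 * desired number of processes per Gearman function with the number running
 * and return how many to start (positive) or stop (negative).
 * Forking, signalling and restarting the processes are left out.
 *
 * @param array<string, int> $desired function name => wanted process count
 * @param array<string, int> $running function name => current process count
 * @return array<string, int> function name => processes to add (+) or stop (-)
 */
function plan_worker_changes(array $desired, array $running): array
{
    $plan = [];
    // Union of keys: functions that are configured, running, or both.
    foreach (array_keys($desired + $running) as $function) {
        $delta = ($desired[$function] ?? 0) - ($running[$function] ?? 0);
        if ($delta !== 0) {
            $plan[$function] = $delta;
        }
    }
    ksort($plan);   // deterministic order for logging
    return $plan;
}

// Example config: scale resize up, send_mail down, retire old_report.
$plan = plan_worker_changes(
    ['resize' => 4, 'send_mail' => 2],
    ['resize' => 2, 'send_mail' => 3, 'old_report' => 1]
);
foreach ($plan as $function => $delta) {
    echo ($delta > 0 ? "start {$delta}" : 'stop ' . -$delta), " x {$function}", PHP_EOL;
}
```

Keeping the plan separate from the code that forks processes makes the same logic usable for "reload" (desired counts read from a new config) and for reporting through a web API.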

Questions?

@skunkworksvn, @pcdinh #barcampsaigon #teamskunkworks