PHP CLI: A Cinderella Story

How to utilize the PHP CLI SAPI in a scalable way. Initially presented at the 2008 DC PHP Conference.

Page 1: PHP CLI: A Cinderella Story

PHP CLI

A Cinderella Story

Page 2: PHP CLI: A Cinderella Story

Introduction

• Andrew Minerd is a software architect at the Selling Source, Inc. As a part of the architecture team he is responsible for the overall technical direction of the Selling Source software products.

• Mike Lively is a team lead with the Selling Source, developing an online loan servicing solution. This solution heavily uses and relies on background processing to perform many tasks ranging from sending legal documents to transferring money to and from bank accounts.

Page 3: PHP CLI: A Cinderella Story

If you use Windows...

...please leave now.

Page 4: PHP CLI: A Cinderella Story

Overview

• Why
• Identifying processes that can be backgrounded
• Walk through the evolution of a CLI script
  o Creating a single process
  o Creating multiple processes
  o Distributing a process across multiple machines

Page 5: PHP CLI: A Cinderella Story

Why Background Processing

• Performance - Let your web server serve webpages
• Robustness - If a web service or email fails, it is easier to handle in the background
• Isolation - Using background processes can help isolate functionality and allow you to easily swap it out for different (sometimes better) services
• Efficiency - Consolidate resource requirements

Page 6: PHP CLI: A Cinderella Story

Why Use PHP?

• Reuse
  o Existing development staff
  o Existing code
  o Existing infrastructure
• Quick prototyping

Page 7: PHP CLI: A Cinderella Story

Identifying Suitable Processes

• Anything where an immediate response is not vital
  o Email notifications
  o Remote service calls
• Processing data in advance
  o Pre-caching
  o Aggregating data
• Even a few things where a somewhat immediate response is needed
  o Notify users upon completion

Page 8: PHP CLI: A Cinderella Story

Single Process

• Advantages:
  o Easiest to implement
  o Don't have to worry about synchronization
  o Don't have to worry about sharing data
  o Already familiar with this paradigm
• Disadvantages:
  o You can only do one thing

Page 9: PHP CLI: A Cinderella Story

Introducing the CLI SAPI

• SAPI: Server API; PHP's interface to the world
• Special file descriptor constants:
  o STDIN: standard in
  o STDOUT: standard out
  o STDERR: standard error
• Special variables:
  o $argc: number of command line parameters (the script name counts as the first)
  o $argv: array of parameter values
• Misc
  o dl() still works (worth mentioning?)
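
To make the constants and variables above concrete, here is a minimal sketch of a filter-style CLI script. The helper number_lines() and the --number flag are made up for this illustration; only STDIN, STDOUT, STDERR, $argc and $argv come from the CLI SAPI itself.

```php
<?php
// Hypothetical helper: number every line read from $in and write it to $out.
// Returns how many lines were processed.
function number_lines($in, $out): int
{
    $count = 0;
    while (($line = fgets($in)) !== false) {
        $count++;
        fwrite($out, sprintf("%4d  %s\n", $count, rtrim($line, "\r\n")));
    }
    return $count;
}

// $argc and $argv are populated by the CLI SAPI; $argv[0] is the script name.
// The --number flag is an assumption made for this sketch.
if (($argc ?? 0) > 1 && $argv[1] === '--number') {
    $total = number_lines(STDIN, STDOUT);
    // Diagnostics go to STDERR so they do not pollute piped output.
    fwrite(STDERR, "Processed {$total} line(s)\n");
}
```

Saved under a name of your choosing, it could be run as `php script.php --number < input.txt`. Taking the streams as parameters keeps the helper usable with files or sockets as well as the standard descriptors.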

Page 10: PHP CLI: A Cinderella Story

Writing a cronjob

• Advantages
  o Automatically restarts itself
  o Flexible scheduling
  o Good for advance processing
• Challenges
  o Long-running jobs

Page 11: PHP CLI: A Cinderella Story

Overrun protection

• Touch a lock file at startup, remove at shutdown
• Work a little `ps` magic
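
One possible shape for the lock file approach is sketched below. The helper name acquire_lock() and the lock path are assumptions for illustration. It relies on fopen() mode 'x', which fails if the file already exists, so checking and creating the lock happen in one step.

```php
<?php
// Hypothetical helper: returns true if this run obtained the lock,
// false if a lock file from another run is already present.
function acquire_lock(string $path): bool
{
    // Mode 'x' creates the file and fails if it already exists.
    $handle = @fopen($path, 'x');
    if ($handle === false) {
        return false;
    }
    $owner = getmypid();
    fwrite($handle, (string) $owner);
    fclose($handle);

    // Remove the lock at shutdown, but only in the process that created it
    // (a forked child would otherwise delete the parent's lock).
    register_shutdown_function(function () use ($path, $owner) {
        if (getmypid() === $owner) {
            @unlink($path);
        }
    });
    return true;
}

// Assumed lock location for this sketch.
$lockFile = sys_get_temp_dir() . '/report-job-demo.lock';
if (acquire_lock($lockFile)) {
    echo "Lock acquired, doing work\n";
} else {
    fwrite(STDERR, "Previous run still active, skipping\n");
}
```

Shutdown functions do not run after a SIGKILL or a crash, so a stale lock can be left behind. Storing the pid in the file is what lets the `ps` check tell a stale lock from a live one.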

Page 12: PHP CLI: A Cinderella Story

Work Queues

• Database
  o MySQL
  o SQLite
• Message queue
• Memcached
  o Possible, not necessarily optimal

Page 13: PHP CLI: A Cinderella Story

MySQL Work Queues

• Segregate tasks on a specific table by auto_increment key
• Access is very fast for MyISAM, can be even faster for InnoDB
• Create a separate table to hold progress
  o If progress == MAX(id), nothing needs to be done
• LOCK/UNLOCK TABLES; easy synchronization
• Single point of failure, but probably already is

Page 14: PHP CLI: A Cinderella Story

SQLite Work Queue

• SQLite 3 only locks during active writes by default
• BEGIN EXCLUSIVE TRANSACTION prevents others from reading and writing
  o Synchronized access to a progress/queue table
  o Lock is retained until COMMIT
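
A rough sketch of the progress-table idea using PDO's SQLite driver follows. The table names (jobs, progress), their columns and the helper claim_next_job() are assumptions for illustration. The in-memory database only keeps the example self-contained; real workers would share a file-backed database.

```php
<?php
// Hypothetical helper: claim the next unprocessed job, or return null
// when the progress pointer has caught up with the newest job.
function claim_next_job(PDO $db): ?array
{
    // Hold the exclusive lock so no other worker can read the same pointer.
    $db->exec('BEGIN EXCLUSIVE TRANSACTION');
    try {
        $last = (int) $db->query('SELECT last_id FROM progress')->fetchColumn();
        $select = $db->prepare(
            'SELECT id, payload FROM jobs WHERE id > ? ORDER BY id LIMIT 1'
        );
        $select->execute([$last]);
        $job = $select->fetch(PDO::FETCH_ASSOC);
        if ($job === false) {
            $db->exec('COMMIT');
            return null;
        }
        $update = $db->prepare('UPDATE progress SET last_id = ?');
        $update->execute([$job['id']]);
        $db->exec('COMMIT'); // the lock is released here
        return $job;
    } catch (Throwable $e) {
        $db->exec('ROLLBACK');
        throw $e;
    }
}

$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE jobs (id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT)');
$db->exec('CREATE TABLE progress (last_id INTEGER NOT NULL)');
$db->exec('INSERT INTO progress (last_id) VALUES (0)');
$db->exec("INSERT INTO jobs (payload) VALUES ('send email'), ('warm cache')");

while (($job = claim_next_job($db)) !== null) {
    echo "{$job['id']}: {$job['payload']}\n";
}
```

Advancing the pointer inside the same transaction that reads it is what keeps two workers from claiming the same row.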

Page 15: PHP CLI: A Cinderella Story

Memcached

• Perhaps already familiar
• Eases transition for processes dependent upon shared memory
• VOLATILE STORAGE
• Use as a job queue?
  o Add a lock key; on fail (key exists) block and poll
  o Read pointer
  o Read item
  o Increment pointer
  o Remove lock key
• Already capable of distributing storage across servers

Page 16: PHP CLI: A Cinderella Story

Persistent Processing

• Advantages:
  o Mitigate setup overhead by doing it once
• Disadvantages:
  o Persistent processes may be more susceptible to memory leaks
  o More housekeeping work than cronjobs

Page 17: PHP CLI: A Cinderella Story

Process Control

• Signal handling
  o pcntl_signal - Commonly used signals
  o What are ticks
• Daemonizing
  o Fork and kill parent
  o Set the child to session leader
  o Close standard file descriptors
  o See: daemon(3)

Page 18: PHP CLI: A Cinderella Story

Signals

• SIGHUP
• SIGTERM; system shutdown, kill
• SIGINT; sent by Ctrl+c
• SIGKILL (uncatchable); unresponsive, kill -9
• SIGCHLD; child status change
• SIGTSTP; sent by Ctrl+z
• SIGCONT; resume from stop, fg
• See: signal(7), kill -l
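
As a sketch of how pcntl_signal and ticks fit together, the loop below shuts down cleanly on SIGTERM or SIGINT. It assumes the pcntl and posix extensions are loaded. Sending the signal to itself on the third iteration is purely for demonstration; normally the signal would come from `kill` or the terminal.

```php
<?php
// Ticks give PHP a chance to run queued signal handlers between statements.
declare(ticks=1);

$running = true;

// The handler only flips a flag; the main loop finishes its current
// iteration and then exits on its own terms.
$handler = function (int $signo) use (&$running): void {
    fwrite(STDERR, "Caught signal {$signo}, finishing current job\n");
    $running = false;
};
pcntl_signal(SIGTERM, $handler);
pcntl_signal(SIGINT, $handler);

$iterations = 0;
while ($running) {
    $iterations++;
    if ($iterations === 3) {
        // Demo only: simulate an external `kill <pid>`.
        posix_kill(posix_getpid(), SIGTERM);
    }
    usleep(1000); // stand-in for real work
}

echo "Stopped cleanly after {$iterations} iterations\n";
```

Keeping handlers this small avoids doing real work at an arbitrary point in the script.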

Page 19: PHP CLI: A Cinderella Story

Daemonize

function daemon($chdir = TRUE, $close = TRUE)
{
  // fork and kill off the parent
  $pid = pcntl_fork();
  if ($pid === -1)
  {
    return FALSE; // fork failed; still running in the original process
  }
  if ($pid !== 0)
  {
    exit(0);
  }

  // become session leader
  posix_setsid();

  // close file descriptors
  if ($close)
  {
    fclose(STDIN);
    fclose(STDOUT);
    fclose(STDERR);
  }

  // change to the root directory
  if ($chdir) chdir('/');

  return TRUE;
}

Page 20: PHP CLI: A Cinderella Story

Multiple Processes

• Advantages:
  o Take advantage of the multi-core revolution; most machines can now truly multiprocess
• Disadvantages:
  o Must synchronize process access to resources
  o Harder to communicate

Page 21: PHP CLI: A Cinderella Story

Directed vs. Autonomous

• Directed: one parent process that distributes jobs to child processes
  o Single point of failure
  o No locking required on job source
• Autonomous: multiple peer processes that pick their own work
  o Need to serialize access to job source
  o Single peer failure isn't overall failure
• Split work into independent tasks

Page 22: PHP CLI: A Cinderella Story

Forking

<?php
$pid = pcntl_fork();
if ($pid == -1) {
    die("Could not fork!");
} else if ($pid) {
    // parent
} else {
    // child
}
?>

Page 23: PHP CLI: A Cinderella Story

Forking Multiple Children

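<?php
define('MAX_CHILDREN', 5);
$children = array();
$jobs = get_jobs();

// keep going until every job is handed out and every child has been reaped
while (count($jobs) || count($children)) {
  if (count($jobs) && count($children) < MAX_CHILDREN) {
    $data = array_shift($jobs);
    $pid = pcntl_fork();
    if ($pid == -1) {
      die("Could not fork!");
    } else if ($pid) {
      $children[$pid] = true;
    } else {
      process_data($data);
      exit(0);
    }
  }

  // reap finished children without blocking; only call waitpid while
  // children exist, otherwise it returns -1 and we would die needlessly
  while (count($children) && ($wait_pid = pcntl_waitpid(-1, $status, WNOHANG)) != 0) {
    if ($wait_pid == -1) {
      die("problem in pcntl_waitpid!");
    }
    unset($children[$wait_pid]);
  }
  usleep(10000); // avoid spinning while all slots are busy
}
?>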

Page 24: PHP CLI: A Cinderella Story

Shared Resources

• File/socket descriptors shared between parent and child
• Some resources cannot be shared
  o MySQL connections
• Use resources before forking
• Assume children will probably need to open and establish their own resources
• Allow your resources to reopen themselves

Page 25: PHP CLI: A Cinderella Story

Shared Resources

<?php
// ...
// bad time to open a database connection
$db = new PDO('mysql:host=localhost', 'dbuser', 'pass');

while (count($jobs)) {
  if (count($children) < MAX_CHILDREN) {
    $data = array_shift($jobs);
    $pid = pcntl_fork();
    if ($pid == -1) {
      die("Could not fork!");
    } else if ($pid) {
      $children[$pid] = true;
    } else {
      process_data($data, $db);
      exit(0); // When the child exits the database connection
               // will be disposed of, and the parent shares it.
    }
  }
  // ...
}
?>

Page 26: PHP CLI: A Cinderella Story

Shared Resources

<?php
// ...

while (count($jobs)) {
  if (count($children) < MAX_CHILDREN) {
    $data = array_shift($jobs);
    $pid = pcntl_fork();
    if ($pid == -1) {
      die("Could not fork!");
    } else if ($pid) {
      $children[$pid] = true;
    } else {
      // Much safer
      $db = new PDO('mysql:host=localhost', 'dbuser', 'pass');
      process_data($data, $db);
      exit(0); // When the child exits the database connection
               // will be disposed of.
    }
  }
  // ...
}
?>

Page 27: PHP CLI: A Cinderella Story

Memory Usage

• Entire process space at time of forking is copied
• Do as little setup as possible before forking
• If you have to do setup before forking, clean it up in the child after forking
• Pay particular attention to large variables

Page 28: PHP CLI: A Cinderella Story

Memory Usage

<?php
define('MAX_CHILDREN', 5);
$children = array();
$jobs = get_jobs();

// keep going until every job is handed out and every child has been reaped
while (count($jobs) || count($children)) {
  if (count($jobs) && count($children) < MAX_CHILDREN) {
    $data = array_shift($jobs);
    $pid = pcntl_fork();
    if ($pid == -1) {
      die("Could not fork!");
    } else if ($pid) {
      $children[$pid] = true;
    } else {
      unset($jobs); // <--- saves memory in the child, which no longer needs $jobs
      process_data($data);
      exit(0);
    }
  }

  // only call waitpid while children exist, otherwise it returns -1
  while (count($children) && ($wait_pid = pcntl_waitpid(-1, $status, WNOHANG)) != 0) {
    if ($wait_pid == -1) {
      die("problem in pcntl_waitpid!");
    }
    unset($children[$wait_pid]);
  }
  usleep(10000); // avoid spinning while all slots are busy
}
?>

Page 29: PHP CLI: A Cinderella Story

Shared Memory

• shmop_* or shm_*?
  o shm functions store and retrieve key/value pairs stored as a linked list; retrieval by key is O(n)
  o shmop functions access bytes
• Semaphores
  o Generic locking mechanism
• Message queues
• ftok()

Page 30: PHP CLI: A Cinderella Story

How to Talk to Your Kids

• msg_get_queue($key, $perms)
• msg_send($q, $type, $msg, $serialize, $block, $err)
• msg_receive($q, $desired, $type, $max, $msg, $serialize, $flags, $err)
• Use types to communicate to a specific process
  o Send jobs with type 1
  o Responses with PID of process
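
The type-based addressing above can be sketched in a single process, assuming the sysvmsg extension is available. The JOB_TYPE constant, the 'reply_to' field and the message contents are made up for this illustration. In practice the "worker" half would run in a forked child.

```php
<?php
// Assumed convention for this sketch: jobs travel with type 1.
const JOB_TYPE = 1;

// ftok() turns a path both sides agree on into a System V IPC key.
$key = ftok(__FILE__, 'q');
$queue = msg_get_queue($key, 0600);

// Dispatcher side: send a job and say where the answer should go.
$request = ['task' => 'send_email', 'reply_to' => getmypid()];
msg_send($queue, JOB_TYPE, $request, true, true, $err);

// Worker side: receive only messages of type 1 (blocks until one arrives).
msg_receive($queue, JOB_TYPE, $type, 4096, $job, true, 0, $err);

// Worker replies using the requester's PID as the message type.
msg_send($queue, $job['reply_to'], "done: {$job['task']}", true, true, $err);

// Dispatcher side: read only messages addressed to this PID.
msg_receive($queue, getmypid(), $type, 4096, $reply, true, 0, $err);
echo $reply, "\n";

// Queues outlive the process, so remove the demo queue explicitly.
msg_remove_queue($queue);
```

Because each receiver asks for a specific type, several processes can share one queue without reading each other's messages.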

Page 31: PHP CLI: A Cinderella Story

How to Talk to Your Kids

• array stream_socket_pair($domain, $type, $protocol)
• Creates a pair of socket connections that communicate with each other
• Use the first index in the parent, use the second index in the child (or the other way around)

Page 32: PHP CLI: A Cinderella Story

How to Talk to Your Kids

<?php
$socks = stream_socket_pair(STREAM_PF_UNIX, STREAM_SOCK_STREAM, STREAM_IPPROTO_IP);
$pid = pcntl_fork();

if ($pid == -1) {
    die('could not fork!');
} else if ($pid) {
    // parent
    fclose($socks[1]);
    fwrite($socks[0], "Hi kid\n");
    echo fgets($socks[0]);
    fclose($socks[0]);
} else {
    // child
    fclose($socks[0]);
    fwrite($socks[1], "Hi parent\n");
    echo fgets($socks[1]);
    fclose($socks[1]);
}
/* Output (line order may vary):
Hi kid
Hi parent
*/
?>

Page 33: PHP CLI: A Cinderella Story

Distributing Across Servers

• Advantages:
  o Increased reliability/redundancy
  o Horizontal scaling can overcome performance plateau
• Disadvantages:
  o Most complex
  o Failure recovery can be more involved

Page 34: PHP CLI: A Cinderella Story

Locking

• Distributed locking is much more difficult
• Database locking
  o "Optimistic" vs. "Pessimistic"
• Handling failures when the progress is already updated

Page 35: PHP CLI: A Cinderella Story

Talking to Your Servers

• Roll your own network message queues
  o stream_socket_server(), stream_socket_client()
• Asynchronous IO
  o stream_select()
  o curl_multi_*()
  o PECL HTTP
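
To illustrate asynchronous IO with stream_select(), the sketch below waits on several connections at once and reads only from those that have data. The local socket pairs and the names 'alpha' and 'beta' are stand-ins for real connections to worker servers, which would normally come from stream_socket_client().

```php
<?php
// Two local socket pairs simulate connections to two worker servers.
$pairs = [];
foreach (['alpha', 'beta'] as $name) {
    $pairs[$name] = stream_socket_pair(
        STREAM_PF_UNIX, STREAM_SOCK_STREAM, STREAM_IPPROTO_IP
    );
}

// Pretend that only the "beta" worker has answered so far.
fwrite($pairs['beta'][1], "job 42 done\n");

// Build the watch list and remember which stream belongs to which worker.
$read = [];
$names = [];
foreach ($pairs as $name => $pair) {
    $read[] = $pair[0];
    $names[(int) $pair[0]] = $name;
}

$write = null;
$except = null;
$replies = [];

// Wait up to one second; $read is rewritten to hold only readable streams.
$ready = stream_select($read, $write, $except, 1);
if ($ready > 0) {
    foreach ($read as $stream) {
        $replies[$names[(int) $stream]] = rtrim(fgets($stream));
    }
}

print_r($replies);
```

A single process can supervise many remote workers this way without blocking on whichever one happens to be slow.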

Page 36: PHP CLI: A Cinderella Story

Failure Tolerance

• PHP cannot recover from some types of errors
• Heartbeat
  o Moves a service among cluster nodes
  o init style scripts start/stop services
• Angel process
  o Watches a persistent process and restarts it if it fails
• What if dependent services fail?

Page 37: PHP CLI: A Cinderella Story

"Angel" Process

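<?php

    function run($function, array $args = array())
    {
        do
        {
            $pid = pcntl_fork();
            if ($pid === -1)
            {
                die("Could not fork!");
            }
            if ($pid === 0)
            {
                call_user_func_array($function, $args);
                exit;
            }
        }
        // block until the child exits, then loop to start a replacement
        while (pcntl_waitpid($pid, $s) > 0);
    }

?>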

Page 38: PHP CLI: A Cinderella Story

Angel as a Cron Job

• In your primary script write your pid to a file
• In the angel cron check for that pid file and if it exists, ensure the pid is still running
  o `ps -o pid= <pid>` or file_exists('/proc/<pid>')
• If the file does not exist, or the process cannot be found, restart the process
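
The pid file check might look roughly like this, using the /proc test mentioned above (so Linux only). The helper name is_process_running() and the pid file path are assumptions for illustration.

```php
<?php
// Hypothetical helper: true if the pid file exists and names a live process.
function is_process_running(string $pidFile): bool
{
    if (!is_file($pidFile)) {
        return false;
    }
    $pid = (int) trim((string) file_get_contents($pidFile));
    // Every live process has a /proc/<pid> directory on Linux.
    return $pid > 0 && file_exists("/proc/{$pid}");
}

// Primary script side: record our pid (assumed location for this sketch).
$pidFile = sys_get_temp_dir() . '/worker-demo.pid';
file_put_contents($pidFile, (string) getmypid());

// Angel cron side: decide whether a restart is needed.
if (is_process_running($pidFile)) {
    echo "Worker is alive, nothing to do\n";
} else {
    echo "Worker is gone, restart it here\n";
}

unlink($pidFile); // demo cleanup
```

A pid can be reused by an unrelated process after a crash, so a stricter check could also compare the command line in /proc/<pid>/cmdline.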

Page 39: PHP CLI: A Cinderella Story

Resources

• http://php.net/manual - as always
• http://linux-ha.org/ - Heartbeat
• http://dev.sellingsource.com/ - Forking tutorial
• http://curl.haxx.se/libcurl/c/ - libcurl documentation
• man pages
• http://search.techrepublic.com.com/search/php+cli.html