© 2006 IBM Corporation
PHP Opcodes
Presentation splits into 3 sections
– Generation of opcodes
• ZEND_COMPILE
– Generation of the Interpreter code
• Interpreter comes in many flavours!!
– Execution of opcodes
• ZEND_EXECUTE
Execution path for a script
php_execute_script()
zend_execute_scripts()
zend_execute()
user call(function/method)
include/require
zend_compile_file()
zend_compile_file
Function pointer that can be overridden to call an alternative compiler, e.g. by bcompiler
By default resolves to a call to compile_file() in zend_language_scanner.c
Compilation is broken down into 2 steps:
– Lexical analysis of source PHP script into tokens
– Parsing of resulting tokens into opcodes
PHP script → Lexical Analyser → tokens → Parser → byte codes
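The override described above is an ordinary C function pointer swap. The following is a minimal standalone sketch of that pattern, not the Zend source: `file_handle`, `op_array`, `compile_file_hook` and the other names are hypothetical stand-ins chosen for illustration.

```c
#include <stddef.h>

/* Stand-ins for the real Zend types; illustrative only. */
typedef struct { const char *filename; } file_handle;
typedef struct { const char *compiled_by; } op_array;

/* Default compiler, playing the role of compile_file(). */
static op_array *default_compile_file(file_handle *fh)
{
    static op_array result = { "default" };
    (void)fh;
    return &result;
}

/* The hook: initially resolves to the default compiler. */
static op_array *(*compile_file_hook)(file_handle *fh) = default_compile_file;

/* Previous compiler, remembered so the override can fall back to it. */
static op_array *(*saved_compile_file)(file_handle *fh) = NULL;

/* An alternative compiler that an extension might install. */
static op_array *alternative_compile_file(file_handle *fh)
{
    static op_array result = { "alternative" };
    /* Hand anything we cannot deal with back to the saved compiler. */
    if (fh == NULL || fh->filename == NULL) {
        return saved_compile_file(fh);
    }
    return &result;
}

/* Install the override, remembering the previous compiler. */
static void install_alternative_compiler(void)
{
    saved_compile_file = compile_file_hook;
    compile_file_hook = alternative_compile_file;
}
```

Callers always go through `compile_file_hook`, so after `install_alternative_compiler()` runs they reach the alternative compiler without being changed themselves.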
Lexical Analysis
Lexical analyser code in zend_language_scanner.c
– generated from zend_language_scanner.l using “flex”
Exposed to userspace as token_get_all()
<?php
$tokens = token_get_all("<?php echo 'Hello World';
?>");
foreach ($tokens as $token) {
    if (is_array($token)) {
        printf("%s \t %s\n", token_name($token[0]), $token[1]);
    } else {
        printf("\t'%s'\n", $token);
    }
}
?>
Output:
T_OPEN_TAG    <?php
T_ECHO    echo
T_WHITESPACE
T_CONSTANT_ENCAPSED_STRING    'Hello World'
    ';'
T_CLOSE_TAG    ?>
Parsing
Next the tokens are compiled into opcodes
– Parser code in zend_language_parser.c which is generated from zend_language_parser.y by “Bison”
– Calls code in zend_compile.c to generate opcodes
T_OPEN_TAG, T_ECHO, T_WHITESPACE, T_CONSTANT_ENCAPSED_STRING, ';', T_CLOSE_TAG → Parser → ZEND_OP ZEND_OP ZEND_OP ZEND_OP
Non-PHP statements
What does the compiler do with any non-PHP statements in the input script, e.g. HTML?
– All such statements are compiled into ECHO statements
– So at execution time the statements are just output as is
<!-- example for PHP 5.0.0 final release -->
<?php
$domain = "localhost";
$user = "root";#note "MIKE" is unacceptable
$password = "";
$conn = mysql_connect( $domain, $user, $password );
if($conn)
{
$msg = "Congratulations !!!! $user, You connected to MySQL";
}
?>
<html>
<head>
<title>Connecting user</title>
</head>
<body>
<h3>
<?php echo( $msg ); ?>
</h3>
</body>
</html>
Non-PHP statements
line     #  op                     fetch  ext  operands
-------------------------------------------------------------------------------
   3     0  ECHO                                '%3C%21--+example+for+PHP+5.0.0+final+release+--%3E%0D%0A%0D%0A'
   5     1  ASSIGN                              !0, 'localhost'
   6     2  ASSIGN                              !1, 'root'
   7     3  ASSIGN                              !2, ''
   9     4  INIT_FCALL_BY_NAME                  'mysql_connect'
…….. snip ….
        22  ADD_STRING                          ~5, ~5, 'MySQL'
        23  ASSIGN                              !4, ~5
  14    24  JMP                                 ->25
  26    25  ECHO                                '%0D%0A%3Chtml%3E%0D%0A%0D%0A+%3Chead%3E%0D%0A++%3Ctitle%3EConnecting+user%3C%2Ftitle%3E%0D%0A+%3C%2Fhead%3E%0D%0A%0D%0A+%3Cbody%3E%0D%0A++%3Ch3%3E+%0D%0A+++'
        26  ECHO                                !4
  30    27  ECHO                                '+%0D%0A++%3C%2Fh3%3E%0D%0A+%3C%2Fbody%3E%0D%0A%0D%0A%3C%2Fhtml%3E'
        28  RETURN                              1
        29  ZEND_HANDLE_EXCEPTION
Opcodes
Each opcode consists of:
– Opcode handler
– 1 or 2 input operands
– Optional result operand
– Optional “Extended value”
• Meaning opcode dependent, e.g. on a ZEND_CAST it defines target type
– Line number in original source script
– Opcode. Range 0 - 151.
• All listed in zend_vm_opcodes.h
– Total size of each zend_op is 96 bytes
Some operations consist of 2 opcodes
– e.g. ZEND_ASSIGN_OBJ
– 2nd opcode set to ZEND_OP_DATA
struct _zend_op {
    opcode_handler_t handler;
    znode result;
    znode op1;
    znode op2;
    ulong extended_value;
    uint lineno;
    zend_uchar opcode;
};
znode
One for each operand and result
– each znode is 24 bytes
Type can be as follows:
– IS_CONST (0x1)
• program literal
– IS_TMP_VAR (0x2)
• temporary variable with no name
• intermediate result
– IS_VAR (0x4)
• temporary variable with a name
• defined in symbol table
– IS_UNUSED (0x8)
• operand not specified
– IS_CV (0x10)
• optimized version of VAR
For some opcodes type of znode is implied, e.g. for a JMP opcode, op1 znode defines jump target address in “jmp_addr”
EA defines “extended attributes”
– meanings opcode dependent
– e.g. on a ZEND_UNSET_VAR it defines if variable is static or not
typedef struct _znode {
    int op_type;
    union {
        zval constant;
        zend_uint var;
        zend_uint opline_num;
        zend_op_array *op_array;
        zend_op *jmp_addr;
        struct {
            zend_uint var; /* dummy */
            zend_uint type;
        } EA;
    } u;
} znode;
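Because each operand type listed above is a distinct bit, a set of accepted types can be written as a mask and tested with a single AND. A small sketch of that idea follows; the flag values are the ones from the slide, while `op_type_name()` and `op_type_accepted()` are hypothetical helpers, not engine functions.

```c
/* Operand type flags, with the values listed on the slide. */
#define IS_CONST   (1 << 0)  /* 0x1  */
#define IS_TMP_VAR (1 << 1)  /* 0x2  */
#define IS_VAR     (1 << 2)  /* 0x4  */
#define IS_UNUSED  (1 << 3)  /* 0x8  */
#define IS_CV      (1 << 4)  /* 0x10 */

/* Hypothetical helper: printable name for a single type flag. */
static const char *op_type_name(int op_type)
{
    switch (op_type) {
    case IS_CONST:   return "CONST";
    case IS_TMP_VAR: return "TMP";
    case IS_VAR:     return "VAR";
    case IS_UNUSED:  return "UNUSED";
    case IS_CV:      return "CV";
    default:         return "UNKNOWN";
    }
}

/* Hypothetical helper: is op_type one of the types in accepted_mask? */
static int op_type_accepted(int op_type, int accepted_mask)
{
    return (op_type & accepted_mask) != 0;
}
```

For example, `op_type_accepted(IS_CV, IS_CONST | IS_TMP_VAR | IS_VAR | IS_CV)` is true, while the same mask rejects `IS_UNUSED`.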
zend_compile_file()
Returns a pointer to zend_op_array for global scope
– first, it's not an array but a structure
zend_op_array contains a pointer to an array of opcodes, plus much more including:
– pointer to array of compiled variable details. More on these later.
– count of number of temporaries (TMP + VAR) required by opcodes
• i.e. the number of Zend Engine registers used
– pointer to hash table for all statics defined by the function
• the Hashtable is created and populated by the compiler if needed
Compiler produces one zend_op_array for:
– global scope
• this is the one returned to caller of zend_compile_file and is saved in EG(active_op_array)
– each user function
• added to thread’s function table by compiler
– each user class method
• added to function table for class by compiler
zend_compile_file()
Initial opcode array allocated by init_op_array() in zend_opcode.c
– allocated from heap
– sufficient for just 64 opcodes
Reallocated by get_next_op() each time it is full
– Reallocates new array 4 times current size
Storage for opcode array freed by call to destroy_op_array() at request end
– For global scope called from zend_execute_scripts()
– For functions and methods called by Hash Table dtor routine. More later
struct _zend_op_array {
    …….
    zend_uint *refcount;
    zend_op *opcodes;
    zend_uint last, size;
    zend_compiled_variable *vars;
    int last_var, size_var;
    zend_uint T;
    zend_brk_cont_element *brk_cont_array;
    zend_uint last_brk_cont;
    zend_uint current_brk_cont;
    zend_try_catch_element *try_catch_array;
    int last_try_catch;
    /* static variables support */
    HashTable *static_variables;
    ….. etc.
};
ZEND_OP ZEND_OP ZEND_OP ZEND_OP ZEND_OP
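The allocation policy above (start with room for 64 opcodes, multiply the capacity by 4 whenever the array is full) can be sketched in standalone C. This is only an illustration of the growth rule: `op`, `op_array`, `op_array_init()` and `op_array_next()` are simplified, hypothetical stand-ins, and plain `malloc`/`realloc` are used in place of the engine's allocator.

```c
#include <stdlib.h>

#define INITIAL_OPS   64  /* initial capacity stated on the slide */
#define GROWTH_FACTOR 4   /* "4 times current size" */

typedef struct { unsigned char opcode; } op;              /* stand-in for zend_op */
typedef struct { op *opcodes; unsigned last, size; } op_array;

/* Allocate the initial opcode array; returns 1 on success, 0 on failure. */
static int op_array_init(op_array *a)
{
    a->opcodes = malloc(INITIAL_OPS * sizeof(op));
    a->last = 0;
    a->size = (a->opcodes != NULL) ? INITIAL_OPS : 0;
    return a->opcodes != NULL;
}

/* Return the next free slot, growing the array first when it is full. */
static op *op_array_next(op_array *a)
{
    if (a->size == 0) {
        return NULL;                       /* never initialised */
    }
    if (a->last >= a->size) {
        unsigned new_size = a->size * GROWTH_FACTOR;
        op *grown = realloc(a->opcodes, new_size * sizeof(op));
        if (grown == NULL) {
            return NULL;
        }
        a->opcodes = grown;
        a->size = new_size;
    }
    return &a->opcodes[a->last++];
}
```

Requesting a 65th slot triggers the first reallocation and leaves the capacity at 256. Note that growing may move the array, so pointers into it are not stable while compilation is still appending opcodes.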
Opcodes
Not all opcode information can be determined as opcodes are generated by compiler, e.g. target address for a JMP opcode. So after all opcodes are generated a 2nd pass is made over the opcode array to fill in the missing information:
– set target for all jump opcodes
• during compilation jump targets are opcode array indexes. These are changed to absolute addresses
– set opcode handler as defined by the generated executor:
• CALL: address of handler function
• GOTO: address of label
• SWITCH: identifier (int) for handling CASE block
– for any operands (op1 or op2) which are CONSTANTS modify zval to is_ref=1, refcount=2 to ensure zval copied
– trim opcode array to required size; i.e. free unused storage
– See pass_two() in zend_opcode.c
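The jump-target step of the second pass can be illustrated as follows. This is a simplified sketch, not pass_two() itself: the `op` structure and `resolve_jumps()` are hypothetical, and only the index-to-address rewrite is shown.

```c
#include <stddef.h>

typedef struct op op;
struct op {
    int is_jump;               /* stand-in for "this opcode is a jump" */
    union {
        unsigned opline_num;   /* during compilation: index into the array */
        op *jmp_addr;          /* after the second pass: absolute address */
    } target;
};

/* Second pass: rewrite every jump target from an index to an address. */
static void resolve_jumps(op *opcodes, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (opcodes[i].is_jump) {
            unsigned index = opcodes[i].target.opline_num;
            opcodes[i].target.jmp_addr = &opcodes[index];
        }
    }
}
```

One plausible reason for deferring this step is that the opcode array can be reallocated while it is still growing, so an address taken during compilation might not remain valid; an index always does.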
Functions and Classes
During MINIT 2 hash tables are built which the compiler uses
– GLOBAL_FUNCTION_TABLE
• Populated with names of all built-in functions, and functions defined by any enabled extensions
– GLOBAL_CLASS_TABLE
• Populated with default classes and any classes defined by enabled extensions
The compiler ADDs to both tables during compile step
– Any new entries are then removed again at request shutdown
In a non-ZTS environment compiler updates the GLOBAL hash tables
In a ZTS environment GLOBAL tables are read-only
– A separate r/w copy of each table is created for each new thread and populated from GLOBAL table in compiler_globals_ctor()
Function tables
One function table per thread
– Address stored in executor globals (EG)
Function table is a Hashtable mapping function name to “zend_function”
– zend_function structure is itself a union of structures
Populated with built-in functions, extension functions etc. by copying GLOBAL_FUNCTION_TABLE built during MINIT in compiler_globals_ctor()
– type == ZEND_INTERNAL_FUNCTION
– zend_function == zend_internal_function
During each request functions defined by user script are added to function table at compile time
– type == ZEND_USER_FUNCTION
– zend_function == zend_op_array
typedef union _zend_function {
    zend_uchar type;
    struct {
        zend_uchar type; /* never used */
        char *function_name;
        zend_class_entry *scope;
        ………. <SNIP > ………….
        zend_bool pass_rest_by_reference;
        unsigned char return_reference;
    } common;
    zend_op_array op_array;
    zend_internal_function internal_function;
} zend_function;
State of play after compile complete
<?php
function add5($a) { return $a + 5; }
function sub5($a) { return $a - 5; }
$a = add5(10);
$b = sub5(15);
?>
[Diagram: executor_globals holds active_op_array, symbol_table, active_symbol_table, function_table and class_table. active_op_array addresses the zend_op_array for global scope, which addresses its opcodes. symbol_table contains GLOBALS, _ENV, HTTP_ENV_VARS, ……. function_table contains <internal func> entries, each addressing a zend_internal_function, plus add5 and sub5, each addressing its own zend_op_array.]
Function tables
User entries removed from global function table during RSHUTDOWN processing by call to shutdown_executor()
– As user function entries are added after all internal functions, the code uses the zend_hash_reverse_apply() function to traverse the thread's function table entries backwards, removing entries until type != ZEND_USER_FUNCTION
– Removal triggers HT dtor routine ZEND_FUNCTION_DTOR which in turn calls destroy_op_array() to free the opcode array and other structures which hang off zend_op_array
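The backwards traversal works because user entries always sit at the end of the table. A minimal sketch of the same idea, assuming an insertion-ordered array in place of the real hash table; `fn_entry`, `remove_user_functions()` and `count_dtor()` are hypothetical names, with `count_dtor()` standing in for the destructor that frees the op array.

```c
#include <stddef.h>

enum fn_type { INTERNAL_FUNCTION = 1, USER_FUNCTION = 2 };

typedef struct { const char *name; enum fn_type type; } fn_entry;

/* Stand-in destructor: only counts how many entries were destroyed. */
static int destroyed_count = 0;
static void count_dtor(fn_entry *entry)
{
    (void)entry;
    destroyed_count++;
}

/* Walk the table backwards, dropping entries until the first entry that
 * is not a user function is met. Returns the new entry count. */
static size_t remove_user_functions(fn_entry *table, size_t count,
                                    void (*dtor)(fn_entry *))
{
    while (count > 0 && table[count - 1].type == USER_FUNCTION) {
        if (dtor != NULL) {
            dtor(&table[count - 1]);
        }
        count--;
    }
    return count;
}
```

With a table holding two internal functions followed by add5 and sub5, the walk stops after two removals and never visits the internal entries.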
Class table
One class table per thread
– Address stored in executor globals (EG)
Class table is a Hashtable mapping class name to “zend_class_entry”
Populated with default classes and extension defined classes by copying GLOBAL_CLASS_TABLE built during MINIT in compiler_globals_ctor()
During each request classes defined by user script are added to class table at compile time
– Each class has its own function table and compiler adds an entry for each method defined by a class
State of play after compile complete
<?php
class Dog
{
    function bark() { print "Woof!"; }
    function sit() { print "Sit!!"; }
}
$pooch = new Dog;
$pooch->bark();
$pooch->sit();
?>
[Diagram: executor_globals holds active_op_array, symbol_table, active_symbol_table, function_table and class_table. active_op_array addresses the zend_op_array for global scope and its opcodes. symbol_table contains GLOBALS, _ENV, HTTP_ENV_VARS, ……. The global function_table contains <internal func> entries. class_table contains <internal class> entries and Dog. Each class has its own function_table: for an internal class it contains <internal method> entries addressing zend_internal_function; for Dog it contains bark and sit, each addressing a zend_op_array.]
Class table
User class entries are removed by shutdown_executor() by traversing the thread's class table backwards, removing all entries until type != ZEND_USER_CLASS
– Removal triggers HT dtor routine ZEND_CLASS_DTOR which in turn calls destroy_zend_class()
– destroy_zend_class() calls zend_hash_destroy() on the class’s function_table which walks the HT and calls dtor ZEND_FUNCTION_DTOR on each entry as described earlier
Static variables
Local scope but value retained across calls
Hashtable allocated by compiler per function or method when first static variable defined
– Referenced by zend_op_array structure
Statics added to Hashtable as found by compiler
Examining compile results
Two tools available for analysing results of compile
– VLD
– Parsekit
Both available from PECL
VLD
Dumps opcodes for a given PHP script
– Written by Derick Rethans
Download from PECL
– http://pecl.php.net/package/vld/0.8.0
Simple configuration
– --enable-vld[=shared]
Invoked via command line switches
– php -dvld.active=[0|1] -dvld.execute=[0|1] -f <php script>
– Can override defaults in php.ini
VLD
No config.w32 file for Windows
ARG_ENABLE("vld", "Enable Vulcan Opcode decoder", "no");
if (PHP_VLD != "no") {
EXTENSION("vld", "vld.c srm_oparray.c");
}
VLD output
line     #  op                     fetch  ext  operands
--------------------------------------------------------------------------
   2     0  ECHO                                '%0A'
   4     1  ASSIGN                              !0, 5
   5     2  ASSIGN                              !1, 10
   6     3  ADD                                 ~2, !0, !1
         4  ADD                                 ~3, ~2, 99
         5  ASSIGN                              !2, ~3
   8     6  INIT_STRING                         ~5
         7  ADD_STRING                          ~5, ~5, 'c'
         8  ADD_STRING                          ~5, ~5, '%3D+'
         9  ADD_VAR                             ~5, ~5, !2
        10  ADD_STRING                          ~5, ~5, '+'
        11  ADD_CHAR                            ~5, ~5, 10
        12  ECHO                                ~5
  11    13  RETURN                              1
        14  ZEND_HANDLE_EXCEPTION
c= 114
Source script (test.php):
<?php
$a = 5;
$b = 10;
$c = $a + $b + 99;
echo "c= $c \n";
?>
Command:
php -f test.php -dvld.active=1
KEY
! == compiler variable
$ == variable
~ == temporary
There are TMPs defined for results here but they are not used and VLD does not list them
Why all these “+” in VLD output for CONSTs?
<?php
echo "Hello World";
echo "Hello                                World";
echo "Hello + World";
?>
line # op fetch ext operands
-------------------------------------------------------------------------------
2 0 ECHO '%0D%0A+'
4 1 ECHO 'Hello+World'
5 2 ECHO 'Hello++++++++++++++++++++++++++++++++World'
6 3 ECHO 'Hello+%2B+World'
9 4 RETURN 1
5 ZEND_HANDLE_EXCEPTION
Answer: VLD calls php_url_encode() on the CONST to format it before output which amongst other things converts all spaces to “+”. Internally white space is stored as 0x20 as you would expect.
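The encoding seen in these listings can be reproduced with a few lines of C. This is a sketch of the general URL-encoding rule that the output appears to follow (space to “+”, letters, digits, “-”, “_” and “.” unchanged, everything else to %XX); `url_encode_sketch()` is a hypothetical name and this is not the source of php_url_encode().

```c
#include <ctype.h>
#include <stddef.h>

/* Encode src into dst: space -> '+', alphanumerics and "-_." unchanged,
 * anything else -> %XX with upper-case hex digits.
 * dst must have room for 3 * strlen(src) + 1 bytes. */
static void url_encode_sketch(const char *src, char *dst)
{
    static const char hex[] = "0123456789ABCDEF";

    for (; *src != '\0'; src++) {
        unsigned char c = (unsigned char)*src;
        if (c == ' ') {
            *dst++ = '+';
        } else if (isalnum(c) || c == '-' || c == '_' || c == '.') {
            *dst++ = (char)c;
        } else {
            *dst++ = '%';
            *dst++ = hex[c >> 4];
            *dst++ = hex[c & 0x0F];
        }
    }
    *dst = '\0';
}
```

Applied to the string `Hello + World` this yields `Hello+%2B+World`, which matches opcode 3 in the listing: the literal plus sign becomes %2B, so it stays distinguishable from an encoded space.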
parsekit
PHP opcode analyser written by Sara Golemon
– meant for development and debug only; some code not thread safe
Download from PECL
– http://pecl.php.net/package/parsekit
Simple configuration
– --enable-parsekit[=shared]
Implements 5 functions
– parsekit_compile_string
– parsekit_compile_file
– parsekit_func_arginfo
– parsekit_opcode_flags
– parsekit_opcode_name
parsekit
array parsekit_compile_string ( string phpcode [, array &errors [, int options]] )
– compiles and then analyzes supplied string
• errors: 2 dimensional array of errors encountered during compile
– example of use in parsekit/examples
• options: either PARSEKIT_SIMPLE or PARSEKIT_QUIET
– PARSEKIT_QUIET results in more verbose output
array parsekit_compile_file ( string filename [, array &errors [, int options]] )
– As above but takes name of a .php file as input
array parsekit_func_arginfo ( mixed function )
– Return the arg_info data for a given user defined function/method
long parsekit_opcode_flags ( long opcode )
– Return flags which define return type, operand types etc for an opcode
string parsekit_opcode_name ( long opcode )
– Return name for given opcode
parsekit_compile_string: SIMPLE output
<?php
$oparray = parsekit_compile_string('echo "HelloWorld";', $errors, PARSEKIT_SIMPLE);
var_dump($oparray);
?>
array(5) {
  [0]=>
  string(36) "ZEND_ECHO UNUSED 'HelloWorld' UNUSED"
  [1]=>
  string(30) "ZEND_RETURN UNUSED NULL UNUSED"
  [2]=>
  string(42) "ZEND_HANDLE_EXCEPTION UNUSED UNUSED UNUSED"
  ["function_table"]=>
  NULL
  ["class_table"]=>
  NULL
}
parsekit_compile_file: QUIET output
<?php
$oparray = parsekit_compile_string('echo "HelloWorld";', $errors, PARSEKIT_QUIET);
var_dump($oparray);
?>
array(20) {
  ["type"]=> int(2)
  ["type_name"]=> string(18) "ZEND_USER_FUNCTION"
  ["fn_flags"]=> int(0)
  ["num_args"]=> int(0)
  ["required_num_args"]=> int(0)
  ["pass_rest_by_reference"]=> bool(false)
  ["uses_this"]=> bool(false)
  ["line_start"]=> int(0)
  ["line_end"]=> int(0)
  ["return_reference"]=> bool(false)
  ["refcount"]=> int(1)
  ["last"]=> int(3)
  ["size"]=> int(3)
  ["T"]=> int(0)
  ["last_brk_cont"]=> int(0)
  ["current_brk_cont"]=> int(-1)
  ["backpatch_count"]=> int(0)
  ["done_pass_two"]=> bool(true)
  ["filename"]=> string(49) "C:\Testcases\helloWorld.php"
  ["opcodes"]=> array(3) {
    [0]=> array(5) {
      ["opcode"]=> int(40)
      ["opcode_name"]=> string(9) "ZEND_ECHO"
      ["flags"]=> int(768)
      ["op1"]=> array(3) {
        ["type"]=> int(1)
        ["type_name"]=> string(8) "IS_CONST"
        ["constant"]=> &string(11) "Hello World"
      }
      ["lineno"]=> int(3)
      etc…..
parsekit_func_arginfo
<?php
function foo ($a, stdClass $b, &$c) {
}
$oparray = parsekit_func_arginfo('foo');
var_dump($oparray);
?>
array(3) {
  [0]=> array(3) {
    ["name"]=> string(1) "a"
    ["allow_null"]=> bool(true)
    ["pass_by_reference"]=> bool(false)
  }
  [1]=> array(4) {
    ["name"]=> string(1) "b"
    ["class_name"]=> string(8) "stdClass"
    ["allow_null"]=> bool(false)
    ["pass_by_reference"]=> bool(false)
  }
  [2]=> array(3) {
    ["name"]=> string(1) "c"
    ["allow_null"]=> bool(true)
    ["pass_by_reference"]=> bool(true)
  }
}
parsekit-opcode-name
<?php
$opname = parsekit_opcode_name (61);
var_dump($opname);
?>
string(21) "ZEND_DO_FCALL_BY_NAME"
<?php
$opflags = parsekit_opcode_flags (61);
var_dump($opflags);
?>
int(16777218)
flags define whether opcode takes op1 and op2,
defines EA, sets a result etc
Execution path for a script
php_execute_script()
zend_execute_scripts()
zend_execute()
user call(function/method)
include/require
zend_compile_file()
PHP Interpreter
Can be generated in many flavours
– 12 different versions possible
Generated by a chunk of PHP code; zend_vm_gen.php
– You need to understand regular expressions before attempting to read this code
Interpreter generated from
– definition of each opcode in zend_vm_def.h, and
– skeletal interpreter body in zend_vm_execute.skl
Interpreter generation process
zend_vm_execute.skl + zend_vm_def.h → zend_vm_gen.php → zend_vm_execute.h + zend_vm_opcodes.h
zend_vm_execute.skl
{%DEFINES%}
ZEND_API void {%EXECUTOR_NAME%}(zend_op_array *op_array TSRMLS_DC)
{
    zend_execute_data execute_data;
    {%HELPER_VARS%}
    {%INTERNAL_LABELS%}
    if (EG(exception)) {
        return;
    }
    /* Initialize execute_data */
    EX(fbc) = NULL;
    EX(object) = NULL;
    EX(old_error_reporting) = NULL;
    if (op_array->T < TEMP_VAR_STACK_LIMIT) {
        EX(Ts) = (temp_variable *) do_alloca(sizeof(temp_variable) * op_array->T);
    } else {
        EX(Ts) = (temp_variable *) safe_emalloc(sizeof(temp_variable), op_array->T, 0);
    }
    …… etc
The {%…%} markers are triggers to zend_vm_gen.php to insert generated code
zend_vm_def.h
ZEND_VM_HANDLER(1, ZEND_ADD, CONST|TMP|VAR|CV, CONST|TMP|VAR|CV)
{
zend_op *opline = EX(opline); zend_free_op free_op1, free_op2;
add_function(&EX_T(opline->result.u.var).tmp_var,
GET_OP1_ZVAL_PTR(BP_VAR_R),
GET_OP2_ZVAL_PTR(BP_VAR_R) TSRMLS_CC);
FREE_OP1();
FREE_OP2();
ZEND_VM_NEXT_OPCODE();
}
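Annotations on ZEND_VM_HANDLER(1, ZEND_ADD, CONST|TMP|VAR|CV, CONST|TMP|VAR|CV):
– 1: opcode
– ZEND_ADD: opcode name
– first CONST|TMP|VAR|CV: types accepted for op1
– second CONST|TMP|VAR|CV: types accepted for op2
– .. although this is just a macro!
– helper function triggers to PHP code to replace text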
Interpreter generation process
Usage information:
php zend_vm_gen.php [options]
Options:
--with-vm-kind=CALL|SWITCH|GOTO - select threading model (default is CALL)
--without-specializer - disable executor specialization
--with-old-executor - enable old executor
--with-lines - enable #line directives
--with-vm-kind defines execution method
• CALL: Each opcode handler is defined as a function
• SWITCH: Each opcode handler is a case block in one huge switch statement
• GOTO: Label defined for each opcode handler
--without-specializer means only one handler per opcode
• With specializers a handler is generated for each possible combination of operand types
• A reported 20% speedup with specializers enabled over old executor
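The difference between the CALL and SWITCH execution methods is easiest to see on a toy interpreter. The sketch below is a deliberately tiny stack machine, unrelated to the real Zend handlers: all names (`vm`, `op`, `run_call()`, `run_switch()` and so on) are hypothetical, programs are assumed to be well formed (ending in OP_HALT, never overflowing the 16-slot stack), and the GOTO variant is omitted because it relies on a compiler extension for taking label addresses.

```c
#include <stddef.h>

enum { OP_PUSH, OP_ADD, OP_HALT };

typedef struct vm vm;
typedef struct op op;
typedef int (*handler_fn)(vm *m, const op *o);

struct op { int opcode; int operand; handler_fn handler; };
struct vm { int stack[16]; int sp; };

/* CALL: one function per opcode; returns 0 to continue, 1 to stop. */
static int push_handler(vm *m, const op *o)
{
    m->stack[m->sp++] = o->operand;
    return 0;
}

static int add_handler(vm *m, const op *o)
{
    (void)o;
    m->sp--;
    m->stack[m->sp - 1] += m->stack[m->sp];
    return 0;
}

static int halt_handler(vm *m, const op *o)
{
    (void)m;
    (void)o;
    return 1;
}

static const handler_fn handlers[] = { push_handler, add_handler, halt_handler };

/* Patch a handler into each op ahead of execution. */
static void set_handlers(op *ops, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        ops[i].handler = handlers[ops[i].opcode];
    }
}

/* CALL dispatch: the loop just calls whatever handler each op carries. */
static int run_call(vm *m, const op *ops)
{
    m->sp = 0;
    for (const op *o = ops; o->handler(m, o) == 0; o++) {
    }
    return m->stack[m->sp - 1];
}

/* SWITCH dispatch: one case block per opcode inside a single loop. */
static int run_switch(vm *m, const op *ops)
{
    m->sp = 0;
    for (const op *o = ops; ; o++) {
        switch (o->opcode) {
        case OP_PUSH: m->stack[m->sp++] = o->operand; break;
        case OP_ADD:  m->sp--; m->stack[m->sp - 1] += m->stack[m->sp]; break;
        case OP_HALT: return m->stack[m->sp - 1];
        }
    }
}
```

Running the program PUSH 5, PUSH 10, ADD, PUSH 99, ADD, HALT through either loop leaves 114 on the stack. With CALL the handler is resolved once, ahead of execution, and stored in the op; with SWITCH the opcode is decoded on every iteration.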
Interpreter generation process
--with-old-executor enables runtime decision to call old pre-ZE2 type executor, which is a CALL type executor with no specializers
– zend_vm_use_old_executor() defined to switch executor model
– no current callers though
--with-lines results in addition of #line directives to generated zend_vm_execute.h
#line 28 "C:\PHPDEV\php5.2-200612111130\Zend\zend_vm_def.h"
static int ZEND_ADD_SPEC_CONST_CONST_HANDLER(ZEND_OPCODE_HANDLER_ARGS)
{
    zend_op *opline = EX(opline);
    add_function(&EX_T(opline->result.u.var).tmp_var,
    …. etc
Default interpreter which is checked into CVS is generated as follows:
php zend_vm_gen.php --with-vm-kind=CALL
Specialization
With specialization enabled a handler is generated for each valid combination of input operand types
– Each input operand (op1 and op2) can take 1 of 5 types
• TMP
• VAR
• CV
• CONST
• UNUSED
– This gives a theoretical 25 (5 × 5) opcode handlers for each opcode
zend_vm_def.h
ZEND_VM_HANDLER(1, ZEND_ADD, CONST|TMP|VAR|CV, CONST|TMP|VAR|CV)
{
zend_op *opline = EX(opline); zend_free_op free_op1, free_op2;
add_function(&EX_T(opline->result.u.var).tmp_var,
GET_OP1_ZVAL_PTR(BP_VAR_R),
GET_OP2_ZVAL_PTR(BP_VAR_R) TSRMLS_CC);
FREE_OP1();
FREE_OP2();
ZEND_VM_NEXT_OPCODE();
}
ZEND_ADD without specialization
static int ZEND_ADD_HANDLER(ZEND_OPCODE_HANDLER_ARGS)
{
zend_op *opline = EX(opline);
zend_free_op free_op1, free_op2;
add_function(&EX_T(opline->result.u.var).tmp_var,
get_zval_ptr(&opline->op1, EX(Ts), &free_op1, BP_VAR_R),
get_zval_ptr(&opline->op2, EX(Ts), &free_op2, BP_VAR_R)
TSRMLS_CC);
FREE_OP(free_op1);
FREE_OP(free_op2);
ZEND_VM_NEXT_OPCODE();
}
Handler calls non-type specific routines to get zval * for op1 and op2
ZEND_ADD with specialization
static int ZEND_ADD_SPEC_CONST_CONST_HANDLER
(ZEND_OPCODE_HANDLER_ARGS)
{
zend_op *opline = EX(opline);
add_function(&EX_T(opline->result.u.var).tmp_var,
&opline->op1.u.constant,
&opline->op2.u.constant TSRMLS_CC);
ZEND_VM_NEXT_OPCODE();
}
static int ZEND_ADD_SPEC_CONST_TMP_HANDLER
(ZEND_OPCODE_HANDLER_ARGS)
{
zend_op *opline = EX(opline);
zend_free_op free_op2;
add_function(&EX_T(opline->result.u.var).tmp_var,
&opline->op1.u.constant,
_get_zval_ptr_tmp(&opline->op2, EX(Ts), &free_op2 TSRMLS_CC)
TSRMLS_CC);
zval_dtor(free_op2.var);
ZEND_VM_NEXT_OPCODE();
}
static int ZEND_ADD_SPEC_CONST_VAR_HANDLER
(ZEND_OPCODE_HANDLER_ARGS)
{
zend_op *opline = EX(opline);
zend_free_op free_op2;
add_function(&EX_T(opline->result.u.var).tmp_var,
&opline->op1.u.constant,
_get_zval_ptr_var(&opline->op2, EX(Ts), &free_op2 TSRMLS_CC)
TSRMLS_CC);
if (free_op2.var) {zval_ptr_dtor(&free_op2.var);};
ZEND_VM_NEXT_OPCODE();
}
…. and 13 other handlers
Handlers call type specific routines
to get zval * for op1 and op2
zend_vm_gen.php
$op1_get_zval_ptr = array(
    "ANY"    => "get_zval_ptr(&opline->op1, EX(Ts), &free_op1, \\1)",
    "TMP"    => "_get_zval_ptr_tmp(&opline->op1, EX(Ts), &free_op1 TSRMLS_CC)",
    "VAR"    => "_get_zval_ptr_var(&opline->op1, EX(Ts), &free_op1 TSRMLS_CC)",
    "CONST"  => "&opline->op1.u.constant",
    "UNUSED" => "NULL",
    "CV"     => "_get_zval_ptr_cv(&opline->op1, EX(Ts), \\1 TSRMLS_CC)",
);
$op2_get_zval_ptr = array(
    "ANY"    => "get_zval_ptr(&opline->op2, EX(Ts), &free_op2, \\1)",
    "TMP"    => "_get_zval_ptr_tmp(&opline->op2, EX(Ts), &free_op2 TSRMLS_CC)",
    "VAR"    => "_get_zval_ptr_var(&opline->op2, EX(Ts), &free_op2 TSRMLS_CC)",
    "CONST"  => "&opline->op2.u.constant",
    "UNUSED" => "NULL",
    "CV"     => "_get_zval_ptr_cv(&opline->op2, EX(Ts), \\1 TSRMLS_CC)",
…..<snip>
function gen_code(….)
……
    $code = preg_replace(
        array(
            .........
            "/GET_OP1_ZVAL_PTR\(([^)]*)\)/",
            "/GET_OP2_ZVAL_PTR\(([^)]*)\)/",
            ........
        ),
        array(
            .......
            .......
            $op1_get_zval_ptr[$op1],
            $op2_get_zval_ptr[$op2],
            .......
        ),
        $code);
Generated code not always the best !!
Input: zend_vm_def.h
ZEND_VM_HANDLER(71, ZEND_INIT_ARRAY, CONST|TMP|VAR|UNUSED|CV, CONST|TMP|VAR|UNUSED|CV)
{
    zend_op *opline = EX(opline);
    array_init(&EX_T(opline->result.u.var).tmp_var);
    if (OP1_TYPE == IS_UNUSED) {
        ZEND_VM_NEXT_OPCODE();
#if !defined(ZEND_VM_SPEC) || OP1_TYPE != IS_UNUSED
    } else {
        ZEND_VM_DISPATCH_TO_HANDLER(ZEND_ADD_ARRAY_ELEMENT);
#endif
    }
}
Output: zend_vm_execute.h
static int ZEND_INIT_ARRAY_SPEC_CONST_CONST_HANDLER(ZEND_OPCODE_HANDLER_ARGS)
{
    zend_op *opline = EX(opline);
    array_init(&EX_T(opline->result.u.var).tmp_var);
    if (IS_CONST == IS_UNUSED) {
        ZEND_VM_NEXT_OPCODE();
#if 0 || IS_CONST != IS_UNUSED
    } else {
        return ZEND_ADD_ARRAY_ELEMENT_SPEC_CONST_CONST_HANDLER(ZEND_OPCODE_HANDLER_ARGS_PASSTHRU);
#endif
    }
}
Mapping opcode to a handler
Generated zend_vm_execute.h contains an array to map opcodes to handlers
– without specializers array has just 151 entries
– with specializers 3775 (151 * 25) entries
zend_execute.c defines a function to enable compiler to determine correct handler for a given opcode
– zend_vm_set_opcode_handler(zend_op *op)
– Decodes type information for op1 and op2 in supplied “zend_op” and picks appropriate handler from array of handlers. Handler returned will be either:
• function pointer for handler when CALL
• id of handler routine for SWITCH
• address of handler's label for GOTO
Mapping performed at compile time
– pass_two() of compiler calls zend_vm_set_opcode_handler() to patch handler into all generated opcodes
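With 25 specialized handlers per opcode, picking a handler comes down to index arithmetic over the opcode number and the two operand types. A sketch of one way to compute such an index follows; `type_code()` and `handler_index()` are hypothetical, and the 0..4 ordering assigned to the operand types is an assumption made for illustration rather than something stated in the slides.

```c
#define TYPES_PER_OPERAND 5

/* Operand type flags, with the values from the znode slide. */
enum { IS_CONST = 1, IS_TMP_VAR = 2, IS_VAR = 4, IS_UNUSED = 8, IS_CV = 16 };

/* Map a type flag to a dense 0..4 code (ordering assumed for this sketch). */
static int type_code(int op_type)
{
    switch (op_type) {
    case IS_CONST:   return 0;
    case IS_TMP_VAR: return 1;
    case IS_VAR:     return 2;
    case IS_UNUSED:  return 3;
    case IS_CV:      return 4;
    default:         return 3;  /* treat anything else as UNUSED */
    }
}

/* Position of the specialized handler in a flat table that holds
 * 25 consecutive entries for each opcode. */
static int handler_index(int opcode, int op1_type, int op2_type)
{
    return opcode * TYPES_PER_OPERAND * TYPES_PER_OPERAND
         + type_code(op1_type) * TYPES_PER_OPERAND
         + type_code(op2_type);
}
```

Under this layout ZEND_ADD (opcode 1) with two CONST operands lands at index 25, and the CONST/TMP variant sits immediately after it at index 26.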
zend_execute
By default zend_execute function pointer addresses the generated execute() routine in zend_vm_execute.h
– This is called by zend_execute_scripts() with:
• a pointer to the zend_op_array for global scope, and
• if ZTS enabled the tsrm_ls pointer
Executor keeps state data for current user function in zend_execute_data structure which is allocated in execute() stack frame
– Address of currently executing function's zend_execute_data stored in EG
struct _zend_execute_data {
struct _zend_op *opline;
zend_function_state function_state;
zend_function *fbc; /* Function Being Called */
zend_op_array *op_array;
zval *object;
union _temp_variable *Ts;
zval ***CVs;
zend_bool original_in_execution;
HashTable *symbol_table;
struct _zend_execute_data *prev_execute_data;
zval *old_error_reporting;
};
execute()
On entry acquire storage for
– Temporary variables
• Number of temporary variables used by function stored in “T” field of zend_op_array
• Storage allocated on stack if alloca() available and T < 2000
• If alloca not available or 2000+ temporaries then allocated by emalloc from heap
– CV cache
• Number of compiled variables used stored in “last_var” field of zend_op_array
• Allocated on stack regardless of size if alloca available or emalloc otherwise
Initialize zend_execute_data
– Initialize EX(opline) to address first opcode to execute
– EX(symbol_table) = EG(active_symbol_table)
– EX(prev_execute_data) = EG(current_execute_data);
– EG(current_execute_data) = &execute_data;
<?php
function foo() { … }
……
foo();
[Diagram: current_execute_data in executor_globals addresses the zend_execute_data for foo(), whose prev_execute_data addresses the zend_execute_data for global scope, whose prev_execute_data is null.]
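The last two initialization steps above form a linked list of frames, newest first. A minimal sketch of that chaining, assuming that the previous frame is restored when a function returns; `execute_data` here is a cut-down stand-in and `enter_frame()`, `leave_frame()` and `frame_depth()` are hypothetical helpers.

```c
#include <stddef.h>

typedef struct execute_data execute_data;
struct execute_data {
    const char *function_name;       /* illustrative only */
    execute_data *prev_execute_data; /* frame of the caller, or NULL */
};

/* Stand-in for EG(current_execute_data). */
static execute_data *current_execute_data = NULL;

/* On entry: remember the caller's frame and make this one current. */
static void enter_frame(execute_data *ex, const char *name)
{
    ex->function_name = name;
    ex->prev_execute_data = current_execute_data;
    current_execute_data = ex;
}

/* On exit: make the caller's frame current again. */
static void leave_frame(execute_data *ex)
{
    current_execute_data = ex->prev_execute_data;
}

/* Count frames by following the prev_execute_data links. */
static int frame_depth(void)
{
    int depth = 0;
    for (execute_data *ex = current_execute_data; ex != NULL;
         ex = ex->prev_execute_data) {
        depth++;
    }
    return depth;
}
```

Because each frame lives in the stack frame of the execute() call that created it, the chain mirrors the C call stack: entering global scope and then foo() gives a depth of 2, and the frame for global scope has a null prev_execute_data.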
Operand Types
Operands Op1 and Op2 can be either:
– VAR ($)
• Temporary variable into which interpreter caches zval * and zval ** for a defined symbol.
– TMP (~)
• Temporary variable where interpreter keeps an intermediate result.
• For example $a = $b + $c, the sum of $b and $c will be stored in a TMP before being assigned to $a
– CV (!)
• Compiled variable.
• Optimized version of a VAR. More to follow shortly
– CONSTANT
• Program literal, e.g. $a = "hello"
• Symbols are also constants
• ZVAL allocated by compiler; ZVAL has is_ref=1 refcount=2 to force split on assignment
– UNUSED
• Operand not defined for opcode
Result operand can be VAR, TMP or CV
Temporary Variables: VAR and TMP
struct _zend_execute_data {
    struct _zend_op *opline;
    …….
    union _temp_variable *Ts;
    ….. etc
};
typedef union _temp_variable {
    zval tmp_var;
    struct {
        zval **ptr_ptr;
        zval *ptr;
        zend_bool fcall_returned_reference;
    } var;
    struct {
        zval **ptr_ptr;
        zval *ptr;
        zend_bool fcall_returned_reference;
        zval *str;
        zend_uint offset;
    } str_offset;
    zend_class_entry *class_entry;
} temp_variable;
“Ts” field of zend_execute_data addresses an array of temp_variables
– Size of array based on information gathered by compiler.
• The “var” field in the operand's znode contains the offset into the “temp_variables” array
• T and EX_T macros provided to do this
• Temporaries are each 24 bytes
• Temporary variables are NOT re-used by compiler
typedef struct _znode {
    int op_type;
    union {
        zval constant;
        zend_uint var;
        zend_uint opline_num;
        zend_op_array *op_array;
        zend_op *jmp_addr;
        struct {
            zend_uint var; /* dummy */
            zend_uint type;
        } EA;
    } u;
} znode;
VAR variables
FETCH_W  $0, 'a'       /* Retrieve the $a variable for writing */
ASSIGN   $0, 123       /* Assign the numeric value 123 to retrieved variable 0 */
FETCH_W  $2, 'b'       /* Retrieve the $b variable for writing */
ASSIGN   $2, 456       /* Assign the numeric value 456 to retrieved variable 2 */
FETCH_R  $5, 'a'       /* Retrieve the $a variable for reading */
FETCH_R  $6, 'b'       /* Retrieve the $b variable for reading */
ADD      ~7, $5, $6    /* Add the retrieved variables (5 & 6) together and store the result in 7 */
FETCH_W  $4, 'c'       /* Retrieve the $c variable for writing */
ASSIGN   $4, ~7        /* Assign the value in temporary variable 7 into retrieved variable 4 */
FETCH_R  $9, 'c'       /* Retrieve the $c variable for reading */
ECHO     $9            /* Echo the retrieved variable 9 */
<?php
$a = 123;
$b = 456;
$c = $a + $b;
echo $c;
?>
Note: Each time $a is accessed we look it up in symbol table and store result in a different VAR
VAR variables
typedef union _temp_variable {
    zval tmp_var;
    struct {
        zval **ptr_ptr;
        zval *ptr;
        zend_bool fcall_returned_reference;
    } var;
    … etc
} temp_variable;
[Diagram: two copies of the temp_variable union, one labelled “After FETCH_R” and one labelled “After FETCH_RW or FETCH_W”, showing how the “var” pointers address the ZVAL and the pDataPtr slot in the symbol_table.]
Compiled Variables
Introduced in PHP 5.1
Avoids need for expensive symbol table lookup EVERY TIME a symbol is referenced
The “var” field in the operand's znode contains the index into the CV cache
When variable is initialized at runtime, engine looks up symbol in symbol table and stores zval ** in a CV cache addressed from zend
– Hash value of variable calculated at compile time which allows “quick” HT functions to be used at runtime
– Subsequent uses of CV avoid symbol table lookup
All references to same symbol by a function/method refer to same CV
– Unlike temporary variables
Only supported for simple variables
– i.e. not object properties, auto globals or “this” pointer
– For more information see Sara Golemon’s blog on subject: http://blog.libssh2.org/index.php?/archives/21-Compiled-Variables.html
Compiled Variables – compile time processing
An array of eligible variables constructed at compile time by lookup_CV()
– Address of array stored in “vars” field of zend_op_array
– For any variable eligible to be a CV compiler walks the current “vars” array to check for a match.
• If found index returned.
• If not found then it's added in next free slot, recording name and length of symbol, and hash code
• Array allocated from heap. When array fills it's extended by 16 entries by erealloc
struct _zend_op_array {
    ……
    zend_compiled_variable *vars;
    int last_var, size_var;
    … etc.
};
last_var contains index of last slot used, size_var last available index
typedef struct zend_compiled_variable {
    char *name;
    int name_len;
    ulong hash_value;
} zend_compiled_variable;
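The walk-then-append behaviour described for lookup_CV() can be sketched as below. This is an illustration under several assumptions, not the engine code: `lookup_cv()` and `hash_name()` are hypothetical, the hash is a simple placeholder, `realloc` replaces erealloc, and `last_var` is treated as the number of slots in use.

```c
#include <stdlib.h>
#include <string.h>

#define CV_GROW_BY 16  /* growth step quoted on the slide */

typedef struct {
    char *name;
    int name_len;
    unsigned long hash_value;
} compiled_variable;

typedef struct {
    compiled_variable *vars;
    int last_var;   /* number of slots in use (assumption of this sketch) */
    int size_var;   /* number of slots allocated */
} op_array;

/* Placeholder hash, computed once at "compile time". */
static unsigned long hash_name(const char *name, int len)
{
    unsigned long h = 5381;
    for (int i = 0; i < len; i++) {
        h = h * 33 + (unsigned char)name[i];
    }
    return h;
}

/* Return the CV index for name, adding a new entry when it is not yet
 * known. Returns -1 if memory cannot be allocated. */
static int lookup_cv(op_array *a, const char *name)
{
    int len = (int)strlen(name);
    unsigned long h = hash_name(name, len);

    for (int i = 0; i < a->last_var; i++) {
        if (a->vars[i].hash_value == h && a->vars[i].name_len == len
            && memcmp(a->vars[i].name, name, (size_t)len) == 0) {
            return i;  /* already a CV: reuse its index */
        }
    }
    if (a->last_var == a->size_var) {
        int new_size = a->size_var + CV_GROW_BY;
        compiled_variable *grown =
            realloc(a->vars, (size_t)new_size * sizeof *grown);
        if (grown == NULL) {
            return -1;
        }
        a->vars = grown;
        a->size_var = new_size;
    }
    char *copy = malloc((size_t)len + 1);
    if (copy == NULL) {
        return -1;
    }
    memcpy(copy, name, (size_t)len + 1);
    a->vars[a->last_var].name = copy;
    a->vars[a->last_var].name_len = len;
    a->vars[a->last_var].hash_value = h;
    return a->last_var++;
}
```

Looking up "a", then "b", then "a" again returns 0, 1 and 0, which is why every reference to the same symbol within a function ends up with the same CV number.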
Compiled Variables – At runtime To access a CV, we extract CV number from the znode and use as index into CV cache.
– If CV cache slot non-zero then you have the zval **!! No symbol table lookup– If CV cache slot is zero then it’s the first reference to X so:
• Lookup X in symbol table using pre-computed hash, i.e. zend_hash_quick_find()• First lookup of a symbol in a function will also fail so symbol is also added to symbol
table at this point if lookup for W or RW using information in “vars” array. Uses zend_hash_quick_update() using pre-computed hash again.
• If lookup is for R then CV cache set to address EG(uninitialized_zval)• Save returned zval** for X saved in CV cache
– On userspace “unset” the CV cache is entry is set to NULL to force ht lookup on next reference to symbol
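[Diagram: the znode field “zend_uint var” indexes the CV cache (zval *** CVs) held in execute_data; each CV cache slot holds a zval ** that points at the pDataPtr slot of a HT bucket in the symbol table, which in turn holds the zval * for the ZVAL]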
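The runtime path above can be sketched in C as follows. This is a toy model rather than engine code: zval_sketch, symtab_sketch, symtab_find and fetch_cv are hypothetical names, and a small linear array stands in for the symbol table HashTable and its pre-computed hash lookups. The read case follows the slide's description of pointing the cache at the uninitialized zval.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SYMBOLS 8

typedef struct { long value; } zval_sketch;

/* Toy symbol table: a linear array instead of a HashTable. */
typedef struct {
    const char *name[MAX_SYMBOLS];
    zval_sketch *slot[MAX_SYMBOLS];  /* plays the role of a bucket's pDataPtr */
    int used;
    int lookups;                     /* counts symbol table searches */
} symtab_sketch;

/* Stand-ins for the engine's shared "uninitialized" zval. */
zval_sketch uninitialized_zval = {0};
zval_sketch *uninitialized_zval_ptr = &uninitialized_zval;

enum fetch_mode { FETCH_FOR_READ, FETCH_FOR_WRITE };

/* Search the symbol table; returns the address of the slot or NULL. */
zval_sketch **symtab_find(symtab_sketch *st, const char *name)
{
    int i;
    st->lookups++;
    for (i = 0; i < st->used; i++) {
        if (strcmp(st->name[i], name) == 0) {
            return &st->slot[i];
        }
    }
    return NULL;
}

/* Fetch the zval ** for compiled variable number cv. */
zval_sketch **fetch_cv(zval_sketch ***cv_cache, const char **cv_names, int cv,
                       symtab_sketch *st, enum fetch_mode mode)
{
    zval_sketch **pp = cv_cache[cv];

    if (pp != NULL) {
        return pp;                       /* fast path: no symbol table lookup */
    }

    pp = symtab_find(st, cv_names[cv]);  /* first reference: one lookup */
    if (pp == NULL) {
        if (mode == FETCH_FOR_READ) {
            pp = &uninitialized_zval_ptr;
        } else {
            /* Write access: add the symbol using the name from "vars". */
            assert(st->used < MAX_SYMBOLS);
            st->name[st->used] = cv_names[cv];
            st->slot[st->used] = &uninitialized_zval;
            pp = &st->slot[st->used++];
        }
    }
    cv_cache[cv] = pp;                   /* later fetches skip the lookup */
    return pp;
}
```

Fetching the same CV twice for writing searches the symbol table only once; the second fetch is answered from the cache slot, which is the saving the slide describes.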
Compiled variables: with regular variables
<?php
$a = 123;
$b = 456;
$c = $a + $b;
echo $c;
?>
Without CV
FETCH_W $0, 'a' /* Retrieve the $a variable for writing */
ASSIGN $0, 123 /* Assign the numeric value 123 to retrieved variable 0 */
FETCH_W $2, 'b' /* Retrieve the $b variable for writing */
ASSIGN $2, 456 /* Assign the numeric value 456 to retrieved variable 2 */
FETCH_R $5, 'a' /* Retrieve the $a variable for reading */
FETCH_R $6, 'b' /* Retrieve the $b variable for reading */
ADD ~7, $5, $6 /* Add the retrieved variables (5 & 6) together and store the result in 7 */
FETCH_W $4, 'c' /* Retrieve the $c variable for writing */
ASSIGN $4, ~7 /* Assign the value in temporary variable 7 into retrieved variable 4 */
FETCH_R $9, 'c' /* Retrieve the $c variable for reading */
ECHO $9 /* Echo the retrieved variable 9 */
With CV
ASSIGN !0, 123 /* Assign the numeric value 123 to compiled variable 0 */
ASSIGN !1, 456 /* Assign the numeric value 456 to compiled variable 1 */
ADD ~2, !0, !1 /* Add compiled variable 0 to compiled variable 1 */
ASSIGN !2, ~2 /* Assign the value of temporary variable 2 to compiled variable 2 */
ECHO !2 /* Echo the value of compiled variable 2 */
Compiled variables: with object variables
<?php
$f->a = 123;
$f->b = 456;
$f->c = $f->a + $f->b;
echo $f->c;
?>
With CV
ASSIGN_OBJ !0, 'a' /* Assign the numeric value 123 to property 'a' of compiled variable 0 object */
OP_DATA 123 /* Additional data for ASSIGN_OBJ opcode */
ASSIGN_OBJ !0, 'b' /* Assign the numeric value 456 to property 'b' of compiled variable 0 object */
OP_DATA 456 /* Additional data for ASSIGN_OBJ opcode */
FETCH_OBJ_R $3, !0, 'a' /* Retrieve property 'a' from compiled variable 0 object */
FETCH_OBJ_R $4, !0, 'b' /* Retrieve property 'b' from compiled variable 0 object */
ADD ~5, $3, $4 /* Add those values and store the result in temp var 5 */
ASSIGN_OBJ !0, 'c' /* Assign the ADD result to property 'c' of compiled variable 0 object */
OP_DATA ~5 /* Additional data for ASSIGN_OBJ opcode */
FETCH_OBJ_R $6, !0, 'c' /* Retrieve property 'c' from compiled variable 0 object */
ECHO $6 /* Echo the value */
Note: Properties are re-fetched every time a read or write is performed on them. This cannot be avoided because the magic methods __get(), __set() etc. can return a different variable on different fetches
Symbol tables
One for global scope, created during RINIT processing by a call to init_executor()
– Adds a “GLOBALS” entry to the new symbol table, which is a recursive reference
– Populated with the requested super globals by a call to php_hash_environment()
• _POST, _GET and _COOKIE
• if INI register_argc_argv=on is specified then argv and argc are added
• if INI auto_globals_jit=off is specified then _ENV, _SERVER and _REQUEST are added
– If auto_globals_jit=on they are added by the compiler if a reference is found; see zend_is_auto_global()
• if INI register_long_arrays=on is specified then the long versions of _ENV, _POST etc. are added
• if INI register_globals=yes is specified then all globals are added to the symbol table
Symbol tables are created for each called function/method at the time of the call
– Created and freed in zend_do_fcall_common_helper()
– The HashTable for the symbol table is allocated from a cache if any are available
• Up to 32 symbol table HashTables are cached
• HashTables are cleared before being added to the cache so there is no need to initialize on allocation from the cache
– Otherwise a new HashTable is allocated and initialized
Symbols are added to a symbol table at runtime on first reference to a symbol
– On reference to a symbol we do a HashTable lookup; if the lookup fails we add it
• Initially it is set to reference a special zval, uninitialized_zval, until it is assigned to
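The per-call symbol table cache described above can be pictured with the short C sketch below. It is only an illustration under stated assumptions: table_sketch stands in for a HashTable, and the names symtable_cache_sketch, acquire_symbol_table and release_symbol_table are hypothetical, not engine functions.

```c
#include <assert.h>
#include <stdlib.h>

#define SYMTABLE_CACHE_SIZE 32   /* up to 32 tables are cached */

/* Stand-in for a HashTable; only an element count is modelled. */
typedef struct { int n_elements; } table_sketch;

typedef struct {
    table_sketch *cache[SYMTABLE_CACHE_SIZE];
    int count;                   /* number of cached tables */
} symtable_cache_sketch;

/* Called when a function/method is entered. */
table_sketch *acquire_symbol_table(symtable_cache_sketch *c)
{
    table_sketch *t;

    if (c->count > 0) {
        /* Cached tables were cleared on release: no init needed here. */
        return c->cache[--c->count];
    }
    t = malloc(sizeof(*t));      /* otherwise allocate and init a new one */
    if (t != NULL) {
        t->n_elements = 0;
    }
    return t;
}

/* Called when the function/method returns. */
void release_symbol_table(symtable_cache_sketch *c, table_sketch *t)
{
    assert(t != NULL);
    if (c->count < SYMTABLE_CACHE_SIZE) {
        t->n_elements = 0;       /* clear before adding to the cache */
        c->cache[c->count++] = t;
    } else {
        free(t);                 /* cache full: really free the table */
    }
}
```

Clearing on release rather than on acquire keeps the call path short, which matches the slide's remark that no initialization is needed when a table comes from the cache.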
Which symbol table to use?
EG(active_symbol_table) contains a reference to the current function/method symbol table
However, as super/auto globals are only stored in the global scope symbol table, what happens if a function references one?
– This is where the EA attribute of the znode comes into play!!
• EA for op2 is set by the compiler to direct the interpreter to FETCH from the global symbol table EG(symbol_table) rather than the current symbol table EG(active_symbol_table) when it finds a reference to an auto_global
– The compiler checks every symbol against the auto_globals hash table
– See fetch_simple_variable_ex()
– A different flag is set in EA for static variables, class statics etc.
• see zend_get_target_symbol_table()
<?php
function foo() {
$a = $_ENV;
}
foo();
?>
line  #   op                     fetch   ext  operands
-------------------------------------------------------------------------------
9     0   FETCH_R                global       $0, '_ENV'
      1   ASSIGN                              !0, $0
10    2   RETURN                              null
      3   ZEND_HANDLE_EXCEPTION
Static Variables
A HashTable referenced by the zend_op_array details all statics, if any, defined for a function/method
– When a static is accessed at runtime an entry is added to the local symbol table, EG(active_symbol_table)
– The entry is made to reference the same zval as is referenced by the static_variables HashTable, in a “change on write” set
• just like a parameter passed by reference
Static Variables
<?php
function foo() {
    static $count = 0;
    $count++;
}
foo();
foo();
foo();
?>
line  #   op                     fetch   ext  operands
-------------------------------------------------------------------------------
5     0   FETCH_W                static       $0, 'count'
      1   ASSIGN_REF                          !0, $0
6     2   POST_INC                            ~1, !0
      3   FREE                                ~1
8     4   RETURN                              null
      5   ZEND_HANDLE_EXCEPTION
vld output for foo()
[Diagram: the zval for “count” as referenced from static_variables and from EG(active_symbol_table), shown in three states]
– Referenced by static_variables only: value=0 is_ref=0 refcount=1
– Referenced by both static_variables and EG(active_symbol_table): value=0 is_ref=1 refcount=2
– After the increment: value=1 is_ref=1 refcount=2
Function calling
The opcode sequence depends on whether the target is known at compile time
– The target function name is not known at compile time if
• The call site is before the user function definition
• Conditional functions are used
• Referred to in the code as a “dynamic function call”
If the compiler cannot verify the function name at compile time then the sequence is
– INIT_FCALL_BY_NAME
• performs a runtime check on the function name
– SEND_* for each argument
• EA set to force arg checks for “pass by ref” at runtime
– DO_FCALL_BY_NAME
If the compiler can verify the function name then the sequence is just
– SEND_* for each argument
– DO_FCALL
Example: foo(&$a, $b, 100) generates SEND_REF, SEND_VAR and SEND_VAL for its three arguments respectively
Function calling
<?php
$a = 10;
$b = 5;
foo($a, $b);
function foo(&$x, &$y) {
    echo "foo called with: $x $y";
}
foo($a, $b);
?>
opcodes for global scope
line  #   op                     fetch   ext  operands
-------------------------------------------------------------------------------
2     0   ECHO                                '%0D%0A+'
4     1   ASSIGN                              !0, 10
5     2   ASSIGN                              !1, 5
6     3   INIT_FCALL_BY_NAME                  'foo'
      4   SEND_VAR                            !0
      5   SEND_VAR                            !1
      6   DO_FCALL_BY_NAME               2    0
8     7   NOP
14    8   SEND_REF                            !0
      9   SEND_REF                            !1
      10  DO_FCALL                       2    'foo', 0
17    11  RETURN                              1
      12  ZEND_HANDLE_EXCEPTION
– The SEND_VAR opcode’s extended_value is set for FCALL_BY_NAME to force the SEND_VAR handler to check the expected args at RUNTIME; if call by REF is expected it re-dispatches to the SEND_REF handler
– Uses EX(fbc), set by INIT_FCALL_BY_NAME, to access the required arg info
– extended_value on the FCALL opcode is the number of arguments passed
Calling other user functions
When a user space function calls another user function or method the arguments are pushed onto a LIFO argument stack
– Address in EG(argument_stack)
– Initial argument stack of 64 slots is allocated by init_executor()
– When full, a new stack twice the size is allocated
ZEND_SEND_* opcodes are generated for each argument
– Each pushes a zval * onto the argument stack
– zvals are split when necessary
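The push-and-split behaviour for a by-value argument can be sketched in C as below. This is a simplified model, not the engine's SEND_VAR handler: zval_sketch, arg_stack_sketch, arg_stack_push and send_var_sketch are hypothetical names, realloc doubling stands in for allocating a new stack of twice the size, and only the refcount/is_ref bookkeeping is modelled.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    long value;
    int refcount;
    int is_ref;
} zval_sketch;

typedef struct {
    zval_sketch **slots;
    int top;    /* next free slot */
    int size;   /* allocated slots */
} arg_stack_sketch;

/* Push one zval pointer, growing the stack (64 slots, then doubling). */
int arg_stack_push(arg_stack_sketch *s, zval_sketch *z)
{
    if (s->top == s->size) {
        int new_size = s->size ? s->size * 2 : 64;
        zval_sketch **grown = realloc(s->slots,
                                      (size_t)new_size * sizeof(*grown));
        if (grown == NULL) {
            return -1;
        }
        s->slots = grown;
        s->size = new_size;
    }
    s->slots[s->top++] = z;
    return 0;
}

/* Pass a variable by value. A zval in a "copy on write" set (is_ref == 0)
 * is shared by bumping its refcount; a zval in a "change on write" set
 * (is_ref == 1) must be split so the callee gets its own copy. */
int send_var_sketch(arg_stack_sketch *s, zval_sketch *z)
{
    assert(z != NULL);
    if (z->is_ref) {
        zval_sketch *copy = malloc(sizeof(*copy));
        if (copy == NULL) {
            return -1;
        }
        copy->value = z->value;
        copy->refcount = 1;
        copy->is_ref = 0;
        if (arg_stack_push(s, copy) != 0) {
            free(copy);
            return -1;
        }
        return 0;
    }
    z->refcount++;
    if (arg_stack_push(s, z) != 0) {
        z->refcount--;
        return -1;
    }
    return 0;
}
```

Sending a zval with refcount=2 and is_ref=0 leaves a single shared zval with refcount=3, while sending one with is_ref=1 pushes a fresh copy with refcount=1 and is_ref=0 and leaves the original untouched.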
Example 1: Passing arguments by value without splitting
<?php
function foo($x, $y) {
    echo "foo called with: $x $y";
}
$a = 10;
$b = 5;
$c = $a;
foo($a, $b);
?>
opcodes for global scope
line  #   op                     ext  operands
--------------------------------------------------
3     0   NOP
8     1   ASSIGN                      !0, 10
9     2   ASSIGN                      !1, 5
10    3   ASSIGN                      !2, !0
12    4   SEND_VAR                    !0
      5   SEND_VAR                    !1
      6   DO_FCALL               2    'foo', 0
15    7   RETURN                      1
      8   ZEND_HANDLE_EXCEPTION
After opcode #5:
[Diagram: argument_stack (NULL, NULL, arg1, arg2, 2 = number of args) and symbol_table (a, b, c)]
– a, c and arg1 share one zval: value=10 refcount=3 is_ref=0
– b and arg2 share one zval: value=5 refcount=2 is_ref=0
No need to split the zval for $a as it is part of a “copy on write” set
Example 2: Passing arguments by value when splitting required
<?php
function foo($x, $y) {
    echo "foo called with: $x $y";
}
$a = 10;
$b = 5;
$c = &$a;
foo($a, $b);
?>
opcodes for global scope
line  #   op                     ext  operands
--------------------------------------------------
3     0   NOP
8     1   ASSIGN                      !0, 10
9     2   ASSIGN                      !1, 5
10    3   ASSIGN                      !2, !0
12    4   SEND_VAR                    !0
      5   SEND_VAR                    !1
      6   DO_FCALL               2    'foo', 0
15    7   RETURN                      1
      8   ZEND_HANDLE_EXCEPTION
After opcode #5:
[Diagram: argument_stack (NULL, NULL, arg1, arg2, 2) and symbol_table (a, b, c)]
– a and c share one zval: value=10 refcount=2 is_ref=1
– arg1 references a separate copy: value=10 refcount=1 is_ref=0
– b and arg2 share one zval: value=5 refcount=2 is_ref=0
We have to split the zval for $a as it is part of a “change on write” set
Example 3: Passing arguments by reference when splitting required
<?php
function foo($x, $y) {
    echo "foo called with: $x $y";
}
$a = 10;
$b = 5;
$c = $a;
foo(&$a, &$b);
?>
line  #   op                     ext  operands
--------------------------------------------------
3     0   NOP
8     1   ASSIGN                      !0, 10
9     2   ASSIGN                      !1, 5
10    3   ASSIGN                      !2, !0
11    4   SEND_REF                    !0
      5   SEND_REF                    !1
      6   DO_FCALL               2    'foo', 0
14    7   RETURN                      1
      8   ZEND_HANDLE_EXCEPTION
Note: the CV is actually in op1, not the result operand
After opcode #5:
[Diagram: argument_stack (NULL, NULL, arg1, arg2, 2) and symbol_table (a, b, c)]
– c: value=10 refcount=1 is_ref=0
– a and arg1: value=10 refcount=2 is_ref=1
– b and arg2: drawn as value=10 refcount=2 is_ref=1
Here we have to split the zval for $a as it is part of a “copy on write” set
Example 4: Passing arguments by reference (compile time)
<?php
$a = 10;
$b = 5;
$c = $a;
foo($a, $b);
function foo(&$x, &$y) {
    echo "foo called with: $x $y";
}
?>
line  #   op                     ext  operands
--------------------------------------------------
5     0   ASSIGN                      !0, 10
6     1   ASSIGN                      !1, 5
7     2   ASSIGN                      !2, !0
8     3   INIT_FCALL_BY_NAME          'foo'
      4   SEND_VAR                    !0
      5   SEND_VAR                    !1
      6   DO_FCALL_BY_NAME       2    0
10    7   NOP
16    8   RETURN                      1
      9   ZEND_HANDLE_EXCEPTION
After opcode #5:
[Diagram: argument_stack (NULL, NULL, arg1, arg2, 2) and symbol_table (a, b, c)]
– c: value=10 refcount=1 is_ref=0
– a and arg1: value=10 refcount=2 is_ref=1
– b and arg2: drawn as value=10 refcount=2 is_ref=1
As it is FCALL_BY_NAME, extra checks in SEND_VAR kick in to check the arg info for compile time call by ref. If so, it re-dispatches to SEND_REF to do the right thing!
Not shown, but the op2 znode is used to save the argument number, which is used to index into the arg info structure
Receiving arguments passed by value
<?php
function foo($x, $y) {
    echo "foo called with: $x $y";
}
$a = 10;
$b = 5;
$c = $a;
foo($a, $b);
?>
line  #   op                     ext  operands
--------------------------------------------------
3     0   RECV                        1
      1   RECV                        2
4     2   INIT_STRING                 ~0
      3   ADD_STRING                  ~0, ~0, 'foo'
      4   ADD_STRING                  ~0, ~0, '+'
      5   ADD_STRING                  ~0, ~0, 'called'
. . . etc. . . .
– The RECV operand shown is the argument number
– The result op is the CV for the arg
After opcode #1:
[Diagram: argument_stack (NULL, NULL, arg1, arg2, 2), caller’s symbol_table (a, b, c) and callee symbol_table (x, y)]
– a, c, arg1 and x share one zval: value=10 refcount=4 is_ref=0
– b, arg2 and y share one zval: drawn as value=10 refcount=3 is_ref=0
Receiving arguments passed by reference
<?php
function foo($x, $y) {
    echo "foo called with: $x $y";
}
$a = 10;
$b = 5;
$c = $a;
foo(&$a, &$b);
?>
line  #   op                     ext  operands
--------------------------------------------------
3     0   RECV                        1
      1   RECV                        2
4     2   INIT_STRING                 ~0
      3   ADD_STRING                  ~0, ~0, 'foo'
      4   ADD_STRING                  ~0, ~0, '+'
      5   ADD_STRING                  ~0, ~0, 'called'
. . . etc. . . .
After opcode #1:
[Diagram: argument_stack (NULL, NULL, arg1, arg2, 2), caller’s symbol_table (a, b, c) and callee symbol_table (x, y)]
– c: value=10 refcount=1 is_ref=0
– a, arg1 and x share one zval: value=10 refcount=3 is_ref=1
– b, arg2 and y share one zval: drawn as value=10 refcount=3 is_ref=1
Function binding
<?php
function a() {
echo "called function a";
}
function b() {
echo "called function b";
}
a();
b();
?>
line  #   op                     fetch   ext  operands
-------------------------------------------------------------------------------
3     0   NOP
6     1   NOP
10    2   DO_FCALL                       0    'a', 0
11    3   DO_FCALL                       0    'b', 0
14    4   RETURN                              1
      5   ZEND_HANDLE_EXCEPTION
What are these NOPs?
Function binding
They are artefacts of “function binding”
When the compiler encounters a function declaration in a script it generates a “ZEND_DECLARE_FUNCTION” opcode in the current opcode array
– op1 is the long function name
• \0<function name><file name><address>
• e.g. “ fooC:\Testcases\tes.php0012CD38”
• where address is the character position of the last char of the function prototype in the script’s buffer
– op2 is the short name, i.e. just “foo”
– a function table entry is added by the compiler for the long name
After parsing the function body and generating its zend_op_array the compiler then performs “early binding” for unconditional functions
– Effectively executes the opcode at compile time. See zend_do_early_binding().
– The opcode checks for duplicate function names
• Looks up the function table entry for the long name. This should always be successful!!
• Attempts to add a function entry with the short name using the zend_function_entry just retrieved. If this fails we have a duplicate function name and an error message is produced detailing the filename and line number of the previous declaration.
– If there is no duplicate then the ZEND_DECLARE_FUNCTION opcode is converted to a NOP
• opcode set to ZEND_NOP
• op1 and op2 set to UNUSED and the zvals for the name strings freed
– Deletes the function table entry for the “long name”
Conditional Functions
The same function name can be defined multiple times with different content and/or signature
– A zend_op_array is generated for each different version of a conditional function
Which function gets executed is not known until runtime
– So function binding is delayed until runtime
– ZEND_DECLARE_FUNCTION persists in the compiler output
<?php
$a = 10;
if (a > 10) {
    function foo() {
        echo "foo has no parms";
    }
} else {
    function foo($a) {
        echo "foo has 1 parm";
    }
}
if (a > 10) {
    foo();
} else {
    foo($a);
}
Conditional Functions
line  #   op                     fetch   ext  operands
-------------------------------------------------------------------------------
2     0   ECHO                                '%0D%0A+'
5     1   ASSIGN                              !0, 10
7     2   FETCH_CONSTANT                      ~1, 'a'
      3   IS_SMALLER                          ~2, 10, ~1
      4   JMPZ                                ~2, ->7
8     5   ZEND_DECLARE_FUNCTION               '%00fooC%3A%5CEclipse-PHP%5Cworkspace%5CTestcases%5Ctest.php0140E973', 'foo'
12    6   JMP                                 ->8
13    7   ZEND_DECLARE_FUNCTION               '%00fooC%3A%5CEclipse-PHP%5Cworkspace%5CTestcases%5Ctest.php0140E9B9', 'foo'
20    8   FETCH_CONSTANT                      ~3, 'a'
      9   IS_SMALLER                          ~4, 10, ~3
      10  JMPZ                                ~4, ->14
. . . etc. . . .
Conditional Functions
zend_op_array for foo()
line  #   op                     fetch   ext  operands
-------------------------------------------------------------------------------
10    0   ECHO                                'foo+has+no+parms'
11    1   RETURN                              null
      2   ZEND_HANDLE_EXCEPTION
zend_op_array for foo($a)
line  #   op                     fetch   ext  operands
-------------------------------------------------------------------------------
13    0   RECV                                1
15    1   ECHO                                'foo+has+1+parm'
16    2   RETURN                              null
      3   ZEND_HANDLE_EXCEPTION
Exception Handling
<?php
function foo($x) {
    if ($x > 1) {
        throw new Exception;
    }
}
try {
    foo(1);
} catch (Exception $e) {
    echo "exception 1";
}
try {
    foo(2);
} catch (Exception $e) {
    echo "exception 2";
}
?>
opcodes for global scope
line  #   op                     fetch   ext  operands
-------------------------------------------------------------------------------
3     0   NOP
11    1   SEND_VAL                            1
      2   DO_FCALL                       1    'foo', 0
13    3   ZEND_FETCH_CLASS                    :1, 'Exception'
      4   ZEND_CATCH                          null, 'e'
14    5   ECHO                                'exception+1'
18    6   SEND_VAL                            2
      7   DO_FCALL                       1    'foo', 0
20    8   ZEND_FETCH_CLASS                    :3, 'Exception'
      9   ZEND_CATCH                          null, 'e'
21    10  ECHO                                'exception+2'
25    11  RETURN                              1
      12  ZEND_HANDLE_EXCEPTION
opcodes for foo()
line  #   op                     fetch   ext  operands
-------------------------------------------------------------------------------
3     0   RECV                                1
5     1   IS_SMALLER                          ~0, 1, !0
      2   JMPZ                                ~0, ->8
6     3   ZEND_FETCH_CLASS                    :1, 'Exception'
      4   NEW                                 $2, :1
      5   DO_FCALL_BY_NAME               0    0
      6   ZEND_THROW                          $2
7     7   JMP                                 ->8
8     8   RETURN                              null
      9   ZEND_HANDLE_EXCEPTION
– Not shown by VLD, but the extended value of the CATCH opcode contains the opcode number to branch to if no exception was thrown
– If no exception is thrown during the TRY block then we actually execute the ZEND_FETCH_CLASS and ZEND_CATCH opcodes. ZEND_CATCH, on finding no exception thrown, dispatches the first opcode after the end of the catch block, i.e. 6 in this case
Exception handling
When the ZEND_THROW opcode executes it sets EG(exception) and EG(opline_before_exception) before dispatching the ZEND_HANDLE_EXCEPTION opcode at the end of the current op array
– see zend_throw_exception_internal()
The ZEND_HANDLE_EXCEPTION opcode handler checks all try/catch blocks in the current scope to see if the range they cover includes the last opcode executed in the current scope before the exception
– If any does, it dispatches the first opcode of the catch block, which will be ZEND_FETCH_CLASS
• uses an array built by the compiler which defines the scope of each try/catch block
• the array records scope in terms of the opcode number of the first try block opcode and the opcode number of the first catch block opcode
– If none does, then return to the caller
• on seeing EG(exception) still set, return processing sets EG(opline_before_exception) to the last opcode executed in the caller, i.e. the FCALL opcode, and then sets the next opcode in the caller to the ZEND_HANDLE_EXCEPTION opcode at the end of the caller’s opcode array
– Repeat until a catch block is found or we return from global scope, at which point if EG(exception) is set an “uncaught exception” error message is produced
Exception handling
struct _zend_op_array {
…….
zend_try_catch_element *try_catch_array;
int last_try_catch;
….. etc
};
typedef struct _zend_try_catch_element {
    zend_uint try_op;
    zend_uint catch_op; /* ketchup! */
} zend_try_catch_element;
struct _zend_execute_data {
    struct _zend_op *opline;
    …….
    zend_op_array *op_array;
    ….. etc
};
try_op and catch_op contain the opcode number of the first opcode of the try and catch blocks
try_catch_array is reallocated as each try/catch block is found by the compiler
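The search the ZEND_HANDLE_EXCEPTION handler performs over try_catch_array can be sketched as below. It is a minimal model under stated assumptions: try_catch_sketch mirrors the two fields shown above, find_catch_op is a hypothetical helper name, and the array is assumed to be ordered by try_op as the compiler appends entries.

```c
#include <assert.h>

/* Mirrors the two fields of zend_try_catch_element (hypothetical name). */
typedef struct {
    unsigned int try_op;    /* first opcode of the try block */
    unsigned int catch_op;  /* first opcode of the catch block */
} try_catch_sketch;

/* Given the number of the last opcode executed before the exception,
 * return the catch_op of the try block covering it, or -1 if no try
 * block in this op array covers it (the exception goes to the caller). */
long find_catch_op(const try_catch_sketch *blocks, int last_try_catch,
                   unsigned int op_num)
{
    long found = -1;
    int i;

    assert(last_try_catch >= 0);
    for (i = 0; i < last_try_catch; i++) {
        if (blocks[i].try_op > op_num) {
            break;  /* this and all later blocks start after op_num */
        }
        if (op_num >= blocks[i].try_op && op_num < blocks[i].catch_op) {
            found = (long)blocks[i].catch_op;  /* keep the latest match */
        }
    }
    return found;
}
```

With the two blocks from the example script, (try_op=1, catch_op=3) and (try_op=6, catch_op=8), an exception raised while opcode 7 is executing resolves to opcode 8, whereas opcode 5, which lies inside the first catch block, is covered by no try block.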
Exception Handling
[Diagram: try_catch_array contents for the two op arrays of the example script]
zend_op_array for global scope
– last_try_catch = 2
– try_catch_array[0]: try_op = 1, catch_op = 3
– try_catch_array[1]: try_op = 6, catch_op = 8
zend_op_array for foo()
– last_try_catch = 0
– try_catch_array = null
The opcode listings for both op arrays are the same as those shown on the first Exception Handling slide