R Environment and Variable Lookup
Apr. 2012
Outline
– R Environment and Variable Lookup
– R Byte-Code Interpreter Variable Cache Mechanism
– Unboxed Value Cache Proposal
– Others
R Environment Organization
Environment Frames
– Frames are connected as a tree structure
// One variable binding cell structure
struct listsxp_struct {
    struct SEXPREC *carval;  // value of the symbol
    struct SEXPREC *cdrval;  // next binding cell
    struct SEXPREC *tagval;  // symbol
};

// One frame structure
struct envsxp_struct {
    struct SEXPREC *frame;
    struct SEXPREC *enclos;   // parent
    struct SEXPREC *hashtab;  // optional
};
[Figure: Frames A, B and C linked through their enclos pointers. A frame's "frame" field heads a list of variable binding cells (tag = SEXP symbol, car = SEXP value, cdr = next cell), terminated by R_NilValue. The "hashtab" field points to the optional hashtable.]
R Environment Organization (2)
Hashtable
– Implemented by a VECSXP structure
  • A vector; each vector element is an R SEXP object (listsxp_struct)
– Calculating the bucket number
  • Hash(symbol) & hashTableMask
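The bucket calculation above can be sketched in C as follows. This is a minimal illustration, not R's actual code: `toy_hash` is a hypothetical djb2-style hash standing in for whatever hash R applies to the symbol, and the sketch assumes the table size is a power of two so that masking picks a valid bucket.

```c
#include <assert.h>

/* Hypothetical string hash (djb2-style); the real hash function is not
   shown in the slides. */
static unsigned int toy_hash(const char *name) {
    unsigned int h = 5381u;
    for (const unsigned char *p = (const unsigned char *)name; *p != '\0'; ++p)
        h = h * 33u + *p;
    return h;
}

/* Bucket number = Hash(symbol) & hashTableMask, with mask = table_size - 1.
   Masking only equals "hash mod table_size" when table_size is a power of
   two, so that assumption is checked here. */
static unsigned int bucket_index(const char *name, unsigned int table_size) {
    assert(table_size != 0u && (table_size & (table_size - 1u)) == 0u);
    return toy_hash(name) & (table_size - 1u);
}
```

With a four-bucket table, every symbol maps to a bucket number in the range 0 to 3.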
[Figure: The hashtable is a VECSXP object with buckets 0 to 3. Each bucket heads a chain of variable binding cells (tag = SEXP symbol, car = SEXP value, cdr = next cell), terminated by R_NilValue.]
R Environment – Variable Lookup
Steps
– Get the environment frame
  • From the current execution frame
  • Or from recursive lookup
– Check whether it has a hashtable
  • No:
    – Start from the first binding cell, do a list search, compare symbols
  • Yes:
    – Calculate the hash bucket number
    – Get the corresponding bucket's first binding cell, do a list search, compare symbols
– Not found: return R_NilValue
– Found: could return a binding cell
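The lookup steps above can be sketched as a small C model. The types `Cell` and `Env`, the helper names, and `toy_hash` are hypothetical stand-ins for R's SEXP structures; values are plain ints, NULL plays the role of R_NilValue, and symbols are compared with `strcmp` here for simplicity.

```c
#include <stddef.h>
#include <string.h>

typedef struct Cell {
    const char *symbol;     /* tag */
    int value;              /* car (an int instead of a SEXP) */
    struct Cell *next;      /* cdr; NULL stands in for R_NilValue */
} Cell;

typedef struct Env {
    Cell *frame;            /* binding list, used when there is no hashtable */
    struct Env *enclos;     /* parent environment */
    Cell **hashtab;         /* optional bucket array, or NULL */
    unsigned int hash_mask; /* bucket count - 1 (bucket count: power of two) */
} Env;

/* Hypothetical hash; the real one is not shown in the slides. */
static unsigned int toy_hash(const char *s) {
    unsigned int h = 5381u;
    for (; *s != '\0'; ++s)
        h = h * 33u + (unsigned char)*s;
    return h;
}

/* List search: compare the symbol of each cell in the chain. */
static Cell *search_chain(Cell *c, const char *symbol) {
    for (; c != NULL; c = c->next)
        if (strcmp(c->symbol, symbol) == 0)
            return c;
    return NULL;
}

/* Look in one frame: hashed bucket if a hashtable exists, else the list. */
static Cell *find_in_frame(const Env *env, const char *symbol) {
    if (env->hashtab != NULL)
        return search_chain(env->hashtab[toy_hash(symbol) & env->hash_mask],
                            symbol);
    return search_chain(env->frame, symbol);
}

/* Recursive lookup, written as a loop over the enclos chain. */
static Cell *find_var(const Env *env, const char *symbol) {
    for (; env != NULL; env = env->enclos) {
        Cell *c = find_in_frame(env, symbol);
        if (c != NULL)
            return c;
    }
    return NULL;
}
```

Returning the binding cell rather than the value mirrors the "Found" step, which the later cache slides depend on.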
R Byte-Code Symbol
In R byte-code, each symbol has an index
A simple optimization
– Use the index value to do a direct lookup
R source:

    run <- function() {
      b <- a + 202
      print(b)
    }

Byte-code compiling produces the following instructions and constant table.

Instructions:

    GETVAR 1
    LDCONST 2
    ADD 3
    SETVAR 4
    POP
    GETFUN 5
    MAKEPROM 6
    CALL 7
    RETURN

Constant table:

    Idx  Value
    1    a
    2    202
    3    a+202
    4    b
    5    print
    6    list(.Code, list(7L, GETVAR.OP, 0L, RETURN.OP), list(b))
    7    print(b)
R byte-code Interpreter Variable Cache
A cache to store the “bindings”
– Not the exact value
Cache size: 128
– “More than 90% of the closures in base have constant pools with fewer than 128 entries when compiled”
Cache space wasting
– “On average about 1/3 of constant pool entries are symbols”
– Optimization: re-order the constant table (not implemented)
[Figure: The cache is an array indexed 0, 1, 2, 3, ...; each used entry points to a variable binding cell (tag = SEXP symbol, car = SEXP value).]
R byte-code Interpreter Variable Cache (2)
Cache Storage
– On the stack (by default)

[Figure: The interpreter stack, with the previous frame below the current frame and the stack top above. Each frame holds its own 128-entry cache; an entry holds one variable and points to a variable binding cell.]
R byte-code Interpreter Variable Cache (3)
The reason to cache the binding cell, not the exact value
– Easy for child frames to modify the value
[Figure: Frames A, AA and AAA linked through enclos. AA's frame holds a variable binding cell with symbol a and value 5. The cache in AA's stack frame has an entry pointing to that binding cell.]
    AA <- function() {
      a <- 5
      AAA()
      print(a)
    }

Case 1:

    AAA <- function() {
      a <<- 100
    }

– In AAA: set the binding cell's value to 100.
– Back in AA: the value read through the cache is the right value.

Case 2:

    AAA <- function() {
      …  # remove the parent frame's variable "a"
    }

– In AAA: set the binding cell's value to the “unbound” value.
– Back in AA: the value read through the cache is the “unbound” value, so the interpreter tries to look up “a” in AA's parent frame.
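The two cases above can be modelled with a minimal C sketch. All names here (`Cell`, `super_assign`, `remove_binding`, `read_cached`) are hypothetical, and the `bound` flag stands in for comparing the cell's value against the unbound marker. The point is only that a cached pointer to the cell observes changes made elsewhere, which a cached copy of the value would not.

```c
#include <stddef.h>

typedef struct Cell {
    const char *symbol;
    int value;
    int bound;   /* 0 models a cell whose value is the "unbound" marker */
} Cell;

/* Case 1: the callee assigns through the shared cell (a <<- 100). */
static void super_assign(Cell *cell, int v) {
    cell->value = v;
    cell->bound = 1;
}

/* Case 2: the callee removes the variable; the cell becomes unbound. */
static void remove_binding(Cell *cell) {
    cell->bound = 0;
}

/* Read through a cached cell pointer. Returns 1 and stores the value on
   success; returns 0 when the caller has to fall back to a full lookup
   (for example in the parent frame). */
static int read_cached(const Cell *cached, int *out) {
    if (cached == NULL || !cached->bound)
        return 0;
    *out = cached->value;
    return 1;
}
```

After `super_assign`, the same cache entry yields 100 without being refreshed; after `remove_binding`, the read fails and forces the slower lookup.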
R byte-code Interpreter Variable Cache (4)
Cache target: only variables defined in the current frame
– Not variables found in a parent frame
– Reason: the same symbol may later be defined in an intermediate frame
Example: suppose AAA caches grandparent A's “a”
[Figure: Frames A, AA, AAA and AAAA linked through enclos. Frame A holds the binding cell a = 5, and the cache in AAA's stack frame points to it. Frame AA later receives its own binding cell a = 100 ("defined later").]
    A <- function() {
      a <- 5
      AA()
    }

    AA <- function() {
      AAA()
    }

    AAA <- function() {
      b <- a      # cache a
      AAAA()
      print(a)    # use a
    }

    AAAA <- function() {
      ...  # define "a" in AA's frame
    }

The second use of “a” should resolve to the binding in AA's frame. Using the cached cell instead would give incorrect semantics.
R byte-code Interpreter Variable Cache Steps
Used in SETVAR, GETVAR and similar instructions
Two modes
– SmallCache: constant table size <= 128
  • Use the symbol index as a direct reference
  • Get the binding cell
– Normal: constant table size > 128
  • Symbol index % 128 gives the reference number
  • Get the binding cell
  • Compare the binding cell's symbol
Cache initial value
– R_NilValue
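The two indexing modes above can be sketched as follows. The function names and the `Cell` type are hypothetical, NULL stands in for R_NilValue, and symbols are compared by pointer identity on the assumption that each symbol is a unique object.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_SIZE 128

typedef struct Cell {
    const void *symbol;   /* identity of the symbol (tag) */
    int value;
} Cell;

/* SmallCache mode: the constant table has at most 128 entries, so the
   symbol index is used directly as the cache slot. No symbol check is
   needed because no two symbols can share a slot. */
static Cell *small_cache_get(Cell **cache, unsigned int sym_index) {
    assert(sym_index < CACHE_SIZE);
    return cache[sym_index];
}

/* Normal mode: several symbol indices map to the same slot, so the cached
   cell's symbol must be compared before the cell can be trusted. */
static Cell *normal_cache_get(Cell **cache, unsigned int sym_index,
                              const void *symbol) {
    Cell *c = cache[sym_index % CACHE_SIZE];
    return (c != NULL && c->symbol == symbol) ? c : NULL;
}
```

In Normal mode, indices 3 and 131 share slot 3, which is why the symbol comparison is required there and not in SmallCache mode.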
R byte-code Interpreter Variable Cache Steps (2)
SETVAR
– Finding Cell Step
  • SmallCache
    – Get the binding cell directly by symbol index (may return R_NilValue)
  • Normal
    – Get the binding cell by symbol index and compare its symbol
      » If the cell has the right symbol and its value is not unbound → return the cell
      » Otherwise, use the base method to find the variable in the local frame
      » If a new cell is found in the current frame → update the local cache and return the new cell
      » If none is found, and the cell from the first step is not null but holds the unbound value → clear the local cache entry (the variable has been removed, so there is nothing to cache)
– Setting Value Step
  • Use the cell to update the value directly
  • If the cell is R_NilValue → use the base method to define the variable in the local frame
R byte-code Interpreter Variable Cache Steps (3)
SETVAR cache update in Normal mode
– SETVAR, first time:
  • Finding Cell Step: no valid binding cell
  • Setting Value Step: use the base method to define the variable
– SETVAR, second time:
  • Finding Cell Step: find the cell and update the cache
  • Setting Value Step: use the cell to update the value directly
SETVAR in SmallCache mode (pure SETVAR)
– SETVAR, first time:
  • Finding Cell Step: no valid binding cell
  • Setting Value Step: use the base method to define the variable
– SETVAR, second time:
  • Finding Cell Step: no valid binding cell, because this mode only uses the direct lookup
    – The cache is still not updated
  • Setting Value Step: use the base method to define the variable
R byte-code Interpreter Variable Cache Steps (4)
GETVAR
– SmallCache mode
  • Look up the cell directly
    – Invalid cell → go to Normal mode
    – Valid cell
      » Check the value type; may return the value directly, or force the promise
– Normal mode
  • Get the binding cell by symbol index and compare its symbol
    – If the cell has the right symbol and its value is not unbound → return the cell
    – Otherwise, use the base method to find the variable in the local frame
    – If a new cell is found in the current frame → update the local cache and return the new cell
    – If none is found, and the cell from the first step is not null but holds the unbound value → clear the local cache entry (the variable has been removed, so there is nothing to cache)
  • Use the returned cell to get the value
    – May return the valid value or return an error
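The Normal-mode cell-finding steps above can be sketched in C. This is an illustrative model, not the interpreter's code: `Cell`, `find_in_local_frame` and `get_binding_cell` are hypothetical names, the `bound` flag models the unbound marker value, and NULL stands in for R_NilValue.

```c
#include <stddef.h>

#define CACHE_SIZE 128

typedef struct Cell {
    const void *symbol;   /* identity of the symbol (tag) */
    int value;
    int bound;            /* 0 models the unbound marker value */
    struct Cell *next;    /* next binding cell in the frame */
} Cell;

/* "Base method": linear search of the local frame only. */
static Cell *find_in_local_frame(Cell *frame, const void *symbol) {
    for (Cell *c = frame; c != NULL; c = c->next)
        if (c->symbol == symbol)
            return c;
    return NULL;
}

static Cell *get_binding_cell(Cell **cache, Cell *frame,
                              unsigned int sym_index, const void *symbol) {
    unsigned int slot = sym_index % CACHE_SIZE;
    Cell *cached = cache[slot];

    /* Right symbol and not unbound: cache hit. */
    if (cached != NULL && cached->symbol == symbol && cached->bound)
        return cached;

    /* Otherwise search the local frame and refresh the cache on success. */
    Cell *found = find_in_local_frame(frame, symbol);
    if (found != NULL) {
        cache[slot] = found;
        return found;
    }

    /* Nothing found: drop a stale entry that points at an unbound cell. */
    if (cached != NULL && !cached->bound)
        cache[slot] = NULL;
    return NULL;
}
```

A NULL result corresponds to the case where SETVAR has to define the variable with the base method, or where GETVAR has to continue with the ordinary lookup.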
R byte-code Interpreter Variable Cache Steps (5)
GETVAR in SmallCache mode, following a SETVAR
– SETVAR: does not update the cache at all
– GETVAR, first time
  • Go to Normal mode → update the cache
  • Return the value
– GETVAR, second time
  • Return the value using the cache

    run <- function() {
      a <- 101
      b <- a   # get var first time
      b <- a   # get var second time
      b <- a   # get var third time
    }

    PC   STMT         Note
    1    LDCONST, 1
    3    SETVAR, 2    Set a
    5    POP
    6    GETVAR, 2    Get a: normal mode, updates the cache
    8    SETVAR, 3
    10   POP
    11   GETVAR, 2    Get a: from the cache
    13   SETVAR, 3
    15   POP
    16   GETVAR, 2    Get a: from the cache
    18   SETVAR, 3
    20   INVISIBLE
    21   RETURN
R Byte-Code Interpreter Variable Cache Mechanism
Others
– There is some additional code to handle
  • Unbound values
  • Forcing the promise if the symbol's value is a promise
  • Missing value handling
Some conclusions
– The cache mechanism is correct
– But it is very complex, due to the complex R semantics
– Optimizing the cache mechanism is possible
  • E.g. caching the parent frame's variables
– But it would likely be very complex
Unboxed Value Cache Proposal
Basic Assumptions
– Do not change the current cache mechanism
– An additional cache only for unboxed values
  • Something like a local register file
– Rules should be simple
Basic Logic
– GETVAR: get the variable through the byte-code interpreter logic, unbox it and populate the cache
– SETVAR: only update the register file
– Context change
  • Box, and write back using the byte-code interpreter logic
  • Context change: function call, return, …
Basic Cache Design
One Cache and One Cache State
– The cache only stores the value, not the binding cell
  • Each cell is 64 bits wide: stores an unboxed Real, Int or Logical
– Each cache state
  • Not valid: no value available; need to get it and unbox it
  • Valid: an unboxed version is stored in the cache
  • Modified: the value in the cache has been modified → needs to be written back later
– The cache state also stores the type of the value
– Global cache counter
  • NumModified: how many cache cells' values are modified
  • If > 0, need to write back during a context change
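The design above could be laid out roughly as in the following C sketch. Everything here is an assumption made for illustration: the type and function names, the cache size of 128 (borrowed from the existing binding cache), and the choice of a union for the 64-bit cell.

```c
#include <stdint.h>

#define UCACHE_SIZE 128   /* assumed; the slides do not give a size */

typedef enum { SLOT_INVALID = 0, SLOT_VALID, SLOT_MODIFIED } SlotState;
typedef enum { TY_REAL = 0, TY_INT, TY_LOGICAL } SlotType;

/* One 64-bit cell holding an unboxed Real, Int or Logical. */
typedef union {
    double  real;
    int32_t integer;
    int32_t logical;
} Unboxed;

typedef struct {
    Unboxed   value[UCACHE_SIZE];
    SlotState state[UCACHE_SIZE];
    SlotType  type[UCACHE_SIZE];   /* the state also records the value type */
    int       num_modified;        /* global counter: NumModified */
} UnboxedCache;

/* Populate a slot after a GETVAR has fetched and unboxed a real value.
   Intended for slots that are currently not valid. */
static void ucache_fill_real(UnboxedCache *uc, unsigned int idx, double v) {
    uc->value[idx].real = v;
    uc->type[idx] = TY_REAL;
    uc->state[idx] = SLOT_VALID;
}

/* SETVAR on an unboxed real: only the cache is updated. The counter is
   bumped once per slot, on the transition into the Modified state. */
static void ucache_store_real(UnboxedCache *uc, unsigned int idx, double v) {
    if (uc->state[idx] != SLOT_MODIFIED)
        uc->num_modified++;
    uc->value[idx].real = v;
    uc->type[idx] = TY_REAL;
    uc->state[idx] = SLOT_MODIFIED;
}
```

A zero-initialised `UnboxedCache` starts with every slot in the Not valid state and NumModified at 0.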
[Figure: A row of cache cells, each paired with its own cache state.]
Code Transformation For The Cache
The current sequence, as an example:

    PC   STMT
    ...
    18   GETVAR, 2
    20   GUARD, 2, 20
    23   UNBOXREAL

– It is hard to populate the cache from this sequence → the instructions need to be combined
– GETVAR + UNBOXREAL
About GUARD
– If there is no context change, no additional guard is needed
Define a new instruction to replace the sequence
– GETUNBOXREAL
Logic of GETUNBOXREAL
GETUNBOXREAL
– Check the cache's state
– Valid/Modified:
  • Directly return the unboxed value
– Not valid (first time or context changed)
  • Get the variable first
  • Execute the guard logic
    – May fall back to the unoptimized code
  • On success
    – Populate the cache with the unboxed value, set the valid state, and return the value
Also define SETUNBOX
– If the value on top of the stack is unboxed, use SETUNBOX to replace SETVAR
– The shape of the stack is known at compile time
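The GETUNBOXREAL logic above can be sketched as a single C function. The names are hypothetical, `Boxed` is a toy stand-in for a boxed R value, and the variable fetch is modelled by passing the boxed value in as a parameter instead of calling the interpreter's GETVAR logic.

```c
#include <stddef.h>

typedef enum { SLOT_INVALID = 0, SLOT_VALID, SLOT_MODIFIED } SlotState;

/* Toy stand-in for a boxed R value. */
typedef struct {
    int    is_real;   /* what the guard checks */
    double real;
} Boxed;

typedef struct {
    double    value;
    SlotState state;
} RealSlot;

/* Returns 1 with *out set when the unboxed value is available.
   Returns 0 when the guard fails, meaning the caller should fall back
   to the unoptimized code path. */
static int get_unbox_real(RealSlot *slot, const Boxed *var, double *out) {
    /* Valid or Modified: return the cached unboxed value directly. */
    if (slot->state == SLOT_VALID || slot->state == SLOT_MODIFIED) {
        *out = slot->value;
        return 1;
    }
    /* Not valid: the variable has been fetched (var); run the guard. */
    if (var == NULL || !var->is_real)
        return 0;
    /* Guard passed: populate the cache and mark the slot valid. */
    slot->value = var->real;
    slot->state = SLOT_VALID;
    *out = slot->value;
    return 1;
}
```

Once the slot is valid, later calls never touch the boxed variable again until a context change resets the state.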
Write-Back Policy
On a context change
– Function call, return
– Check the global state NumModified
  • = 0: no action
  • > 0: iterate over the cache
    – Use the index to look up the symbol
    – Box the value according to its type
    – Set the variable back
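The write-back policy above can be sketched as follows. This is a simplified model under stated assumptions: only real values are handled, the `boxed_vars` array stands in for the variables that would be reached through the symbol index and the interpreter's normal SETVAR logic, and marking written slots as not valid afterwards is an assumption the slides do not spell out.

```c
#define UCACHE_SIZE 128   /* assumed cache size */

typedef enum { SLOT_INVALID = 0, SLOT_VALID, SLOT_MODIFIED } SlotState;

typedef struct {
    double    value;
    SlotState state;
} RealSlot;

typedef struct {
    RealSlot slot[UCACHE_SIZE];
    int      num_modified;   /* NumModified */
} UnboxedCache;

/* Called on a context change (function call, return).
   Returns the number of values written back. */
static int write_back(UnboxedCache *uc, double *boxed_vars) {
    int written = 0;

    /* NumModified == 0: no action. */
    if (uc->num_modified == 0)
        return 0;

    for (unsigned int i = 0; i < UCACHE_SIZE; ++i) {
        if (uc->slot[i].state != SLOT_MODIFIED)
            continue;
        /* "Box" the value and set the variable back (modelled by a store). */
        boxed_vars[i] = uc->slot[i].value;
        /* Assumption: the slot must be re-read after the context change. */
        uc->slot[i].state = SLOT_INVALID;
        written++;
    }
    uc->num_modified = 0;
    return written;
}
```

The NumModified counter makes the common case, where nothing was modified, a single comparison instead of a scan over all slots.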
Some Findings in the Latest R-2.15.0
R-2.15.0
– Released Mar. 2012
– Many function-level improvements
– No changes found in the R interpreter, byte-code interpreter or runtime
  • Very preliminary performance evaluation: no big changes in micro-tests
  • Our current working version is R-2.14.1
Another finding
– Starting from R-2.14.0, there is a package called “parallel”
– A high-level parallel wrapper for some coarse-grained computation tasks
  • R-2.15.0: some new APIs, in something like a map/reduce style