Byterun: A (C)Python interpreter in Python. Allison Kaptur, github.com/akaptur, akaptur.github.io, @akaptur

Allison Kaptur: Bytes in the Machine: Inside the CPython interpreter, PyGotham 2014

DESCRIPTION

Byterun is a Python interpreter written in Python, developed with Ned Batchelder. It is architected to mirror the structure of CPython (and to be more readable, too). Learn how the interpreter is constructed, how ignorant the Python compiler is, and how you use a 1,500-line switch statement every day.

Page 1: Allison Kaptur: Bytes in the Machine: Inside the CPython interpreter, PyGotham 2014

Byterun: A (C)Python interpreter in Python

Allison Kaptur

github.com/akaptur akaptur.github.io

@akaptur

Page 2: Allison Kaptur: Bytes in the Machine: Inside the CPython interpreter, PyGotham 2014

Byterun with Ned Batchelder

Based on

# pyvm2 by Paul Swartz (z3p) from http://www.twistedmatrix.com/users/z3p/

Page 3: Allison Kaptur: Bytes in the Machine: Inside the CPython interpreter, PyGotham 2014

Why would you do such a thing

>>> if a or b:
...     do_stuff()

Page 4: Allison Kaptur: Bytes in the Machine: Inside the CPython interpreter, PyGotham 2014

Some things we can do

out = ""
for i in range(5):
    out = out + str(i)
print(out)

Page 5: Allison Kaptur: Bytes in the Machine: Inside the CPython interpreter, PyGotham 2014

Some things we can do

def fn(a, b=17, c="Hello", d=[]):
    d.append(99)
    print(a, b, c, d)

fn(1)
fn(2, 3)
fn(3, c="Bye")
fn(4, d=["What?"])
fn(5, "b", "c")

Page 6: Allison Kaptur: Bytes in the Machine: Inside the CPython interpreter, PyGotham 2014

Some things we can do

def verbose(func):
    def _wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return _wrapper

@verbose
def add(x, y):
    return x+y

add(7, 3)

Page 7: Allison Kaptur: Bytes in the Machine: Inside the CPython interpreter, PyGotham 2014

Some things we can do

try:
    raise ValueError("oops")
except ValueError as e:
    print("Caught: %s" % e)
print("All done")

Page 8: Allison Kaptur: Bytes in the Machine: Inside the CPython interpreter, PyGotham 2014

Some things we can do

class NullContext(object):
    def __enter__(self):
        l.append('i')
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        l.append('o')
        return False

l = []
for i in range(3):
    with NullContext():
        l.append('w')
        if i % 2:
            break
        l.append('z')
    l.append('e')

l.append('r')
s = ''.join(l)
print("Look: %r" % s)
assert s == "iwzoeiwor"

Page 9: Allison Kaptur: Bytes in the Machine: Inside the CPython interpreter, PyGotham 2014

Some things we can do

g = (x*x for x in range(3))
print(list(g))

Page 10: Allison Kaptur: Bytes in the Machine: Inside the CPython interpreter, PyGotham 2014

A problem

g = (x*x for x in range(5))
h = (y+1 for y in g)
print(list(h))

Page 11: Allison Kaptur: Bytes in the Machine: Inside the CPython interpreter, PyGotham 2014

The Python virtual machine:

A bytecode interpreter

Page 12: Allison Kaptur: Bytes in the Machine: Inside the CPython interpreter, PyGotham 2014

Bytecode: the internal representation of a Python program in the interpreter

Page 13: Allison Kaptur: Bytes in the Machine: Inside the CPython interpreter, PyGotham 2014

Bytecode: it’s bytes!

>>> def mod(a, b):
...     ans = a % b
...     return ans

Page 14: Allison Kaptur: Bytes in the Machine: Inside the CPython interpreter, PyGotham 2014

Bytecode: it’s bytes!

>>> def mod(a, b):
...     ans = a % b
...     return ans
>>> mod.func_code.co_code

mod: function
func_code: code object
co_code: bytecode

Page 15: Allison Kaptur: Bytes in the Machine: Inside the CPython interpreter, PyGotham 2014

Bytecode: it’s bytes!

>>> def mod(a, b):
...     ans = a % b
...     return ans
>>> mod.func_code.co_code
'|\x00\x00|\x01\x00\x16}\x02\x00|\x02\x00S'

Page 16: Allison Kaptur: Bytes in the Machine: Inside the CPython interpreter, PyGotham 2014

Bytecode: it’s bytes!

>>> def mod(a, b):
...     ans = a % b
...     return ans
>>> mod.func_code.co_code
'|\x00\x00|\x01\x00\x16}\x02\x00|\x02\x00S'
>>> [ord(b) for b in mod.func_code.co_code]
[124, 0, 0, 124, 1, 0, 22, 125, 2, 0, 124, 2, 0, 83]
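The session above is Python 2. A rough Python 3 equivalent is sketched below, assuming a current CPython where the code object is reached through `__code__` and `co_code` is a `bytes` object. The byte values themselves differ between interpreter versions, so no expected output is shown.

```python
def mod(a, b):
    ans = a % b
    return ans

code = mod.__code__      # Python 3 spelling of Python 2's func_code
raw = code.co_code       # a bytes object holding the compiled bytecode
as_ints = list(raw)      # iterating over bytes yields ints, so ord() is not needed

print(type(raw).__name__, len(as_ints))
print(as_ints)
```

The printed integers will generally not match the Python 2.7 list on the slide, because opcode numbering and instruction width have changed across releases.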

Page 17: Allison Kaptur: Bytes in the Machine: Inside the CPython interpreter, PyGotham 2014

dis, a bytecode disassembler

>>> import dis
>>> dis.dis(mod)
  2           0 LOAD_FAST                0 (a)
              3 LOAD_FAST                1 (b)
              6 BINARY_MODULO
              7 STORE_FAST               2 (ans)

  3          10 LOAD_FAST                2 (ans)
             13 RETURN_VALUE

Page 18: Allison Kaptur: Bytes in the Machine: Inside the CPython interpreter, PyGotham 2014

dis, a bytecode disassembler

>>> import dis
>>> dis.dis(mod)
  2           0 LOAD_FAST                0 (a)
              3 LOAD_FAST                1 (b)
              6 BINARY_MODULO
              7 STORE_FAST               2 (ans)

  3          10 LOAD_FAST                2 (ans)
             13 RETURN_VALUE

Columns, left to right:

  • Line number
  • Index in bytecode
  • Instruction name, for humans
  • More bytes, the argument to each instruction
  • Hint about arguments
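The same columns can be read programmatically. The sketch below assumes Python 3.4 or later, where `dis.get_instructions` yields one `Instruction` per bytecode instruction; the instruction names and offsets it prints depend on the interpreter version and will not match the Python 2 listing exactly.

```python
import dis

def mod(a, b):
    ans = a % b
    return ans

rows = []
for ins in dis.get_instructions(mod):
    # offset:  index of the instruction in the bytecode
    # opname:  instruction name, for humans
    # arg:     raw numeric argument, or None if the opcode takes none
    # argrepr: the hint dis prints in parentheses
    rows.append((ins.offset, ins.opname, ins.arg, ins.argrepr))

for offset, opname, arg, argrepr in rows:
    print(offset, opname, arg, argrepr)
```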

Page 19: Allison Kaptur: Bytes in the Machine: Inside the CPython interpreter, PyGotham 2014

Data stack on a frame

Before:               whatever, some other thing, something
After LOAD_FAST:      whatever, some other thing, something, a, b
After BINARY_MODULO:  whatever, some other thing, something, ans
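To make the stack effects concrete, here is a minimal sketch of a loop that interprets the `mod` instructions on a single data stack. It is an illustration only, not Byterun's or CPython's code: the `run` function, the `MOD_PROGRAM` list, and the use of variable names instead of numeric indices are all assumptions made for readability.

```python
def run(instructions, local_vars):
    """Interpret a hand-written instruction list on one data stack."""
    stack = []
    for opname, arg in instructions:
        if opname == "LOAD_FAST":
            stack.append(local_vars[arg])    # push a local's value
        elif opname == "BINARY_MODULO":
            right = stack.pop()              # top of stack
            left = stack.pop()               # next item down
            stack.append(left % right)       # push the result
        elif opname == "STORE_FAST":
            local_vars[arg] = stack.pop()    # pop into a local
        elif opname == "RETURN_VALUE":
            return stack.pop()
        else:
            raise ValueError("unknown instruction: %s" % opname)

# Mirrors the dis listing for mod(a, b), with names in place of indices.
MOD_PROGRAM = [
    ("LOAD_FAST", "a"),
    ("LOAD_FAST", "b"),
    ("BINARY_MODULO", None),
    ("STORE_FAST", "ans"),
    ("LOAD_FAST", "ans"),
    ("RETURN_VALUE", None),
]

result = run(MOD_PROGRAM, {"a": 15, "b": 4})
print(result)
```

Each branch touches only the end of the list, which is what makes a plain Python list a workable data stack for a sketch like this.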

Page 20: Allison Kaptur: Bytes in the Machine: Inside the CPython interpreter, PyGotham 2014

def foo():
    x = 1
    def bar(y):
        z = y + 2
        return z
    return bar(x)
foo()             # <--- (1)

call stack
    -----------------------
    | main (module) Frame | -> blocks: []
    | (oldest)            | -> data: [<foo>]
    -----------------------

Page 21: Allison Kaptur: Bytes in the Machine: Inside the CPython interpreter, PyGotham 2014

def foo():
    x = 1
    def bar(y):
        z = y + 2
        return z
    return bar(x) # <--- (2)
foo()             # <--- (1)

call stack
    -----------------------
    | foo Frame           | -> blocks: []
    |                     | -> data: [<bar>, 1]
    -----------------------
    | main (module) Frame | -> blocks: []
    | (oldest)            | -> data: []
    -----------------------

Page 22: Allison Kaptur: Bytes in the Machine: Inside the CPython interpreter, PyGotham 2014

def foo():
    x = 1
    def bar(y):
        z = y + 2 # <--- (3)
        return z
    return bar(x) # <--- (2)
foo()             # <--- (1)

call stack
    -----------------------
    | bar Frame           | -> blocks: []
    | (newest)            | -> data: [1, 2]
    -----------------------
    | foo Frame           | -> blocks: []
    |                     | -> data: []
    -----------------------
    | main (module) Frame | -> blocks: []
    | (oldest)            | -> data: []
    -----------------------

Page 23: Allison Kaptur: Bytes in the Machine: Inside the CPython interpreter, PyGotham 2014

def foo():
    x = 1
    def bar(y):
        z = y + 2 # <--- (3)
        return z
    return bar(x) # <--- (2)
foo()             # <--- (1)

call stack
    -----------------------
    | bar Frame           | -> blocks: []
    | (newest)            | -> data: [3]
    -----------------------
    | foo Frame           | -> blocks: []
    |                     | -> data: []
    -----------------------
    | main (module) Frame | -> blocks: []
    | (oldest)            | -> data: []
    -----------------------

Page 24: Allison Kaptur: Bytes in the Machine: Inside the CPython interpreter, PyGotham 2014

def foo():
    x = 1
    def bar(y):
        z = y + 2 # <--- (3)
        return z
    return bar(x) # <--- (2)
foo()             # <--- (1)

call stack
    -----------------------
    | foo Frame           | -> blocks: []
    |                     | -> data: [3]
    -----------------------
    | main (module) Frame | -> blocks: []
    | (oldest)            | -> data: []
    -----------------------

Page 25: Allison Kaptur: Bytes in the Machine: Inside the CPython interpreter, PyGotham 2014

def foo():
    x = 1
    def bar(y):
        z = y + 2 # <--- (3)
        return z
    return bar(x) # <--- (2)
foo()             # <--- (1)

call stack
    -----------------------
    | main (module) Frame | -> blocks: []
    | (oldest)            | -> data: [3]
    -----------------------
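The frame walkthrough above can be imitated with a small amount of Python. This is a sketch under stated assumptions: the `Frame` class, the `call` helper, and the `snapshot` recorder are hypothetical names invented for this illustration, and the function bodies are written by hand rather than compiled to bytecode.

```python
class Frame:
    """Hypothetical frame: a name plus its own data stack and block stack."""
    def __init__(self, name):
        self.name = name
        self.data = []
        self.blocks = []

call_stack = []
snapshots = []

def snapshot():
    # Record (frame name, copy of its data stack), newest frame first.
    snapshots.append([(f.name, list(f.data)) for f in reversed(call_stack)])

def call(name, body, *args):
    """Push a frame, run body in it, pop it, hand the result to the caller."""
    frame = Frame(name)
    call_stack.append(frame)
    result = body(frame, *args)
    call_stack.pop()
    if call_stack:
        call_stack[-1].data.append(result)   # return value lands on the caller's data stack
    snapshot()
    return result

def bar_body(frame, y):
    frame.data.extend([y, 2])                # operands of y + 2
    snapshot()
    right = frame.data.pop()
    left = frame.data.pop()
    return left + right

def foo_body(frame):
    return call("bar", bar_body, 1)

main = Frame("main")
call_stack.append(main)
call("foo", foo_body)

for snap in snapshots:
    print(snap)
```

The three recorded snapshots correspond to the slides where `bar` holds `[1, 2]`, where `foo` holds `[3]`, and where the module frame holds `[3]`.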

Page 28: Allison Kaptur: Bytes in the Machine: Inside the CPython interpreter, PyGotham 2014

/* Main switch on opcode */
READ_TIMESTAMP(inst0);

switch (opcode) {

} /*switch*/

Page 29: Allison Kaptur: Bytes in the Machine: Inside the CPython interpreter, PyGotham 2014

#ifdef CASE_TOO_BIG
    default: switch (opcode) {
#endif

/* Turn this on if your compiler chokes on the big switch: */
/* #define CASE_TOO_BIG 1 */
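An interpreter written in Python needs some stand-in for the C `switch (opcode)`. The sketch below shows one common approach, looking up a handler method by name with `getattr`. The class name `TinyVM`, the `op_` prefix, and the two handlers are hypothetical and are not copied from Byterun or CPython; consult the Byterun source for how it actually dispatches.

```python
class TinyVM:
    """Hypothetical dispatcher: one method per opcode instead of one case per opcode."""

    def __init__(self):
        self.stack = []

    def dispatch(self, opname, *args):
        # The name lookup plays the role of `switch (opcode)`.
        handler = getattr(self, "op_" + opname, None)
        if handler is None:
            raise NotImplementedError("no handler for " + opname)
        return handler(*args)

    def op_LOAD_CONST(self, value):
        self.stack.append(value)

    def op_BINARY_MODULO(self):
        right = self.stack.pop()
        left = self.stack.pop()
        self.stack.append(left % right)

vm = TinyVM()
vm.dispatch("LOAD_CONST", 15)
vm.dispatch("LOAD_CONST", 4)
vm.dispatch("BINARY_MODULO")
print(vm.stack)
```

Adding an opcode in this style means adding a method, which keeps each handler short at the cost of a name lookup per instruction.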

Page 30: Allison Kaptur: Bytes in the Machine: Inside the CPython interpreter, PyGotham 2014

Back to that bytecode

>>> dis.dis(mod)
  2           0 LOAD_FAST                0 (a)
              3 LOAD_FAST                1 (b)
              6 BINARY_MODULO
              7 STORE_FAST               2 (ans)

  3          10 LOAD_FAST                2 (ans)
             13 RETURN_VALUE

Page 31: Allison Kaptur: Bytes in the Machine: Inside the CPython interpreter, PyGotham 2014

case LOAD_FAST:
    x = GETLOCAL(oparg);
    if (x != NULL) {
        Py_INCREF(x);
        PUSH(x);
        goto fast_next_opcode;
    }
    format_exc_check_arg(PyExc_UnboundLocalError,
        UNBOUNDLOCAL_ERROR_MSG,
        PyTuple_GetItem(co->co_varnames, oparg));
    break;
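The error branch in that C case is observable from ordinary Python. The sketch below triggers it by reading a local variable before it has been assigned; the function name `read_before_assign` is made up for this example. Newer CPython releases may use a differently named opcode for this load, but the exception raised is still `UnboundLocalError`.

```python
def read_before_assign():
    # `value` is local because it is assigned below, but it has no value yet
    # when this line runs, so the fast-local load finds an empty slot.
    current = value
    value = 1
    return current

caught = None
try:
    read_before_assign()
except UnboundLocalError as exc:
    caught = exc

print(type(caught).__name__, caught)
```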

Page 32: Allison Kaptur: Bytes in the Machine: Inside the CPython interpreter, PyGotham 2014

case BINARY_MODULO:
    w = POP();
    v = TOP();
    if (PyString_CheckExact(v))
        x = PyString_Format(v, w);
    else
        x = PyNumber_Remainder(v, w);
    Py_DECREF(v);
    Py_DECREF(w);
    SET_TOP(x);
    if (x != NULL) continue;
    break;

Page 33: Allison Kaptur: Bytes in the Machine: Inside the CPython interpreter, PyGotham 2014

It’s “dynamic”

>>> def mod(a, b):
...     ans = a % b
...     return ans
>>> mod(15, 4)
3

Page 34: Allison Kaptur: Bytes in the Machine: Inside the CPython interpreter, PyGotham 2014

“Dynamic”

>>> def mod(a, b):
...     ans = a % b
...     return ans
>>> mod(15, 4)
3
>>> mod("%s%s", ("Py", "Gotham"))

Page 35: Allison Kaptur: Bytes in the Machine: Inside the CPython interpreter, PyGotham 2014

“Dynamic”

>>> def mod(a, b):
...     ans = a % b
...     return ans
>>> mod(15, 4)
3
>>> mod("%s%s", ("Py", "Gotham"))
'PyGotham'

Page 36: Allison Kaptur: Bytes in the Machine: Inside the CPython interpreter, PyGotham 2014

“Dynamic”

>>> def mod(a, b):
...     ans = a % b
...     return ans
>>> mod(15, 4)
3
>>> mod("%s%s", ("Py", "Gotham"))
'PyGotham'
>>> print "%s%s" % ("Py", "Gotham")
PyGotham

Page 37: Allison Kaptur: Bytes in the Machine: Inside the CPython interpreter, PyGotham 2014

case BINARY_MODULO:
    w = POP();
    v = TOP();
    if (PyString_CheckExact(v))
        x = PyString_Format(v, w);
    else
        x = PyNumber_Remainder(v, w);
    Py_DECREF(v);
    Py_DECREF(w);
    SET_TOP(x);
    if (x != NULL) continue;
    break;

Page 38: Allison Kaptur: Bytes in the Machine: Inside the CPython interpreter, PyGotham 2014

>>> class Surprising(object):
...     def __mod__(self, other):
...         print "Surprise!"

>>> s = Surprising()
>>> t = Surprising()
>>> s % t
Surprise!

Page 39: Allison Kaptur: Bytes in the Machine: Inside the CPython interpreter, PyGotham 2014

“In the general absence of type information, almost every instruction must be treated as INVOKE_ARBITRARY_METHOD.”

- Russell Power and Alex Rubinsteyn, “How Fast Can We Make Interpreted Python?”
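The point of the quote can be checked with a few lines of Python 3. In the sketch below a single function body, and therefore a single piece of bytecode, produces three different behaviours depending on the operand types. The `Surprising` class here returns a string instead of printing, which is a small change from the slide made so that the result can be collected.

```python
def mod(a, b):
    return a % b

class Surprising:
    def __mod__(self, other):
        return "Surprise!"

results = [
    mod(15, 4),                        # integer remainder
    mod("%s%s", ("Py", "Gotham")),     # string formatting
    mod(Surprising(), Surprising()),   # user-defined __mod__
]
print(results)
```

The compiler cannot tell these cases apart when it compiles `a % b`; the decision is deferred until the instruction runs.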

Page 40: Allison Kaptur: Bytes in the Machine: Inside the CPython interpreter, PyGotham 2014

Back to our problem

g = (x*x for x in range(5))
h = (y+1 for y in g)
print(list(h))
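The slides do not spell out what goes wrong with this snippet. One plausible reading, offered here as an assumption, is that nested generators stress an interpreter because each generator owns a frame that is paused and resumed rather than pushed and popped once. The sketch below inspects that behaviour in CPython 3 through the `gi_frame` attribute, which is CPython-specific.

```python
g = (x * x for x in range(5))
h = (y + 1 for y in g)

# Each generator object carries its own frame while it is alive.
separate_frames = g.gi_frame is not h.gi_frame

first = next(h)                     # runs h, which resumes g once; then both pause
g_paused = g.gi_frame is not None   # g's frame is suspended, not discarded

rest = list(h)                      # drain the remaining values
finished = g.gi_frame is None and h.gi_frame is None

print(first, rest, separate_frames, g_paused, finished)
```

Between `next(h)` and `list(h)` two frames exist that are neither running nor finished, so each needs its own data stack to resume correctly.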

Page 48: Allison Kaptur: Bytes in the Machine: Inside the CPython interpreter, PyGotham 2014

More

Great blogs
  http://tech.blog.aknin.name/category/my-projects/pythons-innards/ by @aknin
  http://eli.thegreenplace.net/ by Eli Bendersky

Contribute! Find bugs!
  https://github.com/nedbat/byterun

Apply to Hacker School!
  www.hackerschool.com/apply