CC-BY
Programming with Python and PostgreSQL
Peter [email protected]
F-Secure Corporation
PostgreSQL Conference East 2011
Partitioning
• Part I: Client programming (60 min)
• Part II: PL/Python (30 min)
Why Python?
Pros:
• widely used
• easy
• strong typing
• scripting, interactive use
• good PostgreSQL support
• client and server (PL) interfaces
• open source, community-based
Cons:
• no static syntax checks, must rely on test coverage
• Python community has varying interest in RDBMS
Part I
Client Programming
Example
import psycopg2

dbconn = psycopg2.connect('dbname=dellstore2')
cursor = dbconn.cursor()
cursor.execute("""
    SELECT firstname, lastname
    FROM customers
    ORDER BY 1, 2
    LIMIT 10
""")
for row in cursor.fetchall():
    print "Name: %s %s" % (row[0], row[1])
cursor.close()
dbconn.close()
Drivers
Name            License  Platforms    Py Versions
Psycopg         LGPL     Unix, Win    2.4–3.2
PyGreSQL        BSD      Unix, Win    2.3–2.6
ocpgdb          BSD      Unix         2.3–2.6
py-postgresql   BSD      pure Python  3.0+
bpgsql (alpha)  LGPL     pure Python  2.3–2.6
pg8000          BSD      pure Python  2.5–3.0+
More details
• http://wiki.postgresql.org/wiki/Python
• http://wiki.python.org/moin/PostgreSQL
DB-API 2.0
• the standard Python database API
• all mentioned drivers support it
• defined in PEP 249
• discussions: [email protected]
• very elementary (from a PostgreSQL perspective)
• outdated relative to Python language development
• lots of extensions and incompatibilities possible
Higher-Level Interfaces
• Zope
• SQLAlchemy
• Django
Psycopg Facts
• Main authors: Federico Di Gregorio, Daniele Varrazzo
• License: LGPLv3+
• Web site: http://initd.org/psycopg/
• Documentation: http://initd.org/psycopg/docs/
• Git, Gitweb
• Mailing list: [email protected]
• Twitter: @psycopg
• Latest version: 2.4 (February 27, 2011)
Using the Driver
import psycopg2
dbconn = psycopg2.connect(...)
...
Driver Independence?
import psycopg2
dbconn = psycopg2.connect(...) # hardcodes driver name
Driver Independence?
import psycopg2 as dbdriver
dbconn = dbdriver.connect(...)
Driver Independence?
dbtype = 'psycopg2'  # e.g. from config file
dbdriver = __import__(dbtype, globals(), locals(), [], -1)
dbconn = dbdriver.connect(...)
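On newer Python versions the same idea is usually written with importlib.import_module. The following is a minimal sketch, assuming the driver name arrives as a string from configuration. The helper name load_driver is hypothetical, and sqlite3 from the standard library stands in for psycopg2 only so that the snippet runs without a PostgreSQL driver installed.

```python
import importlib


def load_driver(name):
    """Import a DB-API 2.0 driver module by name (hypothetical helper)."""
    module = importlib.import_module(name)
    # Every DB-API 2.0 module is expected to expose these attributes.
    for attr in ("connect", "apilevel", "paramstyle"):
        if not hasattr(module, attr):
            raise ImportError("%s does not look like a DB-API driver" % name)
    return module


# Stand-in driver name; in the setting of this talk it would be 'psycopg2'.
dbdriver = load_driver("sqlite3")
dbconn = dbdriver.connect(":memory:")
dbconn.close()
```

Loading the module is the easy part. The paramstyle attribute differs between drivers (psycopg2 reports 'pyformat', sqlite3 reports 'qmark'), so swapping drivers usually means adapting the SQL placeholders as well.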
Connecting
# libpq-like connection string
dbconn = psycopg2.connect('dbname=dellstore2 host=localhost port=5432')

# same
dbconn = psycopg2.connect(dsn='dbname=dellstore2 host=localhost port=5432')

# keyword arguments
# (not all possible libpq options supported)
dbconn = psycopg2.connect(database='dellstore2',
                          host='localhost',
                          port='5432')
DB-API 2.0 says: the connect() arguments are database dependent
“Cursors”
cursor = dbconn.cursor()
• not a real database cursor, only an API abstraction
• think “statement handle”
Server-Side Cursors
cursor = dbconn.cursor(name='mycursor')
• a real database cursor
• use for large result sets
Executing
# queries
cursor.execute("""
    SELECT firstname, lastname
    FROM customers
    ORDER BY 1, 2
    LIMIT 10
""")

# updates
cursor.execute("UPDATE customers SET password = NULL")
print "%d rows updated" % cursor.rowcount

# or anything else
cursor.execute("ANALYZE customers")
Fetching Query Results
cursor.execute("SELECT firstname, lastname FROM ...")
cursor.fetchall()

[('AABBKO', 'DUTOFRPLOK'),
 ('AABTSI', 'ZFCKMPRVVJ'),
 ('AACOHS', 'EECCQPVTIW'),
 ('AACVVO', 'CLSXSGZYKS'),
 ('AADVMN', 'MEMQEWYFYE'),
 ('AADXQD', 'GLEKVVLZFV'),
 ('AAEBUG', 'YUOIINRJGE')]
Fetching Query Results
cursor.execute("SELECT firstname, lastname FROM ...")
for row in cursor.fetchall():
    print "Name: %s %s" % (row[0], row[1])
Note: field access only by number
Fetching Query Results
cursor.execute("SELECT firstname, lastname FROM ...")
row = cursor.fetchone()
if row is not None:
    print "Name: %s %s" % (row[0], row[1])
Fetching Query Results
cursor.execute("SELECT firstname, lastname FROM ...")
for row in cursor:
    print "Name: %s %s" % (row[0], row[1])
Fetching Query Results in Batches
cursor = dbconn.cursor(name='mycursor')
cursor.arraysize = 500  # default: 1
cursor.execute("SELECT firstname, lastname FROM ...")
while True:
    batch = cursor.fetchmany()
    if not batch:
        break
    for row in batch:
        print "Name: %s %s" % (row[0], row[1])
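The fetchmany() loop above can be tried without a PostgreSQL server. The sketch below uses sqlite3 from the standard library as a stand-in DB-API driver, so there is no server-side cursor involved and the placeholder style is '?' rather than '%s'. The table contents are made up for illustration.

```python
import sqlite3

# sqlite3 is a stand-in driver; the table and its rows are invented.
dbconn = sqlite3.connect(":memory:")
cursor = dbconn.cursor()
cursor.execute("CREATE TABLE customers (firstname TEXT, lastname TEXT)")
cursor.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("first%d" % i, "last%d" % i) for i in range(12)],
)

cursor.arraysize = 5  # fetchmany() returns up to this many rows per call
cursor.execute("SELECT firstname, lastname FROM customers ORDER BY 1, 2")

batch_sizes = []
while True:
    batch = cursor.fetchmany()
    if not batch:  # an empty list signals the end of the result set
        break
    batch_sizes.append(len(batch))

dbconn.close()
```

With 12 rows and an arraysize of 5, the loop sees two full batches and one partial batch before the empty list ends it.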
Fetching Query Results in Batches
cursor = dbconn.cursor(name='mycursor')
cursor.execute("SELECT firstname, lastname FROM ...")
cursor.itersize = 2000  # default
for row in cursor:
    print "Name: %s %s" % (row[0], row[1])
Getting Query Metadata
cursor.execute("SELECT DISTINCT state, zip FROM customers")

print cursor.description[0].name
print cursor.description[0].type_code
print cursor.description[1].name
print cursor.description[1].type_code

Result:
state
1043  # == psycopg2.STRING
zip
23  # == psycopg2.NUMBER
Passing Parameters
cursor.execute("""
    UPDATE customers
    SET password = %s
    WHERE customerid = %s
""", ["sekret", 37])
Passing Parameters
Not to be confused with (totally evil):
cursor.execute("""
    UPDATE customers
    SET password = '%s'
    WHERE customerid = %d
""" % ("sekret", 37))
Passing Parameters
cursor.execute("INSERT INTO foo VALUES (%s)", "bar")     # WRONG: a bare string
cursor.execute("INSERT INTO foo VALUES (%s)", ("bar"))   # WRONG: still a string, not a tuple
cursor.execute("INSERT INTO foo VALUES (%s)", ("bar",))  # correct: one-element tuple
cursor.execute("INSERT INTO foo VALUES (%s)", ["bar"])   # correct: one-element list
(from Psycopg documentation)
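Both points from the parameter-passing slides above can be checked without a PostgreSQL server. The sketch below first shows what Python actually builds for each spelling of the argument, then contrasts string interpolation with parameter passing for a hostile value. It uses sqlite3 from the standard library as a stand-in driver, so the placeholder is '?' instead of '%s'; the table and values are made up.

```python
import sqlite3

# Part 1: what the driver receives as the second argument of execute().
not_a_tuple = ("bar")   # parentheses only group: this is the string "bar"
one_tuple = ("bar",)    # the trailing comma makes a one-element tuple
one_list = ["bar"]      # a one-element list works as well
kinds = [type(x).__name__ for x in (not_a_tuple, one_tuple, one_list)]

# Part 2: interpolation versus parameter passing, with a hostile value.
dbconn = sqlite3.connect(":memory:")  # stand-in for a PostgreSQL connection
cursor = dbconn.cursor()
cursor.execute("CREATE TABLE customers (customerid INTEGER, password TEXT)")
cursor.executemany("INSERT INTO customers VALUES (?, ?)",
                   [(1, "a"), (2, "b"), (3, "c")])

hostile = "nope' OR '1'='1"

# Evil: the value is pasted into the SQL text and changes the statement.
cursor.execute(
    "SELECT count(*) FROM customers WHERE password = '%s'" % hostile)
interpolated_matches = cursor.fetchone()[0]

# Correct: the value travels separately and is only ever treated as data.
cursor.execute(
    "SELECT count(*) FROM customers WHERE password = ?", (hostile,))
parameter_matches = cursor.fetchone()[0]

dbconn.close()
```

The interpolated query matches every row because the quote inside the value ends the string literal early, while the parameterized query matches none.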
Passing Parameters
cursor.execute("""
    UPDATE customers
    SET password = %(pw)s
    WHERE customerid = %(id)s
""", {'id': 37, 'pw': "sekret"})
Passing Many Parameter Sets
cursor.executemany("""
    UPDATE customers
    SET password = %s
    WHERE customerid = %s
""", [["ahTh4oip", 100],
      ["Rexahho7", 101],
      ["Ee1aetui", 102]])
Calling Procedures
cursor.callproc('pg_start_backup', ['label'])  # parameters are passed as a sequence
Data Types
from decimal import Decimal
from psycopg2 import Date

cursor.execute("""
INSERT INTO orders (orderdate, customerid,
netamount, tax, totalamount)
VALUES (%s, %s, %s, %s, %s)""",
    [Date(2011, 3, 23), 12345,
     Decimal("899.95"), 8.875, Decimal("979.82")])
Mogrify
from decimal import Decimal
from psycopg2 import Date

cursor.mogrify("""
INSERT INTO orders (orderdate, customerid,
netamount, tax, totalamount)
VALUES (%s, %s, %s, %s, %s)""",
    [Date(2011, 3, 23), 12345,
     Decimal("899.95"), 8.875, Decimal("979.82")])
Result:
"\nINSERT INTO orders (orderdate, customerid,\nnetamount, tax, totalamount)\nVALUES('2011-03-23'::date, 12345, 899.95, 8.875, 979.82)"
Data Types
cursor.execute("""SELECT * FROM orders WHERE customerid = 12345""")
cursor.fetchone()
Result:
(12002, datetime.date(2011, 3, 23), 12345,
 Decimal('899.95'), Decimal('8.88'),
 Decimal('979.82'))
Nulls
Input:
cursor.mogrify("SELECT %s", [None])
'SELECT NULL'
Output:
cursor.execute("SELECT NULL")cursor.fetchone()
(None,)
Booleans
cursor.mogrify("SELECT %s, %s", [True, False])
'SELECT true, false'
Binary Data
Standard way:
from psycopg2 import Binary
cursor.mogrify("SELECT %s", [Binary("foo")])
"SELECT E'\\\\x666f6f'::bytea"
Other ways:
cursor.mogrify("SELECT %s", [buffer("foo")])
"SELECT E'\\\\x666f6f'::bytea"
cursor.mogrify("SELECT %s",[bytearray.fromhex(u"deadbeef")])
"SELECT E'\\\\xdeadbeef'::bytea"
There are more. Check the documentation. Check the versions.
Date/Time
Standard ways:
from psycopg2 import Date, Time, Timestamp
cursor.mogrify("SELECT %s, %s, %s",
    [Date(2011, 3, 23),
     Time(9, 0, 0),
     Timestamp(2011, 3, 23, 9, 0, 0)])
"SELECT '2011-03-23'::date, '09:00:00'::time, '2011-03-23T09:00:00'::timestamp"
Date/Time
Other ways:
import datetime
cursor.mogrify("SELECT %s, %s, %s, %s",
    [datetime.date(2011, 3, 23),
     datetime.time(9, 0, 0),
     datetime.datetime(2011, 3, 23, 9, 0),
     datetime.timedelta(minutes=90)])
"SELECT '2011-03-23'::date, '09:00:00'::time, '2011-03-23T09:00:00'::timestamp, '0 days 5400.000000 seconds'::interval"
mx.DateTime also supported
Arrays
foo = [1, 2, 3]
bar = [datetime.time(9, 0), datetime.time(10, 30)]
cursor.mogrify("SELECT %s, %s", [foo, bar])
"SELECT ARRAY[1, 2, 3], ARRAY['09:00:00'::time, '10:30:00'::time]"
Tuples
foo = (1, 2, 3)
cursor.mogrify("SELECT * FROM customers WHERE customerid IN %s",
    [foo])
'SELECT * FROM customers WHERE customerid IN (1, 2, 3)'
Hstore
import psycopg2.extras
psycopg2.extras.register_hstore(cursor)
x = {'a': 'foo', 'b': 'bar'}
cursor.mogrify("SELECT %s",[x])
"SELECT hstore(ARRAY[E'a', E'b'], ARRAY[E'foo',E'bar'])"
Unicode Support
Cause all result strings to be returned as Unicode strings:
psycopg2.extensions.register_type(psycopg2.extensions.UNICODE)
psycopg2.extensions.register_type(psycopg2.extensions.UNICODEARRAY)
Transaction Control
Transaction blocks are used by default. Each transaction must be ended explicitly with
dbconn.commit()
or
dbconn.rollback()
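The effect of commit() and rollback() can be sketched without a PostgreSQL server. The example below uses sqlite3 from the standard library as a stand-in DB-API driver (it also opens a transaction implicitly before data-modifying statements); the table name foo and its values are made up.

```python
import sqlite3

dbconn = sqlite3.connect(":memory:")  # stand-in for a PostgreSQL connection
cursor = dbconn.cursor()
cursor.execute("CREATE TABLE foo (x TEXT)")
dbconn.commit()

cursor.execute("INSERT INTO foo VALUES (?)", ("discarded",))
dbconn.rollback()  # the insert above is undone

cursor.execute("INSERT INTO foo VALUES (?)", ("kept",))
dbconn.commit()    # the insert above becomes permanent

cursor.execute("SELECT x FROM foo")
rows = cursor.fetchall()
dbconn.close()
```

Only the committed row survives. Forgetting the final commit() is the usual reason changes appear to vanish when a script exits.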
Transaction Control: Autocommit
import psycopg2.extensions
dbconn.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)
cursor = dbconn.cursor()
cursor.execute("VACUUM")
Transaction Control: Isolation Mode
import psycopg2.extensions
dbconn.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_SERIALIZABLE) # or other level
cursor = dbconn.cursor()
cursor.execute(...)
...
dbconn.commit()
Exception Handling
StandardError
|__ Warning
|__ Error
    |__ InterfaceError
    |__ DatabaseError
        |__ DataError
        |__ OperationalError
        |   |__ psycopg2.extensions.QueryCanceledError
        |   |__ psycopg2.extensions.TransactionRollbackError
        |__ IntegrityError
        |__ InternalError
        |__ ProgrammingError
        |__ NotSupportedError
Error Messages
try:
    cursor.execute("boom")
except Exception, e:
    print e.pgerror
Error Codes
import psycopg2.errorcodes
while True:
    try:
        cursor.execute("UPDATE something ...")
        cursor.execute("UPDATE otherthing ...")
        break
    except Exception, e:
        if e.pgcode == psycopg2.errorcodes.SERIALIZATION_FAILURE:
            dbconn.rollback()  # leave the aborted transaction before retrying
            continue
        else:
            raise
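The retry loop above runs until it succeeds. A bounded variant is sketched below in Python 3 syntax; it is an illustration, not part of Psycopg. The names run_with_retry, work, rollback and FakeSerializationError are hypothetical, and the fake exception only imitates a driver error carrying a pgcode attribute. The value '40001' is the SQLSTATE for serialization_failure.

```python
SERIALIZATION_FAILURE = "40001"  # SQLSTATE for serialization_failure


def run_with_retry(work, rollback, max_attempts=3):
    """Call work(); on a serialization failure, roll back and try again.

    work and rollback are caller-supplied callables (hypothetical names).
    Any other error, or a failure on the last attempt, is re-raised.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return work()
        except Exception as e:
            retryable = getattr(e, "pgcode", None) == SERIALIZATION_FAILURE
            if not retryable or attempt == max_attempts:
                raise
            rollback()  # leave the aborted transaction before retrying


class FakeSerializationError(Exception):
    """Stand-in for a driver exception carrying a pgcode attribute."""
    pgcode = SERIALIZATION_FAILURE


attempts = []


def flaky_work():
    # Fails twice with a serialization failure, then succeeds.
    attempts.append(1)
    if len(attempts) < 3:
        raise FakeSerializationError()
    return "done"


result = run_with_retry(flaky_work, rollback=lambda: None)
```

With a real connection, work would execute the UPDATE statements and commit, and dbconn.rollback would be passed as the rollback argument.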
Connection and Cursor Factories
Want: accessing result columns by name
Recall:
dbconn = psycopg2.connect(dsn='...')
cursor = dbconn.cursor()
cursor.execute("""
    SELECT firstname, lastname
    FROM customers
    ORDER BY 1, 2
    LIMIT 10
""")
for row in cursor.fetchall():
    print "Name: %s %s" % (row[0], row[1])  # stupid :(
Connection and Cursor Factories
Solution 1: Using DictConnection:
import psycopg2.extras

dbconn = psycopg2.connect(dsn='...',
    connection_factory=psycopg2.extras.DictConnection)
cursor = dbconn.cursor()
cursor.execute("""
    SELECT firstname, lastname
    FROM customers
    ORDER BY 1, 2
    LIMIT 10
""")
for row in cursor.fetchall():
    print "Name: %s %s" % (row['firstname'],  # or row[0]
                           row['lastname'])   # or row[1]
Connection and Cursor Factories
Solution 2: Using RealDictConnection:
import psycopg2.extras

dbconn = psycopg2.connect(dsn='...',
    connection_factory=psycopg2.extras.RealDictConnection)
cursor = dbconn.cursor()
cursor.execute("""
    SELECT firstname, lastname
    FROM customers
    ORDER BY 1, 2
    LIMIT 10
""")
for row in cursor.fetchall():
    print "Name: %s %s" % (row['firstname'],
                           row['lastname'])
Connection and Cursor Factories
Solution 3: Using NamedTupleConnection:
import psycopg2.extras

dbconn = psycopg2.connect(dsn='...',
    connection_factory=psycopg2.extras.NamedTupleConnection)
cursor = dbconn.cursor()
cursor.execute("""
    SELECT firstname, lastname
    FROM customers
    ORDER BY 1, 2
    LIMIT 10
""")
for row in cursor.fetchall():
    print "Name: %s %s" % (row.firstname,  # or row[0]
                           row.lastname)   # or row[1]
Connection and Cursor Factories
Alternative: Using DictCursor/RealDictCursor/NamedTupleCursor:
import psycopg2.extras

dbconn = psycopg2.connect(dsn='...')
cursor = dbconn.cursor(
    cursor_factory=psycopg2.extras.DictCursor)  # or RealDictCursor, NamedTupleCursor
cursor.execute("""
    SELECT firstname, lastname
    FROM customers
    ORDER BY 1, 2
    LIMIT 10
""")
for row in cursor.fetchall():
    print "Name: %s %s" % (row['firstname'],
                           row['lastname'])
    # (resp. row.firstname, row.lastname with NamedTupleCursor)
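For drivers without such factories, access by name can be approximated from cursor.description, which DB-API 2.0 requires. The following is a minimal sketch in that spirit, not how Psycopg implements its factories. The helper name fetch_named is hypothetical, sqlite3 stands in as the driver, and the sample row is made up.

```python
import sqlite3
from collections import namedtuple


def fetch_named(cursor):
    """Return all remaining rows as namedtuples (hypothetical helper)."""
    # DB-API: description holds one 7-item sequence per column; item 0 is the name.
    names = [column[0] for column in cursor.description]
    Row = namedtuple("Row", names)
    return [Row(*row) for row in cursor.fetchall()]


dbconn = sqlite3.connect(":memory:")  # stand-in for a PostgreSQL connection
cursor = dbconn.cursor()
cursor.execute("CREATE TABLE customers (firstname TEXT, lastname TEXT)")
cursor.execute("INSERT INTO customers VALUES (?, ?)", ("Ada", "Lovelace"))
cursor.execute("SELECT firstname, lastname FROM customers ORDER BY 1, 2")
rows = fetch_named(cursor)
dbconn.close()
```

Each row then supports both row.firstname and row[0]. The approach assumes that every column name is a valid Python identifier, which is not guaranteed for computed columns without an alias.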
Supporting New Data Types
Only a finite list of types is supported by default: Date, Binary, etc.
• map new PostgreSQL data types into Python
• map new Python data types into PostgreSQL
Mapping New PostgreSQL Types Into Python
import psycopg2
import psycopg2.extensions

def cast_oidvector(value, _cursor):
    """Convert oidvector to Python array"""
    if value is None:
        return None
    return map(int, value.split(' '))

OIDVECTOR = psycopg2.extensions.new_type((30,),
    'OIDVECTOR', cast_oidvector)
psycopg2.extensions.register_type(OIDVECTOR)
Mapping New Python Types into PostgreSQL
from psycopg2.extensions import adapt, register_adapter, AsIs

class Point(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y

def adapt_point(point):
    return AsIs("'(%s, %s)'" % (adapt(point.x),
                                adapt(point.y)))

register_adapter(Point, adapt_point)

cur.execute("INSERT INTO atable (apoint) VALUES (%s)",
            (Point(1.23, 4.56),))
(from Psycopg documentation)
Connection Pooling With Psycopg
for non-threaded applications:
from psycopg2.pool import SimpleConnectionPool

pool = SimpleConnectionPool(1, 20, dsn='...')
dbconn = pool.getconn()
...
pool.putconn(dbconn)
pool.closeall()

for threaded applications:
from psycopg2.pool import ThreadedConnectionPool

pool = ThreadedConnectionPool(1, 20, dsn='...')
dbconn = pool.getconn()
cursor = dbconn.cursor()
...
pool.putconn(dbconn)
pool.closeall()
Connection Pooling With DBUtils
import psycopg2
from DBUtils.PersistentDB import PersistentDB

persist = PersistentDB(psycopg2, dsn='...')
dbconn = persist.connection()
cursor = dbconn.cursor()
...
see http://pypi.python.org/pypi/DBUtils/
The Other Stuff
• thread safety: can share connections, but not cursors
• COPY support: cursor.copy_from(), cursor.copy_to()
• large object support: connection.lobject()
• 2PC: connection.xid(), connection.tpc_begin(), . . .
• query cancel: dbconn.cancel()
• notices: dbconn.notices
• notifications: dbconn.notifies
• asynchronous communication
• coroutine support
• logging cursor
Part II
PL/Python
Setup
• included with PostgreSQL
• configure --with-python
• apt-get/yum install postgresql-plpython
• CREATE LANGUAGE plpythonu;
• Python 3: CREATE LANGUAGE plpython3u;
• “untrusted”, superuser only
Basic Examples
CREATE FUNCTION add(a int, b int) RETURNS int
LANGUAGE plpythonu
AS $$
return a + b
$$;

CREATE FUNCTION longest(a text, b text) RETURNS text
LANGUAGE plpythonu
AS $$
if len(a) > len(b):
    return a
elif len(b) > len(a):
    return b
else:
    return None
$$;
Using Modules
CREATE FUNCTION json_to_array(j text) RETURNS text[]
LANGUAGE plpythonu
AS $$
import json
return json.loads(j)
$$;
Database Calls
CREATE FUNCTION clear_passwords() RETURNS int
LANGUAGE plpythonu
AS $$
rv = plpy.execute("UPDATE customers SET password = NULL")
return rv.nrows()
$$;
Database Calls With Parameters
CREATE FUNCTION set_password(username text, password text) RETURNS boolean
LANGUAGE plpythonu
AS $$
plan = plpy.prepare("UPDATE customers SET password = $1 WHERE username = $2",
                    ['text', 'text'])
# argument order must match $1 (password) and $2 (username)
rv = plpy.execute(plan, [password, username])
return rv.nrows() == 1
$$;
Avoiding Prepared Statements
CREATE FUNCTION set_password(username text, password text) RETURNS boolean
LANGUAGE plpythonu
AS $$
rv = plpy.execute("UPDATE customers SET password = %s WHERE username = %s" %
    (plpy.quote_nullable(password),
     plpy.quote_literal(username)))
return rv.nrows() == 1
$$;
(quoting functions available in the upcoming 9.1)
Caching Plans
CREATE FUNCTION set_password2(username text, password text) RETURNS boolean
LANGUAGE plpythonu
AS $$
if 'myplan' in SD:
    plan = SD['myplan']
else:
    plan = plpy.prepare("UPDATE customers SET password = $1 WHERE username = $2",
                        ['text', 'text'])
    SD['myplan'] = plan
# argument order must match $1 (password) and $2 (username)
rv = plpy.execute(plan, [password, username])
return rv.nrows() == 1
$$;
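The caching pattern above is an ordinary dictionary lookup, which can be sketched outside the server. In the example below, SD is a plain dict and fake_prepare is a stand-in for plpy.prepare() that only records how often it is called. Both are assumptions made so the snippet runs as regular Python, and get_plan is a hypothetical helper name.

```python
SD = {}             # stand-in for PL/Python's per-function SD dictionary
prepare_calls = []  # records how often the stand-in prepare() runs


def fake_prepare(query, argtypes):
    """Stand-in for plpy.prepare(); returns an opaque plan object."""
    prepare_calls.append(query)
    return ("plan", query, tuple(argtypes))


def get_plan(key, query, argtypes):
    """Return the cached plan for key, preparing it on first use."""
    if key not in SD:
        SD[key] = fake_prepare(query, argtypes)
    return SD[key]


query = "UPDATE customers SET password = $1 WHERE username = $2"
first = get_plan("myplan", query, ["text", "text"])
second = get_plan("myplan", query, ["text", "text"])
```

The second call returns the very same plan object without preparing again, which is the saving the slide is after.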
Processing Query Results
CREATE FUNCTION get_customer_name(username text) RETURNS text
LANGUAGE plpythonu
AS $$
plan = plpy.prepare("SELECT firstname || ' ' || lastname AS name "
                    "FROM customers WHERE username = $1", ['text'])
rv = plpy.execute(plan, [username], 1)
return rv[0]['name']
$$;
Compare: PL/Python vs. DB-API
PL/Python:
plan = plpy.prepare("SELECT ...")
for row in plpy.execute(plan, ...):
    plpy.info(row["fieldname"])
DB-API:
dbconn = psycopg2.connect(...)
cursor = dbconn.cursor()
cursor.execute("SELECT ...")
for row in cursor.fetchall():
    print row[0]
Set-Returning and Table Functions
CREATE FUNCTION get_customers(id int) RETURNS SETOF customers
LANGUAGE plpythonu
AS $$
plan = plpy.prepare("SELECT * FROM customers WHERE customerid = $1", ['int'])
rv = plpy.execute(plan, [id])
return rv
$$;
Triggers
CREATE FUNCTION delete_notifier() RETURNS trigger
LANGUAGE plpythonu
AS $$
if TD['event'] == 'DELETE':
    plpy.notice("one row deleted from table %s" % TD['table_name'])
$$;

CREATE TRIGGER customers_delete_notifier AFTER DELETE
ON customers FOR EACH ROW EXECUTE PROCEDURE delete_notifier();
Exceptions
CREATE FUNCTION test() RETURNS text
LANGUAGE plpythonu
AS $$
try:
    rv = plpy.execute("SELECT ...")
except plpy.SPIError, e:
    plpy.notice("something went wrong")
$$;
In PostgreSQL versions before 9.1, the transaction is still aborted.
New in PostgreSQL 9.1
• SPI calls wrapped in subtransactions
• custom SPI exceptions: subclass per SQLSTATE, .sqlstate attribute
• plpy.subtransaction() context manager
• support for OUT parameters
• quoting functions
• validator
• lots of internal improvements
The End