Databases and SQLite
Steve Brown
Melbourne Linux Users Group Programming SIG
http://www.mlinux.org
17 February 2009
Melbourne Linux Users Group 17 February 2009
Outline
• What are databases and why?
• SQL
• SQLite
• Java bindings for SQLite
• Python bindings for SQLite
Definition of Database
• A database is a structured, persistent collection of data records stored on a computer system
• The software that implements and controls a database is called a database management system (DBMS).
• The organization of data in a database is described by a database model or database schema.
Examples of Database Models
[Figure: examples of database models. Source: http://en.wikipedia.org/wiki/File:Database_models.jpg]
Some Types of Databases
• Flat (or table)
– Defines data as a two-dimensional array of records (rows) and columns
– Each column has a name and a data type
• Hierarchical
– Defines data in a tree-like structure of branches and leaves
– Examples: filesystems, XML documents
• Object
– These attempt to match the database model more closely to object-oriented programming styles
– There are several approaches (e.g. object-relational databases, object-oriented databases), which are not yet well standardized
• Relational
– The most common type of database in use; more on these coming up
Ancient History
• Databases began to be used in the 1960s, as computers came to be applied to business applications
• Most early DBMSs used the network or hierarchical model, and most were tightly coupled to the underlying data to make efficient use of scarce resources
• In 1970, Edgar Codd at IBM developed the concept of the relational database, which represented data as a set of tables of fixed-length records, together with relationships between the tables implemented as keys (values appearing in multiple tables)
• The concept was very general, and quickly caught on
• The first popular RDBMS (relational DBMS) for microcomputers was dBase II for CP/M and later DOS
– This was followed by many others
Relational Databases
• The best way to store data for efficient searching and retrieval is to use fixed-length records
• But fixed-length records aren’t efficient for data with optional fields, or for variable-length records
• Codd realized that tables could be tied together with keys (fields common to multiple tables), so that the database could be both efficiently accessed and efficiently stored
• Codd used tuple calculus to show that this kind of database model could support insertions, updates, etc. and still allow efficient lookups
Structured English Query Language
• IBM worked on a prototype RDBMS (“System R”) soon after Codd published his paper
• In the course of this, they came up with a language for describing data retrieval and insertion operations
• This was initially called SEQUEL, but later changed to SQL
• Eventually, this became a standard language for interacting with databases
– Strictly speaking, an SQL database is not exactly a relational database
– But now, when most people say ‘relational database’, they mean an SQL database
How It Works
• A relational database comprises a set of tables with named columns (formally, the set of fields [i.e. columns] in a table is sometimes called a relvar)
• Some of the columns [fields] belong to more than one table [relvar]
• These columns [fields] are called keys
• Every key must be unique in at least one table [relvar], i.e. no two rows of the table can have the same value for this field
– This is sometimes called a candidate key in the table for which it is unique, and a foreign key in other tables
• Candidate/foreign key columns implement the relationship between tables
Database Integrity
RDBMSes usually enforce rules on the database model to help ensure data integrity and efficient access:
• Every table has at least one unique key, called a primary key (enforcing the rule that every table row is unique)
• Every foreign key value must occur as a candidate key value in some other table (enforcing a one-to-many relationship)
• Foreign keys may also be specified as unique (enforcing a one-to-one-or-none relationship)
• Not all keys implement relationships; some keys are used for efficient access
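As a sketch of how an RDBMS enforces these rules, the following uses Python's standard sqlite3 module; the Makers and Models tables and their data are hypothetical. One SQLite-specific detail: foreign key constraints are only enforced after `PRAGMA foreign_keys = ON` has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")          # throwaway in-memory database
conn.execute("PRAGMA foreign_keys = ON")    # SQLite checks foreign keys only when enabled

# Hypothetical tables, for illustration only.
conn.executescript("""
    CREATE TABLE Makers (
        MakerID INTEGER PRIMARY KEY,        -- primary key: every row is unique
        Name    TEXT NOT NULL UNIQUE        -- a second candidate key
    );
    CREATE TABLE Models (
        ModelID INTEGER PRIMARY KEY,
        MakerID INTEGER NOT NULL REFERENCES Makers(MakerID),  -- foreign key
        Model   TEXT NOT NULL
    );
""")

conn.execute("INSERT INTO Makers (MakerID, Name) VALUES (1, 'Acme')")
conn.execute("INSERT INTO Models (MakerID, Model) VALUES (1, 'Roadrunner')")

def rejected(sql):
    """Return True if the database refuses the statement as an integrity violation."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False

# Duplicate value in a UNIQUE column, then a foreign key with no matching MakerID.
dup_name = rejected("INSERT INTO Makers (MakerID, Name) VALUES (2, 'Acme')")
orphan = rejected("INSERT INTO Models (MakerID, Model) VALUES (99, 'Ghost')")
print(dup_name, orphan)  # True True
```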
Database Relations
[Diagram: one-to-many relation between a recipes table (Recipe_ID, Title, Serves, Preparation) and an ingredients table (Recipe_ID, Quantity, Unit, Ingredient), linked by Recipe_ID]
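A minimal sketch of this one-to-many relation in Python's sqlite3 module. Column names follow the diagram; the table names, column types, and the pancake data are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Table names, types and rows are hypothetical; column names follow the diagram.
conn.executescript("""
    CREATE TABLE Recipes (
        Recipe_ID   INTEGER PRIMARY KEY,   -- candidate key: unique per recipe
        Title       TEXT,
        Serves      INTEGER,
        Preparation TEXT
    );
    CREATE TABLE Ingredients (
        Recipe_ID  INTEGER REFERENCES Recipes(Recipe_ID),  -- foreign key
        Quantity   REAL,
        Unit       TEXT,
        Ingredient TEXT
    );
    INSERT INTO Recipes VALUES (1, 'Pancakes', 4, 'Mix and fry.');
    INSERT INTO Ingredients VALUES (1, 2,   'cup', 'flour');
    INSERT INTO Ingredients VALUES (1, 2,   NULL,  'eggs');
    INSERT INTO Ingredients VALUES (1, 1.5, 'cup', 'milk');
""")

# One recipe row relates to many ingredient rows through Recipe_ID.
rows = conn.execute("""
    SELECT Recipes.Title, COUNT(*)
    FROM Recipes JOIN Ingredients ON Recipes.Recipe_ID = Ingredients.Recipe_ID
    GROUP BY Recipes.Recipe_ID
""").fetchall()
print(rows)  # [('Pancakes', 3)]
```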
Database Relations
[Diagram: one-to-one linkage; columns shown are Cat_id, Name, Model, and Link, with Cat_id appearing in both tables]
Database Relations
[Diagram: many-to-one linkage between a links table (Lnk_ID, Link URL) and a tags table (Lnk_ID, Tag), linked by Lnk_ID]
Database Relations
[Diagram: many-to-many linkage between a records table (RecID, Name, Link) and a tags table (TagID, Tag), through a linking table (RecID, TagID)]
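The many-to-many case needs the third, linking table: each of its rows pairs one record with one tag. A sketch in Python's sqlite3 module; the table names Records, Tags, and RecordTags, the sample rows, and the helper `records_tagged` are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables and data, for illustration only.
conn.executescript("""
    CREATE TABLE Records (RecID INTEGER PRIMARY KEY, Name TEXT, Link TEXT);
    CREATE TABLE Tags    (TagID INTEGER PRIMARY KEY, Tag TEXT UNIQUE);
    -- Linking (junction) table: each row pairs one record with one tag.
    CREATE TABLE RecordTags (
        RecID INTEGER REFERENCES Records(RecID),
        TagID INTEGER REFERENCES Tags(TagID),
        PRIMARY KEY (RecID, TagID)   -- each pairing appears at most once
    );
    INSERT INTO Records VALUES (1, 'SQLite', 'http://www.sqlite.org/');
    INSERT INTO Records VALUES (2, 'Python', 'http://www.python.org/');
    INSERT INTO Tags VALUES (10, 'database');
    INSERT INTO Tags VALUES (20, 'programming');
    INSERT INTO RecordTags VALUES (1, 10);
    INSERT INTO RecordTags VALUES (1, 20);
    INSERT INTO RecordTags VALUES (2, 20);
""")

def records_tagged(tag):
    """Hypothetical helper: names of all records carrying the given tag."""
    cur = conn.execute("""
        SELECT Records.Name
        FROM Records
        JOIN RecordTags ON Records.RecID = RecordTags.RecID
        JOIN Tags ON Tags.TagID = RecordTags.TagID
        WHERE Tags.Tag = ?
        ORDER BY Records.Name
    """, (tag,))
    return [name for (name,) in cur]

print(records_tagged("programming"))  # ['Python', 'SQLite']
print(records_tagged("database"))     # ['SQLite']
```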
SQL
SQL expressions fall into several different categories:
• Queries, for retrieving data
• Data Manipulation Language (DML) statements, for inserting, deleting, or modifying data
• Data Description Language (DDL) statements, for creating, destroying, or modifying tables
• Data Control Language (DCL) statements, for controlling access rights
• Comments
SQL Queries
• SQL queries return data from the database
• Queries begin with the SELECT keyword and can include a wide range of qualifiers:
    SELECT FirstName, LastName FROM Patients WHERE Age > 35;
• Queries can also perform joins, which return data sets spanning two or more tables that have some relationship:
    SELECT Ingredients.Quantity, Ingredients.Stuff
    FROM Ingredients, Recipes
    WHERE Recipes.Name = 'lasagna' AND Recipes.RecID = Ingredients.RecID;
• Queries can become very complex
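Both queries can be tried against throwaway in-memory tables with Python's sqlite3 module. The column names come from the examples; the column types and sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column names follow the slide; types and rows are hypothetical.
conn.executescript("""
    CREATE TABLE Patients (FirstName TEXT, LastName TEXT, Age INTEGER);
    INSERT INTO Patients VALUES ('Fred', 'Flintstone', 29);
    INSERT INTO Patients VALUES ('Barney', 'Rubble', 41);

    CREATE TABLE Recipes (RecID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Ingredients (RecID INTEGER, Quantity TEXT, Stuff TEXT);
    INSERT INTO Recipes VALUES (1, 'lasagna');
    INSERT INTO Recipes VALUES (2, 'pancakes');
    INSERT INTO Ingredients VALUES (1, '500 g', 'mince');
    INSERT INTO Ingredients VALUES (1, '250 g', 'pasta sheets');
    INSERT INTO Ingredients VALUES (2, '2 cups', 'flour');
""")

# Simple filtered query.
older = conn.execute(
    "SELECT FirstName, LastName FROM Patients WHERE Age > 35").fetchall()

# Join: only ingredient rows whose RecID matches the lasagna row come back.
lasagna = conn.execute("""
    SELECT Ingredients.Quantity, Ingredients.Stuff
    FROM Ingredients, Recipes
    WHERE Recipes.Name = 'lasagna' AND Recipes.RecID = Ingredients.RecID
""").fetchall()

print(older)    # [('Barney', 'Rubble')]
print(lasagna)  # row order is unspecified without ORDER BY
```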
Results Sets
• Queries return result sets, which are like tables (they have columns and rows)
• Depending on the RDBMS implementation and the language binding, a result set can be returned as a temporary table or as a view
• A view is a structure which can be iterated with a cursor, allowing access to a single row of the result set at a time
• Some views are updatable: changes to the view will propagate back to the underlying table
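In Python's sqlite3 binding, a result set is consumed through a cursor, either one row at a time with `fetchone()` or by iterating over the cursor. A sketch with hypothetical sample data. (Views created with SQLite's CREATE VIEW are read-only unless INSTEAD OF triggers are defined on them, so the updatable case does not apply there by default.)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Patients (FirstName TEXT, LastName TEXT, Age INTEGER)")
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO Patients VALUES (?, ?, ?)",
    [("Fred", "Flintstone", 29), ("Wilma", "Flintstone", 28), ("Barney", "Rubble", 41)],
)

cur = conn.cursor()
cur.execute("SELECT FirstName, Age FROM Patients ORDER BY Age")

first = cur.fetchone()   # a single row: ('Wilma', 28)
rest = []
for row in cur:          # the cursor also iterates over the remaining rows
    rest.append(row)

print(first, rest)
```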
Data Manipulation Language
• DML instructions can add rows to a table
    INSERT INTO Patients (FirstName, LastName, Age)
    VALUES ('Fred', 'Flintstone', 29);
• update rows in a table
    UPDATE Patients SET Age = 29
    WHERE FirstName = 'Fred' AND LastName = 'Flintstone';
• and delete rows from a table
    DELETE FROM Patients
    WHERE FirstName = 'Fred' AND LastName = 'Flintstone';
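The same three statements, issued from Python's sqlite3 module. This sketch uses `?` placeholders instead of literal values, which lets the driver handle quoting; the new age of 30 is an arbitrary value chosen so that the UPDATE visibly changes something. `rowcount` reports how many rows each UPDATE or DELETE touched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Patients (FirstName TEXT, LastName TEXT, Age INTEGER)")

who = ("Fred", "Flintstone")

# '?' placeholders: the driver binds values, so no manual quoting is needed.
conn.execute("INSERT INTO Patients (FirstName, LastName, Age) VALUES (?, ?, ?)",
             who + (29,))
updated = conn.execute(
    "UPDATE Patients SET Age = ? WHERE FirstName = ? AND LastName = ?",
    (30,) + who).rowcount                      # 30 is an arbitrary new value
age = conn.execute("SELECT Age FROM Patients WHERE FirstName = ? AND LastName = ?",
                   who).fetchone()[0]
deleted = conn.execute(
    "DELETE FROM Patients WHERE FirstName = ? AND LastName = ?", who).rowcount
conn.commit()

remaining = conn.execute("SELECT COUNT(*) FROM Patients").fetchone()[0]
print(updated, age, deleted, remaining)  # 1 30 1 0
```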
DML Transactions
• There is no “undo” command in DML, but some complicated databases require that changes in many places all be made together to keep the database consistent
• To support this, SQL databases usually implement transaction control:
    BEGIN TRANSACTION; -- dialect warning
    UPDATE Accounts SET Balance = Balance + 50
    WHERE AccountNumber = 342;
    UPDATE Accounts SET Balance = Balance - 50
    WHERE AccountNumber = 117;
    COMMIT TRANSACTION; -- or ROLLBACK TRANSACTION
• Grouping DML into transactions can also be more efficient in some cases
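A sketch of the same transfer in Python's sqlite3 module. By default the module opens a transaction implicitly before the first INSERT/UPDATE/DELETE, and using the connection as a context manager (`with conn:`) commits on success or rolls back if an exception escapes. The CHECK constraint forbidding negative balances, the opening balances, and the helper `transfer` are assumptions added so that a failing transfer can be shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and opening balances; the CHECK forbids overdrafts.
conn.executescript("""
    CREATE TABLE Accounts (
        AccountNumber INTEGER PRIMARY KEY,
        Balance       INTEGER NOT NULL CHECK (Balance >= 0)
    );
    INSERT INTO Accounts VALUES (117, 100);
    INSERT INTO Accounts VALUES (342, 0);
""")

def transfer(src, dst, amount):
    """Hypothetical helper: both UPDATEs are committed together or not at all."""
    try:
        with conn:  # commit on success, rollback if an exception escapes
            conn.execute("UPDATE Accounts SET Balance = Balance + ? "
                         "WHERE AccountNumber = ?", (amount, dst))
            conn.execute("UPDATE Accounts SET Balance = Balance - ? "
                         "WHERE AccountNumber = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:  # the CHECK constraint rejected the debit
        return False

def balances():
    return dict(conn.execute(
        "SELECT AccountNumber, Balance FROM Accounts ORDER BY AccountNumber"))

ok = transfer(117, 342, 50)        # succeeds
failed = transfer(117, 342, 500)   # debit would go negative, so the credit is undone too
print(ok, failed, balances())      # True False {117: 50, 342: 50}
```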
Data Description Language
• DDL commands create databases and tables
    CREATE TABLE Recipes (Title TEXT, Serves INT,
                          Prep TEXT, RecID INTEGER PRIMARY KEY);
• delete tables
    DROP TABLE Recipes;
• change the data model
    ALTER TABLE Recipes DROP COLUMN Serves;
• and perform other operations, depending on the implementation
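The DDL statements can be exercised from Python's sqlite3 module. SQLite only added `ALTER TABLE ... DROP COLUMN` in version 3.35.0, so this sketch checks the library version first; the extra Cuisine column and the helper `columns` (built on SQLite's `PRAGMA table_info`) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Recipes (Title TEXT, Serves INT, "
             "Prep TEXT, RecID INTEGER PRIMARY KEY)")

def columns(table):
    """Hypothetical helper: column names via PRAGMA table_info (table name is trusted)."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]

conn.execute("ALTER TABLE Recipes ADD COLUMN Cuisine TEXT")   # hypothetical column

# DROP COLUMN is only available in SQLite 3.35.0 and later.
if sqlite3.sqlite_version_info >= (3, 35, 0):
    conn.execute("ALTER TABLE Recipes DROP COLUMN Serves")

cols = columns("Recipes")
print(cols)

conn.execute("DROP TABLE Recipes")
tables = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
print(tables)  # []
```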
Data Control Language
• DCL commands (in standard SQL, GRANT and REVOKE) control access to tables and other database objects
• Depending on the implementation, access control can usually apply to specific commands, users, and hosts
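SQLite, being an embedded library with no user accounts, does not implement GRANT or REVOKE. The closest analogue in Python's binding is `Connection.set_authorizer`, which lets the application veto individual operations as statements are compiled. A sketch; the policy of refusing every DELETE and the callback name `read_mostly` are arbitrary examples.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Patients (FirstName TEXT, LastName TEXT, Age INTEGER)")
conn.execute("INSERT INTO Patients VALUES ('Fred', 'Flintstone', 29)")

def read_mostly(action, arg1, arg2, db_name, source):
    """Hypothetical policy: refuse any DELETE, allow everything else."""
    if action == sqlite3.SQLITE_DELETE:
        return sqlite3.SQLITE_DENY
    return sqlite3.SQLITE_OK

conn.set_authorizer(read_mostly)

try:
    conn.execute("DELETE FROM Patients")
    denied = False
except sqlite3.DatabaseError:   # SQLite reports the statement as not authorized
    denied = True

count = conn.execute("SELECT COUNT(*) FROM Patients").fetchone()[0]
print(denied, count)  # True 1
```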
Client/Server Database Implementations
Advantages of the Client/Server Paradigm
• Scalability
• Portability
• Separates application from database schema
• Facilitates unit testing
• Enables WWW applications
SQLite
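• Single-file implementation: an entire database is stored in one ordinary disk file
• Implements a large subset of SQL
• NOT a client/server-type implementation: the SQLite library runs inside the application’s own process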
Advantages of SQLite
• Low-overhead, lightweight implementation
• Persistent, sophisticated data storage
• Environment for application prototyping and for testing database schemas
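Because there is no server to install or configure, trying out a schema amounts to opening a file (or `":memory:"` for a throwaway database). A sketch with Python's sqlite3 module; the file name `prototype.db` and the Notes table are hypothetical.

```python
import os
import sqlite3
import tempfile

# No server process: the database is an ordinary file opened by the library.
path = os.path.join(tempfile.mkdtemp(), "prototype.db")   # hypothetical file name

with sqlite3.connect(path) as conn:   # creates the file if needed; commits on exit
    conn.execute("CREATE TABLE Notes (NoteID INTEGER PRIMARY KEY, Body TEXT)")
    conn.execute("INSERT INTO Notes (Body) VALUES (?)", ("persisted",))
conn.close()                          # the context manager commits but does not close

# A second, independent connection sees the committed data in the same file.
conn2 = sqlite3.connect(path)
bodies = [body for (body,) in conn2.execute("SELECT Body FROM Notes")]
conn2.close()

print(os.path.exists(path), bodies)  # True ['persisted']
```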
Java Binding for SQLite
• Java provides a standard object interface, called JDBC™, for talking to RDBMSes
• JDBC requires a driver object specific to the RDBMS
• SQLiteJDBC is an SQLite driver for JDBC™
Python Binding for SQLite
• sqlite3 has been part of the Python standard library since version 2.5
• sqlite3 provides connection and cursor objects to manage databases
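A minimal sketch of the connection/cursor workflow; the Patients table and its rows are hypothetical. Setting `row_factory` to `sqlite3.Row` is optional and lets rows be indexed by column name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")      # Connection: owns the database handle
conn.row_factory = sqlite3.Row          # optional: rows indexable by column name
cur = conn.cursor()                     # Cursor: executes SQL and walks results

# Hypothetical table and rows.
cur.execute("CREATE TABLE Patients (FirstName TEXT, LastName TEXT, Age INTEGER)")
cur.executemany("INSERT INTO Patients VALUES (?, ?, ?)",
                [("Fred", "Flintstone", 29), ("Barney", "Rubble", 41)])
conn.commit()                           # transactions are controlled on the connection

cur.execute("SELECT FirstName, Age FROM Patients WHERE Age > ?", (35,))
names = [row["FirstName"] for row in cur.fetchall()]

cur.close()
conn.close()
print(names)  # ['Barney']
```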
Potential Problems with SQLite
• Scaling
• Database locking
• Dialect differences with other RDBMS
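The locking issue is easy to reproduce: SQLite allows only one writer at a time on a database file, so a second writer must wait. In Python's sqlite3 module the `timeout` argument to `connect()` sets how long to wait before raising `sqlite3.OperationalError`. A sketch; the file name, the Counter table, and the 0.1-second timeout are arbitrary choices.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "locking.db")   # hypothetical file name

# isolation_level=None: no implicit BEGIN; transactions are issued by hand.
writer = sqlite3.connect(path, isolation_level=None)
writer.execute("CREATE TABLE Counter (n INTEGER)")      # hypothetical table
writer.execute("BEGIN IMMEDIATE")                       # take the write lock and hold it
writer.execute("INSERT INTO Counter VALUES (1)")

# A second connection waits up to `timeout` seconds for the lock, then gives up.
other = sqlite3.connect(path, timeout=0.1, isolation_level=None)
try:
    other.execute("INSERT INTO Counter VALUES (2)")
    blocked = False
except sqlite3.OperationalError as exc:                 # "database is locked"
    blocked = "locked" in str(exc)

writer.execute("COMMIT")                                # releases the write lock
other.execute("INSERT INTO Counter VALUES (2)")         # now succeeds
total = other.execute("SELECT COUNT(*) FROM Counter").fetchone()[0]
writer.close()
other.close()
print(blocked, total)  # True 2
```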
SQL Shortcomings
• Lack of standardization
• APIs don’t insulate the programmer from SQL syntax
– Note that both the Java and Python RDBMS APIs still require the programmer to write SQL
• Complex queries
Pitfalls with Databases
• Optimizing data models
• Overhead
• Poor fit with application language
References
• SQLite: http://www.sqlite.org/
• Java JDBC: http://java.sun.com/javase/6/docs/api/java/sql/package-summary.html
• Java JDBC tutorial: http://java.sun.com/docs/books/tutorial/jdbc/index.html
• SQLiteJDBC: http://zentus.com/sqlitejdbc/
• Python/sqlite:
– http://docs.python.org/library/sqlite3.html
– http://oss.itsystementwicklung.de/trac/pysqlite/wiki/CodeSnippets