Databases and Statistical Databases
Session 4
Mark Viney
Australian Bureau of Statistics
5 June 2007
Terms
Database
- A shared collection of logically related data (and a description of this data), designed to meet the information needs of an organisation
DataBase Management System (DBMS)
- A software system that enables users to define, create and maintain the database and provides controlled access to this database
Terms (example)
Database
- Personnel Database
- Stock database
- Statistical Database
DataBase Management System (DBMS)
- Oracle
- DB2
- Access
- MySQL
- FoxPro
- Firebird
Why keep information in databases?
Accessibility of data
- Increased concurrency (reads and writes)
- Sharing data
Improved data integrity
Improved security
- access only to necessary data
Relatable
- More information from same amount of data
Visible
Why keep information in databases? (continued)
backup and recovery
Improved productivity
- common tools / common processes
Disadvantages of databases
Complexity
Size
Cost of DBMS
Need to upgrade versions
Additional hardware costs
Higher impact of failure
www.cableready.net/newsletter/winter99.html
Databases
used to be solely mainframe
commonly on minicomputers
increasingly available on microcomputers
mostly accessed by SQL
Relational Databases
entities
- datatypes
- validation
relationships
- rules for interaction
Database Tables
rows and columns
fixed number of columns
multiple rows (records)
columns are of same datatype
Structured Query Language - SQL
Standard database language that allows:
- Database creation and relation structures
- Basic data management tasks
- Both simple and complex queries
SQL - Data Definition - DDL
allows creation, modification and deletion of database objects
- Creation - CREATE
    CREATE TABLE TAB1 (COL1 NUMBER, COL2 NUMBER);
- Modification - ALTER
    ALTER TABLE TAB1 ADD COL3 NUMBER;
- Deletion - DROP
    DROP TABLE TAB1;
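The three DDL statements above can be tried end to end. The following is a minimal sketch, assuming Python's built-in sqlite3 module and an in-memory SQLite database (neither is mentioned in the slides); the helper names column_names and table_exists are made up for illustration. SQLite accepts the NUMBER type name, and the ALTER statement is written with the ADD COLUMN spelling.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the DBMS.
conn = sqlite3.connect(":memory:")


def column_names(table):
    """Return the column names of a table, in definition order."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


def table_exists(table):
    """True if a table with this name is present in the schema catalogue."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table,),
    ).fetchall()
    return len(rows) == 1


# Creation - CREATE
conn.execute("CREATE TABLE TAB1 (COL1 NUMBER, COL2 NUMBER)")
after_create = column_names("TAB1")

# Modification - ALTER (written here as ADD COLUMN)
conn.execute("ALTER TABLE TAB1 ADD COLUMN COL3 NUMBER")
after_alter = column_names("TAB1")

# Deletion - DROP
conn.execute("DROP TABLE TAB1")
exists_after_drop = table_exists("TAB1")

print(after_create, after_alter, exists_after_drop)
```

Each statement changes the structure of the database rather than its contents, which is what separates DDL from the data manipulation statements.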
SQL - Data Manipulation - DML
Standard language for accessing the data stored in databases
- Extraction - SELECT
    SELECT COL1, COL2 FROM TAB1;
- Loading - INSERT
    INSERT INTO TAB1 (COL1, COL2) VALUES (7, 22);
- Manipulation - UPDATE
    UPDATE TAB1 SET COL2 = COL1 + 2;
- Deletion - DELETE
    DELETE FROM TAB1 WHERE COL2 = 4;
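To see what the four DML statements do to actual rows, here is a small sketch, again assuming Python's sqlite3 module with an in-memory database. The second inserted row (2, 0) is not from the slides; it is added so that the DELETE has something to remove.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TAB1 (COL1 NUMBER, COL2 NUMBER)")

# Loading - INSERT (the second row is an extra, made up for this sketch)
conn.execute("INSERT INTO TAB1 (COL1, COL2) VALUES (7, 22)")
conn.execute("INSERT INTO TAB1 (COL1, COL2) VALUES (2, 0)")

# Manipulation - UPDATE: with no WHERE clause, every row is changed
conn.execute("UPDATE TAB1 SET COL2 = COL1 + 2")

# Extraction - SELECT
after_update = conn.execute(
    "SELECT COL1, COL2 FROM TAB1 ORDER BY COL1"
).fetchall()

# Deletion - DELETE: only rows matching the WHERE clause are removed
conn.execute("DELETE FROM TAB1 WHERE COL2 = 4")
after_delete = conn.execute(
    "SELECT COL1, COL2 FROM TAB1 ORDER BY COL1"
).fetchall()

print(after_update)
print(after_delete)
```

The UPDATE has no WHERE clause, so it rewrites COL2 in every row; the DELETE then removes only the row whose COL2 became 4.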
Database Modeling
representation of "real world"
conceptual model
logical model
physical model
Keys
Primary Keys
- uniquely identifies a record
Foreign Keys
- pointer to a Primary Key in another table
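A short sketch of both kinds of key, assuming Python's sqlite3 module. The department and employee tables, their columns and the helper try_insert are hypothetical. Note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON; other DBMSs typically enforce them by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when asked

# Hypothetical tables: each employee row points at one department row.
conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    """
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT,
        dept_id INTEGER REFERENCES department (dept_id)
    )
    """
)
conn.execute("INSERT INTO department VALUES (1, 'Statistics')")
conn.execute("INSERT INTO employee VALUES (100, 'Alice', 1)")


def try_insert(sql):
    """Run an INSERT and report whether the DBMS accepted it."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


# Primary key: a second record with emp_id 100 is not uniquely identified.
duplicate = try_insert("INSERT INTO employee VALUES (100, 'Bob', 1)")

# Foreign key: department 99 does not exist, so the pointer has no target.
orphan = try_insert("INSERT INTO employee VALUES (101, 'Carol', 99)")

employee_count = conn.execute("SELECT COUNT(*) FROM employee").fetchall()[0][0]
print(duplicate, orphan, employee_count, sep="\n")
```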
Indexes
May be applied to columns to allow fast data access
May be applied to single columns or several columns
Direct pointers to rows containing specific values in the indexed column(s)
May be unique or non-unique
May have more than one index per table
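The index options listed above can be sketched as follows, assuming Python's sqlite3 module. The person table and the index names are hypothetical, and the wording of the query plan is specific to SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person (person_id INTEGER, surname TEXT, given_name TEXT, state TEXT)"
)
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?, ?)",
    [
        (1, "Smith", "Ann", "NSW"),
        (2, "Smith", "Bob", "VIC"),
        (3, "Jones", "Cat", "NSW"),
    ],
)

# Several indexes on one table: single-column, multi-column and unique.
conn.execute("CREATE INDEX idx_person_surname ON person (surname)")
conn.execute("CREATE INDEX idx_person_state_given ON person (state, given_name)")
conn.execute("CREATE UNIQUE INDEX idx_person_id ON person (person_id)")

index_names = sorted(
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index' AND tbl_name = 'person'"
    )
)

# Ask the DBMS how it would find the rows; the last field describes the access path.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM person WHERE surname = 'Smith'"
).fetchall()
plan_text = " ".join(str(row[-1]) for row in plan)

# A non-unique index allows repeated values (two Smiths); a unique index does not.
try:
    conn.execute("INSERT INTO person VALUES (1, 'Brown', 'Dee', 'QLD')")
    duplicate_id_accepted = True
except sqlite3.IntegrityError:
    duplicate_id_accepted = False

print(index_names)
print(plan_text)
print(duplicate_id_accepted)
```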
Normalisation
A technique for producing a set of relations with desirable properties, given the data requirements of an enterprise
Normalisation - unnormalised form
A representation of the data that contains repeating groups
[Figure: example of data in unnormalised form]
Normalisation - 1st normal form
A relation in which the intersection of each row and column contains one and only one value
[Figure: example relation in 1NF]
Normalisation - 2nd normal form
A relation that is
- in first normal form and
- in which every non-primary key attribute is fully functionally dependent on the primary key
[Figure: example relations in 2NF]
Normalisation - 3rd normal form
A relation that is
- in first and second normal form and
- in which no non-primary key attribute is transitively dependent on the primary key
[Figure: example relations in 3NF]
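The figures from the original slides are not available here, so the following is a stand-in sketch of the same progression in plain Python. The order data and every name in it are invented for illustration and are not the example used in the presentation.

```python
# Unnormalised form: each order carries a repeating group of items.
orders_unf = [
    {"order_id": 1, "customer_id": "C1", "customer_name": "Acme",
     "items": [("P1", "Widget", 2), ("P2", "Gadget", 1)]},
    {"order_id": 2, "customer_id": "C2", "customer_name": "Bolt",
     "items": [("P1", "Widget", 5)]},
    {"order_id": 3, "customer_id": "C1", "customer_name": "Acme",
     "items": [("P2", "Gadget", 3)]},
]


def to_1nf(unf):
    """Flatten the repeating group: one value per row/column intersection.

    The key of the result is (order_id, product_id).
    """
    return [
        (o["order_id"], o["customer_id"], o["customer_name"], pid, pname, qty)
        for o in unf
        for (pid, pname, qty) in o["items"]
    ]


def to_2nf(rows):
    """Remove attributes that depend on only part of the key."""
    order_items = sorted({(oid, pid, qty) for (oid, _, _, pid, _, qty) in rows})
    products = sorted({(pid, pname) for (_, _, _, pid, pname, _) in rows})
    orders = sorted({(oid, cid, cname) for (oid, cid, cname, _, _, _) in rows})
    return order_items, products, orders


def to_3nf(orders):
    """Remove the transitive dependency order_id -> customer_id -> customer_name."""
    orders_3nf = sorted({(oid, cid) for (oid, cid, _) in orders})
    customers = sorted({(cid, cname) for (_, cid, cname) in orders})
    return orders_3nf, customers


rows_1nf = to_1nf(orders_unf)
order_items, products, orders_2nf = to_2nf(rows_1nf)
orders_3nf, customers = to_3nf(orders_2nf)

print(rows_1nf)
print(order_items, products, orders_2nf, sep="\n")
print(orders_3nf, customers, sep="\n")
```

After the last step each fact is stored once: a product name appears only in products, and a customer name only in customers.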
Loading data into databases
Bulk loading tool
Data Integrity
Validation
ad-hoc loading
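As a rough illustration of validating records before a bulk load, here is a sketch assuming Python's csv and sqlite3 modules. The file contents, the observation table and the validate function are all made up; a real bulk loading tool supplied with a DBMS would work differently.

```python
import csv
import io
import sqlite3

# Made-up input; the last two data lines are deliberately invalid.
RAW = """period,region,value
2007-03,NSW,101.5
2007-03,VIC,98.2
2007-03,QLD,abc
2007-03,,77.0
"""


def validate(record):
    """Return a clean (period, region, value) tuple, or None if the record is invalid."""
    try:
        value = float(record["value"])
    except ValueError:
        return None
    if not record["region"]:
        return None
    return (record["period"], record["region"], value)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE observation ("
    "period TEXT, region TEXT, value REAL, PRIMARY KEY (period, region))"
)

# Validation: split the input into loadable rows and rejects.
checked = [(rec, validate(rec)) for rec in csv.DictReader(io.StringIO(RAW))]
good = [clean for _, clean in checked if clean is not None]
rejected = [rec for rec, clean in checked if clean is None]

# Bulk load: all good rows go in as a single transaction.
with conn:
    conn.executemany("INSERT INTO observation VALUES (?, ?, ?)", good)

loaded = conn.execute("SELECT COUNT(*) FROM observation").fetchall()[0][0]
print(loaded, len(rejected))
```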
Data Extraction
Assemble data into usable format
Spreadsheet
Timeseries
Data Cube
Publication
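A small sketch of assembling stored data into some of these formats, assuming Python's csv and sqlite3 modules. The observation table and its figures are invented; the "cube" here is just a nested dictionary with two dimensions.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE observation (period TEXT, region TEXT, value REAL)")
conn.executemany(
    "INSERT INTO observation VALUES (?, ?, ?)",
    [
        ("2007-Q1", "NSW", 10.0),
        ("2007-Q1", "VIC", 8.0),
        ("2007-Q2", "NSW", 11.0),
        ("2007-Q2", "VIC", 9.0),
    ],
)

# Timeseries: one total per period, in period order.
series = conn.execute(
    "SELECT period, SUM(value) FROM observation GROUP BY period ORDER BY period"
).fetchall()

# Data cube (two dimensions only): period x region.
cube = {}
for period, region, total in conn.execute(
    "SELECT period, region, SUM(value) FROM observation GROUP BY period, region"
):
    cube.setdefault(period, {})[region] = total

# Spreadsheet: the series written out as CSV text.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["period", "total"])
writer.writerows(series)
csv_lines = buffer.getvalue().splitlines()

print(series)
print(cube)
print(csv_lines)
```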
Data manipulation
Inside database
- Sophisticated manipulation language - SQL
Outside database
- Timeseries
  - Seasonal Adjustment
  - Chain Volume Measures (Constant Price)
- SAS, SPSS
Transactional Integrity
the ability to apply rules to the data via database constraints
the ability to group several discrete data insertions or manipulations into one logical data change
In SQL, controlled via COMMIT and ROLLBACK statements
Transactional Integrity (continued)
database constraints
- values must conform to specific rules
  - exist in a specific column
  - belong to a "set"
  - uniqueness
If a validation against a constraint fails
- the current transaction fails
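The "set" and uniqueness rules, and the effect of a failed validation on the surrounding transaction, can be sketched as below, assuming Python's sqlite3 module. The survey_response table and the load helper are hypothetical. The "exist in a specific column" rule corresponds to a foreign key and is not repeated here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE survey_response (
        response_id INTEGER PRIMARY KEY,
        respondent  TEXT NOT NULL UNIQUE,
        state       TEXT NOT NULL
                    CHECK (state IN ('NSW', 'VIC', 'QLD', 'SA', 'WA', 'TAS', 'NT', 'ACT')),
        age         INTEGER CHECK (age BETWEEN 0 AND 120)
    )
    """
)


def load(rows):
    """Insert all rows as one transaction; return False if a constraint rejects any row."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.executemany(
                "INSERT INTO survey_response (respondent, state, age) VALUES (?, ?, ?)",
                rows,
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = load([("r1", "NSW", 34), ("r2", "VIC", 51)])
bad_set = load([("r3", "QLD", 20), ("r4", "XX", 40)])     # 'XX' is not in the set
bad_unique = load([("r5", "SA", 30), ("r1", "WA", 45)])   # 'r1' already exists

stored = [
    row[0]
    for row in conn.execute("SELECT respondent FROM survey_response ORDER BY respondent")
]
print(ok, bad_set, bad_unique, stored)
```

In each failing batch the first row is valid on its own, but it is undone together with the offending row because both belong to the same transaction.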
Transactions & Recovery
Each transaction is logged by the DBMS
Backups taken periodically
Data can be recovered
- to an archived backup
- to a point in time
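Recovery to an archived backup can be sketched with the online backup call in Python's sqlite3 module (Connection.backup, Python 3.7 or later). The stock table and connection names are invented, and in-memory databases stand in for real backup files. Point-in-time recovery relies on the DBMS transaction log and is not shown.

```python
import sqlite3

# Live database with some committed data.
live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE stock (item TEXT, qty INTEGER)")
live.execute("INSERT INTO stock VALUES ('widget', 10)")
live.commit()

# Periodic backup: copy the whole database to an archive.
archive = sqlite3.connect(":memory:")
live.backup(archive)

# Later, the live data is damaged (here: every row deleted and committed).
live.execute("DELETE FROM stock")
live.commit()
damaged_count = live.execute("SELECT COUNT(*) FROM stock").fetchall()[0][0]

# Recovery: rebuild a database from the archived backup.
recovered = sqlite3.connect(":memory:")
archive.backup(recovered)
recovered_rows = recovered.execute("SELECT item, qty FROM stock").fetchall()

print(damaged_count, recovered_rows)
```

Anything committed after the backup was taken is lost in this kind of recovery, which is why the transaction log matters for recovering to a point in time.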
Transaction example
COMMIT;
transaction 1
    INSERT INTO TABLE1 (COL1,COL2) VALUES(7,22);
    UPDATE TABLE1 SET COL1 = 77 WHERE COL2 = 22;
    DELETE FROM TABLE1 WHERE COL1 = 7;
ROLLBACK;
transaction 2
    INSERT INTO TABLE2 (COL3,COL4) VALUES('ABC',11);
    UPDATE TABLE2 SET COL3 = 'XYZ';
    DELETE FROM TABLE2 WHERE COL3 = 'xyz';
COMMIT;
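The transaction example can be replayed to see which changes survive. This is a sketch assuming Python's sqlite3 module in its default mode, where a transaction opens implicitly before the first INSERT, UPDATE or DELETE. The table definitions are assumptions, since the slides do not show them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed table definitions; the slides only show the DML.
conn.execute("CREATE TABLE TABLE1 (COL1 NUMBER, COL2 NUMBER)")
conn.execute("CREATE TABLE TABLE2 (COL3 TEXT, COL4 NUMBER)")
conn.commit()

# transaction 1: ends in ROLLBACK, so none of it is kept
conn.execute("INSERT INTO TABLE1 (COL1, COL2) VALUES (7, 22)")
conn.execute("UPDATE TABLE1 SET COL1 = 77 WHERE COL2 = 22")
conn.execute("DELETE FROM TABLE1 WHERE COL1 = 7")  # matches nothing, COL1 is now 77
in_flight = conn.execute("SELECT COL1, COL2 FROM TABLE1").fetchall()
conn.rollback()
table1_rows = conn.execute("SELECT COL1, COL2 FROM TABLE1").fetchall()

# transaction 2: ends in COMMIT, so its net effect is kept
conn.execute("INSERT INTO TABLE2 (COL3, COL4) VALUES ('ABC', 11)")
conn.execute("UPDATE TABLE2 SET COL3 = 'XYZ'")
conn.execute("DELETE FROM TABLE2 WHERE COL3 = 'xyz'")  # case-sensitive comparison, no match
conn.commit()
table2_rows = conn.execute("SELECT COL3, COL4 FROM TABLE2").fetchall()

print(in_flight, table1_rows, table2_rows)
```

Inside transaction 1 the session can see its own uncommitted row, but after the rollback TABLE1 is empty again. In transaction 2 the DELETE removes nothing because 'xyz' and 'XYZ' compare as different strings here.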
References
Thomas Connolly, Carolyn Begg, Anne Strachan. Database Systems: A Practical Approach to Design, Implementation and Management. Addison-Wesley, 1999.
Cartoons: Randy Glasbergen
Questions?