Databases and Statistical Databases Session 4 Mark Viney Australian Bureau of Statistics 5 June 2007

Page 1

Databases and Statistical Databases

Session 4

Mark Viney

Australian Bureau of Statistics

5 June 2007

Page 2

Terms

Database
  - A shared collection of logically related data (and a description of this data), designed to meet the information needs of an organisation

DataBase Management System (DBMS)
  - A software system that enables users to define, create and maintain the database and provides controlled access to this database

Page 3

Terms (example)

Database
  - Personnel database
  - Stock database
  - Statistical database

DataBase Management System (DBMS)
  - Oracle
  - DB2
  - Access
  - MySQL
  - FoxPro
  - Firebird

Page 4

Why keep information in databases?

Accessibility of data
  - Increased concurrency (reads and writes)
  - Sharing data
Improved data integrity
Improved security
  - access only to necessary data
Relatable
  - More information from same amount of data
Visible

Page 5

Why keep information in databases? (continued)

Backup and recovery
Improved productivity
  - common tools / common processes

Page 6

Disadvantages of databases

Complexity
Size
Cost of DBMS
Need to upgrade versions
Additional hardware costs
Higher impact of failure

www.cableready.net/newsletter/winter99.html

Page 7

Databases

used to be solely mainframe
commonly on minicomputers
increasingly available on microcomputers
mostly accessed by SQL

Page 8

Relational Databases

entities
  - datatypes
  - validation

relationships
  - rules for interaction

Page 9

Database Tables

rows and columns
fixed number of columns
multiple rows (records)
all values in a column are of the same datatype

Page 10

Structured Query Language - SQL

Standard database language that allows:
  - Creation of the database and relation structures
  - Basic data management tasks
  - Both simple and complex queries

Page 11

SQL - Data Definition - DDL

Allows creation, modification and deletion of database objects
  - Creation - CREATE
      CREATE TABLE TAB1 (COL1 NUMBER, COL2 NUMBER);
  - Modification - ALTER
      ALTER TABLE TAB1 ADD COL3 NUMBER;
  - Deletion - DROP
      DROP TABLE TAB1;
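
The three DDL statements above can be tried end to end. The following is a minimal sketch, assuming Python's standard sqlite3 module and an in-memory SQLite database rather than any particular DBMS from the slides; the helper column_names is hypothetical and only inspects the table after each step.

```python
import sqlite3

# In-memory SQLite database; SQLite accepts NUMBER as a column type name.
conn = sqlite3.connect(":memory:")

def column_names(table):
    """Return the column names SQLite currently records for `table`."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]

# Creation - CREATE
conn.execute("CREATE TABLE TAB1 (COL1 NUMBER, COL2 NUMBER)")
after_create = column_names("TAB1")

# Modification - ALTER
conn.execute("ALTER TABLE TAB1 ADD COL3 NUMBER")
after_alter = column_names("TAB1")

# Deletion - DROP (the table no longer exists, so no columns are reported)
conn.execute("DROP TABLE TAB1")
after_drop = column_names("TAB1")

print(after_create, after_alter, after_drop)
```

Exact type names and ALTER syntax differ between DBMS products, so the statements may need small changes elsewhere.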

Page 12

Structured Query Language - SQL Data Manipulation - DML

Standard language to allow access to the data stored in databases
  - Extraction - SELECT
      SELECT COL1, COL2 FROM TAB1;
  - Loading - INSERT
      INSERT INTO TAB1 (COL1, COL2) VALUES (7, 22);
  - Manipulation - UPDATE
      UPDATE TAB1 SET COL2 = COL1 + 2;
  - Deletion - DELETE
      DELETE FROM TAB1 WHERE COL2 = 4;
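
To see what each DML statement does to the data, here is a small sketch that runs them in sequence. It assumes Python's standard sqlite3 module with an in-memory database; the second inserted row (2, 0) is not in the slides and is added only so that the DELETE has something to remove.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TAB1 (COL1 NUMBER, COL2 NUMBER)")

# Loading - INSERT (the second row is an extra, illustrative value)
conn.execute("INSERT INTO TAB1 (COL1, COL2) VALUES (7, 22)")
conn.execute("INSERT INTO TAB1 (COL1, COL2) VALUES (2, 0)")

# Manipulation - UPDATE: every row gets COL2 = COL1 + 2
conn.execute("UPDATE TAB1 SET COL2 = COL1 + 2")
after_update = conn.execute(
    "SELECT COL1, COL2 FROM TAB1 ORDER BY COL1"
).fetchall()

# Deletion - DELETE: removes the row whose COL2 is now 4
conn.execute("DELETE FROM TAB1 WHERE COL2 = 4")

# Extraction - SELECT
remaining = conn.execute("SELECT COL1, COL2 FROM TAB1").fetchall()

print(after_update)  # [(2, 4), (7, 9)]
print(remaining)     # [(7, 9)]
```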

Page 13

Database Modeling

representation of "real world"
conceptual model
logical model
physical model

Page 14

Keys

Primary Keys
  - uniquely identifies a record

Foreign Keys
  - pointer to a Primary Key in another table
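
A short sketch of both kinds of key, assuming Python's standard sqlite3 module. The department and employee tables and their columns are hypothetical. Note that SQLite only enforces foreign keys once the foreign_keys pragma is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Hypothetical tables: each employee row points at a department row.
conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT,
        dept_id INTEGER REFERENCES department (dept_id)
    )
""")
conn.execute("INSERT INTO department VALUES (1, 'Statistics')")
conn.execute("INSERT INTO employee VALUES (100, 'Alice', 1)")  # valid pointer

errors = []
for row in [(100, "Bob", 1),     # duplicate primary key
            (101, "Carol", 9)]:  # foreign key with no matching department
    try:
        conn.execute("INSERT INTO employee VALUES (?, ?, ?)", row)
    except sqlite3.IntegrityError as exc:
        errors.append(str(exc))

employee_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
print(errors)
print(employee_count)  # 1: both invalid rows were rejected
```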

Page 15

Indexes

May be applied to columns to allow fast data access

May be applied to single columns or several columns

Direct pointers to rows containing specific values in the indexed column(s)

May be unique or non-unique

May have more than one index per table
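
The points above can be illustrated with a small sketch, assuming Python's standard sqlite3 module. The person table and the index names are hypothetical. EXPLAIN QUERY PLAN is SQLite-specific; other DBMS products expose query plans differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (person_id INTEGER, surname TEXT, given_name TEXT)")
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?)",
    [(1, "Smith", "Ann"), (2, "Jones", "Bob"), (3, "Smith", "Cy")],
)

# A unique index on a single column and a non-unique index on several columns.
conn.execute("CREATE UNIQUE INDEX idx_person_id ON person (person_id)")
conn.execute("CREATE INDEX idx_person_name ON person (surname, given_name)")

# Ask SQLite how it would run a lookup on an indexed column.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM person WHERE surname = 'Smith'"
).fetchall()
plan_text = " ".join(str(row[-1]) for row in plan)

# The unique index rejects a second row with the same person_id.
try:
    conn.execute("INSERT INTO person VALUES (1, 'Brown', 'Di')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# More than one index can exist on the same table.
index_names = sorted(row[1] for row in conn.execute("PRAGMA index_list(person)"))
print(plan_text)
print(duplicate_rejected, index_names)
```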

Page 16

Normalisation

A technique for producing a set of relations with desirable properties, given the data requirements of an enterprise

Page 17

Normalisation - unnormalised

A representation of the data that contains repeating groups

Page 18

Normalisation - unnormalised form

Page 19

Normalisation - 1st normal form

A relation in which the intersection of each row and column contains one and only one value

1NF
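
As an illustration of moving from an unnormalised form to 1NF, the sketch below flattens a repeating group into one row per value. The client and property data are hypothetical and are not taken from the slides' figures.

```python
# Hypothetical unnormalised records: each client carries a repeating
# group of rented properties.
unnormalised = [
    {"client_no": "C1", "name": "Ann", "properties": [("P4", 350), ("P16", 450)]},
    {"client_no": "C2", "name": "Bob", "properties": [("P4", 350)]},
]

def to_first_normal_form(records):
    """Emit one flat row per (client, property) pair so that every
    row/column intersection holds exactly one value."""
    rows = []
    for rec in records:
        for property_no, rent in rec["properties"]:
            rows.append((rec["client_no"], rec["name"], property_no, rent))
    return rows

rows_1nf = to_first_normal_form(unnormalised)
for row in rows_1nf:
    print(row)
```

The client name is now repeated on several rows; that redundancy is what the later normal forms address.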

Page 20

Normalisation - 1st normal form

1NF

Page 21

Normalisation - 2nd normal form

A relation that is
  - in first normal form, and
  - in which every non-primary-key attribute is fully functionally dependent on the primary key

2NF
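
A sketch of a 2NF decomposition under assumed data: the rows below are hypothetical, with a composite primary key (client_no, property_no). Attributes that depend on only part of that key are moved into relations of their own.

```python
# Hypothetical 1NF rows with composite primary key (client_no, property_no).
rows_1nf = [
    ("C1", "Ann", "P4", 350),
    ("C1", "Ann", "P16", 450),
    ("C2", "Bob", "P4", 350),
]

# name depends only on client_no and rent only on property_no: both are
# partial dependencies on the composite key, so each moves to its own relation.
client = sorted({(c, name) for c, name, _, _ in rows_1nf})
prop = sorted({(p, rent) for _, _, p, rent in rows_1nf})
rental = sorted({(c, p) for c, _, p, _ in rows_1nf})

print(client)  # one row per client
print(prop)    # one row per property
print(rental)  # which client rents which property
```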

Page 22

Normalisation - 2nd normal form

2NF

Page 23

Normalisation - 3rd normal form

A relation that is
  - in first and second normal form, and
  - in which no non-primary-key attribute is transitively dependent on the primary key

3NF
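
A sketch of removing a transitive dependency, assuming Python's standard sqlite3 module. The staff_branch table and its contents are hypothetical: staff_no determines branch_no, and branch_no determines branch_city.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical 2NF relation: staff_no -> branch_no -> branch_city, so
# branch_city depends on the primary key only transitively.
conn.execute(
    "CREATE TABLE staff_branch (staff_no TEXT PRIMARY KEY, branch_no TEXT, branch_city TEXT)"
)
conn.executemany(
    "INSERT INTO staff_branch VALUES (?, ?, ?)",
    [("S1", "B1", "Canberra"), ("S2", "B1", "Canberra"), ("S3", "B2", "Sydney")],
)

# 3NF decomposition: move the transitively dependent attribute to its own relation.
conn.execute("CREATE TABLE branch AS SELECT DISTINCT branch_no, branch_city FROM staff_branch")
conn.execute("CREATE TABLE staff AS SELECT staff_no, branch_no FROM staff_branch")

branch_rows = conn.execute("SELECT * FROM branch ORDER BY branch_no").fetchall()

# Joining the two relations reproduces the original rows.
rejoined = conn.execute("""
    SELECT s.staff_no, s.branch_no, b.branch_city
    FROM staff s JOIN branch b ON b.branch_no = s.branch_no
    ORDER BY s.staff_no
""").fetchall()
original = conn.execute("SELECT * FROM staff_branch ORDER BY staff_no").fetchall()

print(branch_rows)
print(rejoined == original)
```

Each branch city is now stored once, so changing it means updating a single row.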

Page 24

Normalisation - 3rd normal form

3NF

Page 25

Loading data into databases

Bulk loading tool
Data Integrity
Validation
ad-hoc loading
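
One way to combine bulk loading with validation is sketched below, assuming Python's standard csv and sqlite3 modules. The obs table, its columns and the sample input are hypothetical; a dedicated bulk loading tool supplied with a DBMS would normally replace this for large volumes.

```python
import csv
import io
import sqlite3

# Hypothetical input file contents; a real load would read from disk.
raw = io.StringIO(
    "region,year,value\n"
    "NSW,2006,101.5\n"
    "VIC,2006,abc\n"      # invalid: value is not numeric
    "QLD,2006,98.2\n"
)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE obs (region TEXT NOT NULL, year INTEGER NOT NULL, value REAL NOT NULL)"
)

# Validation: convert each field, setting aside records that do not parse.
good, rejected = [], []
for rec in csv.DictReader(raw):
    try:
        good.append((rec["region"], int(rec["year"]), float(rec["value"])))
    except ValueError:
        rejected.append(rec)

# Bulk load: all valid rows are inserted in a single transaction.
with conn:
    conn.executemany("INSERT INTO obs VALUES (?, ?, ?)", good)

loaded = conn.execute("SELECT COUNT(*) FROM obs").fetchone()[0]
print(loaded, [r["region"] for r in rejected])
```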

Page 26

Data Extraction

Assemble data into usable format
Spreadsheet
Timeseries
Data Cube
Publication
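
As one example of assembling extracted data into a usable format, the sketch below writes a query result as CSV text that a spreadsheet could open. It assumes Python's standard csv and sqlite3 modules; the obs table and its rows are hypothetical.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE obs (region TEXT, year INTEGER, value REAL)")
conn.executemany(
    "INSERT INTO obs VALUES (?, ?, ?)",
    [("NSW", 2005, 100.0), ("NSW", 2006, 101.5), ("VIC", 2006, 98.2)],
)

cursor = conn.execute("SELECT region, year, value FROM obs ORDER BY region, year")

out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow([col[0] for col in cursor.description])  # header row from column names
writer.writerows(cursor)                                 # one CSV line per result row
csv_text = out.getvalue()
print(csv_text)
```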

Page 27

Data manipulation

Inside database
  - Sophisticated manipulation language - SQL

Outside database
  - Timeseries
      Seasonal Adjustment
      Chain Volume Measures (Constant Price)
  - SAS, SPSS

Page 28

Transactional Integrity

The ability to apply rules to the data via database constraints

The ability to group several discrete data insertions or data manipulations into one logical data change

In SQL, controlled via COMMIT and ROLLBACK statements

Page 29

Transactional Integrity

database constraints
  - values must conform to specific rules
      exist in a specific column
      belong to a "set"
      uniqueness

If a validation against a constraint fails
  - the current transaction fails
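
The three kinds of rule listed above map roughly onto NOT NULL, CHECK ... IN and UNIQUE constraints. The sketch below assumes Python's standard sqlite3 module and a hypothetical survey table. Here the rollback is carried out by the connection's context manager when the error is raised; how much is undone automatically varies between DBMS products.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with three kinds of constraint: NOT NULL (a value must
# exist), CHECK ... IN (membership of a set) and UNIQUE.
conn.execute("""
    CREATE TABLE survey (
        respondent_id INTEGER NOT NULL UNIQUE,
        state TEXT NOT NULL CHECK (state IN ('NSW', 'VIC', 'QLD'))
    )
""")
conn.commit()

failed = False
message = ""
try:
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute("INSERT INTO survey VALUES (1, 'NSW')")
        conn.execute("INSERT INTO survey VALUES (2, 'XYZ')")  # violates the CHECK
except sqlite3.IntegrityError as exc:
    failed = True
    message = str(exc)

# The valid first insert was part of the same transaction, so it is gone too.
row_count = conn.execute("SELECT COUNT(*) FROM survey").fetchone()[0]
print(failed, message, row_count)
```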

Page 30

Transactions & Recovery

Each transaction is logged by the DBMS
Backups taken periodically
Data can be recovered
  - to an archived backup
  - to a point in time
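
Recovery to an archived backup can be sketched with the backup method of Python's standard sqlite3 connections (Python 3.7 or later). This is an assumption-laden illustration using two in-memory databases and a hypothetical table t. It does not show point-in-time recovery, which relies on the DBMS replaying its transaction log.

```python
import sqlite3

live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE t (x INTEGER)")
live.execute("INSERT INTO t VALUES (1)")
live.commit()

# Periodic backup: copy the live database into a separate one.
archive = sqlite3.connect(":memory:")
live.backup(archive)

# A later, unwanted change hits the live database.
live.execute("DELETE FROM t")
live.commit()
after_damage = live.execute("SELECT COUNT(*) FROM t").fetchall()

# Recovery to the archived backup: copy it back over the live database.
archive.backup(live)
after_restore = live.execute("SELECT x FROM t").fetchall()

print(after_damage, after_restore)
```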

Page 31

Transaction example

COMMIT;

-- transaction 1: ends with ROLLBACK, so none of its changes are kept
INSERT INTO TABLE1 (COL1,COL2) VALUES(7,22);
UPDATE TABLE1 SET COL1 = 77 WHERE COL2 = 22;
DELETE FROM TABLE1 WHERE COL1 = 7;

ROLLBACK;

-- transaction 2: ends with COMMIT, so its changes become permanent
INSERT INTO TABLE2 (COL3,COL4) VALUES('ABC',11);
UPDATE TABLE2 SET COL3 = 'XYZ';
DELETE FROM TABLE2 WHERE COL3 = 'xyz';

COMMIT;
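
The transaction example can be replayed to check what survives. The sketch assumes Python's standard sqlite3 module. The table definitions and the explicit BEGIN statements are additions for SQLite and are not in the slides, which rely on a DBMS that starts transactions implicitly.

```python
import sqlite3

# isolation_level=None turns off the module's implicit BEGINs, so the
# transaction boundaries below are exactly the statements issued here.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE TABLE1 (COL1 NUMBER, COL2 NUMBER)")
conn.execute("CREATE TABLE TABLE2 (COL3 TEXT, COL4 NUMBER)")

# transaction 1: three changes, then ROLLBACK discards all of them
conn.execute("BEGIN")
conn.execute("INSERT INTO TABLE1 (COL1, COL2) VALUES (7, 22)")
conn.execute("UPDATE TABLE1 SET COL1 = 77 WHERE COL2 = 22")
conn.execute("DELETE FROM TABLE1 WHERE COL1 = 7")
conn.execute("ROLLBACK")

# transaction 2: three changes, then COMMIT makes them permanent
conn.execute("BEGIN")
conn.execute("INSERT INTO TABLE2 (COL3, COL4) VALUES ('ABC', 11)")
conn.execute("UPDATE TABLE2 SET COL3 = 'XYZ'")
conn.execute("DELETE FROM TABLE2 WHERE COL3 = 'xyz'")
conn.execute("COMMIT")

table1_rows = conn.execute("SELECT * FROM TABLE1").fetchall()
table2_rows = conn.execute("SELECT * FROM TABLE2").fetchall()
print(table1_rows)  # []
print(table2_rows)  # [('XYZ', 11)]
```

In SQLite, string comparison is case-sensitive by default, so the final DELETE (lower-case 'xyz') matches nothing and the updated row remains; a DBMS with case-insensitive comparison would delete it.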

Page 32

References

Thomas Connolly, Carolyn Begg, Anne Strachan, Database Systems: A Practical Approach to Design, Implementation and Management (Addison-Wesley), 1999

cartoons - Randy Glasbergen

Page 33

Questions?