Unit 6
Data Storage Design
Key Concepts
1. Database overview
2. SQL review
3. Designing fields
4. Denormalization
5. File organization
6. Object-relational database features
What Is Physical Database Design?
The part of database design that deals with efficiency considerations for data access
Key issues include:
- Processing speed
- Storage space
- Data manipulation and data access patterns
Sometimes the analyst and the designer are the same person.
Deliverables
What Is SQL?
- Structured Query Language
- Often pronounced “sequel”
- The standard language for creating and using relational databases
ANSI Standards
- SQL-92: the most commonly available
- SQL-99: added object-relational features
Common SQL Commands
- CREATE: used to create databases and database objects (for example, CREATE TABLE and CREATE DATABASE)
- SELECT: used to retrieve data using specified formats and selection criteria
- INSERT: used to add new rows to a table
- UPDATE: used to modify data in existing table rows
- DELETE: used to remove rows from tables
Example CREATE TABLE Statement
Here, a table called DEPT is created, with one numeric and two text fields.
The numeric field is the primary key.
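Here is a minimal sketch of such a CREATE TABLE statement. It runs through Python's standard `sqlite3` module against an in-memory SQLite database, not the Oracle or SQL Server engines this unit discusses. The column names DEPTNO, DNAME and LOC are taken from the INSERT example. The declared types and lengths are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# DEPT: one numeric field (the primary key) and two text fields.
# The declared lengths are assumptions; SQLite records but does not enforce them.
conn.execute("""
    CREATE TABLE DEPT (
        DEPTNO NUMERIC(2)  PRIMARY KEY,
        DNAME  VARCHAR(14),
        LOC    VARCHAR(13)
    )
""")

# PRAGMA table_info yields (cid, name, type, notnull, default, pk) per column;
# keep the name, declared type, and primary-key flag.
columns = [(row[1], row[2], row[5]) for row in conn.execute("PRAGMA table_info(DEPT)")]
print(columns)
```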
Example INSERT Statement
This statement inserts a new row into the DEPT table:
- DEPTNO’s value is 50
- DNAME’s value is “DESIGN”
- LOC’s value is “MIAMI”
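The INSERT described above can be sketched like this, again assuming SQLite through Python's `sqlite3` module. The second row (60, TESTING, DALLAS) is hypothetical. It shows the same statement written with `?` placeholders, so the driver binds the values instead of pasting them into the SQL text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DEPT (DEPTNO INTEGER PRIMARY KEY, DNAME TEXT, LOC TEXT)")

# The slide's INSERT: a column list, then values in the same order
conn.execute("INSERT INTO DEPT (DEPTNO, DNAME, LOC) VALUES (50, 'DESIGN', 'MIAMI')")

# Same statement shape with placeholders (hypothetical second row)
conn.execute("INSERT INTO DEPT (DEPTNO, DNAME, LOC) VALUES (?, ?, ?)", (60, "TESTING", "DALLAS"))
conn.commit()

row = conn.execute("SELECT DEPTNO, DNAME, LOC FROM DEPT WHERE DEPTNO = 50").fetchone()
print(row)  # (50, 'DESIGN', 'MIAMI')
```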
SELECT
The SELECT and FROM clauses are required.
All others are optional.
WHERE is used very commonly.
SELECT Statement: Example 1
Result: all fields of all rows in the DEPT table
Select * from DEPT;
SELECT Statement: Example 2
Result: all fields for employee “Smith”
Select * from EMP where ENAME = 'SMITH';
SELECT Statement: Example 3
Result: employee number, name and job for only salesmen from the EMP table, sorted by name
Select EMPNO, ENAME, JOB from EMP where JOB = 'SALESMAN' order by ENAME;
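This is a runnable sketch of the query patterns in Examples 2 and 3 (plus Example 1's `SELECT *` applied to EMP), assuming SQLite via Python's `sqlite3` module. The EMP rows and the column types are hypothetical sample data, for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMP (EMPNO INTEGER PRIMARY KEY, ENAME TEXT, JOB TEXT, SAL NUMERIC)")
# Hypothetical sample rows
conn.executemany(
    "INSERT INTO EMP VALUES (?, ?, ?, ?)",
    [(7369, "SMITH", "CLERK", 800),
     (7499, "ALLEN", "SALESMAN", 1600),
     (7521, "WARD", "SALESMAN", 1250),
     (7839, "KING", "PRESIDENT", 5000)],
)

# SELECT *: every column of every row
all_rows = conn.execute("SELECT * FROM EMP").fetchall()

# Example 2: WHERE restricts the rows; string literals use single quotes
smith = conn.execute("SELECT * FROM EMP WHERE ENAME = 'SMITH'").fetchall()

# Example 3: project three columns, filter on JOB, sort by name
salesmen = conn.execute(
    "SELECT EMPNO, ENAME, JOB FROM EMP WHERE JOB = 'SALESMAN' ORDER BY ENAME"
).fetchall()
print(salesmen)  # [(7499, 'ALLEN', 'SALESMAN'), (7521, 'WARD', 'SALESMAN')]
```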
What Is a Join Query?
A query in which the WHERE clause includes a match of primary key and foreign key values between tables that share a relationship
SELECT Statement: Example 4
Result: the number and name of every employee (from the EMP table) along with the associated department name, obtained by joining the tables on DEPTNO.
Only employees in departments located in Chicago are included.
Select EMPNO, ENAME, DNAME from EMP, DEPT where EMP.DEPTNO = DEPT.DEPTNO and DEPT.LOC = 'CHICAGO';
SELECT Statement: Example 4 (cont.)
Join queries almost always involve matching the primary key of the dominant table with the foreign key of the dependent table.
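Here is a minimal sketch of the Chicago join, assuming SQLite via Python's `sqlite3` module. The key column is named DEPTNO, as in the DEPT table definition, and all rows are hypothetical sample data. The first WHERE predicate matches the foreign key in EMP to the primary key in DEPT. The second predicate then filters on a DEPT attribute.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DEPT (DEPTNO INTEGER PRIMARY KEY, DNAME TEXT, LOC TEXT);
    CREATE TABLE EMP  (EMPNO INTEGER PRIMARY KEY, ENAME TEXT,
                       DEPTNO INTEGER REFERENCES DEPT(DEPTNO));
    -- Hypothetical sample data
    INSERT INTO DEPT VALUES (10, 'ACCOUNTING', 'NEW YORK'), (30, 'SALES', 'CHICAGO');
    INSERT INTO EMP VALUES (7782, 'CLARK', 10), (7499, 'ALLEN', 30), (7521, 'WARD', 30);
""")

# Foreign key EMP.DEPTNO matched to primary key DEPT.DEPTNO,
# then only departments located in Chicago are kept.
chicago = conn.execute("""
    SELECT EMPNO, ENAME, DNAME
    FROM EMP, DEPT
    WHERE EMP.DEPTNO = DEPT.DEPTNO AND DEPT.LOC = 'CHICAGO'
    ORDER BY EMPNO
""").fetchall()
print(chicago)  # [(7499, 'ALLEN', 'SALES'), (7521, 'WARD', 'SALES')]
```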
What Is an Aggregation Query?
A query that returns summary information about a group of records, such as sums, counts, or averages
These involve aggregate functions in the SELECT clause (SUM, AVG, COUNT)
Aggregated results can be grouped using the GROUP BY clause, and the groups can be filtered using the HAVING clause
SELECT Statement: Example 5
Result: the job name and average salary for each job of employees in the EMP table.
Only jobs with average salaries of at least $3000 are included.
Select JOB, Avg(SAL) from EMP Group by JOB Having Avg(SAL) >= 3000;
SELECT Statement: Example 5 (cont.)
Note that clerks and salesmen are not included, because the average salaries for these jobs are below $3000.
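This sketch of the GROUP BY / HAVING query assumes SQLite via Python's `sqlite3` module, with hypothetical salaries chosen so that the clerk and salesman averages fall below $3000. GROUP BY forms one group per JOB. HAVING then filters whole groups by their aggregate value, which a WHERE clause cannot do because WHERE is evaluated on individual rows before grouping.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMP (EMPNO INTEGER PRIMARY KEY, JOB TEXT, SAL NUMERIC)")
# Hypothetical sample salaries
conn.executemany("INSERT INTO EMP VALUES (?, ?, ?)", [
    (7369, "CLERK", 800), (7876, "CLERK", 1100),
    (7499, "SALESMAN", 1600), (7521, "SALESMAN", 1250),
    (7788, "ANALYST", 3000), (7902, "ANALYST", 3000),
    (7839, "PRESIDENT", 5000),
])

# One group per JOB; keep only groups whose average salary is at least 3000.
# SQLite's AVG always returns a floating-point value.
rows = conn.execute("""
    SELECT JOB, AVG(SAL) FROM EMP
    GROUP BY JOB
    HAVING AVG(SAL) >= 3000
    ORDER BY JOB
""").fetchall()
print(rows)  # [('ANALYST', 3000.0), ('PRESIDENT', 5000.0)]
```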
Example Data Manipulation
The UPDATE statement modifies the salary of existing employee 7698:
Update EMP set SAL = 3000 where EMPNO = 7698;
The DELETE statement removes employee 7844 from the EMP table:
Delete from EMP where EMPNO = 7844;
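The two statements can be run against a small table, assuming SQLite via Python's `sqlite3` module. The employee names and the starting salaries are hypothetical. `cursor.rowcount` reports how many rows each statement touched. This is a quick way to confirm that the WHERE clause matched what was intended.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMP (EMPNO INTEGER PRIMARY KEY, ENAME TEXT, SAL NUMERIC)")
conn.executemany("INSERT INTO EMP VALUES (?, ?, ?)",
                 [(7698, "BLAKE", 2850), (7844, "TURNER", 1500)])  # hypothetical rows

# UPDATE changes column values only in the rows matched by WHERE
updated = conn.execute("UPDATE EMP SET SAL = 3000 WHERE EMPNO = 7698").rowcount

# DELETE removes the matched rows entirely; without WHERE it would empty the table
deleted = conn.execute("DELETE FROM EMP WHERE EMPNO = 7844").rowcount
conn.commit()

remaining = conn.execute("SELECT * FROM EMP").fetchall()
print(updated, deleted, remaining)  # 1 1 [(7698, 'BLAKE', 3000)]
```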
Designing Fields
Field: the smallest unit of named application data recognized by system software such as a DBMS
Fields map roughly onto attributes in conceptual data models
When designing fields, consider:
- Identity
- Data types
- Sizes
- Constraints
Data type: a coding scheme recognized by system software for representing organizational data
SQL Server Data Types
Storage type                      Data types
Date and time values              smalldatetime, datetime
Integral                          bit, tinyint, smallint, int, bigint
Non-whole numbers                 decimal, numeric, money, smallmoney, float, real
Characters and strings            char, varchar, text
Unicode characters and strings    nchar, nvarchar, ntext
Binary strings                    binary, varbinary, image
Other                             cursor, sql_variant, table, timestamp, uniqueidentifier, xml
Considerations for Choosing Data Types
Balance these four objectives:
1. Minimize storage space
2. Represent all possible values of the field
3. Improve data integrity for the field
4. Support all data manipulations desired for the field
Mapping a composite attribute onto multiple fields with various data types
Creating and Using Composite Attribute Types
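Here is a minimal sketch of mapping a composite attribute onto multiple fields. It assumes a hypothetical Address composite on a hypothetical CUSTOMER table, stored in SQLite through Python's `sqlite3` module. Each component becomes its own column with its own data type. The ZIP code is stored as text rather than as an integer so that leading zeros survive, which is one example of choosing a type that represents all possible values of the field.

```python
import sqlite3
from dataclasses import dataclass

@dataclass
class Address:            # hypothetical composite attribute
    street: str
    city: str
    state: str
    zip_code: str         # text, not integer, so "02118" keeps its leading zero

conn = sqlite3.connect(":memory:")
# The composite Address is flattened into one column per component
conn.execute("""
    CREATE TABLE CUSTOMER (
        CUST_ID  INTEGER PRIMARY KEY,
        STREET   VARCHAR(40),
        CITY     VARCHAR(25),
        STATE    CHAR(2),
        ZIP_CODE CHAR(5)
    )
""")

def save(cust_id: int, addr: Address) -> None:
    """Split the composite value into its component fields."""
    conn.execute("INSERT INTO CUSTOMER VALUES (?, ?, ?, ?, ?)",
                 (cust_id, addr.street, addr.city, addr.state, addr.zip_code))

def load(cust_id: int) -> Address:
    """Reassemble the composite value from its component fields."""
    row = conn.execute(
        "SELECT STREET, CITY, STATE, ZIP_CODE FROM CUSTOMER WHERE CUST_ID = ?",
        (cust_id,)).fetchone()
    return Address(*row)

save(1, Address("12 Elm St", "Boston", "MA", "02118"))
print(load(1))
```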
Data Integrity Controls
- Default values: used if no explicit value is entered
- Format controls: restrict data entry values in specific character positions
- Range controls: force values to be among an acceptable set of values
- Referential integrity: forces foreign keys to align with primary keys
- Null value controls: determine whether fields can be empty of value
Referential integrity is important for ensuring that data relationships are accurate and consistent
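The integrity controls above map onto declarative column constraints. Here is a sketch in SQLite through Python's `sqlite3` module, with hypothetical columns (PHONE, the 0 to 10000 salary range, the CLERK default). The format control uses SQLite's `GLOB` character classes inside a CHECK constraint. That is one way to express it, not the only one. SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE DEPT (DEPTNO INTEGER PRIMARY KEY, DNAME TEXT NOT NULL);
    CREATE TABLE EMP (
        EMPNO  INTEGER PRIMARY KEY,
        ENAME  TEXT NOT NULL,                              -- null value control
        JOB    TEXT DEFAULT 'CLERK',                       -- default value
        SAL    NUMERIC CHECK (SAL BETWEEN 0 AND 10000),    -- range control
        PHONE  TEXT CHECK (PHONE GLOB '[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]'),  -- format control
        DEPTNO INTEGER REFERENCES DEPT(DEPTNO)             -- referential integrity
    );
    INSERT INTO DEPT VALUES (10, 'ACCOUNTING');
""")

def try_insert(sql: str) -> bool:
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

ok = try_insert("INSERT INTO EMP (EMPNO, ENAME, SAL, PHONE, DEPTNO) "
                "VALUES (1, 'KIM', 1200, '555-0101', 10)")
bad_range = try_insert("INSERT INTO EMP (EMPNO, ENAME, SAL) VALUES (2, 'LEE', -5)")
bad_format = try_insert("INSERT INTO EMP (EMPNO, ENAME, PHONE) VALUES (3, 'ROY', '5550101')")
bad_fk = try_insert("INSERT INTO EMP (EMPNO, ENAME, DEPTNO) VALUES (4, 'ANA', 99)")
no_name = try_insert("INSERT INTO EMP (EMPNO, ENAME) VALUES (5, NULL)")
default_job = conn.execute("SELECT JOB FROM EMP WHERE EMPNO = 1").fetchone()[0]
print(ok, bad_range, bad_format, bad_fk, no_name, default_job)
# True False False False False CLERK
```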
What Is Denormalization?
The process of combining normalized relations into physical tables based on affinity of use of rows and fields, and on retrieval and update frequencies on the tables
Results in better speed of access, but reduces data integrity and increases data redundancy
This will result in null values in several rows’ application data.
This will result in duplicate item descriptions in several rows of the CanSupplyDR table.
Duplicate regionManager data
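The regionManager trade-off can be sketched in SQLite through Python's `sqlite3` module. The REGION and CUSTOMER tables and their rows are hypothetical. The denormalized CUSTOMER_REGION table answers "who manages this customer's region?" without a join. However, each manager name is now stored once per customer in that region, so a change of manager has to update every copy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized: each region manager is stored exactly once
    CREATE TABLE REGION (REGION_ID INTEGER PRIMARY KEY, REGION_MANAGER TEXT);
    CREATE TABLE CUSTOMER (CUST_ID INTEGER PRIMARY KEY, NAME TEXT,
                           REGION_ID INTEGER REFERENCES REGION(REGION_ID));
    INSERT INTO REGION VALUES (1, 'Ortiz'), (2, 'Chen');
    INSERT INTO CUSTOMER VALUES (100, 'Acme', 1), (101, 'Globex', 1), (102, 'Initech', 2);

    -- Denormalized: the join is performed once, when the table is built
    CREATE TABLE CUSTOMER_REGION AS
        SELECT c.CUST_ID, c.NAME, c.REGION_ID, r.REGION_MANAGER
        FROM CUSTOMER c JOIN REGION r ON c.REGION_ID = r.REGION_ID;
""")

# Reads no longer need a join...
manager = conn.execute(
    "SELECT REGION_MANAGER FROM CUSTOMER_REGION WHERE CUST_ID = 101").fetchone()[0]

# ...but the manager's name is repeated for every customer in region 1
copies = conn.execute(
    "SELECT COUNT(*) FROM CUSTOMER_REGION WHERE REGION_MANAGER = 'Ortiz'").fetchone()[0]
print(manager, copies)  # Ortiz 2
```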
What Is a File Organization?
A technique for physically arranging the row objects of a file
The main purpose of file organization is to optimize the speed of data access and modification
Secondary Storage Concepts
- Block: a unit of data retrieval from secondary storage
- Extent: a set of contiguous blocks
- Scan: a complete read of a file, block by block
- Blocking factor: the number of row objects that fit in one block
Determining Table Scan Time
Block read time is determined by seek, rotation, and transfer time.
Average_table_scan_time = (#rows / blocking_factor) * block_read_time
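The scan-time formula can be checked numerically with this direct transcription in Python. The seek, rotation, and transfer figures (6, 3 and 1 ms), the 100,000 rows, and the blocking factor of 20 are hypothetical inputs.

```python
def block_read_time(seek_ms: float, rotation_ms: float, transfer_ms: float) -> float:
    """Time to read one block: seek time + rotational delay + transfer time."""
    return seek_ms + rotation_ms + transfer_ms

def average_table_scan_time(rows: int, blocking_factor: int, block_read_ms: float) -> float:
    """(#rows / blocking_factor) * block_read_time, i.e. blocks read times cost per block."""
    return (rows / blocking_factor) * block_read_ms

t_block = block_read_time(6, 3, 1)                       # 10 ms per block (hypothetical)
t_scan = average_table_scan_time(100_000, 20, t_block)   # 5,000 blocks * 10 ms
print(t_block, t_scan)  # 10 50000.0
```

So a full scan of this hypothetical table costs about 50 seconds. This is why a heap, which can only be searched by scanning, suits only small tables. The formula treats the number of blocks as a fraction. A real scan rounds up, because a partially filled last block still costs a full read.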
What Is a Heap?
A file with no organization
Requires a full table scan for data retrieval
Only use this for small, cacheable tables
What Is Hashing?
A technique that uses an algorithm to convert a key value to a row address
Useful for random access, but not for sequential access
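Here is a minimal sketch of hashed file organization in plain Python. A hash function turns a key value into a bucket number, which stands in for a row address. The bucket count of 8 and the sample keys are hypothetical. `zlib.crc32` is used instead of Python's built-in `hash()` because the latter is randomized per process for strings, while a file organization needs a stable key-to-address mapping.

```python
import zlib
from typing import Optional

NUM_BUCKETS = 8  # hypothetical number of buckets (think: blocks in the file)

def bucket_for(key: str) -> int:
    """Hash function: map a key value to a bucket (row address)."""
    return zlib.crc32(key.encode()) % NUM_BUCKETS

# The hashed file: a list of buckets, each holding (key, row) pairs
buckets = [[] for _ in range(NUM_BUCKETS)]

def insert(key: str, row: dict) -> None:
    buckets[bucket_for(key)].append((key, row))

def lookup(key: str) -> Optional[dict]:
    """Random access: compute the address and search only that one bucket."""
    for k, row in buckets[bucket_for(key)]:
        if k == key:
            return row
    return None

for name in ["SMITH", "ALLEN", "WARD", "KING"]:
    insert(name, {"ENAME": name})

print(lookup("WARD"))  # {'ENAME': 'WARD'}

# Sequential access is poor: keys are scattered across buckets, so an
# ordered listing must read every bucket and then sort.
in_order = sorted(k for bucket in buckets for k, _ in bucket)
```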
What Is an Indexed File Organization?
A storage structure involving indexes, which are key values and pointers to row addresses
Indexed file organizations are structured to enable fast random and sequential access
Index files are fast for queries, but require additional overhead for inserts, deletes, and updates
Random Access Processing Using B+ Tree Indexes
Indexes are usually implemented as B+ trees
These are balanced trees, which preserve a sequential ascending order of items as they are added.
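This sketch shows how an index of (key value, row address) pairs kept in ascending order supports both random and sequential access. It is not a B+ tree. A flat sorted list, maintained with Python's `bisect` module, stands in for the ordered leaf level of a B+ tree, without the tree's internal nodes or balancing. The rows and addresses are hypothetical. `insort` must shift entries to keep the order, which hints at the extra overhead indexes add to inserts and updates.

```python
import bisect

# Hypothetical data file: row address -> row, stored in arrival (unsorted) order
rows = {0: ("WARD", 1250), 1: ("ALLEN", 1600), 2: ("SMITH", 800), 3: ("KING", 5000)}

# The index: (key value, row address) pairs kept in ascending key order
index = []
for addr, (name, _) in rows.items():
    bisect.insort(index, (name, addr))  # insert while preserving sorted order

def find(name):
    """Random access: binary-search the index, then follow the pointer."""
    i = bisect.bisect_left(index, (name,))
    if i < len(index) and index[i][0] == name:
        return rows[index[i][1]]
    return None

def range_scan(lo, hi):
    """Sequential access: walk the index in key order from lo up to (not including) hi."""
    i = bisect.bisect_left(index, (lo,))
    out = []
    while i < len(index) and index[i][0] < hi:
        out.append(rows[index[i][1]])
        i += 1
    return out

print(find("KING"))          # ('KING', 5000)
print(range_scan("A", "L"))  # [('ALLEN', 1600), ('KING', 5000)]
```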
Issues to Consider When Selecting a File Organization
- File size
- Frequency of data retrievals
- Frequency of updates
- Factors related to primary and foreign keys
- Factors related to non-key attributes
Which Fields Should Be Indexed?
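A field that appears often in WHERE clauses is a natural candidate for an index. This sketch shows the effect with SQLite's `EXPLAIN QUERY PLAN` through Python's `sqlite3` module. The table, the query, and the index name `idx_emp_job` are hypothetical, and the exact wording of SQLite's plan text varies between versions. Before the index exists, the planner has to scan EMP. Afterwards it can search through the index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMP (EMPNO INTEGER PRIMARY KEY, ENAME TEXT, JOB TEXT, SAL NUMERIC)")

def plan(sql: str) -> list:
    """Return the detail text of each step in SQLite's query plan."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

query = "SELECT ENAME FROM EMP WHERE JOB = 'SALESMAN'"
before = plan(query)   # a full scan of EMP is expected here

# JOB is filtered on frequently, so index it
conn.execute("CREATE INDEX idx_emp_job ON EMP (JOB)")
after = plan(query)    # a search using idx_emp_job is expected here
print(before, after)
```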
Design of Object Relational Features
Object-relational databases support:
- Generalization and inheritance
- Aggregation
- Multivalued attributes
- Object identifiers
- Relationships by reference (pointers)
Generalization in Oracle 9i/10g
Aggregation in Oracle 9i/10g
Multivalued Attributes in Oracle 9i/10g
Object Identifiers in Oracle 9i/10g
SQL Server Object-Relational Features
SQL Server 2005:
- Common Language Runtime (CLR) integration
SQL Server 2008:
- Common Language Runtime (CLR) integration
- Spatial and geographic data types
- .NET Language Integrated Query (LINQ)
- Object-Relational Designer