
Database Architecture and Design Workshop - RTI International


Page 1: Database Architecture and Design Workshop - RTI International

1

RTI International is a trade name of Research Triangle Institute

3040 Cornwallis Road ■ P.O. Box 12194 ■ Research Triangle Park, North Carolina, USA 27709
Phone 919-990-8397 ■ e-mail [email protected] ■ Fax 919-541-6178

Database Architecture and Design Workshop

George Grubbs

May 17, 2005

Page 2: Database Architecture and Design Workshop - RTI International

2

Field Director’s Guide to “Database Design Appreciation”

Page 3: Database Architecture and Design Workshop - RTI International

3

Why should you care? Expectation setting

Become familiar with data modeling and the database design process, terminology and concepts.

To understand what goes on when a survey is being developed or changed.

To communicate better with database designers and programmers when developing or modifying a survey instrument.

To appreciate the overall database design process and its value.

To get better study outcomes and smoother system development efforts.

Page 4: Database Architecture and Design Workshop - RTI International

4

Workshop Objectives

Answer the questions:

What is a (relational) database?

How does it relate to a survey?

What is database design?

How do you design a database?

What about data warehousing?

Page 5: Database Architecture and Design Workshop - RTI International

5

Workshop Schedule

Schedule

Some basics: database concepts & terminology

Database design process

Data modeling – part 1

Break

Data modeling – part 2

Data warehouse fundamentals

Introduction to the Case Study

Performing the Case Study

Page 6: Database Architecture and Design Workshop - RTI International

6

First things first: “What is a database?”

It is a structured collection of related data.

Office terminology: file, record, field.

Database terminology (logical): entity, entity instance, attribute.

Database terminology (physical): table, row, column.

Example: a survey database described in “logical terminology” has Questionnaire, Respondent, and Response entities. A Response entity instance is “John Smith’s response to question 10”. An attribute of the Response entity might be called “answer”, and the data value for “John Smith’s response to question 10” might be “true”.

What is the process for getting data into an (electronic) database from a source document? That’s next.

Page 7: Database Architecture and Design Workshop - RTI International

7

Source document to electronic form and into the database

Source document in electronic form: Survey_ID: “123”, Q1: “Y”, Q2: “N”.

Data-entry program: some kind of programming language (Visual Basic, “C”, EntryPoint, etc.). Uses SQL (Structured Query Language) to interface with the DBMS.

DBMS = MS Access, SQL Server 2000, Oracle, DB2.

OS = Operating System = Windows, Linux, Unix.

Hard drive
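As a minimal sketch of the flow on this slide, the Python program below plays the role of the "programming language" box and hands one keyed survey to a DBMS through SQL. The assumptions are loud ones: SQLite (through Python's standard sqlite3 module) stands in for the DBMS products named above, and the table name Survey_Entry and its column names are hypothetical. Only the values Survey_ID "123", Q1 "Y" and Q2 "N" come from the slide.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the DBMS.
conn = sqlite3.connect(":memory:")

# Hypothetical table holding one keyed source document per row.
conn.execute(
    "CREATE TABLE Survey_Entry ("
    " survey_id TEXT PRIMARY KEY,"
    " q1 TEXT,"
    " q2 TEXT)"
)

# Values as they might be keyed from the paper form.
entry = {"survey_id": "123", "q1": "Y", "q2": "N"}

# The program talks to the DBMS in SQL; the DBMS takes care of storage.
conn.execute(
    "INSERT INTO Survey_Entry (survey_id, q1, q2) "
    "VALUES (:survey_id, :q1, :q2)",
    entry,
)
conn.commit()

# Read the row back to confirm it reached the database.
stored = conn.execute(
    "SELECT survey_id, q1, q2 FROM Survey_Entry WHERE survey_id = ?",
    ("123",),
).fetchone()
print(stored)
```

The program never touches the hard drive directly; it only issues SQL and lets the DBMS and operating system do the rest.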

Page 8: Database Architecture and Design Workshop - RTI International

8

Where is the data?

Here it is!

Page 9: Database Architecture and Design Workshop - RTI International

9

Disk array with 18 TB capacity

1 TB = 1,000,000,000,000 bytes (decimal) or 1 TB = 1,099,511,627,776 bytes (binary)

1 KB = 1,024 bytes

1 MB = 1 KB x 1 KB = 1,048,576 bytes

1 GB = 1 KB x 1 MB = 1,073,741,824 bytes

1 TB = 1 KB x 1 GB = 1,099,511,627,776 bytes

1 PB = 1 petabyte = 1 KB x 1 TB

1 EB = 1 exabyte = 1 KB x 1 PB

1 ZB = 1 zettabyte = 1 KB x 1 EB

1 YB = 1 yottabyte = 1 KB x 1 ZB

Example: a survey with 400 questions, where each response averages 100 characters, takes about 40,000 bytes (at one byte per character).

1 GB = 26,844 surveys

1 TB = 27,487,791 surveys

1 PB = 28,147,497,671 surveys
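The survey counts above can be checked with a few lines of arithmetic. This sketch assumes one byte per character and the binary sizes from this slide; the function name surveys_that_fit is made up for the illustration.

```python
# Sizes use the binary convention from the slide (1 KB = 1024 bytes).
KB = 1024
GB = KB ** 3
TB = KB ** 4
PB = KB ** 5

QUESTIONS_PER_SURVEY = 400
AVG_CHARS_PER_RESPONSE = 100  # assumption: one byte per character

bytes_per_survey = QUESTIONS_PER_SURVEY * AVG_CHARS_PER_RESPONSE


def surveys_that_fit(capacity_bytes: int) -> int:
    """Surveys per capacity, rounded to the nearest whole survey."""
    return (capacity_bytes + bytes_per_survey // 2) // bytes_per_survey


for label, size in (("1 GB", GB), ("1 TB", TB), ("1 PB", PB)):
    print(f"{label} = {surveys_that_fit(size):,} surveys")
```

The slide's figures are rounded to the nearest survey; counting only complete surveys gives one fewer for 1 GB and for 1 TB.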

Page 10: Database Architecture and Design Workshop - RTI International

10

Database design. First things first: Requirements

[Diagram labels: Questionnaire, Interview instrument; Interviewers; process; Output and End Users]

Page 11: Database Architecture and Design Workshop - RTI International

11

You might get some requirements from considering the interviewees

The Population

Page 12: Database Architecture and Design Workshop - RTI International

12

Data modeling is the heart of db design

1. Construct a logical data model

Entity-Relationship Diagram (ERD)

Key-Based Data Model (KBDM)

Fully-Attributed Data Model (FADM)

2. Construct a physical data model

Physical Data Model (PDM)

Make data model improvements

Page 13: Database Architecture and Design Workshop - RTI International

13

Main goals in database design

Minimize redundant data (ideally, each data value should be in only one place in the database).

Reflect the business rules of the application domain (data quality).

Construct a clear and understandable data model that is well-documented (used to “communicate”).

Benefits: data quality, structural integrity, data consistency, performance, understand requirements.

Page 14: Database Architecture and Design Workshop - RTI International

14

Logical Data Model - ERD

The ERD is very simple: it only considers entities and their relationships.

An entity models something in the “real world” – that is, something in our “application domain” which is a “survey domain” in our case – e.g., an entity would be a “person”.

Let’s look at some entity examples, then deal with relationships

Page 15: Database Architecture and Design Workshop - RTI International

15

Example entities with a few attributes

Questionnaire
Type A, Eff Date: 2/13/04, …
Type B, Eff Date: 2/13/03, …
Type B, Eff Date: 4/1/05, …

Respondent
Joyce E. Smith, Female, Lives in North Carolina, Age 42, …

Question
1. What state are you from?
2. What is your age?
1. Plan to re-visit?

Interviewer
John W. Romano, Male, 5 years of experience, …
Carrie Jones, Female, 1 year of experience, …

Response
True, False, True, True, …

So how are these entities related?

Let’s see

Page 16: Database Architecture and Design Workshop - RTI International

16

Entity relationships

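[Diagram: the five entities (Questionnaire, Respondent, Interviewer, Response, Question) joined by labeled relationship lines]

Questionnaire is completed by Respondent; Respondent completes Questionnaire.

Questionnaire has Question; Question is part of Questionnaire.

Question has Response; Response is answer to Question.

Respondent makes Response; Response is made by Respondent.

Interviewer interviews Respondent; Respondent is interviewed by Interviewer.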

Page 17: Database Architecture and Design Workshop - RTI International

17

Cardinality

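Cardinality is the occurrence relationship between two entities: the number of times one entity instance can occur for each instance of a related entity.

[Diagram: the five entities (Questionnaire, Respondent, Interviewer, Response, Question) with cardinality labels on the relationship lines: four 1-to-N relationships and one N-to-M relationship]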

Page 18: Database Architecture and Design Workshop - RTI International

18

KBDM: Key-Based Data Model
Primary Keys (PK)

Respondent: respondent_id (PK)

Interviewer: interviewer_id (PK)

Questionnaire: questionnaire_id (PK)

Question: question_id (PK)

Response: response_id (PK)

A primary key value uniquely identifies a row in a table.

The lines are used to indicate types of relationships and cardinality.

Page 19: Database Architecture and Design Workshop - RTI International

19

KBDM: Key-Based Data Model
Primary Keys and Foreign Keys (FK)

Interviewer: interviewer_id (PK)

Questionnaire: questionnaire_id (PK)

Question: questionnaire_id (PK) (FK), question_id (PK)

Response: respondent_id (PK) (FK), question_id (PK) (FK), questionnaire_id (PK) (FK)

Resp_Intrvr_Assoc: respondent_id (PK) (FK), interviewer_id (PK) (FK)

Respondent: respondent_id (PK), questionnaire_id (FK)

Foreign keys are used to establish relationships between tables.
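The key-based model above can be written down as table definitions. The sketch below is one possible reading, not the workshop's own script: it uses SQLite through Python's sqlite3 module rather than MS Access, the column types are assumptions, and it treats Response's question_id and questionnaire_id as a single composite foreign key to Question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked

conn.executescript("""
CREATE TABLE Questionnaire (
    questionnaire_id INTEGER PRIMARY KEY
);
CREATE TABLE Question (
    questionnaire_id INTEGER NOT NULL REFERENCES Questionnaire,
    question_id      INTEGER NOT NULL,
    PRIMARY KEY (questionnaire_id, question_id)
);
CREATE TABLE Respondent (
    respondent_id    TEXT PRIMARY KEY,
    questionnaire_id INTEGER REFERENCES Questionnaire
);
CREATE TABLE Interviewer (
    interviewer_id TEXT PRIMARY KEY
);
CREATE TABLE Response (
    respondent_id    TEXT    NOT NULL REFERENCES Respondent,
    questionnaire_id INTEGER NOT NULL,
    question_id      INTEGER NOT NULL,
    PRIMARY KEY (respondent_id, questionnaire_id, question_id),
    -- Assumption: the two question keys form one composite foreign key.
    FOREIGN KEY (questionnaire_id, question_id)
        REFERENCES Question (questionnaire_id, question_id)
);
-- Association table that resolves the N-to-M relationship.
CREATE TABLE Resp_Intrvr_Assoc (
    respondent_id  TEXT NOT NULL REFERENCES Respondent,
    interviewer_id TEXT NOT NULL REFERENCES Interviewer,
    PRIMARY KEY (respondent_id, interviewer_id)
);
""")

# List the tables created and the parents that Response points to.
tables = {
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
}
fk_targets = {row[2] for row in conn.execute("PRAGMA foreign_key_list(Response)")}
print(sorted(tables))
print(sorted(fk_targets))
```

Resp_Intrvr_Assoc exists only to connect respondents and interviewers, which is how a relational design usually represents a many-to-many relationship.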

Page 20: Database Architecture and Design Workshop - RTI International

20

FADM: Fully-Attributed Data Model
Add Attributes (and Normalize)

Only a few attributes shown.

Questionnaire: questionnaire_id (PK), type_code, effective_date

Question: question_id (PK), questionnaire_id (FK), question_text

Response: respondent_id (FK), question_id (FK), questionnaire_id (FK), response

Respondent: respondent_id (PK), last_name, questionnaire_id (FK)

Interviewer: interviewer_id (PK), last_name

Resp_Intrvr_Assoc: respondent_id (FK), interviewer_id (FK), notes

Page 21: Database Architecture and Design Workshop - RTI International

21

A word about “normalization”

To normalize a database design usually means putting it in third normal form (3NF).

There are quite a few normal forms: 1NF, 2NF, 3NF, BCNF, 4NF, 5NF and even others.

The goal of normalization is primarily to minimize data redundancy. However, a fully normalized database can be inefficient because its queries must join many tables; therefore, once performance is known, parts of the design are often de-normalized to improve it.

Page 22: Database Architecture and Design Workshop - RTI International

22

Normalization examples

Design 1 (repeating fields):

Respondent: respondent_id (PK), interviewer_1_name, interviewer_2_name

Design 2 (repeating fields moved to their own table):

Respondent: respondent_id (PK)

Respondent_Intvwr: respondent_id (PK) (FK), interviewer_name (PK)

Design 3 (interviewer name stored in one place):

Respondent: respondent_id (PK)

Respondent_Intvwr: respondent_id (PK) (FK), interviewer_id (PK) (FK)

Interviewer: interviewer_id (PK), interviewer_name

What’s wrong with having repeating fields?

What if you need to have more than 2 interviewers?

What if an interviewer’s name changes?
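Both questions can be tried out against the fully normalized version of the design. This is a sketch with made-up data: the respondent and interviewer IDs are borrowed from the example database later in the deck, the assignments and the new surname "Jones-Lee" are hypothetical, and SQLite stands in for the target DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Respondent (respondent_id TEXT PRIMARY KEY);
CREATE TABLE Interviewer (
    interviewer_id   TEXT PRIMARY KEY,
    interviewer_name TEXT NOT NULL
);
CREATE TABLE Respondent_Intvwr (
    respondent_id  TEXT NOT NULL REFERENCES Respondent,
    interviewer_id TEXT NOT NULL REFERENCES Interviewer,
    PRIMARY KEY (respondent_id, interviewer_id)
);
INSERT INTO Respondent VALUES ('SP12'), ('WI65');
INSERT INTO Interviewer VALUES
    ('I1', 'Romano'), ('I2', 'Jones'), ('I3', 'White');
-- SP12 has three interviewers: no interviewer_3_name column is needed.
INSERT INTO Respondent_Intvwr VALUES
    ('SP12', 'I1'), ('SP12', 'I2'), ('SP12', 'I3'), ('WI65', 'I2');
""")

# A name change touches exactly one row, however many respondents are linked.
cur = conn.execute(
    "UPDATE Interviewer SET interviewer_name = 'Jones-Lee' "
    "WHERE interviewer_id = 'I2'"
)
rows_changed = cur.rowcount

# Every respondent linked to I2 sees the new name through the join.
names = conn.execute("""
    SELECT ri.respondent_id, i.interviewer_name
    FROM Respondent_Intvwr ri
    JOIN Interviewer i ON i.interviewer_id = ri.interviewer_id
    WHERE ri.interviewer_id = 'I2'
    ORDER BY ri.respondent_id
""").fetchall()
print(rows_changed, names)
```

With repeating name columns, the same change would mean hunting through every respondent row that mentions the interviewer.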

Page 23: Database Architecture and Design Workshop - RTI International

23

A word about “referential integrity”

Respondentrespondent_id (PK)

last_name

questionnaire_id (FK)

Questionnairequestionnaire_id (PK)

type_code

effective_date

Would it make sense to have someone in the “Respondent” table with a “questionnaire_id” that did not “point to” a questionnaire in the “Questionnaire” table?

NULL = no value (in SQL, NULL is not the same as zero or an empty string)

Parent: Questionnaire
1  Type A  2/13/04
3  Type B  4/10/05

Child: Respondent
SP12  Smith  3
WI13  Jones  2
WI65  Phang  (NULL)
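A DBMS can enforce this rule itself. The sketch below loads the parent and child rows from this slide into SQLite (an assumption; the workshop targets MS Access) with foreign key checking switched on, and records which respondent rows are accepted. Dates are rewritten in ISO form for SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
CREATE TABLE Questionnaire (
    questionnaire_id INTEGER PRIMARY KEY,
    type_code        TEXT NOT NULL,
    effective_date   TEXT NOT NULL
);
CREATE TABLE Respondent (
    respondent_id    TEXT PRIMARY KEY,
    last_name        TEXT NOT NULL,
    questionnaire_id INTEGER REFERENCES Questionnaire (questionnaire_id)
);
INSERT INTO Questionnaire VALUES
    (1, 'A', '2004-02-13'), (3, 'B', '2005-04-10');
""")

outcomes = {}
child_rows = [("SP12", "Smith", 3), ("WI13", "Jones", 2), ("WI65", "Phang", None)]
for row in child_rows:
    try:
        conn.execute("INSERT INTO Respondent VALUES (?, ?, ?)", row)
        outcomes[row[0]] = "accepted"
    except sqlite3.IntegrityError:
        # No parent row exists for this questionnaire_id.
        outcomes[row[0]] = "rejected"

print(outcomes)
```

Jones is rejected because no questionnaire 2 exists in the parent table, while Phang's NULL is allowed: a NULL foreign key means "not assigned yet" and does not claim to point at anything.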

Page 24: Database Architecture and Design Workshop - RTI International

24

The Logical FADM in ERwin

Questionnaire
Key: questionnaire_id
Non-key: type_code, effective_date, description

Respondent
Key: respondent_id
Non-key: last_name (IE1.1), first_name (IE1.2), middle_initial (IE1.3), home_state_code, age, sex_code, marital_status_code, travel_group_size, questionnaire_id (FK)

Question
Key: questionnaire_id (FK), question_id
Non-key: question_label, question_text

Interviewer
Key: interviewer_id
Non-key: last_name, first_name, middle_initial, sex_code, years_experience

Response
Key: respondent_id (FK), questionnaire_id (FK), question_id (FK)
Non-key: response

Resp_Intrvr_Assoc
Key: respondent_id (FK), interviewer_id (FK)
Non-key: interview_notes

Page 25: Database Architecture and Design Workshop - RTI International

25

The Physical FADM in ERwin
Target DBMS: MS Access 2000

Questionnaire
Key: questionnaire_id: Long Integer
Non-key: type_code: Text(1); effective_date: Date/Time; description: Memo

Respondent
Key: respondent_id: Text(8)
Non-key: last_name: Text(20) (IE1.1); first_name: Text(15) (IE1.2); middle_initial: Text(1) (IE1.3); home_state_code: Text(2); age: Integer; sex_code: Text(1); marital_status_code: Text(1); travel_group_size: Integer; questionnaire_id: Text(1) (FK)

Question
Key: questionnaire_id: Text(1) (FK); question_id: Long Integer
Non-key: question_label: Text(6); question_text: Text(40)

Interviewer
Key: interviewer_id: Text(5)
Non-key: last_name: Text(20); first_name: Text(15); middle_initial: Text(1); sex_code: Text(1); years_experience: Integer

Response
Key: respondent_id: Text(8) (FK); questionnaire_id: Text(1) (FK); question_id: Long Integer (FK)
Non-key: response: Text(1)

Resp_Intrvr_Assoc
Key: respondent_id: Text(8) (FK); interviewer_id: Text(5) (FK)
Non-key: interview_notes: Memo

Page 26: Database Architecture and Design Workshop - RTI International

26

Example database
Refer to PDM handout for column names – not all columns shown here.

Questionnaire
1  A  2/13/04
2  B  2/13/03
3  B  4/10/05

Question
1  1  1  Are you a native of this state?
3  1  1  Are you from this state?
3  2  2  First time visitor?
2  6  6  Did you spend over $500?

Respondent
SP12  Smith  Joyce  E  F  NC  42  …  3
WI13  Jones  Ed  M  GA  23  …  1
WI65  Phang  Li  A  F  CA  31  …  1

Interviewer
I1  Romano  John  W  M  5
I2  Jones  Kim  J  F  1
I3  White  Jim  L  M  8

Resp_Intrvr_Assoc
SP12  I1  Suspicious of motives.
WI65  I2  Eager for the interview.
SP12  I3  Receptive after review.

Response
SP12  3  1  T
WI13  1  6  F
SP12  3  2  F
WI65  1  2  T

Page 27: Database Architecture and Design Workshop - RTI International

27

Intro to Structured Query Language - SQL

Create a table

CREATE TABLE Questionnaire (
    questionnaire_id long PRIMARY KEY NOT NULL,
    type_code text(1) NOT NULL,
    effective_date datetime NOT NULL,
    description memo
)

Insert data into a table

INSERT INTO Questionnaire (questionnaire_id, type_code, effective_date, description)
VALUES (1, 'A', #5/17/05#, 'Miami visitor survey')

Page 28: Database Architecture and Design Workshop - RTI International

28

SQL continued

Update a column value

UPDATE Questionnaire SET type_code = 'B' WHERE questionnaire_id = 1

Retrieve data from a database

Page 29: Database Architecture and Design Workshop - RTI International

29

Retrieving information from the database

List the respondents from North Carolina along with their age.

SELECT first_name, middle_initial, last_name, age FROM Respondent WHERE home_state_code = 'NC';

What are the questions for the Type B (4/10/05) questionnaire?

SELECT question_text
FROM Questionnaire, Question
WHERE Questionnaire.questionnaire_id = Question.questionnaire_id
AND type_code = 'B'
AND effective_date = #4/10/05#
ORDER BY question_label

Equating key columns across tables (e.g., “questionnaire_id”) is called a “join”.
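The join can be tried against the Questionnaire and Question rows from the example database. This sketch runs on SQLite rather than MS Access, so dates are stored as ISO text (an assumption) instead of Access #...# literals, and the join is written with the explicit JOIN ... ON form, which is equivalent to listing both tables and equating the keys in WHERE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Questionnaire (
    questionnaire_id INTEGER PRIMARY KEY,
    type_code        TEXT,
    effective_date   TEXT   -- ISO text; Access would use a Date/Time column
);
CREATE TABLE Question (
    questionnaire_id INTEGER REFERENCES Questionnaire,
    question_id      INTEGER,
    question_label   TEXT,
    question_text    TEXT,
    PRIMARY KEY (questionnaire_id, question_id)
);
INSERT INTO Questionnaire VALUES
    (1, 'A', '2004-02-13'), (2, 'B', '2003-02-13'), (3, 'B', '2005-04-10');
INSERT INTO Question VALUES
    (1, 1, '1', 'Are you a native of this state?'),
    (3, 1, '1', 'Are you from this state?'),
    (3, 2, '2', 'First time visitor?'),
    (2, 6, '6', 'Did you spend over $500?');
""")

# Questions for the Type B questionnaire effective 4/10/05.
questions = [
    row[0]
    for row in conn.execute("""
        SELECT q.question_text
        FROM Questionnaire AS qn
        JOIN Question AS q ON qn.questionnaire_id = q.questionnaire_id
        WHERE qn.type_code = 'B' AND qn.effective_date = '2005-04-10'
        ORDER BY q.question_label
    """)
]
print(questions)
```

Only the two questions belonging to questionnaire 3 come back; the other Type B questionnaire (2/13/03) is filtered out by the date condition.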

Page 30: Database Architecture and Design Workshop - RTI International

30

More SQL

What are the questions and responses for Joyce Smith and what is her home state?

(Notice the use of “t1”, “t2” and “t3” – that is just a shorthand way of referring to table names.)

SELECT t1.question_label, t1.question_text, t3.response, t2.home_state_code

FROM Question t1, Respondent t2, Response t3

WHERE t1.questionnaire_id = t3.questionnaire_id

AND t2.respondent_id = t3.respondent_id

AND t2.last_name = 'Smith'

AND t2.first_name = 'Joyce'

ORDER BY t1.question_label;

What’s wrong with this query?
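One way to investigate the question is to run the query against a small sample and count the rows. The sketch below uses SQLite with Joyce Smith's two responses from the example database; the trimmed-down tables are assumptions, and t3.response is added to ORDER BY only to make the output order predictable. It compares the query as written with a variant that also equates question_id, since Question is identified by questionnaire_id and question_id together.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Question (
    questionnaire_id INTEGER, question_id INTEGER,
    question_label TEXT, question_text TEXT
);
CREATE TABLE Respondent (
    respondent_id TEXT, last_name TEXT, first_name TEXT, home_state_code TEXT
);
CREATE TABLE Response (
    respondent_id TEXT, questionnaire_id INTEGER,
    question_id INTEGER, response TEXT
);
INSERT INTO Question VALUES
    (3, 1, '1', 'Are you from this state?'),
    (3, 2, '2', 'First time visitor?');
INSERT INTO Respondent VALUES ('SP12', 'Smith', 'Joyce', 'NC');
INSERT INTO Response VALUES ('SP12', 3, 1, 'T'), ('SP12', 3, 2, 'F');
""")

# The slide's query, with a slot for one optional extra join condition.
QUERY = """
SELECT t1.question_label, t1.question_text, t3.response, t2.home_state_code
FROM Question t1, Respondent t2, Response t3
WHERE t1.questionnaire_id = t3.questionnaire_id
  {extra}
  AND t2.respondent_id = t3.respondent_id
  AND t2.last_name = 'Smith'
  AND t2.first_name = 'Joyce'
ORDER BY t1.question_label, t3.response
"""

as_written = conn.execute(QUERY.format(extra="")).fetchall()
with_full_key = conn.execute(
    QUERY.format(extra="AND t1.question_id = t3.question_id")
).fetchall()

print(len(as_written), len(with_full_key))
```

As written, every response is paired with every question on the questionnaire, so two answers produce four rows; joining on the full key returns one row per answer.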

Page 31: Database Architecture and Design Workshop - RTI International

31

I left out …

Indexing: alternate keys, inversion entries.

Column value constraints; NULL, NOT NULL.

Views.

Triggers and Stored Procedures.

Document the design.

Change management – version control.

Keep the design and the database in sync.

Page 32: Database Architecture and Design Workshop - RTI International

32

Data Warehouse

“What is a data warehouse?”

A database.

Contains detailed and summary data.

Normally, is not an online, transactional database.

Usually contains data integrated from several sources.

Supports business intelligence (BI) applications, online analytical processing (OLAP), and data mining.

Page 33: Database Architecture and Design Workshop - RTI International

33

• FACT – measured value. Examples: “interview time” and “practice time”.

• DIMENSION – descriptive attribute. Examples: “age range” and “gender”.

Dimensional data models look like this.

Dimensional data model:

Star design.

Snowflake design.
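A star design can be sketched with one fact table surrounded by dimension tables. In the example below the fact is "interview time" and the dimensions are "age range" and "gender", as on this slide; the table names, column names and sample values are all hypothetical, and SQLite is used only for convenience.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive attributes.
CREATE TABLE Age_Range_Dim (age_range_id INTEGER PRIMARY KEY, age_range TEXT);
CREATE TABLE Gender_Dim    (gender_id    INTEGER PRIMARY KEY, gender    TEXT);

-- Fact table: measured values plus one foreign key per dimension.
CREATE TABLE Interview_Fact (
    age_range_id      INTEGER REFERENCES Age_Range_Dim,
    gender_id         INTEGER REFERENCES Gender_Dim,
    interview_minutes REAL
);

INSERT INTO Age_Range_Dim VALUES (1, '18-34'), (2, '35-54');
INSERT INTO Gender_Dim    VALUES (1, 'Female'), (2, 'Male');
INSERT INTO Interview_Fact VALUES (1, 1, 20.0), (1, 1, 30.0), (2, 2, 40.0);
""")

# Average interview time, sliced by the two dimensions.
avg_by_group = conn.execute("""
    SELECT a.age_range, g.gender, AVG(f.interview_minutes)
    FROM Interview_Fact f
    JOIN Age_Range_Dim a ON a.age_range_id = f.age_range_id
    JOIN Gender_Dim g    ON g.gender_id = f.gender_id
    GROUP BY a.age_range, g.gender
    ORDER BY a.age_range, g.gender
""").fetchall()
print(avg_by_group)
```

In a snowflake design the dimension tables are themselves normalized into further tables, so the picture grows extra branches around the same central fact table.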

Page 34: Database Architecture and Design Workshop - RTI International

34

The Data Warehousing, Data Mining, and BI – OLAP Process.

Clean-Extract-Transform-Load

Data Warehouse Guy

Data Mining Guy

BI - OLAP Gal

Page 35: Database Architecture and Design Workshop - RTI International

35

Data Warehousing Technologies

DSS: Decision Support

OLAP: On-line analytical processing

Data Mining: Knowledge discovery

“Maybe I’ll discover a real nugget and win the Nobel Prize!”

Slice and dice data “cubes”

Page 36: Database Architecture and Design Workshop - RTI International

36

Example of a data warehouse

Florida State Education Data Warehouse

[Diagram labels: Inquiry; Answer; Data Dictionary; Data Warehouse; Search Engine; Security Provisions and Access Authority]

Clean, Extract, Transform, Load

Data Warehouse (Oracle)

Data sources: Other databases

Statewide Course Number; State Stu Financial Aid; Fed Fam Ed Loan Prog; Fl Stu Asst Grts; WDEF; WDIS; FETPIP; WDIS Support; Post-Secondary Ed Coord; FL Bright Future; Pre K-12 Course Code Dir; Disabilities Opport Scholar; Opportunity Scholarship; Eval & Reporting; Assess & Evaluation; Facilities; Support Pre K-12; Sch Trans Mgmt; GED; Staff Pre K-12; Student Pre K-12; Annual Financial Report; Teacher Cert; Budget; Cost (SCARS) Aggregate; FTE Funding; Talented 20; DCC Student; Annual Personnel Report; DCC Staff; SUS Student; DCC Finance; SUS Staff; SUS Finance

Page 37: Database Architecture and Design Workshop - RTI International

37

Case study: “Miami Area Visitor Survey”

Page 38: Database Architecture and Design Workshop - RTI International

38

Tasks

1. Design a database to support the web-based online survey.

2. Extend the database design to contain the data from the online web survey.

3. Refine the database design to respond to typical queries.

Page 39: Database Architecture and Design Workshop - RTI International

39

Task 1: Database to support online survey

Tables containing data to populate drop-down lists.

What are these?

State codes, cities, zip codes, countries.

List of reasons to visit Miami.

List of leisure activities.

Page 40: Database Architecture and Design Workshop - RTI International

40

Lookup Tables

State_LU
Key: state_code: Text(2)
Non-key: state_name: Text(60)

City_LU
Key: state_code: Text(2); city_name_id: Integer
Non-key: city_name: Text(40)

Zip_LU
Key: state_code: Text(2); city_name_id: Integer; zip_code: Text(5)

Country_LU
Key: country_code: Text(3)
Non-key: country_name: Text(50)

Visit_Reason_LU
Key: visit_reason_id: Integer
Non-key: visit_reason_desc: Text(40)

Leisure_Activity_LU
Key: leisure_activity_id: Integer
Non-key: leisure_activity_desc: Text(40)

Page 41: Database Architecture and Design Workshop - RTI International

41

Task 2: Database to contain the data

Tables with columns for keys and for fields to contain what the respondents enter.

Include a place for “Other” inputs. Make the design flexible to accommodate changes. Normalize the design.

Q1: State, City, Zip, Country. Ex: “NC”, “Charlotte”, “28212”, “USA”.

Q2: Reasons for visiting Miami: First, Second, Third reasons. Pick from lists, plus “other” text.

Q3: Leisure activities: Multiple – pick from list, plus “other” text.

Q4: Time spent on trip. Ex: “2”, “days”; “5”, “hours”.

Q5: Number of nights away from home on trip: Ex: “0”, “1”, “6”.

Q6: Number of total visits to Miami in 2 years: Ex: “1”, “4”.

Q7: Plan to return to Miami? Ex: “Yes”, “No”. Reason.

Q8: Respondent Gender: Ex: “Male”, “Female”.

Survey date.

Page 42: Database Architecture and Design Workshop - RTI International

42

Survey Data Tables

Survey_Response
Key: survey_response_id: AutoNumber
Non-key: survey_date: Date/Time; state_code: Text(2) (FK); city_name: Text(40) (FK); zip_code: Text(5) (FK); country_code: Text(3) (FK); gender_code: Text(1); time_spent_miami_value: Integer; time_spent_miami_units_code: Text(1); nbr_nights_miami: Integer; ttl_night_away_home: Integer; ttl_visits_miami_2_yrs: Integer; plan_return_miami_ind: Text(1)

Visit_Reason_Other
Key: survey_response_id: Long Integer (FK)
Non-key: visit_reason_other_desc: Text(100)

Visit_Reason_LU
Key: visit_reason_id: Integer
Non-key: visit_reason_desc: Text(40)

Visit_Reason
Key: survey_response_id: Long Integer (FK); visit_reason_rank: Byte
Non-key: visit_reason_id: Integer (FK)

Leisure_Activity
Key: survey_response_id: Long Integer (FK); leisure_activity_id: Integer (FK)

Leisure_Activity_Other
Key: survey_response_id: Long Integer (FK)
Non-key: leisure_activity_other_desc: Text(100)

Leisure_Activity_LU
Key: leisure_activity_id: Integer
Non-key: leisure_activity_desc: Text(40)

Country_LU
Key: country_code: Text(3)
Non-key: country_name: Text(50)

Zip_LU
Key: state_code: Text(2) (FK); city_name: Text(40) (FK); zip_code: Text(5)

What’s missing?

A place to store the “reason for returning or not returning to Miami”.
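To see how the Visit_Reason tables hold Q2's ranked answers plus the "other" text, here is a small sketch. It keeps only the columns needed for the illustration, runs on SQLite instead of MS Access, and the lookup values ("Beach", "Business", "Family") and the "Boat show" text are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Visit_Reason_LU (
    visit_reason_id   INTEGER PRIMARY KEY,
    visit_reason_desc TEXT NOT NULL
);
CREATE TABLE Survey_Response (
    survey_response_id INTEGER PRIMARY KEY AUTOINCREMENT,  -- Access: AutoNumber
    survey_date        TEXT NOT NULL
);
CREATE TABLE Visit_Reason (
    survey_response_id INTEGER NOT NULL REFERENCES Survey_Response,
    visit_reason_rank  INTEGER NOT NULL,   -- 1 = first reason, 2 = second, ...
    visit_reason_id    INTEGER NOT NULL REFERENCES Visit_Reason_LU,
    PRIMARY KEY (survey_response_id, visit_reason_rank)
);
CREATE TABLE Visit_Reason_Other (
    survey_response_id      INTEGER PRIMARY KEY REFERENCES Survey_Response,
    visit_reason_other_desc TEXT NOT NULL
);
INSERT INTO Visit_Reason_LU VALUES (1, 'Beach'), (2, 'Business'), (3, 'Family');
""")

# One completed survey: two ranked reasons from the list, plus "other" text.
cur = conn.execute(
    "INSERT INTO Survey_Response (survey_date) VALUES ('2005-05-17')"
)
response_id = cur.lastrowid
conn.executemany(
    "INSERT INTO Visit_Reason VALUES (?, ?, ?)",
    [(response_id, 1, 3), (response_id, 2, 1)],
)
conn.execute(
    "INSERT INTO Visit_Reason_Other VALUES (?, ?)", (response_id, "Boat show")
)

ranked = conn.execute("""
    SELECT vr.visit_reason_rank, lu.visit_reason_desc
    FROM Visit_Reason vr
    JOIN Visit_Reason_LU lu ON lu.visit_reason_id = vr.visit_reason_id
    WHERE vr.survey_response_id = ?
    ORDER BY vr.visit_reason_rank
""", (response_id,)).fetchall()
print(ranked)
```

Because each reason is its own row keyed by rank, allowing a fourth or fifth reason later needs no change to the table structure.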

Page 43: Database Architecture and Design Workshop - RTI International

43

Task 3: Refine the database design

Understand the types of queries needed.

Think about a data warehousing approach.

Create some tables to hold aggregate and summary data.
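A summary table of the kind suggested above can be built directly from the detail rows. In this sketch the Survey_Response table is cut down to two of its columns, the sample rows are invented, and the table name State_Visit_Summary is hypothetical; SQLite's CREATE TABLE ... AS SELECT is used for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Survey_Response (
    survey_response_id     INTEGER PRIMARY KEY,
    state_code             TEXT,
    ttl_visits_miami_2_yrs INTEGER
);
INSERT INTO Survey_Response VALUES
    (1, 'NC', 1), (2, 'NC', 4), (3, 'GA', 2), (4, 'NC', 1);

-- Summary table: one pre-computed row per state.
CREATE TABLE State_Visit_Summary AS
SELECT state_code,
       COUNT(*)                    AS response_count,
       SUM(ttl_visits_miami_2_yrs) AS total_visits
FROM Survey_Response
GROUP BY state_code;
""")

summary = conn.execute(
    "SELECT state_code, response_count, total_visits "
    "FROM State_Visit_Summary ORDER BY state_code"
).fetchall()
print(summary)
```

Typical queries can then read the small summary table instead of re-aggregating every response, at the cost of refreshing it whenever new surveys arrive.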

Page 44: Database Architecture and Design Workshop - RTI International

44

Will database technology come to this?

Credit: http://www.cartoonstock.com/directory/d/databases.asp

Page 45: Database Architecture and Design Workshop - RTI International

45

Thanks

I hope you enjoyed the workshop.

For any follow-up questions, e-mail me at [email protected].

Presentation available at: www.rti.org/ifdtc.

I forgot a minor detail, the “Final Exam”!