View
221
Download
3
Category
Preview:
Citation preview
Database Design Techniques for Clinical Research
Melissa K. Carroll, M.S.
October 20, 2003
Overview
• Relational database design
• Implementing a relational database in Microsoft Access
• Designing a database for a typical study by our group
• Resources
• Questions
Relational Database Design
What is a Database?
• Collection of data organized for efficient operations
• Everyone uses them multiple times a day, often without realizing it
• Examples– Airline reservations
– Online shopping
• Underlying design principles are largely universal
Problems with “Flat” Files
• Data Redundancy– Leads to more work and inconsistencies
– Wreaks havoc on performing basic manipulations such as searching and sorting
• File Management– Multiple files
– Concurrent users
– Security
– Intermediate results
• Ad hoc programming (reinventing the wheel)
Data Modeling: Entity-Relationship Model
• Models data as entities, with attributes, and relationships between entities
• Entity: person, place, or thing
• Instance: example of an entity
• Attribute: feature of an entity
• Relationship: describes association between (usually two) entities
E-R Notation
RELATIONSHIP
ENTITY
UNIQUEATTRIBUTE
ATTRIBUTE ATTRIBUTE
ENTITY
JOINT UNIQUEATTRIBUTE
ATTRIBUTE
JOINT UNIQUEATTRIBUTE
ATTRIBUTE
Many-to-Many Many-to-One
Designing an E-R Diagram
• Issues to Consider– What questions will the data be used to answer?– What are the entities and how do they relate to
each other?– What attributes uniquely identify entities?– What attributes need to be sub-divided?
• Goal: Eliminate Redundancy
• Process is called “normalizing” data
Sample E-R Diagram: MP3 Files
Artist
Artist ID Artist Name
Album
Album ID Album Title
Release Year Label
Song
Song ID Song Title
Recording
Recording ID Quality
Time
Composer
Composer ID Composer Name
Recorded Appears On
Of A
Composed
Relational Model
• Lower-level model used for actual database implementation
• Translating from E-R model– Entities become tables– Attributes become fields– Many-to-many relationships become tables
• Unique identifiers from involved tables as fields
– Unique identifiers from “one” sides are added as fields to corresponding “many” sides
Relational Database Management Systems (RDBMSs)
• Database Management System (DBMS): software with purpose of helping user design and use a database
• Relational Database Management System (RDBMS): DBMS for databases based on relational model
• Most major commercial products (e.g. MS Access, Oracle, MySQL, SQL Server)
SQL
• Need language to tell the DBMS– The design of the database– Actual data to be entered– What data to retrieve and in what format
• SQL = standardized language used by almost all major DBMSs
• Standard language provides interoperability and portability
SQL Examples
• CREATE TABLE artist (artistID INT AUTO_INCREMENT, artistName VARCHAR(75))
• INSERT INTO artist (artistName) VALUES (“The Beatles”)
• UPDATE album SET label = “EMI” WHERE albumTitle = “Abbey Road”
SQL Examples Continued
• SELECT songTitle, quality FROM song, recording WHERE song.songID = recording.songID
• SELECT songTitle, quality FROM song INNER JOIN recording ON song.songID = recording.songID
• SELECT albumTitle, albumAge AS releaseYear - Date() FROM album
SQL Examples Continued
• SELECT Count(artistID) from artist
• SELECT MAX(recording.quality) FROM artist, recorded, recording WHERE artist.artistName = recorded.artistName and recorded.recordingID = recording.recordingID and artist.artistName = “The Beatles”
Relational Database Implementation in Microsoft
Access
Clinical Research Database Design
Typical Simple Study
• Baseline and fixed number of follow-ups
• Subject reaches each time point only once
• Different time points have different scale protocols
• Considerable overlap in scales between time points
• Isolated from other studies
Four Database Design Approaches
• Approach One: entire assessment administration as entity, e.g. all of baseline or all of 12 week– One table per time point, items as attributes
• Approach Two: scale administrations within each assessment as entity, e.g. 12 Week Hamilton– One table per scale per time point, items as attributes
• Approach Three: scale administration as entity– One table per scale, items as attributes
• Approach Four: item as entity– One table (theoretically)
Evaluation of Approach One
• May seem appropriate because common format for analysis is one record per subject
• Problems– Limited number of fields allowed in some
DBMSs– Will have many missing values– General redundancy issues (shares with
Approach Two, to follow)
Pros and Cons of Approach Two Versus Approach Three
• Pros– “Horizontal” format
– Flexibility for handling inter-time point scale disparities
• Cons (for simple studies)– Data model complexity
– Table creation and modification time multiplied
– Space consumption
– More data locations (entry and retrieval complexity)
– Re-assigning to different time points
Reassigning Scale Time Points Using Approaches Two and Three
Approach TwoTable: Hamilton Baseline Table: Hamilton Baseline Table: Hamilton 12 WeeksSubject ID aHam1 aHam2 aHam3 Subject ID aHam1 aHam2 aHam3 Subject ID bHam1 bHam2 bHam337 1 3 2 37 1 3 262 3 3 1 62 3 3 1 62 2 2 151 2 1 1 51 2 1 1 51 1 1 1
Approach ThreeTable: Hamilton Table: HamiltonSubject ID Ham1 Ham2 Ham3 Timepoint Subject ID Ham1 Ham2 Ham3 Timepoint37 1 3 2 1 37 1 3 2 2 change:62 3 3 1 1 62 3 3 1 151 2 1 1 1 51 2 1 1 162 2 2 1 2 62 2 2 1 251 1 1 1 2 51 1 1 1 2
Reassigning Scales: Modified Approach Three
Table: Visit Table: VisitVisit ID Subject ID Timepoint Visit ID Subject ID Timepoint1 37 1 1 37 22 62 1 2 62 13 51 1 3 51 14 62 2 4 62 25 51 2 5 51 2
Table: Hamilton Table: HamiltonVisit ID Ham1 Ham2 Ham3 Visit ID Ham1 Ham2 Ham31 1 3 2 1 1 3 22 3 3 1 2 3 3 13 2 1 1 3 2 1 14 2 2 1 4 2 2 15 1 1 1 5 1 1 1
Table: MMSE Table: MMSEVisit ID MMSE1 MMSE2 MMSE3 Visit ID MMSE1 MMSE2 MMSE31 1 0 1 1 1 0 12 1 1 1 2 1 1 13 0 1 1 3 0 1 14 1 1 1 4 1 1 15 1 1 0 5 1 1 0
Approach Two Cons for More Complex Studies
• Poor at handling an indefinite number of follow-up time points
• Modified Approach Three is better at handling studies in which subjects are assessed at the same time point multiple times– May happen due to progressing through the study
multiple times– May also happen due to e.g. being screened
multiple times
Evaluation of Approach Four
• Pros– Could potentially handle changes more
elegantly– Perhaps more “normalized” theoretically
• Cons– Considerably harder to design entry interface– Harder to obtain data in formats usually
required
• Doesn’t fix non-database problems with data collection changes
Databases and Datasets
• Database: Collection of data organized for efficient entry, updating, storage, and retrieval
• Dataset: Subset of data retrieved from database in a format optimized for a specific reporting or analysis purpose
• Well-designed databases should facilitate creation of datasets in any desired format
• Datasets should be formatted for a particular purpose and used only for that purpose
Normalizing Data Within Scales: Medication Data
Option 1MedScaleMedScale IDBrand Name1Generic Name1Drug Class1Dosage1Unit1Frequency1…Brand NameNGeneric NameNDrug ClassNDosageNUnitNFrequencyNEtc.
Option 2Medication MedScaleMed ID MedScale IDBrand Name Med ID1Generic Name Dosage1Class Unit1
Frequency1…Med IDNDosageNUnitNFrequencyNEtc.
Option 3Medication MedScale MedScaleMedicationMed ID MedScale ID MedScaleMedication IDBrand Name Etc. MedScale IDGeneric Name Med IDClass Dosage
UnitFrequency
Normalizing Data Continued: Comparison of Medication Queries
Multi-Study Issues: To Separate or Not To Separate
• If same data will count for multiple studies– Keeping design and data in sync– E.g. updating all copies when data changed– E.g. ensuring scale changes are reflected in all
tables and forms
• If handling multiple, possibly “isolated” studies– Keeping design in sync– Can still use views so actual storage is
transparent to user
Summary
• Careful planning must go into designing a database
• First step in design is to model the data– E-R relational model is effective
• DBMSs, such as Access, offer tools for creating, using, and maintaining databases
• When designing clinical research databases, as with any databases, priority should be normalization, hence elimination of redundancy
• Properly designed databases will supply data in any format desired
Resources
• Access Help (Help in top menu, Contents and Index, Contents tab)
• Access Database Wizard (in main menu upon opening)
• Oreilly Access Database Design & Programming, 3rd Edition– For database design theory: online chapter at
http://www.oreilly.com/catalog/accessdata3/chapter/ch04.html
• Access (97/2000/etc.) Bible – Available here; not 100% accurate
• Database System Concepts Fourth Edition (Silberschatz, Korth, Sudarshan)
Recommended