October 11-14, Seattle, WA
Louis Davidson, Data Architect
Characteristics of a Great Relational Database
AD-318 | Characteristics of a Great Relational Database 2
Who am I?
Been in IT for over 17 years
Microsoft MVP for 8 years
Corporate Data Architect
Written four books on database design
• OK, so they were all versions of the same book. They at least had slightly different titles each time
• Writing the fifth version now
• They cover some of the same material… in a bit more depth…

It has often been said, if you live…
http://www.flickr.com/photos/bluespf42/163987671/sizes/l/in/photostream/

You shouldn’t throw…
http://www.flickr.com/photos/chrisjones/7226119/

Top Secret Developer Presentation
I found this presentation in the secret stash of a manager I once worked with. I didn’t realize then just how deep the conspiracy went.
I share it here with you for the very first time ever*
* Does not include the other times this presentation has been given. Offer void in AL, TN, GA, AZ, KY, WA, or anywhere else on the planet. Your mileage may vary.

Po Ardeezine, CIO, Bah Dezine Consulting
Characteristics of a Good Enough Relational Database
HE-MAN DBA HATER’S CLUB

The Characteristic
IT JUST WORKS (period)
We don’t get paid for internal style!
http://www.flickr.com/photos/rnphotos/4689893987/sizes/m/in/photostream/

Externals are all that matter
Consider the human body
The external interface is judged on its ability to interact with others, not on how the pancreas works, or the liver, or kidneys, or the rest of the icky insides
The internals, well, no one quite understands them
A good enough program is like this. As long as the interface passes muster, who cares?
http://en.wikipedia.org/wiki/File:GiseleBundchen.jpg

Maintenance costs are someone else’s concern!
http://www.flickr.com/photos/dancox_/2632603962/

Summary
If the requirements don’t specifically mention it, then who cares?
It is better to appear good than to be good
Marginal acceptance criteria are usually that it works NOW
Testing should be done to make sure values are correct enough

Questions? Contact info…
Bite me. I don’t even care that much about my own database, so why would I answer your questions?
Note: If you agreed with this presentation in total, please give me your name so I can put you on my no-hire list

Characteristics of a Great Relational Database
Louis Davidson, Data Architect

Say you want a T-Bone Steak…

But the costs for the two steaks are very different.
Can I produce such greatness on a budget?

Choose your target
It is almost impossible to end up with perfection
The characteristics we will cover are habits to practice
The realities of the day will dictate how well you can reasonably do
Advice: Imitate Greatness
• You won’t become a better grill master trying to achieve IHOP steaks.

Good enough is the enemy of better.

Design Golden Rule
Do unto users what you would have them do unto you. www.twitter.com/sqlconfucius
Solve customer problems first and foremost, not your programming problems
Report writers and support staff are your customers too
Think about the stuff you complain about in your life and shoot for great, not just good enough

Characteristic 1 - Well Performing
Well performing requires it to perform well everywhere necessary
For example, which car would win in a race?
http://www.flickr.com/photos/baggis/271789442
http://www.flickr.com/photos/mtsn/243344705

Washing machine moving race?
http://www.flickr.com/photos/pete_gray/2206005523/

Just the First Step
Well performing requires it to work everywhere, in every manner necessary
http://www.codinghorror.com/blog/2007/03/the-works-on-my-machine-certification-program.html

Well Performing
Indexing
• Too Little < Just Right < Too Much
• Check sys.dm_db_index_usage_stats to see if indexes are useful
• Run LOTS of performance test scenarios
Set-based queries
• NOT(Cursors) = Good
• Sometimes unavoidable; use the proper cursor type
Avoid overmodularization
• User Defined Functions can kill performance
• View layering
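To make the set-based point concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for SQL Server so that it runs anywhere, and the Product table and its data are hypothetical, not from the deck. It contrasts cursor-style processing (one statement per row) with a single set-based statement that does the same work.

```python
import sqlite3

# Hypothetical table and data, for illustration only (not from the deck).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Product (ProductId INTEGER PRIMARY KEY, Price NUMERIC NOT NULL)"
)
conn.executemany(
    "INSERT INTO Product (ProductId, Price) VALUES (?, ?)",
    [(i, 100) for i in range(1, 1001)],
)


def raise_prices_row_by_row(conn):
    """Cursor-style: fetch every row, then issue one UPDATE per row."""
    rows = conn.execute("SELECT ProductId, Price FROM Product").fetchall()
    for product_id, price in rows:
        conn.execute(
            "UPDATE Product SET Price = ? WHERE ProductId = ?",
            (price + 10, product_id),
        )
    return len(rows)  # number of UPDATE statements issued


def raise_prices_set_based(conn):
    """Set-based: one statement describes the change for all rows."""
    cur = conn.execute("UPDATE Product SET Price = Price + 10")
    return cur.rowcount  # rows changed by the single statement


statements_issued = raise_prices_row_by_row(conn)
rows_changed = raise_prices_set_based(conn)
total = conn.execute("SELECT SUM(Price) FROM Product").fetchone()[0]
```

Both functions change every row, but the first needs one round trip per row while the second hands the whole job to the engine in one statement.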

Well Performing, Even More
Watch queries for proper seeks/scans
Use sys.dm_io_virtual_file_stats to understand your file performance
Unique Rows, Scalar Column Values
• (First Normal Form)
• Reduce the number of queries (to 0) that use partial column values
Proper handling of concurrency/locks/latches
• Without sacrificing “IT WORKS” (NOLOCK, Blech)

My boss read me this tweet and suggested we use NOSQL because SQL Server doesn’t scale and makes life harder:
@lancehilliard: "Blog engine using RDBMS makes 19 queries to render a homepage. Substituting NoSQL makes fewer queries w/ less computation." #devlink
What do you think?

You will make it run faster, or else

Characteristic 2 - Normal
http://www.flickr.com/photos/brotherxii/3159459278/

Normalization
A process to shape and constrain your design to work with a relational engine
Specified as a series of forms that signify compliance
A definitely non-linear process
• Used as a set of standards to think of and compare to along the way
• After practice, normalization is mostly done instinctively
Written down common sense!

Normalized - Briefly
Columns – One column, one value
Table/row uniqueness – Tables have independent meaning, rows are distinct from one another.
Proper relationships between columns – Columns either are a key or describe something about the row identified by the key.
Scrutinize dependencies
• Make sure relationships between three values or tables are correct.
• Reduce all relationships to be between two tables if possible

Normal – How Normal?
Myth
• 3rd Normal Form is enough, and more than that makes your database application run slower
Reality
• Properly normalized databases are usually faster to work with overall
• Normalization is more about requirements than anything else
• Most 3rd Normal Form databases are likely in 5th already!
Goal
• Users have exactly the number of places to put data into the system that they need.

Normalization [1NF] Example 1
Requirement: Allow the user to store their complete name and possible aliases
Normalization is mostly just common sense…
First Name    Last Name    Aliases

Normalization [1NF] Example 2
• Requirement: Table of school mascots
• To truly be in the spirit of 1NF, some manner of uniqueness constraint needs to be on a column that has meaning
• It is a good idea to unit test your structures by putting in data that looks really wrong and see if it stops you, warns you, or something!

MascotId     Name         Color        School
===========  -----------  -----------  -----------------
1            Smokey       Brown        UT                  <-- Go Vols!
112          Smokey       Black/White  Central High
4567         Smokey       Smoky        Less Central High
979796       Smokey       Brown        Southwest Middle
             ~~~~~~~~~~~               ~~~~~~~~~~~~~~~~~
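The "unit test your structures" advice can be sketched as follows. This is an illustration only: it uses Python's sqlite3 module in place of SQL Server, and it assumes the meaningful key is (Name, School), which is one reading of the slide rather than something it states outright. The helper insert_is_rejected is a hypothetical name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Mascot (
        MascotId INTEGER PRIMARY KEY,   -- surrogate key
        Name     TEXT NOT NULL,
        Color    TEXT NOT NULL,
        School   TEXT NOT NULL,
        UNIQUE (Name, School)           -- assumed natural key, see lead-in
    )
    """
)
conn.execute("INSERT INTO Mascot VALUES (1, 'Smokey', 'Brown', 'UT')")
conn.execute(
    "INSERT INTO Mascot VALUES (112, 'Smokey', 'Black/White', 'Central High')"
)


def insert_is_rejected(conn, row):
    """Return True if the database refuses the row (hypothetical helper)."""
    try:
        conn.execute("INSERT INTO Mascot VALUES (?, ?, ?, ?)", row)
        return False
    except sqlite3.IntegrityError:
        return True


# Data that "looks really wrong": same mascot and school, new surrogate key.
duplicate_rejected = insert_is_rejected(conn, (999, "Smokey", "Orange", "UT"))
# Same name at a different school is legitimate and should be accepted.
other_school_rejected = insert_is_rejected(
    conn, (4567, "Smokey", "Smoky", "Less Central High")
)
```

Without the UNIQUE constraint, the surrogate MascotId alone would happily accept the duplicate UT row.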

Normalization [1NF] Example 3
Requirement: Store information about books

BookISBN     BookTitle      BookPublisher    Author
===========  -------------  ---------------  -----------
111111111    Normalization  Apress           Louis
222222222    T-SQL          Apress           Michael
333333333    Indexing       Microsoft        Kim
444444444    DMV Book       Simple Talk      Tim
444444444-1  DMV Book       Simple Talk      Louis

What is wrong with this table?
• Lots of books have > 1 Author.
What are common ways users would “solve” the problem?
• Any way they think of!
• Author value overlays shown on the slide: “, Louis”   “& Louis”   “and Louis”
What’s a common programmer way to fix this?

Normalization [1NF] Example 3
Add a repeating group?

BookISBN     BookTitle      BookPublisher    …  Author1      Author2      Author3
===========  -------------  ---------------     -----------  -----------  -----------
111111111    Normalization  Apress           …  Louis
222222222    T-SQL          Apress           …  Michael
333333333    Indexing       Microsoft        …  Kim
444444444    Design         Apress           …  Kevin        Louis

What is the right way to model this?

Normalization [1NF] Example 3
Two tables!

BookISBN     BookTitle      BookPublisher
===========  -------------  ---------------
111111111    Normalization  Apress
222222222    T-SQL          Apress
333333333    Indexing       Microsoft
444444444    DMV Book       Simple Talk

BookISBN     Author         ContributionType
===========  =============  ----------------
111111111    Louis          Principal Author
222222222    Michael        Principal Author
333333333    Kim            Principal Author
444444444    Tim            Co-Author
444444444    Louis          Co-Author

And it gives you easy expansion
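A rough sketch of the two-table design, again using Python's sqlite3 as a stand-in for SQL Server. The table name BookAuthor and the two helper functions are assumptions made for the example; the columns and sample rows come from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript(
    """
    CREATE TABLE Book (
        BookISBN      TEXT PRIMARY KEY,
        BookTitle     TEXT NOT NULL,
        BookPublisher TEXT NOT NULL
    );
    -- BookAuthor is an assumed name; one row per (book, author) pair.
    CREATE TABLE BookAuthor (
        BookISBN         TEXT NOT NULL REFERENCES Book (BookISBN),
        Author           TEXT NOT NULL,
        ContributionType TEXT NOT NULL,
        PRIMARY KEY (BookISBN, Author)
    );
    INSERT INTO Book VALUES
        ('111111111', 'Normalization', 'Apress'),
        ('444444444', 'DMV Book', 'Simple Talk');
    INSERT INTO BookAuthor VALUES
        ('111111111', 'Louis', 'Principal Author'),
        ('444444444', 'Tim',   'Co-Author'),
        ('444444444', 'Louis', 'Co-Author');
    """
)


def books_by_author(conn, author):
    """Titles for one author: an equality search on a scalar column."""
    rows = conn.execute(
        """
        SELECT Book.BookTitle
        FROM   BookAuthor
               JOIN Book ON Book.BookISBN = BookAuthor.BookISBN
        WHERE  BookAuthor.Author = ?
        ORDER  BY Book.BookTitle
        """,
        (author,),
    ).fetchall()
    return [title for (title,) in rows]


def authors_of(conn, isbn):
    """All authors of one book, however many there are."""
    rows = conn.execute(
        "SELECT Author FROM BookAuthor WHERE BookISBN = ? ORDER BY Author",
        (isbn,),
    ).fetchall()
    return [author for (author,) in rows]


louis_titles = books_by_author(conn, "Louis")
dmv_authors = authors_of(conn, "444444444")
```

A third or fourth author is just another row, and neither query has to parse a delimited Author string or check Author1 through Author3.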

Normalization [1NF] Example 4
Requirement: Store users and their names

UserId       UserName        PersonName
===========  ~~~~~~~~~~~~~~  ---------------
1            Drsql           Louis Davidson
2            Kekline         Kevin Kline
3            Datachix2       Audrey Hammonds
4            PaulNielsen     Paul Nielsen

How would you search for someone with a last name of Nielsen? David?
What if the name were more realistic with Suffix, Prefix, Middle names?

Normalization [1NF] Example 4
Break the person’s name into individual parts

UserId       UserName        PersonFirstName  PersonLastName
===========  ~~~~~~~~~~~~~~  ---------------  --------------
1            Drsql           Louis            Davidson
2            Kekline         Kevin            Kline
3            Datachix2       Audrey           Hammonds
4            PaulNielsen     Paul             Nielsen

This optimizes the most common search operations
It isn’t a “sin” to do partial searches on occasion:
• Like if you know the last name ended in “son”
If you also need the full name, let the engine manage this using a calculated column:
• PersonFullName AS COALESCE(PersonFirstName + ' ', '') + COALESCE(PersonLastName, '')
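Here is a small sketch of searching on the separated name parts while still letting the engine build the full name. It runs on Python's sqlite3, so it uses a view and the || operator where SQL Server would use a computed column and +. The names SiteUser and SiteUserWithFullName, and the fifth row with a NULL first name, are hypothetical additions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- SiteUser is an assumed table name; the slide does not name the table.
    CREATE TABLE SiteUser (
        UserId          INTEGER PRIMARY KEY,
        UserName        TEXT NOT NULL UNIQUE,
        PersonFirstName TEXT,
        PersonLastName  TEXT
    );
    -- A view stands in for the computed column; NULL parts become ''.
    CREATE VIEW SiteUserWithFullName AS
    SELECT UserId,
           UserName,
           COALESCE(PersonFirstName || ' ', '')
               || COALESCE(PersonLastName, '') AS PersonFullName
    FROM   SiteUser;
    INSERT INTO SiteUser VALUES
        (1, 'Drsql',       'Louis',  'Davidson'),
        (2, 'Kekline',     'Kevin',  'Kline'),
        (3, 'Datachix2',   'Audrey', 'Hammonds'),
        (4, 'PaulNielsen', 'Paul',   'Nielsen'),
        (5, 'Mononym',     NULL,     'Cher');  -- hypothetical row for NULL handling
    """
)


def user_names_where(conn, condition, value):
    """Run one of the fixed searches below and return matching UserNames."""
    sql = f"SELECT UserName FROM SiteUser WHERE {condition} ORDER BY UserName"
    return [name for (name,) in conn.execute(sql, (value,))]


def full_name(conn, user_id):
    row = conn.execute(
        "SELECT PersonFullName FROM SiteUserWithFullName WHERE UserId = ?",
        (user_id,),
    ).fetchone()
    return row[0]


# The common case: exact match on a whole column value.
exact_match = user_names_where(conn, "PersonLastName = ?", "Nielsen")
# The occasional partial search: last names ending in "son".
ends_in_son = user_names_where(conn, "PersonLastName LIKE ?", "%son")
```

The condition strings passed to user_names_where are fixed literals in this sketch; only the search value is supplied as a parameter.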

Normalization [BCNF] Example 5
Requirement: Driver registration for rental car company

Driver    Vehicle Owned     Height   EyeColor   WheelCount
========  ----------------  -------  ---------  ----------
Louis     Hatchback         6’0”     Blue       4
Ted       Coupe             5’8”     Brown      4
Rob       Tractor trailer   6’8”     NULL       18

Column Dependencies
• Height and EyeColor, check
• Vehicle Owned, check
• WheelCount, <buzz>, drivers do not have wheel counts

Normalization [BCNF] Example 5
Two tables, one for driver, one for type of vehicles and their characteristics

Driver    Vehicle Owned (FK)   Height   EyeColor
========  -------------------  -------  ---------
Louis     Hatchback            6’0”     Blue
Ted       Coupe                5’8”     Brown
Rob       Tractor trailer      6’8”     NULL

Vehicle Owned     WheelCount
================  -----------
Hatchback         4
Coupe             4
Tractor trailer   18
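As a sketch of the split, the following uses Python's sqlite3 in place of SQL Server. The table names VehicleType and Driver, the column spelling VehicleOwned, and the helper wheel_count_for are assumptions for the example. WheelCount lives with the vehicle type, and the foreign key stops a driver from pointing at a vehicle type that does not exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript(
    """
    CREATE TABLE VehicleType (
        VehicleOwned TEXT PRIMARY KEY,
        WheelCount   INTEGER NOT NULL   -- a fact about the vehicle type
    );
    CREATE TABLE Driver (
        Driver       TEXT PRIMARY KEY,
        VehicleOwned TEXT NOT NULL REFERENCES VehicleType (VehicleOwned),
        Height       TEXT,
        EyeColor     TEXT
    );
    """
)
conn.executemany(
    "INSERT INTO VehicleType VALUES (?, ?)",
    [("Hatchback", 4), ("Coupe", 4), ("Tractor trailer", 18)],
)
conn.executemany(
    "INSERT INTO Driver VALUES (?, ?, ?, ?)",
    [
        ("Louis", "Hatchback", "6'0\"", "Blue"),
        ("Ted", "Coupe", "5'8\"", "Brown"),
        ("Rob", "Tractor trailer", "6'8\"", None),
    ],
)


def wheel_count_for(conn, driver):
    """Look the wheel count up through the vehicle type, not the driver."""
    row = conn.execute(
        """
        SELECT VehicleType.WheelCount
        FROM   Driver
               JOIN VehicleType
                 ON VehicleType.VehicleOwned = Driver.VehicleOwned
        WHERE  Driver.Driver = ?
        """,
        (driver,),
    ).fetchone()
    return row[0] if row else None


rob_wheels = wheel_count_for(conn, "Rob")

# A driver with an unknown vehicle type should be refused by the FK.
try:
    conn.execute("INSERT INTO Driver VALUES ('Zed', 'Unicycle', NULL, NULL)")
    unknown_vehicle_rejected = False
except sqlite3.IntegrityError:
    unknown_vehicle_rejected = True
```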

Normalization [4NF] Example 6
Requirement: define the classes offered with teacher and book

Trainer     Class           Book
==========  ==============  ================================
Louis       Normalization   DB Design & Implementation
Chuck       Normalization   DB Design & Implementation
Fred        Implementation  DB Design & Implementation
Fred        Golf            Topics for the Non-Technical

Dependencies
• Class determines Trainer (based on qualification)
• Class determines Book (based on applicability)
• Trainer does not determine Book (or vice versa)
If trainer and book are related (like if teachers had their own specific text), then this table is in 4NF

Normalization [4NF] Example 6
Question: What classes do we have available and what books do they use?

Trainer     Class           Book
==========  ==============  ================================
Louis       Normalization   DB Design & Implementation
Chuck       Normalization   DB Design & Implementation
Fred        Implementation  DB Design & Implementation
Fred        Golf            Topics for the Non-Technical

SELECT DISTINCT Class, Book
FROM   TrainerClassBook

Class            Book
===============  ==========================
Normalization    DB Design & Implementation
Implementation   DB Design & Implementation
Golf             Topics for the Non-Technical

Doing a very slow operation, sorting your data, please wait

Normalization [4NF] Example 6
Break Trainer and Book into independent relationship tables to Class

Class            Trainer
===============  =================
Normalization    Louis
Normalization    Chuck
Implementation   Fred
Golf             Fred

Class            Book
===============  ==========================
Normalization    DB Design & Implementation
Implementation   DB Design & Implementation
Golf             Topics for the Non-Technical
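The decomposition can be checked with a short sketch. It uses Python's sqlite3 as a stand-in for SQL Server, and the table names ClassTrainer and ClassBook are assumed. It loads the two independent tables from the original rows, answers the class/book question without DISTINCT, and confirms that joining the two tables gives back exactly the original four rows.

```python
import sqlite3

# The rows from the single Trainer/Class/Book table on the slide.
ORIGINAL_ROWS = [
    ("Louis", "Normalization", "DB Design & Implementation"),
    ("Chuck", "Normalization", "DB Design & Implementation"),
    ("Fred", "Implementation", "DB Design & Implementation"),
    ("Fred", "Golf", "Topics for the Non-Technical"),
]

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Assumed table names; each holds one independent relationship to Class.
    CREATE TABLE ClassTrainer (
        Class   TEXT NOT NULL,
        Trainer TEXT NOT NULL,
        PRIMARY KEY (Class, Trainer)
    );
    CREATE TABLE ClassBook (
        Class TEXT NOT NULL,
        Book  TEXT NOT NULL,
        PRIMARY KEY (Class, Book)
    );
    """
)
for trainer, class_name, book in ORIGINAL_ROWS:
    # OR IGNORE drops the repeats that the three-column table forced on us.
    conn.execute(
        "INSERT OR IGNORE INTO ClassTrainer VALUES (?, ?)", (class_name, trainer)
    )
    conn.execute(
        "INSERT OR IGNORE INTO ClassBook VALUES (?, ?)", (class_name, book)
    )

# "What classes do we have and what books do they use?" needs no DISTINCT.
class_books = conn.execute(
    "SELECT Class, Book FROM ClassBook ORDER BY Class"
).fetchall()

# Joining the two tables reproduces the original combined rows.
combined = conn.execute(
    """
    SELECT ClassTrainer.Trainer, ClassTrainer.Class, ClassBook.Book
    FROM   ClassTrainer
           JOIN ClassBook ON ClassBook.Class = ClassTrainer.Class
    """
).fetchall()
```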

Why Normal?
Enhance Data Integrity
• Parsing data is messy
• Duplicated data often gets out of sync
Give the engine the data in a format it wants
• Indexes, statistics, etc. all work on scalar values
Eliminating Duplicated Data
• Disk is still the most expensive operation
Avoiding Unnecessary Data Tier Coding
• If this is where the performance bottleneck is, then this should be a no-brainer, right?

Consider the Requirements
Almost every value could be broken down more
Consider a document. It could be stored either as rows of:
• Complete documents
• Chapters/Sections
• Paragraphs
• Sentences
• Words
• Characters
• Bits
The right way is determined by the actual need
Normalization is a practical task, not an academic one.

Characteristic 3 - Coherent

Puzzles are a fun diversion…

…not a design goal
An incoherent design/implementation is far more difficult to solve than a maze
Mazes have been worked out so there is one and only one solution
The consumers of the data shouldn’t have to run a maze to find the data they need
Data should empower the users

Coherent
Users who see your schema should immediately have a good idea of what they are seeing.
• Proper Normalization goes a long way towards this goal
Develop and follow a (not eight) human readable standard
• The worst standard available is better than 10 well thought out standards being implemented simultaneously

Well meaning, but terrible…

Names
If you must abbreviate, use a data dictionary to make sure abbreviations are always the same
• Names should be as specific as possible
• Data should rarely be represented in the column name
• If you need a data thesaurus, that is not cool.
Tables
• Singular or Plural (either one)
• I prefer singular
Columns
• Singular - Since columns should represent a scalar value
• A good practice to get common look and feel is to use a “class” word as the name or suffix that gives a general idea of the type/usage of the column

Column Names – Class Word Examples
• Name is a textual string that names the row value, but whether or not it is a varchar(30) or nvarchar(128) is immaterial (Example: Company.Name)
• userName is a more specific use of the name class word that indicates it isn’t a generic usage
• EndDate is the date when something ends. Does not include a time part
• SaveTime is the point in time when the row was saved
• PledgeAmount is an amount of money (using a numeric(12,2), or money, or any sort of types)
• DistributionDescription is a textual string that is used to describe how funds are distributed
• TickerCode is a short textual string used to identify a ticker row

Coherency Goals
Good - Databases are at least designed by individuals that have some idea of what they are doing
Great - Individual databases feel like they were created by one architect level person
Perfection - All databases in the enterprise look and feel like they were all created by the same qualified person

Mrphpph, grrrrm rppspppth…

Sorry.
We are a vendor and don’t want to share our schema… so we obfuscate it to make sure our competitors can’t see it.
This makes things incoherent for our users.
What should we do?

Characteristic 4 - Fundamentally Sound
Does this resemble your ETL developer after working with your data?
Constraints and proper design help to keep the muck out of our database

Typical Systems
[Diagram labels: oltp data; extract/transform/cleaning; dw data; user process (×7); cleaning (×6)]

The goal
[Diagram labels: oltp data; extract/transform/limited cleaning; dw data; user process (×7)]
HOW do you do this? I don’t completely care… But I have plenty of suggestions!

Don’t just model relationships…
How your database looks without constraints
With FOREIGN KEY, UNIQUE, and CHECK constraints
• Provides documentation for users to understand your structures without needing the model
• (More important) Provides useful guidance to the relational engine to understand expected usage patterns
Ok, so you can’t see the check constraints in the model, but the optimizer knows they are there
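To illustrate how constraints "keep the muck out" no matter which program writes the data, here is a minimal sketch. It uses Python's sqlite3 as a stand-in for SQL Server, and the Pledge table, its rules, and the helper try_insert are hypothetical (the table borrows the PledgeAmount column name from the naming slide).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    -- Hypothetical table: the rules live with the data, not in each client.
    CREATE TABLE Pledge (
        PledgeId     INTEGER PRIMARY KEY,
        DonorName    TEXT    NOT NULL CHECK (length(trim(DonorName)) > 0),
        PledgeAmount NUMERIC NOT NULL CHECK (PledgeAmount > 0)
    )
    """
)


def try_insert(conn, row):
    """Return True if the row was stored, False if a constraint refused it."""
    try:
        conn.execute("INSERT INTO Pledge VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


good_row_stored = try_insert(conn, (1, "Ada", 25))
negative_amount_stored = try_insert(conn, (2, "Bob", -5))    # CHECK fails
blank_name_stored = try_insert(conn, (3, "   ", 10))         # CHECK fails
missing_amount_stored = try_insert(conn, (4, "Cy", None))    # NOT NULL fails
rows_in_table = conn.execute("SELECT COUNT(*) FROM Pledge").fetchone()[0]
```

Every consumer of Pledge can now assume a non-blank donor and a positive amount without re-checking, which is the "limited cleaning" goal from the earlier diagram.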

The Constraint Guarantee - FK
With “trusted” constraints, the following queries are guaranteed to return the same value

SELECT count(*)
FROM   InvoiceLineItem

SELECT count(*)
FROM   InvoiceLineItem
         JOIN Invoice
            ON Invoice.InvoiceNumber = InvoiceLineItem.InvoiceNumber
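The guarantee can be demonstrated with a small sketch on Python's sqlite3, standing in for SQL Server. SQLite has no notion of a "trusted" constraint, so this only shows why an enforced foreign key on a NOT NULL column makes the two counts equal. The column InvoiceLineItemId and the sample invoice numbers are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript(
    """
    CREATE TABLE Invoice (
        InvoiceNumber TEXT PRIMARY KEY
    );
    CREATE TABLE InvoiceLineItem (
        InvoiceLineItemId INTEGER PRIMARY KEY,   -- assumed surrogate key
        -- NOT NULL plus the FK means every line item has exactly one invoice.
        InvoiceNumber     TEXT NOT NULL REFERENCES Invoice (InvoiceNumber)
    );
    INSERT INTO Invoice VALUES ('INV-1'), ('INV-2');
    INSERT INTO InvoiceLineItem VALUES (1, 'INV-1'), (2, 'INV-1'), (3, 'INV-2');
    """
)


def line_item_counts(conn):
    """Return (count without the join, count with the join)."""
    plain = conn.execute("SELECT count(*) FROM InvoiceLineItem").fetchone()[0]
    joined = conn.execute(
        """
        SELECT count(*)
        FROM   InvoiceLineItem
               JOIN Invoice
                 ON Invoice.InvoiceNumber = InvoiceLineItem.InvoiceNumber
        """
    ).fetchone()[0]
    return plain, joined


plain_count, joined_count = line_item_counts(conn)

# An orphan line item would break the guarantee, so the FK refuses it.
try:
    conn.execute("INSERT INTO InvoiceLineItem VALUES (4, 'INV-404')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

If the column allowed NULLs, or the constraint had been disabled while orphan rows were loaded, the joined count could come out lower than the plain count.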

Check for trusted/disabled keys

SELECT OBJECT_SCHEMA_NAME(parent_object_id) AS schemaName,
       OBJECT_NAME(parent_object_id) AS tableName,
       name AS constraintName,
       type_desc,
       is_disabled,
       is_not_trusted
FROM   sys.foreign_keys
UNION ALL
SELECT OBJECT_SCHEMA_NAME(parent_object_id) AS schemaName,
       OBJECT_NAME(parent_object_id) AS tableName,
       name AS constraintName,
       type_desc,
       is_disabled,
       is_not_trusted
FROM   sys.check_constraints

This procedure runs through the constraints in a DB and makes them trusted/enabled:
http://drsql.org/Documents/Utility.constraints$ResetEnableAndTrustedStatus.sql

Demo – Performance of Constraints

We tried using constraints, but we kept getting errors, so we started using UI code to check data instead.
We keep getting data issues though. Why?

Characteristic 5 - Documented
What is this?
• Coffee Cup
What is this USED for?
• Coffee cup?
• Pencil holder?
• Change Jar?
• Sample Transporting Vessel?
If you are questioning whether or not to document the purpose of this cup, if this is used to hold coffee for anyone in your office, no problem.

Non-standard usage
[Cup labels: “Caution Not Mt Dew”; “Pencils”; “Louis’ Coffee”]

Documentation should not be open to far too many interpretations
SPEED LIMIT ENFORCED BY AIRCRAFT
SPEED MONITORING DONE FROM AIRCRAFT

Documentation should not be just flat out confusing

Documentation
Like the coffee cup example, document all cases that aren’t intuitively obvious.
Don’t bury your constituents in documentation generated from code scrapers
• Not that they are necessarily bad, but good documentation requires a distinctively “human” approach
Every table and column should have a succinct definition describing its purpose
Make full use of the extended properties to get the documentation available contextually
KEY WORD: Succinct!

If I document everything so well, can’t they fire me first?

Characteristic 6 - Secure
“Today you can go to a gas station and find the cash register open and the toilets locked. They must think toilet paper is worth more than money.” —Joey Bishop
http://www.flickr.com/photos/freefoto/5692512457/

Dorothy and the Red Shoes
She had the power all along, she just didn’t know it. If some users were just a bit more curious about what they could do…
If you are bothered that in the book the shoes were silver, you probably need to seek professional help.

Secure – Don’t be a headline

Secure
Secure the server first – Keeping hackers away from your server/backups keeps them away from your server/backups
Grant rights to roles rather than users – It is easier, and less likely that users get elevated security for long periods of time
Grant blanket security no higher than the schema – Use db_datareader/db_datawriter in only the most extreme situations
Don’t overuse the impersonation features – EXECUTE AS is a blessing, and it opens up a world of possibilities. It does, however, have a darker side

Security Continued
Encrypt sensitive data – SQL Server has several means of encrypting data, and there are other methods available to do it off of the SQL Server box.
• Encryption is like indexes. Use as much as you need to, but not less.
Most organizations do most security in client code (often based on tables that they build in the application).
• Ideally minimally using the database_principal identity as the basis for identification.

Security – Continued (even more)
Keep permissions to the minimum necessary, even for the application
• If the fence is up and the gate is closed and locked, sheep can’t just wander away
• If the application requires DBO rights, it should be considered the first place to blame when something goes wrong
[Cartoon captions: Yum; Baa?; Boo!; Our hero!; Yay! DBAaaah..]

Encapsulated

Encapsulated – Level 1
Hints
• Codd’s goal was separation of implementation and usage
• Early database implementations required you to know the paths to data, names of indexes, etc.
• Hints revert to this mode of thinking
• Use them as sparingly as possible
• Review hint usage every CU, SP, and/or Major Release
UI <> Table structure
• Usually this starts in requirements
  • Wrong: I want to store the name and addresses together
  • Right: I want to see the name and addresses on screen together
• UI is reasonably easy to change, data structures with state are not.

Encapsulated – Level 2
Layered approach
• Ideally, there are layers of malleable code between the data structures and the UI
• Stored procedures (note, duck here) are a good candidate for a layer
  • They are best for parameterization of queries
  • They should be used as replacements for queries, and some processes that require intermediate data storage
  • They should NOT be used as replacements for large blocks of code.
    • T-SQL is awesome for retrieving and manipulating data
    • T-SQL is pretty awful at iterating through rows one-by-one
Data driven design
• Data should be accessed in one way: by knowing the table, finding a row by its key, and getting the column.
• You should not have to choose a column programmatically
• Adding similar data should not require modification of code (adding functionality should)
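One way to picture the data-driven point is the following sketch on Python's sqlite3, standing in for SQL Server. The ContactPhone table and the function phone_for are hypothetical, with the function playing the role a parameterized stored procedure might play. Each kind of phone is a row rather than a column, so the lookup never has to choose a column programmatically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical design: the phone type is data, not a column name
    -- (compare a table with HomePhone, WorkPhone, ... columns).
    CREATE TABLE ContactPhone (
        ContactId   INTEGER NOT NULL,
        PhoneType   TEXT    NOT NULL,
        PhoneNumber TEXT    NOT NULL,
        PRIMARY KEY (ContactId, PhoneType)
    );
    INSERT INTO ContactPhone VALUES
        (1, 'Home', '555-0100'),
        (1, 'Work', '555-0101');
    """
)


def phone_for(conn, contact_id, phone_type):
    """Parameterized lookup: know the table, find the row by key, get the column."""
    row = conn.execute(
        """
        SELECT PhoneNumber
        FROM   ContactPhone
        WHERE  ContactId = ? AND PhoneType = ?
        """,
        (contact_id, phone_type),
    ).fetchone()
    return row[0] if row else None


work_phone = phone_for(conn, 1, "Work")

# Adding a similar kind of data is an INSERT, not a schema or code change.
conn.execute("INSERT INTO ContactPhone VALUES (1, 'Mobile', '555-0102')")
mobile_phone = phone_for(conn, 1, "Mobile")
fax_phone = phone_for(conn, 1, "Fax")  # no such row
```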

Recap – Great Databases are…
Correct – And all that that entails
Well Performing – Gives you answers fast
Normal – Normalized as much as necessary/possible based on the requirements
Coherent – Comprehensible, standards based, names/datatypes all make sense, needs little documentation
Fundamentally Sound – Fundamental rules enforced such that when you use the data, you don’t have to check datatypes, base domains, relationships, etc.
Documented – Anything that cannot be gathered from the names and structures is written down and/or diagrammed for others
Secure – Users can only see data they are privy to
Encapsulated – Changes to the structures cause only changes to usage where a table/column directly accessed it

Reality
This is not about job security for a bunch of architects
When the tool is created that creates a database that is
• Normalized
• Well named
• Understandable
• Coherent
• Documented
• Secure
• Well performing
and it no longer needs a data architect/DBA to get it right, I hope I saw it coming and was part of the team creating the tools!

Questions? Contact info…
Louis Davidson
Website – http://drsql.org (get slides here)
Twitter – http://twitter.com/drsql
MVP DBA Deep Dives 2!
SQL Blog – http://sqlblog.com/blogs/louis_davidson
Simple Talk Blog – What Counts for a DBA – http://www.simple-talk.com/community/blogs/drsql/default.aspx
Thank you for attending this session and the 2011 PASS Summit in Seattle