Advanced Data Modeling extract © Clariteq Systems Consulting Ltd.
ClariteqADM extract
© 2010 Clariteq, contact [email protected]
Extract from Clariteq’s workshop:
Advanced Data Modeling – Communication, Consistency, and Complexity
Alec Sharp
Senior Consultant
Clariteq Systems Consulting Ltd.
West Vancouver, BC, Canada
Mobile – 604 [email protected]
Proprietary material – please do not distribute!
Thanks, Alec
“Thanks!” from Alec for participating!
• Me: [email protected]
• My company: www.clariteq.com
• My book: Workflow Modeling, Second Edition (a complete rewrite of the first edition, not just a minor refresh)
• Microblog: www.twitter.com/alecsharp
• Data Modeling blog: www.erwin.com/expert_blogs/authors/22/
Alec’s bio:
Alec Sharp, a senior consultant with Clariteq Systems Consulting, has deep expertise in a rare combination of fields – business process analysis and redesign, application requirements specification, and data modeling. With almost 30 years of hands-on consulting experience, his practical approaches and global reputation in model-driven methods have made him a sought-after resource in locations as diverse as Ireland, Illinois, and India.
He is also a popular conference speaker, mixing content and insight with irreverence and humour. Among his many top-rated presentations are “The Lost Art of Conceptual Modeling,” “The Human Side of Data Modeling,” “Crossing the Chasm - From Process Model to IT Requirements,” and “Getting Traction for Process – What the Experts Forget.”
Alec literally wrote the book on business process modeling – he is the principal author of “Workflow Modeling: Tools for Process Improvement and Application Development, Second Edition.” The first edition was published in 2001, and the second edition was published in 2009. It has consistently been the top-selling title on business process modeling, and is widely used as a consulting guide and as an MBA textbook.
Alec’s popular workshops on Workflow Process Modeling, Data Modeling (introductory and advanced), and Requirements Modeling (with Use Cases and Services) are conducted at many of the world’s best-known organizations. His classes are practical, energetic, and fun, with the most common participant comments being “best course (or best instructor) I’ve ever had.”
Clariteq courses for analysts
Workflow Process Modeling – Defining, Mapping, and Analyzing Business Processes (2 days)
Business processes matter, because business processes are how value is delivered. Understanding how to work with business processes is now a core skill for business analysts, process and application architects, functional area managers, and even corporate executives. But too often, material on the topic either floats around in generalities and familiar case studies, or descends rapidly into technical details and incomprehensible models. This workshop is different – in a practical way, it shows how to discover and scope a business process, clarify its context, model its workflow with progressive detail, assess it, and design a new process. Everything is backed up with real-world examples, and clear, repeatable guidelines.
Data Modeling – A Business-Oriented Approach to Entity-Relationship Modeling (2 days)
Data modeling is critical to the design of quality databases, but is also essential to other requirements techniques such as workflow modeling and requirements modeling (use cases and services) because it ensures a common understanding of the things – the entities – that processes and applications deal with. This workshop introduces entity-relationship modeling from a non-technical perspective, provides tips and guidelines for the analyst, and explores contextual, conceptual, and detailed modeling techniques that maximize user involvement.
Requirements Modeling – Proven Techniques for Use Cases and Service Specifications (2 days)
Use cases have offered great promise as a requirements definition technique, but many analysts get disappointing results. That’s because published methods are often inconsistent, complex, or focused on internal design. This unique workshop clears up the confusion. It shows how to employ use cases to discover external requirements – how users wish to interact with an application – and how to use service specifications to define internal requirements – the validation, rules, and data manipulation performed behind the scenes. Better yet, it shows in concrete terms how the two perspectives interact, and demonstrates synergies with data modeling and business process workflow modeling.
Advanced Data Modeling – Communication, Consistency, and Complexity (2 or 3 days)
After gaining some practical experience, data modelers encounter situations such as the enforcement of complex business rules, handling recurring patterns, satisfying regulatory requirements to capture complex changes and corrections, dealing with existing databases or packaged applications, integrating with dimensional modeling, and other issues not covered in introductory data modeling classes. This highly participative workshop provides approaches for many advanced data modeling situations, as well as techniques for improving communication between data modelers and subject matter experts.
Facilitation & Presentation – Session Techniques for Business Analysts (2 days)
The primary approach for discovering and validating business requirements has shifted from one-on-one interviews to facilitated workshops. This began with JAD or “joint application development” sessions, and has now become the norm. Just as important as gathering information in a facilitated session are skills in presenting that information for validation and to inform a wider audience. While there are many general-purpose courses available on these topics, there is very little available that is specifically designed for the needs of the business analyst. This unique workshop will provide specific methods and techniques in both skills – facilitation and presentation.
Now available! Business Analysis Overview – Model-Driven Techniques for Processes, Applications, and Data (2 days)
Essential content from Clariteq’s Process, Requirements, and Data Modeling workshops.
Seven typical problems
The problem… Why it’s a problem…
1. Missing the point altogether: We’re designing businesses, not databases. (Think about it – do architects bring hammers and saws to their first meeting with a client?)
2. Starting with a data modeling lecture: You’ll turn potential participants into actual non-participants (by putting them to sleep).
3. Not investigating the “as-is” model: You need it to show how much better life will be with the “to-be”, and to know what has to be left in place, and what has to be converted or integrated.
4. Fear of asking “dumb” questions: You need to show that they’re the experts, and someone will be glad that you asked. (And besides, you never really know the business as well as you think you do.)
5. Not applying graphic principles: An ERD is a graphic – otherwise, why bother? (Maybe you could just give them the DDL.)
6. Getting stuck in a data modeling rut: You won’t get full participation, understanding, and buy-in, because for some of the folks, you aren’t using the right language.
7. Generalizing too much, too soon: You’re really just showing off – give us mere mortals a break! (Besides, your “elegant” model is probably wrong if no one can validate it.)
Seven positive behaviors
The behavior… What it means… For example…
1. Accessibility: Data modeling can be challenging enough to participate in – make it easy for everyone to get involved. For example: “Just do it!” – don’t start with a lecture on data modeling.
2. Directionality: Like process models and org charts, data models are easiest to understand if they have a direction. For example: draw models so that dependency is visually obvious.
3. Simplicity: The forces of complexity are everywhere – resist them! Use simple techniques and frameworks, at least at the beginning. For example: use methods that let you start simple, and add detail in layers.
4. Consistency: Like children, adults learn from repetition – always do the same things the same way, and they’ll learn modeling by osmosis. For example: follow the same “script” whenever adding a new entity.
5. Visibility: It’s best if your clients spot the need for things like generalization – be patient, and give them every chance. For example: draw models so that generalization, etc. are visually obvious.
6. Relevance: Data models can be quite abstract to many people, so “attach” concrete, relevant artifacts and issues to them. For example: use familiar “props” like forms or reports to illuminate models.
7. Plurality: Data modeling, and data model diagrams, appeal to some, but not all – use other techniques to involve everyone. For example: use scenarios and narratives in addition to E-R diagrams.
And maybe…
8. Patience (is a virtue)
9. Humility (Don’t be afraid to ask! Spend more time saying “tell me more.”)
10. Empathy (Feel their pain! Put yourself in their shoes!)
What is a data model?
• A description of a business in terms of the things it needs to know about
• Things (Entities) and Facts about Things (Attributes & Relationships)
• “Real world”, not technical implementation
• Graham Witt – “A narrative supported by a graphic”
Key Point: Not the same as database design
[Diagram: Customer places Order (Order placed by Customer); Order Line contained in Order; Order Line specifies Product. Attributes shown – Customer: Customer ID, Name, Billing Address, Shipping Address, etc.; Order: Order ID, Placed Date, Delivery Date, Status, etc.; Order Line: Order ID, Product ID, Quantity, etc.; Product: Product ID, Description, Unit Price, etc.]
Customer definition: A Customer is a person or organization that is a past, present, or potential user of our products or services. Excludes the company itself when we use our own products or services, but includes cases where the Customer doesn’t have to pay (e.g., a charity).
Plus “Assertions” (rules):
- Each Order must contain one or more Order Lines (i.e., at least one Order Line)
- Each Order Line is contained in exactly one Order
- Each Order can contain at most one Order Line per Product
Entity: a distinct thing of interest about which the business must maintain information
Relationship: a named association between two entities
Attribute: a property of an entity that can be expressed as a piece of data
Identifier: one or more attributes that can be used to uniquely specify a single instance (only in detailed data models)
There are many ways to describe a business...
• How it works – Process Model
• How it’s organized – Organization Chart
• Where it operates – Location Map
and…
• What it needs to maintain records about – Data Model
Data modeling symbols will vary slightly among the different “dialects”, but the meaning is constant. The symbols are much more standardized than they used to be.
Data Modeling involves:
• Gathering knowledge from Subject Matter Experts (the hard part!)
• Representing that knowledge using a set of standard symbols and conventions (the easier part!)
Not just for Database Design anymore!
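Because the assertions are stated in plain, testable language, they can also be checked mechanically against sample data, which is a handy way to confirm them with subject matter experts. The sketch below is illustrative only: it assumes Orders are held as a set of hypothetical Order IDs and Order Lines as `(order_id, product_id, quantity)` tuples, and it simply reports any assertion that the sample data violates.

```python
from collections import Counter

# Hypothetical sample data: Order IDs, and Order Lines as (order_id, product_id, quantity).
orders = {"O1", "O2"}
order_lines = [
    ("O1", "P1", 2),
    ("O1", "P2", 1),
    ("O2", "P1", 5),
]


def check_order_assertions(orders, order_lines):
    """Return a list of violated assertions (an empty list means all hold)."""
    violations = []
    # "Each Order must contain one or more Order Lines"
    orders_with_lines = {order_id for order_id, _, _ in order_lines}
    for order_id in sorted(orders - orders_with_lines):
        violations.append(f"Order {order_id} has no Order Lines")
    # "Each Order Line is contained in exactly one Order": each line carries a single
    # order_id, so what remains to check is that it refers to an existing Order.
    for order_id, product_id, _ in order_lines:
        if order_id not in orders:
            violations.append(f"Order Line for {product_id} refers to unknown Order {order_id}")
    # "Each Order can contain at most one Order Line per Product"
    counts = Counter((order_id, product_id) for order_id, product_id, _ in order_lines)
    for (order_id, product_id), n in sorted(counts.items()):
        if n > 1:
            violations.append(f"Order {order_id} has {n} Order Lines for Product {product_id}")
    return violations


print(check_order_assertions(orders, order_lines))
```

With the sample data above, the function returns an empty list; adding an Order with no lines, or a second line for the same Product, produces a readable violation that can be discussed with the business.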
Entity types and conventions
• Reference or Type – Independent. Classifies or categorizes other entities and/or allows the recording of allowable values for a descriptive attribute. Drawn diagonally out from or beside the classified entity.
• Characteristic – Dependent on one parent. Records multi-valued facts about a parent entity that have been “cast out” from that entity. Drawn below its parent.
• Associative – Dependent on two or more parents. Records facts about a relationship (association) between two or more parent entities; it is often the resolution of a M:M relationship between the parents. Drawn between and below its parents.
• Kernel – Independent. A fundamental thing of interest to the enterprise whose existence does not depend on any other entity – it can “stand alone”. Drawn at the top of its area.
• Recursive relationship – A relationship between instances of the same entity. Can be 1:1, 1:M, or M:M.
• Supertype – Contains facts (attributes and relationships) that are common to all instances of the entity. Any kind of entity can be a supertype.
• Subtype – Contains facts that are specific to a particular subset of instances of the entity.
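To make the dependency conventions concrete, here is one possible physical rendering of each entity type, using SQLite through Python’s standard `sqlite3` module. It is a sketch only (the extract stresses that physical design is a separate step): the table and column names are hypothetical, borrowed from the Course / Section / Student examples used later, and SQLite enforces the foreign keys only because `PRAGMA foreign_keys = ON` is issued.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when this is on
conn.executescript("""
-- Reference (type) entity: allowable values for a descriptive attribute.
CREATE TABLE delivery_method (
    delivery_method_code TEXT PRIMARY KEY,
    description          TEXT NOT NULL
);
-- Kernel entities: each can stand alone.
CREATE TABLE course (
    course_id            TEXT PRIMARY KEY,
    title                TEXT NOT NULL,
    delivery_method_code TEXT NOT NULL REFERENCES delivery_method
);
CREATE TABLE student (
    student_id TEXT PRIMARY KEY,
    name       TEXT NOT NULL
);
-- Characteristic entity: depends on one parent (Course), so its key includes the parent's.
CREATE TABLE section (
    course_id  TEXT    NOT NULL REFERENCES course,
    section_no INTEGER NOT NULL,
    PRIMARY KEY (course_id, section_no)
);
-- Associative entity: depends on two parents (Student and Section).
CREATE TABLE enrollment (
    student_id TEXT    NOT NULL REFERENCES student,
    course_id  TEXT    NOT NULL,
    section_no INTEGER NOT NULL,
    PRIMARY KEY (student_id, course_id, section_no),
    FOREIGN KEY (course_id, section_no) REFERENCES section
);
-- Recursive M:M relationship, resolved: a Course can be a prerequisite of other Courses.
CREATE TABLE course_prerequisite (
    course_id              TEXT NOT NULL REFERENCES course,
    prerequisite_course_id TEXT NOT NULL REFERENCES course,
    PRIMARY KEY (course_id, prerequisite_course_id)
);
""")


def try_insert(sql, params):
    """Return True if the insert succeeds, False if a constraint rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False
```

For example, `try_insert("INSERT INTO section VALUES (?, ?)", ("NOSUCH", 1))` returns `False`, because a characteristic entity cannot exist without its parent.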
Three types of data models
Different levels of detail support different perspectives
Type of data model: the need
1. Contextual (Scope): agreement on “big picture” and vocabulary for process or subject
2. Conceptual (Overview): agreements on basic concepts, more vocabulary, and rules
3. Logical (Detail): excruciating detail for physical design
Upper levels often lost because…
• Tool provided no support
• Started at too low a level
• Didn’t know they were important
Remember…
• Maintain SME involvement
• Get maximum value from the technique
Summary – data model types
1. Contextual (Scope)
• Agreement on “big picture”, main terms and definitions
• May be a simple block diagram, or primarily textual – a list
• Optional – not necessary on smaller projects
• Later in this course, we’ll look at some important techniques for dealing with contextual models
2. Conceptual (Overview)
• Agreements on basic concepts and rules
3. Logical (Detail)
• Excruciating detail for physical design
Main differences
Conceptual (Overview):
• Ensures that everyone is on the same wavelength before diving into the details
• Overview: main entities, attributes, and relationships
• Lots of M:M relationships
• Relationships show multiplicity
• No keys
• No reference entities except where they are “structural”
• Many attributes will be non-atomic and multi-valued
• Verified by direct inspection
• A “one-pager”
• 20% of the modeling effort
Logical (Detail):
• Provides all detail for first-cut physical database design and requirements specification
• Detailed: ~ 5 times as many entities as the conceptual model
• M:M relationships resolved
• Relationship optionality added
• Primary, foreign, alternate keys
• Lots of reference entities
• Fully normalized – no multi-valued, redundant, or non-atomic attributes. All attributes defined and “propertized”
• May be verified by other means: sample data, report mockups, …
• May be partitioned
• 80% of the modeling effort
Note that across the industry, there is a lack of consistency in defining these types of models. In the “Zachman Framework”, these would be the planner’s, owner’s, and designer’s views.
Analogies:
- The contextual model is like the site plan with a definition of what will be built. The focus is scope or “footprint.”
- The conceptual model is like a floor plan and sketches for a building. The focus is the essential terms, definitions, and facts / rules.
- The logical data model is like the detailed blueprints for a building. The focus is on the individual data items the enterprise needs, and the rules that govern them.
A basic message for conceptual modeling – “Resist the urge to normalize or generalize until it matters!!!”
The logical model is not necessarily the “as built” model – the physical database design. The database designer or DBA will make changes in the interest of performance, recoverability, distribution, etc.
Everyone who currently supports an application should:
- draw the application’s logical data model following strict top-down drawing conventions.
- abstract the model “up” to a conceptual data model.
- at least consider reviewing the conceptual model with analysts, developers, and subject matter experts to ensure that it reflects the intentions of the business.
Contextual data model
• A list of the main topics (“subject areas”) in scope, and an associated vocabulary or glossary
• The glossary may include items other than entities, e.g., processes, transactions, industry terminology, Key Performance Indicators (KPIs), etc.
• Primarily textual; optionally, a diagram showing the topics and their interrelationships
Main use: “Do we understand the scope and the main terms?”
Conceptual data model
• Shows main or core entities, relationships, and attributes
• Gets the “concept” across
• Great for communication, but not for database design
• Best done before any significant process modeling or application requirements (use cases and service specifications)
Let's see what happens when we take these three entities to the "Logical" level...
The conceptual model is the “crossroads” at which both business and IT can communicate – both parties have “shared accountability” to ensure that there is a common understanding of the basics.
As you add detail, your conceptual model will evolve into a logical data model, but don’t lose the conceptual view!!! It is an absolutely vital tool for presentations, training, and so on.
After Logical Data Modeling, the next stage in the progression would be to turn your logical data model into a Physical Database Design for your particular implementation environment (MS Access, SQL Server, Oracle, DB2, etc.)
Logical data model
• All necessary detail – it’s the data specifications
• Input to first-cut physical database design
• Completed after use cases and service specifications are finalized
This could be made even more detailed:
• we haven’t shown entities like “Semester”, “Building”, or “Room”
• we haven’t shown reference entities like “Course Method” or “Degree Level”
From conceptual to initial logical
The progression from conceptual to logical is largely based on identifying and dealing with three attribute characteristics:
• Multi-valued – the attribute can have multiple different values for one instance of the entity, either “at a time” or “over time”. E.g., “Employee Name” if aliases or previous names are tracked
  - move it down to the “many” end of a 1:M relationship into a characteristic entity
  - if it’s a fact about a M:M relationship between entities, move it down to the “many” end of a 1:M relationship into an associative entity
  - both moves put the data structure into First Normal Form (1NF)
• Redundant – the same attribute value is recorded multiple times, in different entity instances, possibly inconsistently. E.g., “Company Name” in a “Department” entity
  - move it up to the “one” end of a M:1 relationship to one of the parent (or higher) entities (Second Normal Form, 2NF)
  - you might have to create a new parent entity where none existed before
• Constrained – a descriptive attribute needs to be restricted to a set of standardized values to improve integrity and reporting. E.g., “Employee Type”
  - move it out to the “one” end of a M:1 relationship to a reference or other related entity (Third Normal Form, 3NF)
For multi-valued attributes, ask “On what basis does the attribute repeat?” The answer should be in the form “It occurs once per …”. This will provide a clue as to which entity the multi-valued attribute should be moved to.
Two variations of the same example:
- If a Resource has multiple Chargeout Rates over time, then the Chargeout Rate doesn’t vary in relation to some other entity. We could say that the Chargeout Rate attribute repeats “within” the Resource entity, so we’ll simply move it down (“cast it out”) into a characteristic entity called Resource Chargeout Rate. It will need the attributes Effective Date and End Date in addition to Amount.
- If a Resource has multiple Chargeout Rates, one per Project that the Resource is contracted to, then we could say that the Chargeout Rate attribute repeats “in relation to” the Project entity. In other words, we know that Chargeout Rate is a fact about the relationship between Resource and Project, and belongs in an associative entity between them. That associative entity may depict a contract or agreement, and might have the word “Contract” in its name.
Another example:
- If the attribute Expected Duration is in the Project entity, and it is multi-valued, with one value per project phase, then Expected Duration should be moved down into a characteristic (of Project) entity called Project Phase. The Task entity would likely be a characteristic of Project Phase.
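The two Chargeout Rate variations lead to differently shaped entities, and the shape determines how a rate is looked up. The sketch below is an illustration under stated assumptions: the class and function names are hypothetical, the sample rates are invented, and End Date is treated as inclusive (“through”), which is exactly the kind of rule the date-range questions later in this extract ask you to confirm.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class ResourceChargeoutRate:
    """Characteristic entity: the rate repeats 'within' Resource, over time."""
    resource_id: str
    effective_date: date
    end_date: Optional[date]  # None = still in effect; assumed inclusive ("through")
    amount: float


@dataclass(frozen=True)
class ResourceProjectContract:
    """Associative entity: the rate is a fact about the Resource-Project relationship."""
    resource_id: str
    project_id: str
    chargeout_rate: float


def rate_on(rates: List[ResourceChargeoutRate], resource_id: str, on: date) -> Optional[float]:
    """Amount of the Resource Chargeout Rate in effect on a given date, or None."""
    for r in rates:
        if (r.resource_id == resource_id and r.effective_date <= on
                and (r.end_date is None or on <= r.end_date)):
            return r.amount
    return None


def rate_for_project(contracts: List[ResourceProjectContract],
                     resource_id: str, project_id: str) -> Optional[float]:
    """Chargeout rate agreed for one Resource on one Project, or None."""
    for c in contracts:
        if c.resource_id == resource_id and c.project_id == project_id:
            return c.chargeout_rate
    return None


# Hypothetical sample data
rates = [
    ResourceChargeoutRate("R1", date(2009, 1, 1), date(2009, 12, 31), 90.0),
    ResourceChargeoutRate("R1", date(2010, 1, 1), None, 100.0),
]
contracts = [
    ResourceProjectContract("R1", "P-A", 95.0),
    ResourceProjectContract("R1", "P-B", 110.0),
]
```

The characteristic version answers “what was the rate on this date?”, while the associative version answers “what is the rate on this project?”. Asking which question the business needs answered is another way of asking “on what basis does the attribute repeat?”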
Migrating multi-valued attributes
Attributes can’t repeat within an entity – “repeating” or “multi-valued” attributes are moved into a characteristic entity.
For each Section, there can be several Lecture times; depending on the type of Course, there may be none.
For each Section, there can be one or more Tutorial times. There will always be at least one.
We must move each "repeating group" into a child entity.
Note – Later, we’ll discuss the inclusion of primary keys and the added relationship symbols.
This is one of the rules for normalization – entities are in First Normal Form once all the repeating attributes or groups of attributes have been sent (“cast out”) to their own entities.
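The Section example can be sketched as a transformation on plain records: each repeating group is cast out into its own list of rows, each row carrying the parent’s identifier. This is an illustration only; the record layout, the Section IDs, and the times are hypothetical, and a real model would also add keys.

```python
# Hypothetical unnormalized Section records: each carries two repeating groups.
sections_unf = [
    {"section_id": "MATH100-3", "lecture_times": ["Mon 09:00", "Wed 09:00"],
     "tutorial_times": ["Fri 14:00"]},
    {"section_id": "ART200-1", "lecture_times": [],  # this type of Course has no lectures
     "tutorial_times": ["Tue 10:00", "Thu 10:00"]},
]


def to_first_normal_form(sections):
    """Cast each repeating group out into its own characteristic entity (a list of rows)."""
    section_rows, lecture_rows, tutorial_rows = [], [], []
    for s in sections:
        section_rows.append({"section_id": s["section_id"]})
        for t in s["lecture_times"]:
            lecture_rows.append({"section_id": s["section_id"], "lecture_time": t})
        for t in s["tutorial_times"]:
            tutorial_rows.append({"section_id": s["section_id"], "tutorial_time": t})
    return section_rows, lecture_rows, tutorial_rows


def sections_without_tutorials(section_rows, tutorial_rows):
    """Rule from the example: every Section has at least one Tutorial time."""
    with_tutorials = {t["section_id"] for t in tutorial_rows}
    return [s["section_id"] for s in section_rows if s["section_id"] not in with_tutorials]


section, section_lecture_time, section_tutorial_time = to_first_normal_form(sections_unf)
```

Once the groups are cast out, “there may be no Lecture times” simply means zero rows, while the “at least one Tutorial time” rule becomes something you can check explicitly.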
Migrating attributes of relationships
When the multi-valued attribute is actually a fact about a relationship, we create an associative entity:
• “When did John Smith enroll in Math 100?”
• “What grade did John get at midterm?”
• “What was his final grade?”
• “What is the average grade for Math 100 Section 3?”
These required facts are not about Student, or Section, but about the relationship between a Student and a Section.
We need to create a new associative entity.
“Many to many” relationships will almost always get a “promotion” to an entity, as in the example above, because there are usually attributes about the relationship that must be recorded. This is a variation on putting data into First Normal Form.
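Once the Student–Section relationship is promoted to an associative entity, each of the questions above becomes a simple query. The sketch below uses Python’s standard `sqlite3` module; the table and column names, the `MATH100` course code, the second student, and all dates and grades are hypothetical sample values chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (student_id TEXT PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE section (course_id TEXT, section_no INTEGER, PRIMARY KEY (course_id, section_no));
-- Associative entity: facts about the relationship between a Student and a Section.
CREATE TABLE enrollment (
    student_id      TEXT    NOT NULL REFERENCES student,
    course_id       TEXT    NOT NULL,
    section_no      INTEGER NOT NULL,
    enrollment_date TEXT    NOT NULL,
    midterm_grade   REAL,
    final_grade     REAL,
    PRIMARY KEY (student_id, course_id, section_no),
    FOREIGN KEY (course_id, section_no) REFERENCES section
);
""")
conn.executemany("INSERT INTO student VALUES (?, ?)", [("S1", "John Smith"), ("S2", "Mary Jones")])
conn.execute("INSERT INTO section VALUES ('MATH100', 3)")
conn.executemany(
    "INSERT INTO enrollment VALUES (?, ?, ?, ?, ?, ?)",
    [("S1", "MATH100", 3, "2010-01-05", 72.0, 80.0),
     ("S2", "MATH100", 3, "2010-01-07", 88.0, 90.0)],
)

# "When did John Smith enroll in Math 100?"
enrolled_on = conn.execute("""
    SELECT e.enrollment_date
    FROM enrollment e JOIN student s USING (student_id)
    WHERE s.name = 'John Smith' AND e.course_id = 'MATH100'
""").fetchone()[0]

# "What is the average grade for Math 100 Section 3?" (final grades, in this sketch)
avg_final = conn.execute(
    "SELECT AVG(final_grade) FROM enrollment WHERE course_id = ? AND section_no = ?",
    ("MATH100", 3),
).fetchone()[0]
```

Neither the enrollment date nor the grades could live in Student or in Section without repeating; in the associative entity, each value is recorded exactly once.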
Migrating redundant attributes
We eliminate redundancy by ensuring that every attribute is in the entity that it describes, so that the attribute value is recorded only once.
• Before migration, attribute values about a Department would be recorded redundantly with every Course offered by that Department, so it is moved up to a parent entity.
• Before migration, values of the Delivery Method Description attribute would be carried redundantly in many instances of Course, so it is moved out to a “type” (or “reference” or “lookup” or “classification”) entity.
Eliminating redundancy puts entities into Second Normal Form if the redundant attributes move “up” the parentage hierarchy, and into Third Normal Form if the attributes move “out” to a related entity (often a “type” entity).
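Both migrations follow the same mechanical pattern: the attributes that are really determined by some other identifier are split off into an entity keyed by that identifier, and only the identifier remains behind. The sketch below applies this pattern to hypothetical Course rows, moving Department Name “up” and Delivery Method Description “out”. The helper `migrate` is an illustrative name, not an established API. It raises an error when the same key carries conflicting values, which is the point where redundancy has already turned into inconsistency.

```python
# Hypothetical Course rows before migration: Department and Delivery Method facts repeat.
courses_before = [
    {"course_id": "MATH100", "dept_code": "MATH", "dept_name": "Mathematics",
     "delivery_method_code": "LEC", "delivery_method_description": "Lecture"},
    {"course_id": "MATH200", "dept_code": "MATH", "dept_name": "Mathematics",
     "delivery_method_code": "ONL", "delivery_method_description": "Online"},
    {"course_id": "ART200", "dept_code": "ART", "dept_name": "Fine Arts",
     "delivery_method_code": "LEC", "delivery_method_description": "Lecture"},
]


def migrate(rows, key, dependents):
    """Move the `dependents` columns (facts about `key`) into their own entity.

    Returns (remaining_rows, new_entity), where new_entity maps each key value to
    its facts. Raises ValueError if one key value carries conflicting facts.
    """
    new_entity = {}
    remaining = []
    for row in rows:
        facts = {c: row[c] for c in dependents}
        existing = new_entity.setdefault(row[key], facts)
        if existing != facts:
            raise ValueError(f"inconsistent values for {key}={row[key]!r}")
        remaining.append({c: v for c, v in row.items() if c not in dependents})
    return remaining, new_entity


# Move Department Name "up" to the parent entity (2NF, in this extract's terms) ...
courses, department = migrate(courses_before, "dept_code", ["dept_name"])
# ... and Delivery Method Description "out" to a reference entity (3NF).
courses, delivery_method = migrate(courses, "delivery_method_code",
                                   ["delivery_method_description"])
```

After both steps, each Course row keeps only `course_id`, `dept_code`, and `delivery_method_code`, and each Department name and Delivery Method description is recorded exactly once.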
World’s shortest course on normalization
• Unnormalized (UNF or 0NF): contains a “repeating group”
• First Normal Form (1NF): repeating attributes moved down to Characteristic or Associative entities
• Second Normal Form (2NF): only applies to dependent entities
  - No attributes in a child entity are really facts about a parent (or grandparent or…)
  - That is, no Characteristic or Associative entity redundantly contains facts from its parent(s) – if it does, move the fact(s) up (create a new parent entity if necessary)
• Third Normal Form (3NF): if any entity redundantly contains facts from a related (non-parent) entity, move the fact(s) out to the other entity (create a new entity if necessary)
• BCNF (Boyce-Codd NF): not an issue if you keep your wits about you
• Fourth and Fifth Normal Form (4NF, 5NF): “large” (3-way or more) associatives need to be broken down into more granular entities
Progression: UNF → 1NF → 2NF → 3NF → 4NF, 5NF?...
Other normal forms – forget about it!
The reason we’re covering this? You have to be able to make it simpler for the data “layperson”.
Script – adding a dependent entity
An “orderly script” – adding a new characteristic or associative entity to a logical model:
1. Place the entity (and relationships) on the diagram according to dependency
2. Ask “What is one of these things?”, then name and define the entity accordingly
3. Add relationship names, and add multiplicity (or confirm it, if it was already specified)
4. Add attributes
5. Perform further attribute migration, dealing with multi-valued attributes first and reference data last (1NF, 2NF, 3NF in sequence)… and only then worry about…
6. Relationship optionality
7. Primary keys or uniqueness constraints
8. Additional constraints (e.g., rules on date ranges)
Whenever you add a new entity:
• check to see if attributes or relationships from nearby entities should be moved to the new entity
• check that you haven’t introduced transitivity (clue: “loops”)
Consistency is very important to engaging your clients in the data modeling process. Have a method, or have scripts – do the same things the same way, and draw the same things the same way. If you do this, participants will learn modeling “by osmosis” and will learn what to expect. (E.g., that a M:M relationship will eventually get resolved.)
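The “loops” clue can be checked mechanically on a list of relationships. The sketch below treats the diagram as an undirected graph of entities and uses a union-find structure to report each relationship that connects two entities already linked by other relationships. It is an assumed, simplified helper (`find_loops` is a hypothetical name): a flagged relationship may be transitive and derivable from the others, or the loop may be perfectly legitimate, so each result still needs a human decision. Recursive relationships are skipped deliberately.

```python
def find_loops(relationships):
    """Return the relationships that close a loop in the undirected entity graph.

    `relationships` is a list of (entity_a, entity_b) pairs, one per relationship
    line on the diagram. A loop is only a clue: one relationship in it may be
    derivable from the others (transitive), or the loop may be legitimate.
    """
    parent = {}  # union-find forest over entity names

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    closing = []
    for a, b in relationships:
        if a == b:
            continue  # recursive relationships are not the kind of loop we're after
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            closing.append((a, b))  # a and b were already connected via other relationships
        else:
            parent[root_a] = root_b
    return closing


# Hypothetical fragment: a direct Student-Course relationship added alongside Enrollment.
print(find_loops([("Student", "Enrollment"), ("Section", "Enrollment"),
                  ("Course", "Section"), ("Student", "Course")]))
```

In this fragment, the direct Student–Course relationship is flagged because Student and Course are already connected through Enrollment and Section. The question to ask the client is whether it records anything the longer path does not.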
Seven questions for date ranges and dates
For records dependent on the same parent…
1. Can there be gaps between date ranges of adjacent (in time) records?
2. Must the date ranges be contiguous (no gaps)?
3. Can the date ranges overlap?
For any date range…
4. Can a date range begin in the future?
5. Is a date range inclusive or exclusive of the End Date? (“until” or “through”?)
6. Must a date range fit within the date range of a
parent entity?
7. Will the dates have to handle global time zones?
Note that in this example, we could ask the questions for both date ranges:
- Effective / End Date
- Recorded / Corrected Date
To clear up confusion around question 5, some organizations have standardized on “Last Valid Date” instead of “End Date.”
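Once the business has answered questions 1 to 6, the answers can be turned into a validation rule. The sketch below is one hypothetical way to do this: `check_ranges` and its keyword flags are illustrative names, each flag corresponds to one of the questions, and question 7 (time zones) is deliberately out of scope because plain `datetime.date` values carry no zone. Internally, each range is converted to an exclusive stop date, so the inclusive/exclusive answer to question 5 changes whether two adjacent records leave a gap.

```python
from datetime import date, timedelta
from typing import List, Optional, Tuple

# (Effective Date, End Date); an End Date of None means "open-ended".
DateRange = Tuple[date, Optional[date]]


def check_ranges(ranges: List[DateRange], *, end_inclusive: bool = True,
                 allow_gaps: bool = False, allow_overlap: bool = False,
                 allow_future: bool = True, today: Optional[date] = None,
                 parent: Optional[DateRange] = None) -> List[str]:
    """Check the date ranges of records under the same parent; return rule violations."""
    today = today or date.today()

    def stop(end: Optional[date]) -> date:
        # Q5: convert to an exclusive stop date. An inclusive ("through") End Date
        # stops the day after; an exclusive ("until") End Date stops on the date itself.
        if end is None:
            return date.max
        return end + timedelta(days=1) if end_inclusive else end

    problems = []
    spans = sorted((start, stop(end)) for start, end in ranges)
    covered_until = None  # latest exclusive stop date seen so far
    for start, end_excl in spans:
        if start >= end_excl:
            problems.append(f"empty or reversed range starting {start}")
        if not allow_future and start > today:  # Q4
            problems.append(f"range starts in the future: {start}")
        if parent is not None and (start < parent[0] or end_excl > stop(parent[1])):  # Q6
            problems.append(f"range starting {start} falls outside the parent's range")
        if covered_until is not None:
            if start < covered_until and not allow_overlap:  # Q3
                problems.append(f"range starting {start} overlaps an earlier range")
            elif start > covered_until and not allow_gaps:  # Q1 / Q2
                problems.append(f"gap between {covered_until} and {start}")
        covered_until = end_excl if covered_until is None else max(covered_until, end_excl)
    return problems
```

For example, a rate ending 2009-12-31 followed by one starting 2010-01-01 passes when End Date means “through”, but reports a gap on 2009-12-31 when it means “until”. This is why it pays to settle question 5 before the rule is written down.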
Script – meeting a new requirement…
1) State the new requirement as an assertion
• Start out using the client’s language
• Then, ensure that the assertion uses terms from the data model (entity names, relationship names, etc.). This “leads” you to the solution.
• Confirm it!
2) Develop a conceptual solution
• Look for the simplest option first: no change needed, a new reference attribute, one or more multi-valued attributes, a M:M relationship, a new entity
• Explore rules, like “what is the basis for multi-valued?”
• Confirm it!
3) Develop a logical solution
• Fully normalized, fully attributed
• Follow an “orderly script” – don’t get ahead of yourself or the client
• Confirm it, possibly using other easy-to-follow formats such as screen or report mock-ups.
Confirm and extend the model: discover new requirements, using a variety of techniques.
Philosophy
• Don’t dive in – start simple, add detail in layers
• Start out in “natural language”
Issues in meeting new requirements:
- The original modeler moves on, often without properly documenting the model, and subsequent modelers don’t really understand the conceptual underpinnings of the model.
- Failure to confirm the requirement with the subject matter expert, often by not using techniques like narrative assertions or concrete examples, and instead jumping too quickly into the details (keys, normalization, detailed attributes, reference data, etc.)
- When dealing with new requirements, the modeler/DBA works at the physical level instead of at the conceptual level. The result is a tendency to “bolt on” new tables (entities) rather than properly “building in” the new requirement, which creates more complexity than is really needed.
Refining the logical
As the model nears completion, the entities have been made as granular (normalized) as necessary. Once the model meets known requirements, we’ll also “granularize” the attributes by finding and resolving the following:
• Non-atomic attributes: The attribute has “internal structure” – it could be decomposed into more granular (“atomic”) attributes. E.g., “Employee Address” is non-atomic; “Employee Address Street Name” is atomic – it is at the finest level of granularity that will ever be manipulated or displayed
• Semantically overloaded attributes: The attribute is “overworked” – it contains multiple different attributes, typically encoded into a single attribute
  - in the earlier days of systems, this was done deliberately by designers to save space (think of the Y2K problem…)
  - now, it will more likely be done inadvertently by business people who don’t know the negative consequences of overloaded coding schemes
Finally, name and define attributes, and document attribute properties.
The distinction between non-atomic and semantic overload can be confusing:
A non-atomic attribute needs to be broken down into finer attributes, each of which is a “smaller” part of the same overall attribute. See page 36 for more information and examples.
A semantically overloaded attribute also needs to be broken down, but into distinctly different attributes as opposed to smaller pieces of the same attribute. See page 57 for more information and examples.
Note – we don’t typically do this until after we’ve searched for, discovered, and satisfied outstanding requirements using the techniques that we’ll look at shortly.
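To show what “distinctly different attributes” means for a semantically overloaded attribute, the sketch below splits a hypothetical product code of the form `REGION-CATEGORY-YEAR` (e.g., `CA-ELEC-2009`) into three separate attributes. The code format, the class name, and the attribute names are all assumptions made for illustration. A real overloaded code needs its encoding rules confirmed with the subject matter experts before anyone relies on a parser like this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProductCodeParts:
    """Distinct attributes that were packed into one hypothetical code."""
    region_code: str       # where the product is sold
    category_code: str     # what kind of product it is
    introduced_year: int   # when it was introduced (a full 4-digit year, not a Y2K-style 2-digit one)


def split_overloaded_code(code: str) -> ProductCodeParts:
    """Split a hypothetical 'REGION-CATEGORY-YEAR' code, e.g. 'CA-ELEC-2009'.

    Raises ValueError if the code does not have exactly three parts or the
    year is not numeric.
    """
    region, category, year = code.split("-")
    return ProductCodeParts(region, category, int(year))
```

Each resulting part describes a different fact (region, category, year), which is what distinguishes semantic overload from a non-atomic attribute such as an address, whose parts are all pieces of the same fact.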
A natural progression
Get into the high “value-added” space:
• Contextual – helpful for large models
• Conceptual – a great way to add value
  - Improve communication among all players
  - Highlight disconnects – terms, rules, scope, …
Contextual → Conceptual → Logical → Physical DB Design
• Contextual – focus on scope: context and boundaries, glossary of main terms and definitions
• Conceptual – focus on overview: business perspective, all terms and definitions, overall structure, major facts and rules
• Logical – focus on detail: all facts, detailed rules, input to 1st-cut physical design
• Physical DB Design – the “Danger Zone”: analysts shouldn’t worry about physical design issues while data modeling.
(The diagram also labels the paths “traditional modeling and development” and “reverse engineering.”)
Three phases in data modeling
1) Establish initial Conceptual Data Model
• Focus is on developing a core set of entities that are:
  - named
  - defined
  - minimally attributed
  - bound by basic rules and relationships
  - placed on an ERD
• Might start bottom-up: brainstorm details, then synthesize “up”
• Might start top-down: build a contextual model, then flesh out required details analyzing “down”
• Experiment with alternatives
• Refine the contextual model, if you had one
2) Develop initial Logical Data Model
• Focus shifts to attribute rigor and structure when going to the logical level
• First check attributes for:
  - completeness
  - necessity
  - name and definition
  - placement
• Resolve attributes that are:
  - multi-valued
  - redundant
  - constrained
• Continue experimenting with alternate structures
• Refine conceptual model
3) Refine & extend Logical Data Model
• Focus is on refinement, and validation via new requirements using…
  - …an event-based approach: fast and easy…
  - …or full business analysis: process workflow model, use cases (external), service specs (internal), profiling existing data, informational needs
• Resolve attributes that are semantically overloaded, non-atomic, or derived
• Document attribute properties and validation
• Specify identifiers
• Refine conceptual model
Of course, step 0) is to establish Project Scope and Objectives.
We covered all of the previous material so that you’ll be able to simplify some of the techniques for others.
Reminder – the four Ds of data modeling
Definition
• “What is one of these things?”
• List common and unusual instances
• “Are there any known anomalies?”
• “What are the potential differences of opinion?”
Dependency
• “What type of entity is this?”
• “What other entity does it depend on?”
• Essentially, is it a free-standing thing, a type of thing, or repeating detail about some other thing?
Detail
• Keep it in its place!
• GEFN! HPDL!
Demonstration
• Sample instances
• Schematics
• Props
Entity definition basics
Definitions must focus on what a single instance is:
• Not “how they’re used” or “how they’re created” or “why we care” or “how the process works” or “interesting problems and tidbits,” etc.
• Ask “What is one of these things?”
The most useful questions:
“Can anyone think of examples that might surprise someone else – that is, anomalies or potential sources of confusion?” E.g., to define “Customer”:
• “In our area, other divisions are treated as customers.”
• “We record recipients of charitable donations as customers.”
“Could we list some examples?”
• Rita Smith, Acme Auto, Ministry of Finance, homeowners… (aha!)
“Does this deal with ‘kinds of things’ or ‘specific things’?”
• “kind” – Customer Category vs. “specific” – an individual Customer
• if it’s a specific thing, still ask if there are recognized types (e.g., Personal, Corporate, Government; Lead, Prospect, Active)
Key point: “What is one of these things?”
The entity definition tells which things in the real world are included within our understanding of that entity. For instance:
• The world has hundreds of millions of people who are “students.”
• Which ones would we expect to find in a specific university’s Student database?
• Which ones would be excluded?
Two other useful questions:
• Are there life cycle issues to consider? For instance, Applicant to Candidate to Employee to Retiree – does “Employee” include “Applicant” and “Retiree”?
• Does the same real-world thing appear as multiple entities? E.g., one person could be a “Driver,” a “Registered Vehicle Owner,” and a “Legal Vehicle Owner” all at once. If this is of interest, you might need to generalize by creating a “Person” entity.
A common error in entity definition is describing the current implementation instead of the “essence” of what the entity is. E.g., “This entity is the ASF-72 created by Emily down in Personnel.”
Another common error is using the entity name to define itself. E.g., “A Contract is a contract between the corporation and …”
Finally, note that the last example on the slide indicates two separate “type” classifications – Customer Legal Entity Type and Customer Status Type.
Entity definition format and example
Entity definition format:
1. A description of which real-world things will be included in scope. This might be developed from a list of standard “thing types” – person, organization, request, transfer, item, location, activity, etc. Be sure to identify specific inclusions or exclusions.
2. Illustrate with examples:
• 5 – 10 sample instances
• diagrams
• current “props” like reports or forms
3. Interesting points – anomalies, synonyms, common points of confusion, etc.
Example – Customer: A Customer is a person or organization that is a past, present, or potential user of our products or services. Current examples include Solectron (contract manufacturer), Cisco Systems (OEM), Arrow Electronics (distributor), Best Buy (retailer), M&P PCs (assembler), and individual consumers. Excludes the company itself when we use our own products or services, but includes cases where the Customer doesn’t have to pay (e.g., a charity).
Counter-example – Customer: “We have a variety of Customers that operate in multiple geographies, and these must be tracked in order to consolidate purchasing statistics and enable our rating process to identify our best Customers.” (This describes why we care, not what one Customer is.)
Guidelines for working with assertions
1. Focus on the appropriate case – most assertions begin with the word “Each.”
2. Exclusively use terms from the data model – entity, relationship, and attribute names.
• If there’s a concept that can’t be described with existing terms, you’ll need to add to the data model.
3. If the assertion describes a relationship, you must state it in both directions.
4. If the assertion describes a relationship, be clear on whether cardinality is “one” or “one or more.”
Each Instructor teaches one or more Sections. (Sounds good…)
Each Section is taught by one Instructor. (Really…?)
Entity definitions and uniqueness constraints are also assertions.
Two important time concepts
Logical Time
• Effective date/time, Start date/time, Begin date/time, etc.
• Time that the data reflects the intent of the business at the time of update
• Reality
• Remember: “can be updated”? Wrong – with developments like Sarbanes-Oxley, we don’t change stored data, we add new records.
Physical Time
• Recorded date/time, Transaction date/time, Update date/time, etc.
• Time when a record was written to the database
• Representation
• Remember: cannot be updated
A third type of time is “User Time” – any other date/time of interest to the business (e.g., Reservation Arrival Date).
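A minimal Python sketch of the two time concepts, using a hypothetical product price as the time-dependent fact. Each row carries a logical Effective Date and a physical Recorded date/time. A correction is appended as a new row rather than overwriting the old one, so you can still ask what the database believed at an earlier moment. The names `PriceHistory`, `record`, and `price_as_of` are illustrative only.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Optional


@dataclass(frozen=True)
class PriceVersion:
    product_id: str
    price: float
    effective_date: date    # logical time: when the price applies in the business
    recorded_at: datetime   # physical time: when this row was written


class PriceHistory:
    """Append-only store: corrections add new rows; stored rows are never changed."""

    def __init__(self) -> None:
        self._rows: list[PriceVersion] = []

    def record(self, product_id: str, price: float,
               effective_date: date, recorded_at: datetime) -> None:
        self._rows.append(PriceVersion(product_id, price, effective_date, recorded_at))

    def price_as_of(self, product_id: str, on: date,
                    as_known_at: datetime) -> Optional[float]:
        """Price in effect on `on`, using only rows recorded by `as_known_at`."""
        known = [r for r in self._rows
                 if r.product_id == product_id
                 and r.effective_date <= on
                 and r.recorded_at <= as_known_at]
        if not known:
            return None
        # Latest effective date wins; for the same effective date, the latest recording wins.
        return max(known, key=lambda r: (r.effective_date, r.recorded_at)).price


history = PriceHistory()
history.record("P1", 10.00, date(2010, 1, 1), datetime(2009, 12, 15, 9, 0))
# A correction for the same business date is added, not written over the original.
history.record("P1", 12.00, date(2010, 1, 1), datetime(2010, 2, 1, 9, 0))
print(history.price_as_of("P1", date(2010, 1, 15), datetime(2010, 1, 20)))  # 10.0
print(history.price_as_of("P1", date(2010, 1, 15), datetime(2010, 2, 2)))   # 12.0
```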
Time dependent data – key points
• Facts that change independently should be recorded independently
• Never name the entity “History” – it probably includes present and future values
• Distinguish between
  – business Effective Date
  – database Recorded Date
• It’s tempting to put “Effective Date” in the key, but it might change
• Be sure to define what End / Expiry date means
• Capture the need (the “reality”) first in the model, then factor in performance considerations
• You might need to consider time zones
  – GMT / UTC
  – Local offset
Plus –
• don’t change stored values, add new records
• check for “one at a time, many over time” vs. “many at a time, many over time”
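Two of these points interact. A “one at a time, many over time” rule can only be checked once you have defined what the End date means. The Python sketch below assumes a hypothetical Employee Job Assignment entity and treats the end date as exclusive, meaning the first day the row no longer applies, with `None` meaning open-ended. It reports overlapping periods for the same employee.

```python
from datetime import date
from typing import Optional

# Hypothetical "Employee Job Assignment" rows: (employee_id, job_code, start_date, end_date).
# Assumption: end_date is EXCLUSIVE (the first day the row no longer applies),
# and None means "open-ended".
Assignment = tuple[str, str, date, Optional[date]]


def overlaps(a_start: date, a_end: Optional[date],
             b_start: date, b_end: Optional[date]) -> bool:
    """True if two half-open periods [start, end) share at least one day."""
    a_ends_first = a_end is not None and a_end <= b_start
    b_ends_first = b_end is not None and b_end <= a_start
    return not (a_ends_first or b_ends_first)


def one_at_a_time_violations(rows: list[Assignment]) -> list[tuple[Assignment, Assignment]]:
    """Pairs of rows for the same employee whose periods overlap.
    Only meaningful where the rule is "one at a time, many over time"."""
    violations = []
    for i, a in enumerate(rows):
        for b in rows[i + 1:]:
            if a[0] == b[0] and overlaps(a[2], a[3], b[2], b[3]):
                violations.append((a, b))
    return violations


rows: list[Assignment] = [
    ("E1", "CLERK", date(2008, 1, 1), date(2009, 1, 1)),
    ("E1", "ANALYST", date(2009, 1, 1), None),   # starts the day the previous row ends: fine
    ("E2", "CLERK", date(2008, 1, 1), None),
    ("E2", "ANALYST", date(2009, 6, 1), None),   # overlaps the open-ended CLERK row
]
for a, b in one_at_a_time_violations(rows):
    print(a[0], a[1], "overlaps", b[1])
```

If the business instead reads End Date as inclusive, the first E1 pair would overlap by one day. This is why the definition has to be settled first.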
Four key points about complex associations
1. You can’t tell whether a model is correct or not simply by inspecting it – you must have business involvement.
This gives rise to the other three points…
2. You must draw the model in a top-down fashion (or other systematic approach) so you can actually see dependencies.
3. You must state your assumptions or understanding in narrative form as assertions, using terms (entity names, relationship names, and attribute names) from the data model.
4. You must illuminate the data model by using sample data, schematic diagrams, scenarios, or some other understandable form.
A quick exercise…
1. The company decides which items will be carried at which stockrooms.
2. The company qualifies suppliers to provide specific items.
(A supplier can be qualified to provide multiple items, and an item may be provided by multiple suppliers.)
3. The company enters into a contract with qualified suppliers for each item they will provide to a specific stockroom.
Will this model satisfy the business constraints?
If not, identify specific problems and develop a better model.
A 5NF violation occurs if independent relationships between pairs of entities have been lumped together with other independent relationships.
4th Normal Form
• 4NF – “Primary Key cannot contain 2 or more independent, multivalued attributes of another entity”
• The classic example: Employees may have Skills and/or Languages
This version is incorrect, because Skill and Language are independent.
This version is correct.
Again, the rule is: if only certain combinations of entities are valid, create an associative entity to record those combinations.
The associative should be as “small” as possible. That is, two entities each having a two-part key is preferable to one entity with a three-part key, if each “small” entity with a two-part key could exist independently of the other.
If Language and Skill weren’t independent, then the original model is okay (for example, if each Skill could only be practiced in certain Languages).
4NF is pretty obvious. Things get trickier when we look at 5NF.
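A small Python sketch of why the combined (Employee, Skill, Language) key is wrong when Skills and Languages are independent. It uses hypothetical sample data. To avoid implying false rules, the combined table must hold every skill paired with every language. The 4NF version holds each independent fact exactly once, in its own two-part associative entity.

```python
from itertools import product

# Hypothetical sample data: an employee's Skills and Languages are independent facts.
skills = {"Rita": {"Welding", "Drafting", "Estimating"}}
languages = {"Rita": {"English", "French"}}


def combined_rows(emp: str) -> set[tuple[str, str, str]]:
    """(Employee, Skill, Language) table: because the facts are independent, every
    skill must be paired with every language, or the table implies false rules."""
    return {(emp, s, lang) for s, lang in product(skills[emp], languages[emp])}


def separate_rows(emp: str) -> tuple[set[tuple[str, str]], set[tuple[str, str]]]:
    """4NF version: one associative entity per independent relationship."""
    return ({(emp, s) for s in skills[emp]},
            {(emp, lang) for lang in languages[emp]})


emp_skill, emp_language = separate_rows("Rita")
print(len(combined_rows("Rita")), len(emp_skill) + len(emp_language))  # 6 5

# Adding one language costs one row in the 4NF design, but one row per skill when combined.
languages["Rita"].add("Punjabi")
print(len(combined_rows("Rita")), len(emp_skill) + len(separate_rows("Rita")[1]))  # 9 6
```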
5th Normal Form
• How we model three or more related entities depends on the rules
• Agents represent Manufacturers in Regions – if any combination is valid, the model to the right is fine
• What if there are additional constraints?
  – “business rules”
  – only certain combinations are valid
[Diagram: Agent (Agent ID), Manufacturer (Manufacturer ID), and Region (Region ID), each a parent of the associative entity Representation (Agent ID, Manufacturer ID, Region ID)]
Fifth Normal Form deals with associations between three (or more) entities when there are independent relationships between two (or more) of those entities.
5th Normal Form
• Assume the following constraints:
  – Agents only represent certain Manufacturers
  – Manufacturers only distribute in certain Regions
  – Regions are only covered by certain Agents
• Now we have a “cyclic dependency” within the key of Representation – this violates 5NF
“Cyclic dependency”: Agents are related to Manufacturers, Manufacturers are related to Regions, and Regions are related to Agents.
What are the problems with the form shown above?
“Independent multi-valued relationships” and “cyclic dependency” are the usual normalization bafflegab that hides the real issue – a 5NF violation occurs if independent relationships between pairs of entities have been lumped together with other independent relationships.
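Under the cyclic constraint, a Representation is valid exactly when its three pairs are valid. The three-part entity can therefore be rebuilt by joining three two-part associative entities. The Python sketch below uses hypothetical sample data to show that rebuild. It also shows what goes wrong when only the three-part rows are stored: a valid pair that has no currently valid third member simply disappears.

```python
# Hypothetical data for the three independent pairwise rules.
agent_manufacturer = {("Ann", "Acme"), ("Ann", "Bolt"), ("Raj", "Acme")}
manufacturer_region = {("Acme", "West"), ("Acme", "East"), ("Bolt", "East")}
region_agent = {("West", "Ann"), ("East", "Raj")}


def join_pairs(am, mr, ra):
    """Valid Representations: every (Agent, Manufacturer, Region) whose three pairs are all valid."""
    return {(a, m, r)
            for (a, m) in am
            for (m2, r) in mr
            if m2 == m and (r, a) in ra}


def project(triples):
    """Project three-part Representation rows back onto the three pairs."""
    return ({(a, m) for a, m, _ in triples},
            {(m, r) for _, m, r in triples},
            {(r, a) for a, _, r in triples})


representation = join_pairs(agent_manufacturer, manufacturer_region, region_agent)
print(sorted(representation))  # [('Ann', 'Acme', 'West'), ('Raj', 'Acme', 'East')]

# The three-part table alone has lost an independent fact: Ann represents Bolt.
print(("Ann", "Bolt") in project(representation)[0])  # False
```

Storing the three pairs keeps every independent fact. The valid combinations remain derivable by the join.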
Two sides of the house
[Diagram: Corporate mission, strategy, goals, and objectives are supported by two sides of the house.
Operational side (Entity-Relationship Model): Operational Applications support Operational Business Processes, and Operational Data supports both.
Information delivery side (Star Schema or Dimensional Model): DSS, EIS, BI, reporting, etc. facilities support Executive Functions and Processes, and are in turn supported by Data Marts, ODS, … and an Atomic Data Warehouse, populated from operational data via ETML*.]
We’ve looked at techniques that are appropriate for the operational side of things…
… but other techniques are appropriate for the information delivery environment.
* ETML: extract, transform, move, load
Oh-oh…
A detailed data model might be too complex to present to business folks for query, OLAP, BI, etc.
Dimensional models
[Diagram: a central Fact surrounded by four Dimensions]
• Used to model and implement data structures for various types of business intelligence tools
• One or more dimensional models per warehouse model
• We’ll use the terms dimensional model and star schema interchangeably
• Any combination of dimensions can be used in a query
  – the same dimension will appear in many dimensional models
  – should be managed as “shared dimensions”
Dimensional model concepts
“Facts”
• the central thing you want to count or measure
• has a count, usually “1”
• often details of a transaction or other core Associative entity (e.g., Sale, Shipment, Crime, Claim, …)
• can have attributes, but when they apply to a Fact they are called measures (e.g., Sale has Total Amount, Time, Payment Method)
“Dimensions”
• how you want to organize or summarize the facts
• often a Type or Kernel entity (e.g., Region, Time Period, Product, Customer, …)
• can have attributes (e.g., Product has Category, Price, and Color)
Dimensional model – example
• The fact is usually an associative entity from somewhere quite “low” in the ERD
• The fact will usually include a “count” of something, even if the value is implicitly “1”
  – e.g., “dollars” or “hours” or “units”
• The dimensions are “clusters” of the fact’s parent, grandparent, etc. entities
• Any combination of dimensions can be used in a query
  – the same dimension will appear in many dimensional models
  – should be managed as “shared dimensions”
[Diagram: Crime fact with dimensions Calendar, Police Force – Location, Court, and Statute]
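A minimal Python sketch of the Crime star schema, showing why “any combination of dimensions can be used in a query.” Each fact row carries one key per dimension plus a count measure that defaults to 1. Summarizing by any subset of dimensions is then just grouping on those keys. The field names and sample rows are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class CrimeFact:
    """One fact row: a key into each dimension, plus a count measure that is usually 1."""
    calendar_month: str
    location: str
    court: str
    statute: str
    count: int = 1


DIMENSIONS = ("calendar_month", "location", "court", "statute")


def summarize(facts: list[CrimeFact], by: tuple[str, ...]) -> Counter:
    """Total the count measure over any combination of dimensions."""
    unknown = set(by) - set(DIMENSIONS)
    if unknown:
        raise ValueError(f"not a dimension: {sorted(unknown)}")
    totals: Counter = Counter()
    for fact in facts:
        totals[tuple(getattr(fact, d) for d in by)] += fact.count
    return totals


facts = [
    CrimeFact("2010-01", "North", "Provincial", "Theft"),
    CrimeFact("2010-01", "North", "Provincial", "Fraud"),
    CrimeFact("2010-02", "South", "Supreme", "Theft"),
]
print(summarize(facts, ("location",)))
print(summarize(facts, ("calendar_month", "statute")))
```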
The classic methodology
1. Identify questions – What sorts of relationships among the data are of interest? E.g., you might want to study sales by product color and customer, or by region and employee seniority.
2. Identify facts – What is the central thing (or things) of interest? Often a transaction or event entity with multiple parents and classifications, e.g., a Sale.
3. Identify dimensions – How will facts be organized? Usually an entity related to the fact entity (a foreign key), e.g., Employee, Customer, … May be hierarchic, e.g., Country, Region, “State”, …
4. Add attributes – What additional detail is needed? Facts have “measures” and dimensions have “attributes,” e.g., Sale units, total price, time of day, …
5. Add calculations – Identify calculations such as totals, averages, or projections that should be pre-defined, e.g., average sale price, total sales per month, …
You may end up producing more than one star schema. Each will get collapsed into a single table (named for the “fact”). Tables will then have to be joined (but these joins will be far simpler than what would otherwise be necessary).
A few guidelines:
• Don’t try to get all your operational data perfect first, or you’ll never get anywhere.
• Accept that after the data structure is in use, the questions will change. Embrace iteration.
• Manage the volume. Combining two “facts” (star schemas) into one table can multiply the row count dramatically. Focus initially on the critical measures and attributes.
• Start with a good, normalized data model that clearly shows dependency, as we’ll demonstrate in a minute…
But it’s easier with an ERD
[Diagram: library ERD annotated for dimensional modeling]
• Loan Item (Loan ID, Title ID, Copy SID, Due Date, Return Date, Status Code) – Fact
• Loan (Loan ID, Date, Cardholder ID (fk, nn)) – Dimension
• Cardholder (Cardholder ID, Name, Number, Member Since Date) – Dimension
• Copy (Title ID, Copy SID, Purchase Price Amt, Acquisition Date, Status Code, Format Type Code (fk)) – Dimension
• Title (Title ID, Name, Author) – Dimension
• Format Type (Format Type Code, Name) – Dimension
• Publisher (Publisher ID, Name) – not a dimension
Relationship labels on the diagram: “is an instance of,” “takes,” “is taken by,” “is part of,” “is classified by,” and “available from.”
From E-R to dimensional
• Any parent (or grandparent or …) entities that are encountered following M:1 relationships from the fact are possible dimensions
• Any entities that are 1:M or M:M from the fact cannot be dimensions without “faking” the data
• Additional dimensions not in the original structure (e.g., Time Period) can be added
• Essentially, a basic dimensional model (no snowflakes) collapses an ER model to a two-level structure with a 1:M relationship between each dimension and the fact
[Diagram: Loan fact with dimensions Calendar, Cardholder, Author, and Title]
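The first rule above is a graph walk: start at the fact and follow only M:1 relationships. The Python sketch below applies it to the library ERD. The cardinalities listed are assumptions read from the diagram, not confirmed by the slide; in particular, Title–Publisher is taken to be M:M, consistent with Publisher being labelled “not a dimension.” Calendar is absent because it is an added dimension, not part of the original structure.

```python
from collections import deque

# Assumed cardinalities for the library ERD: (child, parent, kind).
# "M:1" = each child row relates to at most one parent row.
RELATIONSHIPS = [
    ("Loan Item", "Loan", "M:1"),
    ("Loan Item", "Copy", "M:1"),
    ("Loan", "Cardholder", "M:1"),
    ("Copy", "Title", "M:1"),
    ("Copy", "Format Type", "M:1"),
    ("Title", "Publisher", "M:M"),   # not followed: would "fake" the data
]


def candidate_dimensions(fact: str) -> list[str]:
    """Breadth-first walk from the fact, following only M:1 relationships."""
    parents: dict[str, list[str]] = {}
    for child, parent, kind in RELATIONSHIPS:
        if kind == "M:1":
            parents.setdefault(child, []).append(parent)
    seen = {fact}
    queue = deque([fact])
    found: list[str] = []
    while queue:
        for parent in parents.get(queue.popleft(), []):
            if parent not in seen:
                seen.add(parent)
                found.append(parent)
                queue.append(parent)
    return found


print(candidate_dimensions("Loan Item"))
# ['Loan', 'Copy', 'Cardholder', 'Title', 'Format Type']
```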
Exercise: dimensional modeling
Jim’s sister-in-law June has just returned from a BI conference, and she has Jim all wound up about building a query database so he can analyze sales (purchases by customers).
Construct a dimensional model for Jim, using the following E-R model as a starting point. At this point, don’t worry about individual attributes – just which entities would collapse into which fact or dimension. A few notes:
- Jim’s has grown to a nationwide chain, with stores in many regions. Most regions cover one or more states, although some regions only cover part of a state (e.g., Northern California and Southern California). Each store is in a single city, though, and each city is in only one region.
- The layout of stores (Sections, Aisles, Store Categories, etc.) varies widely across the stores.
- The “Store Category” indicates if the store is a mall location, streetfront, “captive” (contained within another retail outlet), etc. Web sales are not a factor.
Jim is especially interested in how the same Title sells depending on where in the Store it is displayed, because the same Title might end up in different Sections. He also wants to look at Sales by Store, Region, Artist, Publisher, Supplier, Category, … well, just about everything! You’ll have to decide what’s possible, and then be prepared to explain it to Jim!
Dimensional modeling exercise
Solution: dimensional model
As it turns out, having an E-R model is invaluable in producing a valid star schema, although many data warehouse experts will argue the point…
Handling “vectors” of attributes
• fixed number of repeating attributes
• may be an “array,” e.g., for each Quarter, also record:
  – Target Sales Amount
  – Sales Per Employee Amount
  – … ?

Divisional Sales (in 1,000,000s)
Year   Q1    Q2    Q3    Q4
2005   1.45  1.37  1.40  1.67
2006   1.46  1.40  1.63  1.91
2007   2.11  2.32  …     …

Each row is a vector.
Alternatives for modeling vectors
“Row-wise” table
• one row per vector; attributes go in separate columns
• Advantages:
  – familiar layout
  – from “row to screen” is easier
  – fewer tables and joins
  – more suitable in DW/DSS environment
“Column-wise” table
• multiple rows per vector; attributes go in a single column
• Advantages:
  – same handling as for other multi-valued attributes
  – easier SQL queries (e.g., average sales)
  – more efficient for sparse data
  – flexible: change vector length, add additional attributes (like Top Sales Rep for each Quarter)
Has anyone had experience with this situation?
The point – don’t be too quick to translate reporting layouts into operational data structures.
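The two layouts hold the same information, and converting between them is a simple pivot. The Python sketch below uses the Divisional Sales figures from the table above (in millions); the function names are illustrative. In the column-wise form, missing 2007 quarters are simply absent instead of stored as empty columns. An average over all known quarters is then one pass over a single column, while the familiar report layout is rebuilt only for display.

```python
from statistics import mean

# Row-wise: one row per vector (Year), one column per Quarter; None = not yet known.
row_wise = {
    2005: {"Q1": 1.45, "Q2": 1.37, "Q3": 1.40, "Q4": 1.67},
    2006: {"Q1": 1.46, "Q2": 1.40, "Q3": 1.63, "Q4": 1.91},
    2007: {"Q1": 2.11, "Q2": 2.32, "Q3": None, "Q4": None},
}


def to_column_wise(rows):
    """Unpivot: one (Year, Quarter, Sales) row per known value; gaps are simply absent."""
    return [(year, q, amt) for year, quarters in rows.items()
            for q, amt in quarters.items() if amt is not None]


def to_row_wise(cols, quarters=("Q1", "Q2", "Q3", "Q4")):
    """Pivot back to the familiar report layout for display."""
    out = {}
    for year, q, amt in cols:
        out.setdefault(year, dict.fromkeys(quarters))[q] = amt
    return out


column_wise = to_column_wise(row_wise)
# Average sales over all known quarters is a single pass over one column.
print(round(mean(amt for _, _, amt in column_wise), 3))  # 1.672
```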
Recursion
• When one entity occurrence can be related to another occurrence of the same entity type
• Three variations – 1:1, 1:M, M:M
• Recursion and generalization often go together
[Diagram: Division, Department, and Section generalized into a single Organization Unit entity]
Recursion - recognizing the data structure
The name on the M:M (network) relationship could be more descriptive:
• contains / contained in
• precedes / follows
• substitute with / substitute for
Drawing out examples (the fourth “D” in data modeling) will always help
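A minimal Python sketch of the 1:M case, in which Division, Department, and Section are generalized into one Organization Unit with a recursive parent relationship. All names and sample units are hypothetical. Walking the recursion upward means following the parent reference until there is none. The guard catches the kind of cycle that sample data often reveals when a hierarchy was assumed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OrganizationUnit:
    unit_id: str
    name: str
    unit_type: str            # classifies the unit, e.g. "Division", "Department", "Section"
    parent_id: Optional[str]  # 1:M recursion: at most one parent unit


def path_to_top(units: dict[str, OrganizationUnit], unit_id: str) -> list[str]:
    """Follow the recursive parent relationship upward, guarding against cycles."""
    path: list[str] = []
    seen: set[str] = set()
    current: Optional[str] = unit_id
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle detected at {current}")
        seen.add(current)
        unit = units[current]
        path.append(f"{unit.unit_type} {unit.name}")
        current = unit.parent_id
    return path


units = {u.unit_id: u for u in [
    OrganizationUnit("D1", "Operations", "Division", None),
    OrganizationUnit("P1", "Logistics", "Department", "D1"),
    OrganizationUnit("S1", "Receiving", "Section", "P1"),
]}
print(" -> ".join(path_to_top(units, "S1")))
```

The M:M (network) variation would need a separate associative entity holding pairs of unit IDs, such as “contains / contained in,” instead of a single `parent_id`.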
Supertypes and subtypes
• Breaks an entity down into two or more “subtypes,” or generalizes two or more into a single “supertype”
  – common relationships and attributes go into the supertype
  – unique relationships and attributes go into the subtype
• subtypes are mutually exclusive and mandatory – there is exactly one subtype instance for each supertype instance
• a.k.a. generalization-specialization, or gen-spec
[Diagram: supertype Job (Job Title, Creation Date, Job Type Code) with subtypes Management Job and Bargaining Unit Job; subtype attributes shown include Hourly Wage Amt, Confidential Flag, and Salary Amount. The Employee relationship (Employee performs Job / Job describes duties of Employee) applies to all jobs; the Certification relationship (Certification required for / requires) applies only to B.U. (Bargaining Unit) jobs.]
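“Mutually exclusive and mandatory” is a rule you can test against data once supertype and subtype rows are keyed the same way. The Python sketch below uses hypothetical Job IDs for the Job / Management Job / Bargaining Unit Job example. It reports supertype rows with no subtype, rows claimed by more than one subtype, and orphaned subtype rows.

```python
def check_subtypes(supertype_ids: set[str],
                   subtype_ids: dict[str, set[str]]) -> list[str]:
    """Check that every supertype row has exactly one subtype row
    (mandatory and mutually exclusive), and that no subtype row is orphaned."""
    problems: list[str] = []
    for job_id in sorted(supertype_ids):
        owners = sorted(name for name, ids in subtype_ids.items() if job_id in ids)
        if not owners:
            problems.append(f"{job_id}: no subtype (violates 'mandatory')")
        elif len(owners) > 1:
            problems.append(f"{job_id}: in {', '.join(owners)} (violates 'mutually exclusive')")
    all_subtype_ids = set().union(*subtype_ids.values())
    for orphan in sorted(all_subtype_ids - supertype_ids):
        problems.append(f"{orphan}: subtype row with no Job supertype row")
    return problems


jobs = {"J1", "J2", "J3", "J4"}
subtypes = {
    "Management Job": {"J1", "J3"},
    "Bargaining Unit Job": {"J2", "J3", "J9"},
}
for problem in check_subtypes(jobs, subtypes):
    print(problem)
```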
Generalization vs. subtyping
• “Generalization” is the usual bottom-up O-O term; “subtyping” is the usual top-down E-R term
• Generalize whenever two or more entities, each with their own distinct attributes and relationships, also share other attributes and relationships
• Automobile, Aircraft, and Vessel have common attributes that could be generalized into Vehicle…
• …or, Vehicle could be subtyped into Automobile, Aircraft, and Vessel, with the same outcome
• Note that it’s common for a subtyped entity to also be classified by a type or reference entity
Facilitation – models are built in “sessions.” Why?
1 – The plan: orderly one-on-one interviews
2 – The reality: “the analyst as messenger”
3 – The response: facilitated sessions
Advantages (of facilitated sessions):
• speed and quality
• commitment
• communication, team building
• business understanding
Disadvantages (of interviews with the analyst as messenger):
• longer elapsed time
• incompleteness
• encourages parochialism
• no real communication or consensus
Should I always use facilitated sessions?
Conceptual Data Models
• up to 8 or 10 content experts
  – cross-functional
  – mid to senior level
• up to 3 or 4 analysts
  – facilitator, analyst, …
• up to 3 or 4 technical experts – architect, DBA, developer, …
• Focus is agreeing on concepts, terminology, rules
• Sessions are essential!
Logical Data Models
• multiple, smaller groups of content experts, or individuals
  – specialists
  – managers or supervisors
  – “front line” contributors
• small number of IT specialists (or just one) – analyst, DBA, developer, …
• Focus might be on Process or Application Requirements
• Sessions are less suitable!
Key point! – Conceptual and Logical data modeling require substantially different skill sets.
For example, a conceptual model to support the “Fill Order” process will involve cross-functional reps. You may then separate into multiple logical modeling sessions for:
• Customer Relationship piece
• Sales
• Manufacturing Planning and Manufacturing
• Logistics
• Accounts Receivable
Facilities requirements
The facilities really do influence session results...
• comfortable, roomy, and away from work area
• wide U-shaped layout
• lots of whiteboard space and “plain” wall space
[Diagram: room layout showing a whiteboard, two flipcharts, participant seating, refreshments, and facilitator’s supplies]
• No empty seats – “energy holes”
• Room for everyone to work on the wall
As an alternative to the U shape, you might have “rounds” of 4 or 5 people each.
Don’t forget flipchart pens, whiteboard pens, “wall safe” masking tape, flipchart stands & paper, rolls of plotter paper or butcher paper, Post-its, rubber bands, note paper, …
Everyone has a job to do – don’t try to be Atlas!
Facilitator – Attitude: “I’m here to do a job, not work a miracle”
• DO – help develop objectives and plan; enforce rules & plan; maintain focus on topic; press for completion and quality; help everyone participate; ensure recording
• DON’T – develop content; push a point of view
Participant
• Participate! Provide information; suggest ideas; make decisions
Sponsor
• Confirm scope and objectives; determine and “invite” participants; arrange other resources; resolve difficult decisions
The world’s shortest course on facilitation
What You Do → What They’ll Do
• Write something up → Tell you if it’s wrong
• Watch facial expressions, and ask → Appreciate the opportunity
• Find areas of agreement → Take care of the disagreement
• Use alternate forms of information → Build a better product
• Take time to think, and use the group → Use the time too, and generate the way forward
• Remember your role – facilitate, not participate → Do their job – you stick to yours
• Acknowledge what is → Deal with it
a) Getting started bottom-up
Don’t begin with a lecture on data modeling
“Before I begin my speech, let’s cover a few of the basic rules of grammar. A noun is any…”
“Before we begin our data modeling session, let’s go over some key points about data modeling. First, an Entity is any uniquely identifiable person, place, thing, event, concept, or organization of interest to the enterprise about which facts may be recorded. Any questions? I didn’t think so…”
Avoid starting with the theory and practice…
• Data modeling sessions go better
• Allows use of data modeling in non-typical situations
If you can get away with it, don’t even call it “data modeling.”
Why not?
• “Purple monkey water wrench” – a phrase I saw in an article making the point that our IT terms (foreign key, referential integrity, cardinality, …) aren’t any clearer to the client
• May lead to boredom and mental shutdown
• May lead to resentment and non-participation
• It’s unnecessary! Some things are easier to just do. Think of coaching basketball – initially, you teach by example.
Non-typical situations:
• Goal Setting and Planning
• BPx
• Package Evaluation and Selection
Do begin with a brainstorm
When in doubt – make a list!
CoRSE: The Facilitator’s Friend
[Diagram: Problem or question → Collect (Brainstorm) – lots of suggestions → Reduce (eliminate, cluster, …) – selected set of answers or points → Sequence (dependency, priority, …) – organized set of points or topics → Expand → Useful result]
Not always, but it’s a good default.
Gets everyone involved easily, and level-sets (“role induction”).
If your data model isn’t going to start with brainstorming, maybe do a “venting” brainstorm.
“CoRSE”: the specifics
Collect (Brainstorm)
• State problem or question
• Going clockwise (fast), everyone makes one suggestion
  – “pass” if nothing to add
  – “pure” brainstorming is random, not “in turn”
• Stop when everyone is “passing,” or there’s agreement to stop, or time’s up
• Record without editorializing
  – might ask for short phrase
  – might paraphrase for confirmation
• Keep it moving, enforce rules
• No discussion
  – quick clarification or positive comments okay
  – absolutely no negative commentary
Reduce
• Eliminate: redundant, out of scope, …
• Cluster
• Select
Sequence
• Goal: workable sequence
• By dependency, chronology, priority, …
• Not permanent – just to organize the session
Expand
• Collect more info: define, alternatives, pro/con, …
• Apply CoRSE on each item
Applying CoRSE to starting a model
1. Collect – Brainstorm for “anything related to data or information in any way, shape, or form” (e.g., things of interest, information needs, facts, queries, calculations, reports, etc.). Or, simply gather nouns.
2. Reduce – For each item, ask “Is this a thing, a fact about a thing, or other stuff?”
  – Circle things
  – Cluster facts around the appropriate thing
  – Other stuff will include reports, forms, systems, departments, processes, etc. – use these as clues for more things and facts about things
3. Sequence – Choose the fundamental terms: kernels, then their dependents
4. Expand – Entity definitions and major attributes
  – Focus on anomalies and “likely sources of confusion”
  – Don’t worry about normalization, generalization, keys, …
Accessibility – no jargon! Again – this is “role induction.”
“Fact about a thing” – attributes or relationships. Don’t worry about keys!!! (or normalization or atomic attributes or generalization or ANY of that stuff)
b) Getting started top-down
“Draw five boxes. Any five boxes.”
Examples:
• Quotation | Booking | Confirmation & Ticketing | Amendment | Flight
• Stockroom | Item | Supplier | Inventory | Availability & Agreements
• Intake | Diagnosis | Service Assessment | Treatment Planning
At this point, these could be subject areas, activities, states, … – it doesn’t matter!
Working with the “big picture”
Sources:
• Review “artifacts” such as
  – input formats (screens, web pages, forms…)
  – output formats (reports, queries…)
  – training materials or periodicals on the topic
  – other written documentation
  – again, search for nouns and verbs
What to do with the five boxes:
• Have clients describe what they need to know about each “box,” or what they do, or what the problems are… Just keep listening for and noting:
  – nouns – possible entities
  – verbs – possible relationships and processes
  – rules – constraints
  – issues (problems) and opportunities
Building the storyboard
1. Draw 5 "bubbles"
2. Fill in the last (your "closer" - the purpose)
3. Fill in the first (your "hook")
4. Fill in the middle ones (the "body") –add or subtract bubbles as needed
5. Allocate details to bubbles
6. Iterate until it flows and builds properly
Only include detail that matters!
Example storyboard (five bubbles): Business issues have changed → Therefore systems have changed → So methods for building have changed → Making these changes will be difficult → But it is vital to our survival
Details allocated to the bubbles: operational to informational; distributed, component-based; cross-functional
Presentations – it’s a story, so storyboard it
Used to evaluate merit and sequence
Presentation should flow like a story
• does it make sense?
• does it build to the conclusion?
How not to present a data model:
“Let’s start here with Special Tax Rate Variation Comment Type…”
Our models should aid understanding by:
• Using visual cues consistently
• Having a starting point and direction
• Abstracting
• Masking unnecessary detail
• Highlighting what matters
Presenting data models
Start simple, and add details in layers…
• begin with two or three fundamental things
• work “across” the model, not a “deep dive” in one area
• draw the model on a whiteboard as you speak to it
• save detail like optionality until later, and primary/foreign keys until much later
Speak exclusively in the language of the business
• don’t use terms like “entity”, “optionality”, etc.
• point to the relevant entity while addressing a concept
Back it up with sample data, queries, and scenarios
Identify specific business issues or opportunities, and show how the data model helps
We’ll now walk through a successful data model presentation, followed by discussion of key points
Presenting – some specifics
• Draw it on a whiteboard while you present it, even if you have a laptop presentation. “If it’s too complicated to draw, it’s too complicated to present.”
• Draw it top down, adding a few entities at a time.
• Constantly illustrate the model with sample instances, definitions, schematics, etc.
• Regularly highlight features and constraints of the model, in business terms. E.g., “Currently we can allocate a Product to one Product Category, but this model enables us to allocate a Product to multiple Product Categories at a time, and to record changes in categorization over time.”
• Encourage participation – the more questions and comments, the better!
The five techniques that really matter
1. Organize their minds to receive the presentation
   Why? Otherwise, you’re just “noise”: “Why is this person telling me these things?”
   How? “Here’s the point I want to make.” “This is why you care, and how I know.” (even if it’s obvious) “These are the caveats and limitations.” “This is how I’ll make my point.” (storyboard!)
2. Big picture first
   Why? Provides context and perspective. Makes subsequent detail understandable.
   How? Show the contextual data model first, build up detailed models later. Process context first, process flow later. Describe 5 problem areas first, specifics of each area later.
3. Do it live
   Why? Focuses, demands that they watch. Involves them / you. It means “attending has value”.
   How? Use memory triggers, not a script. Build up content progressively on white board, flip chart, or screen. Add brainstorming, discussion, or questions. Have them physically “do stuff”.
4. Present information in various forms
   Why? Adds interest. Different forms have different strengths.
   How? Supplement PowerPoint slides with flip charts, white boards, Post-Its, handouts, etc. Use props – the thing itself, not a description. Use visual, auditory, and kinesthetic approaches.
5. Show, then tell
   Why? Point is more meaningful if experienced firsthand. Saves time, simplifies.
   How? Scenario / example first, then concept / abstraction. Problem first, solution second. Thing first, description / discussion second.
Data Modeling in context with other BA techniques
Framework layer: what it covers (technique)
• Business Objectives: The mission, strategies (customers / markets, products / services, differentiators), goals, objectives, and measures (e.g., Key Performance Indicators) for the organisation (MSGO – Mission, Strategies, Goals, Objectives). Technique: Project Charter.
• Business Process: The activities the business carries out in order to meet its objectives. Includes the actors involved, the sequence of steps they carry out (workflow), and the result(s) produced. Provides context – a framework for developing Use Cases and Service Specifications. Technique: Workflow modelling.
• Presentation Services: A mechanism through which an actor in a business process interacts with a system. Usually a GUI (graphical user interface) and reports, but could involve scanners, IVR (telephone) systems, etc. Technique: Use Cases.
• Business Services: A “service” offered by a system – a specific function. Includes the business rules and data updates it is responsible for. Requires Event Analysis, State Transition Analysis, etc. Technique: Service Specification.
• Data Management Services: Files and databases that provide a system’s record-keeping functions. Determines the things a system “knows” about, and the data that is maintained about those things. Provides a platform – language and structure for developing Use Cases and Services. Technique: Data modelling.
(The figure’s side labels group these layers under Goals, Application, Process, and Data.)
THIS IS NOT A SEQUENCE!!! There should always be an initial emphasis on defining objectives (the
“top” layer) and also a “scope level” statement of the business processes, application functions, and data
topics / subject areas that are in scope. Also, we always do some “guerrilla” data modelling during
which we at least clarify the primary terms and definitions, and ideally develop at least an initial
conceptual model. After that, you could choose to go through the layers in whatever order makes sense
given the situation.
The benefits:
• Divide and conquer
• Everything in its place
• Cross-validation
Other terms:
• Presentation Services = User Interface
• Business Services = Application Logic or Business Logic
• Data Management Services = Persistence Services
Linkages – top-down and bottom-up
[Figure: one enrollment example traced through the layers.]
• Business Process: a workflow involving the Registrar’s Office and a Department Advisor, with steps including “Check Reg Form for data changes”, “Attach Reg Form and forward”, and “Enroll Student”, plus a Student Summary Report
• Presentation Services – Use Case (actor – verb – noun): Advisor Enrolls Student
   – When advisor enters five characters of Last Name, Then System lists matching Students
   – When advisor selects list item, Then System displays expanded Student view
   – When advisor etc.
• Business Services – Service (verb – noun): Enroll Student
   – Input Message: Student Number, Course ID, Section ID
   – Steps: Verify Student Status; Check Student pre-reqs; Check Section availability; Create Enrollment
   – Output Message: Result Code
• Data Management Services – Entity (noun): Student
   – Student (Number, Name, GPA) enrolls in Section (Dates, Times, Locations)
   – Course (Department, Number) offers Section; Instructor (ID, Name, Rating Code) teaches Section
Each layer interacts with its neighbor.
Not all methodologies address each perspective equally well.
• Information Engineering was weak to non-existent in addressing the business process (workflow)
and presentation (use cases) layers
• Most O-O and RAD/JAD techniques don’t address business process well, if at all
Noun - A thing of interest
• “Customer”
Verb – Noun
• An activity that must be performed (process, sub-process, service, …)
• “Register Customer”
Actor – Verb – Noun
• A Use Case or a step within a workflow model
• The intersection!
• “Sales Rep Registers Customer”
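Because every service and use case name ends in a noun, the naming pattern supports the cross-validation benefit listed earlier: each noun should have a home in the data model. A minimal sketch, assuming the noun is always the last word of the name (multi-word nouns such as “Product Category” would need a glossary lookup instead); the entity and name lists are illustrative only.

```python
# Illustrative data: entities from the enrollment example plus the
# "Register Customer" names above. Assumes the noun is the last word.
ENTITIES = {"Student", "Section", "Course", "Instructor"}
SERVICES = ["Enroll Student", "Register Customer"]                       # verb - noun
USE_CASES = ["Advisor Enrolls Student", "Sales Rep Registers Customer"]  # actor - verb - noun


def noun_of(name: str) -> str:
    """Return the noun of a verb-noun or actor-verb-noun name (its last word)."""
    return name.split()[-1]


def missing_nouns(names: list[str], entities: set[str]) -> list[str]:
    """Return the names whose noun has no matching entity in the data model."""
    return [n for n in names if noun_of(n) not in entities]


print(missing_nouns(SERVICES, ENTITIES))   # ['Register Customer']
print(missing_nouns(USE_CASES, ENTITIES))  # ['Sales Rep Registers Customer']
```

Each flagged name points either to a missing entity or to a naming inconsistency worth raising with the business.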
Progressive detail for all analysis techniques
Clariteq business analysis framework – three levels of detail for ALL modelling: Scope (Plan), Concept (Understand), and Detail (Specify).
Business Objectives – Project Charter: starts at “Scope” level, may evolve.
Business Process – Workflow modelling
• Scope: Overall Process Map showing target and related processes. Process “framed,” and initial assessment and goals stated.
• Concept: As-is (and later, to-be) Workflow Models for the process’ main variations (cases) to the Handoff level.
• Detail: As-is Workflow Models to the appropriate detail, and to the Service level for to-be. Optionally, document procedures for manual to-be steps.
Presentation Services – Use Cases
• Scope: List of the main Use Cases in the form: Actor + Service + (optionally) Technology / Platform.
• Concept: Initial Use Case description (goal, stakeholder interests, and use case abstract) for each Use Case.
• Detail: Use Case dialogues at the “clause” (“when-then”) level of detail, including alternate sequences. Optionally, Use Case Scenarios.
Business Services – Service Specification
• Scope: List of main Events and corresponding Services.
• Concept: Initial Service description – result, main actions, cross-referenced to Conceptual Data Model.
• Detail: Each service fully documented, including input/output messages, validation, business rules, and data updates to the attribute level.
Data Management Services – Data modelling
• Scope: Contextual Data Model (optional) and a glossary defining the main entities and other important terms.
• Concept: Conceptual Data Model showing main entities, relationships, attributes, and constraints.
• Detail: Fully normalised Logical Data Model with all attributes fully defined and documented.
The reason that the “concept” level is important, and that we don’t dive right into the “detail” level is
that…
the level of precision, rigor, and detail that you need in order to build something
is far greater and different in nature than that which is necessary for the business person to know if
they’re going to like what you build!
Different roles for different perspectives
[Figure: the same grid as “Progressive detail for all analysis techniques” (Business Objectives, Business Process, Presentation Services, Business Services, and Data Management Services, each at the Scope, Concept, and Detail levels), annotated with roles.]
• Scope (Plan): Planners, Enterprise Architects, and Business Analysts
• Concept (Understand): Business Analysts
• Detail (Specify): a Specialist for each technique (workflow modelling, use cases, service specification, data modelling)
Note – this is just one possibility for roles.
On a smaller project, the same person might work on all perspectives at all levels of detail; the larger the project, the more likely it is that different, specialized roles will be involved.
Other perspectives improve data modeling
• Mission, Strategy, Goals, Objectives: reporting requirements – EIS, BI, OLAP, etc. needs. Is the data there?
• Business Process Workflow: similar to use of events or services – inspect each step in the workflow and discuss data needs. Is the necessary data in the data model?
• Presentation Services: develop use cases, describe reports & queries. Is the necessary data in the data model?
• Business Services: describe rules for an event (service). Is the necessary data in the data model?
• Data Management Services: get some real data and conduct data profiling. Does the data have a home? Did profiling uncover “hidden” needs?
Techniques and methodologies
The same techniques are used in different sequences, with different emphasis, in different methodologies
Always start with
• scope and objectives (Project Charter)
• agreement on a fundamental vocabulary (a little Data Modeling)
Small projects are often best handled “inside-out” and are more suitable for “Agile” techniques
• start by identifying the main objects the system will deal with (Data Modeling)
• then identify the events and services that act on the main objects (Events, Service Specifications,
State Transitions)
• then identify how these Services will be invoked (Use Cases, then overall Process Workflow)
Large projects are best handled “outside in” and aren’t suitable for all Agile techniques
• start with an understanding of the overall workflow and the jobs or departments that are involved
(Process Workflow)
State diagrams
The concept
• Events happen
• Whether or not an event is legitimate depends on the entity’s current state
• If the event is legitimate, one or more entities will be updated and their state may change – a state transition
Depicts the allowable states for an entity, the transitions between them, and the rules governing those transitions
No other style of diagram depicts so many important aspects of a system without getting unreadable.
A State Diagram encompasses:
• an entity
• events
• entity states
• allowable state transitions (business rules!)
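The ingredients listed above (an entity, events, states, and allowable transitions) map naturally onto a transition table keyed by current state and event. A minimal sketch, using None for the null state; the two Section transitions are taken from the labels of the Section example that follows, and the function and exception names are hypothetical.

```python
# Transition table: (current state, event) -> resulting state.
# None represents the null state (occurrence not yet created).
SECTION_TRANSITIONS = {
    (None, "Section is scheduled"): "Scheduled",
    ("Scheduled", "Time to open enrollment"): "Available",
}


class InvalidEvent(Exception):
    """Raised when an event is not legitimate for the entity's current state."""


def apply_event(state, event, transitions=SECTION_TRANSITIONS):
    """Return the resulting state, or raise InvalidEvent if the event is not allowed."""
    try:
        return transitions[(state, event)]
    except KeyError:
        raise InvalidEvent(f"{event!r} is not valid in state {state!r}") from None


state = apply_event(None, "Section is scheduled")      # creation leaves the null state
state = apply_event(state, "Time to open enrollment")
print(state)  # Available
```

Any (state, event) pair missing from the table is, by construction, a business rule saying “not allowed here”.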
The basic pattern
[Figure: state diagram for Section. States: Scheduled, Available, Filled, Closed, Completed, Cancelled. Events: Section is scheduled; Time to open enrollment; Student enrolls; Student drops/transfers; Time to finalize rosters; Time to end term; Section is canceled.]
Notes on the figure:
• It starts with an entity occurrence in the null state, and leaves the null state when the occurrence is created.
• States are entered and left in response to events. All states “matter”, and are mutually exclusive.
• Eventually, states are entered where no further update is possible.
Key Point
• The diagram is linear or circular
All entity state diagrams begin with the entity in the null state, and the first event is always something
that causes the creation of the entity occurrence.
An entity can be in one and only one state at a time - states are mutually exclusive. The most common
error when people are learning this technique is to come up with “overlapping” states.
It’s common to return to the null state if the entity occurrence is deleted, although this example doesn’t
show it (the Registrar saves everything!).
All states “matter” in the sense that the only reason for a state to exist is to enforce a business rule. For
instance, it appears that Students can’t drop or transfer once the Class is “Closed”, and the Class can’t
be cancelled. If these rules weren’t in place, we wouldn’t need the state “Closed”.
Note that this example is different from the one on the previous page, even though they’re for the same
entity – the reason: different business rules.
Why bother?
Systems perspective:
• Get up-front agreement on the rules that must be enforced at the UI (use cases) and in Business Services (service specifications)
• Integrates events, services, and data modeling
Business perspective:
• Understandable – business people can participate in important systems decisions
• “See” and assess rules for the first time
• Identify inconsistent or undefined rules
Key Point
• Clients get started with almost no explanation
… this may seem like extra work, BUT…“pay me now, or pay me later”
The state diagramming technique, in practice, is quite intuitive for clients to pick up. We’ve been at
many sessions where the facilitator drew a simple state diagram on the whiteboard and clients
immediately started discussing and correcting it with no explanation whatsoever of the technique.
It never fails to amaze (and amuse) us how many different versions of “the rules” there are in the
average organization. Naturally, everyone thinks their set of rules is correct, and they are usually
surprised at the alternatives.
Four basic structures
[Figure: state diagram for Employment Term, labelled with the four basic structures: 1. null state, 2. state, 3. state transition, 4. event.
States: Probation, Active, Disability, Inactive.
Events: Employee is hired; Employee is put on Probation; Probation term is extended; Employee passes probation; Employee goes on disability; Employee returns from disability; Employment is terminated; Employment Term is Purged.]
This example is circular, which is less common now – it gets quite awkward.
Can you spot the error?
Components: 1 - The Null State
[Figure: the simplest life cycle. The entity moves from the null state to “entity exists” on Create (birth), can be updated while it exists (e.g., Update – pay taxes), and returns to the null state on Delete (death).]
• The null state is the entity in a state of non-existence (it hasn’t been created yet): an occurrence which hasn’t been created, for a single instance of the entity
• It indicates which entity’s life cycle is depicted
In the UML, the state diagram begins with a solid (filled in) circle.
Components: 2 - States
A distinct stage in the life of an entity
• A status or condition
• Events are only valid against particular states
• The only reason a state is created is to enforce a business rule
• States are mutually exclusive
• May be determined by inspecting relationships or attribute values
• Usually summarized in a “Status” or “State” attribute
Examples:
• Order: Taken → Shipped. An order can’t be cancelled once it has been shipped, so we only need the states “Taken” and “Shipped”.
• Order: Taken → Picked → Loaded → Shipped. An order can be cancelled without penalty if picked, with penalty if loaded, and not at all if shipped.
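The second Order example shows why a state exists only to enforce a business rule: “Picked” and “Loaded” are separate states because cancellation is treated differently in each. A minimal sketch of that rule; the slide does not say what happens to an order that is only “Taken”, so treating it like “Picked” is an assumption, and the names are hypothetical.

```python
ORDER_STATES = ("Taken", "Picked", "Loaded", "Shipped")

# Outcome of a cancellation request in each state. "Taken" is ASSUMED to behave
# like "Picked"; "Shipped" has no entry because a shipped order is not cancelable.
CANCEL_OUTCOME = {
    "Taken": "cancelled, no penalty",
    "Picked": "cancelled, no penalty",
    "Loaded": "cancelled, with penalty",
}


def cancel_order(state: str) -> str:
    """Apply the cancellation rule for an Order in the given state."""
    if state not in ORDER_STATES:
        raise ValueError(f"unknown Order state {state!r}")
    return CANCEL_OUTCOME.get(state, "rejected: not cancelable")


for s in ORDER_STATES:
    print(f"{s:<8} -> {cancel_order(s)}")
```

If “Picked” and “Loaded” ever produced the same outcome, one of them would no longer matter and could be merged, which is exactly the test the slide describes.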
Components: 3 - State Transitions
A change of an entity instance from one state to another
• Shows pre-conditions – which state(s) an event is valid against (the “from” state)
• Shows post-conditions – which state(s) result from an event (the “to” state)
• Depicts dependencies of entity states
Key Point
• Visual business rules
State transitions - special cases
[Figure: special cases on the Class life cycle (events Schedule Class, Enroll Student, Cancel Class, Complete Class, Purge Class; states Enrolling, Filled, Cancelled, Completed), with transitions marked “recursive” and “simple”.]
• Conjunction: an event may be valid from multiple states with the same resultant state (from A or from B, the event leads to C)
• Bifurcation: from a given state, an event can have different outcomes (from A, the event leads to B or to C)
Bifurcation often occurs at the “boundary condition” of a repetitive operation.
e.g., each completed Enrollment leaves the Class in “Enrolling” until the Class is full, when it moves to “Filled”.
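Both special cases can be sketched in a few lines. The state names follow the Class figure, but which states Cancel Class is valid from, and the capacity rule that drives Enroll Student, are assumptions made for illustration; the function names are hypothetical.

```python
def cancel_class(state: str) -> str:
    """Conjunction: valid from several states, always ending in Cancelled."""
    if state in ("Enrolling", "Filled"):   # ASSUMED "from" states
        return "Cancelled"
    raise ValueError(f"Cancel Class is not valid in state {state!r}")


def enroll_student(state: str, enrolled: int, capacity: int) -> tuple[str, int]:
    """Bifurcation: from Enrolling, the same event leads to Enrolling or Filled."""
    if state != "Enrolling":
        raise ValueError(f"Enroll Student is not valid in state {state!r}")
    enrolled += 1
    # Boundary condition of a repetitive operation: the class stays in
    # Enrolling until the enrollment that fills it.
    return ("Filled" if enrolled >= capacity else "Enrolling"), enrolled


state, count = "Enrolling", 0
while state == "Enrolling":
    state, count = enroll_student(state, count, capacity=3)
print(state, count)         # Filled 3
print(cancel_class(state))  # Cancelled
```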
Components: 4 - events / services
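[Figure: the Class life cycle drawn twice, with states Enrolling Class, Filled, Cancelled, and Completed. In the first version the transitions are labelled with events (Class is scheduled, Enrollment is completed, Class is canceled, Class is completed); in the second, with the corresponding services (Schedule Class, Complete Enrollment, Cancel Class, Complete Class).]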
Events or services can be shown as the cause of the state transition
State analysis is an ideal “bottom-up” means of discovering additional services
Key Point
• You can show events or services or both
Guidelines – a diagram for each entity?
• Perform state analysis for all Kernel and major Associative entities
• Subtypes may all be covered by the Supertype’s life cycle, or may each have their own unique life cycle
• Type and minor Characteristic entities with a simple “Create-Update-Delete” life cycle may not warrant a diagram
[Figure: an insurance example with Client, Policy, Claim, Prior Address, and Policy Type; Policy subtypes shown include Home, Marine, and Auto, and Individual and Group.]
Guidelines – an event can affect multiple entities
• An event affecting a characteristic or associative entity is often constrained by a parent’s state (and, less often, vice versa)
• An event changing the state of an entity may also cause a state change in a parent or child entity
[Figure: Class (Schedule Class → Enrolling, Complete Enrollment → Filled), Enrollment (Complete Enrollment → Active), and Student. “Enrollment is completed” is constrained by the state of a parent entity… and also causes a state change in its parent’s life cycle.]
Key Point
• Start ST analysis at the “bottom” – with entities that have no dependents
Class and Enrollment each have their own life cycles, but they are related
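The cross-entity rule can be sketched as one service that checks the parent’s state before creating the child, then updates the parent. A minimal sketch, assuming a fixed class capacity triggers the move to “Filled”; the dataclasses and the complete_enrollment function are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Enrollment:
    student_id: int
    state: str = "Active"          # created directly into Active (leaves the null state)


@dataclass
class Class:                       # the entity "Class" (a legal Python name)
    capacity: int                  # ASSUMED rule: reaching capacity means Filled
    state: str = "Enrolling"
    enrollments: list = field(default_factory=list)


def complete_enrollment(a_class: Class, student_id: int) -> Enrollment:
    """Create an Enrollment, constrained by and possibly changing the parent Class."""
    if a_class.state != "Enrolling":               # constrained by the parent's state
        raise ValueError(f"Class is {a_class.state}; enrollment is not allowed")
    enrollment = Enrollment(student_id)            # child: null -> Active
    a_class.enrollments.append(enrollment)
    if len(a_class.enrollments) >= a_class.capacity:
        a_class.state = "Filled"                   # parent: Enrolling -> Filled
    return enrollment


c = Class(capacity=2)
complete_enrollment(c, 101)
complete_enrollment(c, 102)
print(c.state)  # Filled
```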
Building a state diagram
1 – First Cut
• Get event list for entity
• Brainstorm for valid states
• Select “mainstream” states
• Start at null state, then select initial state from list
• Ask “What typically happens next?”, and select next state
• Continue until initial State Diagram is done
2 – Refine
• Ensure that states are mutually exclusive
• Identify the event for each state transition
• Ask “Can it cause transitions to or from other states?” (e.g., conjunction or bifurcation)
• Check each event to see if it is constrained by or affects the state of parent or child entities
• If sub-types are involved, check whether the state diagram works for all sub-types
Key Point
• Mainstream first, exceptions later
• “Bottom up” – dependents first, parents later
Key Point
• Extremely iterative within and between state diagrams
3 – Complete
• Add remaining “non-mainstream” states or events
• Check each event against each state
• Eliminate unused states & events as appropriate
Key Point
• Lots of detailed cross checking
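The “check each event against each state” step amounts to filling in an event × state grid and looking for states or events that never take part in a transition. A minimal sketch over a hypothetical Class transition table; the “Suspended” state and “Reschedule Class” event are deliberately left unused so the check has something to find.

```python
# Hypothetical transition table for Class: (from state, event) -> to state.
TRANSITIONS = {
    (None, "Schedule Class"): "Enrolling",
    ("Enrolling", "Complete Enrollment"): "Filled",
    ("Enrolling", "Cancel Class"): "Cancelled",
    ("Filled", "Cancel Class"): "Cancelled",
    ("Filled", "Complete Class"): "Completed",
}
STATES = ["Enrolling", "Filled", "Cancelled", "Completed", "Suspended"]
EVENTS = ["Schedule Class", "Complete Enrollment", "Cancel Class",
          "Complete Class", "Reschedule Class"]


def grid(states, events, transitions):
    """Every (state, event) pair mapped to its resulting state, or None if invalid."""
    return {(s, e): transitions.get((s, e)) for s in states for e in events}


def unused(states, events, transitions):
    """States never entered or left, and events that cause no transition."""
    touched = {s for s, _ in transitions} | set(transitions.values())
    used_events = {e for _, e in transitions}
    return ([s for s in states if s not in touched],
            [e for e in events if e not in used_events])


for (s, e), result in grid(STATES, EVENTS, TRANSITIONS).items():
    if result is not None:
        print(f"{s:<10} {e:<20} -> {result}")
print(unused(STATES, EVENTS, TRANSITIONS))  # (['Suspended'], ['Reschedule Class'])
```

The empty cells of the grid are just as important as the filled ones: each is a rule (“this event is not allowed in this state”) that should be confirmed with the business.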
A checklist for state analysis
1. Every state must matter
   • Recognizable to business people
   • Restricts operations in some unique way
2. All states must be mutually exclusive
3. Each event is “essential”, e.g., “Enrollment is completed” (what), not “Student enrolls via web” (who and how)
4. Start with the “most dependent” entity (bottom of the data model) to guard against “overloading” life cycles
5. All states (including parent and child entities) checked against each event
6. Mainstream first… exceptions later!
Update service specs
1. Create new services for any newly-discovered events.
2. For each service, build a “state table” summarizing “from” and “to” states for each entity impacted by the event.
3. Refine validation, calculations, and updates in service documentation. Optionally, describe logic with a UML Activity Diagram or other format.
Example state table:
Entity                State Before         State After
Student               Registered           Registered
Enrollment (“from”)   Active               Ended
Class (“from”)        Filled, Available    Available
Enrollment (“to”)     Null                 Active
Class (“to”)          Available            Available, Filled
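A state table like this can double as an executable check on the service’s pre-conditions and post-conditions. A minimal sketch, assuming each cell is a set of allowed states and None stands for the null state; the dictionary keys mirror the table rows, and the function names are hypothetical.

```python
# Each row: (allowed states before the event, allowed states after it).
# None stands for the null state.
STATE_TABLE = {
    "Student":           ({"Registered"}, {"Registered"}),
    "Enrollment (from)": ({"Active"}, {"Ended"}),
    "Class (from)":      ({"Filled", "Available"}, {"Available"}),
    "Enrollment (to)":   ({None}, {"Active"}),
    "Class (to)":        ({"Available"}, {"Available", "Filled"}),
}


def check_preconditions(current: dict) -> list[str]:
    """Roles whose current state does not allow the event (empty list = OK)."""
    return [role for role, (before, _) in STATE_TABLE.items()
            if current.get(role) not in before]


def check_postconditions(result: dict) -> list[str]:
    """Roles whose resulting state is not an allowed 'after' state."""
    return [role for role, (_, after) in STATE_TABLE.items()
            if result.get(role) not in after]


before = {"Student": "Registered", "Enrollment (from)": "Active",
          "Class (from)": "Filled", "Enrollment (to)": None,
          "Class (to)": "Available"}
after = {"Student": "Registered", "Enrollment (from)": "Ended",
         "Class (from)": "Available", "Enrollment (to)": "Active",
         "Class (to)": "Filled"}

print(check_preconditions(before))                              # []
print(check_postconditions(after))                              # []
print(check_preconditions({**before, "Class (to)": "Filled"}))  # ['Class (to)']
```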
Exercise: Handling state transitions
1) Design a generalized data model to record valid state transitions. If a particular response is required (such as an error message) when an invalid event
arrives, be sure to handle that as well.
2) (Optional) It can provide useful analytic information
to maintain a history of state changes for the instances of important entities. For example, in the actual project that the stock exchange exercise earlier in the course was based on, it was useful to
have a history of state changes for the “Listing” and “Trade Order” entities. Develop a data model to record a history of state changes for an entity like “Listing”.
Solution - Valid state transitions
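The solution diagram is not reproduced in this extract. As an illustration only, and not the workshop’s published answer, here is one possible shape for the data model, sketched in SQLite: a table of valid transitions, a table of responses for invalid events, and a history table for the optional part of the exercise. All table and column names and the error message are hypothetical, and the null state is stored as the literal 'Null'.

```python
import sqlite3

SCHEMA = """
CREATE TABLE valid_transition (
    entity_type TEXT NOT NULL,
    from_state  TEXT NOT NULL,          -- 'Null' = not yet created
    event       TEXT NOT NULL,
    to_state    TEXT NOT NULL,
    -- to_state is in the key so bifurcation can be recorded; choosing between
    -- outcomes would need a guard-condition column (not modelled here).
    PRIMARY KEY (entity_type, from_state, event, to_state)
);
CREATE TABLE invalid_event_response (
    entity_type TEXT NOT NULL,
    event       TEXT NOT NULL,
    message     TEXT NOT NULL,
    PRIMARY KEY (entity_type, event)
);
CREATE TABLE state_history (
    history_id  INTEGER PRIMARY KEY,
    entity_type TEXT NOT NULL,
    instance_id INTEGER NOT NULL,
    from_state  TEXT NOT NULL,
    event       TEXT NOT NULL,
    to_state    TEXT NOT NULL,
    changed_at  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
"""


def fire(conn, entity_type, instance_id, current_state, event):
    """Return (new_state, None) and log history, or (None, error message)."""
    row = conn.execute(
        "SELECT to_state FROM valid_transition "
        "WHERE entity_type = ? AND from_state = ? AND event = ?",
        (entity_type, current_state, event)).fetchone()  # assumes no bifurcation
    if row is None:
        msg = conn.execute(
            "SELECT message FROM invalid_event_response "
            "WHERE entity_type = ? AND event = ?", (entity_type, event)).fetchone()
        return None, (msg[0] if msg else f"{event} is not valid in {current_state}")
    conn.execute(
        "INSERT INTO state_history (entity_type, instance_id, from_state, event, to_state) "
        "VALUES (?, ?, ?, ?, ?)",
        (entity_type, instance_id, current_state, event, row[0]))
    return row[0], None


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO valid_transition VALUES (?, ?, ?, ?)", [
    ("Class", "Null", "Schedule Class", "Enrolling"),
    ("Class", "Enrolling", "Cancel Class", "Cancelled"),
    ("Class", "Filled", "Cancel Class", "Cancelled"),
])
conn.execute("INSERT INTO invalid_event_response VALUES (?, ?, ?)",
             ("Class", "Cancel Class", "Cancel Class is not allowed in the current state."))

print(fire(conn, "Class", 1, "Null", "Schedule Class"))   # ('Enrolling', None)
print(fire(conn, "Class", 1, "Completed", "Cancel Class"))
```

Keeping the rules as rows rather than code means a new transition or message is a data change, and the history table answers the “when did this Listing change state?” questions the exercise mentions.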
A few additional slides
I’ve added a few slides from our introductory Data Modeling workshop covering:
- Attribute naming with classwords
- Some conventions for assigning meaningless (surrogate) primary keys
- Checking for transitivity
These are some of the topics that often require clarification during the Advanced Data Modeling workshop.
Apply attribute naming conventions
Naming format: entity name (implied) + optional qualifiers + classword
Class word (abbreviation): description
• Amount (AMT): Dollars and cents, or other currency (e.g., Penalty Assessed Amount)
• Code (CDE): Decodes into a name and/or description via lookup (e.g., Vehicle Type Code)
• Constant (CNS): A fixed value, usually numeric (e.g., Pi Constant – 3.1415…)
• Count (CNT): Like Quantity, but specifically for a quantity of items (e.g., Requested Count or On Hand Count)
• Description (DSC): Multi-line descriptive text (e.g., Incident Description)
• Date (DTE): YYYY/MM/DD (e.g., Incident Date)
• Identifier (ID or IDN): Attribute that uniquely identifies an entity occurrence, usually system-generated (e.g., Customer ID)
• Indicator or Flag (IND or FLG): Yes/No (True/False) attribute (e.g., Time Period Available Flag)
• Name (NME): Single line of name text (e.g., First Name or Last Name)
• Number (NMB): A unique identifier assigned by an organization (e.g., Driver License Number)
• Secondary ID (SID): Forms a unique identifier when combined with identifiers inherited from the parent (e.g., Dependent SID)
• Percent (PCT): Integer or number percentage (e.g., Penalty Percent)
• Quantity (QTY): A count of anything, either items (like Count) or of a unit of measure like gallons or feet (e.g., Maximum Width Feet Quantity). Variations are Volume (VOL), Length (LNG), or Area (ARE)
• Rate (RTE): A ratio using a defined numerator and denominator (Percent is a Rate attribute with a denominator of 100)
• Text (TXT): Multi-line alphanumeric data other than Name or Description (e.g., Standard Disclaimer Text)
• Time (TME): HHMMSSNN… to the needed fraction of a second (e.g., Incident Time)
• Timestamp (TMS): Date and time in a single attribute (e.g., Record Creation Timestamp)
A variety of naming formats are in general use; mixed case with words separated by blanks (e.g., “Effective Date”) is the most readable.
There are certain date-related attributes that will occur many times in all models, such as “Effective Date”, “End Date”, “Create Date”, and “Superseded Date”. Agree on standard names (e.g., choose one of “Effective Date”, “Start Date”, or “Begin Date”) and then use them consistently.
An attribute definition should explain the meaning and purpose of the attribute; in other words, how to interpret attribute values. It should not be:
• … a restatement of the attribute name. For instance, for “Person Social Security Number”, the definition “The Social Security Number of a Person” tells us nothing new. A better definition would be “A number issued to wage earners by the Social Security Administration for the purpose of crediting employees with contributions to future retirement pay as stipulated in the Federal Insurance Contributions Act.”
• … a description of how the attribute is handled by current systems. For instance, “Budget Center Code is an 11-character code captured in the GL system and assigned to a Department.”
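A naming standard is easiest to keep when it is checked mechanically. A minimal sketch that verifies an attribute name ends in a class word from the table above and derives an abbreviated physical name; only single-word class words are handled, and the upper-case, underscore-separated physical style is an assumption, not part of the convention.

```python
from typing import Optional

# Single-word class words and abbreviations from the table above.
# Identifier may be abbreviated ID or IDN; "ID" itself is also accepted.
CLASS_WORDS = {
    "Amount": "AMT", "Code": "CDE", "Constant": "CNS", "Count": "CNT",
    "Description": "DSC", "Date": "DTE", "Identifier": "IDN", "ID": "ID",
    "Indicator": "IND", "Flag": "FLG", "Name": "NME", "Number": "NMB",
    "SID": "SID", "Percent": "PCT", "Quantity": "QTY", "Rate": "RTE",
    "Text": "TXT", "Time": "TME", "Timestamp": "TMS",
}


def class_word(attribute_name: str) -> Optional[str]:
    """Return the class word ending the name, or None if there isn't one."""
    last = attribute_name.split()[-1]
    return last if last in CLASS_WORDS else None


def physical_name(attribute_name: str) -> str:
    """Upper-case the qualifiers and replace the class word with its abbreviation."""
    cw = class_word(attribute_name)
    if cw is None:
        raise ValueError(f"{attribute_name!r} does not end in a class word")
    qualifiers = [w.upper() for w in attribute_name.split()[:-1]]
    return "_".join(qualifiers + [CLASS_WORDS[cw]])


print(physical_name("Penalty Assessed Amount"))  # PENALTY_ASSESSED_AMT
print(physical_name("Vehicle Type Code"))        # VEHICLE_TYPE_CDE
print(class_word("Customer Status"))             # None
```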
Primary keys – essential concepts
What they are:
• One or more attributes with a unique value for each instance of an entity
• There might be many identifiers – one is chosen as the primary identifier, the rest are alternate
• A way to reference an instance of an entity (e.g., a row of a table)
• Used to establish relationships between entities (or tables)
What they’re not:
• The only access or search path
• The fundamental way the business distinguishes:
   – one instance from another
   – a new instance from existing ones (e.g., a Customer applying for credit)
In short, how we relate entities is not necessarily how the client distinguishes or accesses them.
Possible keys:
• Customer: Customer Name + Postal Code; Sales Region + Customer Number; Account Number
• Part: Part Category + Manufacturer Prod #
• Employee: SIN or SSN; Name + Address; Name + Birthdate; Portrait + Voice
• Reservation: Room Number + Start Date
Assigning primary and foreign keys is really part of physical database design, but the concepts are
important so we’ll cover them here.
As modelers, we should focus initially on determining how the client determines the uniqueness of
entities, and how they search for particular instances.
What’s wrong with the possible primary keys shown above?
Meaningless primary keys
Almost invariably eliminates any choice except keys made up frommeaningless, system-generated ID or Secondary ID (SID) components
Customer:• Customer ID…is better than…• Customer Name +
Postal Code• Sales Region +
Customer Number• Account Number
Part:• Part ID…is better than…• Part Category +
Manufacturer Prod #
Employee:• Employee ID…is better than…• SIN• Name + Address• Name + Birthdate• Portrait + Voice
Reservation:• Reservation ID…is better than…• Room Number +
Start Date
stable (unchanging)
� under your control
� contains no meaningful data, because it will eventually change(and no “special values” like Customer Number 9999999)
� 'key hierarchy' is unchanging when an inherited key is used as part of identifier
available
� known, or can be assigned, at instance creation
Essential characteristics
Key problems:
• embedded meaning
– Customer 99999
– Customer ID with Head Office Region Code built in
• insufficient for expansion
– 1-digit code field
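The "stable" and "no meaningful data" characteristics can be illustrated with a small in-memory sketch. It assumes a hypothetical `itertools.count` generator standing in for a database sequence or identity column; the `Customer` and `Order` classes and their fields are illustrative only. The last part shows why a 1-digit code field is insufficient for expansion: an n-digit numeric code has only 10^n distinct values.

```python
import itertools
from dataclasses import dataclass, field

# Hypothetical in-memory ID source; a real system would normally use a
# database sequence or identity column instead.
_customer_ids = itertools.count(1)


@dataclass
class Customer:
    name: str
    postal_code: str
    # Assigned once, at creation; carries no business meaning and never changes.
    customer_id: int = field(default_factory=lambda: next(_customer_ids))


@dataclass
class Order:
    order_id: int
    customer_id: int  # refers to the customer only through the meaningless ID


acme = Customer("Acme Ltd", "V7V 1A1")
order = Order(order_id=1, customer_id=acme.customer_id)

# The customer is renamed and moves: its meaningful attributes change...
acme.name, acme.postal_code = "Acme Holdings", "V6B 2B2"
# ...but the order still points at the right customer, because the key holds no data.
assert order.customer_id == acme.customer_id


def code_capacity(digits: int) -> int:
    """Distinct values available in a fixed-width numeric code of `digits` digits."""
    return 10 ** digits


print(code_capacity(1))  # 10: a 1-digit code field runs out after ten values
```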
Keys - summary
• A key is a means of specifying a particular instance of an entity
• Typically:
  – Kernel: a system-assigned ID
  – Characteristic: the key of the parent, plus an SID
  – Associative: the keys of all parents, plus an SID if necessary (if the same parent instances can be associated multiple times); important associatives are often given their own ID (e.g., Order ID)
  – Reference or Type: a recognizable Code or a meaningless ID
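A minimal sketch of how these key patterns compose, assuming SIDs are numbered per parent as "highest existing SID + 1". The helper `next_sid`, the `count` generator standing in for system-assigned IDs, and the Employee/Project assignment (Project ID 10) are hypothetical; a real database would also need to guard SID assignment against concurrent inserts.

```python
from itertools import count

# Kernel: a hypothetical generator standing in for a system-assigned ID.
employee_ids = count(1)


def next_sid(existing_keys, parent_key):
    """Next Secondary ID (SID) under parent_key: one more than the highest SID
    already used for that parent, or 1 if the parent has no dependents yet."""
    sids = [sid for parent, sid in existing_keys if parent == parent_key]
    return max(sids, default=0) + 1


# Kernel: each Employee gets its own meaningless ID.
emp_a, emp_b = next(employee_ids), next(employee_ids)  # 1 and 2

# Characteristic: Employee Dependent key = (Employee ID, Emp. Dep. SID).
dependent_keys = []
for employee_id in (emp_a, emp_a, emp_b):
    dependent_keys.append((employee_id, next_sid(dependent_keys, employee_id)))
print(dependent_keys)  # [(1, 1), (1, 2), (2, 1)]

# Associative: key = keys of all parents, plus an SID because (in this
# hypothetical Employee/Project example) the same pair can be associated again.
assignment_keys = []
for parents in ((emp_a, 10), (emp_a, 10), (emp_b, 10)):
    assignment_keys.append((parents, next_sid(assignment_keys, parents)))
print(assignment_keys)  # [((1, 10), 1), ((1, 10), 2), ((2, 10), 1)]
```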
[Figure: sample data model with the entities Organization Unit, Job, Position, Building, Employee, and Employee Dependent. The Primary Key is shown above the dashed line in each entity box.
• Employee: Employee ID; Name, Address, Birth Date, Gov't ID Number
• Job: Job Code (PK); Title, Description
• Position: Org. Unit ID, Position SID; Building ID (FK), Job Code (FK)
• Organization Unit: Org. Unit ID
• Employee Dependent: Employee ID, Emp. Dep. SID; Name, Relationship Code, Birth Date
• Building: Building ID; Name, Address
Relationship names: is located at / is the location of; is assigned to / is filled by; is contained in / contains; classifies / is classified by. An "Alternate Key" label also appears.]
Callouts:
• Employee ID is an inherited key that forms part of the primary key of Employee Dependent in combination with the SID (Secondary ID). It also acts as a foreign key.
• "Job Code (PK)" is an alternate method of showing that the identifier of Job is Job Code.
• Building ID is a foreign key that implements the relationship to Building.
There can be many "candidate" or "alternate" keys, also referred to as "business identifiers" or "natural keys":
• for instance, Employee may have a unique Government ID Number, Employee Number, and System Logon ID
• one of these could be chosen as the Primary Key if it meets the criteria; otherwise (and normally) assign a system-generated identifier
• the rest are called Alternate Keys or something similar, and must also be unique (put a unique index on them)
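The "put a unique index on them" advice can be shown with SQLite through Python's standard `sqlite3` module. The table and column names below are hypothetical renderings of the Employee example; the point is that the primary key is a system-generated `INTEGER PRIMARY KEY`, while each alternate key gets its own `UNIQUE` index so the database rejects duplicates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (
        employee_id     INTEGER PRIMARY KEY,  -- system-generated primary key
        name            TEXT NOT NULL,
        gov_id_number   TEXT NOT NULL,
        employee_number TEXT NOT NULL,
        system_logon_id TEXT
    );
    -- Alternate keys: each must also be unique, so each gets a unique index.
    CREATE UNIQUE INDEX ak_employee_gov_id   ON employee (gov_id_number);
    CREATE UNIQUE INDEX ak_employee_number   ON employee (employee_number);
    CREATE UNIQUE INDEX ak_employee_logon_id ON employee (system_logon_id);
""")

insert = "INSERT INTO employee (name, gov_id_number, employee_number) VALUES (?, ?, ?)"
conn.execute(insert, ("Pat Lee", "123-456-789", "E001"))
try:
    # Same Government ID Number as Pat: rejected by the alternate-key index.
    conn.execute(insert, ("Sam Roe", "123-456-789", "E002"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)  # True
```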
Some methods use a "shorthand" technique for showing inherited keys in associative or characteristic entities: the relationship via which parent keys are inherited is marked as an "identifying" relationship. In one technique, an "I" is put across the relationship line; in another, identifying relationships are drawn with a solid line, while others ("non-identifying") are drawn with a dashed line. Normally, we show the complete, inherited primary key.
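In a physical schema, the difference between identifying and non-identifying relationships is whether the inherited key sits inside the primary key. This SQLite sketch (via Python's `sqlite3`) uses hypothetical table and column names based on the Employee Dependent and Position entities in the figure; it assumes Position inherits Org. Unit ID plus a Position SID, as its attribute list suggests. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE employee (employee_id INTEGER PRIMARY KEY);
    CREATE TABLE building (building_id INTEGER PRIMARY KEY);
    CREATE TABLE org_unit (org_unit_id INTEGER PRIMARY KEY);
    -- Identifying: the inherited Employee ID is part of the primary key
    -- and is also a foreign key.
    CREATE TABLE employee_dependent (
        employee_id INTEGER NOT NULL REFERENCES employee (employee_id),
        emp_dep_sid INTEGER NOT NULL,
        name        TEXT,
        PRIMARY KEY (employee_id, emp_dep_sid)
    );
    -- Position inherits Org. Unit ID (identifying), while Building ID is an
    -- ordinary, non-identifying foreign key outside the primary key.
    CREATE TABLE position (
        org_unit_id  INTEGER NOT NULL REFERENCES org_unit (org_unit_id),
        position_sid INTEGER NOT NULL,
        building_id  INTEGER REFERENCES building (building_id),
        PRIMARY KEY (org_unit_id, position_sid)
    );
""")

conn.execute("INSERT INTO employee (employee_id) VALUES (1)")
conn.execute("INSERT INTO employee_dependent VALUES (1, 1, 'Alex')")
conn.execute("INSERT INTO employee_dependent VALUES (1, 2, 'Jo')")
try:
    # No Employee 99 exists, so the inherited key has nothing to point at.
    conn.execute("INSERT INTO employee_dependent VALUES (99, 1, 'Orphan')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(orphan_rejected)  # True
```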
Key propagation rules
An exception - dependent entities (associative or characteristic) are assigned a meaningless ID if they
can be “transferred” to another parent, or if they are very deep in the hierarchy.
Also, if an associative entity only has one parent (e.g., “Order”, where the connection to the other
parent is via another dependent associative entity) it may get its own meaningless ID. This is often true
of associatives that represent an important transaction and are therefore almost like Kernels, e.g. Order,
Sale, Contract, Shipment, etc.
Note - keys always propagate to the “many” end of the relationship. How would you decide where to
place the foreign key in a fully optional 1:1 relationship?
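Key propagation to the "many" end, together with the exception for an important associative such as Order, can be sketched in SQLite. The tables below are hypothetical: Order (named `customer_order` because `ORDER` is an SQL keyword) has one direct parent and gets its own meaningless ID, while the dependent associative Order Line links it to a Product and inherits both parents' keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY);
    CREATE TABLE product  (product_id  INTEGER PRIMARY KEY);
    -- Order gets its own meaningless ID; Customer's key propagates to this,
    -- the "many" end, as a non-identifying foreign key.
    CREATE TABLE customer_order (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer (customer_id)
    );
    -- Order Line is the dependent associative linking Order to Product;
    -- it inherits the keys of both parents.
    CREATE TABLE order_line (
        order_id   INTEGER NOT NULL REFERENCES customer_order (order_id),
        product_id INTEGER NOT NULL REFERENCES product (product_id),
        quantity   INTEGER NOT NULL,
        PRIMARY KEY (order_id, product_id)
    );
""")

conn.execute("INSERT INTO customer VALUES (1)")
conn.execute("INSERT INTO product VALUES (500)")
conn.executemany("INSERT INTO customer_order VALUES (?, ?)", [(100, 1), (101, 1)])
conn.execute("INSERT INTO order_line VALUES (100, 500, 3)")

# One customer, many orders: each order row carries exactly one customer_id.
orders_for_customer = conn.execute(
    "SELECT COUNT(*) FROM customer_order WHERE customer_id = 1"
).fetchone()[0]
print(orders_for_customer)  # 2
```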
Whether you show the propagated foreign keys on your diagram or instead flag relationships as "identifying" is a matter of personal preference or organizational standards. In this workshop, we'll always show the propagated foreign keys.
How far to go?
Each of the above alternatives employs the concept of "meaningless identifiers", but differently:
• the one on the left assigns an ID to kernel entities, while associative and characteristic entities
inherit the ID of their parent(s)
• the one on the right assigns all entities a unique ID
In teams, discuss the relative strengths and weaknesses of the two approaches. Which would you
choose?
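For a single chain of characteristic entities, the two alternatives differ most visibly in key width. This sketch uses a hypothetical Building / Floor / Room hierarchy and a hypothetical helper `primary_key_columns`; it ignores associatives with several parents. Under the first approach the primary key grows by one SID column per level, which is why very deep dependents are sometimes given their own meaningless ID instead.

```python
def primary_key_columns(path, every_entity_gets_id):
    """Primary-key columns of the last entity in a kernel-to-dependent chain.

    every_entity_gets_id=False: only the kernel (path[0]) gets an ID, and each
    characteristic entity inherits its parent's key plus its own SID.
    every_entity_gets_id=True: every entity gets its own unique ID.
    """
    if every_entity_gets_id:
        return [f"{path[-1]}_id"]
    return [f"{path[0]}_id"] + [f"{entity}_sid" for entity in path[1:]]


chain = ["building", "floor", "room"]  # hypothetical three-level hierarchy
print(primary_key_columns(chain, every_entity_gets_id=False))
# ['building_id', 'floor_sid', 'room_sid']
print(primary_key_columns(chain, every_entity_gets_id=True))
# ['room_id']
```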
Transitivity
• A "loop" (two or more paths between a pair of entities) might indicate a problem:
  – if the two paths record the same information, one of the relationships is redundant, a.k.a. "transitivity" or "a transitive relationship"
  – like redundant attributes, redundant relationships introduce data integrity problems
• Are the two paths between "Order" and "Customer" transitive? We can't tell just by looking…
• The presence of a "loop" (a "cyclic relationship") is only a clue that there is a problem; for proof, we must perform Information Loss Analysis (fancy name, simple method)
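Finding the loops themselves is mechanical: a relationship lies on a loop if its two entities are still connected by some other path once that relationship is removed. The sketch below treats the model as an undirected graph of entity pairs; the function name is hypothetical, and "Customer Account" and "Sales Region" are invented entities standing in for a second path between Order and Customer. It only flags candidates; Information Loss Analysis is still needed to decide whether anything is actually redundant.

```python
from collections import defaultdict


def relationships_on_loops(relationships):
    """Return the relationships (entity pairs) that lie on a loop, i.e. whose two
    entities are still connected by another path once that relationship is removed."""

    def still_connected(start, goal, skipped):
        adjacency = defaultdict(set)
        for i, (a, b) in enumerate(relationships):
            if i != skipped:
                adjacency[a].add(b)
                adjacency[b].add(a)
        seen, stack = {start}, [start]
        while stack:  # depth-first search from start
            node = stack.pop()
            if node == goal:
                return True
            for neighbour in adjacency[node] - seen:
                seen.add(neighbour)
                stack.append(neighbour)
        return False

    return [rel for i, rel in enumerate(relationships) if still_connected(rel[0], rel[1], i)]


model = [
    ("Order", "Customer"),
    ("Order", "Customer Account"),
    ("Customer Account", "Customer"),
    ("Customer", "Sales Region"),
]
print(relationships_on_loops(model))
# [('Order', 'Customer'), ('Order', 'Customer Account'), ('Customer Account', 'Customer')]
```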
Checking for transitivity
• Check for transitivity using Information Loss Analysis:
  – one at a time, check each relationship in the loop
  – ask: "Could this relationship be eliminated without losing necessary information?"
  – if "Yes", the relationship is redundant and can be removed from the data model
  – if "No", the relationship is necessary and remains in the data model
• If the two paths have clearly different meanings, there is probably no redundancy, and therefore no need to apply Information Loss Analysis
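Sample data can support the Information Loss Analysis question, with one caveat: data can only disprove redundancy, never prove it. The sketch below asks whether each Order's direct Customer could be recovered by following an indirect path through a hypothetical Customer Account entity; the function name and all data are invented for illustration. An empty result means "no counterexample found in this sample", and the final answer still depends on the business rules.

```python
def path_disagreements(orders, accounts):
    """Order IDs where the direct Order -> Customer relationship disagrees with
    the indirect Order -> Customer Account -> Customer path."""
    return [
        order["order_id"]
        for order in orders
        if accounts[order["account_id"]]["customer_id"] != order["customer_id"]
    ]


# Hypothetical sample data.
accounts = {"A1": {"customer_id": 1}, "A2": {"customer_id": 2}}
orders = [
    {"order_id": 101, "account_id": "A1", "customer_id": 1},
    {"order_id": 102, "account_id": "A2", "customer_id": 2},
]
print(path_disagreements(orders, accounts))  # []: no counterexample in this sample

# If a customer may order against another customer's account, the paths diverge
# and dropping the direct relationship would lose information.
orders.append({"order_id": 103, "account_id": "A2", "customer_id": 1})
print(path_disagreements(orders, accounts))  # [103]
```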
Transitivity - examples