1
DAMA, 2001 December.
What’s Wrong With
ER Modeling?
Gordon C. Everest
Carlson School of Management
University of Minnesota
ORMvER
2
Problems and Solutions
OBJECTIVES FOR THIS PRESENTATION:
• Show several PROBLEMS with ER modeling schemes (actually, with any “record-based” modeling scheme).
• Identify the ROOT CAUSE of the problem.
To stop there would be irresponsible, so…
• Show you a better way: a SOLUTION using Object Role Modeling (ORM).
• NOT asking you to abandon what you have learned about data modeling and are doing in practice,
• BUT to defer thinking in terms of entity records, and to begin doing data modeling at a richer, more conceptual level.
3
Data Modeling
What’s Wrong with ER Modeling?
What is the Dominant Data Modeling Scheme today?
BEFORE WE CAN ANSWER THAT:
Why Do Data Modeling?
How do we do Data Modeling?
Why do we need Normalization?
DMOD
4
Database Design
Objective: (WHAT we are trying to do)
TO ACCURATELY AND COMPLETELY MODEL
SOME PORTION OF THE REAL WORLD UNIVERSE OF DISCOURSE (UoD)
OF INTEREST TO SOME ORGANIZATION OR COMMUNITY OF USERS.
5
Logical Database Design: Objective, Principles, Benefits
OBJECTIVE of LOGICAL DATABASE DESIGN: TO ACCURATELY AND COMPLETELY MODEL SELECTED PORTIONS OF THE REAL WORLD OF INTEREST TO A COMMUNITY OF USERS.
• USERS (COLLECTIVELY) WILL ALWAYS KNOW MORE ABOUT A DATA STRUCTURE THAN THE SYSTEM KNOWS, OR THAN COULD BE DEFINED TO THE SYSTEM.
• WHAT IS NOT FORMALLY DEFINED TO THE SYSTEM, THE SYSTEM CANNOT MANAGE . . . THE USERS MUST!
• THEREFORE, WE NEED TO CAPTURE RICH SEMANTICS WITH COMPREHENSIVE DATA MODELING AND DEFINITION, INCLUDING INTEGRITY CONSTRAINTS AND OPERATIONS,
FOR ==> GREATER QUALITY & RELIABILITY IN DATA
==> GREATER USER CONFIDENCE
==> HIGHER USER / DEVELOPER EFFICIENCY
Let the ‘system’ do it!
6
Purpose of Data Modeling (WHY we do it)
DUAL, CONFLICTING PURPOSES DRIVE THE PROCESS:
• Facilitate Human Communication, Understanding, & Validation
– capture and present meaning, the semantics of a model
– direct representation of only essential model semantics
PRESENTATION CHARACTERISTICS:
– scoping and presenting subparts of a Model
– unfolding presentation at different levels of abstraction or detail
– visual prominence in proportion to semantic importance
SECONDARY:
• Basis for Implementation: defining & creating a Database
– complete in all the necessary details
– construction/generation able to be fully automated
[Diagram: USER → SCHEMA → DATABASE]
7
Modeling: (Re)present(ation)
[Diagram: Reality → MODELING PROCESS → MODEL, spanning knowledge in the world, knowledge in the head (mental models), and knowledge externalized, formalized, shared.]
What drives or guides the process?
8
The Modeling Process
[Diagram: Real World Universe of Discourse → (perception, selection/filtering) → MODELING PROCESS → MODEL]
The MODELING PROCESS is guided by:
• MODELING SCHEME: Context, Constructs, Composition, Constraints
• METHODOLOGY: Steps/Tasks + Milestones + Deliverables + REPRESENTATIONAL FORMS (the Syntax): Narrative, Graphical Diagram, Formal Language Statements
9
A Data Modeling “Scheme”
DEFINES the:
• Context
• Constructs (ENTITIES, OBJECTS)
• Collections, Compositions, Connections (RELATIONSHIPS)
• Constraints, Characteristics
THAT WE LOOK FOR IN THE “REAL” WORLD UoD (Domain of Interest)
AND USE IN BUILDING A DATA MODEL.
10
Data Modeling Constructs
• ENTITY (OBJECT), with its characteristics
• ATTRIBUTE
• RELATIONSHIP, with its characteristics
• IDENTIFIER [FOREIGN KEY]
What to look for: the relative emphasis on these constructs differentiates Data Modeling approaches.
11
Student-Course Database: Table Diagram
COURSE: Course#, Title, Description, Credits
INSTRUCTOR: SSN, LastName, FirstName, Address, Phone, Dept
STUDENT: Student ID, Name, Address, Major, GPA
COURSE OFFERING: Course#, Year, Term, Section, Building, Room, Days, Time Start, Control, Enrollment, Instructor SSN
REGISTRATION: Course ID, Student ID, Grade
LEGEND:
ENTITY NAME (upper case)
Identifier (bold face)
Attributes (not bold face)
Foreign Key Identifier → M:1 relationship (arrow)
Diagram of the Schema: What if you move the arrow head to the other end of the arc?
12
Student-Course Database – Populated: actual instances of data values
COURSE:
Course#  Title                Credits
ACC101   Intro Accounting     4
ENG101   English Composition  4
MIS101   Intro MIS            4
MIS103   Intro Database       4
MIS403   Advanced Database    2
…
INSTRUCTOR:
InstrID  Name             Dept
33741    Allen, Lillian   Eng
85959    Boyd, Don        ACC
64578    Carlis, John     CSci
11248    Davis, Gordon    IDS
77004    Everest, Gordon  IDS
55432    Fine, Alan       IDS
…
STUDENT:
StudentID  Name            Major  GPA
1111111    Able, Emma      MIS    3.4
2222222    Bright, Sue     MIS    3.9
3333333    Challenger, X   ACC    2.7
4444444    Dummie, Noe     ACC    3.2
5555555    Everest, Monty  MIS    3.8
…
COURSE OFFERING (Secondary (Composite) Key: Course# + Year + Term + Sect):
CRSO#  Course#  Year  Term  Sect  Room   InstrID  Enroll
1004   MIS101   2000  Fall  001   1-142  11248    48
1017   MIS101   2001  Spr   002   2-224  55432    60
3001   MIS103   2000  Fall  001   2-207  77004    27
…
REGISTRATION:
CRSO#  StudentID  Grade
1004   4444444    B+
1017   3333333
3001   1111111    B
3001   2222222    A
3001   5555555    A
3001   7777777    A-
…
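A minimal sketch of the populated Student-Course tables as SQL DDL, using Python's built-in sqlite3 module. The snake_case column names and the column types are assumptions made for illustration and do not come from the original schema. The sketch uses CRSO# as the primary key of COURSE OFFERING and declares Course# + Year + Term + Sect as the secondary (composite) key. It loads only a subset of the rows shown. Registration 3001/7777777 is left out because that student is not among the visible STUDENT rows, and the foreign keys would reject it.

```python
import sqlite3

# Hypothetical DDL for part of the Student-Course database.
SCHEMA = """
CREATE TABLE course (
    course_no TEXT PRIMARY KEY,
    title     TEXT NOT NULL,
    credits   INTEGER NOT NULL
);
CREATE TABLE instructor (
    instr_id INTEGER PRIMARY KEY,
    name     TEXT NOT NULL,
    dept     TEXT
);
CREATE TABLE student (
    student_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    major      TEXT,
    gpa        REAL
);
CREATE TABLE course_offering (
    crso_no   INTEGER PRIMARY KEY,               -- primary (control) key
    course_no TEXT NOT NULL REFERENCES course,   -- foreign key: M:1 to COURSE
    year      INTEGER NOT NULL,
    term      TEXT NOT NULL,
    sect      TEXT NOT NULL,
    room      TEXT,
    instr_id  INTEGER REFERENCES instructor,     -- foreign key: M:1 to INSTRUCTOR
    enroll    INTEGER,
    UNIQUE (course_no, year, term, sect)         -- secondary (composite) key
);
CREATE TABLE registration (
    crso_no    INTEGER NOT NULL REFERENCES course_offering,
    student_id INTEGER NOT NULL REFERENCES student,
    grade      TEXT,
    PRIMARY KEY (crso_no, student_id)
);
"""


def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")     # SQLite enforces FKs only when asked
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO course VALUES (?, ?, ?)",
                     [("MIS101", "Intro MIS", 4), ("MIS103", "Intro Database", 4)])
    conn.executemany("INSERT INTO instructor VALUES (?, ?, ?)",
                     [(11248, "Davis, Gordon", "IDS"), (55432, "Fine, Alan", "IDS"),
                      (77004, "Everest, Gordon", "IDS")])
    conn.executemany("INSERT INTO student VALUES (?, ?, ?, ?)",
                     [(1111111, "Able, Emma", "MIS", 3.4), (2222222, "Bright, Sue", "MIS", 3.9),
                      (3333333, "Challenger, X", "ACC", 2.7), (4444444, "Dummie, Noe", "ACC", 3.2),
                      (5555555, "Everest, Monty", "MIS", 3.8)])
    conn.executemany("INSERT INTO course_offering VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
                     [(1004, "MIS101", 2000, "Fall", "001", "1-142", 11248, 48),
                      (1017, "MIS101", 2001, "Spr", "002", "2-224", 55432, 60),
                      (3001, "MIS103", 2000, "Fall", "001", "2-207", 77004, 27)])
    conn.executemany("INSERT INTO registration VALUES (?, ?, ?)",
                     [(1004, 4444444, "B+"), (1017, 3333333, None),
                      (3001, 1111111, "B"), (3001, 2222222, "A"), (3001, 5555555, "A")])
    return conn


def grades_for(conn: sqlite3.Connection, crso_no: int) -> list:
    """Student names and grades for one course offering, joined via the foreign key."""
    return conn.execute(
        """SELECT s.name, r.grade
           FROM registration AS r JOIN student AS s ON s.student_id = r.student_id
           WHERE r.crso_no = ?
           ORDER BY s.name""", (crso_no,)).fetchall()


conn = build_db()
print(grades_for(conn, 3001))  # [('Able, Emma', 'B'), ('Bright, Sue', 'A'), ('Everest, Monty', 'A')]
```

The UNIQUE clause lets the database reject a second offering with the same Course#, Year, Term, and Sect, even though each offering row also carries its own CRSO#.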
13
Data Modeling – Schema Diagram
THINKING ABOUT ATTRIBUTES:
Record-Based:
ENTITY IDENTIFIER ATTRIBUTE ATTRIBUTE ...
14
Essentials of ER Modeling / Diagramming
ENTITY1 ---< RELATIONSHIP >--- ENTITY2   (relationship drawn as a diamond)
ENTITY1: IDentifier 1 (key); Attribute 1.1, Attribute 1.2, Attribute 1.3, …
ENTITY2: IDentifier 2 (key); Attribute 2.1, Attribute 2.2, Attribute 2.3, ForeignID 1, …
ENTITY1 : ENTITY2 is 1 : M, represented by ForeignID 1 in ENTITY2.
Adding Attributes, omitting the Diamond:
ENTITY IDENTIFIER ATTRIBUTE ATTRIBUTE ...
[Diagram: ENTITY1 and ENTITY2 boxes joined by a named line ("relationship>"); each box lists an identifier and Attribute1, Attribute2, Attribute3.]
15
What’s wrong with
ER Modeling?
16
ER / Record-based Modeling
[Diagram: a TABLE (ID, ATTRIBUTES . . .) whose attribute columns play roles drawn from separate VALUE DOMAINs]
CLUSTERING of ATTRIBUTES into RECORDS/RELATIONS
– NOT a necessary or desirable first step
– gets us into trouble: if we cluster too much, we must decompose to normalize
X A B C D
17
Record-based Design
WHAT SEMANTICS ARE PRESUMED BY THE FOLLOWING RECORD STRUCTURE?
X A B C
• What does it say about X?
• What does it say about A?
• What does it say about the relationship X–A?
• What does it say about the relationship A–B?
There are at least 14 distinct semantic statements you can make in answering these questions!
• Do we know it is in Third Normal Form (3NF)? How?
18
Record-based Design
WHAT DOES IT SAY ABOUT X ?
X A B C
19
Record-based Design
WHAT DOES IT SAY ABOUT A ?
X A B C
20
Record-based Design
WHAT DOES IT SAY ABOUT THE RELATIONSHIP X–A ?
X A B C
21
Record-based Design
REPRESENTING THE RELATIONSHIP X–A ?
X A B C
A D ...
22
Record-based Design
WHAT DOES IT SAY ABOUT THE RELATIONSHIP A–B ?
X A B C
23
Record-based Design
REPRESENTING COMPLEX RELATIONSHIPS AMONG X, A, & B.
X A B C
A ...
B A? ...
? Separately consider the relationship between A and B.
• What if it is many-to-many?
• What if other information is functionally dependent on A–B?
24
Record-based Design - Compound Key
WHAT IS PRESUMED BY THE FOLLOWING RECORD STRUCTURE?
X Y A B C
25
Major Data Modeling Schemes
RECORD-BASED (Clustered Data Items):
(1) SINGLE FILE (E-A): FLAT FILE “TABLE”; HIERARCHICAL, nested repeating groups (e.g., COBOL)
(M) MULTIFILE (E-R → E-A-R): NETWORK, hierarchical records; RELATIONAL (E-A-[R]), flat records
(O) NO FILE (O-R) (No Clustering of Data Items into Records): NIAM/“Binary” Modeling; ORM (Object-Role Modeling, Halpin)
Everest-DM-4p.121.
26
Data Modeling Schemes
CLASSIFIED by Degree of Clustering:
• No clustering
– NIAM/ORM - Nijssen, Halpin
• Clustering to One Level => Atomic Data Values
– Relational Modeling - Codd
– ER Modeling - Chen
– Extended ER (EER) - Teorey
– Information Engineering (IE) - Clive Finkelstein -> James Martin
– Oracle (Designer*2000) - Barker
– IDEF1X - Appleton, US Gov’t, ERwin (tool), Bruce (book)
• Nested Objects
– Hierarchical data structure (single file; COBOL)
– CODASYL Network (ANSI NDL)
– Nested Relations
– Semantic Object Modeling (SOM) - Kroenke, Salsa (tool)
– Object Modeling (UML) - Rational Rose (tool)
– ANSI SQL:1999
27
Data Modeling Schemes – Clustered
• HIERARCHIC: single file; nested repeating groups; implicit hierarchical relationships
• NETWORK: multifile, hierarchical records; defined relationships
• ER: focus on E & R, hidden record structure; usually flat records [optionally with attributes]; defined relationships (general M:N); usually restricted to binary relationships
• RELATIONAL (special case): multifile; flat records only; relationships as foreign keys, so no M:N relationships
=> semantic / OBJECT models
28
Taxonomy of “Clustered” Data Structures
Classified by number of files and by intra-record structure:
          Single File                  Multiple Files
Flat      SINGLE FLAT FILE (“TABLE”)   RELATIONAL (“TABLES”)
Nested    HIERARCHICAL FILE            (CODASYL) NETWORK
29
Stages of Data Modeling
CONCEPTUAL (ORM): Objects; Object IDs; Roles/Relationships; (Functional Dependencies). NO clustering => NO “attributes”.
CLUSTERED (ER): Attributes in Records; MultiValued and Nested items, Ternaries, M:N, Relationships w/attributes, and Sub/SupTypes are carried forward - - - -> to be resolved at the next stage.
“LOGICAL” (RELATIONAL): Normalized (2, 3, 4); Flat (1NF); Binary only; 1:Many only; Primary Keys; Foreign Keys.
PHYSICAL: Implementation in/for a DBMS; Denormalize (for performance) + triggers, stored procedures.
[Stages run from USER (Domain Knowledge) through SCHEMA to DATABASE.]
Start at the highest Conceptual Level!
30
Data Modeling - Representation Stages: A SECOND CUT
• Conceptual (ORM - Halpin/Nijssen, SUMM - Fulton, UDM - CDMTG)
– only what the user knows or needs to know
– functional dependencies fully represented
– Elementary Facts: no clustering of “attributes” into “records”
• Clustered (ER - Chen, EER - Teorey, SDM - McLeod, SOM - Kroenke, SQL:99 - ANSI, UML)
– identifiers (attributes or dependent relationships)
– keep: M:N, ternary relationships, super/subtypes, attributed relationships, multivalued items/repeating groups
• “Logical” (Relational - Codd, SQL - ANSI)
– flat files/tables; stored identifiers; 3NF (decompose)
– resolve: M:N, ternary, super/subtype relationships
– foreign keys to represent relationships
• Denormalize (Recluster) for performance
• Physical (IMPLEMENTATION in a DBMS)
– triggers, stored procedures, user code to represent and enforce semantics beyond the DBMS
[Stages span USER → SCHEMA → DATABASE.]
31
Data Modeling Schemes - ER
• ENTITIES, that have ATTRIBUTES, and participate in RELATIONSHIPS.
• Originated with Peter Chen, 1976, ACM TODS 1(1)
• Notation has evolved, with many variations
– Drop the diamond; attributes inside the entity box or suppressed.
• No standard syntax notation (but similar semantics)
• Common: attributes clustered into entity records.
• Most popular today
• Weak entity, Association entity
• Relationship naming: one name, direction unstated, thus ambiguous; needs a direction marker (>) or a rule (e.g., read left to right).
Example: EMPLOYEE works in DEPT (many EMPLOYEEs to one DEPT)
EMPLOYEE: EmpNo, EmpName, …
DEPT: UnitNo, Name, …
32
Data Modeling Schemes - Oracle
• In the Oracle Designer*2000 tool (R. Barker, Addison-Wesley, 1990)
• A flavor of ER modeling
• ENTITY in a rounded box; optionally with ATTRIBUTES inside
• ATTRIBUTE flags: # [part of] identifier; * mandatory; o optional
• RELATIONSHIPS: binary only; two names, each placed at the end from which it is read; optional (dashed line), mandatory (solid line), many, identifying, fixed
Example: EMPLOYEE works in DEPT; DEPT employs EMPLOYEE
EMPLOYEE: EmpNo (#), EmpName (*), Address (o)
33
Data Modeling Schemes - IE
• Information Engineering (1970s)
• Due to Clive Finkelstein, adapted by James Martin
• Used in several tools: IEF, IEW/ADW/Cool, ICES, …
• Widely used, many variations, no single standard
• ENTITIES: in boxes, optionally with ATTRIBUTES, in or out
• RELATIONSHIPS: usually binary only; markers for many, at most one, optional (at the “other” end), and mandatory/at least one
Example: EMPLOYEE and DEPT
34
Data Modeling Schemes – IDEF1X
• U.S. Air Force/Defense (1970s), Appleton eXtensions
• NIST (U.S. Govt) standard, 1993; revised as IDEF1X97 (IEEE, 1998)
• Book by T. Bruce, 1992; used in ERwin (now from CA), Visio, …
• Widely used in and for U.S. Govt work, some outside
• Some Relational restrictions: Foreign Keys, thus no M:N
• “Unnecessarily complex, confusing, and forgettable” - Halpin
• ENTITY: independent, dependent
• ATTRIBUTE flags: Alternate Key (AKi), Foreign Key (FK), optional (O); mandatory is the default
• RELATIONSHIPS: binary only; “child” end marked (may be arbitrary); the first name is always read toward the child; identifying (solid line), non-identifying (dashed line)
– “cardinality” on the child: P one or more, Z zero or one, n exactly n; Parent is optional (some allow many parents)
Example:
EMPLOYEE: EmpNo; EmpName, SS# (AK1), Address (O), UnitNo (FK)
DEPT: DeptNo; DeptName, …
DEPT employs / EMPLOYEE works in
35
Forming a Relational Data Structure
Some rules:
• Define a TABLE or “Relation” for each Entity type
– Types of Entities: base/reference, dependent (“weak”), association/intersection, event/transaction
– Assumes mutually exclusive (non-overlapping) populations
• SINGLE-VALUED ITEMS (“flat” tables)
– If an item is multivalued or a nested repeating group of items, put it into a separate table
• IDENTIFIER for every table (entity “integrity”)
• FOREIGN IDENTIFIERS to represent all relationships
– 1:M: stored in the child / dependent entity
– 1:1: should probably merge into one table
– M:N: must introduce an association/intersection table
• NORMALIZE to second and third normal form
– important for good design
– but not enforced by the RDBMS... WHY?
RELSQL
36
Functional Dependency in Relationships
Basis for Database Normalization.
X → A: X determines A; A is functionally dependent on X; A = f(X).
A is dependent on X, and the Relationship is exclusive on A, multiple on X:
• There can only be one A for each X.
• There can be multiple Xs for a given A.
• There can be different As for the Xs.
Clustered into a Record/table for the entity of X: X A …
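The definition of X → A can be turned into a check on data: the dependency fails as soon as two rows agree on X but disagree on A. Below is a minimal Python sketch. The helper name `fd_holds` and the sample rows are hypothetical. A sample population can only refute a dependency, never prove it; whether X → A holds is a rule of the domain.

```python
from __future__ import annotations

from collections.abc import Iterable, Mapping, Sequence


def fd_holds(rows: Iterable[Mapping[str, object]],
             lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """True if no two rows agree on the lhs columns but differ on the rhs columns."""
    seen: dict[tuple, tuple] = {}
    for row in rows:
        x = tuple(row[c] for c in lhs)
        a = tuple(row[c] for c in rhs)
        if seen.setdefault(x, a) != a:   # same X already seen with a different A
            return False
    return True


# Hypothetical EMPLOYEE rows: X = emp, A = dept.
rows = [
    {"emp": 1, "dept": 10, "dept_name": "Sales"},
    {"emp": 2, "dept": 10, "dept_name": "Sales"},
    {"emp": 3, "dept": 20, "dept_name": "Audit"},
]
print(fd_holds(rows, ["emp"], ["dept"]))   # only one A for each X: True
print(fd_holds(rows, ["dept"], ["emp"]))   # multiple Xs for a given A: False
```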
37
Database Normalization
Start with ENTITIES, their IDENTIFIERS (unique keys), and their ATTRIBUTE FIELDS (facts about each entity), i.e., start with data items clustered into records/tables.
PROBLEM: we may do it wrong, clustering too much or putting some items in the wrong place, which can lead to redundancy & update anomalies.
Any Flat File is a Relation, but… not all Relations are “well-formed.”
• NORMALIZATION is the test
– a set of rules to perform internal validation of a data model
• Record DECOMPOSITION is the remedy
– removing attributes from the entity record, and placing them in a different, often a new, entity record
(1) First Normal Form: no multivalued items or repeating groups.
(2) Second Normal Form: no partial dependencies.
(3) Third Normal Form: no transitive dependencies.
“Every non-key data item must be single-valued, and dependent upon the key, the whole key, and nothing but the key… so help me Codd.”
38
Anomalies
Resulting from (clues to) poor database design:
EMPLOYEE# EMPNAME SKILL PROFICIENCY … BOSSNAME DEPT# DEPTNAME
• DEPTNAME and BOSSNAME are stored redundantly.
• If an EMPLOYEE moves to another DEPT#, DEPTNAME and BOSSNAME would also change, needing update.
• If the DEPTNAME (or BOSSNAME) of a DEPT changes, you must update all occurrences, else inconsistency.
• To delete a DEPT you must also delete all its EMPLOYEEs (unless null foreign keys are allowed!)
• If you delete the last EMPLOYEE in a DEPT, you also delete that DEPT (unless null keys are allowed!… multiple?)
• There is no place to insert a DEPT# and its DEPTNAME if there are no EMPLOYEEs there.
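Record decomposition removes these anomalies by moving DEPTNAME and BOSSNAME into a DEPT table keyed on DEPT#. A minimal sketch, assuming Python's sqlite3 module, hypothetical snake_case names, and made-up sample rows. SKILL and PROFICIENCY are left out; they would need a table of their own. A view recreates the original flat record on demand, so nothing is lost.

```python
import sqlite3


def decomposed_db() -> sqlite3.Connection:
    """EMPLOYEE record after decomposition into EMPLOYEE and DEPT (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE dept (
            dept_no   INTEGER PRIMARY KEY,
            dept_name TEXT NOT NULL,
            boss_name TEXT                          -- stored once per DEPT
        );
        CREATE TABLE employee (
            employee_no INTEGER PRIMARY KEY,
            emp_name    TEXT NOT NULL,
            dept_no     INTEGER REFERENCES dept     -- 1:M foreign key in the child
        );
        -- Recreates the original flat record instead of storing it.
        CREATE VIEW employee_flat AS
            SELECT e.employee_no, e.emp_name, d.boss_name, d.dept_no, d.dept_name
            FROM employee AS e JOIN dept AS d ON d.dept_no = e.dept_no;
    """)
    conn.executemany("INSERT INTO dept VALUES (?, ?, ?)",
                     [(10, "Sales", "Lee"), (20, "Audit", "Kim")])
    conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                     [(1, "Ann", 10), (2, "Bob", 10)])
    return conn


conn = decomposed_db()
# Renaming a DEPT now changes one row, however many EMPLOYEEs it has.
conn.execute("UPDATE dept SET dept_name = 'Marketing' WHERE dept_no = 10")
# A DEPT with no EMPLOYEEs can be inserted, and it survives on its own.
conn.execute("INSERT INTO dept VALUES (30, 'Research', NULL)")
```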
39
Summary of all Normal Forms
GIVEN:
– a set of attributes, clustered into tables/records with identifiers
– all functional dependencies on the attributes
• No multi-valued, non-key attributes (1NF)
• No partial dependencies on non-key attributes (2NF)
• No transitive dependencies in non-key attributes (3NF)
• No partial or transitive dependencies within any key (EKNF, BCNF), i.e., consider all candidate keys.
• No multiple, independent multi-valued attributes in the same table (4NF)
• No join dependencies other than those implied by the candidate keys, i.e., if a table can be reconstructed without loss of information by joining some of its projections, that is only because of its keys (5NF).
• No more than one table with the same key (“minimal”).
• No transitive dependencies across tables (“optimal”).
NOTE: number order is artificial, i.e., there is no necessary sequence to the normal forms.
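Several of these tests reduce to one question: is the left side of each dependency a (super)key? The standard textbook way to answer it is the attribute-closure algorithm. This technique is not taken from the presentation. Below is a minimal Python sketch, assuming the functional dependencies are listed explicitly. It checks BCNF, which implies 3NF and 2NF. It does not cover 4NF or 5NF, which involve multivalued and join dependencies. The sample schema is the flat EMPLOYEE record from the anomalies slide, with SKILL and PROFICIENCY left out.

```python
from __future__ import annotations

from itertools import combinations


def closure(attrs: set[str], fds: list[tuple[set[str], set[str]]]) -> set[str]:
    """Attribute closure: every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema: set[str], fds) -> list[set[str]]:
    """All minimal attribute sets whose closure covers the whole schema."""
    keys: list[set[str]] = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            cand = set(combo)
            if any(k <= cand for k in keys):
                continue                      # contains a smaller key: not minimal
            if closure(cand, fds) >= schema:
                keys.append(cand)
    return keys


def bcnf_violations(schema: set[str], fds):
    """Non-trivial FDs whose left side is not a superkey of the schema."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and not closure(lhs, fds) >= schema]


SCHEMA = {"EMPLOYEE#", "EMPNAME", "DEPT#", "DEPTNAME", "BOSSNAME"}
FDS = [
    ({"EMPLOYEE#"}, {"EMPNAME", "DEPT#"}),
    ({"DEPT#"}, {"DEPTNAME", "BOSSNAME"}),    # transitive dependency via DEPT#
]
print(candidate_keys(SCHEMA, FDS))    # [{'EMPLOYEE#'}]
print(bcnf_violations(SCHEMA, FDS))   # the DEPT# dependency is flagged
```

Each flagged dependency points to a decomposition: move its right side into a new table keyed on its left side.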
40
Normalization – Testing your Understanding
GIVEN: assume that A is single valued with respect to X (i.e., 1NF).
Could you have a violation of 2NF? 3NF? 4NF? (If not, why not?)
X A  (2NF? 3NF? 4NF?)
X A B  (2NF? 3NF? 4NF?)
X A B  (2NF? 3NF? 4NF?)
MUST DISTINGUISH THE PRIMARY KEY.
X A B
What does this diagram mean? How does this differ from the diagram above, if any?
41
Representing a M:N Relationship
• If you cannot store multiple Projects (or Project IDs) in an Employee record, or multiple Employees (or Employee IDs) in a Project record (as is the case in a Relational Database), then you must introduce an “Intersection Entity” between them to represent the Many-to-Many Relationship.
EMPLOYEE (M:N) PROJECT
Another Pattern: EMPLOYEE ← EMPLOYEE-PROJECT (EMPL-ID, PROJ-ID) → PROJECT
• The Intersection Entity also provides the place to store additional attributes of the relationship, e.g., Hours Worked, Rate of Pay, …
What is the problem with this representation?
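A minimal sketch of the intersection entity in SQL, assuming Python's sqlite3 module. The table names, column names, and rows are hypothetical. The composite primary key records each EMPLOYEE–PROJECT pairing once, and Hours Worked lives on the pairing rather than on either entity.

```python
import sqlite3


def project_db() -> sqlite3.Connection:
    """EMPLOYEE, PROJECT, and the intersection entity between them (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE employee (empl_id INTEGER PRIMARY KEY, name  TEXT NOT NULL);
        CREATE TABLE project  (proj_id TEXT    PRIMARY KEY, title TEXT NOT NULL);
        CREATE TABLE employee_project (             -- the intersection entity
            empl_id      INTEGER NOT NULL REFERENCES employee,
            proj_id      TEXT    NOT NULL REFERENCES project,
            hours_worked REAL,                      -- an attribute of the relationship
            PRIMARY KEY (empl_id, proj_id)          -- each pairing recorded once
        );
    """)
    conn.executemany("INSERT INTO employee VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])
    conn.executemany("INSERT INTO project VALUES (?, ?)", [("P1", "Payroll"), ("P2", "Web")])
    conn.executemany("INSERT INTO employee_project VALUES (?, ?, ?)",
                     [(1, "P1", 10.0), (1, "P2", 5.0), (2, "P1", 8.0)])
    return conn


def employees_on(conn: sqlite3.Connection, proj_id: str) -> list[str]:
    """Names of the employees assigned to one project, via the intersection table."""
    return [name for (name,) in conn.execute(
        """SELECT e.name
           FROM employee AS e JOIN employee_project AS ep ON ep.empl_id = e.empl_id
           WHERE ep.proj_id = ?
           ORDER BY e.name""", (proj_id,))]


conn = project_db()
print(employees_on(conn, "P1"))   # ['Ann', 'Bob']
```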
42
Representing a Ternary Relationship
While we can develop a consistent notation for binary relationships, ternary relationships are a problem.
[Diagram: EMPLOYEE, SKILL, and PROFICIENCY joined in a single relationship]
• If one of the entities is single valued, is it really ternary? Or an “attributed” binary?
• What lends uniqueness to each instance of the relationship?
• How do we verbalize the relationship? In which order?
• How do we represent Multiplicity / Exclusivity?
• How do we represent Dependency? Must all 3 be present?
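One way to approach "what lends uniqueness to each instance" is to find the smallest combinations of roles whose values never repeat. If EMPLOYEE + SKILL is already unique, PROFICIENCY depends on that pair, and the "ternary" is really an attributed binary. A minimal Python sketch follows. The helper name, role labels, and populations are hypothetical. A sample population only suggests uniqueness constraints; the domain experts decide which ones hold.

```python
from itertools import combinations

ROLES = ("employee", "skill", "proficiency")


def unique_role_combos(facts):
    """Minimal sets of roles whose value combinations never repeat in facts."""
    found = []
    for size in range(1, len(ROLES) + 1):
        for combo in combinations(range(len(ROLES)), size):
            if any(set(f) <= set(combo) for f in found):
                continue                                   # not minimal
            projected = [tuple(fact[i] for i in combo) for fact in facts]
            if len(projected) == len(set(projected)):      # no value combination repeats
                found.append(combo)
    return [tuple(ROLES[i] for i in combo) for combo in found]


# (EMPLOYEE, SKILL) never repeats, so PROFICIENCY depends on the pair.
attributed = [("Ann", "SQL", 7), ("Ann", "COBOL", 7),
              ("Bob", "SQL", 4), ("Bob", "COBOL", 7)]
print(unique_role_combos(attributed))   # [('employee', 'skill')]
```

If the population allows the same EMPLOYEE–SKILL pair with different PROFICIENCY values, only the full triple is unique, and the uniqueness constraint must span all three roles.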
43
What’s Wrong with ER Modeling?
I will show you still a more excellent way. (Paul, I Cor. 12:31)
44
Record-based Design
WHAT DOES THIS “RECORD” REPRESENT?
X A B C
Design minimal records with at most one non-key domain:
X A
X B
X C
Now what do these “records” represent? Perhaps Codd was right in naming it a _________!
• Avoids spurious associations, e.g., A – B … Could there be any violations of normal forms?
• What about the representation of the entity X?
• What if A is related to other “entities”?
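Splitting X A B C into X A, X B, and X C can be sketched in a few lines of Python. The helper names and sample values are hypothetical. Because X is the key of each binary table (one A, one B, and one C per X), merging the tables back on X recovers the original records. No A–B pair is ever stored, so no spurious A–B association is implied.

```python
from __future__ import annotations


def decompose(rows: list[dict], key: str, attrs: list[str]) -> dict[str, set[tuple]]:
    """One binary (key, attribute) table per non-key attribute: X A, X B, X C."""
    return {a: {(r[key], r[a]) for r in rows} for a in attrs}


def rejoin(tables: dict[str, set[tuple]], key: str) -> list[dict]:
    """Merge the binary tables back into records by matching on the key."""
    records: dict = {}
    for attr, pairs in tables.items():
        for k, v in pairs:
            records.setdefault(k, {key: k})[attr] = v
    return sorted(records.values(), key=lambda r: r[key])


rows = [
    {"X": 1, "A": "a1", "B": "b1", "C": "c1"},
    {"X": 2, "A": "a1", "B": "b2", "C": "c2"},
]
tables = decompose(rows, "X", ["A", "B", "C"])
print(rejoin(tables, "X") == rows)   # True: lossless, because X is the key
```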
45
Transform Record-based (ER) Design
TO REALLY REPRESENT THE ENTITY DOMAINS
X A B C  →  X A, X B, X C
Object-Role Model: [Diagram: object type X connected by a separate relationship to each of the object types A, B, and C]
46
Data Modeling
THINKING ABOUT ATTRIBUTES:
Record-Based (ER):
ENTITY IDENTIFIER ATTRIBUTE ATTRIBUTE ...
Object-Role (ORM):
[Diagram: a central ENTITY (id) connected through roles in relationships to several other ENTITIES]
ENTITIES have ATTRIBUTES / DESCRIPTORS by playing roles in relationships with other entities.
47
Record-Based Modeling
GIVEN TWO FACTS (conceptually):
• one about the CITY a PERSON lives in
• another about the CITY a PERSON works in
ASSUME:
• every person has to live and work in a city
• each person can live and work in only one city at a time
• not interested in anything more about persons or cities
EXAMPLE: Gordon Everest lives in Falcon Heights and works in Minneapolis.
DIAGRAM A CONCEPTUAL DATA MODEL
– to represent this information (a database to contain these facts)
48
Record-Based Data Model for PERSON lives in / works in a CITY
PERSON: PersonID [key], LiveCity, WorkCity
• What is the entity and what is the attribute?
• Would it make any sense to say (to a novice layperson, a user) that CITY was an "attribute" of PERSON?
• Doing more than is necessary at the conceptual level
• cannot have CITY and CITY as attributes of PERSON
• column/attribute name reflects "entity + role"
• CITY as an entity/object is lost (not its own table)
• what if there is a CITY where no one lives or works?
• some add the concept of a DOMAIN
49
Object-Role Model for PERSON lives in / works in CITY
FORML language statements:
• PERSON lives in CITY
• Every PERSON lives in some CITY
• Each PERSON lives in at most one CITY
• ... and likewise for works in
[Diagram: PERSON (id) and CITY (name) linked by two FACT types, "lives in" and "works in"]
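The two constraints in these FORML statements ("Every … some …" and "Each … at most one …") can be checked mechanically against a population of elementary facts. Below is a minimal Python sketch. `BinaryFactType` is a hypothetical helper, not the API of FORML or any ORM tool. It records the facts for one binary fact type and reports mandatory-role and uniqueness violations for a given set of PERSONs.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


@dataclass
class BinaryFactType:
    """Hypothetical helper: facts 'SUBJECT predicate OBJECT' plus two role constraints."""
    subject: str
    predicate: str
    obj: str
    mandatory: bool = False      # "Every SUBJECT <predicate> some OBJECT"
    unique: bool = False         # "Each SUBJECT <predicate> at most one OBJECT"
    facts: set[tuple[str, str]] = field(default_factory=set)

    def violations(self, subjects: set[str]) -> list[str]:
        """Constraint violations for the given subject population, in name order."""
        counts = Counter(s for s, _ in self.facts)
        problems = []
        for s in sorted(subjects):
            if self.mandatory and counts[s] == 0:
                problems.append(f"mandatory: {s} plays no '{self.predicate}' role")
            if self.unique and counts[s] > 1:
                problems.append(f"uniqueness: {s} {self.predicate} {counts[s]} {self.obj}s")
        return problems


lives_in = BinaryFactType("PERSON", "lives in", "CITY", mandatory=True, unique=True,
                          facts={("Gordon Everest", "Falcon Heights")})
works_in = BinaryFactType("PERSON", "works in", "CITY", mandatory=True, unique=True,
                          facts={("Gordon Everest", "Minneapolis")})
print(lives_in.violations({"Gordon Everest"}))   # []
```

CITY is not squeezed into an "attribute" of PERSON here. Both fact types refer to the same CITY object type, and each role gets its own constraints.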
50
Record-Based Modeling for an additional fact
• A PERSON makes sales calls in multiple CITIES
DIAGRAM the extended conceptual data model
• Can you add an attribute "SalesCallCities" to PERSON?
FLAT Record-Based Modeling is even worse:
• Create a new table SALESCALLS with a compound key
– Is this a real entity in the conceptual view?
EXTEND THE OBJECT-ROLE DATA MODEL
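For comparison, here is one relational mapping of the extended model ("lives in", "works in", and the many-to-many "makes sales calls in"). It is a minimal sketch using Python's sqlite3 module, and all table names, column names, and cities other than Falcon Heights and Minneapolis are hypothetical. The compound-key table falls out of the M:N fact type instead of being invented as a new "entity". CITY keeps its own table, so a city where no one lives or works can still be recorded. The live_city/work_city columns still show the "entity + role" naming the slides criticize; that is the cost of the logical level, not of the conceptual model.

```python
from __future__ import annotations

import sqlite3


def person_city_db() -> sqlite3.Connection:
    """Relational mapping of PERSON lives in / works in / makes sales calls in CITY."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE city (name TEXT PRIMARY KEY);    -- CITY stays an object type
        CREATE TABLE person (
            person_id INTEGER PRIMARY KEY,
            live_city TEXT NOT NULL REFERENCES city,  -- lives in: mandatory, at most one
            work_city TEXT NOT NULL REFERENCES city   -- works in: mandatory, at most one
        );
        CREATE TABLE sales_call (                     -- makes sales calls in: many-to-many
            person_id INTEGER NOT NULL REFERENCES person,
            city      TEXT    NOT NULL REFERENCES city,
            PRIMARY KEY (person_id, city)             -- the compound key
        );
    """)
    conn.executemany("INSERT INTO city VALUES (?)",
                     [("Falcon Heights",), ("Minneapolis",), ("Duluth",), ("Rochester",)])
    conn.execute("INSERT INTO person VALUES (1, 'Falcon Heights', 'Minneapolis')")
    conn.executemany("INSERT INTO sales_call VALUES (?, ?)", [(1, "Duluth"), (1, "Rochester")])
    return conn


def cities_nobody_lives_or_works_in(conn: sqlite3.Connection) -> set[str]:
    """Cities that can exist only because CITY has its own table."""
    return {name for (name,) in conn.execute(
        """SELECT name FROM city
           WHERE name NOT IN (SELECT live_city FROM person
                              UNION SELECT work_city FROM person)""")}


conn = person_city_db()
print(sorted(cities_nobody_lives_or_works_in(conn)))   # ['Duluth', 'Rochester']
```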
51
Record-Based Data Modeling DISADVANTAGES:
• no way to capture the conceptual view directly
• must mentally map from conceptual view to the "logical" (record-based) view
– by structural groupings of attributes and relationships
• must choose unique, arbitrary names
– for attributes in a record; for spurious new "entities"
• cannot reuse attributes in the same table
• must do your own normalization
• hides or ignores inter-attribute relationships
• creates (implies) spurious inter-attribute relationships
52
Object-Role (ORM) Data Modeling
THE ESSENTIAL DIFFERENCE:
• Three main constructs, rolled into two main constructs:
Record-based modeling: ENTITY, ATTRIBUTE, RELATIONSHIP
NIAM/ORM modeling: ? ? ? ? (What to call it? OBJECT, ENTITY, ENTRIBUTE!) and Role in RELATIONSHIP
ORMINTRO
53
Data Modeling Terminology
O-R ("conceptual"): OBJECT; FACT SENTENCE; PREDICATE; CONSTRAINT
E-R ("logical"): ENTITY (TYPE); ATTRIBUTE; INSTANCE; IDENTIFIER; RELATIONSHIP; CHARACTERISTICS
COBOL/DBTG: RECORD TYPE; DATA ITEM (ELEMENT); RECORD; "SET"
RELATIONAL ("physical" implementation): RELATION / TABLE; COLUMN / FIELD; ROW / TUPLE; KEY; FOREIGN KEY; CONSTRAINT
54
Fact Sentence - Verbalize
• A Fact = a Predicate + Object(s) => Sentence
• THINK: Objects playing Roles in a Relationship
• Naming: object instances versus object types
– e.g., “Ann” is an instance of “Person”
• Arity: the number of object “holes” in the Predicate
– UNARY: “Ann smiles”
only 2 states: true/false, present/absent, yes/no, making the closed world assumption
– BINARY: “Ann likes to run”
the most common; has an inverse: “Running is liked by Ann”
the inverse name is never the same (else the predicate is symmetric, handled differently)
– TERNARY: “Ann married Bob in 1967”, with types: “PERSON married PERSON in YEAR”
verbalizing can be difficult with more than 2 roles (sequence problem)
ORMODLG
55
Symbolize: ORM Constructs
• OBJECT (ENTITY, CONCEPT): NOUN … in an ellipse
• PREDICATE (RELATIONSHIP): verb = role name … in a box
– unary, binary, ternary, +++
Elementary Binary Fact Sentence: OBJECT1 [role12 | role21] OBJECT2, where [ ] is the PREDICATE
Binary Predicate: PERSON [works in | employs] DEPARTMENT
Verbalization: “PERSON works in DEPARTMENT”; “DEPARTMENT employs PERSON”
56
Adding ORM Constraints
PERSON [works in | employs] DEPARTMENT
Verbalization:
“PERSON works in DEPARTMENT”
“DEPARTMENT employs PERSON”
DEPENDENCY (MANDATORY): “PERSON must work in some DEPARTMENT”
EXCLUSIVITY (UNIQUENESS): “PERSON works in at most one DEPARTMENT”
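Verbalization follows a fixed pattern per constraint, so it can be generated from the fact type. Below is a minimal Python sketch. `verbalize` is a hypothetical helper. Its wording follows the FORML statements used for "PERSON lives in CITY" ("Every … some …", "Each … at most one …"), not the exact output of any tool's verbalizer.

```python
def verbalize(subject: str, predicate: str, obj: str, inverse: str,
              mandatory: bool = False, unique: bool = False) -> list[str]:
    """FORML-style sentences for a binary fact type and its role constraints."""
    sentences = [f"{subject} {predicate} {obj}",      # forward reading
                 f"{obj} {inverse} {subject}"]        # inverse reading
    if mandatory:                                     # DEPENDENCY
        sentences.append(f"Every {subject} {predicate} some {obj}")
    if unique:                                        # EXCLUSIVITY
        sentences.append(f"Each {subject} {predicate} at most one {obj}")
    return sentences


for line in verbalize("PERSON", "works in", "DEPARTMENT", "employs",
                      mandatory=True, unique=True):
    print(line)
```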
57
Methodology: Steps in OR Modeling
• Familiarize with the real-world Universe of Discourse
• Verbalize sentences of elementary facts
• Symbolize: build the conceptual ORM model diagram
• Constrain the roles in predicates
• Validate the conceptual data model
• Map into neutral, record-based, logical tables
• Refine the table definitions
• Generate the physical database definition for the target DBMS
58
VisioModeler Architecture
[Diagram: a DIAGRAMMER and a FACT EDITOR (FORML fact sentences, Population) build the CONCEPTUAL DATA MODEL DICTIONARY, which is validated (CHECK) and corrected with help from the VERBALIZER; BUILD maps it into the "LOGICAL" DATA MODEL (TABLES) DICTIONARY, which can be refined and viewed with the BROWSER; GENERATE produces the PHYSICAL DATABASE STRUCTURE & DEFINITION for a target DBMS. The dictionaries form the "REPOSITORY".]
Quick Facts
59
Levels of Abstraction in NIAM/ORM
REMOVING (generally in order of importance):
1. Lexical Object Types (LOTs); Value Object Types
2. “Terminal” Object Types, equivalent to / becoming “attributes”, IF they:
– play only functionally dependent roles (often only one role), i.e., One:Many relationships; (disjunctive) mandatory (implied)
3. Common Object Types - generic value domains / ref. modes
4. “Event” Object Types
5. Dependent (“weak”) Object Types- Subtypes, Objectified Facts
6. User-defined priority levels on Object Types
7. Constraints and Reference Modes
8. Predicates
DMODPRE
60
Sample, Simple ORM Data Model
[Diagram: Sample, simple ORM data model.
Object types: EMPLOYEE (number), DEPT (number), BOSS, SKILL (code), RATING, LIMIT, SALARY (dollars), DESCRIPTION (name), and the objectified fact type "EmployeeSkill!".
Predicates: works in / employs; supervises / is headed by; reports to / superior to; may spend up to / of spending for; with proficiency of / assigned to; possesses / possessed by; earns / paid to; has / is of.
Constraints: value ranges { 1 .. 10 } and { 1000 .. 9999 }; acyclic ring constraint (ac); <=5.]
A major criticism of NIAM / ORM, by opponents and proponents alike, is that it is too detailed, a bottom-up design.
BUT… ER Diagrams usually omit the details of attributes and most constraints.
So, present the model using top-down abstractions.
Remove “Terminal” (M:1) Objects
61
ORM Abstractions
• Removing "Terminal" (M:1) Objects
[Diagram: the same model with terminal objects removed. Object types: BOSS, SKILL (code), EMPLOYEE (number), DEPT (number), "EmployeeSkill!". Predicates: works in / employs; supervises / is headed by; reports to / superior to; possesses / possessed by. Constraints: { 1000 .. 9999 }, { 2000 .. 2999 }, ac, <=5.]
Remove Constraints and Reference Modes
62
ORM Abstractions
• Removing Constraints and Reference Modes
[Diagram: object types BOSS, SKILL, EMPLOYEE, DEPT; predicates works in / employs; supervises / is headed by; reports to / superior to; possesses / possessed by.]
Remove Less Important Objects & Predicates
– Subtypes, Objectified Predicates, Reflexive Relationships
63
ORM Abstractions
• Removing Less Important Objects & Predicates
– Subtypes, Objectified Predicates, Reflexive Relationships
[Diagram: object types SKILL, EMPLOYEE, DEPT; predicates works in / employs; supervises / is headed by.]
Remove Predicates
64
ORM Abstractions
• Removing Predicates
[Diagram: object types SKILL, EMPLOYEE, DEPT only]
... Leaving BASE Entities!
A Top-Level Abstract Conceptual Data Model: an ER Diagram?!!!
65
Language Design Criteria
• Semantic Strength, Expressiveness
– Able to model all relevant details in the domain
– The range of queries that can be expressed
– The “100% Principle”
• Semantic Clarity
– Ease of Understanding and Use; intuitive
– Unambiguous, i.e., only one possible meaning
• Semantic Relevance
– Only relevant information need be stated
– Not dependent on artificial or spurious expressions
• Semantic Stability, Independence
– How well the model/query retains its original intent in the face of changes to the underlying application
See: Halpin, “Conceptual Queries”.
ORMQURY
66
Conceptual Query Language
• ConQuer
– Based on ORM
– Need not be familiar with ORM or its notation: “user can construct a query without any prior knowledge of the schema”, but…
– In the form of a textual outline; indentation is significant
– Implemented in Visio ActiveQuery: Object pick list (drag to the query window); Roles pick list (drag to the query window)
– Projection: items to display are marked with a tick (✓)
– Mapping to SQL
67
Sample ConQuer Query (1)
“List Employees who live in the City that is the Location of Branch 52”
Employee [number]
   +– lives in City
         +– is location of Branch [number =] 52
NOTE: City acts as a Join object type (the common “attribute”), i.e., Employee and Branch are joined through City.
[Schema diagram: Employee (number) lives in City; Branch (number) is located in / is location of City; City is in State (code); City has CityName; “U” marks the external uniqueness constraint: CityName and State together identify a City.]
Semantic clarity (+), semantic relevance (+), semantic stability (+).
68
SQL for Sample ConQuer Query (1)
“List Employees who live in the City that is the Location of Branch 52”
In SQL (where are the tables?):
SELECT Employee.EmployeeNumber
FROM Employee, Branch
WHERE Employee.CityName = Branch.CityName
  AND Employee.StateCode = Branch.StateCode
  AND Branch.BranchNumber = 52
Could you do this in Access using the Query Form?
Semantic clarity (-), semantic relevance (-), semantic stability (-)
Suppose an Employee could live in more than one City???
Suppose we now wish to record the Population of Cities???
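To see the stability problem concretely, below is a minimal sketch that runs the SQL above against Python's sqlite3 module. It then lets an Employee live in more than one City. The sample rows, and the EmployeeCity table with its migration, are hypothetical. The question being asked stays the same, but the schema change forces a new table and a rewritten query. By the slides' argument, the ConQuer outline would be unaffected, because it follows the same conceptual path through City.

```python
import sqlite3


def branch_db() -> sqlite3.Connection:
    """Employee and Branch tables, each storing City as CityName + StateCode."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Employee (
            EmployeeNumber INTEGER PRIMARY KEY,
            CityName  TEXT NOT NULL,      -- City is identified by CityName + StateCode
            StateCode TEXT NOT NULL
        );
        CREATE TABLE Branch (
            BranchNumber INTEGER PRIMARY KEY,
            CityName  TEXT NOT NULL,
            StateCode TEXT NOT NULL
        );
    """)
    conn.executemany("INSERT INTO Employee VALUES (?, ?, ?)",
                     [(101, "Minneapolis", "MN"), (102, "St. Paul", "MN"),
                      (103, "Minneapolis", "KS")])
    conn.executemany("INSERT INTO Branch VALUES (?, ?, ?)",
                     [(52, "Minneapolis", "MN"), (53, "St. Paul", "MN")])
    return conn


QUERY = """
    SELECT Employee.EmployeeNumber
    FROM Employee, Branch
    WHERE Employee.CityName = Branch.CityName
      AND Employee.StateCode = Branch.StateCode
      AND Branch.BranchNumber = 52
"""


def add_multi_city(conn: sqlite3.Connection) -> None:
    """Let an Employee live in several Cities: the residence fact needs its own table."""
    conn.executescript("""
        CREATE TABLE EmployeeCity (
            EmployeeNumber INTEGER NOT NULL REFERENCES Employee,
            CityName  TEXT NOT NULL,
            StateCode TEXT NOT NULL,
            PRIMARY KEY (EmployeeNumber, CityName, StateCode)
        );
        INSERT INTO EmployeeCity SELECT EmployeeNumber, CityName, StateCode FROM Employee;
        INSERT INTO EmployeeCity VALUES (102, 'Minneapolis', 'MN');
    """)


# Same question, different SQL: the query text had to change with the schema.
QUERY_MULTI = """
    SELECT DISTINCT ec.EmployeeNumber
    FROM EmployeeCity AS ec JOIN Branch AS b
      ON ec.CityName = b.CityName AND ec.StateCode = b.StateCode
    WHERE b.BranchNumber = 52
    ORDER BY ec.EmployeeNumber
"""

conn = branch_db()
print([n for (n,) in conn.execute(QUERY)])         # [101]; 103 lives in Minneapolis, KS
add_multi_city(conn)
print([n for (n,) in conn.execute(QUERY_MULTI)])   # [101, 102]
```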
69
Problems with ER Modeling - Summary
• Too much clustering; attributes in the wrong place
• Ignores (presumes) intra-record structure (that is, inter-attribute relationships)
• Human modeler is responsible for normalization
– remedy is always record decomposition
• Attribute migration… to become an entity
– modeler must distinguish attributes and entities
• Naming columns = domain + role, loses domain objects
• Modeling dilemma:
– Complete representation of an entity object: more clustering
– Full normalization (1NF): decomposition, less clustering
• Indirect representation of M:N relationships
– Introduces artificial “new” entities
• Difficulty representing Ternary relationships
• Stability of the query language (SQL)
70
At the Root,
What’s wrong with
ER Modeling?
CLUSTERING
71
Why NIAM/OR Modeling?
• roots in both LOGIC & LINGUISTICS
• based on one modeling construct: the fact sentence
• more expressive, understandable - diagrams & verbalization
• diagrams can be populated with actual data samples
• abstraction levels equivalent to E-R modeling
• more, richer semantics (than E-R, EER, IDEF1X)
• capture and represent all functional dependencies
• avoids normalization problems with record-based modeling
• better meets criteria for good data modeling
• organizations that switched wouldn’t go back to E-R
• direction of Standards (SUMM, UDM, ...)
• now supported with a viable PC-based CASE tool
72
Resources on ORM
BOOK:
• Terry Halpin (now from Microsoft), Information Modeling and Relational Databases: From Conceptual Analysis to Logical Design, Morgan Kaufmann Publishers, San Francisco, 2001, 763 pages.
WEB SITE for my course:
• http://webfoot.csom.umn.edu/faculty/everest/idsx431
– with ORM intro and further reading
– InfoModeler software download
– Usage Notes
SPRING CLASSES:
• IDSc 6431 (for MBAs)
• IDSc 4431 (for CSOM Undergrads)
• IDSc 4131 (for CCE and others)
TRAINING and CONSULTING:
• InConcept, Inc., Lake Elmo, MN www.inconcept.com