DAMA, 2001 December. What’s Wrong With ER Modeling? Gordon C. Everest Carlson School of Management University of Minnesota ORMvER


Page 1: DAMA, 2001 December

1

DAMA, 2001 December.

What’s Wrong With

ER Modeling?

Gordon C. Everest
Carlson School of Management

University of Minnesota

ORMvER

Page 2: DAMA, 2001 December

2

Problems and Solutions

OBJECTIVES FOR THIS PRESENTATION:

• Show several PROBLEMS with ER modeling schemes (actually, any “record-based” modeling scheme).

• Identify the ROOT CAUSE of the problem

• Show you a better way – a SOLUTION using Object Role Modeling (ORM)

To stop there would be irresponsible, so…

• NOT asking you to abandon what you have learned about data modeling and are doing in practice

• BUT to defer thinking in terms of entity records, and to begin doing data modeling at a richer, more conceptual level

Page 3: DAMA, 2001 December

3

Data Modeling

What’s Wrong with ER Modeling?

BEFORE WE CAN ANSWER THAT:

Why Do Data Modeling?

How do we do Data Modeling?

Why do we need Normalization?

What is the Dominant Data Modeling Scheme today?

DMOD

Page 4: DAMA, 2001 December

4

Database Design

Objective: (WHAT we are trying to do)

TO ACCURATELY AND COMPLETELY MODEL

SOME PORTION OF THE REAL WORLD UNIVERSE OF DISCOURSE (UoD)

OF INTEREST TO SOME ORGANIZATION OR COMMUNITY OF USERS.

DMOD

Page 5: DAMA, 2001 December

5

Logical Database Design: Objective, Principles, Benefits

OBJECTIVE of LOGICAL DATABASE DESIGN:
TO ACCURATELY AND COMPLETELY MODEL SELECTED PORTIONS OF THE REAL WORLD OF INTEREST TO A COMMUNITY OF USERS.

DMOD

• USERS (COLLECTIVELY) WILL ALWAYS KNOW MORE ABOUT A DATA STRUCTURE THAN THE SYSTEM KNOWS, OR THAN COULD BE DEFINED TO THE SYSTEM.

• WHAT IS NOT FORMALLY DEFINED TO THE SYSTEM, THE SYSTEM CANNOT MANAGE . . . THE USERS MUST!

• THEREFORE, NEED TO CAPTURE RICH SEMANTICS WITH COMPREHENSIVE DATA MODELING and DEFINITION, INCLUDING INTEGRITY CONSTRAINTS AND OPERATIONS.

FOR ==> GREATER QUALITY & RELIABILITY IN DATA

==> GREATER USER CONFIDENCE.

==> HIGHER USER / DEVELOPER EFFICIENCY

Let the ‘system’ do it!

Page 6: DAMA, 2001 December

6

Purpose of Data Modeling (WHY we do it)

DUAL, CONFLICTING PURPOSES DRIVE THE PROCESS:

• Facilitate Human Communication, Understanding, & Validation
  – capture and present meaning, the semantics of a model
  – direct representation of only essential model semantics

PRESENTATION CHARACTERISTICS:
  – scoping and presenting subparts of a Model
  – unfolding presentation at different levels of abstraction or detail
  – visual prominence in proportion to semantic importance

SECONDARY:

• Basis for Implementation - defining & creating a Database
  – complete in all the necessary details
  – construction/generation able to be fully automated

DMOD

USER

SCHEMA

DATABASE

Page 7: DAMA, 2001 December

7

Modeling

(Re).present.(ation)

[Diagram: Reality (knowledge in the world), the MODELING PROCESS, and the MODEL (knowledge externalized, formalized, shared), together with knowledge in the head (mental models). The arrows are labeled "present" and "re-present".]

DMOD

What drives or guides the process?

Page 8: DAMA, 2001 December

8

The Modeling Process

Real World Universe of Discourse
  → perception, selection/filtering
  → MODELING PROCESS
  → MODEL

MODELING SCHEME: Context, Constructs, Composition, Constraints

METHODOLOGY: Steps/Tasks + Milestones + Deliverables +

REPRESENTATIONAL FORMS (the Syntax): Narrative, Graphical Diagram, Formal Language Statements

DMOD

Page 9: DAMA, 2001 December

9

A Data Modeling “Scheme”

DEFINES the:

• Context

• Constructs (ENTITIES, OBJECTS)

• Collections, Compositions, Connections (RELATIONSHIPS)

• Constraints, Characteristics

WE LOOK FOR IN THE “REAL” WORLD UoD or Domain of Interest

and

USE IN BUILDING A DATA MODEL.

DMOD

Page 10: DAMA, 2001 December

10

Data Modeling Constructs

ENTITY (OBJECT)

ATTRIBUTE

RELATIONSHIP

IDENTIFIER [ FOREIGN KEY ]

characteristics

characteristics

What to look for: Relative emphasis differentiates Data Modeling approaches

DMOD

Page 11: DAMA, 2001 December

11

Student-Course Database - Table Diagram

COURSE: Course#, Title, Description, Credits

INSTRUCTOR: SSN, LastName, FirstName, Address, Phone, Dept

STUDENT: Student ID, Name, Address, Major, GPA

COURSE OFFERING: Course#, Year, Term, Section, Building, Room, Days, Time Start, Control, Enrollment, Instructor SSN

REGISTRATION: Course ID, Student ID, Grade

LEGEND:

ENTITY NAME (upper case)

Identifier (bold face)

Attributes (not bold face)

Foreign Key Identifier M:1 relationship

DMOD

Diagram of the Schema:

What if you move the arrow head to the other end of the arc?

Page 12: DAMA, 2001 December

12

Student-Course Database – Populated

Actual instances of data values:

COURSE:
Course#  Title                 Credits
ACC101   Intro Accounting      4
ENG101   English Composition   4
MIS101   Intro MIS             4
MIS103   Intro Database        4
MIS403   Advanced Database     2

INSTRUCTOR:
InstrID  Name             Dept
33741    Allen, Lillian   Eng
85959    Boyd, Don        ACC
64578    Carlis, John     CSci
11248    Davis, Gordon    IDS
77004    Everest, Gordon  IDS
55432    Fine, Alan       IDS

STUDENT:
StudentID  Name            Major  GPA
1111111    Able, Emma      MIS    3.4
2222222    Bright, Sue     MIS    3.9
3333333    Challenger, X   ACC    2.7
4444444    Dummie, Noe     ACC    3.2
5555555    Everest, Monty  MIS    3.8

COURSE OFFERING:
(Course#, Year, Term, Sect form a Secondary (Composite) Key)
CRSO#  Course#  Year  Term  Sect  Room   InstrID  Enroll
1004   MIS101   2000  Fall  001   1-142  11248    48
1017   MIS101   2001  Spr   002   2-224  55432    60
3001   MIS103   2000  Fall  001   2-207  77004    27

REGISTRATION:
CRSO#  StudentID  Grade
1004   4444444    B+
1017   3333333
3001   1111111    B
3001   2222222    A
3001   5555555    A
3001   7777777    A-

DMOD
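
As a rough illustration of "what is not formally defined to the system, the system cannot manage", the sketch below declares a small part of the Student-Course schema to SQLite through Python's standard sqlite3 module. The table and column names are assumptions adapted from the slide, and only STUDENT and REGISTRATION are included. Once the foreign key is declared, the system itself can reject the sample REGISTRATION row for StudentID 7777777, which has no matching STUDENT row in the populated data.

```python
import sqlite3

# Student rows taken from the populated STUDENT table on the slide.
STUDENTS = [
    ("1111111", "Able, Emma", "MIS", 3.4),
    ("2222222", "Bright, Sue", "MIS", 3.9),
    ("3333333", "Challenger, X", "ACC", 2.7),
    ("4444444", "Dummie, Noe", "ACC", 3.2),
    ("5555555", "Everest, Monty", "MIS", 3.8),
]

def build_student_course_db():
    """Create an in-memory subset of the schema (names are illustrative)."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE student (
            student_id TEXT PRIMARY KEY,
            name       TEXT NOT NULL,
            major      TEXT,
            gpa        REAL
        );
        CREATE TABLE registration (
            crso_no    TEXT NOT NULL,
            student_id TEXT NOT NULL REFERENCES student(student_id),
            grade      TEXT,
            PRIMARY KEY (crso_no, student_id)
        );
    """)
    conn.executemany("INSERT INTO student VALUES (?, ?, ?, ?)", STUDENTS)
    return conn

def try_register(conn, crso_no, student_id, grade):
    """Insert a registration; return False if a declared constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO registration VALUES (?, ?, ?)",
            (crso_no, student_id, grade),
        )
        return True
    except sqlite3.IntegrityError:
        return False

if __name__ == "__main__":
    db = build_student_course_db()
    print(try_register(db, "3001", "1111111", "B"))   # known student
    print(try_register(db, "3001", "7777777", "A-"))  # no such student
```

Without the REFERENCES clause both inserts would succeed, and catching the orphan registration would be left to the users.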

Page 13: DAMA, 2001 December

13

Data Modeling – Schema Diagram

THINKING ABOUT ATTRIBUTES:

Record-Based:

DMOD

ENTITY IDENTIFIER ATTRIBUTE ATTRIBUTE ...

Page 14: DAMA, 2001 December

14

Essentials of ER Modeling / Diagramming

ENTITY1 - RELATIONSHIP - ENTITY2

Adding Attributes, omitting the Diamond:

ENTITY1
============
IDentifier 1
---------------------
Attribute 1.1
Attribute 1.2
Attribute 1.3
:

ENTITY2
============
IDentifier 2
---------------------
Attribute 2.1
Attribute 2.2
Attribute 2.3
ForeignID 1
:

Relationship: M:1 (many ENTITY2 to one ENTITY1, carried by ForeignID 1)

DMOD

[Diagram: ENTITY1 and ENTITY2 joined by a named, directed relationship (relationship>); an entity is drawn with its identifier and Attribute1, Attribute2, Attribute3 attached. Record view: ENTITY IDENTIFIER ATTRIBUTE ATTRIBUTE ...]

Page 15: DAMA, 2001 December

15

What’s wrong with

ER Modeling?

________

ORMvER

Page 16: DAMA, 2001 December

16

ER / Record-based Modeling

[Diagram: a TABLE (ID, ATTRIBUTES . . .) with columns X, A, B, C, D; each column plays a role that draws its values from a VALUE DOMAIN.]

CLUSTERING of ATTRIBUTES into RECORDS/RELATIONS
– NOT a necessary or desirable first step
– gets us into trouble: if we cluster too much, we must decompose to normalize

DMOD

Page 17: DAMA, 2001 December

17

Record-based Design

WHAT SEMANTICS ARE PRESUMED BY THE FOLLOWING RECORD STRUCTURE?

X A B C

• What does it say about X ?
• What does it say about A ?
• What does it say about the relationship X – A ?
• What does it say about the relationship A – B ?

There are at least 14 distinct semantic statements you can make in answering these questions!

• Do we know it is in Third Normal Form (3NF)? How?

ORMvER

Page 18: DAMA, 2001 December

18

Record-based Design

WHAT DOES IT SAY ABOUT X ?

X A B C

ORMvER

Page 19: DAMA, 2001 December

19

Record-based Design

WHAT DOES IT SAY ABOUT A ?

X A B C

ORMvER

Page 20: DAMA, 2001 December

20

Record-based Design

WHAT DOES IT SAY ABOUT THE RELATIONSHIP X–A ?

X A B C

ORMvER

Page 21: DAMA, 2001 December

21

Record-based Design

REPRESENTING THE RELATIONSHIP X–A ?

X A B C

A D ...

ORMvER

N

Page 22: DAMA, 2001 December

22

Record-based Design

WHAT DOES IT SAY ABOUT THE RELATIONSHIP A–B ?

X A B C

ORMvER

Page 23: DAMA, 2001 December

23

Record-based Design

REPRESENTING COMPLEX RELATIONSHIPS AMONG X, A, & B .

X A B C

A ...

B A? ...

? Separately consider the relationshipbetween A and B.

What if it is many-to-many?

What if other information is functionally dependent on A–B ?

ORMvER

Page 24: DAMA, 2001 December

24

Record-based Design - Compound Key

WHAT IS PRESUMED BY THE FOLLOWING RECORD STRUCTURE?

X Y A B C

ORMvER

Page 25: DAMA, 2001 December

25

Major Data Modeling Schemes

RECORD-BASED (Clustered Data Items):

(1) SINGLE FILE (E-A)
    FLAT FILE “TABLE”
    HIERARCHICAL - nested repeating groups, e.g., COBOL

(M) MULTIFILE (E-R → E-A-R)
    NETWORK - hierarchical records
    RELATIONAL (E-A-[R]) - flat records

(O) NO FILE (O-R) (No Clustering of Data Items into Records)
    NIAM/“Binary” Modeling
    ORM (Object-Role Modeling - Halpin)

DMOD Everest-DM-4p.121.

Page 26: DAMA, 2001 December

26

Data Modeling Schemes

CLASSIFIED by Degree of Clustering:

• No clustering
  – NIAM/ORM - Nijssen, Halpin

• Clustering to One Level => Atomic Data Values
  – Relational Modeling - Codd
  – ER Modeling - Chen
  – Extended ER (EER) - Teorey
  – Information Engineering (IE) – Clive Finkelstein -> James Martin
  – Oracle (Designer*2000) - Barker
  – IDEF1X - Appleton, US Gov’t, ERwin (tool), Bruce (book)

• Nested Objects
  – Hierarchical data structure (single file; COBOL)
  – CODASYL Network (ANSI NDL)
  – Nested Relations
  – Semantic Object Modeling (SOM) – Kroenke, Salsa (tool)
  – Object Modeling (UML) – Rational Rose (tool)
  – ANSI SQL:1999

DMOD

Page 27: DAMA, 2001 December

27

Data Modeling Schemes – Clustered

HIERARCHIC
- single file
- nested repeating groups
- implicit hierarchical relationships

NETWORK
- multifile, hierarchical record
- defined relationships

ER
- Focus on E & R, hidden record structure
- Usually flat records [optionally with attributes]
- Defined relationships (general M:N)
- Usually restricted to binary relationships

RELATIONAL
- Multifile; flat records only
- Relationships as foreign keys, so no M:N relationships

special case

=> semantic / OBJECT models

DMOD

Page 28: DAMA, 2001 December

28

Taxonomy of “Clustered” Data Structures

DMOD

Clustered data structures, by Intra-Record Structure (rows) and number of files (columns):

          Single File                  Multiple Files
Flat      SINGLE FLAT FILE (“TABLE”)   RELATIONAL (“TABLES”)
Nested    HIERARCHICAL FILE            (CODASYL) NETWORK

Page 29: DAMA, 2001 December

29

Stages of Data Modeling

USER / SCHEMA / DATABASE

Domain Knowledge

CONCEPTUAL: ORM
• Objects
• Obj. ID’s
• Roles/Relships
• (Fnl. Dep)
NO clustering => NO “attributes”

CLUSTERED: ER
Attribs in Records
MultiValued, Nested - - ->
Ternaries - - ->
M:N - - ->
Relationships w/attributes - - ->
Sub/SupTypes

“LOGICAL”: RELATIONAL
Normalized (2,3,4)
Flat (1NF)
Binary only
1:Many only
Primary Keys
Foreign Keys

PHYSICAL
• Implementation in/for a DBMS
• Denormalize (for performance) + triggers, stored procedures

Start at the highest Conceptual Level!

DMOD

Page 30: DAMA, 2001 December

30

Data Modeling - Representation Stages A SECOND CUT:

• Conceptual (ORM: Halpin/Nijssen; SUMM: Fulton; UDM: CDMTG)
  – only what the user knows or needs to know
  – functional dependencies fully represented
  – Elementary Facts - no clustering of “attributes” into “records”

• Clustered (ER: Chen; EER: Teorey; SDM: McLeod; SOM: Kroenke; SQL:99: ANSI; UML)
  – identifiers (attributes or dependent relationships)
  – keep: M:N, ternary relationships, super/subtypes, attributed relationships, multi-valued items/repeating groups

• “Logical” (Relational: Codd; SQL: ANSI)
  – flat files/tables
  – stored identifiers
  – 3NF (decompose)
  – resolve: M:N, ternary, super/subtype relationships
  – foreign keys to represent relationships

• Denormalize (Recluster) - for performance

• Physical (IMPLEMENTATION in a DBMS)
  – triggers, stored procedures, user code to represent and enforce semantics beyond the DBMS.

USER

SCHEMA

DATABASE

DMOD

NEW

Page 31: DAMA, 2001 December

31

Data Modeling Schemes - ER

• ENTITIES, that have ATTRIBUTES, and participate in RELATIONSHIPS.

• Originated with Peter Chen, 1976, TODS (1:1)
• Notation has evolved, many variations
  – Drop diamond; attributes inside entity box or suppressed.
• No standard syntax notation (but similar semantics)
• Common: attributes clustered into entity records.
• Most popular today
• Weak entity - Association entity -
• Relationship naming: one name, direction unstated, thus ambiguous; need direction (>) or rule (e.g., left to right).

DMOD

Example: EMPLOYEE (EmpNo, EmpName, …) works in DEPT (UnitNo, Name, …), M:1

Page 32: DAMA, 2001 December

32

Data Modeling Schemes - Oracle

• In Oracle Designer*2000 tool (R. Barker, A-W, 1990)

• A flavor of ER modeling

• ENTITY in rounded box; optionally ATTRIBUTES inside

• ATTRIBUTE flags:
  # - [part of] identifier
  * - mandatory
  o – optional

• RELATIONSHIPS:
  - binary only
  - two names, each at the end from which it is to be read
  - optional (dashed line), mandatory (solid line), many
  - identifying, fixed

DMOD

Example: DEPT employs EMPLOYEE; EMPLOYEE works in DEPT
EMPLOYEE: EmpNo (#), EmpName (*), Address (o)

Page 33: DAMA, 2001 December

33

Data Modeling Schemes - IE

• Information Engineering (1970’s)

• Due to Clive Finkelstein, adapted by James Martin

• Used in several tools: IEF, IEW/ADW/Cool, ICES, …

• Widely used, many variations, no single standard
• ENTITIES: in boxes, optionally with ATTRIBUTES, in or out
• RELATIONSHIPS:
  - usually binary only
  - many, at most one
  - optional (at the “other” end)
  - mandatory, at least one

DMOD

EMPLOYEE DEPT

Page 34: DAMA, 2001 December

34

Data Modeling Schemes – IDEF1X

• U.S. Air Force/Defense (1970’s), Appleton eXtensions
• NIST (U.S. Govt) standard – 1993; revised in IDEF1X97; IEEE - 1998
• Book by T. Bruce, 1992; Used in ERwin (now from CA), Visio, …
• Widely used in and for U.S. Govt work, some outside
• Some Relational restrictions: Foreign Keys, thus no M:N
• “Unnecessarily complex, confusing, and forgettable” - Halpin
• ENTITY: independent, dependent
• ATTRIBUTE flags:
  - Alternate Key - (AKi), Foreign Key - (FK)
  - optional (O) – mandatory is default
• RELATIONSHIPS:
  - binary only, “child” (may be arbitrary)
  - First Name always read toward the child
  - identifying (solid line), non-identifying (dashed line)
  - “cardinality” on child: P - one or more, Z - zero or one, n - exactly n
  - Parent is optional (some allow many parents)

DMOD

Example: DEPT employs/works in EMPLOYEE
EMPLOYEE: EmpNo, EmpName, SS# (AK1), Address (O), UnitNo (FK)
DEPT: DeptNo, DeptName, :

Page 35: DAMA, 2001 December

35

Forming a Relational Data Structure

Some rules:

• Define a TABLE or “Relation” for each Entity type
  – Types of Entities: base/reference, dependent (“weak”), association/intersection, event/transaction
  – Assumes mutually exclusive (non-overlapping) populations

• SINGLE-VALUED ITEMS (“flat” tables)
  – If multivalued or nested repeating group of items, put into a separate table

• IDENTIFIER for every table (entity “integrity”)

• FOREIGN IDENTIFIERS to represent all relationships
  1:M - stored in the child / dependent entity
  1:1 - should probably merge into one table
  M:N - must introduce an association/intersection table

• NORMALIZE to second and third normal form
  – important for good design
  – but not enforced by RDBMS... WHY?

RELSQL

Page 36: DAMA, 2001 December

36

Functional Dependency in Relationships

Basis for Database Normalization.

X → A
  X determines A
  A is functionally dependent on X: A = f(X)

A is dependent on X, and the Relationship is exclusive on A, multiple on X.

Clustered into a Record/table for entity of X:

X A …

There can only be one A for each X .

There can be multiple Xs for a given A .

There can be different As for the Xs .

RELSQL
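
The three statements above can be tested against a population of records. The following is a minimal sketch, not part of the original slides; the function name `holds_fd` and the dictionary layout are assumptions made for illustration, while the rows come from the sample INSTRUCTOR table.

```python
def holds_fd(rows, x, a):
    """Return True if column `x` functionally determines column `a` in `rows`.

    The dependency X -> A holds when no two rows agree on X but differ on A,
    that is, there is only one A for each X.
    """
    seen = {}
    for row in rows:
        key, value = row[x], row[a]
        if key in seen and seen[key] != value:
            return False
        seen[key] = value
    return True

# A few rows from the sample INSTRUCTOR table.
INSTRUCTORS = [
    {"instr_id": "33741", "name": "Allen, Lillian", "dept": "Eng"},
    {"instr_id": "11248", "name": "Davis, Gordon", "dept": "IDS"},
    {"instr_id": "77004", "name": "Everest, Gordon", "dept": "IDS"},
    {"instr_id": "55432", "name": "Fine, Alan", "dept": "IDS"},
]

if __name__ == "__main__":
    # One Dept for each InstrID, so InstrID -> Dept holds in this population.
    print(holds_fd(INSTRUCTORS, "instr_id", "dept"))
    # Multiple InstrIDs for a given Dept, so Dept -> InstrID does not hold.
    print(holds_fd(INSTRUCTORS, "dept", "instr_id"))
```

A sample population can only refute a dependency. Whether X → A is a rule of the Universe of Discourse is something the users have to state.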

Page 37: DAMA, 2001 December

37

Database Normalization

Start with ENTITIES, their IDENTIFIERS (unique keys) and their ATTRIBUTE FIELDS (facts about each entity), i.e., start with data items clustered into records/tables.

PROBLEM: we may do it wrong; cluster too much; some items in the wrong place, which can lead to redundancy & update anomalies.

Any Flat File is a Relation, but… not all Relations are “well-formed.”

• NORMALIZATION is the test
  – a set of rules to perform internal validation of a data model

• Record DECOMPOSITION is the remedy.
  – Removing attributes from the entity record, and placing them in a different, often a new entity record

(1) First Normal Form: no multivalued items or repeating groups.

(2) Second Normal Form: no partial dependencies.

(3) Third Normal Form: no transitive dependencies.

“Every non-key data item must be single-valued, and dependent upon the key, the whole key, and nothing but the key… so help me Codd.”

RELSQL

Page 38: DAMA, 2001 December

38

Anomalies

Resulting from (clues to) poor database design:

EMPLOYEE# EMPNAME SKILL PROFICIENCY … BOSSNAME DEPT# DEPTNAME

• DEPTNAME and BOSSNAME stored redundantly

• if EMPLOYEE moves to another DEPT#, DEPTNAME and BOSSNAME would also change, needing update.

• If a DEPTNAME (or BOSSNAME) for a DEPT changes, must update all occurrences, else inconsistency.

• To delete a DEPT you must also delete all its EMPLOYEEs (unless null foreign keys allowed!)

• If you delete the last EMPLOYEE in a DEPT, you also delete that DEPT (unless null keys allowed!…multiple?)

• No place to insert a DEPT# and its DEPTNAME, if there are no EMPLOYEEs there.
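
The anomalies listed above can be reproduced in a few lines. This is a sketch only: the employee and department values are hypothetical, the record is reduced to four fields, and the helper names are invented for illustration. It contrasts the clustered EMPLOYEE record with a decomposition into separate EMPLOYEE and DEPT structures.

```python
# Clustered record: DEPTNAME is repeated on every EMPLOYEE row (hypothetical data).
clustered = [
    {"emp": 1, "empname": "Ann", "dept": 10, "deptname": "Sales"},
    {"emp": 2, "empname": "Bob", "dept": 10, "deptname": "Sales"},
    {"emp": 3, "empname": "Cy", "dept": 20, "deptname": "Audit"},
]

def rename_dept_clustered(rows, dept, new_name):
    """Update every occurrence of the name; return how many rows had to change."""
    touched = 0
    for row in rows:
        if row["dept"] == dept:
            row["deptname"] = new_name
            touched += 1
    return touched

def decompose(rows):
    """Split into EMPLOYEE(emp, empname, dept) and DEPT(dept -> deptname)."""
    employees = [
        {"emp": r["emp"], "empname": r["empname"], "dept": r["dept"]} for r in rows
    ]
    depts = {r["dept"]: r["deptname"] for r in rows}
    return employees, depts

if __name__ == "__main__":
    employees, depts = decompose(clustered)

    # Update anomaly: the clustered form must touch one row per employee.
    print(rename_dept_clustered([dict(r) for r in clustered], 10, "Marketing"))
    depts[10] = "Marketing"  # the decomposed form changes a single entry

    # Deletion anomaly: removing the last employee of DEPT 20 keeps the DEPT.
    employees = [e for e in employees if e["emp"] != 3]
    print(20 in depts)

    # Insertion anomaly: a DEPT with no employees now has a place to live.
    depts[30] = "Legal"
    print(sorted(depts))
```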

RELSQL

Page 39: DAMA, 2001 December

39

Summary of all Normal Forms

GIVEN:

– a set of attributes, clustered into tables/records with identifiers– all functional dependencies on the attributes

• No multi-valued, non-key attributes (1NF)

• No partial dependencies of non-key attributes on part of a key (2NF)

• No transitive dependencies in non-key attributes (3NF)

• No partial or transitive dependencies within any key (EKNF, BCNF), i.e., consider all candidate keys.

• No multiple, independent multi-valued attributes in the same table (4NF)

• No join dependencies other than those implied by the candidate keys; a join dependency means the relation can be reconstructed without loss of information by joining some of its projections (5NF).

• No more than one table with the same key (“minimal”).

• No transitive dependencies across tables (“optimal”).

NOTE: number order is artificial, i.e., there is no necessary sequence to the normal forms.

RELSQL

Page 40: DAMA, 2001 December

40

Normalization – Testing your Understanding

Assuming that A is single valued with respect to X (i.e. 1NF).

GIVEN:

RELSQL

X A

X A B

X A B

MUST DISTINGUISH THE PRIMARY KEY .

X A B

2NF? 3NF? 4NF?

2NF? 3NF? 4NF?

2NF? 3NF? 4NF?

Could you have a violation of: (if not, why not?)

What does this diagram mean?
How, if at all, does this differ from the diagram above?

Page 41: DAMA, 2001 December

41

Representing a M:N Relationship

• If you cannot store multiple Projects (or Project IDs) in an Employee record, or multiple Employees (or Employee IDs) in a Project record (as is the case in a Relational Database), then you must introduce an “Intersection Entity” between them to represent the Many-to-Many Relationship.

DMOD

[Diagram: EMPLOYEE and PROJECT joined directly by an M:N relationship.]

What is the problem with this representation?

Another Pattern:

[Diagram: EMPLOYEE and PROJECT, each related to an intersection entity that holds EMPL-ID and PROJ-ID.]

• The Intersection Entity also provides the place to store additional attributes of the relationship, e.g., Hours Worked, Rate of Pay, …
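
A minimal sketch of the intersection-entity pattern, using Python's standard sqlite3 module. The table name `assignment`, the column names, and the sample rows are assumptions for illustration; only EMPL-ID, PROJ-ID, and Hours Worked come from the slide.

```python
import sqlite3

def build_assignment_db():
    """EMPLOYEE and PROJECT joined through an intersection table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE employee (empl_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE project  (proj_id INTEGER PRIMARY KEY, title TEXT);
        -- One row per (employee, project) pair; the compound key makes it M:N.
        CREATE TABLE assignment (
            empl_id      INTEGER NOT NULL REFERENCES employee(empl_id),
            proj_id      INTEGER NOT NULL REFERENCES project(proj_id),
            hours_worked REAL,
            PRIMARY KEY (empl_id, proj_id)
        );
        INSERT INTO employee VALUES (1, 'Ann'), (2, 'Bob');
        INSERT INTO project  VALUES (10, 'Payroll'), (20, 'Billing');
        INSERT INTO assignment VALUES (1, 10, 12.0), (1, 20, 3.5), (2, 10, 8.0);
    """)
    return conn

def projects_of(conn, empl_id):
    """Titles of all projects an employee is assigned to."""
    rows = conn.execute(
        "SELECT p.title FROM assignment a "
        "JOIN project p ON p.proj_id = a.proj_id "
        "WHERE a.empl_id = ? ORDER BY p.title",
        (empl_id,),
    )
    return [title for (title,) in rows]

def employees_on(conn, proj_id):
    """Names of all employees assigned to a project."""
    rows = conn.execute(
        "SELECT e.name FROM assignment a "
        "JOIN employee e ON e.empl_id = a.empl_id "
        "WHERE a.proj_id = ? ORDER BY e.name",
        (proj_id,),
    )
    return [name for (name,) in rows]

if __name__ == "__main__":
    db = build_assignment_db()
    print(projects_of(db, 1))
    print(employees_on(db, 10))
```

The `assignment` table exists only because the record structure cannot hold a multivalued item, which is the kind of artificial "new" entity the presentation goes on to criticize.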

Page 42: DAMA, 2001 December

42

Representing a Ternary Relationship

While we can develop a consistent notation for binary relationships, ternary relationships are a problem.

DMOD

EMPLOYEE SKILL

PROFICIENCY

• If one of the entities is single valued, is it really ternary? Or “attributed” binary?

• What lends uniqueness to each instance of the relationship?

• How to verbalize the relationship? Which order?
• How to represent Multiplicity / Exclusivity ?
• How to represent Dependency? Must have all 3?

Page 43: DAMA, 2001 December

43

What’s Wrong with ER Modeling?

I will show you still a more excellent way
– PAUL, I Cor 12.31

ORMvER

N

Page 44: DAMA, 2001 December

44

Record-based Design B

WHAT DOES THIS “RECORD” REPRESENT?

X A B C

X A

X B

X C

Design minimal records with at most one non-key domain.

Now what do these “records” represent?
Perhaps Codd was right in naming it a _________!

Avoids spurious associations, e.g., A – B …
Could there be any violations of normal forms?

What about the representation of the entity X ?

What if A is related to other “entities”?
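
The decomposition of X A B C into the minimal records X A, X B, and X C can be sketched as follows. The function names and the sample values are hypothetical, and the sketch assumes each of A, B, C is single-valued with respect to X.

```python
def decompose(records, key, columns):
    """Split each record into minimal (key, value) pairs, one relation per column.

    A missing value simply produces no pair, so no null is needed.
    """
    return {
        col: {rec[key]: rec[col] for rec in records if rec.get(col) is not None}
        for col in columns
    }

def recompose(relations, key):
    """Rejoin the binary relations on the shared key."""
    keys = set()
    for rel in relations.values():
        keys.update(rel)
    return [
        {key: k, **{col: rel.get(k) for col, rel in relations.items()}}
        for k in sorted(keys)
    ]

RECORDS = [
    {"X": 1, "A": "a1", "B": "b1", "C": "c1"},
    {"X": 2, "A": "a2", "B": None, "C": "c2"},  # B is unknown for X = 2
]

if __name__ == "__main__":
    binary = decompose(RECORDS, "X", ["A", "B", "C"])
    print(binary["B"])             # only X = 1 plays the B role
    print(recompose(binary, "X"))  # the original records, rebuilt by joining on X
```

Each binary relation records a single fact about X, so no association between A and B is implied unless it is stated separately.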

ORMvER

Page 45: DAMA, 2001 December

45

Transform Record-based (ER) Design

TO REALLY REPRESENT THE ENTITY DOMAINS

X A B C

X A

X B

X C

X

C

B

A

ObjectRoleModel:

ORMvER

Page 46: DAMA, 2001 December

46

Data Modeling

THINKING ABOUT ATTRIBUTES:

Record-Based (ER):

ENTITY IDENTIFIER ATTRIBUTE ATTRIBUTE ...

Object-Role (ORM):

[Diagram: a central ENTITY (id) connected by relationships to six other ENTITY objects.]

ENTITIES have ATTRIBUTES / DESCRIPTORS by playing roles in relationships with other entities.

ORMvER

Page 47: DAMA, 2001 December

47

Record-Based Modeling

GIVEN TWO FACTS (conceptually):

• one about the CITY a PERSON lives in

• another about the CITY a PERSON works in

ASSUME:

• every person has to live and work in a city

• each person can live and work in only one city at a time

• not interested in anything more about persons or cities

EXAMPLE:
• Gordon Everest lives in Falcon Heights and works in Minneapolis

DIAGRAM A CONCEPTUAL DATA MODEL
– to represent this information (a database to contain these facts)

ORMvER

Page 48: DAMA, 2001 December

48

Record-Based Data Model for PERSON lives in / works in a CITY

PERSON
  PersonID [key]
  LiveCity
  WorkCity

• What is the entity and what is the attribute?

• Would it make any sense to say (to a novice layperson - a user):
  – CITY was an "attribute" of PERSON?

• Doing more than is necessary at the conceptual level

• cannot have CITY and CITY as attributes of PERSON

• column/attribute name reflects " entity + role "

• CITY as an entity/object is lost (not its own table)

• what if there is a CITY where no one lives or works

• some add concept of a DOMAIN

ORMvER

Page 49: DAMA, 2001 December

49

Object-Role Model for PERSON lives in / works in CITY

FORML language statements:

• PERSON lives in CITY

• Every PERSON lives in some CITY

• Each PERSON lives in at most one CITY

• ... for works in

PERSON

(id)

CITY(name)

lives in

works in

FACT

ORMvER

Page 50: DAMA, 2001 December

50

Record-Based Modelingfor an additional fact.

• A PERSON makes sales calls in multiple CITIES

DIAGRAM the extended conceptual data model

• can you add an attribute "SalesCallCities" to PERSON?

FLAT Record-Based Modeling is even worse:

• create a new table SALESCALLS with a compound key– Is this a real entity in the conceptual view?

EXTEND THE OBJECT-ROLE DATA MODEL
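
One way to picture the object-role extension is to hold each fact type as a set of (PERSON, CITY) pairs. This is an illustrative sketch, not ORM tooling: the function names are invented, and the sales-call cities are hypothetical values. Adding the new fact type requires no new attribute on PERSON and no compound-key table to name.

```python
# CITY is an object in its own right, so a city may exist with nobody in it.
cities = {"Falcon Heights", "Minneapolis", "St. Paul", "Duluth"}  # last two hypothetical

# Each fact type is a set of elementary facts: (PERSON, CITY) pairs.
fact_types = {
    "lives in": {("Gordon Everest", "Falcon Heights")},
    "works in": {("Gordon Everest", "Minneapolis")},
}

def add_fact_type(model, predicate, facts):
    """Return a model extended with a new fact type; existing ones are untouched."""
    extended = dict(model)
    extended[predicate] = set(facts)
    return extended

def cities_for(model, predicate, person):
    """All cities in which `person` plays the given role."""
    return sorted(city for p, city in model[predicate] if p == person)

def verbalize(model):
    """One sentence per elementary fact."""
    return sorted(
        f"{person} {predicate} {city}"
        for predicate, facts in model.items()
        for person, city in facts
    )

if __name__ == "__main__":
    extended = add_fact_type(
        fact_types,
        "makes sales calls in",
        {("Gordon Everest", "St. Paul"), ("Gordon Everest", "Duluth")},
    )
    for sentence in verbalize(extended):
        print(sentence)
```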

ORMvER

Page 51: DAMA, 2001 December

51

Record-Based Data Modeling DISADVANTAGES:

• no way to capture the conceptual view directly

• must mentally map from conceptual view to the "logical" (record-based) view

– by structural groupings of attributes and relationships

• must choose unique, arbitrary names– for attributes in a record; for spurious new "entities"

• cannot reuse attributes in the same table

• must do your own normalization

• hides or ignores inter-attribute relationships

• creates (implies) spurious inter-attribute relationships

ORMvER

Page 52: DAMA, 2001 December

52

Object-Role (ORM) Data Modeling

THE ESSENTIAL DIFFERENCE:

• Three main constructs ..rolled into.. Two main constructs

Record-based modeling:
  ENTITY
  ATTRIBUTE
  RELATIONSHIP

NIAM/ORM modeling:
  ? ? ? ?
  Role in RELATIONSHIP

What to call it?
  OBJECT
  ENTITY
  ENTRIBUTE!

ORMINTRO

Page 53: DAMA, 2001 December

53

Data Modeling Terminology

O-R ("conceptual"):
  OBJECT
  FACT SENTENCE
  PREDICATE
  CONSTRAINT

E-R ("logical"):
  ENTITY (TYPE)
  ATTRIBUTE
  INSTANCE
  IDENTIFIER
  RELATIONSHIP
  CHARACTERISTICS

COBOL/DBTG:
  RECORD TYPE
  DATA ITEM (ELEMENT)
  RECORD
  "SET"

RELATIONAL:
  RELATION, TABLE
  COLUMN, FIELD
  ROW, TUPLE
  KEY
  FOREIGN KEY
  CONSTRAINT

("physical" implementation)

ORMINTRO

Page 54: DAMA, 2001 December

54

Fact Sentence - Verbalize

• A Fact = a Predicate + Object(s) => Sentence

• THINK: Objects playing Roles in a Relationship

• Naming: object instances versus object types
  – e.g. “Ann” is an instance of “Person”

• Arity - the number of object “holes” in the Predicate
  – UNARY: - “Ann smiles”
      only 2 states: true/false, present/absent, yes/no
      making the closed world assumption
  – BINARY: - “Ann likes to run”
      most common
      has an inverse - “Running is liked by Ann”
      Inverse name is never the same (else symmetric, handled differently)
  – TERNARY: - “Ann married Bob in 1967”
      with types: - “PERSON married PERSON in YEAR”
      verbalizing can be difficult with more than 2 (sequence problem)
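
The idea of a predicate with object "holes" can be sketched in a few lines. Writing each hole as "..." is an assumption made for this example, and `arity` and `verbalize` are hypothetical helper names.

```python
HOLE = "..."  # assumed notation for an object hole in a predicate

def arity(predicate):
    """Number of object holes in the predicate text."""
    return predicate.count(HOLE)

def verbalize(predicate, *objects):
    """Fill the holes, left to right, to produce a fact sentence."""
    if len(objects) != arity(predicate):
        raise ValueError("number of objects must equal the predicate's arity")
    parts = predicate.split(HOLE)
    sentence = parts[0]
    for obj, tail in zip(objects, parts[1:]):
        sentence += obj + tail
    return sentence

if __name__ == "__main__":
    print(arity("... smiles"), verbalize("... smiles", "Ann"))
    print(arity("... likes to ..."), verbalize("... likes to ...", "Ann", "run"))
    ternary = "... married ... in ..."
    print(arity(ternary), verbalize(ternary, "Ann", "Bob", "1967"))
    # The same predicate at the type level:
    print(verbalize(ternary, "PERSON", "PERSON", "YEAR"))
```

Filling the holes in a different order changes the sentence, which is the sequence problem noted for predicates with more than two roles.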

ORMODLG

Page 55: DAMA, 2001 December

55

Symbolize: ORM Constructs

• OBJECT (ENTITY, CONCEPT) - NOUN … in an ellipse
• PREDICATE (RELATIONSHIP) - verb = role name … in a box
  – unary, binary, ternary, +++

Binary Predicate:

OBJECT1 - role12 / role21 - OBJECT2

Elementary Binary Fact Sentence:

PERSON - works in / employs - DEPARTMENT

Verbalization:

“PERSON works in DEPARTMENT”
“DEPARTMENT employs PERSON”

ORMODLG

Page 56: DAMA, 2001 December

56

Adding ORM Constraints

PERSON - works in / employs - DEPARTMENT

Verbalization:

“PERSON works in DEPARTMENT”
“DEPARTMENT employs PERSON”

DEPENDENCY (MANDATORY):
“PERSON must work in some DEPARTMENT”

EXCLUSIVITY (UNIQUENESS):
“PERSON works in at most one DEPARTMENT”

ORMODLG
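
The two constraints can be checked against a sample population. The sketch below is illustrative only: the function name, the people, and the departments are hypothetical, and the violation messages simply echo the verbalizations on the slide.

```python
def check_works_in(people, facts, mandatory=True, unique=True):
    """Check the PERSON role of "PERSON works in DEPARTMENT".

    people: all known PERSON instances
    facts:  (person, department) pairs
    Returns a list of violation messages (empty when the population is valid).
    """
    departments_of = {}
    for person, dept in facts:
        departments_of.setdefault(person, set()).add(dept)

    violations = []
    if mandatory:
        # DEPENDENCY: every PERSON must play the role at least once.
        for person in sorted(set(people) - set(departments_of)):
            violations.append(f"{person} must work in some DEPARTMENT")
    if unique:
        # EXCLUSIVITY: each PERSON may play the role at most once.
        for person in sorted(departments_of):
            if len(departments_of[person]) > 1:
                violations.append(f"{person} works in more than one DEPARTMENT")
    return violations

if __name__ == "__main__":
    people = {"Ann", "Bob", "Cy"}
    facts = {("Ann", "Sales"), ("Bob", "Sales"), ("Bob", "Audit")}
    for message in check_works_in(people, facts):
        print(message)
```

Dropping either flag models a fact type without that constraint, for example `unique=False` for a many-to-many role.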

Page 57: DAMA, 2001 December

57

Methodology: Steps in OR Modeling

• Familiarize with real world Universe of discourse

• Verbalize sentences of elementary facts

• Symbolize: build the conceptual ORM model diagram

• Constrain the roles in predicates

• Validate the conceptual data model

• Map into neutral, record-based, logical tables

• Refine the table definitions

• Generate physical database definition for target DBMS

ORMINTRO

Page 58: DAMA, 2001 December

58

VisioModeler Architecture

[Architecture diagram, labels only:
  Tools: FACT EDITOR (FORML fact sentences, Population), VERBALIZER, DIAGRAMMER, BROWSER
  "REPOSITORY": CONCEPTUAL DATA MODEL DICTIONARY; "LOGICAL" DATA MODEL (TABLES) DICTIONARY
  Operations: VALIDATE (CHECK), correct, BUILD, refine, GENERATE
  Output: Tables; PHYSICAL DATABASE STRUCTURE & DEFINITION for a target DBMS]

ORMINTRO

Quick Facts

Page 59: DAMA, 2001 December

59

Levels of Abstraction in NIAM/ORM

REMOVING (generally in order of importance):

1. Lexical Object Types (LOTS); Value Object Types

2. “Terminal” Object types – equivalent to / become “attributes”
   IF: – play only functionally dependent roles (often only one role), i.e., One:Many relationships; (disjunctive) mandatory (implied)

3. Common Object Types - generic value domains / ref. modes

4. “Event” Object Types

5. Dependent (“weak”) Object Types- Subtypes, Objectified Facts

6. User-defined priority levels on Object Types

7. Constraints and Reference Modes

8. Predicates

DMODPRE

Page 60: DAMA, 2001 December

60

Sample, Simple ORM Data Model

BOSS

LIMITLIMIT

SKILL(code)

RATING

EMPLOYEE (number)

DEPT(number)works in employs

supervises is headed by

reports to superior to

may spend up to of spending for

with proficiency of assigned to

possesses possessed by

"EmployeeSkill!"

{ 1 .. 10 }

{ 1000 .. 9999 }

ac

SALARY(dollars) earns paid to

DESCRIPTION (name) has is of

<=5

DMODPRE

Remove "Terminal" (M:1) Objects

A major criticism of NIAM / ORM, from both antagonists and proponents, is that it is too detailed, a bottom-up design,

BUT… ER Diagrams usually omit the details of attributes and most constraints.

So, present the model using top-down abstractions.

Page 61: DAMA, 2001 December

61

ORM Abstractions

• Removing "Terminal" (M:1) Objects

BOSS

SKILL(code)

EMPLOYEE(number)

DEPT(number)

works in employs

supervises is headed by

reports to superior to

possesses possessed by

"EmployeeSkill!"

{ 1000 .. 9999 }

{ 2000 .. 2999 }

ac

<=5

DMODPRE

Remove Constraints and Reference Modes

Page 62: DAMA, 2001 December

62

ORM Abstractions

• Removing Constraints and Reference Modes

BOSS

SKILL

EMPLOYEE DEPTworks in employs

supervises is headed by

reports to superior to

possesses possessed by

DMODPRE

Remove Less Important Objects & Predicates– Subtypes, Objectified Predicates, Reflexive Relationships

Page 63: DAMA, 2001 December

63

ORM Abstractions• Removing Less Important Objects & Predicates

– Subtypes, Objectified Predicates, Reflexive Relationships

SKILL

EMPLOYEE DEPTworks in employs

supervises is headed by

DMODPRE

Remove Predicates

Page 64: DAMA, 2001 December

64

ORM Abstractions

• Removing Predicates

SKILL

EMPLOYEE DEPT

... Leaving BASE Entities!

A Top-Level Abstract Conceptual Data Model an ER Diagram ? ! ! !

DMODPRE

Page 65: DAMA, 2001 December

65

Language Design Criteria

• Semantic Strength, Expressiveness
  – Able to model all relevant details in the domain
  – The range of queries that can be expressed
  – The “100% Principle”

• Semantic Clarity
  – Ease of Understanding and Use; intuitive
  – Unambiguous, i.e., only one possible meaning

• Semantic Relevance
  – Only relevant information need be stated
  – Not dependent on artificial or spurious expressions

• Semantic Stability, Independence
  – How well the model/query retains its original intent in the face of changes to the underlying application

ORMQURY See: Halpin, “Conceptual Queries”.

Page 66: DAMA, 2001 December

66

Conceptual Query Language

• ConQuer
  – Based on ORM
  – Need not be familiar with ORM or its notation: “user can construct a query without any prior knowledge of the schema” but…
  – In the form of a textual outline
      Indentation is significant
  – Implemented in Visio ActiveQuery
      Object pick list – drag to the query window
      Roles pick list – drag to the query window
  – Projection – items to display marked with a tick (✓)
  – Mapping to SQL

ORMQURY See: Halpin, “Conceptual Queries”.

Page 67: DAMA, 2001 December

67

Sample ConQuer Query (1)

“List Employees who live in the City that is the Location of Branch 52”

Employee [number]
  +– lives in City
      +– is location of Branch [number =] 52

NOTE: City acts as a Join object type (the common “attribute”), i.e. Employee and Branch are joined through City.

ORMQURY See: Halpin, “Conceptual Queries”.

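Semantic clarity (+), semantic relevance (+), semantic stability (+).

[ORM schema: Employee(number) lives in City; Branch(number) is located in City / City is location of Branch; City is in State(code); City has CityName; an external uniqueness constraint (U) spans the State and CityName roles.]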

Page 68: DAMA, 2001 December

68

SQL for Sample ConQuer Query (1)

“List Employees who live in the City that is the Location of Branch 52”

ORMQURY See: Halpin, “Conceptual Queries”.

In SQL: (Where are the tables?)

    SELECT EmployeeNumber
    FROM Employee, Branch
    WHERE Employee.CityName = Branch.CityName
      AND Employee.StateCode = Branch.StateCode
      AND Branch.BranchNumber = 52

Could you do this in Access using the Query Form?

Semantic clarity (-), semantic relevance (-), semantic stability (-)

Suppose an Employee could live in more than one City???

Suppose we now wish to record the Population of Cities???
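
To make the stability question concrete, the sketch below runs the SQL above against a small SQLite database built with Python's standard sqlite3 module. The table layouts and sample rows are assumptions; the slides give only the column names used in the query. A second, hypothetical schema then lets an Employee live in more than one City by moving the city columns into a separate table, and the unchanged query no longer runs.

```python
import sqlite3

# The query from the slide, with an ORDER BY added for a predictable result.
QUERY = """
    SELECT Employee.EmployeeNumber
    FROM Employee, Branch
    WHERE Employee.CityName = Branch.CityName
      AND Employee.StateCode = Branch.StateCode
      AND Branch.BranchNumber = 52
    ORDER BY Employee.EmployeeNumber
"""

BRANCH_DDL = """
    CREATE TABLE Branch (BranchNumber INTEGER PRIMARY KEY,
                         CityName TEXT, StateCode TEXT);
    INSERT INTO Branch VALUES (52, 'Springfield', 'IL'), (53, 'Minneapolis', 'MN');
"""

def build_single_city_db():
    """Original design: the city is clustered into the Employee record."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(BRANCH_DDL + """
        CREATE TABLE Employee (EmployeeNumber INTEGER PRIMARY KEY,
                               CityName TEXT, StateCode TEXT);
        INSERT INTO Employee VALUES
            (1, 'Minneapolis', 'MN'), (2, 'Springfield', 'IL'),
            (3, 'Springfield', 'MO'), (4, 'Springfield', 'IL');
    """)
    return conn

def build_multi_city_db():
    """Changed design (hypothetical): an Employee may live in several Cities."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(BRANCH_DDL + """
        CREATE TABLE Employee (EmployeeNumber INTEGER PRIMARY KEY);
        CREATE TABLE EmployeeCity (EmployeeNumber INTEGER, CityName TEXT,
                                   StateCode TEXT,
                                   PRIMARY KEY (EmployeeNumber, CityName, StateCode));
        INSERT INTO Employee VALUES (1), (2);
        INSERT INTO EmployeeCity VALUES
            (1, 'Minneapolis', 'MN'), (2, 'Springfield', 'IL'), (2, 'Minneapolis', 'MN');
    """)
    return conn

def run_query(conn):
    """Return the employee numbers, or None if the query no longer fits the schema."""
    try:
        return [number for (number,) in conn.execute(QUERY)]
    except sqlite3.OperationalError:
        return None

if __name__ == "__main__":
    print(run_query(build_single_city_db()))  # employees in Branch 52's city
    print(run_query(build_multi_city_db()))   # the same SQL fails after the change
```

The conceptual question, which Employees live in the City where Branch 52 is located, is the same in both cases; only the clustering of the data changed.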

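[ORM schema: Employee(number) lives in City; Branch(number) is located in City / City is location of Branch; City is in State(code); City has CityName; an external uniqueness constraint (U) spans the State and CityName roles.]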

Page 69: DAMA, 2001 December

69

Problems with ER Modeling - Summary

• Too much clustering; attributes in the wrong place

• Ignores (presumes) intra-record structure (that is, inter-attribute relationships)

• Human modeler is responsible for normalization
  remedy is always record decomposition

• Attribute migration… to become an entity
  - modeler must distinguish attributes and entities

• Naming columns = domain + role, loses domain objects

• Modeling dilemma:
  – Complete representation of an entity object - more clustering
  – Full normalization (1NF) – decomposition, less clustering

• Indirect representation of M:N relationships
  – Introduces artificial “new” entities

• Difficulty representing Ternary relationships

• Stability of the query language (SQL)

ORMvER

Page 70: DAMA, 2001 December

70

At the Root,

What’s wrong with

ER Modeling?

ORMvER

CLUSTERING

Gordon C. Everest
Carlson School of Management

University of Minnesota

Page 71: DAMA, 2001 December

71

Why NIAM/OR Modeling?

• roots in both LOGIC & LINGUISTICS

• based on one modeling construct: the fact sentence

• more expressive, understandable - diagrams & verbalization

• diagrams can be populated with actual data samples

• abstraction levels equivalent to E-R modeling

• more, richer semantics (than E-R, EER, IDEF1X)

• capture and represent all functional dependencies

• avoids normalization problems with record-based modeling

• better meets criteria for good data modeling

• organizations that switched wouldn’t go back to E-R

• direction of Standards (SUMM, UDM, ...)

• now supported with a viable PC-based CASE tool

ORMINTRO

Page 72: DAMA, 2001 December

72

Resources on ORM

BOOK:

• Terry Halpin (now from Microsoft), Information Modeling and Relational Databases: From Conceptual Analysis to Logical Design, Morgan Kaufmann Publishers, San Francisco, 2001, 763 pages.

WEB SITE for my course:

• http://webfoot.csom.umn.edu/faculty/everest/idsx431

  – with ORM intro and further reading
  – InfoModeler software download
  – Usage Notes

SPRING CLASSES:
• IDSc 6431 (for MBAs)
• IDSc 4431 (for CSOM Undergrads)
• IDSc 4131 (for CCE and others)

TRAINING and CONSULTING:

• InConcept, Inc., Lake Elmo, MN www.inconcept.com

ORMvER