Statistical databases in theory and practice Part III: Designing statistical databases


Statistical databases in theory and practice

Part III: Designing statistical databases

Bo Sundgren

2008-02-11

 Conceptual data model and relational data model in normalised form.

Object types and their attributes:

• PERSON: Identifier, HouseholdIdentifier*, Sex, Age, Education, Occupation, Income, Wealth, Health
• HOUSEHOLD: Identifier, DwellingIdentifier*, Size, Structure, Income
• DWELLING: Identifier, Location, Size, Standard, Rent
• ESTABLISHMENT: EstablishmentIdentifier, OrganisationIdentifier*, Location, KindOfActivity, NumberOfEmployees, NetProfit
• ORGANISATION: OrganisationIdentifier, LocationOfHQ
• MIGRATIONEVENT: Identifier, PersonIdentifier*, FromDwellingId*, ToDwellingId*, Time
• EMPLOYMENT: PersonIdentifier*, EstablishmentIdentifier*, PercentOfFullTime, Salary

Relations between object types:

• PERSON BELONGS TO HOUSEHOLD
• HOUSEHOLD LIVES IN DWELLING
• ESTABLISHMENT BELONGS TO ORGANISATION
• MIGRATIONEVENT OF PERSON
• MIGRATIONEVENT FROM / TO DWELLING
• PERSON WORKS AT ESTABLISHMENT (EMPLOYMENT)
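A minimal sketch of how the normalised relational model above could be declared. It assumes that the attributes marked * are foreign keys to the identifier of the related object type, that EMPLOYMENT carries PercentOfFullTime and Salary as two separate columns, and that one person has at most one employment per establishment. None of this is stated on the slide. The column types, the SQLite backend and the helper names `build_database` and `persons_per_dwelling` are illustrative choices.

```python
import sqlite3

# Assumption: attributes marked * on the slide are foreign keys.
# Column types are illustrative only.
SCHEMA = """
CREATE TABLE Dwelling (
    Identifier INTEGER PRIMARY KEY,
    Location TEXT, Size REAL, Standard TEXT, Rent REAL
);
CREATE TABLE Household (
    Identifier INTEGER PRIMARY KEY,
    DwellingIdentifier INTEGER REFERENCES Dwelling(Identifier),    -- LIVES IN
    Size INTEGER, Structure TEXT, Income REAL
);
CREATE TABLE Person (
    Identifier INTEGER PRIMARY KEY,
    HouseholdIdentifier INTEGER REFERENCES Household(Identifier),  -- BELONGS TO
    Sex TEXT, Age INTEGER, Education TEXT, Occupation TEXT,
    Income REAL, Wealth REAL, Health TEXT
);
CREATE TABLE Organisation (
    OrganisationIdentifier INTEGER PRIMARY KEY,
    LocationOfHQ TEXT
);
CREATE TABLE Establishment (
    EstablishmentIdentifier INTEGER PRIMARY KEY,
    OrganisationIdentifier INTEGER
        REFERENCES Organisation(OrganisationIdentifier),           -- BELONGS TO
    Location TEXT, KindOfActivity TEXT,
    NumberOfEmployees INTEGER, NetProfit REAL
);
CREATE TABLE Employment (                                          -- WORKS AT
    PersonIdentifier INTEGER REFERENCES Person(Identifier),
    EstablishmentIdentifier INTEGER
        REFERENCES Establishment(EstablishmentIdentifier),
    PercentOfFullTime REAL, Salary REAL,
    PRIMARY KEY (PersonIdentifier, EstablishmentIdentifier)        -- assumed key
);
CREATE TABLE MigrationEvent (
    Identifier INTEGER PRIMARY KEY,
    PersonIdentifier INTEGER REFERENCES Person(Identifier),        -- OF
    FromDwellingId INTEGER REFERENCES Dwelling(Identifier),        -- FROM
    ToDwellingId INTEGER REFERENCES Dwelling(Identifier),          -- TO
    Time TEXT
);
"""


def build_database():
    """Create an in-memory database with foreign key checks switched on."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def persons_per_dwelling(conn):
    """Follow BELONGS TO and LIVES IN to count persons in each dwelling."""
    rows = conn.execute("""
        SELECT d.Identifier, COUNT(p.Identifier)
        FROM Dwelling d
        LEFT JOIN Household h ON h.DwellingIdentifier = d.Identifier
        LEFT JOIN Person p ON p.HouseholdIdentifier = h.Identifier
        GROUP BY d.Identifier
        ORDER BY d.Identifier
    """)
    return dict(rows.fetchall())
```

Because each relation is stored as a foreign key on the "many" side, a question that spans several object types (persons per dwelling) becomes a chain of joins along the relations of the conceptual model.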

Concept modelling

• Define concepts and relations between them
• Conceptual models and data models
• Visualise models graphically

Rent-A-Video: first object graph

• VideoFilm IsRentedBy Customer
• Customer Rents VideoFilm

Rent-A-Video: elaborated object graph

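Object types and their attributes:

• FilmTitle: FilmId, Title, Category, Price, Actor*, Story, NumberOfCopies=, NumberOfRents=
• VideoFilm: FilmId, CopyNr, Rented?, NumberOfRents
• Customer: CustomerId, Name, Address, Discount
• Rental: CustomerId, FilmId, CopyNr, RentalNr, RentalDate, AgreedReturnDate, Returned?, ActualReturnDate

Relations:

• VideoFilm IsRentedBy Customer; Customer Rents VideoFilm
• VideoFilm Represents FilmTitle; FilmTitle IsRepresentedBy VideoFilm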

Rent-A-Video: further aspects

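Relations in the graph:

• VideoFilm IsRentedBy Customer; Customer Rents VideoFilm
• VideoFilm Represents FilmTitle; FilmTitle IsRepresentedBy VideoFilm
• Further relations shown: IsRentedBy / Rents and IsReservedBy / Reserves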

Relations between two object types

• one-to-one, symbolised by “arrow-to-arrow”

• one-to-many, symbolised by “arrow-to-fork”

• many-to-one, symbolised by “fork-to-arrow”

• many-to-many, symbolised by “fork-to-fork”

Note: The relation is usually not a flow relation!

(But you should tell what kind of relation it is.)
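As an illustration of the four kinds of relation, the sketch below classifies a relation from a list of observed (left, right) pairs of related objects. The function name `relation_kind` and the sample identifiers are hypothetical. Note that data can only show the cardinality that happens to occur; in a model, the kind of relation is a design decision about what is allowed to occur.

```python
def relation_kind(pairs):
    """Classify a relation given as (left, right) pairs of related objects."""
    rights_of = {}  # left object  -> set of related right objects
    lefts_of = {}   # right object -> set of related left objects
    for left, right in pairs:
        rights_of.setdefault(left, set()).add(right)
        lefts_of.setdefault(right, set()).add(left)

    left_has_many = any(len(r) > 1 for r in rights_of.values())
    right_has_many = any(len(l) > 1 for l in lefts_of.values())

    # "one-to-many": one left object may be related to many right objects,
    # while every right object is related to only one left object.
    left_side = "many" if right_has_many else "one"
    right_side = "many" if left_has_many else "one"
    return f"{left_side}-to-{right_side}"
```

For example, if every film title is represented by several copies and every copy represents one title, the pairs (title, copy) are classified as one-to-many, and the reversed pairs (copy, title) as many-to-one.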

Object graphs: another example

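Object types and their attributes:

• PERSON: PersonId, Sex, Income, Age, HomeMunicipality, PostalCode, HighestEducation
• HOUSEHOLD: HouseholdId, NumberOfPersons=, Income=

Relations:

• HOUSEHOLD CONSISTS OF PERSON; PERSON BELONGS TO HOUSEHOLD
• PERSON IS FATHER OF PERSON
• PERSON IS MOTHER OF PERSON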
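The HOUSEHOLD attributes NumberOfPersons= and Income= carry a trailing "=". Assuming that this marks attributes derived from other objects (the slide does not say so), they could be computed from the PERSON objects that belong to each household, roughly as follows. Treating household income as the sum of the members' incomes is also an assumption, and the class and function names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Person:
    person_id: int
    household_id: int  # the BELONGS TO relation
    income: float


def derive_household_attributes(persons):
    """Return {HouseholdId: {"NumberOfPersons": n, "Income": total}}."""
    households = defaultdict(lambda: {"NumberOfPersons": 0, "Income": 0.0})
    for p in persons:
        h = households[p.household_id]
        h["NumberOfPersons"] += 1   # count the members (CONSISTS OF)
        h["Income"] += p.income     # assumed: sum of member incomes
    return dict(households)
```

Deriving such attributes on demand avoids storing values that can become inconsistent with the person data they summarise.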

Concept modelling: Exercises

• B2B
– The customers of companies are companies
– Companies have employees (persons)

• B2C
– The customers of companies are consumers (persons)
– Companies have employees (persons)

• B2B+B2C
– The customers of companies are companies or consumers (persons)
– Companies have employees (persons)

Hint: There are two basic object types, COMPANY and PERSON in all three examples

Different roles of concept modelling

• Clarifying a small number of related concepts

• Information model for an application
– defining meaning
– basis for data design

• Corporate information model
– for more efficient communication between people
– basis for system integration

Concept model ---> Data model

Concept model:

• FilmTitle: FilmId, Title, Category, Price, Actor*, Story, NumberOfCopies=, NumberOfRents=
• FilmCopy: FilmId, CopyNr, Rented?, NumberOfRents
• Customer: CustomerId, Name, Address, Discount
• Rental: CustomerId, FilmId, CopyNr, RentalNr, RentalDate, AgreedReturnDate, Returned?, ActualReturnDate

Relations:

• FilmCopy IsRentedBy Customer; Customer Rents FilmCopy
• FilmCopy Represents FilmTitle; FilmTitle IsRepresentedBy FilmCopy

Data model (tables and their columns):

• FilmTitles: FilmId, Title, Category, Price, Story
• ActorsInFilms: FilmId, ActorName, ActorsRoleInFilm
• FilmCopies: FilmId, CopyNr, Rented?, NumberOfRents
• Customers: CustomerId, Name, Address, Discount
• Rentals: FilmId, CopyNr, CustomerId, RentalNr, RentalDate, AgreedReturnDate, Returned?, ActualReturnDate

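One step in going from the concept model to the data model is that the Actor* attribute of FilmTitle ends up in a table of its own, ActorsInFilms. A minimal sketch of that step, assuming that * marks an attribute that can hold several values per film and that the derived attribute NumberOfCopies= is computed from FilmCopies rather than stored. The function names and the dictionary layout of the input are hypothetical.

```python
def normalise_film_title(film):
    """Split one FilmTitle object into a FilmTitles row and ActorsInFilms rows."""
    title_row = {
        "FilmId": film["FilmId"],
        "Title": film["Title"],
        "Category": film["Category"],
        "Price": film["Price"],
        "Story": film["Story"],
    }
    # One row per actor, keyed by FilmId, replaces the repeating Actor* attribute.
    actor_rows = [
        {"FilmId": film["FilmId"], "ActorName": name, "ActorsRoleInFilm": role}
        for name, role in film["Actors"]
    ]
    return title_row, actor_rows


def number_of_copies(film_id, film_copies):
    """Derive NumberOfCopies= by counting FilmCopies rows for the film."""
    return sum(1 for copy in film_copies if copy["FilmId"] == film_id)
```

Each resulting row has only single-valued columns, which is what the normalised tables FilmTitles and ActorsInFilms require.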

Concept model ---> Star/cube model

[Figure: the Rent-A-Video concept model (FilmCopy, FilmTitle, Customer, Rental) shown together with the star model and the cube model derived from it.]

Star model for Data Warehouse

OBJECT IN FOCUS:

• Rental: RentalNr, CustomerId, FilmId, Rental date, Delayed?

Surrounding objects:

• FilmTitle: FilmId, Title, Category, NumberOfRents
• Customer: CustomerId, Category, Area
• PriceGroup
• NumberOfRentsPerCopy
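A minimal sketch of how such a star model could be queried, assuming that Rental is the object in focus (the fact table) and that FilmTitle and Customer act as dimension tables joined through FilmId and CustomerId. The SQLite backend, the column types and the function names are illustrative; PriceGroup and NumberOfRentsPerCopy are left out because their attributes are not shown.

```python
import sqlite3


def build_star(conn):
    """Create the fact table Rental and two dimension tables."""
    conn.executescript("""
        CREATE TABLE FilmTitle (
            FilmId INTEGER PRIMARY KEY, Title TEXT, Category TEXT
        );
        CREATE TABLE Customer (
            CustomerId INTEGER PRIMARY KEY, Category TEXT, Area TEXT
        );
        CREATE TABLE Rental (
            RentalNr INTEGER PRIMARY KEY,
            CustomerId INTEGER REFERENCES Customer(CustomerId),
            FilmId INTEGER REFERENCES FilmTitle(FilmId),
            RentalDate TEXT,
            Delayed INTEGER
        );
    """)


def rentals_by(conn, film_column, customer_column):
    """Count rentals per combination of one film and one customer attribute."""
    # Column names are checked against a fixed list before being put into SQL.
    if film_column not in {"Title", "Category"}:
        raise ValueError("unknown FilmTitle attribute")
    if customer_column not in {"Category", "Area"}:
        raise ValueError("unknown Customer attribute")
    sql = f"""
        SELECT f.{film_column}, c.{customer_column}, COUNT(*)
        FROM Rental r
        JOIN FilmTitle f ON f.FilmId = r.FilmId
        JOIN Customer c ON c.CustomerId = r.CustomerId
        GROUP BY f.{film_column}, c.{customer_column}
        ORDER BY f.{film_column}, c.{customer_column}
    """
    return conn.execute(sql).fetchall()
```

The same fact table can be grouped by any attribute of the surrounding objects, which is what makes the star layout convenient for a data warehouse.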

Multidimensional model (cube model)

Cube: number of rentals of film copies during the year t, by customer discount category and film category.

• Dimensions: FILM CATEGORY, CUSTOMER DISCOUNT CATEGORY
• Each cell contains: number of rentals
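A minimal sketch of how the cells of this cube could be filled from rental microdata. It assumes that each rental record carries FilmId, CustomerId and an ISO-formatted RentalDate, and that two lookup tables give the film category of each film and the discount category of each customer. The function names and the record layout are hypothetical.

```python
from collections import Counter


def rentals_cube(rentals, film_category, customer_discount_category, year):
    """Count rentals in `year` per (film category, discount category) cell."""
    cube = Counter()
    for r in rentals:
        # Assumes ISO dates of the form "YYYY-MM-DD".
        if r["RentalDate"].startswith(str(year)):
            cell = (film_category[r["FilmId"]],
                    customer_discount_category[r["CustomerId"]])
            cube[cell] += 1
    return cube


def margin(cube, axis):
    """Sum the cube over the other dimension (axis 0 = film category)."""
    totals = Counter()
    for cell, n in cube.items():
        totals[cell[axis]] += n
    return totals
```

Cells that never occur in the data are simply absent from the `Counter` and read as zero, so the cube can be stored sparsely.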

Modelling the contents and structure of official statistics

Or: How to design ”correct” and globally consistent SDMX Data Structure Definitions

Or: Navigating in a space of statistical surveys of society

Or: Reality as a statistical construction

Bo Sundgren, Statistics Sweden
ICES-III, Montreal, June 18-21, 2007

What can a statistical agency do in order to help a user
– find potentially relevant statistical data?
– judge the relevance of data retrieved?

• Provide overviews of available data

• Provide search tools

• Provide informative metadata

Conceptual navigation: contents exploration and searching for statistics

• A conceptual model of society as reflected by official statistics

SOCIETY

• ACTORS with properties
– persons, households
– organisations, institutions
– enterprises, establishments
– ...

• UTILITIES with properties
– resources: real, financial
– products: commodities, services, information, ...
– assets and liabilities
– ...

• SOCIO-ECONOMIC PROCESSES, SUBPROCESSES, ACTIVITIES (actors and utilities are involved in different roles)
– BY AREA: agriculture, manufacturing, service industries, trade, transports, energy, construction, education, R&D, financial, information, culture & leisure, health, social services, judicial services, ...
– BY SECTOR: public (central, regional, local), private (business, household)
– BY FUNCTION: production, consumption, investment, order/delivery, sales, supply, employment, environment side-effects, ...

• ACTOR LIFE HISTORIES (actor eigenprocesses), SUBHISTORIES, “CASES”

• UTILITY LIFE HISTORIES (utility eigenprocesses), SUBHISTORIES, “CASES”

SOCIETY

• ACTORS with properties: persons, households; organisations, institutions; enterprises, establishments; ...
• UTILITIES with properties: resources (real, financial); products (commodities, services, information, ...); assets and liabilities; ...
• SOCIO-ECONOMIC PROCESSES, SUBPROCESSES, ACTIVITIES (actors and utilities are involved in different roles)
– BY AREA: agriculture, manufacturing, service industries, trade, transports, energy, construction, education, R&D, financial, information, culture & leisure, health, social services, judicial services, ...
– BY SECTOR: public (central, regional, local), private (business, household)
– BY FUNCTION: production, consumption, investment, order/delivery, sales, supply, employment, environment side-effects, ...
• ACTOR LIFE HISTORIES (actor eigenprocesses), SUBHISTORIES, “CASES”
• UTILITY LIFE HISTORIES (utility eigenprocesses), SUBHISTORIES, “CASES”

Statistical representation of society:

• countable/measurable BASIC OBJECTS (ACTORS) with properties
• countable/measurable BASIC OBJECTS (UTILITIES) with properties
• countable/measurable COMPLEX OBJECTS with properties: events, transactions, relationships, “cases”, ...
• OBJECT RELATIONS
• DATA COLLECTION AND AGGREGATION PROCESSES
– done by respondents
– done by agency
• STATISTICAL DATA (micro, macro), organised in multidimensional structures: balance sheets, crosstabulations, cubes, etc.
• FURTHER PROCESSING
• ANALYTICAL PRODUCTS

Statistics Canada: Agents, Events, Things

Contents By Example (based on a simple generic model)

Each object type is listed with its variables:

• Actors: PERSON, ORGANISATION
• Utilities: RESOURCE, PRODUCT
• Complex objects: ACTIVITY, EVENT, RELATION

Some variable cells carry example entries (x, m, >0, p, <5, g).

Everything is “clickable”, both OBJECT and VARIABLE:

• Left-hand click: select
– object
– variable

• Right-hand click: retrieve metadata
– definition
– value set, classification
– questionnaire
– quality declaration
– survey documentation

UNESCO model, version 1 (to be revised)

Objects and their variables:

• EducationSystem (Utility): - Country, - Currency, - CompulsoryEdBegAge, - CompulsoryEdEndAge, - CompulsoryEdLength, - AcadYearBegMonth, - AcadYearEndMonth
• EducationProgram (Utility): - Name, - Year, - EntranceAge, - Duration, x Type, x Level (ISCED97), x Grade, x Orientation, x PositionInDegreeStructure, x FieldOfEducation
• EducationProvider (Institution): x Sector (Public/Private)
• ProgramExecution.countunique(Provider.Id, Program.Type, Program.Level, Program.Orientation)
• Teacher: x Sex
• TeacherEngagement.count: x TeacherEdStatus, x PartTimeStatus, - PartTimeFraction.sum
• Pupil: x Sex, x Age, x CountryOfOrigin, x AttendedPrePrimary
• PupilEnrolment.count: x PartTimeStatus, x Repeater, x Completer/DropOut, x CumulatedTime, - PartTimeFraction.sum
• Expenditure: x EducationalStatus, x Source, x Nature, - Amount.sum
• Funder (Actor): x Sector (Public/Private/...)

Relation labels in the graph: Provides, IsEngagedIn, BelongsTo, Of, IsEnrolledIn, Pays For, For

LEGEND:

• Line symbols distinguish one-to-many, many-to-one, one-to-one and many-to-many relationships, and show the reading direction
• x Variable: indicates that the “Variable” variable has a classifying role
• Object.count: indicates that “Object” objects are counted
• Variable.sum: indicates that the “Variable” variable is summarised
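The legend's notation can be read as a recipe for producing aggregates: variables marked x classify the objects, Object.count counts them within each class, and Variable.sum totals a variable within each class. A minimal sketch of that reading follows. The function names, the flat record layout and the interpretation of countunique as the number of distinct value combinations are assumptions, not part of the UNESCO model.

```python
from collections import defaultdict


def aggregate(records, classify_by, sum_variable=None):
    """Object.count, and optionally Variable.sum, per class.

    classify_by  -- names of the variables with a classifying role (x)
    sum_variable -- name of the variable to total, or None
    """
    cells = defaultdict(lambda: {"count": 0, "sum": 0.0})
    for rec in records:
        key = tuple(rec[v] for v in classify_by)
        cells[key]["count"] += 1
        if sum_variable is not None:
            cells[key]["sum"] += rec[sum_variable]
    return dict(cells)


def count_unique(records, variables):
    """Assumed reading of countunique: number of distinct value combinations."""
    return len({tuple(rec[v] for v in variables) for rec in records})
```

For example, `aggregate(enrolments, ["Sex", "PartTimeStatus"], "PartTimeFraction")` would give PupilEnrolment.count and PartTimeFraction.sum by sex and part-time status, provided each enrolment record has already been joined with the Sex of the pupil it belongs to.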
