Statistical databases in theory and practice Part III: Designing statistical databases Bo Sundgren 2008-02-11


Page 1: Statistical databases in theory  and practice Part III: Designing statistical databases

Statistical databases in theory and practice

Part III: Designing statistical databases

Bo Sundgren

2008-02-11

Page 2: Statistical databases in theory  and practice Part III: Designing statistical databases

Conceptual data model and relational data model in normalised form.

Conceptual data model (object types and relations):

PERSON BELONGS TO HOUSEHOLD
HOUSEHOLD LIVES IN DWELLING
ESTABLISHMENT BELONGS TO ORGANISATION
PERSON WORKS AT ESTABLISHMENT (EMPLOYMENT)
MIGRATIONEVENT OF PERSON
MIGRATIONEVENT FROM / TO DWELLING

Relational data model in normalised form:

PERSON (Identifier, HouseholdIdentifier*, Sex, Age, Education, Occupation, Income, Wealth, Health)
HOUSEHOLD (Identifier, DwellingIdentifier*, Size, Structure, Income)
DWELLING (Identifier, Location, Size, Standard, Rent)
MIGRATIONEVENT (Identifier, PersonIdentifier*, FromDwellingId*, ToDwellingId*, Time)
ESTABLISHMENT (EstablishmentIdentifier, OrganisationIdentifier*, Location, KindOfActivity, NumberOfEmployees, NetProfit)
ORGANISATION (OrganisationIdentifier, LocationOfHQ)
EMPLOYMENT (PersonIdentifier*, EstablishmentIdentifier*, PercentOfFullTime, Salary)
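The step from the conceptual model to normalised tables can be sketched with Python's built-in sqlite3 module. This is only an illustration for three of the object types: the column types, the lower-case table names and the sample rows are invented, and reading the asterisk as "foreign key" is an assumption, since the slide does not define it.

```python
import sqlite3

# Assumption: '*' on the slide marks a foreign key. Column types are
# hypothetical; the slide gives attribute names only.
SCHEMA = """
CREATE TABLE dwelling (
    identifier INTEGER PRIMARY KEY,
    location TEXT, size REAL, standard TEXT, rent REAL
);
CREATE TABLE household (
    identifier INTEGER PRIMARY KEY,
    dwelling_identifier INTEGER REFERENCES dwelling(identifier),
    size INTEGER, structure TEXT, income REAL
);
CREATE TABLE person (
    identifier INTEGER PRIMARY KEY,
    household_identifier INTEGER REFERENCES household(identifier),
    sex TEXT, age INTEGER
);
"""


def build_demo_db():
    """Create an in-memory database with a few invented rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO dwelling VALUES (1, 'Stockholm', 75.0, 'modern', 9000.0)"
    )
    conn.execute(
        "INSERT INTO household VALUES (10, 1, 2, 'couple', 600000.0)"
    )
    conn.executemany(
        "INSERT INTO person VALUES (?, ?, ?, ?)",
        [(100, 10, "F", 41), (101, 10, "M", 43)],
    )
    conn.commit()
    return conn


def persons_with_dwelling(conn):
    """Follow BELONGS TO and LIVES IN from each person to a dwelling."""
    return conn.execute(
        """
        SELECT p.identifier, d.location
        FROM person p
        JOIN household h ON p.household_identifier = h.identifier
        JOIN dwelling d ON h.dwelling_identifier = d.identifier
        ORDER BY p.identifier
        """
    ).fetchall()
```

Each relation in the conceptual model becomes a join over an identifier column, which is why the dwelling location is stored once and not repeated on every person.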

Page 3: Statistical databases in theory  and practice Part III: Designing statistical databases

Concept modelling

• Define concepts and relations between them

• Conceptual models and data models

• Visualise models graphically

Page 4: Statistical databases in theory  and practice Part III: Designing statistical databases

Rent-A-Video: first object graph

VideoFilm IsRentedBy Customer
Customer Rents VideoFilm

Page 5: Statistical databases in theory  and practice Part III: Designing statistical databases

Rent-A-Video: elaborated object graph

Object types and attributes:

FilmTitle (FilmId, Title, Category, Price, Actor*, Story, NumberOfCopies=, NumberOfRents=)
VideoFilm (FilmId, CopyNr, Rented?, NumberOfRents)
Customer (CustomerId, Name, Address, Discount)
Rental (CustomerId, FilmId, CopyNr, RentalNr, RentalDate, AgreedReturnDate, Returned?, ActualReturnDate)

Relations:

VideoFilm IsRentedBy Customer / Customer Rents VideoFilm
VideoFilm Represents FilmTitle / FilmTitle IsRepresentedBy VideoFilm

Page 6: Statistical databases in theory  and practice Part III: Designing statistical databases

Rent-A-Video: further aspects

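Object types: VideoFilm, Customer, FilmTitle

Relations shown in the graph:

VideoFilm IsRentedBy Customer / Customer Rents VideoFilm
VideoFilm Represents FilmTitle / FilmTitle IsRepresentedBy VideoFilm
IsRentedBy / Rents
IsReservedBy / Reserves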

Page 7: Statistical databases in theory  and practice Part III: Designing statistical databases

Relations between two object types

• one-to-one, symbolised by “arrow-to-arrow”

• one-to-many, symbolised by “arrow-to-fork”

• many-to-one, symbolised by “fork-to-arrow”

• many-to-many, symbolised by “fork-to-fork”

Note: The relation is usually not a flow relation!

(But you should tell what kind of relation it is.)
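As a rough illustration of the four kinds of relation, the sketch below classifies a relation from a list of observed (a, b) pairs. The function name and the sample identifiers are hypothetical. Observed pairs can only show that "many" occurs; the absence of "many" in a sample does not prove that the relation is one-to-one by definition.

```python
from collections import defaultdict


def classify_relation(pairs):
    """Classify a relation between object types A and B from (a, b) pairs.

    "one-to-many" here means: some A object is related to several B objects,
    while every B object is related to at most one A object.
    """
    b_per_a = defaultdict(set)
    a_per_b = defaultdict(set)
    for a, b in pairs:
        b_per_a[a].add(b)
        a_per_b[b].add(a)

    a_has_many_b = any(len(bs) > 1 for bs in b_per_a.values())
    b_has_many_a = any(len(a_objs) > 1 for a_objs in a_per_b.values())

    if a_has_many_b and b_has_many_a:
        return "many-to-many"
    if a_has_many_b:
        return "one-to-many"
    if b_has_many_a:
        return "many-to-one"
    return "one-to-one"


# Invented example: one household consists of two persons.
HOUSEHOLD_CONSISTS_OF = [("h1", "p1"), ("h1", "p2")]
```

Reading the same pairs in the opposite direction turns a one-to-many relation into a many-to-one relation, which is why the graph notation needs a reading direction.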

Page 8: Statistical databases in theory  and practice Part III: Designing statistical databases

Object graphs: another example

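Object types and attributes:

PERSON (PersonId, Sex, Income, Age, HomeMunicipality, PostalCode, HighestEducation)
HOUSEHOLD (HouseholdId, NumberOfPersons=, Income=)

Relations:

HOUSEHOLD CONSISTS OF PERSON / PERSON BELONGS TO HOUSEHOLD
PERSON IS FATHER OF PERSON
PERSON IS MOTHER OF PERSON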
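The attributes written with a trailing "=" (NumberOfPersons=, Income=) look like attributes derived from related objects. Under that assumption, and assuming further that household income is the sum of the members' incomes and that each person record carries a HouseholdId through BELONGS TO, a minimal sketch of the derivation could be:

```python
from collections import defaultdict


def derive_household_attributes(persons):
    """Derive NumberOfPersons and Income per household from person records.

    Assumption: household Income is the sum of the members' Income values.
    """
    households = defaultdict(lambda: {"NumberOfPersons": 0, "Income": 0})
    for person in persons:
        household = households[person["HouseholdId"]]
        household["NumberOfPersons"] += 1
        household["Income"] += person["Income"]
    return dict(households)


# Invented sample data for illustration only.
SAMPLE_PERSONS = [
    {"PersonId": 1, "HouseholdId": "H1", "Income": 300},
    {"PersonId": 2, "HouseholdId": "H1", "Income": 250},
    {"PersonId": 3, "HouseholdId": "H2", "Income": 400},
]
```

Derived attributes of this kind do not need to be stored; they can be recomputed from the person level whenever the household level is requested.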

Page 9: Statistical databases in theory  and practice Part III: Designing statistical databases

Concept modelling: Exercises

• B2B
  – The customers of companies are companies
  – Companies have employees (persons)

• B2C
  – The customers of companies are consumers (persons)
  – Companies have employees (persons)

• B2B+B2C
  – The customers of companies are companies or consumers (persons)
  – Companies have employees (persons)

Hint: There are two basic object types, COMPANY and PERSON, in all three examples.

Page 10: Statistical databases in theory  and practice Part III: Designing statistical databases

Different roles of concept modelling

• Clarifying a small number of related concepts

• Information model for an application
  – defining meaning
  – basis for data design

• Corporate information model
  – for more efficient communication between people
  – basis for system integration

Page 11: Statistical databases in theory  and practice Part III: Designing statistical databases

Concept model ---> Data model

Concept model (object types and attributes):

FilmTitle (FilmId, Title, Category, Price, Actor*, Story, NumberOfCopies=, NumberOfRents=)
FilmCopy (FilmId, CopyNr, Rented?, NumberOfRents)
Customer (CustomerId, Name, Address, Discount)
Rental (RentalDate, AgreedReturnDate, Returned?, ActualReturnDate, CustomerId, FilmId, CopyNr, RentalNr)

Relations:

FilmCopy IsRentedBy Customer / Customer Rents FilmCopy
FilmCopy Represents FilmTitle / FilmTitle IsRepresentedBy FilmCopy

Data model (tables and columns):

FilmTitles (FilmId, Title, Category, Price, Story)
ActorsInFilms (FilmId, ActorName, ActorsRoleInFilm)
FilmCopies (FilmId, CopyNr, Rented?, NumberOfRents)
Customers (CustomerId, Name, Address, Discount)
Rentals (FilmId, CopyNr, CustomerId, RentalNr, RentalDate, AgreedReturnDate, Returned?, ActualReturnDate)

Page 12: Statistical databases in theory  and practice Part III: Designing statistical databases

Data model (tables and columns):

FilmTitles (FilmId, Title, Category, Price, Story)
ActorsInFilms (FilmId, ActorName, ActorsRoleInFilm)
FilmCopies (FilmId, CopyNr, Rented?, NumberOfRents)
Customers (CustomerId, Name, Address, Discount)
Rentals (FilmId, CopyNr, CustomerId, RentalNr, RentalDate, AgreedReturnDate, Returned?, ActualReturnDate)
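One visible difference between the concept model and the tables is that the repeating attribute Actor* of FilmTitle ends up in a separate ActorsInFilms table. A minimal sketch of that split is shown below. The input layout (an "Actors" list of name and role pairs per film) and the sample film are assumptions made for illustration; only the output column names come from the slide.

```python
def split_film_titles(film_records):
    """Split film records with a repeating Actors field into two tables.

    Returns (film_titles, actors_in_films) as lists of row dictionaries.
    """
    film_titles = []
    actors_in_films = []
    for film in film_records:
        # FilmTitles keeps every single-valued attribute.
        film_titles.append({k: v for k, v in film.items() if k != "Actors"})
        # ActorsInFilms gets one row per (film, actor) combination.
        for actor_name, role in film["Actors"]:
            actors_in_films.append(
                {
                    "FilmId": film["FilmId"],
                    "ActorName": actor_name,
                    "ActorsRoleInFilm": role,
                }
            )
    return film_titles, actors_in_films


# Invented sample record for illustration only.
SAMPLE_FILMS = [
    {
        "FilmId": 7,
        "Title": "Example Film",
        "Category": "Drama",
        "Price": 30,
        "Story": "A short synopsis.",
        "Actors": [("Actor One", "Lead"), ("Actor Two", "Support")],
    }
]
```

After the split every table cell holds a single value, and FilmId links each actor row back to its film title.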

Page 13: Statistical databases in theory  and practice Part III: Designing statistical databases

Concept model ---> Star/cube model

Concept model (object types and attributes):

FilmTitle (FilmId, Title, Category, Price, Actor*, Story, NumberOfCopies=, NumberOfRents=)
FilmCopy (FilmId, CopyNr, Rented?, NumberOfRents)
Customer (CustomerId, Name, Address, Discount)
Rental (RentalDate, AgreedReturnDate, Returned?, ActualReturnDate, CustomerId, FilmId, CopyNr, RentalNr)

Relations:

FilmCopy IsRentedBy Customer / Customer Rents FilmCopy
FilmCopy Represents FilmTitle / FilmTitle IsRepresentedBy FilmCopy

Star model:

OBJECT IN FOCUS: Rental (RentalDate, Delayed?, CustomerId, FilmId, RentalNr)
Connected to:
FilmTitle (FilmId, Title, Category, NumberOfRents)
Customer (CustomerId, Category, Area)
PriceGroup
NumberOfRentsPerCopy

Cube model:

Number of rentals of film copies during the year t by customer discount category and film category
Dimensions: FILM CATEGORY, CUSTOMER DISCOUNT CATEGORY
Cell contents: number of rentals

Page 14: Statistical databases in theory  and practice Part III: Designing statistical databases

Star model for Data Warehouse

OBJECT IN FOCUS: Rental (RentalDate, Delayed?, CustomerId, FilmId, RentalNr)

Connected to:

FilmTitle (FilmId, Title, Category, NumberOfRents)
Customer (CustomerId, Category, Area)
PriceGroup
NumberOfRentsPerCopy

Page 15: Statistical databases in theory  and practice Part III: Designing statistical databases

Multidimensional model (cube model)

Number of rentals of film copies during the year t by customer discount category and film category

Dimensions: FILM CATEGORY, CUSTOMER DISCOUNT CATEGORY
Cell contents: number of rentals
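A cube of this kind can be filled by counting rental records per combination of the two classifying categories. The sketch below assumes rental, customer and film records shaped like the Rent-A-Video tables, uses the customer's Discount value as the discount category, and works on invented sample data. The function name is hypothetical.

```python
from collections import Counter
from datetime import date


def rentals_cube(rentals, customers, film_titles, year):
    """Count rentals in a given year by (discount category, film category)."""
    discount_by_customer = {c["CustomerId"]: c["Discount"] for c in customers}
    category_by_film = {f["FilmId"]: f["Category"] for f in film_titles}

    cube = Counter()
    for rental in rentals:
        if rental["RentalDate"].year == year:
            cell = (
                discount_by_customer[rental["CustomerId"]],
                category_by_film[rental["FilmId"]],
            )
            cube[cell] += 1
    return cube


# Invented sample data for illustration only.
SAMPLE_CUSTOMERS = [
    {"CustomerId": 1, "Discount": "A"},
    {"CustomerId": 2, "Discount": "B"},
]
SAMPLE_FILM_TITLES = [
    {"FilmId": 7, "Category": "Drama"},
    {"FilmId": 8, "Category": "Comedy"},
]
SAMPLE_RENTALS = [
    {"CustomerId": 1, "FilmId": 7, "RentalDate": date(2007, 3, 1)},
    {"CustomerId": 1, "FilmId": 7, "RentalDate": date(2007, 5, 9)},
    {"CustomerId": 2, "FilmId": 8, "RentalDate": date(2007, 6, 2)},
    {"CustomerId": 2, "FilmId": 7, "RentalDate": date(2006, 12, 30)},
]
```

Each key of the returned Counter corresponds to one cell of the cube. Cells without rentals are simply absent and read as zero.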

Page 16: Statistical databases in theory  and practice Part III: Designing statistical databases

Modelling the contents and structure of official statistics

Or: How to design ”correct” and globally consistent SDMX Data Structure Definitions

Or: Navigating in a space of statistical surveys of society

Or: Reality as a statistical construction

Bo Sundgren, Statistics Sweden

ICES-III, Montreal, June 18-21, 2007

Page 17: Statistical databases in theory  and practice Part III: Designing statistical databases

What can a statistical agency do in order to help a user
- find potentially relevant statistical data?
- judge the relevance of data retrieved?

• Provide overviews of available data

• Provide search tools

• Provide informative metadata

Page 18: Statistical databases in theory  and practice Part III: Designing statistical databases

Conceptual navigation: contents exploration and searching for statistics

• A conceptual model of society as reflected by official statistics

Page 19: Statistical databases in theory  and practice Part III: Designing statistical databases

SOCIETY

SOCIO-ECONOMIC PROCESSES, SUBPROCESSES, ACTIVITIES

BY AREA:
- agriculture
- manufacturing
- service industr
- trade
- transports
- energy
- construction
- education
- R&D
- financial
- information
- culture & leisure
- health
- social services
- judicial services
- ...

BY SECTOR:
- public: central, regional, local
- private: business, household

BY FUNCTION:
- production
- consumption
- investment
- order/delivery
- sales
- supply
- employment
- environment side-effects
- ...

ACTORS with properties (involvement in different roles)
- persons, households
- organisations, institutions
- enterprises, establishments
- ...

ACTOR LIFE HISTORIES (actor eigenprocesses), SUBHISTORIES, "CASES"

UTILITIES with properties (involvement in different roles)
- resources: real, financial
- products: commodities, services, information, ...
- assets and liabilities
- ...

UTILITY LIFE HISTORIES (utility eigenprocesses), SUBHISTORIES, "CASES"

Page 20: Statistical databases in theory  and practice Part III: Designing statistical databases

SOCIETY

SOCIO-ECONOMIC PROCESSES, SUBPROCESSES, ACTIVITIES

BY AREA:
- agriculture
- manufacturing
- service industr
- trade
- transports
- energy
- construction
- education
- R&D
- financial
- information
- culture & leisure
- health
- social services
- judicial services
- ...

BY SECTOR:
- public: central, regional, local
- private: business, household

BY FUNCTION:
- production
- consumption
- investment
- order/delivery
- sales
- supply
- employment
- environment side-effects
- ...

ACTORS with properties (involvement in different roles)
- persons, households
- organisations, institutions
- enterprises, establishments
- ...

ACTOR LIFE HISTORIES (actor eigenprocesses), SUBHISTORIES, "CASES"

UTILITIES with properties (involvement in different roles)
- resources: real, financial
- products: commodities, services, information, ...
- assets and liabilities
- ...

UTILITY LIFE HISTORIES (utility eigenprocesses), SUBHISTORIES, "CASES"

Countable/measurable BASIC OBJECTS (ACTORS) with properties

Countable/measurable BASIC OBJECTS (UTILITIES) with properties

OBJECT RELATIONS

Countable/measurable COMPLEX OBJECTS with properties
- events, transactions, relationships, "cases", ...

DATA COLLECTION AND AGGREGATION PROCESSES
- done by respondents
- done by agency

STATISTICAL DATA (micro, macro)
organised in multidimensional structures: balance sheets, crosstabulations, cubes, etc.

FURTHER PROCESSING

ANALYTICAL PRODUCTS

Page 21: Statistical databases in theory  and practice Part III: Designing statistical databases

Statistics Canada: Agents, Events, Things

Page 22: Statistical databases in theory  and practice Part III: Designing statistical databases

Contents By Example (based on a simple generic model)

[Figure: a simple generic model with seven object types, each listing its VARIABLEs]

Actors: PERSON, ORGANISATION
Utilities: RESOURCE, PRODUCT
Complex objects: ACTIVITY, EVENT, RELATION

Example entries shown against individual variables include x, m, >0, p, <5 and g.

Page 28: Statistical databases in theory  and practice Part III: Designing statistical databases

Everything "clickable": OBJECT, VARIABLE

Left-hand click: Select
- object
- variable

Right-hand click: Retrieve metadata
- definition
- value set, classification
- questionnaire
- quality declaration
- survey documentation

Page 39: Statistical databases in theory  and practice Part III: Designing statistical databases

UNESCO model, version 1 (to be revised)

Object types:

EducationSystem (Utility)
- Country
- Currency
- CompulsoryEdBegAge
- CompulsoryEdEndAge
- CompulsoryEdLength
- AcadYearBegMonth
- AcadYearEndMonth

EducationProvider (Institution)
x Sector (Public/Private)

EducationProgram (Utility)
- Name
- Year
- EntranceAge
- Duration
x Type
x Level (ISCED97)
x Grade
x Orientation
x PositionInDegreeStructure
x FieldOfEducation

ProgramExecution.countunique(Provider.Id, Program.Type, Program.Level, Program.Orientation)

Teacher
x Sex

TeacherEngagement.count
x TeacherEdStatus
x PartTimeStatus
- PartTimeFraction.sum

Pupil
x Sex
x Age
x CountryOfOrigin
x AttendedPrePrimary

PupilEnrolment.count
x PartTimeStatus
x Repeater
x Completer/DropOut
x CumulatedTime
- PartTimeFraction.sum

Expenditure
x EducationalStatus
x Source
x Nature
- Amount.sum

Funder (Actor)
x Sector (Public/Private/...)

Relation names shown in the graph: Provides, IsEngagedIn, BelongsTo, Of, IsEnrolledIn, Pays For, For

LEGEND:

Line symbols (not reproduced here): one-to-many relationship, many-to-one relationship, one-to-one relationship, many-to-many relationship, reading direction

x Variable: indicates that the "Variable" variable has a classifying role

Object.count – indicates that "Object" objects are counted

Variable.sum – indicates that the "Variable" variable is summarised
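The legend notation can be read as an aggregation recipe: group the objects by the variables marked "x", count the objects in each group (Object.count), and add up the variables marked ".sum". The sketch below follows that reading for PupilEnrolment. The function name, the record layout and the sample values are assumptions, not part of the UNESCO model.

```python
from collections import defaultdict


def aggregate(objects, classifying, summed):
    """Group objects by classifying variables; count objects and sum variables.

    classifying: names of the variables with a classifying role ("x Variable").
    summed: names of the variables to be added up ("Variable.sum").
    """
    cells = defaultdict(
        lambda: {"count": 0, **{f"{name}.sum": 0.0 for name in summed}}
    )
    for obj in objects:
        key = tuple(obj[name] for name in classifying)
        cell = cells[key]
        cell["count"] += 1
        for name in summed:
            cell[f"{name}.sum"] += obj[name]
    return dict(cells)


# Invented PupilEnrolment records for illustration only.
SAMPLE_ENROLMENTS = [
    {"PartTimeStatus": "part", "Repeater": False, "PartTimeFraction": 0.5},
    {"PartTimeStatus": "part", "Repeater": False, "PartTimeFraction": 0.5},
    {"PartTimeStatus": "full", "Repeater": True, "PartTimeFraction": 1.0},
]
```

Calling aggregate(SAMPLE_ENROLMENTS, ["PartTimeStatus", "Repeater"], ["PartTimeFraction"]) gives one cell per combination of classifying values, each holding a head count and a summed part-time fraction.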
