Construction of a database
Per Weidenman, PAR AB
Database
•A collection of data
•It belongs together
•It models the ”world”
Database management system (DBMS)
•The database (a collection of interrelated data)
•Software to manage and access the data
[Diagram: DBMS]
•Input: transactions
•User: searching, reporting
•Statistical analysis
•Organised data
DBMS requirements
•”Database”
•Data Warehouse
•etc.
Database management systems (DBMS)
•Microsoft Access
•Microsoft SQL Server
•DB2
•Oracle
•MySQL
•FirebirdSQL
•etc.
SQL – Structured Query Language
A computer language to define and search data
Relational databases
Tables containing data, organised in rows and columns
Keys, used for linking data in different tables
Example
Simple database for collecting and organising statistical papers
Created in Microsoft Access
Paper name and details
Link to document (PDF file)
Authors
A database with four tables
Keys
One of the tables, containing paper name and details
One paper on each row
Rows containing paper name and other details
Key
The keys are used to link data in the four tables
[Diagram: key values linking rows in three of the tables]
Table ”artiklar” (papers). Key: artikel_id
Table ”författare2” (authors). Keys: artikel_id, person_id
Table ”personer2” (persons). Key: person_id
One paper having 3 authors
One person being the author of 2 papers
A query: the result of asking the database about papers and authors
One paper and the corresponding 3 authors
One author and the corresponding 2 papers
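The linking described above can be sketched with Python's built-in sqlite3 module instead of Microsoft Access. The table and key names (artiklar, författare2, personer2, artikel_id, person_id) come from the slides; the columns title and name and all the rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE artiklar (artikel_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE personer2 (person_id INTEGER PRIMARY KEY, name TEXT);
-- Link table: one row per (paper, author) pair.
CREATE TABLE "författare2" (
    artikel_id INTEGER REFERENCES artiklar(artikel_id),
    person_id  INTEGER REFERENCES personer2(person_id)
);
-- Hypothetical sample rows: paper 1 has three authors,
-- and person 1 is an author of two papers.
INSERT INTO artiklar VALUES (1, 'Paper one'), (2, 'Paper two');
INSERT INTO personer2 VALUES (1, 'Aaaa'), (2, 'Bbbb'), (3, 'Cccc');
INSERT INTO "författare2" VALUES (1, 1), (1, 2), (1, 3), (2, 1);
""")


def authors_of(artikel_id):
    """Names of all authors of one paper, found by following the keys."""
    rows = conn.execute(
        '''
        SELECT p.name
        FROM "författare2" AS f
        JOIN personer2 AS p ON p.person_id = f.person_id
        WHERE f.artikel_id = ?
        ORDER BY p.name
        ''',
        (artikel_id,),
    )
    return [name for (name,) in rows]


def papers_by(person_id):
    """Titles of all papers one person has (co-)authored."""
    rows = conn.execute(
        '''
        SELECT a.title
        FROM "författare2" AS f
        JOIN artiklar AS a ON a.artikel_id = f.artikel_id
        WHERE f.person_id = ?
        ORDER BY a.title
        ''',
        (person_id,),
    )
    return [title for (title,) in rows]
```

Calling authors_of(1) follows artikel_id into the link table and person_id out of it, which is the "one paper, three authors" case; papers_by(1) goes the other way.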
[The DBMS diagram again, now also showing:]
•IT Department
•”Business” users
DBMS requirements from a statistical / analytical viewpoint
•Data quality
•Data types
•Performance
•Maximum information
•Historical data
•Regulation and secrecy
DBMS requirements from a statistical / analytical viewpoint
Data quality
Instead of entering text/data by typing (Sales System X: ”Enter customer name:”) …
… use, if possible, selection from a list of valid values (Sales System X: ”Choose customer name:”)
•Volvo Personvagnar AB
•Volvo Lastvagnar AB
•Volvo Construction AB
•Volvo Bussar AB
•Volvo Logistics AB
•…
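One way to back a "choose from a list" input in the database itself is a lookup table plus a foreign key, so that names outside the list are rejected. This is a minimal sketch using sqlite3; the table names customers and orders, the function add_order and the rows are assumptions for illustration, not part of the slides' Sales System X.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this is switched on per connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- The list of valid values lives in its own table.
CREATE TABLE customers (name TEXT PRIMARY KEY);
CREATE TABLE orders (
    order_id      INTEGER PRIMARY KEY,
    customer_name TEXT NOT NULL REFERENCES customers(name)
);
INSERT INTO customers VALUES
    ('Volvo Personvagnar AB'),
    ('Volvo Lastvagnar AB');
""")


def add_order(customer_name):
    """Insert an order; return False if the name is not in the valid list."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO orders (customer_name) VALUES (?)",
                (customer_name,),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

A free-text variant such as 'Volvo lastvagnar' is refused here, which is the kind of spelling drift that selection from a list is meant to prevent.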
DBMS requirements from a statistical / analytical viewpoint
Data quality
Sales System X: ”Enter customer age:”
Define rules for valid input (values, intervals, etc.)
We do not want:
•Negative values
•40+
•1982
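A rule for valid input can be written as a CHECK constraint. The sketch below uses sqlite3 and rejects the three unwanted inputs from the slide; the upper limit of 120, the table layout and the function try_age are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    -- Age must be missing (NULL) or a whole number in a plausible range.
    -- The upper bound 120 is an assumption for this sketch.
    age INTEGER CHECK (
        age IS NULL
        OR (typeof(age) = 'integer' AND age BETWEEN 0 AND 120)
    )
)
""")


def try_age(value):
    """Try to store one age; return True if the database accepted it."""
    try:
        with conn:
            conn.execute("INSERT INTO customers (age) VALUES (?)", (value,))
        return True
    except sqlite3.IntegrityError:
        return False
```

With this rule, 34 is accepted, while -1 (negative), '40+' (not a number) and 1982 (a birth year typed into the age field) are rejected. A missing age is still allowed as NULL.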
DBMS requirements from a statistical / analytical viewpoint
Data quality
Handling of missing values …
Missing values should be stored as ”null” in the database, not as 0 (digit zero).
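A small sqlite3 sketch of why this matters for analysis: aggregate functions such as AVG and COUNT skip NULL, but treat 0 as a real observation. The table names and the three ages are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ages_null (age INTEGER);
CREATE TABLE ages_zero (age INTEGER);
-- Same two known ages; the third customer's age is unknown.
INSERT INTO ages_null VALUES (30), (40), (NULL);  -- missing stored as NULL
INSERT INTO ages_zero VALUES (30), (40), (0);     -- missing stored as 0
""")

# AVG ignores NULL, so only the two known ages are averaged.
avg_with_null = conn.execute("SELECT AVG(age) FROM ages_null").fetchone()[0]
# The 0 is counted as a real age and drags the mean down.
avg_with_zero = conn.execute("SELECT AVG(age) FROM ages_zero").fetchone()[0]
# COUNT(column) counts only non-NULL values.
known_ages = conn.execute("SELECT COUNT(age) FROM ages_null").fetchone()[0]
```

Here avg_with_null is the mean of 30 and 40, while avg_with_zero is the mean of 30, 40 and 0.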
DBMS requirements from a statistical / analytical viewpoint
Data types
Text
Numeric
DBMS requirements from a statistical / analytical viewpoint
Performance
User (searching, reporting):
•Searching for individual records
•Creating ”prepared” reports by counting or summing
Statistical analysis:
•Large datasets
•Multivariate methods
•Iterative estimation
•Etc.
DBMS requirements from a statistical / analytical viewpoint
Maximum information
Sales System X: ”Enter customer age:” 34
We need to report on age groups:
•20-29
•30-39
•40-49
•…
Thus we store age as an interval, not as a value!
The fallacy of being too user oriented!
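Reading the slide as a warning, the alternative is to store the exact value and derive the groups only when reporting. The sketch below assumes equal-width groups; the function name age_group and the sample ages are invented for illustration.

```python
from collections import Counter


def age_group(age, width=10):
    """Map an exact age to a group label such as '30-39'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


# Hypothetical stored values: exact ages, not intervals.
ages = [34, 27, 41, 38]

# The report the users asked for ...
by_decade = Counter(age_group(a) for a in ages)
# ... and a different grouping that only works because the
# exact ages were kept.
by_five = Counter(age_group(a, width=5) for a in ages)
```

Had only '30-39' been stored, the five-year grouping could not be produced afterwards.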
DBMS requirements from a statistical / analytical viewpoint
Historical data
Sales System X:
•Customer name:
•Customer address:
•Order date:
•Order value:
Table: Orders
•CustomerID
•Orderdate
•Ordervalue
Each new order for a specific customer …
… will be added to table Orders and stored as a ”new row”
DBMS requirements from a statistical / analytical viewpoint
Historical data
Sales System X:
•Customer name:
•Customer address:
•Order date:
•Order value:
Table: Customers
•CustomerID
•Customername
•Customeraddress
But a new address …
… will probably UPDATE the existing record (row) for the specific customer
Thus, the old value of ”customer address” will be deleted and replaced with the new value.
But this will do fine for users focusing on searching / reporting!
DBMS requirements from a statistical / analytical viewpoint
Historical data
Table: Customers
•CustomerID
•Customername
•Customeraddress
Table: Customers_history
•CustomerID
•Customername
•Customeraddress
•From
•To
Create a new table to contain historic records
Each time a value is UPDATED for a certain customer …
… the complete (previous) record is transferred to the table Customers_history
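One possible way to automate the transfer is a database trigger. The sketch below uses sqlite3. The table and column names follow the slides, but the extra column ValidFrom on Customers is an assumption added so that the trigger knows when the old record started to apply. The trigger name and the sample rows are also invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (
    CustomerID      INTEGER PRIMARY KEY,
    Customername    TEXT,
    Customeraddress TEXT,
    ValidFrom       TEXT   -- assumption: date the current values took effect
);
CREATE TABLE Customers_history (
    CustomerID      INTEGER,
    Customername    TEXT,
    Customeraddress TEXT,
    "From"          TEXT,
    "To"            TEXT
);
-- On every UPDATE, copy the complete previous record to the history table.
CREATE TRIGGER keep_customer_history AFTER UPDATE ON Customers
BEGIN
    INSERT INTO Customers_history
    VALUES (OLD.CustomerID, OLD.Customername, OLD.Customeraddress,
            OLD.ValidFrom, NEW.ValidFrom);
END;
""")

with conn:
    conn.execute(
        "INSERT INTO Customers VALUES (1, 'Aaaa AB', 'Old Street 1', '2005-01-01')"
    )
    # The customer moves: the UPDATE overwrites the address in Customers,
    # and the trigger saves the previous record.
    conn.execute(
        "UPDATE Customers SET Customeraddress = ?, ValidFrom = ? "
        "WHERE CustomerID = ?",
        ("New Street 2", "2007-06-01", 1),
    )

current = conn.execute("SELECT * FROM Customers").fetchall()
history = conn.execute("SELECT * FROM Customers_history").fetchall()
```

After the update, Customers holds only the new address, while Customers_history holds the old one with its From and To dates.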
DBMS requirements from a statistical / analytical viewpoint
Historical data
Table: Customers (CustomerID, Customername, Customeraddress)
Table: Customers_history (CustomerID, Customername, Customeraddress, From, To)
This structure will make analysis of processes possible
But not easy!
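A hedged illustration of "possible, but not easy": even a simple question such as "where did this customer live on a given day?" has to look in both tables. The sketch uses sqlite3 with the slides' table names. The ValidFrom column on Customers, the function address_on and the sample rows are assumptions. Dates are stored as ISO text so that they compare correctly as strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (
    CustomerID INTEGER PRIMARY KEY, Customername TEXT,
    Customeraddress TEXT,
    ValidFrom TEXT  -- assumption: start date of the current values
);
CREATE TABLE Customers_history (
    CustomerID INTEGER, Customername TEXT, Customeraddress TEXT,
    "From" TEXT, "To" TEXT
);
INSERT INTO Customers VALUES (1, 'Aaaa AB', 'New Street 2', '2007-06-01');
INSERT INTO Customers_history
    VALUES (1, 'Aaaa AB', 'Old Street 1', '2005-01-01', '2007-06-01');
""")


def address_on(customer_id, day):
    """Address valid on an ISO date, or None if nothing is known for it."""
    # First look for a historic record whose [From, To) interval covers the day.
    row = conn.execute(
        'SELECT Customeraddress FROM Customers_history '
        'WHERE CustomerID = ? AND "From" <= ? AND ? < "To"',
        (customer_id, day, day),
    ).fetchone()
    if row is None:
        # Otherwise the current record applies, if it had started by then.
        row = conn.execute(
            "SELECT Customeraddress FROM Customers "
            "WHERE CustomerID = ? AND ValidFrom <= ?",
            (customer_id, day),
        ).fetchone()
    return row[0] if row else None
```

Every analysis over time needs this kind of two-table logic, which is part of why the structure is workable but awkward.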
DBMS requirements from a statistical / analytical viewpoint
Regulation and secrecy
DBMS requirements from a statistical / analytical viewpoint
Current data
Current + historical data
Operating on individual records
Operating on many records
Next on this channel…
A database containing historic transactions
Board data
PAR / Bisnode database
Tables
•FTG: Basic company data. One record per company. Contains name, address, start date, end date, line of business, etc.
•FTG_H: Historic company data. Many records per company. Contains the accumulated historic records from table FTG.
•BOKSLUT: Balance sheet data. One record per annual report (thus many records per company). Turnover, profit, key ratios, etc.
•FUNKTION_PERIOD: Board member data. Many records per company and person.
And many more tables!
Serrano
Statistical analysis
How? Historic names etc.
Sampling for time series statistics
END
Serrano: Balance sheet data from different periods transformed to yearly data records
Serrano: Historic transactions from FTG_H transformed to yearly data records
Serrano Board Data: Board member data from any mix of start date, end date and period length transformed to yearly data records
Summing up register data to annual figures
[Timeline figure: company A on the year (ÅR) axis, with ticks 3, 2, 1 and Now (Nu)]
Example. Register containing balance sheet data:
•Number of employees
•Turnover
•Profit
•Tangible assets
•Etc.
[Timeline figure: companies A and B. B has a financial year that differs from the calendar year (brutet räkenskapsår)]
Summing up register data to annual figures
[Timeline figure: companies A, B and C. C has changed its financial year (omlagda räkenskapsår)]
Summing up register data to annual figures
[Timeline figure: companies A, B, C and D. D has missing data]
Summing up register data to annual figures
[Timeline figure: company B]
Proposal: Break down the flow variables (turnover, profit, etc.) into monthly values …
Summing up register data to annual figures
[Timeline figure: company B]
Proposal: … and sum the monthly values to a ’fictitious’ calendar-year value.
Proposal: … and impute to get full coverage for the last year
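The proposal to split flow variables into monthly values and sum them per calendar year can be sketched as follows. The sketch assumes that a reported value is spread evenly over the months of its reporting period, which the slides do not state. The function name, the tuple layout and the figures are invented for illustration. Imputation for the last, partly covered year is not included.

```python
from collections import defaultdict


def calendar_year_totals(reports):
    """Re-allocate reported flow values to calendar years.

    reports: iterable of (start_year, start_month, n_months, value), one
    tuple per annual report. Each value is split into equal monthly
    amounts (an assumption), which are then summed per calendar year.
    """
    totals = defaultdict(float)
    for start_year, start_month, n_months, value in reports:
        monthly = value / n_months
        for i in range(n_months):
            months_from_january = (start_month - 1) + i
            year = start_year + months_from_january // 12
            totals[year] += monthly
    return dict(totals)


# Hypothetical company with a July-June financial year:
# turnover 1200 in 2004/05 and 2400 in 2005/06.
company_b = [(2004, 7, 12, 1200.0), (2005, 7, 12, 2400.0)]
totals_b = calendar_year_totals(company_b)
```

For this company only 2005 is fully covered (six months from each report). 2004 and 2006 each contain six months of data, which is where the proposed imputation for full coverage would come in.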
Summing up register data to annual figures
[Timeline figure: company B]
Summing up register data to annual figures
Database
First example
Register-based transport statistics for SIKA:
•Decreased response burden
•Increased understanding of the transporting companies (as a complement to the ”usual” focus on type of goods)
•Time series describing economic status and change.
[Chart, 1997-2006: number of employees (Anställda, axis 0-140000) and net turnover in SEK million (Nettoomsättning, MKR, axis 0-250000)]
Objective: Describing economic status and change in transporting companies during the last ten years.
Total number of employees and turnover …
[Chart, 1997-2006: the transport industries (Transportbranscherna) compared with GDP (BNP), axis 80-200]
… or turnover growth compared to GDP (BNP)
… or profit development for different types of freight companies
… or the number of employees in a cohort of new companies.
        Active companies             Active limited companies            GDP
Year    Total    Of which limited    Employees    Net turnover (MSEK)    Current prices (MSEK)
1997    12912    10599               98259        120284                 1927001
1998    12788    10626               100663       127745                 2012091
1999    12547    10543               102531       133078                 2123971
2000    12562    10704               106811       145496                 2249987
2001    12383    10659               112685       163418                 2326176
2002    12432    10741               114426       168214                 2420761
2003    12616    10935               115135       178294                 2515150
2004    12689    11067               118015       188913                 2624964
2005    12709    11100               119387       209819                 2735218
2006    12514    11012               121683       224225                 2899653
Tables based on balance sheet data from each company
What data is needed?
•Company data including micro level history. Exactly which companies were active in transport during each year?
•Balance sheet data from all transporting companies for each year
•Faster access to ”last year’s” data compared to taxation-based registers
Sampling companies for time series statistics
[Timeline figure, shown in four steps: companies A, B, C and D on the year (ÅR) axis, with ticks 3, 2, 1 and Now (Nu). The final step lists the company sets ACD, ABCD and ABC.]