FPA for Data Warehousing


  • 8/8/2019 FPA for Data Warehousing

    1/25

    FPA applied to DATA WAREHOUSING

    Version 1.1.0

    www.nesma.nl

    PUBLICATION OF THE NETHERLANDS SOFTWARE METRICS USERS ASSOCIATION


    Table of Contents

1 Introduction
   1.1 Background to and version history of this document
   1.2 Overview
   1.3 Disclaimer
   1.4 References

2 General features
   2.1 Estimated count
   2.2 Counting the project size
   2.3 The end user
   2.4 Count for each DWH component - choice of architecture

3 The Data Staging Area
   3.1 Counting guidelines for the Data Staging Area
   3.2 Requirements design document for the Data Staging Area

4 The Star Schemas of the Data Warehouse
   4.1 Counting guidelines for the Dimensions of a Star Schema
   4.2 Counting guidelines for the Facts of a Star Schema
   4.3 Requirements design document for Dimensions and Facts

5 The Data Marts in the Data Mart Area
   5.1 Counting guidelines for the Data Mart Area
   5.2 Requirements of the design document for the Data Mart Area

6 Reports
   6.1 Counting guidelines for Reports
   6.2 Requirements of the design document for Reports

7 Other elements of a Data Warehouse environment
   7.1 Cleansing
   7.2 Meta data relating to the logistic process
   7.3 Meta data relating to the meaning of the data - business meta data
   7.4 Importing FPA tables

8 Alternative architectures
   8.1 By-passing the Staging Area
   8.2 Reports from the Star Schema
   8.3 Multiple data groups in a single file delivery
   8.4 Operational Data Store - ODS
      8.4.1 Counting guidelines for the ODS
   8.5 The federated Data Warehouse
   8.6 Inmon
   8.7 BDWM

Appendix A. Data Warehousing Architecture Reference Model
Appendix B. Summary - Functions and Files in the DWH
Appendix C. Requirements basic design document relating to function point counts


    Changes and versions

Version: 1.1.0
Date: 5-9-2008
Change: Version for publication via the NESMA website. Changes initiated after comments by IFPUG: included the relation with IFPUG terminology; rephrased parts to clarify intentions for experts in the field of counting. The counting guideline is now usable for estimating both project size and product size. New guidelines for: functionality to manage the DWH processes, the ODS, and the BDWM.
Author(s): Rob Eveleens, Theo Kersten, Nico Mak, Jolijn Onvlee.


    1 Introduction

This document provides counting guidelines for performing an estimated Function Point Analysis to determine the size of Data Warehouse projects, from the source up to (and including) the reporting stage in an existing Data Warehouse environment.

The FPA count is based on version 2.2 of NESMA's "Definitions and counting guidelines for the application of Function Point Analysis" (Definities en telrichtlijnen voor de toepassing van functiepuntanalyse), paying particular attention to the key principles involved when applying FPA to Data Warehouse projects. The functional architecture - the distribution of the functionality across the Data Warehouse as a system - is based on the Architecture Reference Model of Atos Origin Nederland B.V.

    1.1 Background to and version history of this document

These guidelines were developed collaboratively by Jos Hendriksen (Atos Origin BAS Oost) and Rob Eveleens (Atos Origin BI&CRM) from their experience of using the FPA guidelines in data warehouse projects at KPN. Based on their work, Nico Mak (ABN AMRO Bank N.V.), Theo Kersten (Atos Origin Software Development & Maintenance Center) and Rob Eveleens reviewed the architecture and functionality with a view to improving alignment with the NESMA guidelines. From version 0.2 onwards, Jolijn Onvlee (Onvlee Opleidingen & Advies) and Freek Keijzer were brought in to refine the wording and the rationale. Version 0.5 was reviewed internally by NESMA and was also discussed by Achmea, as a result of which several suggestions were made (and incorporated) that take account of situations in which the core of the DWH has been developed more along the lines of Inmon's approach.

    1.2 Overview

This document assumes that the reader is familiar with FPA and its associated terminology and abbreviations, and with Data Warehousing. As there are different ways of thinking about Data Warehousing, we have included in this document the approach used as the basis for these counting guidelines. See Appendix A, Data Warehousing Architecture Reference Model.

    Each chapter on counting concludes with a brief summary of the results.

To be able to carry out an estimated count, it is preferable to have a design document that includes the following:

- the conceptual (logical) data model
- a description of the functions and the inward and outward data flows

For each component of the Data Warehouse, the additional requirements needed to ensure that the count can be carried out on the basis of the design document have been made explicit and have been included at the end of each chapter.

Please refer to the three appendices for a summary of the counting guidelines, a framework for Data Warehouses and an overview of the additional requirements.


    2 General features

    2.1 Estimated count

    NESMA distinguishes three types of count:

- An indicative function point count gives an indication of the size of an information system or project, based solely on a conceptual data model.

- An estimated or approximated function point count is a count of function points in which the number of functions is determined for each type of user function (user transactions and logical files), and which uses standard values for complexity: Average for the user transactions and Low for the logical files.

- A detailed function point count is the most precise count, identifying all the specifications needed for an FPA in detail. This means that the user transactions are broken down to the level of referenced logical files and data element types, and the logical files are broken down to the level of record types and data element types. Therefore, this type of count allows the complexity of each identifiable user function to be determined.

These guidelines describe the implementation of an estimated function point count for Data Warehouses. In practice an estimated count is sufficiently accurate to allow confident budgeting and subsequent costing of projects.

We repeat, perhaps unnecessarily: the user functions that need to be counted are identified without examining the complexity of those functions. In an estimated function point count, transactional functions are counted as being of Average complexity and data functions as being of Low complexity.¹
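The estimated count can be illustrated with a short calculation. The sketch below is not part of the guideline; it simply applies the standard NESMA values (ILF 7 FP, EIF 5 FP, EI 4 FP, EO 5 FP, EQ 4 FP) to a hypothetical inventory of functions:

```python
# Illustrative estimated function point count: every data function gets the
# Low weight and every transactional function the Average weight
# (ILF 7, EIF 5, EI 4, EO 5, EQ 4 FP).
WEIGHTS = {"ILF": 7, "EIF": 5, "EI": 4, "EO": 5, "EQ": 4}

def estimated_count(functions):
    """`functions` maps each function type to the number of functions identified."""
    return sum(WEIGHTS[ftype] * n for ftype, n in functions.items())

# Hypothetical inventory: 4 ILFs, 1 EIF, 6 EIs, 3 EOs.
example = {"ILF": 4, "EIF": 1, "EI": 6, "EO": 3}
```

For the hypothetical inventory above, the estimated size is 4 x 7 + 1 x 5 + 6 x 4 + 3 x 5 = 72 FP.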

    2.2 Counting the project size

In applying FPA, NESMA distinguishes between product size and project size. The following purposes of function point counts can be distinguished:

- Determining the product size²: the number of function points is a measure of the extent of the functionality delivered, or to be delivered, to a user by an information system. It is also a measure of the size of an information system that has to be maintained.

- Determining the project size³: the number of function points is a measure of the extent of the functionality either of a complete (or partial) new information system to be created by a single project, or of the extent of changes to an existing system. In the latter case, changes may include one or more of the following: the addition, modification or removal of user functions. The project size is an essential parameter for determining the effort needed for the project.

    These counting guidelines are usable for both purposes.

¹ KGV: 5 FP, ILGV: 7 FP, IF: 4 FP, UF: 5 FP, OF: 4 FP.

² IFPUG: application count.

³ IFPUG: development count or enhancement count.


    2.3 The end user

These guidelines assess functionality from the perspective of the end users: the managers and staff in those parts of the organisation for which the DWH is being developed.

Where functionality is being developed for managing the Data Warehouse per se, the perspective of the engineers who support the data warehouse becomes significant. The current document does not contain any guidelines on this area, as that functionality can be counted as usual.

2.4 Count for each DWH component - choice of architecture

The FP count for each component of the system can be carried out on the basis of a functional design document, which itemises all the tables for each component of the system and identifies the transformation processes between those tables in general terms. For the purposes of this count we distinguish the following groups of functions of the Data Warehouse as a system:

1. Transporting data to the Data Warehouse's Data Staging Area
2. Populating the Data Warehouse core
3. Preparing the Data Warehouse's Data Marts
4. Producing Reports

The information architecture we envisage is set out in Appendix A, Data Warehousing Architecture Reference Model.

The components of the DWH are not regarded as separate systems. Groups of data from one component within the DWH are not regarded as External Interface Files (EIFs) for other components.

KIMBALL AND INMON

There is little argument about the merit of structuring the data in the form of Star Schemas (with Facts and Dimensions) as far as Data Marts are concerned. However, in regard to the core of the Data Warehouse there are two schools of thought:

On the one hand, there are those who favour the Kimball approach based on a core of Star Schemas [Kimball, 2008], whilst others prefer a core based on a relational model [Inmon, 2005].

There has been plenty of discussion about the functional counting of Facts and Dimensions from Star Schemas, but much less about counting in relation to relational Data Warehouses.

When a Data Warehouse is set up using Inmon's philosophy, the guidelines on counting Dimensions and Facts are equally relevant, albeit at a somewhat later stage of the process, i.e. when setting up the Data Marts.

So please do not be misled: these guidelines are applicable to both Kimball's approach and Inmon's. This document also discusses the counting of the Dimensions and Facts of a Star Schema when populating the Data Warehouse: Kimball's approach. A clear distinction can thus be made in relation to the counting of Data Marts. The section on Inmon contains a number of tips for the relational Data Warehouse.

REPORTS COVERED SEPARATELY

Reports are normally created using completely different tools from the other functions, and productivity (hours per function point) varies greatly between those tools. Because of this variation in productivity it is desirable to keep the FPs counted for reports separate, so that those who have to assess the effort required can take account of that difference in productivity.


    3 The Data Staging Area

In essence, the processes in the Data Staging Area (DSA or STA) provide the separate interface with the various sources of data for the Data Warehouse. Data are imported into the STA and, after a minimum of processing, stored until further processing (integration into the Data Warehouse) is possible. Integrating the data into the DWH from the STA only begins when all the data required are available in the Data Warehouse or in the STA, or when all the preceding periods or deliveries (up to the period or deliveries to be integrated) have been processed (process dependency), or because the frequency of delivery is greater than the frequency of processing (timing).

The processes work on a single flat file, usually converting it into a single table, in which the information is supplemented to include identification of the source and the period to which the data relate. The data received are kept as received, so that when new functionality is introduced, or in the event of recovery operations, data processing may be carried out sequentially again. These data are also frequently used to answer detailed queries about reports, and to answer ad hoc queries.

This stage of the process is regarded as a technical solution for making data available to the Data Warehouse. Technical solutions cannot be included in the functional count, so the total of function points for this component is usually zero. The function is seen as a preparatory component of the functions in the next chapter, which populate the Star Schema or Facts of the Data Warehouse. Only if user reports have been defined on the basis of data taken directly from the DSA are there any data that can be included in the functional count.

That could be all we mention here on the DSA. We expect a lot of comment on this viewpoint, and we therefore look at the technical mechanism in detail so that readers may apply the count to their individual situation.

In practice, two types of interface with sources are encountered:

1. an interface using (flat) files
2. a direct interface between the STA and the source database.

Ad 1. AN INTERFACE USING (FLAT) FILES

Where the interface uses (flat) files, the following functions and files can be distinguished:

1. a function that creates the (flat) file from the system supplying the data
2. a flat file containing the data
3. a function that transfers the (flat) file from the source environment to the DWH environment
4. a function which ensures sequential processing of the (flat) files and carries out a number of substantive checks on the integrity of the data transport
5. a function that takes the data in the (flat) file and puts them into a table in the STA, carries out a number of checks and, if necessary, adds extra information such as the time of processing
6. a table into which the data is placed in the STA
7. a function that archives the data supplied

Implementation of point 1 is usually outside the scope of the project, but could be counted as an EO (output function) of the source system. The (flat) file (2) is not counted, as it is clearly the output from the function in point 1. Points 3, 4 and 7 do not have to be redesigned and rebuilt each time: they are functions of the application that controls the input of the DWH, and configuring them is an element of the implementation plan. So only steps 5 and 6 remain to be assessed for the count.
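As an illustration only, steps 5 and 6 might look as follows in code. The table name, column names and checks are hypothetical; the guideline itself does not prescribe any implementation:

```python
# Illustrative sketch of steps 5 and 6: take the data from a (flat) file,
# perform basic checks, add processing meta data, and place the rows in an
# STA table. Names (stg_sales, source_id, period) are invented for the example.
import csv
import sqlite3
from datetime import datetime, timezone

def load_flat_file(conn, path, source_id, period):
    """Step 5: read the delivered file, check it, enrich it, and insert it
    into the STA table (step 6). Returns the number of rows loaded."""
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))
    expected = {"customer", "amount"}            # substantive check: layout as agreed
    if rows and set(rows[0]) != expected:
        raise ValueError(f"unexpected columns: {set(rows[0])}")
    loaded_at = datetime.now(timezone.utc).isoformat()
    conn.executemany(
        "INSERT INTO stg_sales (customer, amount, source_id, period, loaded_at) "
        "VALUES (?, ?, ?, ?, ?)",
        [(r["customer"], r["amount"], source_id, period, loaded_at) for r in rows],
    )
    conn.commit()
    return len(rows)

# Step 6: the STA table itself, keeping the data as received plus meta data
# (source of the data and the time of processing).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stg_sales (customer TEXT, amount TEXT, "
    "source_id TEXT, period TEXT, loaded_at TEXT)"
)
```

Note that, per the guideline above, none of this is counted as an ILF or EI within the DSA; the sketch only shows why the stage is regarded as a technical, preparatory step.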


If the functionality relates to only one or more of the points below, there is no reason to count the data in the DSA as ILFs (Internal Logical Files), as no user functionality is involved:

- translating source codes to DWH codes
- the buffer function for matching the frequency of data deliveries (per day as opposed to per week or per month)
- adding the period to which the data relate
- adding meta data relating to processing (e.g. the source of the data)

    When does a file actually have to be counted?

    When a user report is generated directly from the data in the DSA.

If the latter (reporting) is the reason for counting the data in the DSA as ILFs (Internal Logical Files), along with the associated EIs (External Inputs), the reports in question should, of course, also be counted as External Output functions of the system.

To summarise: in the case of an interface using (flat) files, the imported tables are not to be counted as ILFs in the DSA.

    Ad 2. A DIRECT INTERFACE BETWEEN STA AND THE SOURCE DATABASE

A direct interface allows IT tools to be used to give one database environment (in this case the DWH) direct access to data in another database environment (in this case a source system). In this way the source data remain in the source system - no transfer takes place - and the data are merely made available to the Data Warehouse.

When a direct interface is used, each interfaced logical file in the source system is counted as an EIF (External Interface File). Count these available data groups as EIFs for the Data Warehouse.

To make the counting result easier to understand, report them as counted within the DSA.

    3.1 Counting guidelines for the Data Staging Area

When importing data using (flat) files, the export function from the source system should be counted as an EO (External Output function) for that source system.

The imported data are not counted as ILFs within the DSA. The data groups and functions are not counted until later, in the next component of the Data Warehouse.

When importing data using a direct interface with a logical data file in the source system, each file is counted as an EIF (External Interface File).

    3.2 Requirements design document for the Data Staging Area

    None.


    4 The Star Schemas of the Data Warehouse

This section follows the philosophy that the processing of data that takes place in this component of the Data Warehouse is determined primarily by the company's own corporate rules. It is only in the next step, described in the next chapter, that data are processed according to the requirements for the production of reports for departments, applications or policy/corporate management functions.

The Data Warehouse's Star Schemas store historical data in a standardised manner in accordance with a range of views. Two important types of data group can be discerned: the Facts (the basic events for which reports are wanted) and their Dimensions (the views, or groupings, by which the Facts will be reported), together making up the Star Schema(s) of the DWH. The tables in this component of the Data Warehouse are often referred to as "the Data Warehouse" or "the Star Schema". Facts are the core element of the star schema, while the dimensions are its rays. This component of the Data Warehouse usually consists of a number of Star Schemas, in which as many dimensions as possible are re-used.

First, the data about the Dimensions for a specified period of time must be processed, before the Facts for the same or earlier periods can be attached to the Dimensions.

DIMENSIONS

We use the structure of an organisation as an illustration of the counting dilemma that Dimensions confront us with. First, we discuss how the number of ILFs is determined, and then the number of EIs for each Dimension.

Suppose that in analysing information, relationships are described, i.e. those between natural persons (internal employees and external relations), legal entities and organisational entities, both inside and outside one's own organisation. The organisational structure is described as an organisational entity which may or may not be part of another organisational entity or legal entity (a non-mandatory recursive relationship, also known as a hierarchical relation). Natural persons may belong to a legal entity or an organisational unit. In the case of natural persons who do not belong to one's own organisation it is necessary to know whether they are resident in the Netherlands or not. Only legal entities established in the Netherlands should be registered with the Chamber of Commerce. Internal employees always have a relationship with an internal organisational unit.

The designer of the source system will model this in detail, leading to numerous ILFs and EIs. The designer of the Data Warehouse has to simplify this into a structure which is easier for users to query and navigate while reporting. He may opt to translate this structure into a single Dimension, "relationships", using corporate rules for the relationships within the Dimension and its (optional or mandatory) attributes.

Supplementary information 1: Why are there Star Schemas in the Data Warehouse?

Without delving into data warehousing theory, there are terms which are important to understand when counting a data warehouse: facts and dimensions.

In the world of data transactions the database designer relies on principles such as the normalisation of data, in which each relevant real-world event results in just a single change to the data, and which is the basis of many groups of data relevant to the user. The guiding principle for the designer of a Data Warehouse should be to minimise the effort needed to carry out a query, given the expected usage of the data. A normalised data model is not entirely appropriate for this purpose: when requests for management information have to be answered in a normalised environment, the relationships between many data groups have to be recreated repeatedly. The designer of the Data Warehouse tries to avoid that situation by simplifying the layout into a star schema: a table of facts, along with tables of dimensions in which all relationships are summarised.


Does this mean there is just one ILF? Or as many as in the source system? Or as many ILFs as there are possible layers in the hierarchy of the dimension relationships? And how many EI functions must be distinguished?

The guideline is as follows. For a dimension, count 1 ILF. Then, in order to determine the number of input functions (EIs), determine how many record types can be distinguished within the Dimension: examine the number of levels in the Dimension, examine whether levels are handled differently (differences in the handling of attributes, for example), and set out the chosen options in the counting report.

Where such information about the levels is not (yet) available, assume 3 record element types (one for the highest layer, one for the lowest layer and one for all the layers in between) if the system is known to be hierarchical; otherwise assume 1 record element type.

Now that the number of record element types of the ILF for the Dimension is known, the number of EIs can be determined:

- 1 EI for the addition of new occurrences of the record type;
- no EI for the processing of changes: a new occurrence is added, and closing the validity of the previous occurrence is part of the same logical unit of processing;
- and in exceptional cases only: 1 EI for the deletion of data (restricting the validity of an event which, up until that moment, was valid in perpetuity).

So, for each record element type of the ILF for the Dimension, it is usual to count 1 EI.

FACTS OF A STAR SCHEMA

Facts must be linked to Dimensions by functions. Counting the Facts, the core elements of the Star Schemas, is straightforward: Facts are added to the Data Warehouse, never changed or deleted. Count 1 ILF and 1 EI for each table of facts.⁴

MULTIPLE SYSTEMS AS SOURCES

Data Warehouses have multiple business systems as sources. Data for a specific file can also be drawn from multiple business systems. Usually the processing of a data file from one business system will be logically different from the processing from another. Therefore, for an ILF, 1 EI is counted for each source.

⁴ The number of Dimensions to be linked is an obvious measure of the complexity of the function, as is the number of attributes to be calculated. An estimated count does not take these into consideration.

Supplementary information 2: History in the Data Warehouse

In the Data Warehouse, history is kept by registering the period in which a record was valid. The situation at any given time is found in the record whose start and end of validity lie either side of that time. The last (current) situation can be identified by examining the record with a validity until eternity.

When new information for the dimension is available, records are inserted with an eternal validity starting at insertion time.

When changes are made to (relevant) attributes, (a) a new record is added with perpetual validity, but also (b) the validity of the record that, until that point, was perpetual is cut off at that point in time (the record is closed).

Given the purposes of the Data Warehouse, i.e. to maintain a history, records are never deleted. If it becomes necessary to delete a record, its validity is cut off (as in b). The latter type of change occurs rarely compared to the other types.

The above is known as type 2 history. In the case of type 1 history, only the current value is stored.
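The type 2 mechanism can be sketched in code. This is an illustrative in-memory version (the names DimRecord, apply_change and as_of are invented for the example), not part of the counting guideline:

```python
# Minimal sketch of type 2 history: a dimension record is valid from
# valid_from up to (but not including) valid_to; the current record is the
# one whose valid_to is "eternity" (here represented by date.max).
from dataclasses import dataclass, replace
from datetime import date

ETERNITY = date.max

@dataclass(frozen=True)
class DimRecord:
    key: str            # business key of the dimension member
    attributes: dict    # hypothetical attribute payload
    valid_from: date
    valid_to: date = ETERNITY

def apply_change(history, key, new_attributes, change_date):
    """(b) close the currently perpetual record at change_date, and
    (a) insert a new record valid from change_date until eternity."""
    out = []
    for rec in history:
        if rec.key == key and rec.valid_to == ETERNITY:
            out.append(replace(rec, valid_to=change_date))  # close the old record
        else:
            out.append(rec)
    out.append(DimRecord(key, new_attributes, change_date))
    return out

def as_of(history, key, when):
    """Return the record that was valid for `key` at moment `when`."""
    for rec in history:
        if rec.key == key and rec.valid_from <= when < rec.valid_to:
            return rec
    return None
```

Note that no record is ever physically deleted: a change closes one record and adds another, which is why the guideline counts no separate EI for changes.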


    4.1 Counting guidelines for the Dimensions of a Star Schema

For each hierarchical dimension: 1 ILF.
For each record element type in the hierarchical dimension: 1 EI for the inputting.
If the number of record element types in the hierarchical dimension is not known: 1 ILF and 3 EIs.
For each non-hierarchical dimension: 1 ILF and 1 EI.
Remember that a new EI is needed for each source.
Only when deletion is described explicitly: 1 additional EI for the relevant record type.

    4.2 Counting guidelines for the Facts of a Star Schema

For each file containing Facts: 1 ILF and 1 EI.
Remember that a new EI is needed for each source.
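The guidelines in 4.1 and 4.2 can be combined into a small calculation, shown below as a sketch. It assumes the estimated-count weights from chapter 2 (ILF Low = 7 FP, EI Average = 4 FP); the function names are invented for the example:

```python
# Sketch of the counting guidelines in 4.1 and 4.2 under the estimated count:
# each ILF counts Low (7 FP), each EI counts Average (4 FP).
ILF_FP, EI_FP = 7, 4

def count_dimension(record_element_types=None, hierarchical=True,
                    sources=1, explicit_deletion=False):
    """FP for one dimension: 1 ILF, plus 1 EI per record element type per
    source (default 3 types when unknown and hierarchical, else 1), plus
    1 extra EI only when deletion is described explicitly."""
    if record_element_types is None:
        record_element_types = 3 if hierarchical else 1
    eis = record_element_types * sources + (1 if explicit_deletion else 0)
    return ILF_FP + eis * EI_FP

def count_fact(sources=1):
    """FP for one table of Facts: 1 ILF and 1 EI per source."""
    return ILF_FP + sources * EI_FP
```

For example, a hierarchical dimension with an unknown number of levels and one source counts 7 + 3 x 4 = 19 FP, and a table of Facts fed by two sources counts 7 + 2 x 4 = 15 FP.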

4.3 Requirements design document for Dimensions and Facts

To simplify counting it is preferable to have some documentation available. Look for the following documentation.

For each dimension:
- a logical model of the layers and/or data groups in the dimension;
- for each layer in the dimension:
  - the data that it is drawn from (refer to the logical data model of the DSA for this purpose);
  - the handling;
  - the functionality to be realised, giving special attention to:
    - the processing of changes;
    - whether or not data is being deleted;
    - whether data is drawn from multiple source systems (when populating the DSA).

For each Fact:
- the logical data model surrounding the Fact (the star schema), insofar as it is relevant to the processing;
- the data that it is drawn from (refer to the logical data model of the DSA for this purpose);
- the processes by which the Facts are extracted.

5 The Data Marts in the Data Mart Area

    5.1 Counting guidelines for the Data Mart Area

If Facts from the Data Warehouse are combined into new Facts (other than by aggregation), or if some of the data can no longer be drawn from the Facts in the Data Warehouse, the count for each Data Mart should be: 1 ILF and 1 EI.

    5.2 Requirements of the design document for the Data Mart Area

To support the ease of counting, some documentation will help.

For each Data Mart:
- the logical data model of the Data Mart;
- the data that it is drawn from (refer to the data in the DWH for this purpose);
- for each operation, the functionality to be achieved, with special attention to:
  - the period of storage.


    6 Reports

Reports are normally created using completely different tools from the other functions, and productivity (hours per function point) varies greatly between those tools. Because of this variation in productivity it is desirable to keep the FPs counted for reports separate, so that those who have to assess the effort required can take account of that difference in productivity. There are a large number of (functionally and technically) different report solutions on the market. This counting guideline attempts to estimate the underlying functionality without regard to such differences.

In many DWH environments we find two types of report, each addressing specific functions within the organisation(s):

- fixed or canned reports;
- OLAP reports.

Many fixed reports⁶ from a Data Warehouse relate to a specific cross-section of the data available in the Data Marts. As a result, many reports of similar format and content are specified, in many variants. According to the NESMA guidelines, simply counting each of the required reports as an EO function is not right, as too many of the underlying logical processes are the same to allow the functions to be regarded as unique. But it is also not right to regard all the reports together as one External Output function: there are logical data groups which sometimes are, and sometimes are not, shown in variants of the reports. In such cases, the NESMA guidelines lay down that a number of EO functions must be distinguished.

In OLAP environments⁷ the data to be reported, including all aggregations, are provided to the user via the requested views for his ad hoc analyses. The user puts the views that are of current interest to him into his report as columns and rows and then looks for the desired level of detail within that report. All views are available constantly, but if they have not been selected for the analysis they are displayed to the user as "all items in this viewpoint". Whereas with the canned reports above there is an enormous number of possible reports, in OLAP environments every possible report is actually available. The functional specifications in this environment speak of the measured data to be displayed (and their derivatives), followed by all possible groupings of the Facts. In this environment the user sees this as one report, and thus one is tempted to count one EO.

    This outlines the counting dilemma for us. In the fixed reports environment we may easily incline to counting one EO function for each report designed. In the OLAP environment it would be quite easy to count all the reports together as only one EO function. But that means that the technical environment is determining the functions, rather than the functions being defined by the functionality of the reports!

    ⁶ E.g. Business Objects or Oracle Discoverer

    ⁷ E.g. Cognos

    Supplementary information 3: Report variants

    Staff of the corporate sales department request reports to be designed that relate to the number of sales (1) per week per employee, (2) per month per team and (3) per year per department. The content is similar for the three reports but for the level of detail, and each will show either the name of the employee, the team or the name of the department.

    The user has to specify a selection meaningful to him from the enormous number of possible reports. From the initial report requirements we identify time and organisation as Dimensions, each with three levels (week, month, year and employee, team, department). Of the nine possible combinations the user is asking for three.

    The guidelines suggest counting these three reports as one EO, as the logical process of each report is the same.
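The arithmetic of this example can be sketched in a few lines of Python. This is a hypothetical illustration only, not part of the NESMA guideline itself:

```python
from itertools import product

# Two Dimensions from Supplementary information 3, each with three levels.
dimensions = {
    "time": ["week", "month", "year"],
    "organisation": ["employee", "team", "department"],
}

# Every combination of one level per dimension is a possible report.
possible_reports = list(product(*dimensions.values()))
print(len(possible_reports))  # 9 possible reports

# The user asks for three of them.
requested = [("week", "employee"), ("month", "team"), ("year", "department")]

# All three share the same facts and layout and differ only in the
# selections within the dimensions, so they form one report group,
# counted as a single EO function.
report_groups = {("sales", "standard layout"): requested}
eo_count = len(report_groups)
print(eo_count)  # 1 EO
```

The point of the sketch is that the EO count follows from the number of report groups, not from the number of requested (or possible) reports.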


    The guidelines therefore advocate the grouping of fixed or canned reports into report groups based on a number of selection criteria, usually levels in dimensions, which makes the functionality counted in the two reporting environments the same.

    6.1 Counting guidelines for Reports

    Group the reports required based on similar facts and layout but for selections within dimensions, and count one EO function for each report group.

    6.2 Requirements of the design document for Reports

    To support ease of counting, look for design documentation describing, for each report group:

    the reports that belong to that group;

    one model, or a model of each report;

    a logical data model of the data in the report;

    the dimensions used in the DWH;

    the Data Marts used;

    (if applicable) the Facts used in the DWH;

    the required functionality, including the selections and choices of the end user.

    Here, too, it is important to state whether the report group in question already exists or not.


    7 Other elements of a Data Warehouse environment

    7.1 Cleansing

    As already stated, usually no data is deleted from a DWH by Data Warehouse functions. The available history may be limited by data cleansing functions, which limit the history in the Star Schema to (for instance) 25 months and the history in the Data Marts to (for instance) 61 months. Such cleansing functions are normally incorporated into the control of the Data Warehouse, or result from physical re-use of parts of the database system (the latter meaning that data are physically overwritten with new data after, say, 24 months).

    Count the cleansing function at the level of the described functions (i.e. do not count one input function for each ILF).

    7.2 Meta Data relating to the logistic process

    The data used to manage the Data Warehouse may be, for example: dates on which a process has added data to a table from a source; the number of records added, amended or rejected at that time; or the parameters used for that processing operation. The processes to be developed must read and edit these Meta Data.

    These functions cannot be identified by the end users and so would not normally be counted. However, the control mechanism must be created when the Data Warehouse is set up. For the purposes of counting this functionality, we regard the administrators as users and count in accordance with the standard NESMA guidelines.

    7.3 Meta Data relating to the meaning of the data (business metadata)

    Describing the meaning of data is of great importance in any environment, particularly so inthe case of the Data Warehouse.

    These functions and the data required for them are not counted when estimating the project size. The assumption made is that describing the meaning is a business activity and that the description in question will be captured in a tool that has been set up on a once-only basis for the Data Warehouse.

    For the purpose of counting this functionality (the administration and presentation of the business metadata), the end users of the DWH should be regarded as users, and counting should be carried out in accordance with the standard NESMA guidelines.

    7.4 Importing FPA tables

    If tables containing only codes and descriptions are imported from sources in order to be put into the DWH tables or to transform the input received, doesn't that look very similar to the transfer of so-called FPA tables? Do we need to count these imports? If so, do we count just once for the whole DWH or once for each source system?


    Count one FPA-table ILF for the whole Data Warehouse, plus the associated EI, EQ and EO function. If, when there is a subsequent increment of the Data Warehouse, a further FPA table in a source system has to be accessed, we count one FPA-table ILF, one EI, one EQ and one EO function.
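As a sketch of this guideline, the count can be expressed as follows. The weights used are the common low-complexity IFPUG/NESMA weights and are purely illustrative; an actual count must use the complexity assessed for each function:

```python
# Illustrative sketch of the FPA-table guideline: one FPA-table ILF for the
# whole DWH plus one EI, one EQ and one EO function. The weights below are
# the usual low-complexity IFPUG/NESMA weights, used here for illustration.
LOW_WEIGHTS = {"ILF": 7, "EI": 3, "EQ": 3, "EO": 4}

def fpa_table_count(increments_with_new_fpa_table: int = 0) -> int:
    """Unadjusted FP for the FPA-table functionality of a DWH.

    Each increment that has to access a further FPA table in a source
    system repeats the same set of functions (ILF + EI + EQ + EO).
    """
    per_set = sum(LOW_WEIGHTS[f] for f in ("ILF", "EI", "EQ", "EO"))
    return per_set * (1 + increments_with_new_fpa_table)

print(fpa_table_count())   # 17 FP for the initial Data Warehouse
print(fpa_table_count(1))  # 34 FP after one increment adds another FPA table
```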

    REQUIREMENTS FOR THE DESIGN OF FPA TABLES

    For each source: the code tables to be imported.


    8 Alternative architectures

    Data warehousing practice is constantly changing. We hope that the model offered here will be easy to apply to your own practical situation. We will mention here a number of alternative architectures or components and suggest a counting guideline for each.

    8.1 By-passing the Staging Area

    Data can be put directly into the Dimensions or Facts of a Star Schema. A DSA may not be required or desired.

    The count should be performed as for Facts and Dimensions.

    8.2 Reports from the Star Schema

    Reports can be based directly on the data in the Star Schema (without using the Data Marts).

    The count should be performed as for reports.

    8.3 Multiple data groups in a single file delivery

    The frequency of delivery of data to Data Warehouses is increasing. The volume of each data delivery is consequently declining. Taking this to the extreme, we find a situation in which the Data Warehouse is populated from real-time message queues. To enhance the efficiency of the process, a single message may contain two or more data groups. The functionality must place the data in the correct manner into a number of logical data groups. For example, a file for a fuel expenses claim: vehicle and owner, driver and registration number, plus the number of litres purchased at each refuelling, the cost and the odometer reading. For such files XML is a good way of structuring the content flexibly and marking it up. An XML file thus often contains more than one data group.

    The count should be performed in the same way as for Facts and Dimensions, for each data group that results from the file, regardless of the frequency of delivery.

    8.4 Operational Data Store - ODS

    We are familiar with the use of an operational data store (ODS). Into the ODS a selection of the data from several transactional systems is copied, each item of data being replicated one-to-one in a different environment. The reports are made available from that environment via the queues, action lists and signal lists current at that moment⁸. The primary motivations for doing this are the integration of data residing in the transactional systems and relieving the transactional systems of the burden imposed by demanding reports.

    When the data in the ODS is only a copy of the source, count it as an EIF. Only where data is integrated is a new ILF created, holding the keys in the source, a reference to the source, and the algorithm used to create the relation, including a reference to the corporate identification, when available. Count that ILF, count an EIF as mentioned above for each source/administration, and count 1 EI for the algorithm.

    ⁸ The ODS contains up-to-date information. The DWH contains history and is only current up to a certain point in time.


    If an ODS is available in the environment of the Data Warehouse and the required data are available there, then a direct link to the ODS, instead of to the source system, would seem an obvious solution, from the point of view both of relieving the burden on the transactional system and of ease of accessibility.

    One should regard the ODS as one of the sources of the DWH and count its functionality asdescribed in chapter 4.

    8.4.1 Counting guidelines for the ODS

    When data in the ODS is only a copy of the source, count it as an EIF. When data is integrated and stored, count that ILF and count 1 EIF and 1 EI for each source/administration that is integrated.
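A hypothetical sketch of this guideline, again using illustrative low-complexity weights rather than assessed complexities:

```python
# Sketch of the ODS guideline. For a pure copy of a source, only an EIF
# is counted per source; when data from several sources is integrated,
# one ILF (keys, source references, matching algorithm) is counted, plus
# one EIF and one EI per integrated source/administration.
LOW_WEIGHTS = {"ILF": 7, "EIF": 5, "EI": 3}

def ods_count(sources: int, integrated: bool) -> dict:
    """Return the function counts for an ODS fed by `sources` systems."""
    if not integrated:
        # Each source is replicated one-to-one: one EIF per source.
        return {"EIF": sources}
    # Integration: one ILF for the cross-reference, 1 EIF + 1 EI per source.
    return {"ILF": 1, "EIF": sources, "EI": sources}

def unadjusted_fp(counts: dict) -> int:
    return sum(LOW_WEIGHTS[f] * n for f, n in counts.items())

print(ods_count(3, integrated=False))      # {'EIF': 3}
print(unadjusted_fp(ods_count(3, True)))   # 7 + 3*5 + 3*3 = 31
```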

    8.5 The federated Data Warehouse

    Some organisations have a number of Data Warehouses existing in parallel and serving as source systems for one another. Usually one Data Warehouse supplying data will be seen as a source system by another.

    There will probably be a direct interface from the DWH to the other component, and the count should be based on one EIF for each file interfaced from the supplying Data Warehouse.

    8.6 Inmon

    As indicated in section 2.4 (Count for each DWH component choice of architecture), the core of the DWH may have been set up using a relational model. In that situation you must take note of the following:

    Count an EI per record type of an ILF if a different logical handling is described for it;

    Count an EI for each other logical handling operation for the inputting of logical data files;

    Expect different EIs for the same ILF for each source: usually the handling operation is logically different for each source.

    8.7 BDWM

    As an example of an alternative structure in which the core of the Data Warehouse consists of a (very highly) normalised model instead of a Star, we would mention the Banking Data Warehouse Model, IBM's model for a data warehouse for banks. When implementing this model, the logical model is sometimes translated directly into a technical structure. As a result of that approach it becomes hard to see the difference between new data and new descriptions of data. How to find the new data only? To illustrate, we will look at the recording of the item "The job of the person with the name Rob is that of writer". The technical recording can be represented as follows:

    1. There is a table in which the existence of persons is recorded (BDWM: involved party). A new entry is created. The attribute Forename is given the value "Rob".

    2. There is a generic table structure containing possible values for domains. A new domain (for jobs) and its possible values (like "writer") are described. Nothing changes in these tables but for new occurrences.


    3. There is a generic table in which the relationship between various tables, including the one for persons and the aforementioned table, can be made, with the descriptive attribute "the job of a person". A new entry, referring to the new person and his job, is thus created.

    4. In a table, the value for the relation in point 3 will refer to the involved party "Rob", the relation "the job of a person" and the value "writer".

    The tables in 2 and 3 describe new possibilities; the table in 4 holds the new information. The item "The colour of the eyes of the person with the name Rob is brown" could be stored in the same way, in exactly the same tables. The value "brown" must appear in the generic table that contains the wide variety of possible values for colours. In the third table described above an occurrence would be created, referring to involved parties and colours, to hold the eye colour of a person. The fourth will actually hold the information. If Rob's job changes, then only the event in the third table changes. If a job history is required, this is achieved by using validity periods in that table. If the information on job positions is no longer relevant, the relation's validity period is limited.
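The generic recording described above can be sketched as plain data structures. All table and column names here are hypothetical illustrations, not the actual BDWM names; the lookup at the end shows why the logical information is "hidden in the occurrences":

```python
# Hypothetical sketch of the generic, metadata-driven recording.

# 1. Involved parties.
involved_party = [{"id": 1, "forename": "Rob"}]

# 2. Generic domains and their possible values (new *descriptions* of data).
domain_value = [
    {"id": 10, "domain": "job", "value": "writer"},
    {"id": 11, "domain": "eye colour", "value": "brown"},
]

# 3. Generic relationships between tables (also *descriptions*).
relation_type = [
    {"id": 100, "description": "the job of a person"},
    {"id": 101, "description": "the eye colour of a person"},
]

# 4. The occurrences that hold the actual *information*, with validity
#    periods so that history can be kept.
relation = [
    {"party": 1, "type": 100, "value": 10,
     "valid_from": "2019-01-01", "valid_to": None},
    {"party": 1, "type": 101, "value": 11,
     "valid_from": "2019-01-01", "valid_to": None},
]

def value_of(party_id, description):
    """Reconstruct a logical attribute from the generic tables."""
    rtype = next(r for r in relation_type if r["description"] == description)
    occ = next(r for r in relation
               if r["party"] == party_id and r["type"] == rtype["id"])
    return next(v["value"] for v in domain_value if v["id"] == occ["value"])

print(value_of(1, "the job of a person"))         # writer
print(value_of(1, "the eye colour of a person"))  # brown
```

Note that adding a new kind of attribute changes no table structure at all, only occurrences; this is exactly what makes the logical model hard to read off from the technical design.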

    As a result, the presentation of the logical design of the Data Warehouse contains an enumeration of the identified relationships, classifications etc. The function point analyst's dilemma is now plain: what are the logical data files that should be maintained? Should each attribute be counted as a logical data file? How can the logical information model be reconstructed from the technical implementation? It is hidden in the occurrences! It is certainly a flexible approach.

    We suggest not counting each attribute described as a logical data file (including a function), but doing one additional logical step: group information with similar identifying keys. "Job" and "eye colour" are relations between an involved party (an identifying key) and a domain, and are easily grouped into one logical data file. "Rob has a son Bert" has two identifying keys and will be a separate logical data file. Refer to the NESMA FPA documentation on the grouping of data.

    To summarize, for strongly normalized, metadata-driven environments:

    do not count the structures for meta data;

    study the data to be inserted in the meta data, looking for the identifying relations;

    group based on these identifying relations;

    count for each group 1 ILF and 1 EI for the logic to maintain it.
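The suggested grouping step can be sketched as follows. The attribute list and the helper are hypothetical illustrations of the guideline, not part of it:

```python
from collections import defaultdict

# Each described attribute is listed with the set of identifying keys it
# relates. Attributes sharing the same identifying keys are grouped into
# one logical data file; each group is counted as 1 ILF plus 1 EI for the
# logic that maintains it.
attributes = [
    ("the job of a person", ("involved party",)),
    ("the eye colour of a person", ("involved party",)),
    ("Rob has a son Bert", ("involved party", "involved party (child)")),
]

groups = defaultdict(list)
for name, keys in attributes:
    groups[keys].append(name)

for keys, members in groups.items():
    print(keys, "->", members, ": count 1 ILF and 1 EI")

print(len(groups))  # 2 logical data files
```

Job and eye colour end up in one logical data file (one identifying key, the involved party), while the parent-child relation, with two identifying keys, becomes a separate one.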


    Appendix A. Data Warehousing Architecture Reference Model

    [Figure: Data Warehousing Architecture Reference Model]

    The figure above shows the flow of data from source to user and a distribution of functions across the architecture:

    1. The left-hand layer in the diagram (Source Layer) relates to systems that supply the information or contain the organisation's primary operational process. From that layer, data is made available to the second layer in the model.

    2. In the Data Layer, firstly the data is imported (into the DSA), then modelled according to the company's corporate business rules (product definition, organisational structure, market channels) and stored in the Data Warehouse. Only after that are data refined to make them suitable for use for specific queries from departments or decision-supporting systems, or for statistical analyses, in preparation for making the data available to the users. They are then stored in Data Marts.

    3. In the final two layers at the right-hand side the data are used for reports or applications and subsequent presentation.

    The leftmost and rightmost layers are outside the scope of this counting guideline for DWH.

    Between those two layers a range of groups of functions can be identified. Populating the second layer is often done in two stages:

    1. collecting the data into the Data Staging Area (see chapter 3);

    2. integrating the data into the Data Warehouse (see chapter 4).

    Similarly, preparing the data for the users is done in two stages:

    3. aggregating and extracting data for the purpose of the applications up to the Data Marts (see chapter 5);

    4. working the data up into reports (see chapter 6).

    The guideline for the ODS is stated in section 8.4 (Operational Data Store - ODS).


    Appendix B. Summary Functions and Files in the DWH

    [Figure: Summary of functions and files in the DWH. Within the application boundary of the DWH, data flows from the Data Sources (counted as EIFs) via upload files into the Data Staging Area (a technical copy of the staging data, not counted as ILFs), then via EIs into the ILFs of the Star (Facts and Dimensions), on via EIs into the Data Mart ILFs, and finally via EOs into the Reports. A relational DWH variant, in which the entities of the data store are the ILFs, is shown alongside the Star.]

    Component of the DWH: Summary of the counting guidelines

    The Data Staging Area (see Guideline 3):
    The imported data are not counted as ILFs within the DSA. The data groups and functions are not counted till later, in the next component of the Data Warehouse.

    The Star Schemas of the Data Warehouse (see Guideline 4):
    For each hierarchical dimension: 1 ILF.
    For each record element type in the hierarchical dimension: 1 EI for the inputting.
    If the number of record element types in the hierarchical dimension is not known, 1 ILF and 3 EIs should be counted.
    For each non-hierarchical dimension: 1 ILF and 1 EI.
    Remember that a new EI is needed for each source.
    Only when deletion is described explicitly: for the relevant record type, 1 additional EI.

    The Data Marts (see Guideline 5):
    If Facts from the Data Warehouse are combined into new Facts (other than by aggregation), or if some of the data can no longer be drawn from the Facts in the Data Warehouse, the count for each Data Mart should be: 1 ILF and 1 EI.

    Reports (see Guideline 6):
    Group the reports required based on similar facts and layout but for selections within dimensions, and count one EO function for each report group.

    ODS (see Guideline 8.4):
    When data in the ODS is only a copy of the source, count it as an EIF. When data is integrated and stored, count that ILF and count 1 EIF and 1 EI for each source/administration that is integrated.


    Appendix C. Requirements basic design document relating to function point counts

    To enable efficient counting, documentation is the preferred source. It should contain the information listed below.

    Component of the DWH: Summary of the requirements

    The Data Staging Area:
    None; however, given the follow-up to the process, it is important to have a logical data model of the DSA.

    The Star Schemas of the Data Warehouse:

    For each dimension:

    a logical model of the layers and/or data groups in the dimension;

    for each layer in the dimension:

    o the data that are drawn from (refer to the logical data model of the DSA for this purpose);

    o the handling: the functionality to be achieved, giving special attention to:

    o the handling of the changes;

    o whether or not data are deleted;

    o whether data are drawn from multiple source systems (when populating the DSA).

    For each Fact:

    the logical data model surrounding the Fact (the Star Schema) insofar as it is relevant to the processing;

    the data that are drawn from (refer to the logical data model of the DSA for this purpose);

    the processes by which the Facts are extracted.

    The Data Marts:

    For each Data Mart:

    the logical data model of the Data Mart;

    the data that are drawn from (refer to the data in the DWH for this purpose);

    for each operation: the functionality to be achieved, giving special attention to:

    the period of retention/storage.

    Reports:

    For each report group:

    the reports that belong to that group;

    one model or a model of each report;

    a logical data model of the data in the report;

    the Dimensions used in the DWH;

    the Data Marts used;

    (if applicable) the Facts used in the DWH;

    the required functionality, including:

    selections and choices made by the end user.