Data Warehouses and Analytical Data Processing in CERN’s Administrative Decision Making Support Systems Jan Janke Software Engineer CERN / GS-AIS October 25 - 29, 2010 JINR/CERN Grid and Management Information Systems

Data Warehouses and Analytical Data Processing in CERN’s Administrative Decision Making Support Systems

Jan Janke
Software Engineer
CERN / GS-AIS

October 25 - 29, 2010
JINR/CERN Grid and Management Information Systems

Agenda

• Data Warehouses in Administrative Computing
• Recap: Data Warehouses Theory
• Data Warehouses and Information Systems in AIS
  ◦ Foundation, HR and FI Information Systems
  ◦ Complex Data Extraction Processes
  ◦ Pixel-Perfect Reporting
  ◦ Dashboards
• Detailed Data Warehouse Example
  ◦ Management Data Layer (MDL)

Ca. 16,000 People

Mankind’s Largest Machine

Enormous Amount of Data

Administrative Computing

• Provides means to administrate CERN
• Enables physicists to focus on their work
• Allows management to make the right moves

Why Data Warehouses?

• Heterogeneous computing landscape
• Various specialised OLTP systems
• Planning needs
• Legal Requirements

• Support administrative staff
• Enforce security and safety on site
• Allow management to make decisions

Example: Keep Finances Under Control

• Specialised Systems
  ◦ Accounting, ERP for CERN stores
  ◦ External contracts management
  ◦ Payroll, treasury management, …

Specialised small user groups
Distinct databases
High availability and performance, real-time data
Systems only accessible to authorised specialists

Example: Keep Finances Under Control

• General Financial Information System
  ◦ Single system
  ◦ Access to data from multiple sources
  ◦ Different levels of complexity

Users from all areas of CERN
Single data warehouse
High availability and performance, but no necessity for real-time data
Security is extremely important! System is accessible CERN wide.

AIS’ Financial Data Warehouse

• Keep data in sync with data providers
• Master complex data extraction process
• Ensure high query performance
• Base for detailed data analysis

Technologies:
  ◦ ORACLE RAC database
  ◦ Java Enterprise web applications
  ◦ In-house developed frameworks
  ◦ Third-party BI and reporting tools

Find the Needle in the Hay …

OLTP vs OLAP

                   OLTP               OLAP
Data source        Operations         OLTP (consolidated)
Data purpose       Run the business   Reporting, analysis
Inserts, updates   High               Periodic batch jobs
Query complexity   Low                High
DB design          Normalized         Star, snowflake
Availability       Critical           Less critical
Target             Operational staff  Middle/higher Mgmt.

OLTP vs OLAP

                   OLTP               OLAP
Data source        Operations         OLTP (consolidated)
Data purpose       Run the business   Reporting, analysis
Inserts, updates   High               Periodic batch jobs
Query complexity   Low                Depends …
DB design          Normalized         Snowflake and others
Availability       Critical           May be very critical
Target             Operational staff  Mgmt. + Operations

That’s theory!
Real world is not that easy…

Normalisation (Codd/Boyce)

• 1NF
  ◦ 1 table = 1 relation, no repeating groups or duplicate rows
• 2NF
  ◦ All non prime attributes depend on all parts (attributes) of a composite key
• 3NF
  ◦ All non prime attributes depend only on the (whole) key

Course      Category   Winner     Origin
Monaco ‘10  Formula 1  M. Webber  Australia
Japan ‘10   Formula 1  S. Vettel  Germany
Japan ‘10   Rally      S. Ogier   France

Not in 3NF, why?
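One likely answer to the question above: the key is (Course, Category), but Origin depends on Winner rather than on the key, which is a transitive dependency. The following is a minimal sketch of the usual fix, written in Python only to keep it runnable; the names `results`, `drivers`, `decompose` and `rejoin` are illustrative assumptions, not part of the talk.

```python
# Rows as shown on the slide: (course, category, winner, origin).
rows = [
    ("Monaco '10", "Formula 1", "M. Webber", "Australia"),
    ("Japan '10", "Formula 1", "S. Vettel", "Germany"),
    ("Japan '10", "Rally", "S. Ogier", "France"),
]


def decompose(rows):
    """Split the table so that origin is stored once per winner."""
    results = [(course, category, winner) for course, category, winner, _ in rows]
    drivers = {winner: origin for _, _, winner, origin in rows}
    return results, drivers


def rejoin(results, drivers):
    """Rebuild the original rows by joining the two tables on winner."""
    return [(course, category, winner, drivers[winner])
            for course, category, winner in results]


results, drivers = decompose(rows)
```

After the split, every non-key attribute in `results` depends only on (course, category), and `drivers` holds the winner-to-origin fact exactly once, so no information is lost by the decomposition.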

Star Schema

Sales Fact Table
  Keys: time_key, item_key, branch_key, location_key
  Measures: units_sold, dollars_sold, avg_sales

Dimension tables
  time: time_key, day, day_of_the_week, month, quarter, year
  item: item_key, item_name, brand, type, supplier_type
  branch: branch_key, branch_name, branch_type
  location: location_key, street, city, state_or_province, country

Source: http://www.executionmih.com/data-warehouse/star-snowflake-schema.php (16/10/2010)
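To make the star schema above concrete, here is a small sketch that builds a cut-down version in an in-memory SQLite database and aggregates the fact table over two dimensions. It is an illustration under assumptions: SQLite stands in for ORACLE, only two of the four dimensions are modelled, the table names `time_dim` and `item_dim` are chosen here, and the sample rows are invented.

```python
import sqlite3


def build_star(conn):
    """Create a reduced star schema: one fact table, two dimension tables."""
    conn.executescript("""
        CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, month INTEGER, year INTEGER);
        CREATE TABLE item_dim (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT);
        CREATE TABLE sales_fact (
            time_key     INTEGER REFERENCES time_dim(time_key),
            item_key     INTEGER REFERENCES item_dim(item_key),
            units_sold   INTEGER,
            dollars_sold REAL
        );
        -- Invented sample data, for illustration only.
        INSERT INTO time_dim VALUES (1, 9, 2010), (2, 10, 2010);
        INSERT INTO item_dim VALUES (10, 'Widget', 'Acme'), (11, 'Gadget', 'Globex');
        INSERT INTO sales_fact VALUES (1, 10, 3, 30.0), (2, 10, 2, 20.0), (2, 11, 1, 15.0);
    """)


def sales_by_brand(conn, year):
    """Join the fact table to its dimensions and total the measures per brand."""
    return conn.execute("""
        SELECT i.brand, SUM(f.units_sold), SUM(f.dollars_sold)
        FROM sales_fact f
        JOIN time_dim t ON t.time_key = f.time_key
        JOIN item_dim i ON i.item_key = f.item_key
        WHERE t.year = ?
        GROUP BY i.brand
        ORDER BY i.brand
    """, (year,)).fetchall()


conn = sqlite3.connect(":memory:")
build_star(conn)
```

Every query follows the same shape: filter and group on dimension attributes, aggregate the measures of the fact table. That regularity is what makes the design convenient for reporting tools.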

Snowflake Schema

Sales Fact Table
  Keys: time_key, item_key, branch_key, location_key
  Measures: units_sold, dollars_sold, avg_sales

Dimension tables
  time: time_key, day, day_of_the_week, month, quarter, year
  item: item_key, item_name, brand, type, supplier_key
  supplier: supplier_key, supplier_type
  branch: branch_key, branch_name, branch_type
  location: location_key, street, city_key
  city: city_key, city, state_or_province, country

Source: http://www.executionmih.com/data-warehouse/star-snowflake-schema.php (16/10/2010)

From Operations to Reporting

[Figure; labelled source systems: ERP, FI, HR]

Source: http://www.deakin.edu.au/ddw/what-is.php (16/10/2010)

Analysis

• Data Mining
• Drilldown
  ◦ Finer detail granularity (e.g. add a group-by column)
• Slice & dice
  ◦ Play with the dimensions
    Combine different dimensions
    Remove/add a dimension
• Analyse fact changes
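Drilldown and slicing can be illustrated with a few lines of plain Python. This is only a sketch: the fact rows, the dimension names `year` and `unit`, and the helper functions `aggregate` and `slice_facts` are all invented for the example.

```python
from collections import defaultdict

# Invented fact rows: two dimensions (year, unit) and one measure (amount).
facts = [
    {"year": 2009, "unit": "A", "amount": 40},
    {"year": 2010, "unit": "A", "amount": 100},
    {"year": 2010, "unit": "B", "amount": 60},
    {"year": 2010, "unit": "B", "amount": 25},
]


def aggregate(rows, dimensions, measure="amount"):
    """Sum the measure for every combination of the given dimensions."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[measure]
    return dict(totals)


def slice_facts(rows, dimension, value):
    """Keep only the rows where one dimension has a fixed value."""
    return [row for row in rows if row[dimension] == value]


by_year = aggregate(facts, ["year"])                   # coarse view
by_year_and_unit = aggregate(facts, ["year", "unit"])  # drilldown: one more group-by column
only_2010 = aggregate(slice_facts(facts, "year", 2010), ["unit"])  # slice on year
```

Adding `unit` to the dimension list corresponds to adding a group-by column, which is the drilldown step named above; removing it again rolls the result back up.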

CERN/AIS Business Map

Foundation

• Common data layer for various AIS services
• Data interfaces for other CERN services
• Common applications (e.g. mgmt. of roles)

[Diagram labels: HR Information System (HRT), FI Information System (CET), … more domain specific information systems, Operative systems]

Various Specialised Systems

• ORACLE HR
• CERN Training Application
• Safety & access systems
• EDH (Electronic Document Handling)
• Accounting Application
• ERP system for CERN stores
• Contract follow-up
• …

Technical Environment

• Source databases:
  ◦ ORACLE 10g
  ◦ Microsoft Excel
• HR/FI Information Systems:
  ◦ ORACLE 10g
  ◦ Java Enterprise web applications
  ◦ SAP Business Objects tool family

Data Extractions

• Nightly scheduled batch jobs
• Extractions organised in SQL scripts
• Run by self-developed “batch runner”
  ◦ Controls
    Order of execution (sequential, parallel)
    Criticality
    Logging
    Problem escalation (automatic emails)
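The slides do not show how the batch runner is implemented. The sketch below is therefore purely hypothetical and only illustrates how order of execution, criticality, logging and escalation could interact. `Command`, `run_batch` and the `escalate` callback are invented names; the callback stands in for the automatic email, and only sequential execution is covered.

```python
import logging
from dataclasses import dataclass
from typing import Callable, List

logger = logging.getLogger("batch_runner")


@dataclass
class Command:
    name: str
    action: Callable[[], None]  # stands in for running one SQL script
    critical: bool = True       # a failing critical command stops the batch


def run_batch(commands: List[Command],
              escalate: Callable[[str, Exception], None]) -> List[str]:
    """Run commands in order and return the names of those that completed.

    A failing critical command aborts the rest of the batch; a failing
    non-critical command is skipped. Every failure is logged and escalated.
    """
    completed = []
    for cmd in commands:
        try:
            cmd.action()
        except Exception as exc:
            logger.error("command %s failed: %s", cmd.name, exc)
            escalate(cmd.name, exc)
            if cmd.critical:
                break
            continue
        logger.info("command %s finished", cmd.name)
        completed.append(cmd.name)
    return completed
```

Passing the escalation step in as a callback keeps the runner testable without a mail server; a production version would presumably also need parallel execution and persistent logs, which this sketch leaves out.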

Definition of Extraction Process (1)

General definitions

Definition of Extraction Process (2)

Batches & commands

Importance of Monitoring

New hardware for DEV databases (gain > 1h)

Turtle or Leopard ?

The difference may be subtle …

ORACLE Materialised Views

• Pre-aggregated summaries
• Benefit from query rewrite

Source: ORACLE 10g Documentation / Data Warehousing Guide
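The idea of a pre-aggregated summary can be tried out without an ORACLE instance. The sketch below emulates it with an ordinary summary table in SQLite that is rebuilt on demand. This is an assumption-laden stand-in: SQLite has neither materialised views nor query rewrite, so queries must address the summary table explicitly, and the table names `sales_detail` and `sales_summary` are invented.

```python
import sqlite3


def refresh_summary(conn):
    """Complete refresh: rebuild the summary table from the detail table."""
    conn.executescript("""
        DROP TABLE IF EXISTS sales_summary;
        CREATE TABLE sales_summary AS
            SELECT year, SUM(amount) AS total_amount, COUNT(*) AS row_count
            FROM sales_detail
            GROUP BY year;
    """)


def read_summary(conn):
    """Answer the reporting query from the small summary table."""
    return conn.execute(
        "SELECT year, total_amount, row_count FROM sales_summary ORDER BY year"
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales_detail (year INTEGER, amount INTEGER);
    INSERT INTO sales_detail VALUES (2009, 10), (2010, 20), (2010, 5);
""")
refresh_summary(conn)
```

The trade-off is visible immediately: the summary is cheap to read but stale until the next refresh, which fits a warehouse that is loaded by periodic batch jobs.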

Materialised (Summary) Views

• Don’t use remote tables if you need query rewrite
• Create materialized view log on all source tables

Snapshots

• Use snapshots to efficiently access remote tables
  ◦ Syntax: CREATE SNAPSHOT … AS [Your Query]
  ◦ Refresh options:
    FAST
    COMPLETE
    FORCE

Pipelined Functions (1)

• PL/SQL is data source instead of a table
• May increase performance in environments with heavy PL/SQL use

CREATE OR REPLACE TYPE myTableFormat AS OBJECT (
  col_a NUMBER,
  col_b DATE,
  col_c VARCHAR2(25)
)
/

CREATE OR REPLACE TYPE myTableType AS TABLE OF myTableFormat
/

Pipelined Functions (2)

CREATE OR REPLACE FUNCTION myFunc RETURN myTableType PIPELINED IS
BEGIN
  FOR i IN 1 .. 5 LOOP
    -- emit one row at a time to the calling query
    PIPE ROW ( myTableFormat( i, SYSDATE + i, 'Row ' || i ) );
  END LOOP;
  RETURN;
END;
/

Pipelined Functions (3)

SELECT * FROM TABLE( myFunc() );

col_a     col_b      col_c
--------- ---------- ----------
1         27/10/2010 Row 1
2         28/10/2010 Row 2
3         29/10/2010 Row 3
4         30/10/2010 Row 4
5         31/10/2010 Row 5

Use a pipelined function if you require a data source other than a table!
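For readers more familiar with general-purpose languages, a pipelined function behaves much like a generator: rows are handed to the consumer one at a time instead of being collected first. The Python sketch below is only an analogy, not ORACLE code. `my_func` mirrors the PL/SQL example, and the start date is an assumption chosen so that the rows match the output shown above.

```python
from datetime import date, timedelta


def my_func(start, count=5):
    """Yield (col_a, col_b, col_c) rows one at a time, like PIPE ROW."""
    for i in range(1, count + 1):
        yield (i, start + timedelta(days=i), f"Row {i}")


# The consumer can start processing before all rows have been produced.
rows = list(my_func(date(2010, 10, 26)))
```

As with `TABLE( myFunc() )`, the caller treats the function as a row source and does not need to know whether the rows come from a table or are computed on the fly.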

Database Design

• Star schema like
• Highly de-normalised incl. duplication of data
• Use single-attribute keys wherever possible
• Performance matters!
  ◦ Be careful when extracting over database links
  ◦ Certain tables from operational systems are copied
  ◦ Deletion & recreation of indexes
  ◦ Use partitions
  ◦ Manual control of statistics collection
  ◦ Optimizing execution plans very time-consuming

Reporting Application Framework

• Column and ordering selection
• Sub reports
• Various output formats (e.g. HTML, PDF)
• Charts
• Self-service reporting
• Automated scheduled report execution
• Row and column based access control

Data access

Name     Unit  Tel    Salary   Category
Meyer    A     12345  $ 4,900  3
Schmidt  B     23456  $ 6,400  1
Cook     B     34567  $ 5,700  2

Which rows are visible to me?
Unit leader of B only sees persons from Unit B.

Which data (columns) am I allowed to see?
As a supervisor I may not be entitled to see the health insurance category. A safety or medical officer may not see the salary, etc.
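The two questions above translate into a row filter followed by a column projection. The sketch below applies both to the sample table from the slide; the function name `visible_data` and the way permissions are passed in are assumptions for illustration, since the slides do not describe the framework's actual interface.

```python
# Sample data from the slide; salaries stored as plain numbers.
PEOPLE = [
    {"name": "Meyer", "unit": "A", "tel": "12345", "salary": 4900, "category": 3},
    {"name": "Schmidt", "unit": "B", "tel": "23456", "salary": 6400, "category": 1},
    {"name": "Cook", "unit": "B", "tel": "34567", "salary": 5700, "category": 2},
]


def visible_data(rows, allowed_units, allowed_columns):
    """Keep only rows of permitted units, then only the permitted columns."""
    return [
        {col: row[col] for col in allowed_columns}
        for row in rows
        if row["unit"] in allowed_units
    ]


# Unit leader of B: rows of unit B only, and no health insurance category.
leader_view = visible_data(PEOPLE, {"B"}, ["name", "unit", "tel", "salary"])

# Medical officer: all units, but no salary.
medical_view = visible_data(PEOPLE, {"A", "B"}, ["name", "unit", "category"])
```

Keeping the row rule and the column rule separate means each role is described by two small sets rather than by a dedicated report per role.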

User Interface

Pixel Perfect Forms

• Use of Apache FOP library
  ◦ Examples:
    Employment & training attestations
    Swiss / French card application forms
• Business Objects XI Enterprise
  ◦ Direct use
  ◦ Indirect use via Business Objects Java SDK
  ◦ Examples:
    Salary slips
    Car stickers
    Work orders

Business Objects

• Commercial tool family from SAP
• Advantages
  ◦ Rich reporting possibilities (interactive or via SDK)
  ◦ Appealing dashboards using Xcelsius
  ◦ Only a few users need the knowledge to design reports
• Drawbacks
  ◦ Two-way data storage (file system & database)
  ◦ Sometimes stability problems
  ◦ Time-intensive administration and maintenance
  ◦ Expensive

Management Dashboards

Designed locally using MS Office and Xcelsius.

Data comes from the MDL data warehouse.

Published as Flash to the BO Server.

Management Data Layer (MDL)

• KPI data warehouse
• Very extensible
• Fixed generic schema
• Feeds management dashboards

Performance: Currently ca. 170 GB data in two tables

Generality: Different forms of data sources, new sources are added and removed all the time.

Integration with existing tools and development frameworks (ORACLE, Excel, BO, …)

MDL Data Model

[Diagram; tables: MDL_HEADERS, MDL_DIMENSIONS, MDL_VALUES, MDL_RAW_DATA, MDL_SUMMARY_DATA, MDL_LOOKUP_INFO, MDL_LOOKUP_DATA; relationships labelled “n” and “describes”]

Fact Table Partitioning

Range Partitioning: … 2008, 2009, 2010
Hash Partitioning (within each range partition): Data Set 1, Data Set 2, … Data Set n
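The diagram suggests a composite scheme: a range partition per year, each split further by a hash over the data set. The sketch below shows how a single fact row would be routed under that reading. It rests on several assumptions: dates are numbers of the form YYYYMMDD, the hash key is a numeric data set identifier, and a plain modulo stands in for ORACLE's internal hash function. `partition_for` is an invented name.

```python
def partition_for(value_date, data_id, hash_partitions=4):
    """Return (range partition, hash sub-partition) for one fact row.

    value_date is assumed to be a number of the form YYYYMMDD.
    """
    year = value_date // 10000           # range partition: one per year
    bucket = data_id % hash_partitions   # simplified stand-in for a hash function
    return year, bucket


def partitions_to_scan(year_from, year_to, data_id, hash_partitions=4):
    """List the partitions a query on one data set and a year range must read."""
    bucket = data_id % hash_partitions
    return [(year, bucket) for year in range(year_from, year_to + 1)]
```

A query that filters on both the date and the data set touches only one sub-partition per year, which is where the benefit of the composite scheme comes from.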

Query optimisation

• Keep it simple
• Redesign / add data source if required
• Use partitions and indexes

SELECT dimension1, dimension3, SUM(value2)
FROM mdl_raw_data
WHERE data_id = 45
  AND value_date > 20100000
GROUP BY dimension1, dimension3
ORDER BY 1, 2;

Remember:

• High data volumes + analysis = data warehouse
• OLTP vs. OLAP
• Use the facilities the tool provides
  ◦ Materialized views, snapshots, pipelined functions
• Keep things extensible and simple!
• Partitions are very helpful

Thank You!