APPLICATION OF COLUMNSTORE AND IN-MEMORY TECHNOLOGIES IN BUSINESS INTELLIGENCE PROJECTS. Mads Brink Hansen, Principal Consultant, Rehfeld Partners A/S; External Lecturer, Aarhus Universitet. [email protected] / [email protected]


Application of Columnstore and In-Memory Technologies in Business Intelligence Projects. Presentation from the InfinIT theme day, 28 October 2014: Modern Analytical Database Technology


Page 1: Mads Brink Hansen Rehfeld Partners

APPLICATION OF COLUMNSTORE

AND IN-MEMORY TECHNOLOGIES

IN BUSINESS INTELLIGENCE

PROJECTS

Mads Brink Hansen

Principal Consultant, Rehfeld Partners A/S

External Lecturer, Aarhus Universitet

[email protected] / [email protected]

Page 2: Mads Brink Hansen Rehfeld Partners

03-11-2014

What is Business Intelligence

Data → Information → Analytics → Knowledge → Wisdom

Page 3: Mads Brink Hansen Rehfeld Partners


The traditional Kimball Lifecycle

[Diagram: boxes of the Kimball Lifecycle]

Business Requirements Definition
BI App. Design
BI App. Development
Dimensional Model Design
Physical Design
Technical Architecture Design
Product Selection and Installation
ETL Design & Development
Technical Deployment
Program / Project Planning
Growth
Maintenance
Organizational Deployment

[Legend: Business Development, Technical Development, Other roles]

Page 4: Mads Brink Hansen Rehfeld Partners

BI Architecture

Page 5: Mads Brink Hansen Rehfeld Partners

So Why does BI Fail?

• [Too long] Time-to-Market
• [Too poor] Data Quality
• [Too poor] User Adoption
• Lack of goals
• Complex technical solutions

Page 6: Mads Brink Hansen Rehfeld Partners

What to do about it?

• Changed and New Processes
• Better training and/or practices
• New Technology

Page 7: Mads Brink Hansen Rehfeld Partners


Agile BI [still Kimball]

[Diagram: labels of the Agile BI process]

Customer Representatives
Business Requirements Definition
ETL Design & Development
Dimensional Model Design/Development
BI App. Design/Development
Business Requirements Definition
Program / Project Planning
Technical Architecture Design
Technical (Architecture) Deployment
Organizational Deployment
Sprint 0..a
Sprint 1..n
Back-log
User Stories
Architecture Model
Developer Stories
Customer Representatives
Bus Architecture Matrix

Page 8: Mads Brink Hansen Rehfeld Partners


Application of ColumnStore and

In-Memory technologies

In-Memory during development

In-Memory for Analysis

In-Memory for new opportunities

ColumnStore for Analysis

Page 9: Mads Brink Hansen Rehfeld Partners

In-Memory during development 1/3

• Data Audit in the Kimball Lifecycle
– Data Identification
– Data Profiling
• Descriptive statistics for all relevant columns
– Insight into data
• Verification of Business-logic
• Input for ETL-coding
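The data profiling step above can be illustrated with a small sketch. It is not taken from the presentation: the function names `profile_column` and `profile_table` and the sample rows are hypothetical, and the chosen statistics (null count, distinct count, most common value, min, max, mean) are just one plausible reading of "descriptive statistics for all relevant columns".

```python
from collections import Counter
from statistics import mean


def profile_column(values):
    """Return simple descriptive statistics for one column of raw values."""
    non_null = [v for v in values if v is not None]
    # Only real numbers take part in min/max/mean (bool is excluded on purpose).
    numeric = [v for v in non_null
               if isinstance(v, (int, float)) and not isinstance(v, bool)]
    counts = Counter(non_null)
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(counts),
        "most_common": counts.most_common(1)[0] if counts else None,
        "min": min(numeric) if numeric else None,
        "max": max(numeric) if numeric else None,
        "mean": mean(numeric) if numeric else None,
    }


def profile_table(rows):
    """Profile every column of a table given as a list of dicts."""
    columns = list(rows[0].keys()) if rows else []
    return {col: profile_column([row.get(col) for row in rows]) for col in columns}


# Hypothetical sample data, for illustration only.
sample_rows = [
    {"customer_id": 1, "country": "DK", "amount": 100.0},
    {"customer_id": 2, "country": "DK", "amount": None},
    {"customer_id": 3, "country": "SE", "amount": 250.0},
    {"customer_id": 3, "country": None, "amount": 50.0},
]
report = profile_table(sample_rows)
```

A report like this gives quick insight into data (for example a duplicated `customer_id` or missing `amount` values) that can be checked against the business logic before any ETL code is written.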

Page 10: Mads Brink Hansen Rehfeld Partners

In-Memory during development 2/3

• Traditional Approach
– Write and execute a Query for all relevant columns
→ Table Scan
→ Static Analysis
• Table scan in an Oracle database: 4-5 h per 200 million records
→ Data Profiling is very often not performed
→ Data Quality Risks

Page 11: Mads Brink Hansen Rehfeld Partners

In-Memory during development 3/3

• Using in-memory engine
– Off-load the relevant table[s] to the in-memory engine
→ Table scan on source Database
→ Dynamic Analysis [Pivot-table and Graphical]
• Off-load is performed during off-hours, analysis is done the next day
→ Better Data Quality
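The off-load pattern can be sketched as follows. This is only an illustration under stated assumptions: Python's built-in `sqlite3` module with a `:memory:` database stands in for the in-memory engine (the presentation does not name this tool), a second SQLite database plays the role of the source system, and the `offload_table` helper and the `orders` table are hypothetical.

```python
import sqlite3


def offload_table(source, table):
    """Copy one table from the source connection into a new in-memory database."""
    memory = sqlite3.connect(":memory:")
    # This SELECT is the single table scan on the source database.
    cursor = source.execute(f"SELECT * FROM {table}")
    columns = [description[0] for description in cursor.description]
    memory.execute(f"CREATE TABLE {table} ({', '.join(columns)})")
    placeholders = ", ".join("?" for _ in columns)
    memory.executemany(f"INSERT INTO {table} VALUES ({placeholders})", cursor)
    memory.commit()
    return memory


# Stand-in for the source system (assumption: any DB-API connection would do).
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (order_id INTEGER, country TEXT, amount REAL)")
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "DK", 100.0), (2, "DK", None), (3, "SE", 250.0)],
)

engine = offload_table(source, "orders")
source.close()  # the source is no longer needed for the analysis

# Any number of ad hoc questions can now be asked without touching the source.
by_country = engine.execute(
    "SELECT country, COUNT(*), SUM(amount) FROM orders "
    "GROUP BY country ORDER BY country"
).fetchall()
null_amounts = engine.execute(
    "SELECT COUNT(*) FROM orders WHERE amount IS NULL"
).fetchone()[0]
```

The point of the design is that the expensive scan happens once, during off-hours, while every follow-up question is answered from memory.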

Page 12: Mads Brink Hansen Rehfeld Partners

In-Memory for Analysis

• Performing Analytics on large volumes of data

[Diagram: options along an axis of increasing Data Volume]

• In-Memory Client: Data Volumes limited by Client-memory
• In-Memory Server: Data Volumes limited by Server-memory
• ColumnStore Server: Data Volumes can exceed Server-memory [e.g. SSD-memory-Extension]

10 mill. Records (10 GB)
After Load to Excel In-Memory: 2.9 GB RAM
Response time (Count Distinct) < 1 s

Page 13: Mads Brink Hansen Rehfeld Partners

In-Memory for Analysis

• Client or Server Analysis
• Server
+ Larger Data Volumes
- Fixed Data Model
• Client
- Data Volumes limited by local RAM
+ Flexible Data Model → Self Service BI

Page 14: Mads Brink Hansen Rehfeld Partners


Self Service BI

Personal BI

Team BI

Enterprise BI

Non-Enterprise BI-data

Page 15: Mads Brink Hansen Rehfeld Partners

ColumnStore for Analysis

• Analysis in relational databases
– Existing tools
– Special tools for relational analysis
+ Data Volumes can exceed RAM [and still perform using e.g. SSD-disks or SSD-based RAM-extension]
+ Flexible [within the Database]
- [Database] External data?

Page 16: Mads Brink Hansen Rehfeld Partners

In-Memory for new opportunities

• The traditional Dimensional Model is great – but has some limitations
– Requires PK-FK-relationships in the database
→ issues in special situations
• Inventory [Periodic Snapshot Facts]
• Market Basket [non-key relationship for filtering]

Page 17: Mads Brink Hansen Rehfeld Partners

Inventory 1/4

• Example – Stock-at-hand

Page 18: Mads Brink Hansen Rehfeld Partners

Inventory 2/4

• Star-schema data model
• Problem – a record must exist for all products [and other dimensions] at all dates

Page 19: Mads Brink Hansen Rehfeld Partners

Inventory 3/4

• The ETL-process or Query must "fill the gaps"
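What "filling the gaps" means for a periodic snapshot can be sketched like this. The example is an assumption-laden illustration, not the presenter's ETL code: the `fill_gaps` function, the tuple layout `(product, date, quantity)` and the rule that stock is zero before the first recorded level are all hypothetical choices.

```python
from datetime import date, timedelta


def fill_gaps(snapshots, start, end):
    """Densify sparse stock levels: one row per product per day in [start, end]."""
    known = {(product, day): qty for product, day, qty in snapshots}
    products = sorted({product for product, _, _ in snapshots})
    dense = []
    for product in products:
        level = 0  # assumption: no stock before the first recorded level
        day = start
        while day <= end:
            # Use the recorded level if one exists, else carry the last one forward.
            level = known.get((product, day), level)
            dense.append((product, day, level))
            day += timedelta(days=1)
    return dense


# Hypothetical sparse input: levels are only recorded on days where they change.
snapshots = [
    ("A", date(2014, 10, 1), 10),
    ("A", date(2014, 10, 3), 7),
    ("B", date(2014, 10, 2), 5),
]
dense = fill_gaps(snapshots, date(2014, 10, 1), date(2014, 10, 4))
```

Three recorded levels become eight rows for two products over four days, which shows how quickly such a fact table grows with the number of products and dates.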

Page 20: Mads Brink Hansen Rehfeld Partners

Inventory 4/4

• What does it take to make the calculation "on the fly"?
– The transaction table must be scanned and a running total calculated
– Scanning a table in a traditional database takes too much time to be performed "on the fly"
– The same scanning can be feasible in an In-Memory table

The In-Memory technology provides a leaner, more efficient and elegant solution
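The "on the fly" calculation can be sketched as a scan over a transaction table held in memory. This is a minimal illustration, assuming a plain Python list as the in-memory table; the names `stock_at_hand` and `running_totals`, the tuple layout `(product, date, quantity change)` and the sample transactions are hypothetical.

```python
from datetime import date
from itertools import accumulate


def stock_at_hand(transactions, product, as_of):
    """Scan the transaction table and sum all movements up to and including as_of."""
    return sum(qty for p, day, qty in transactions if p == product and day <= as_of)


def running_totals(transactions, product):
    """Return (date, stock level) after each movement of one product."""
    rows = sorted((day, qty) for p, day, qty in transactions if p == product)
    totals = accumulate(qty for _, qty in rows)
    return [(day, total) for (day, _), total in zip(rows, totals)]


# Hypothetical transaction table: positive = goods received, negative = goods issued.
transactions = [
    ("A", date(2014, 10, 1), 10),
    ("A", date(2014, 10, 3), -3),
    ("B", date(2014, 10, 2), 5),
    ("A", date(2014, 10, 4), -2),
]

level_oct_2 = stock_at_hand(transactions, "A", date(2014, 10, 2))
series_a = running_totals(transactions, "A")
```

No gap-filling rows are stored here: the stock level for any date, including dates without transactions, is derived from the movements at query time.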