36
City of Charlotte Data Warehousing and Business Intelligence and Building Mashups By Example by Rattapoom Tuchinda, Pedro Szekely, and Craig A. Knoblock by Doris Phillips 6010 Data Integration UNC Charlotte

City of Charlotte Data Warehousing and Business Intelligence and Building Mashups By Example by Rattapoom Tuchinda, Pedro Szekely, and Craig A. Knoblock

Embed Size (px)

Citation preview

City of Charlotte Data Warehousing and Business Intelligence

and Building Mashups By Example by

Rattapoom Tuchinda, Pedro Szekely, and Craig A. Knoblock

by Doris Phillips6010 Data Integration

UNC Charlotte

Business Decision Making

ORG’sORG’sWhich?

(Raper, J., Building a Data Warehouse in a Heterogeneous Tool Environment)

Overview

• Describe current data warehousing/business intelligence methods at the City of Charlotte

• Present Building Mashups By Example byRattapoom Tuchinda, Pedro Szekely, and Craig A. Knoblock

• Compare data warehousing techniques to data integration methods for mashups

3

Focus on Processes Common to Data Warehousing and Data Integration

• Data Retrieval • Data Modeling• Data Cleaning• Data Integration

Common processes but not the same!

4

Data Warehousing at the City of Charlotte

5

DISCLAIMER

The views and opinions presented in this paper are solely those of the Author and do not necessarily reflect those of Business Support Services Information Technology Division or of the City of Charlotte. This material is provided for informational purposes only. City of Charlotte assumes no responsibility for accuracy of the information in this paper or from damages caused by implemented the techniques or methodologies presented herein.

City of CharlotteSimplified KBU Organization Chart

City Manager

Mayorand

City Council

Airport

Police

Fire

Roads, Streets,Bridges,

Intersections

Bus and Rail

Utilities

NeighborhoodDevelopment

Engineering andProperty

ManagementSolid Waste

HumanResources

Offices andCommissions

Business SupportServices

Budget

Finance

(Raper, J., Building a Data Warehouse in a Heterogeneous Tool Environment)

Sample Data Sources

• Accounts Payable• Accounts Receivable• Asset Center• Emerald• Faster• General Ledger

• Kronos• Hansen• PeopleSoft• Remedy• Unisys Helpdesk• Utility Billing System

HUB and Spoke DesignDB2

DB

MS/ SSDB

ORA

DB

FILES

UTLORA

MART

SWSMS/SSMART

FINMS/SSMART

OWB

OWB

OWB

OWB

OWB

DTS

DTS

SOURCES TARGET DATA MARTS

DVODS

DATA VAULT

Oracle

ETL

OWB

(Raper, J., Building a Data Warehouse in a Heterogeneous Tool Environment)

10

Data Vault Components• Hubs - Key Topics• Links – Relationships• Satellites – Details• Auxiliaries – Support• Implemented as

Relational Tables• Diagramed Using

Distinctive Shapes

HUB_XXXX

SAT_XXXX_YYYYYY

AUX_XXXX

Hub Record Shape

Auxiliary Record Shape

Satellite Record Shape

Link Record ShapeLNK_XXXX

(Raper, J., Phillips, D., User-Managed Metadata: Oracle Application Express Meets Oracle Warehouse Builder 10.2)

11

Hub Tables• Hub records contain the logical business and

physical keys to the business data and its context.

• Relatively Stable over Time

• Keys– Primary Key – Surrogate ID– Natural Key – Unique Logical Key Fields

(Raper, J., Phillips, D., User-Managed Metadata: Oracle Application Express Meets Oracle Warehouse Builder 10.2)

12

Link Tables• Link records provide information related to relationships

between Hubs and Links. • The information in link records can and does change over

time. • May have one or more Satellite Type records• Keys

– Primary Key – Surrogate ID– Natural Key – Unique Logical Key Fields– Foreign Key – HUB or LNK SIDs

(Raper, J., Phillips, D., User-Managed Metadata: Oracle Application Express Meets Oracle Warehouse Builder 10.2)

13

Satellite Tables• Satellite records provide the structure to hold the

context or descriptive type information from operational systems.

• Maintain these changes over time. • Related Directly to Hub or Link Records• Keys

– Primary Key – Surrogate ID– Foreign Key – HUB or LNK SIDs

(Raper, J., Phillips, D., User-Managed Metadata: Oracle Application Express Meets Oracle Warehouse Builder 10.2)

14

Auxiliary Tables• Auxiliary records contain a variety of cross reference and lookup

descriptions tied to logical business keys. • Standalone Support Tables Not Directly Linked to HUB, SAT, or LNK

tables.• Types of Auxiliaries

– Lookups– Cross References– External Tables– ETL Work Tables

• Keys– Primary Key is NK – Unique Logical Key Fields– May have FK

(Raper, J., Phillips, D., User-Managed Metadata: Oracle Application Express Meets Oracle Warehouse Builder 10.2)

15

Generic Data Vault Schema

HUB

SAT

AUX

HUB

SAT

SAT

SAT

SATSAT

SAT

AUXAUX

LNK

(Raper, J., Phillips, D., User-Managed Metadata: Oracle Application Express Meets Oracle Warehouse Builder 10.2)

16

Oracle Warehouse Builder MappingNew Code Capture

(Raper, J., Phillips, D., User-Managed Metadata…)

17

Oracle Warehouse Builder Mapping New Code Capture Detail

(Raper, J., Phillips, D., User-Managed Metadata…)

Data Cleaning

• Convert to UPPER case• Trim blank spaces• Replace NULL within UNK or UNKNOWN• Check for valid values using lookups• Remove duplicates

18

19

Metadata Capture - Cross Reference Organization Codes APEX Application

(Raper, J., Phillips, D., User-Managed Metadata…)

Data Warehouse Approach

• Extracted data from multiple heterogeneous sources

• Converted to Data Vault Architecture • Cleaned data and transformed into desired

format• Combined data from multiple sources• Data provided to users for reporting and data

visualizations

20

Building Mashups By Example

byRattapoom Tuchinda, Pedro Szekely,

and Craig A. Knoblock

21

Overview

• Mashup: A web application that integrates data from multiple web sources to provide a unique service

• Goal: Create a mashup building framework where an average Internet user with no programming experience can build Mashups easily

22

Current Solutions

• Widget Paradigm– Current Solutions involve selecting, customizing,

and connecting widgets together

• Disadvantages– As number of widgets gets large, locating the right

widget becomes confusing and time consuming – Connecting widgets required understanding

programming concepts

23

Widgets – Yahoo Pipes

24(Tuchinda, et. al, p. 140)

Microsoft PopflyUnited States Information Widget - EDIT

25

United States Information Widget - RUN

Mashup Building Process• Data Retrieval

– Extracting data from web pages into a structured source (table or XML)

• Source Modeling – Process of assigning the attribute name for each data

column• Data Cleaning

– Required to fix misspellings and transform extracted data into the appropriate format

• Data Integration– Specifies how to combine two or more data sources

together

26

Karma Solution

27

“The left window is an embedded web browser. The top right window contains a table that a user would interact with. The lower right window shows options that the user can select to get into different modes of operation.” (Tuchinda, et. al, p. 140)

Data Retrieval – Karma TableUser Selects Value from Page, List Automatically Populated

User Selects Address for Value, List Automatically Populated

28(Tuchinda, et. al, p. 140-141)

Source Modeling - Attributes

Karma Attributes• Karma compares extracted

data with existing data in its repository

• Automatically populates some attributes

• User specifying the correct attribute

• Users search existing attributes in data repository

Example

29

Data Cleaning

• Users specify what data to clean

• Karma tries to pick up desired transformations and populate the remaining columns

30

Data Integration

Karma• Analyzes attributes and

data to determine possible join conditions

• Suggests existing data sources in the repository that can be linked to the new data in the table

31

Data Integration

Karma Problems• Locating the related

sources from the repository

• Figuring out the query to combine the new source and existing valid sources

32

Data Integration

Karma Solution• Uses table constraints • Uses programming

methods and procedures induced from user interaction

33

Mashup Building Approach

• Combines most problem areas in Mashup building into a unified interactive framework that requires no widgets

• Allows users with no programming background to easily create Mashups by example

34

Conclusions

Data Warehouse• Data from enterprise

applications and maintained sources

• Historical data for trending and analysis

• Extract, Transform, and Load

• Often Strategic information

Data Integration / Mashup• Data from web that may

not be maintained or may contain errors

• Real time data for current information

• Extract, Model, and Clean• Often tactical information

35

Resources• Linstedt, D. http://www.danlinstedt.com/AboutDV.php. Nov.

9, 2008.• Linstedt, D., Garziano, K., Hultgren, H., May 2008. The New

Business Supermodel: The Business of Data Vault Modeling.• Raper, Jim., Dec. 2004. Building a Data Warehouse in a

Heterogeneous Tool Environment. • Raper, Jim., Dec. 2004. From Source to Loading Dock with

Oracle Warehouse Builder• Raper, Jim., Phillips, Doris., 2006. User-Managed Metadata:

Oracle Application Express Meets Oracle Warehouse Builder 10.2

• Tuchinda, R., Szekely, Pl., Knoblock, C., 2008. ACM., Building Mashups by Example.

36