Data Mining

Assoc. Prof. Dr. Raed Ibraheem Hamed

University of Human Development, College of Science and Technology

Department of CS

2016 – 2017



Points to Cover

Problem: Heterogeneous Information Sources

Data integration

Extract, Transform, Load (ETL)

Data Warehouse With The ETL Process

Problem: Data Management in Large Enterprises

Goals of Data Integration

Current Solutions

Three-layer Architecture: Conceptual View

Generic Warehouse Architecture


Problem: Heterogeneous Information Sources

“Heterogeneities are everywhere”

Different interfaces

Different data representations

Duplicate and inconsistent information

Personal Databases

Digital Libraries

Scientific Databases

World Wide Web


Data integration

• Data integration involves combining data residing in different sources and providing users with a unified view of these data.

• This process becomes significant in a variety of situations, both commercial (when two similar companies need to merge their databases) and scientific (when research results from different sources need to be combined).
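The commercial case above, two companies merging their databases, can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the two record layouts, the field names, and the choice of the email address as the matching key are hypothetical and do not come from the slides.

```python
# Hypothetical customer records from two companies that are merging.
company_a = [
    {"cust_id": 1, "full_name": "Ada Lovelace", "email": "ADA@example.com"},
    {"cust_id": 2, "full_name": "Alan Turing", "email": "alan@example.com"},
]
company_b = [
    {"id": "B-7", "name": "Turing, Alan", "mail": "alan@example.com"},
    {"id": "B-8", "name": "Hopper, Grace", "mail": "grace@example.com"},
]


def from_a(row):
    """Map a company A row onto the shared (unified) schema."""
    return {
        "name": row["full_name"],
        "email": row["email"].lower(),
        "source_ids": [f"A-{row['cust_id']}"],
    }


def from_b(row):
    """Map a company B row; B stores names as 'Last, First'."""
    last, first = [part.strip() for part in row["name"].split(",")]
    return {
        "name": f"{first} {last}",
        "email": row["mail"].lower(),
        "source_ids": [row["id"]],
    }


def unified_view(rows_a, rows_b):
    """Combine both sources, merging rows that share an email address."""
    merged = {}
    for record in [from_a(r) for r in rows_a] + [from_b(r) for r in rows_b]:
        if record["email"] in merged:
            merged[record["email"]]["source_ids"] += record["source_ids"]
        else:
            merged[record["email"]] = record
    return sorted(merged.values(), key=lambda r: r["name"])


customers = unified_view(company_a, company_b)
for customer in customers:
    print(customer)
```

The customer who appears in both databases ends up as a single record that remembers both source identifiers, which is one simple way to handle duplicate information across sources.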


Goal: Unified Access to Data

Integration System

Collects and combines information

Provides integrated view, uniform user interface

Supports sharing among different applications

World Wide Web

Digital Libraries

Scientific Databases

Personal Databases


Goals of Data Integration

Provide

1) Uniform (same query interface to all sources)

2) Access to (updates the database)

3) Multiple (we want many users at each time)

4) Heterogeneous (data models are different)

5) Distributed (over LAN, WAN, Internet)

6) Data Sources (not only databases).
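As a minimal sketch of the first goal (the same query interface to all sources), the example below wraps two heterogeneous sources behind one `find` method. The class names, the `find` method, and the sample data are hypothetical; one source is a relational table in an in-memory SQLite database and the other is CSV text, to reflect that sources are "not only databases".

```python
import csv
import io
import sqlite3


class SqlSource:
    """Wraps a relational source (here an in-memory SQLite database)."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE books (title TEXT, year INTEGER)")
        self.conn.execute("INSERT INTO books VALUES ('Data Mining Basics', 2011)")

    def find(self, keyword):
        rows = self.conn.execute(
            "SELECT title, year FROM books WHERE title LIKE ?", (f"%{keyword}%",)
        )
        return [{"title": title, "year": year} for title, year in rows]


class CsvSource:
    """Wraps a non-database source (here CSV text held in memory)."""

    def __init__(self):
        self.text = "Title;Published\nWarehouse Design;2009\nMining the Web;2014\n"

    def find(self, keyword):
        reader = csv.DictReader(io.StringIO(self.text), delimiter=";")
        return [
            {"title": row["Title"], "year": int(row["Published"])}
            for row in reader
            if keyword.lower() in row["Title"].lower()
        ]


def find_everywhere(sources, keyword):
    """Send one query to every source through the same interface."""
    results = []
    for source in sources:
        results.extend(source.find(keyword))
    return results


hits = find_everywhere([SqlSource(), CsvSource()], "mining")
print(hits)
```

The caller never sees SQL or CSV details: each wrapper translates the common query into whatever its source understands and returns results in one shared shape.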


Extract, Transform, Load (ETL)

Extract, Transform, Load (ETL) refers to a three-step process in database usage, especially in data warehousing:

• Data extraction, where data is extracted from heterogeneous data sources.

• Data transformation, where the data is transformed into the proper format or structure for storage.

• Data loading, where the data is loaded into the final target: an operational data store, data mart, or data warehouse.
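The three ETL steps can be illustrated with a small, self-contained Python sketch. The source data, the table name `sales`, and the target format (ISO dates, amounts in cents) are assumptions chosen for the example; an in-memory SQLite database stands in for the data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source data: two heterogeneous sources describing sales.
CSV_SOURCE = "date,amount_usd\n2017-01-05,19.90\n2017-01-06,5.00\n"
DICT_SOURCE = [{"day": "07/01/2017", "cents": 1250}]  # day/month/year, cents


def extract():
    """Extract: read raw rows from each source without changing them."""
    csv_rows = list(csv.DictReader(io.StringIO(CSV_SOURCE)))
    return csv_rows, DICT_SOURCE


def transform(csv_rows, dict_rows):
    """Transform: bring every row into one format (ISO date, cents)."""
    rows = [(r["date"], round(float(r["amount_usd"]) * 100)) for r in csv_rows]
    for r in dict_rows:
        day, month, year = r["day"].split("/")
        rows.append((f"{year}-{month}-{day}", r["cents"]))
    return rows


def load(rows, conn):
    """Load: write the transformed rows into the target store."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (sale_date TEXT, cents INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()


warehouse = sqlite3.connect(":memory:")
load(transform(*extract()), warehouse)

total = warehouse.execute("SELECT SUM(cents) FROM sales").fetchone()[0]
dates = [row[0] for row in warehouse.execute("SELECT sale_date FROM sales ORDER BY sale_date")]
print(total, dates)
```

Keeping the three steps as separate functions mirrors the slide: a new source only needs its own extraction and transformation logic, while the loading step stays unchanged.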


Data Warehouse With The ETL Process

A simple schematic for a data warehouse: the ETL process extracts information from the source databases, transforms it, and then loads it into the data warehouse.


Data Warehouse

Extract all the data into a single data source

[Figure: data from the Data Sources is cleaned and loaded periodically into the Data Warehouse, which answers queries]


Data Warehouse Architectures: Conceptual View

Single-layer

Every data element is stored once only

Virtual warehouse: another term for a single-layer data warehouse

[Figure: operational systems and informational systems both use one layer of “real-time data”]


Data Warehouse Architectures: Conceptual View

Two-layer

Real-time + derived data

Most commonly used approach in industry today

[Figure: operational systems use the real-time data; informational systems use the derived data]


Three-layer Architecture: Conceptual View

The transformation of real-time data into derived data requires a reconciliation step.

[Figure: labels shown are Operational systems, Real-time data, Reconciliation, Derived Data, and Informational systems]

Reconciliation is the process of ensuring that two sets of records are in agreement.

Example: reconciliation is used to ensure that the money leaving an account matches the actual money spent.
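The account example can be sketched as a comparison of two record sets. The data below is hypothetical, and comparing whole (date, amount) entries with `collections.Counter` is only one simple way to check agreement; it is not presented as the method used in the slides.

```python
from collections import Counter

# Hypothetical data: amounts (in cents) leaving an account according to the
# bank, and amounts the account holder recorded as actually spent.
bank_statement = [("2017-03-01", 1200), ("2017-03-02", 450), ("2017-03-05", 9900)]
expense_ledger = [("2017-03-01", 1200), ("2017-03-02", 540), ("2017-03-05", 9900)]


def reconcile(records_a, records_b):
    """Return the entries that appear in one record set but not the other."""
    count_a, count_b = Counter(records_a), Counter(records_b)
    only_a = sorted((count_a - count_b).elements())
    only_b = sorted((count_b - count_a).elements())
    return only_a, only_b


only_in_bank, only_in_ledger = reconcile(bank_statement, expense_ledger)
in_agreement = not only_in_bank and not only_in_ledger
print(in_agreement, only_in_bank, only_in_ledger)
```

Here the two record sets disagree on the entry for 2017-03-02, so the mismatch would have to be resolved before the data is passed on as derived data. Using a `Counter` rather than a set keeps repeated identical entries from being silently collapsed.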


Issues in Data Warehousing

Warehouse Design

Extraction

Integration

Cleansing & merging

Warehousing specification & Maintenance

Optimizations

Evolution


Generic Warehouse Architecture

[Figure: components shown are Client(s), Query & Analysis, Warehouse, Metadata, Integration, and several Extractor/Monitor modules; the phases labelled are Design Phase, Loading, and Maintenance]


Problems with DW Approach

Data has to be cleaned: sources use different formats

Needs to store all the data from different data sources in a single system

Data needs to be updated periodically

Data sources are independent: content can change without notice

Expensive because of the large quantities of data and data cleaning costs
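Because sources can change without notice, a warehouse needs some way to decide when a reload is due. The sketch below shows one possible approach, assumed here for illustration only: compare a hash of the current source content with the hash taken at the last load. The function names and sample rows are hypothetical.

```python
import hashlib
import json


def fingerprint(rows):
    """Hash a source's rows so that two snapshots can be compared cheaply."""
    canonical = json.dumps(sorted(rows))  # sorting makes row order irrelevant
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def needs_reload(source_rows, last_fingerprint):
    """Return (changed?, current fingerprint) for a source snapshot."""
    current = fingerprint(source_rows)
    return current != last_fingerprint, current


# Hypothetical source content at the time of the last warehouse load.
_, loaded_fp = needs_reload([("p1", 10), ("p2", 25)], None)

# Same rows in a different order: no reload needed.
reordered_changed, _ = needs_reload([("p2", 25), ("p1", 10)], loaded_fp)

# One value changed at the source: the warehouse copy is now stale.
edited_changed, _ = needs_reload([("p1", 10), ("p2", 30)], loaded_fp)

print(reordered_changed, edited_changed)
```

A check like this only says that something changed, not what changed, so it reduces unnecessary reloads but does not remove the cost of cleaning and reloading the data.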
