
Data Warehousing
Lecture-24: Need for Speed: Parallelism

Virtual University of Pakistan

Ahsan Abdullah, Assoc. Prof. & Head
Center for Agro-Informatics Research, www.nu.edu.pk/cairindex.asp
National University of Computers & Emerging Sciences, Islamabad
Email: [email protected]


Background


When to parallelize?

• Useful for operations that access significant amounts of data.

• Useful for operations that can be implemented independently of each other ("Divide-&-Conquer"); see the sketch below.

Parallel execution improves processing for:
• Large table scans and joins
• Creation of large indexes
• Partitioned index scans
• Bulk inserts, updates, and deletes
• Aggregations and copying
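To make the "Divide-&-Conquer" idea concrete, here is a minimal Python sketch (not from the lecture) of a parallel aggregation: the table is split into partitions, each worker sums its own partition independently, and the partial results are combined at the end. The table contents, the four-way split and the helper name partial_sum are made-up assumptions for illustration.

# Divide-&-Conquer aggregation sketch: each worker aggregates one partition
# independently; the partial results are merged at the end.
from multiprocessing import Pool

def partial_sum(partition):
    # "Conquer" step: aggregate one partition in isolation.
    return sum(partition)

if __name__ == "__main__":
    table = list(range(1_000_000))   # stand-in for a large fact-table column
    n_parts = 4                      # assumed degree of parallelism
    chunk = len(table) // n_parts
    partitions = [table[i * chunk:(i + 1) * chunk] for i in range(n_parts)]
    partitions[-1].extend(table[n_parts * chunk:])   # leftover rows go to the last partition

    with Pool(n_parts) as pool:      # "Divide" step: one worker per partition
        partials = pool.map(partial_sum, partitions)

    total = sum(partials)            # combine step
    print(total == sum(table))       # True: same answer as a serial scan

The same divide-and-combine shape is what lets large table scans, bulk operations and aggregations parallelize well: each partition can be processed without coordinating with the others.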



Are you ready to parallelize?

Parallelism can be exploited if there is…

• Symmetric multi-processor (SMP), cluster, or Massively Parallel Processing (MPP) systems, AND

• Sufficient I/O bandwidth, AND

• Underutilized or intermittently used CPUs (for example, systems where CPU usage is typically less than 30%), AND

• Sufficient memory to support additional memory-intensive processes such as sorts, hashing, and I/O buffers.

Word of caution

Parallelism can reduce system performance on over-utilized systems or systems with small I/O bandwidth.


Scalability – Size is NOT everything

Scalability also depends on:

• Number of concurrent users
• Complexity of technique (e.g., simple table retrieval, moderate-complexity join, propensity analysis, clustering)
• Index usage (e.g., hash based, B-Tree, multiple, bitmapped)
• Amount of detailed data
• Complexity of the data model


Scalability: Speed-Up & Scale-Up

Speed-Up: more resources mean proportionally less time for a given amount of data.

Scale-Up: if resources are increased in proportion to the increase in data size, the time stays constant.

[Charts: Transactions/Sec versus Degree of Parallelism, and Secs/Transaction versus Degree of Parallelism, each showing an Ideal and a Real curve.]
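A minimal sketch (not from the lecture) of how the two metrics can be computed from measured timings; the timing numbers below are made-up assumptions. Speed-up compares a fixed workload on one versus many processors, while scale-up compares a workload that grows in step with the hardware.

def speed_up(t_serial, t_parallel):
    # Fixed data size, more processors. Ideal value = degree of parallelism.
    return t_serial / t_parallel

def scale_up(t_small_on_small, t_large_on_large):
    # Data size and processors grow together. Ideal value = 1.0
    # (N times the data on N times the hardware in the same time).
    return t_small_on_small / t_large_on_large

# Made-up measurements for illustration:
print(speed_up(t_serial=120.0, t_parallel=35.0))                  # ~3.4 on 4 processors (ideal 4.0)
print(scale_up(t_small_on_small=120.0, t_large_on_large=140.0))   # ~0.86 (ideal 1.0)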


Quantifying Speed-up

Speedup = Ts / Tm

where Ts is the time on a serial processor and Tm is the time on multiple processors.

Example: Task-1, Task-2 and Task-3 take 18 time units when executed sequentially, and 6 time units under ideal parallel execution (the parallel case also incurs control work, i.e. "overhead"), so

Speedup = 18 / 6 = 3, i.e. 300%.

[Diagram: the three tasks run one after another in the sequential case and side by side in the ideal parallel case.]


Speed-Up & Amdahl’s Law

Amdahl’s Law gives the maximum expected speedup of a parallel algorithm, given the proportion of the task that must be computed sequentially. It gives the speedup S as

S = 1 / (f + (1 − f) / N)

where f is the fraction of the problem that must be computed sequentially and N is the number of processors.

As f approaches 0, S approaches N, but the ratio is not 1:1.

Example-1: f = 5% and N = 100 gives S ≈ 16.8

Example-2: f = 10% and N = 200 gives S ≈ 9.57
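The same formula in a minimal Python sketch (not from the lecture); it reproduces the two examples above and the 128-processor case discussed on the next slide.

def amdahl_speedup(f, n):
    # f: fraction of the work that must run sequentially; n: number of processors.
    return 1.0 / (f + (1.0 - f) / n)

print(round(amdahl_speedup(0.05, 100), 1))   # 16.8  (Example-1)
print(round(amdahl_speedup(0.10, 200), 2))   # 9.57  (Example-2)
print(round(amdahl_speedup(0.10, 128), 2))   # 9.34  (90% parallelism on 128 processors)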



Amdahl’s Law: Limits of parallelization

Below 80% parallelism, the speedup drops drastically.

At 90% parallelism, 128 processors give a speedup of less than 10, i.e. less than the ideal performance of just 10 processors.

[Chart: Speedup (S) versus fraction of sequential code (f), for N = 2, 4, 8, 16, 32, 64 and 128 processors; f ranges from 0.1 to 1 and S from 1 to 10.]


Parallelization: OLTP vs. DSS

There is a big difference.

DSS: parallelization of a SINGLE query.

OLTP: parallelization of MULTIPLE queries, or batch updates in parallel.


Brief Intro to Parallel Processing

Parallel Hardware Architectures
• Symmetric Multi Processing (SMP)
• Distributed Memory or Massively Parallel Processing (MPP)
• Non-uniform Memory Access (NUMA)

Parallel Software Architectures
• Shared Memory (Shared Everything)
• Shared Disk
• Shared Nothing

Types of Parallelism (both sketched below)
• Data Parallelism
• Spatial Parallelism
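As a rough illustration (not from the lecture) of the two types: in data parallelism the same operator runs on different partitions of the data at the same time, while in spatial (pipelined) parallelism different operators of the same query run concurrently, each stage feeding the next. Treating spatial parallelism as pipelining is an assumption here, and the operators and data below are made up.

from concurrent.futures import ThreadPoolExecutor
from queue import Queue
from threading import Thread

rows = list(range(12))

# Data parallelism: the SAME operator (a filter) runs on each partition concurrently.
def filter_even(partition):
    return [r for r in partition if r % 2 == 0]

parts = [rows[0:4], rows[4:8], rows[8:12]]
with ThreadPoolExecutor() as pool:
    filtered = [r for part in pool.map(filter_even, parts) for r in part]

# Spatial (pipelined) parallelism: DIFFERENT operators run concurrently,
# one stage feeding the next through a queue (scan -> filter + aggregate).
q = Queue()

def scan_stage():
    for r in rows:
        q.put(r)
    q.put(None)                      # end-of-stream marker

def aggregate_stage(result):
    total = 0
    while (r := q.get()) is not None:
        if r % 2 == 0:               # filter
            total += r               # aggregate
    result.append(total)

result = []
producer = Thread(target=scan_stage)
consumer = Thread(target=aggregate_stage, args=(result,))
producer.start(); consumer.start()
producer.join(); consumer.join()

print(filtered, result[0])           # both paths see the same even rows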


Symmetrical Multi Processing (SMP)

A number of independent I/O units and a number of processors, all sharing access to a single large memory space.

[Diagram: processors P1–P4 and several I/O units all connected to a single Main Memory.]

• Typically each CPU executes its job independently.
• Supports both multi-tasking and parallel processing.
• Have to deal with issues such as cache coherence, processor affinity and hot spots.



Distributed Memory Machines

• Composed of a number of self-contained, self-controlled nodes connected through a network interface.
• Each node contains its own CPU/processor, memory and I/O.
• Architecture better known as Massively Parallel Processing (MPP) or cluster computing.
• Memory is distributed across all nodes.
• The network has a tendency to become the bottleneck.
• Issues are fundamentally different from those in SMP.

[Diagram: several nodes, each with its own processor, memory and I/O, connected through a bus, switch or network.]


Distributed Shared Memory Machines

A little bit of both worlds!

[Diagram: several SMP nodes, each with processors P1–P4, I/O units and its own Main Memory, connected through an Interconnection Network.]


Shared Disk RDBMS Architecture

[Diagram: Clients/Users connect to multiple database nodes that access a shared disk over an interconnect.]

Advantages:
• High level of fault tolerance

Disadvantages:
• Serialization due to locking
• The interconnect can become a bottleneck


Shared Nothing RDBMS Architecture

[Diagram: Clients/Users connect to multiple database nodes, each owning its own disks and data partition.]

Advantages:
• Data ownership changes infrequently
• There is no locking

Disadvantages:
• Data availability is low on failure
• Data distribution must be planned very carefully
• Redistribution is expensive
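A minimal sketch (not from the lecture) of why data distribution is so important in a shared-nothing design: each row is assigned to a node by hashing its key, so every node exclusively owns a slice of the data, and changing the number of nodes forces most rows to move, which is the expensive redistribution noted above. The key column and node counts are made-up assumptions.

def node_for(key, n_nodes):
    # Hash partitioning: the node that owns a row is derived from its key.
    return hash(key) % n_nodes

keys = [f"customer-{i}" for i in range(10_000)]

before = {k: node_for(k, 4) for k in keys}   # 4-node cluster
after  = {k: node_for(k, 5) for k in keys}   # a 5th node is added

moved = sum(1 for k in keys if before[k] != after[k])
print(f"{moved / len(keys):.0%} of rows change owner")   # roughly 80% with simple modulo hashing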


Shared Disk vs. Shared Nothing RDBMS

Important note: do not confuse RDBMS architecture with hardware architecture.

• Shared nothing databases can run on shared everything (SMP or NUMA) hardware.
• Shared disk databases can run on shared nothing (MPP) hardware.
