Copyright © 2012, Elsevier Inc. All rights reserved.
Distributed and Cloud Computing
Enabling Technologies and Distributed System Models
• Centralized computing
• Parallel computing
• Distributed computing
• Concurrent computing/programming
Centralized computing
• Computer resources (processors, memory, and storage) are fully shared and tightly coupled within one integrated OS.
• All computer resources are centralized in one physical system.
• Scalability and cost issues arise under heavy loads of user requests at global scale.
Parallel computing
• All processors are either tightly coupled with centralized shared memory or loosely coupled with distributed memory.
• Also called parallel processing.
• Interprocessor communication is accomplished through shared memory or via message passing.
• A computer system capable of parallel computing is commonly known as a parallel computer.
• Programs running on a parallel computer are called parallel programs.
Distributed computing
• A field of computer science/engineering that studies distributed systems.
• A distributed system consists of multiple autonomous computers, each having its own private memory, communicating through a computer network.
• A distributed system is a collection of independent entities that cooperate to solve a problem that cannot be solved individually.
• A collection of independent computers that appears to the users of the system as a single coherent computer.
Concurrent computing/programming
• Refers to the union of parallel computing and distributed computing, though biased practitioners may interpret the terms differently.
• Ubiquitous computing refers to computing with pervasive devices at any place and time using wired or wireless communication.
• The Internet of Things (IoT) is a networked connection of everyday objects including computers, sensors, humans, and more.
• The IoT is supported by Internet clouds to achieve ubiquitous computing with any object at any place and time.
• The term Internet computing is even broader and covers all computing paradigms over the Internet.
Data Deluge Enabling New Challenges
(Courtesy of Judy Qiu, Indiana University, 2011)
From Desktop/HPC/Grids to Internet Clouds in 30 Years
• HPC has moved from centralized supercomputers to geographically distributed desktops, clusters, and grids, and on to clouds, over the last 30 years.
• R&D efforts on HPC, clusters, grids, P2P, and virtual machines have laid the foundation of cloud computing, which has been greatly advocated since 2007.
• Computing infrastructure is located in areas with lower costs in hardware, software, datasets, space, and power – a move from desktop computing to datacenter-based clouds.
Interactions among 4 technical challenges: Data Deluge, Cloud Technology, eScience, and Multicore/Parallel Computing
(Courtesy of Judy Qiu, Indiana University, 2011)
Data Deluge
Refers to the situation where the sheer volume of new data being generated overwhelms the capacity of institutions to manage it and of researchers to make use of it.
eScience
E-Science or eScience is computationally intensive science that is carried out in highly distributed network environments, or science that uses immense data sets that require grid computing; the term sometimes includes technologies that enable distributed collaboration, such as the Access Grid.
Video
Clouds and Internet of Things
HPC: High-Performance Computing
HTC: High-Throughput Computing
P2P: Peer to Peer
MPP: Massively Parallel Processors
Source: K. Hwang, G. Fox, and J. Dongarra, Distributed and Cloud Computing, Morgan Kaufmann, 2012.
Computing Paradigm Distinctions
Centralized Computing
• All computer resources are centralized in one physical system.
Parallel Computing
• All processors are either tightly coupled with centralized shared memory or loosely coupled with distributed memory.
Distributed Computing
• A field of CS/CE that studies distributed systems. A distributed system consists of multiple autonomous computers, each with its own private memory, communicating over a network.
Cloud Computing
• An Internet cloud of resources that may be either centralized or decentralized. The cloud applies to parallel or distributed computing, or both. Clouds may be built from physical or virtualized resources.
Technology Convergence toward HPC for Science and HTC for Business: Utility Computing
(Courtesy of Raj Buyya, University of Melbourne, 2011)
2011 Gartner “IT Hype Cycle” for Emerging Technologies
(Figure compares the hype cycles from 2007 through 2011.)
33-Year Improvement in Processor and Network Technologies
Technologies for Network-based Systems
Multi-threading Processors
Four-issue superscalar (e.g., Sun UltraSPARC I)
• Implements instruction-level parallelism (ILP) within a single processor.
• Executes more than one instruction during a clock cycle by sending multiple instructions to redundant functional units.
Fine-grain multithreaded processor
• Switches threads after each cycle.
• Interleaves instruction execution.
• If one thread stalls, others are executed.
Coarse-grain multithreaded processor
• Executes a single thread until it reaches certain situations.
Simultaneous multithreaded processor (SMT)
• Instructions from more than one thread can execute in any given pipeline stage at a time.
5 Micro-architectures of CPUs
Each row represents the issue slots for a single execution cycle:
• A filled box indicates that the processor found an instruction to execute in that issue slot on that cycle.
• An empty box denotes an unused slot.
33-Year Improvement in Memory and Disk Technologies
Architecture of a Many-Core Multiprocessor GPU Interacting with a CPU Processor
Interconnection Networks
• SAN (storage area network) – connects servers with disk arrays
• LAN (local area network) – connects clients, hosts, and servers
• NAS (network attached storage) – connects clients with large storage systems
Datacenter and Server Cost Distribution
Virtual Machines
• Eliminate real-machine constraints, increasing portability and flexibility.
• A virtual machine adds software to a physical machine to give it the appearance of a different platform or of multiple platforms.
Benefits
• Cross-platform compatibility
• Increased security
• Enhanced performance
• Simplified software migration
Initial Hardware Model
All applications access hardware resources (i.e., memory, I/O) through system calls to the operating system (privileged instructions).
Advantages
• Design is decoupled (i.e., OS developers can work separately from the hardware developers).
• Hardware and software can be upgraded without notifying the application programs.
Disadvantages
• An application compiled on one ISA will not run on another ISA.
• Applications compiled for Mac use different operating system calls than applications designed for Windows.
• ISAs must support old software.
• Can often be inhibiting in terms of performance: since software is developed separately from hardware, software is not necessarily optimized for the hardware.
Virtual Machine Basics
• Virtualization software is placed between the underlying machine and conventional software; conventional software sees a different ISA from the one supported by the hardware.
• The virtualization process involves:
  – Mapping virtual resources (registers and memory) to real hardware resources.
  – Using real machine instructions to carry out the actions specified by the virtual machine instructions.
Three VM Architectures
System Models for Distributed and Cloud Computing
A Typical Cluster Architecture
Computational or Data Grid
A Typical Computational Grid
Peer-to-Peer (P2P) Network
• A distributed system architecture: each computer in the network can act as a client or server for other network computers.
• No centralized control.
• Typically many nodes, but unreliable and heterogeneous.
• Nodes are symmetric in function.
• Takes advantage of distributed, shared resources (bandwidth, CPU, storage) on peer nodes.
• Fault-tolerant and self-organizing.
• Operates in a dynamic environment where frequent joins and leaves are the norm.
Peer-to-Peer (P2P) Network
Overlay network – a computer network built on top of another network.
• Nodes in the overlay can be thought of as being connected by virtual or logical links, each of which corresponds to a path, perhaps through many physical links, in the underlying network.
• For example, distributed systems such as cloud computing, peer-to-peer networks, and client-server applications are overlay networks because their nodes run on top of the Internet.
The Cloud
• Historical roots in today’s Internet apps: search, email, social networks; file storage (Live Mesh, MobileMe, Flickr, …)
• A cloud infrastructure provides a framework to manage scalable, reliable, on-demand access to applications.
• A cloud is the “invisible” backend to many of our mobile applications.
• A model of computation and data storage based on “pay as you go” access to “unlimited” remote data center capabilities.
Basic Concept of Internet Clouds
• Cloud computing is the use of computing resources (hardware and software) that are delivered as a service over a network (typically the Internet).
• The name comes from the use of a cloud-shaped symbol as an abstraction for the complex infrastructure it contains in system diagrams.
• Cloud computing entrusts remote services with a user’s data, software, and computation.
Service-oriented architecture (SOA)
• SOA is an evolution of distributed computing based on the request/reply design paradigm for synchronous and asynchronous applications.
• An application’s business logic or individual functions are modularized and presented as services for consumer/client applications.
• Key to these services is their loosely coupled nature; i.e., the service interface is independent of the implementation.
• Application developers or system integrators can build applications by composing one or more services without knowing the services’ underlying implementations. For example, a service can be implemented either in .NET or J2EE, and the application consuming the service can be on a different platform or language.
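To ground the loose-coupling point, here is a minimal sketch of a service using only the Python standard library (the endpoint behavior, port 8000, and the JSON field names a and b are hypothetical choices for illustration, not anything from the slides). The consumer depends only on the HTTP + JSON interface and never learns whether the implementation behind it is Python, .NET, or J2EE.

```python
# A toy "service" exposing one function over HTTP + JSON, stdlib only.
# The interface (POST a JSON body, get JSON back) is all a consumer
# ever depends on; the implementation behind it can change freely.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class AddService(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers["Content-Length"])
        request = json.loads(self.rfile.read(length))
        reply = json.dumps({"sum": request["a"] + request["b"]}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(reply)

if __name__ == "__main__":
    HTTPServer(("localhost", 8000), AddService).serve_forever()
```

Any client on any platform can consume it, e.g. curl -s -X POST localhost:8000 -d '{"a": 2, "b": 3}' returns {"sum": 5}.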
Cloud Computing Challenges: Dealing with too many issues (Courtesy of R. Buyya)
• Virtualization
• QoS and Service Level Agreements
• Resource Metering, Billing, and Pricing
• Provisioning on Demand
• Utility & Risk Management
• Scalability
• Reliability
• Energy Efficiency
• Security, Privacy, and Trust
• Legal & Regulatory
• Software Eng. Complexity
• Programming Env. & Application Dev.
The Internet of Things (IoT)
(Figure: the Internet and Internet clouds supporting the Internet of Things, leading toward Smart Earth – an IBM dream.)
Video
Opportunities of IoT in 3 Dimensions
(courtesy of Wikipedia, 2010)
Transparent Cloud Computing Environment
Separates user data, application, OS, and space – good for cloud computing.
Parallel and Distributed Programming
Message-Passing Interface (MPI)
• The primary programming standard used to develop parallel and concurrent programs that run on a distributed system.
• MPI is essentially a library of subprograms that can be called from C or FORTRAN to write parallel programs running on a distributed system.
• The idea is to embody clusters, grid systems, and P2P systems with upgraded web services and utility computing applications.
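As a concrete sketch of message passing, the fragment below uses the mpi4py Python binding (an assumption for brevity; the slide names C and FORTRAN, where the corresponding calls are MPI_Comm_rank, MPI_Comm_size, and MPI_Reduce). Each process sums its own slice of 0..99 and rank 0 collects the total:

```python
# Minimal MPI sketch: each process computes a partial sum over its own
# stride of 0..99, then the partial sums are reduced onto rank 0.
from mpi4py import MPI

comm = MPI.COMM_WORLD          # default communicator: all processes
rank = comm.Get_rank()         # this process's id, 0..size-1
size = comm.Get_size()         # total number of processes

local = sum(range(rank, 100, size))             # this process's share
total = comm.reduce(local, op=MPI.SUM, root=0)  # message passing

if rank == 0:
    print(f"sum over {size} processes = {total}")   # expect 4950
```

Launched with, say, mpiexec -n 4 python partial_sum.py, all four processes run the same program and interact only through MPI calls.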
MapReduce
• A web programming model for scalable data processing on large clusters over large data sets.
• The model is applied mainly in web-scale search and cloud computing applications.
• The master node specifies a Map function to divide the input into subproblems.
• A Reduce function is then applied to merge all intermediate values with the same intermediate key.
• Highly scalable to explore high degrees of parallelism at different job levels.
• A typical MapReduce computation process can handle terabytes of data on tens of thousands or more client machines.
• Thousands of MapReduce jobs are executed on Google’s clusters every day.
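The toy, single-machine sketch below (written for this note, not Google's implementation) walks the word-count pattern through the phases the model implies: Map emits (key, value) pairs, a shuffle groups values by key, and Reduce merges all values sharing an intermediate key.

```python
# Toy MapReduce word count on one machine, to illustrate the model.
from collections import defaultdict

def map_phase(document):
    # Map: split the input into sub-problems, one (word, 1) pair per word.
    for word in document.split():
        yield word.lower(), 1

def reduce_phase(word, counts):
    # Reduce: merge all intermediate values with the same key.
    return word, sum(counts)

documents = ["the cloud scales", "the grid and the cloud"]

groups = defaultdict(list)          # shuffle: group values by key
for doc in documents:
    for word, count in map_phase(doc):
        groups[word].append(count)

results = [reduce_phase(w, c) for w, c in groups.items()]
print(sorted(results))  # [('and', 1), ('cloud', 2), ('grid', 1), ('scales', 1), ('the', 3)]
```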
Hadoop Library
• Offers a software platform originally developed by a Yahoo! group.
• The package enables users to write and run applications over vast amounts of distributed data.
• Users can easily scale Hadoop to store and process petabytes of data in the web space.
• Economical: comes with an open source version of MapReduce that minimizes overhead in task spawning and massive data communication.
• Efficient: processes data with a high degree of parallelism across a large number of commodity nodes.
• Reliable: automatically keeps multiple data copies to facilitate redeployment of computing tasks upon unexpected system failures.
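For flavor, here is a hedged sketch of the same word count written for Hadoop Streaming, which lets mappers and reducers be plain scripts that read stdin and write stdout; exact job-submission flags and jar paths vary by Hadoop distribution.

```python
# mapper.py -- emits "word<TAB>1" for every word on stdin.
import sys

for line in sys.stdin:
    for word in line.split():
        print(f"{word.lower()}\t1")
```

```python
# reducer.py -- Hadoop Streaming sorts mapper output by key, so equal
# words arrive on consecutive lines and can be summed in one pass.
import sys

current, total = None, 0
for line in sys.stdin:
    line = line.rstrip("\n")
    if not line:
        continue
    word, count = line.rsplit("\t", 1)
    if word != current:
        if current is not None:
            print(f"{current}\t{total}")
        current, total = word, 0
    total += int(count)
if current is not None:
    print(f"{current}\t{total}")
```

The pair can be dry-run locally with cat input.txt | python mapper.py | sort | python reducer.py, which mimics the map/sort/reduce pipeline.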
Video
Dimensions of Scalability
• Size – increasing performance by increasing machine size
• Software – upgrades to the OS, libraries, and new applications
• Application – matching problem size with machine size
• Technology – adapting the system to new technologies
Performance metrics
• CPU speed: million instructions per second (MIPS)
• Network bandwidth: megabits per second (Mbps)
• Distributed system throughput: tera floating-point operations per second (Tflops), transactions per second (TPS)
• Job response time and network latency: a low-latency, high-bandwidth interconnection is preferred
• System overhead is often attributed to OS boot time, compile time, I/O data rate, and the runtime support system
• QoS for Internet and web services; system availability and dependability; security
Amdahl’s Law
• Suppose a uniprocessor workstation executes a given program in time T minutes.
• Now the same program is partitioned for parallel execution on a cluster of many nodes.
• We assume that a fraction α of the code must be executed sequentially; therefore, (1 − α) of the code can be compiled for parallel execution by n processors.
• The total execution time of the program is then
  αT + (1 − α)T/n
  where the first term is the sequential execution time on a single processor and the second term is the parallel execution time on n processing nodes.
Amdahl’s Law: Speedup factor
• The speedup factor of using the n-processor system over a single processor is
  S = T(old)/T(new) = T/[αT + (1 − α)T/n] = 1/[α + (1 − α)/n]
• The maximum speedup of n is achieved only if the code is fully parallelizable, i.e., α = 0.
• As the cluster becomes sufficiently large (n → ∞), S approaches 1/α, an upper bound on the speedup S.
• The sequential bottleneck is the portion of the code that cannot be parallelized.
• If α = 0.25, then 1 − α = 0.75 and the maximum speedup is 4.
Fixed workload system efficiency
• Amdahl’s law assumes the same amount of workload for both sequential and parallel execution of the program, with a fixed problem size or data set (fixed-workload speedup).
• Executing a fixed workload on n processors leads to a system efficiency of
  E = S/n = 1/[αn + 1 − α]
• Efficiency is low for large cluster sizes: for α = 0.25 and n = 256, E ≈ 1.5%.
• Low usage of resources: the majority of the processors remain idle. (A numerical sketch of both the speedup and the efficiency follows.)
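The short Python sketch below (illustrative, not from the book) evaluates both formulas and reproduces the numbers quoted above:

```python
# Amdahl's fixed-workload speedup and efficiency:
#   S = 1 / (alpha + (1 - alpha)/n)     E = S/n = 1 / (alpha*n + 1 - alpha)
def amdahl_speedup(alpha: float, n: int) -> float:
    return 1.0 / (alpha + (1.0 - alpha) / n)

def amdahl_efficiency(alpha: float, n: int) -> float:
    return amdahl_speedup(alpha, n) / n

alpha = 0.25                    # 25% of the code is inherently sequential
for n in (4, 16, 256, 10**6):
    print(f"n={n:>7}  S={amdahl_speedup(alpha, n):6.3f}  "
          f"E={amdahl_efficiency(alpha, n):8.4%}")
# n=    256  S= 3.954  E= 1.5444%   <- the slide's ~1.5%
# As n -> infinity, S approaches 1/alpha = 4 while E collapses toward 0.
```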
Gustafson’s Law
• Scales the problem size to match the cluster capability (scaled-workload speedup).
• Let W be the workload in a given program. When using an n-processor system, the user scales the workload to
  W′ = αW + (1 − α)nW
• Parallel execution of the scaled workload W′ on n processors gives the scaled-workload speedup
  S′ = W′(new)/W(old) = [αW + (1 − α)nW]/W = α + (1 − α)n
• Thus the efficiency is
  E′ = S′/n = α/n + (1 − α)
• For α = 0.25 and n = 256, E′ ≈ 75%.
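A companion sketch (again illustrative Python) shows how scaling the workload rescues efficiency where Amdahl’s fixed workload could not:

```python
# Gustafson's scaled-workload speedup: the problem grows with the cluster,
#   W' = alpha*W + (1 - alpha)*n*W   =>   S' = alpha + (1 - alpha)*n
def gustafson_speedup(alpha: float, n: int) -> float:
    return alpha + (1.0 - alpha) * n

def gustafson_efficiency(alpha: float, n: int) -> float:
    return gustafson_speedup(alpha, n) / n    # = alpha/n + (1 - alpha)

alpha, n = 0.25, 256
print(gustafson_speedup(alpha, n))      # 192.25 (vs. Amdahl's ~3.95)
print(gustafson_efficiency(alpha, n))   # ~0.751, i.e. 75% vs. Amdahl's 1.5%
```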
A driving metaphor
Amdahl’s Law approximately suggests: suppose a car is traveling between two cities 60 miles apart, and has already spent one hour traveling half the distance at 30 mph. No matter how fast you drive the last half, it is impossible to achieve a 90 mph average before reaching the second city: since it has already taken you 1 hour and you only have a distance of 60 miles total, even going infinitely fast you would only achieve 60 mph.
A driving metaphor
Gustafson’s Law approximately states: suppose a car has already been traveling for some time at less than 90 mph. Given enough time and distance to travel, the car’s average speed can always eventually reach 90 mph, no matter how long or how slowly it has already traveled. For example, if the car spent one hour at 30 mph, it could achieve this by driving at 120 mph for two additional hours, or at 150 mph for one hour, and so on. (A quick arithmetic check of both metaphors follows.)
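The arithmetic behind both metaphors fits in a few lines (illustrative only):

```python
# Amdahl (fixed distance): 60 miles total, first 30 miles already took 1 hour.
# Even at infinite speed for the second half, average speed <= 60 mi / 1 h.
print(60 / 1.0)                    # 60.0 mph is the best possible average

# Gustafson (distance may grow): 1 hour at 30 mph, then 2 hours at 120 mph
# -> (30 + 240) miles over 3 hours = 90 mph average, as claimed.
print((30 + 120 * 2) / (1 + 2))    # 90.0
print((30 + 150 * 1) / (1 + 1))    # 90.0 (or one hour at 150 mph)
```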