An Overview of Cloud Computing: My Other Computer is a Data Center Robert Grossman Open Data Group & University of Illinois at Chicago IEEE New Technologies Conference August 6, 2009

An Introduction to Cloud Computing by Robert Grossman 08-06-09 (v19)



Part 1. What is a Cloud?

What is a Cloud?

Software as a Service

What Else is a Cloud?

Platform as a Service

Is Anything Else a Cloud?

Infrastructure as a Service

Are There Other Types of Clouds?

Large Data Cloud Services

ad targeting

One Definition

Clouds provide on-demand resources or services over a network, often the Internet, with the scale and reliability of a data center.

No standard definition.
Cloud architectures are not new.
What is new:
– Scale
– Ease of use
– Pricing model

Scale is new.

Elastic, Usage Based Pricing Is New

1 computer in a rack for 120 hours

costs the same as

120 computers in three racks for 1 hour

Elastic, usage based pricing turns capex into opex.
Clouds can be used to manage surges in computing needs.
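The pricing claim above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes pure usage-based pricing with a flat price per machine-hour; the function name `usage_cost` and the rate of 10 cents per machine-hour are hypothetical and are not figures from the talk.

```python
def usage_cost(machines: int, hours: int, cents_per_machine_hour: int) -> int:
    """Cost in cents when you pay only for the machine-hours consumed."""
    return machines * hours * cents_per_machine_hour


RATE_CENTS = 10  # hypothetical flat rate, chosen only for illustration

one_for_120_hours = usage_cost(1, 120, RATE_CENTS)   # 1 computer for 120 hours
many_for_one_hour = usage_cost(120, 1, RATE_CENTS)   # 120 computers for 1 hour

# Both jobs consume 120 machine-hours, so the bill is identical.
print(one_for_120_hours, many_for_one_hour)
```

Under this assumption the only thing that changes between the two jobs is the wall-clock time: 120 hours versus 1 hour.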

Simplicity Offered By the Cloud is New

+ … and you have a computer ready to work.

A new programmer can develop a program to process a container full of data with less than a day of training using MapReduce.

Part 2. Varieties of Clouds

Varieties of Clouds

Architectural Model
– On-demand computing instances vs large data cloud services

Payment Model
– Elastic, usage based pricing, lease/own, …

Management Model
– Private vs Public; Single vs Multiple Tenant; …

Programming Model
– Queue Service, MPI, MapReduce, Distributed UDF

Computing instances vs large data cloud services

Private internal vs public external

Elastic, usage-based pricing or not

All combinations occur.

Architectural Models: How Do You Fill a Data Center?

[Figure: two ways to fill a data center.
On-demand computing instances: App, App, App, …
Large data cloud services: Apps running over Quasi-relational Data Services, Cloud Data Services (BigTable, etc.), Cloud Compute Services (MapReduce & Generalizations) and Cloud Storage Services.]

Payment Models

Buying racks, containers and data centers
Leasing racks, containers and data centers
Utility based computing (pay as you go)
– Moves capex to opex
– Handles surge requirements (use 1000 servers for 1 hour vs 1 server for 1000 hours)

Management Models

Public, private and hybrid models
Single tenant vs multiple tenant (shared vs non-shared hardware)
Owned vs leased
Manage yourself vs outsource management
All combinations are possible

Programming Models

Amazon’s Simple Queue Service

MPI, sockets, FIFO

MapReduce, Distributed UDF

on-demand computing instances

large data cloud services

DryadLINQ Azure services

Part 3. Cloud Computing Industry

“Cloud computing has become the center of investment and innovation.”
Nicholas Carr, 2009 IDC Directions

Cloud computing is approaching the top of the Gartner hype cycle.

IaaS, PaaS and SaaS Point of View

Infrastructure as a Service (IaaS)
PRODUCT: Compute power, storage and networking infrastructure over the Internet, provided as a virtual machine image
USERS: Developers

Platform as a Service (PaaS)
PRODUCT: Storage, compute and other services to simplify application development, especially of web applications
USERS: Application developers

Software as a Service (SaaS)
PRODUCT: Finished application available on demand to the end user
USERS: Software consumers

Building Data Centers

Sun’s Modular Data Center (MD)

Formerly Project Blackbox

Containers used by Google, Microsoft & others

Data center consists of 10-60+ containers.


Data Center Operating Systems

Data center services include: VM management services, business continuity services, security services, power management services, etc.

[Figure: a workstation runs VM 1 … VM 5; a Data Center Operating System manages VM 1 … VM 50,000.]

Berkeley View of Cloud Computing


Providers of Cloud Services

Consumers of Cloud Services

Providers of Software as a Service

Consumers of Software as a Service

Berkeley Report on cloud computing divides industry into these layers & concentrates on public clouds.

Data Centers

Transition Taking Place

A handful of players are building multiple data centers a year and improving with each one. This includes Google, Microsoft, Yahoo, …
A data center today costs $200 M – $400+ M.
The Berkeley RAD Report points out an analogy with the semiconductor industry: companies stopped building their own fabs and started leasing fabs from others as the cost of a fab approached $1B.

Mindmeister Map of Cloud Computing

Dupont’s Mindmeister Map divides the industry:
– IaaS, PaaS, Management, Community

http://www.mindmeister.com/maps/show_public/15936058

Part 4. Virtualization

Virtualization

Virtualization separates logical infrastructure from the underlying physical resources to decrease the time needed to make changes, improve flexibility, improve utilization and reduce costs.

Example: server virtualization. Use one physical server to support multiple logical virtual machines (VMs), which are sometimes called logical partitions.

Technology pioneered by IBM in the 1960s to better utilize mainframes.

Idea Dates Back to the 1960s

[Figure: an IBM Mainframe running IBM VM/370, which hosts the guest operating systems CMS, MVS and CMS, each running an App.]

Native (Full) Virtualization. Examples: VMware ESX

Two Types of Virtualization

Using the hypervisor, each guest OS sees its own independent copy of the CPU, memory, IO, etc.

Native (Full) Virtualization
Examples: VMware ESX
Layers (bottom to top): Physical Hardware; Hypervisor; Unmodified Guest OS 1 and Unmodified Guest OS 2; Apps

Para Virtualization
Examples: Xen
Layers (bottom to top): Physical Hardware; Hypervisor; Modified Guest OS 1 and Modified Guest OS 2; Apps

Four Key Properties

1. Partitioning: run multiple VMs on one physical server; one VM doesn’t know about the others

2. Isolation: security isolation is at the hardware level.

3. Encapsulation: entire state of the machine can be copied to files and moved around

4. Hardware abstraction: provision and migrate VM to another server


Managing Virtual Machines

Provision VM
Schedule VM
Monitor VM
Self-service portal for VM

Part 5. Large Data Clouds

The Google Data Stack

The Google File System (2003)
MapReduce: Simplified Data Processing… (2004)
BigTable: A Distributed Storage System… (2006)

Map-Reduce Example

Input is a file with one document per record
User specifies map function
– key = document URL
– value = terms that document contains

(“doc cdickens”, “it was the best of times”)

map

“it”, 1
“was”, 1
“the”, 1
“best”, 1

Example (cont’d)

MapReduce library gathers together all pairs with the same key value (shuffle/sort phase)
The user-defined reduce function combines all the values associated with the same key

key = “it”, values = 1, 1
key = “was”, values = 1, 1
key = “best”, values = 1
key = “worst”, values = 1

reduce

“it”, 2
“was”, 2
“best”, 1
“worst”, 1
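The map, shuffle/sort and reduce steps of this example can be sketched in a few lines of single-process Python. This is an illustration of the programming model only, not Google's or Hadoop's API: the names `map_fn`, `shuffle`, `reduce_fn` and `map_reduce` are hypothetical, and the input record is assumed to continue with "it was the worst of times", which the reduce output on the slide implies.

```python
from collections import defaultdict


def map_fn(url, text):
    """User-defined map: emit (term, 1) for every term in the document."""
    for term in text.split():
        yield term, 1


def shuffle(pairs):
    """Library step: gather all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_fn(term, counts):
    """User-defined reduce: combine the values for one key."""
    return term, sum(counts)


def map_reduce(records):
    """Run map over every record, shuffle by key, then reduce each group."""
    pairs = [pair for url, text in records for pair in map_fn(url, text)]
    return dict(reduce_fn(key, values) for key, values in shuffle(pairs).items())


# Assumed full record; the slide shows only the first clause.
records = [("doc cdickens", "it was the best of times it was the worst of times")]
counts = map_reduce(records)
print(counts["it"], counts["was"], counts["best"], counts["worst"])
```

In a real deployment the library, not the user, runs `map_fn` and `reduce_fn` on many nodes and performs the shuffle over the network; the user supplies only the two functions.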

Generalization: Apply User Defined Functions (UDF) to Files in Storage Cloud

[Figure: the map/shuffle stage and the reduce stage are each replaced by a UDF.]
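One way to picture this generalization is a function that applies an arbitrary user-defined function to every file held by the storage layer. The sketch below is a toy, single-process model under stated assumptions: `Storage`, `apply_udf`, `drop_short` and `upper_case` are hypothetical names, the "storage cloud" is a dictionary of file name to records, and none of this is Sector/Sphere's actual API.

```python
from typing import Callable, Dict, List

Storage = Dict[str, List[str]]          # hypothetical: file name -> records
UDF = Callable[[List[str]], List[str]]  # a UDF maps one file's records to new records


def apply_udf(storage: Storage, udf: UDF) -> Storage:
    """Apply the UDF to each file independently.

    Because no file depends on another, a storage cloud could run
    each call on the node that already holds that file.
    """
    return {name: udf(records) for name, records in storage.items()}


def drop_short(records: List[str]) -> List[str]:
    """Example UDF: keep only records longer than three characters."""
    return [r for r in records if len(r) > 3]


def upper_case(records: List[str]) -> List[str]:
    """Example UDF: upper-case every record."""
    return [r.upper() for r in records]


storage = {"part-0": ["it", "was", "best"], "part-1": ["worst", "of", "times"]}

# UDFs compose: the output of one stage is the input of the next,
# just as reduce consumes the output of map/shuffle.
result = apply_udf(apply_udf(storage, drop_short), upper_case)
print(result)
```

Map and reduce then become two particular UDFs rather than the only stages available.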

Google’s Layered Cloud Services

Google’s Stack
Applications
Compute Services: Google’s MapReduce
Table Services: Google’s BigTable
Storage Services: Google File System (GFS)

Hadoop’s Layered Cloud Services

Hadoop’s Stack
Applications
Compute Services: Hadoop’s MapReduce
Table Services
Storage Services: Hadoop Distributed File System (HDFS)

Sector’s Layered Cloud Services

Sector’s Stack
Applications
Compute Services: Sphere’s UDF
Table Services
Storage Services: Sector’s Distributed File System (SDFS)
Routing & Transport Services: UDP-based Data Transport Protocol (UDT)

Hadoop & Sector

                    Hadoop                    Sector
Storage Cloud       Block-based file system   File-based
Programming Model   MapReduce                 UDF & MapReduce
Protocol            TCP                       UDP-based protocol (UDT)
Replication         At time of writing        Periodically
Security            Not yet                   HIPAA capable
Language            Java                      C++

MalStone Benchmark

Benchmark developed by Open Cloud Consortium for clouds supporting data intensive computing.

Code to generate synthetic data required is available from code.google.com/p/malgen

Stylized analytic computation that is easy to implement in MapReduce and its generalizations.


MalStone B

[Figure: sites and entities plotted against time, with time windows dk-2, dk-1, dk.]

MalStone B Benchmark

                            MalStone B
Hadoop v0.18.3              799 min
Hadoop Streaming v0.18.3    142 min
Sector v1.19                44 min
# Nodes                     20 nodes
# Records                   10 Billion
Size of Dataset             1 TB
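The running times above are easier to compare as ratios. The short calculation below uses only the numbers on the slide; the variable names are hypothetical, and the records-per-second figure simply assumes the 10 billion records were processed over the whole reported run.

```python
# Running times in minutes, copied from the MalStone B results.
minutes = {
    "Hadoop v0.18.3": 799,
    "Hadoop Streaming v0.18.3": 142,
    "Sector v1.19": 44,
}
RECORDS = 10_000_000_000  # 10 billion records

baseline = minutes["Sector v1.19"]

# How many times longer each system ran than Sector.
ratios = {name: t / baseline for name, t in minutes.items()}

# Average throughput over the whole run, in records per second.
throughput = {name: RECORDS / (t * 60) for name, t in minutes.items()}

for name in minutes:
    print(f"{name}: {ratios[name]:.1f}x Sector's time, "
          f"{throughput[name]:,.0f} records/s")
```

These ratios describe this one benchmark run on 20 nodes and should not be read as general performance figures.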

Trading Functionality for Scalability

Scalability
  Databases: 100’s TB
  Data Clouds: 100’s PB

Functionality
  Databases: Full SQL-based queries, including joins
  Data Clouds: Optimized access to sorted tables (tables with single keys)

Optimized
  Databases: Optimized for safe writes
  Data Clouds: Optimized for efficient reads

Consistency model
  Databases: ACID (Atomicity, Consistency, Isolation & Durability): database is always consistent
  Data Clouds: Eventual consistency: updates eventually propagate through the system

Parallelism
  Databases: Difficult because of ACID model; shared nothing is possible (GrayWulf)
  Data Clouds: Basic design incorporates parallelism over commodity components

Scale
  Databases: Racks
  Data Clouds: Data center
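Eventual consistency, as contrasted with ACID above, can be illustrated with a toy replicated store in which a write lands on one replica immediately and reaches the others only when updates are propagated. This is a minimal sketch under stated assumptions: the class `EventuallyConsistentStore` and its methods are hypothetical, propagation is triggered by hand, and real systems add conflict resolution and failure handling that are omitted here.

```python
class EventuallyConsistentStore:
    """Toy key-value store with lazily propagated replicas."""

    def __init__(self, n_replicas=3):
        self.replicas = [{} for _ in range(n_replicas)]
        self.pending = []  # (replica index, key, value) not yet delivered

    def write(self, key, value):
        # The write is acknowledged after reaching replica 0 only.
        self.replicas[0][key] = value
        for i in range(1, len(self.replicas)):
            self.pending.append((i, key, value))

    def read(self, key, replica=0):
        # A read served by a lagging replica may return stale data.
        return self.replicas[replica].get(key)

    def propagate(self):
        # Deliver queued updates in order; afterwards all replicas agree.
        for i, key, value in self.pending:
            self.replicas[i][key] = value
        self.pending.clear()


store = EventuallyConsistentStore()
store.write("x", 1)
stale = store.read("x", replica=2)   # update has not arrived yet
store.propagate()
fresh = store.read("x", replica=2)   # update has now propagated
print(stale, fresh)
```

The trade is visible in `write`: it returns without waiting for every replica, which is cheap at scale, at the price of reads that can briefly disagree.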

Not Everyone Agrees

David J. DeWitt and Michael Stonebraker, MapReduce: A Major Step Backwards, Database Column, January 17, 2008

Part 6. Standards Efforts


Change of gauge at Ussuriisk (near Vladivostok) at the Chinese–Russian border

Train gauge in China is 1435 mm

Train gauge in Russia is 1520 mm

How can a cloud application move from one cloud storage service to another?

Standards Efforts for Clouds

Cloud Computing Interoperability Forum (CCIF)
Open Cloud Consortium (OCC)
Open Grid Forum (OGF)
Distributed Management Task Force (DMTF)
Storage Networking Industry Association (SNIA)
Plus several others…

www.opencloudconsortium.org

1. Supports the development of standards.
2. Supports reference implementations for cloud computing, preferably open source.
3. Manages a testbed for cloud computing called the Open Cloud Testbed.
4. Supports the development of benchmarks.
5. Sponsors workshops and other events related to cloud computing.

Activities Currently Focused Around Five Use Cases

1. Moving an existing cloud application from Cloud 1 to Cloud 2 without changing the application.

2. Providing surge capacity for an application on Cloud 1 using any of the Clouds 2, 3, … (without changing the application).

[Figure: Cloud 1 and Cloud 2, connected by (1) migrate / port and (2) surge / burst.]

Large Data Cloud Use Cases

3. Moving a large data cloud application from one large data cloud storage service to another.

4. Moving a large data cloud application from one large data cloud compute service to another.

Large Data Cloud Storage Services

Large Data Cloud Compute Services

App 1 App 2
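Moving an application between storage services without changing it is straightforward when the application is written against a common interface rather than a provider's own API. The sketch below shows that idea only; `CloudStorage`, `InMemoryStorage`, `migrate` and `app_total_bytes` are hypothetical names, the in-memory class stands in for two different providers, and no existing standard or vendor interface is being reproduced.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class CloudStorage(ABC):
    """Hypothetical common interface the application is written against."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...

    @abstractmethod
    def keys(self) -> List[str]: ...


class InMemoryStorage(CloudStorage):
    """Stand-in for one provider's storage service."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def keys(self) -> List[str]:
        return sorted(self._objects)


def migrate(src: CloudStorage, dst: CloudStorage) -> int:
    """Copy every object from src to dst; return the number copied."""
    copied = 0
    for key in src.keys():
        dst.put(key, src.get(key))
        copied += 1
    return copied


def app_total_bytes(storage: CloudStorage) -> int:
    """The 'application': depends only on the interface, not the provider."""
    return sum(len(storage.get(key)) for key in storage.keys())


cloud1, cloud2 = InMemoryStorage(), InMemoryStorage()
cloud1.put("a.txt", b"best of times")
cloud1.put("b.txt", b"worst of times")

moved = migrate(cloud1, cloud2)
print(moved, app_total_bytes(cloud1), app_total_bytes(cloud2))
```

The application function runs unchanged against either store, which is the property use cases 3 and 4 ask a standard to guarantee across real providers.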

Inter-Cloud Use Case

5. Inter-cloud communication between two HIPAA compliant clouds.

Cloud 1 Cloud 2

OCC Welcomes New Members

Companies and organizations are welcome to join the Open Cloud Consortium (OCC)
www.opencloudconsortium.org/membership.html

Join one of our working groups
– Large Data Clouds Working Group
– Standard Cloud Performance Measurement (SCPM) Working Group
– Information Sharing & Security Working Group

For More Information

Contact information: Robert Grossman

[email protected]
blog.rgrossman.com

Web sites
– www.opendatagroup.com
– www.ncdm.uic.edu
– www.opencloudconsortium.org