An Overview of Cloud Computing: My Other Computer is a Data Center
Robert Grossman
Open Data Group & University of Illinois at Chicago
IEEE New Technologies Conference, August 6, 2009
One Definition
Clouds provide on-demand resources or services over a network, often the Internet, with the scale and reliability of a data center.
No standard definition.
Cloud architectures are not new.
What is new:
– Scale
– Ease of use
– Pricing model
Elastic, Usage-Based Pricing Is New
1 computer in a rack for 120 hours costs the same as 120 computers in three racks for 1 hour.
Elastic, usage-based pricing turns capex into opex.
Clouds can be used to manage surges in computing needs.
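The pricing claim above is simple arithmetic on instance-hours, and a minimal sketch makes it concrete. The function names and the hourly rate below are hypothetical, chosen only for illustration; no provider's actual price is implied.

```python
def instance_hours(computers: int, hours: float) -> float:
    """Total billable instance-hours for a job."""
    return computers * hours


def usage_cost(computers: int, hours: float, rate_per_instance_hour: float) -> float:
    """Cost under pure usage-based pricing: pay only for instance-hours used."""
    return instance_hours(computers, hours) * rate_per_instance_hour


# Hypothetical rate in dollars per instance-hour (an assumption, not from the slides).
RATE = 0.10

one_for_120_hours = usage_cost(computers=1, hours=120, rate_per_instance_hour=RATE)
many_for_1_hour = usage_cost(computers=120, hours=1, rate_per_instance_hour=RATE)

print(one_for_120_hours, many_for_1_hour)
```

Both jobs consume 120 instance-hours, so under this model they cost the same; only the elapsed time differs.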
Simplicity Offered By the Cloud is New
+ .. and you have a computer ready to work.
A new programmer can develop a program to process a container full of data with less than a day of training using MapReduce.
Varieties of Clouds
Architectural Model
– On-demand computing instances vs large data cloud services
Payment Model
– Elastic, usage-based pricing, lease/own, …
Management Model
– Private vs Public; Single vs Multiple Tenant; …
Programming Model
– Queue Service, MPI, MapReduce, Distributed UDF
Computing instances vs large data cloud services
Private internal vs public external
Elastic, usage-based pricing or not
All combinations occur.
Architectural Models: How Do You Fill a Data Center?
Large data cloud services (layers, with apps running on top):
– Cloud Storage Services
– Cloud Compute Services (MapReduce & Generalizations)
– Cloud Data Services (BigTable, etc.)
– Quasi-relational Data Services
On-demand computing instances:
– App, App, App, …
Payment Models
Buying racks, containers and data centers
Leasing racks, containers and data centers
Utility-based computing (pay as you go)
– Moves capex to opex
– Handles surge requirements (use 1000 servers for 1 hour vs 1 server for 1000 hours)
Management Models
Public, private and hybrid models
Single tenant vs multiple tenant (shared vs non-shared hardware)
Owned vs leased
Manage yourself vs outsource management
All combinations are possible
Programming Models
Amazon’s Simple Queue Service
MPI, sockets, FIFO
MapReduce Distributed UDF
on-demand computing instances
large data cloud services
DryadLINQ Azure services
Part 3. Cloud Computing Industry
“Cloud computing has become the center of investment and innovation.” Nicholas Carr, 2009 IDC Directions
Cloud computing is approaching the top of the Gartner hype cycle.
IaaS, PaaS and SaaS Point of View
Infrastructure as a Service (IaaS)
PRODUCT: Compute power, storage and networking infrastructure over the internet, provided as a virtual machine image
USERS: Developers
Platform as a Service (PaaS)
PRODUCT: Storage, compute and other services to simplify application development, especially of web applications
USERS: Application developers
Software as a Service (SaaS)
PRODUCT: Finished application available on demand to end user
USERS: Software consumer
Building Data Centers
Sun’s Modular Data Center (MD)
Formerly Project Blackbox
Containers used by Google, Microsoft & others
Data center consists of 10-60+ containers.
Data Center Operating Systems
Data center services include: VM management services, business continuity services, security services, power management services, etc.
[Figure: a workstation runs VM 1 … VM 5; a Data Center Operating System runs VM 1 … VM 50,000.]
Berkeley View of Cloud Computing
Providers of Cloud Services
Consumers of Cloud Services
Providers of Software as a Service
Consumers of Software as a Service
Berkeley Report on cloud computing divides industry into these layers & concentrates on public clouds.
Data Centers
Transition Taking Place
A handful of players are building multiple data centers a year and improving with each one. This includes Google, Microsoft, Yahoo, …
A data center today costs $200 M – $400+ M.
The Berkeley RAD Report points out an analogy with the semiconductor industry, where companies stopped building their own fabs and started leasing fabs from others as the cost of a fab approached $1B.
Mindmeister Map of Cloud Computing
Dupont’s Mindmeister Map divides the industry:
– IaaS, PaaS, Management, Community
http://www.mindmeister.com/maps/show_public/15936058
Virtualization
Virtualization separates logical infrastructure from the underlying physical resources to decrease time to make changes, improve flexibility, improve utilization and reduce costs.
Example: server virtualization. Use one physical server to support multiple logical virtual machines (VMs), which are sometimes called logical partitions.
Technology pioneered by IBM in the 1960s to better utilize mainframes.
Idea Dates Back to the 1960s
[Figure: an IBM Mainframe runs IBM VM/370, which hosts guest operating systems (CMS, MVS, CMS), each running an App.]
Two Types of Virtualization
Using the hypervisor, each guest OS sees its own independent copy of the CPU, memory, IO, etc.
Native (Full) Virtualization (example: VMware ESX), from bottom to top:
– Physical Hardware
– Hypervisor
– Unmodified Guest OS 1, Unmodified Guest OS 2
– Apps
Para Virtualization (example: Xen), from bottom to top:
– Physical Hardware
– Hypervisor
– Modified Guest OS 1, Modified Guest OS 2
– Apps
Four Key Properties
1. Partitioning: run multiple VMs on one physical server; one VM doesn’t know about the others
2. Isolation: security isolation is at the hardware level.
3. Encapsulation: entire state of the machine can be copied to files and moved around
4. Hardware abstraction: provision and migrate VM to another server
The Google Data Stack
– The Google File System (2003)
– MapReduce: Simplified Data Processing… (2004)
– BigTable: A Distributed Storage System… (2006)
Map-Reduce Example
Input is a file with one document per record.
User specifies the map function:
– key = document URL
– value = terms that the document contains
Example: map takes (“doc cdickens”, “it was the best of times”) and emits the pairs “it”, 1; “was”, 1; “the”, 1; “best”, 1.
Example (cont’d)
The MapReduce library gathers together all pairs with the same key value (shuffle/sort phase).
The user-defined reduce function combines all the values associated with the same key.
Input to reduce:
– key = “it”, values = 1, 1
– key = “was”, values = 1, 1
– key = “best”, values = 1
– key = “worst”, values = 1
Output of reduce:
– “it”, 2
– “was”, 2
– “best”, 1
– “worst”, 1
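The map, shuffle/sort and reduce steps of the word-count example can be sketched in a few lines of single-process Python. This is an illustration of the programming model only, not Google's or Hadoop's API; the function names are hypothetical, and the second document ("it was the worst of times") is an assumption inferred from the reduce values shown on the slide.

```python
from collections import defaultdict


def map_fn(doc_key, text):
    """Map: emit (term, 1) for every term in the document."""
    for term in text.split():
        yield term, 1


def shuffle(pairs):
    """Shuffle/sort: gather all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_fn(term, counts):
    """Reduce: combine all values associated with one key."""
    return term, sum(counts)


def word_count(documents):
    """Run map over every document, shuffle, then reduce each group."""
    pairs = [pair for doc_key, text in documents for pair in map_fn(doc_key, text)]
    return dict(reduce_fn(term, counts) for term, counts in shuffle(pairs).items())


# The first record is from the slide; the second is an assumed companion record.
docs = [
    ("doc cdickens", "it was the best of times"),
    ("doc cdickens 2", "it was the worst of times"),
]
counts = word_count(docs)
print(counts)
```

In a real deployment the map and reduce calls run in parallel on many nodes and the shuffle moves data across the network; the user still writes only the two small functions.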
Generalization: Apply User Defined Functions (UDF) to Files in Storage Cloud
[Figure: the map/shuffle stage and the reduce stage are each replaced by a user-defined function (UDF).]
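A rough sketch of the generalization: instead of fixed map and reduce stages, an arbitrary user-defined function is applied to each file held in the storage cloud, and stages can be chained. Everything here is hypothetical; the dictionary stands in for a storage cloud, and `apply_udf` is not the Sector/Sphere API.

```python
from typing import Callable, Dict, List

# A stand-in for a storage cloud: file name -> list of records (an assumption).
Files = Dict[str, List[str]]
UDF = Callable[[str, List[str]], List[str]]


def apply_udf(files: Files, udf: UDF) -> Files:
    """Apply the same user-defined function to every file independently."""
    return {name: udf(name, records) for name, records in files.items()}


def tokenize(name: str, records: List[str]) -> List[str]:
    """First-stage UDF: split each record into terms."""
    return [term for line in records for term in line.split()]


def count_records(name: str, records: List[str]) -> List[str]:
    """Second-stage UDF: report how many records the file holds."""
    return [str(len(records))]


storage: Files = {
    "part-0": ["it was the best of times"],
    "part-1": ["it was the worst of times"],
}

tokens = apply_udf(storage, tokenize)      # stage 1
sizes = apply_udf(tokens, count_records)   # stage 2, chained on stage 1 output
print(sizes)
```

Because each file is processed independently, the calls inside `apply_udf` could run on whichever nodes hold the files; here they run sequentially for clarity.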
Google’s Layered Cloud Services
Google’s stack, from top to bottom:
– Applications
– Compute Services: Google’s MapReduce
– Table Services: Google’s BigTable
– Storage Services: Google File System (GFS)
Hadoop’s Layered Cloud Services
Hadoop’s stack, from top to bottom:
– Applications
– Compute Services: Hadoop’s MapReduce
– Table Services
– Storage Services: Hadoop Distributed File System (HDFS)
Sector’s Layered Cloud Services
Sector’s stack, from top to bottom:
– Applications
– Compute Services: Sphere’s UDF
– Table Services
– Storage Services: Sector’s Distributed File System (SDFS)
– Routing & Transport Services: UDP-based Data Transport Protocol (UDT)
Hadoop & Sector
Feature | Hadoop | Sector
Storage Cloud | Block-based file system | File-based
Programming Model | MapReduce | UDF & MapReduce
Protocol | TCP | UDP-based protocol (UDT)
Replication | At time of writing | Periodically
Security | Not yet | HIPAA capable
Language | Java | C++
MalStone Benchmark
Benchmark developed by Open Cloud Consortium for clouds supporting data intensive computing.
Code to generate synthetic data required is available from code.google.com/p/malgen
Stylized analytic computation that is easy to implement in MapReduce and its generalizations.
MalStone B Benchmark
System | MalStone B
Hadoop v0.18.3 | 799 min
Hadoop Streaming v0.18.3 | 142 min
Sector v1.19 | 44 min
# Nodes: 20 nodes
# Records: 10 Billion
Size of Dataset: 1 TB
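To compare the three MalStone B runtimes, one can divide each by the fastest. The sketch below uses only the figures reported on the slide; the helper name is hypothetical.

```python
# MalStone B runtimes in minutes, as reported on the slide.
runtimes_min = {
    "Hadoop v0.18.3": 799,
    "Hadoop Streaming v0.18.3": 142,
    "Sector v1.19": 44,
}


def relative_to(reference: str, runtimes: dict) -> dict:
    """Express each runtime as a multiple of the reference system's runtime."""
    base = runtimes[reference]
    return {name: minutes / base for name, minutes in runtimes.items()}


ratios = relative_to("Sector v1.19", runtimes_min)
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x the Sector runtime")
```

These ratios describe this one benchmark configuration (20 nodes, 10 billion records, 1 TB) and should not be read as general performance claims.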
Trading Functionality for Scalability
Property | Databases | Data Clouds
Scalability | 100’s TB | 100’s PB
Functionality | Full SQL-based queries, including joins | Optimized access to sorted tables (tables with single keys)
Optimized | Databases are optimized for safe writes | Clouds are optimized for efficient reads
Consistency model | ACID (Atomicity, Consistency, Isolation & Durability): database always consistent | Eventual consistency: updates eventually propagate through system
Parallelism | Difficult because of ACID model; shared nothing is possible (Graywolf) | Basic design incorporates parallelism over commodity components
Scale | Racks | Data center
Not Everyone Agrees
David J. DeWitt and Michael Stonebraker, MapReduce: A Major Step Backwards, Database Column, January 17, 2008
Part 6. Standards Efforts
Change of gauge at Ussuriisk (near Vladivostok) at the Chinese–Russian border
Train gauge in China is 1435 mm
Train gauge in Russia is 1520 mm
How can a cloud application move from one cloud storage service to another?
Standards Efforts for Clouds
– Cloud Computing Interoperability Forum (CCIF)
– Open Cloud Consortium (OCC)
– Open Grid Forum (OGF)
– Distributed Management Task Force (DMTF)
– Storage Networking Industry Association (SNIA)
– Plus several others…
www.opencloudconsortium.org
1. Supports the development of standards.
2. Supports reference implementations for cloud computing, preferably open source.
3. Manages a testbed for cloud computing called the Open Cloud Testbed.
4. Supports the development of benchmarks.
5. Sponsors workshops and other events related to cloud computing.
Activities Currently Focused Around Five Use Cases
1. Moving an existing cloud application from Cloud 1 to Cloud 2 without changing the application.
2. Providing surge capacity for an application on Cloud 1 using any of the Clouds 2, 3, … (without changing the application).
[Figure: Cloud 1 and Cloud 2, linked by (1) migrate / port and (2) surge / burst.]
Large Data Cloud Use Cases
3. Moving a large data cloud application from one large data cloud storage service to another.
4. Moving a large data cloud application from one large data cloud compute service to another.
[Figure: App 1 and App 2 running on Large Data Cloud Compute Services, which sit on Large Data Cloud Storage Services.]
Inter-Cloud Use Case
5. Inter-cloud communication between two HIPAA compliant clouds.
[Figure: Cloud 1 and Cloud 2.]
OCC Welcomes New Members
Companies and organizations are welcome to join the Open Cloud Consortium (OCC): www.opencloudconsortium.org/membership.html
Join one of our working groups:
– Large Data Clouds Working Group
– Standard Cloud Performance Measurement (SCPM) Working Group
– Information Sharing & Security Working Group
For More Information
Contact information: Robert Grossman
[email protected]
blog.rgrossman.com
Web sites:
– www.opendatagroup.com
– www.ncdm.uic.edu
– www.opencloudconsortium.org