Raouf Boutaba
Research Challenges in Cloud Computing
D. Cheriton School of Computer Science University of Waterloo
CS856 W’17
Outline
• Data Center Networks
• Network Management
• Resource and Performance Management
• Energy Management
• Pricing and Economics
• Security and Enterprise Applications
Data Center Networks
• Data center networks form the backbone of data centers
  • Connecting tens of thousands of servers that may host millions of applications
• Characteristics
  • Very large scale
  • Single administrative domain
  • Bandwidth is often the performance bottleneck
3 Research Issues and Current Trends
Conventional Architecture
Source: VL2: A Scalable and Flexible Data Center Network, SIGCOMM 2009
Limitations of Conventional Architectures
• High oversubscription ratio, which creates a bandwidth bottleneck
  • Typically 1:5, 1:80 or even 1:240 at the root
• Poor reliability and utilization
• Static network address assignment
  • Fragmentation of resources
  • Difficult to support VM migration due to address reconfiguration
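To make the oversubscription figures concrete, here is a minimal sketch that treats the factor N in "1:N" as downstream capacity divided by upstream capacity. The port counts, link speeds, and the split of 1:80 into two tiers are hypothetical values chosen for illustration, not numbers from the slides.

```python
def oversubscription_factor(num_downlinks, downlink_gbps, num_uplinks, uplink_gbps):
    """Return N for a switch oversubscribed 1:N (downstream / upstream capacity)."""
    return (num_downlinks * downlink_gbps) / (num_uplinks * uplink_gbps)


def worst_case_host_mbps(nic_mbps, factors):
    """Bandwidth left per host when every tier on the path is fully contended."""
    total = 1.0
    for factor in factors:
        total *= factor
    return nic_mbps / total


# Hypothetical top-of-rack switch: 40 servers at 1 Gbps, 2 uplinks at 10 Gbps.
tor_factor = oversubscription_factor(40, 1, 2, 10)   # 2.0, i.e. 1:2

# Hypothetical path whose tiers multiply to 1:80 (1:5, then 1:16).
share_mbps = worst_case_host_mbps(1000, [5, 16])      # 12.5 Mbps of a 1 Gbps NIC
```

Under this simple model, a host behind a 1:80 path may see only about 1% of its NIC bandwidth when all hosts send across the root at once.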
Design Objectives
• Scalability
  • Scale to millions of servers without compromising performance
• Economics
  • Built using commodity switches and servers
• Performance
  • Low network diameter
  • Large bisection bandwidth
• Reliability
  • Multiple forwarding paths for host-to-host communication
• Application Support
  • Support address reconfiguration and VM migration
Architectural Proposals
• Switch-Centric
  • Forwarding using only switches
  • E.g. PortLand, VL2
• Server-Centric
  • Forwarding using both switches and servers
  • E.g. DCell, BCube, CamCube
PortLand
• Uses a fat-tree topology for path diversity and large bisection bandwidth
• Operates at Layer 2
  • Uses Pseudo-MAC (PMAC) addresses of the form pod.position.port.vmid for forwarding
  • Uses a centralized fabric manager to maintain the mapping from actual MAC to PMAC addresses
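The pod.position.port.vmid layout can be sketched as a 48-bit encoding. The field widths used here (16, 8, 8 and 16 bits) are an assumption not stated on the slide, and the function names are hypothetical.

```python
# Assumed field widths in bits; the slide gives only the field order.
POD_BITS, POSITION_BITS, PORT_BITS, VMID_BITS = 16, 8, 8, 16


def encode_pmac(pod, position, port, vmid):
    """Pack the four fields into a colon-separated 48-bit PMAC string."""
    fields = ((pod, POD_BITS), (position, POSITION_BITS),
              (port, PORT_BITS), (vmid, VMID_BITS))
    value = 0
    for field, bits in fields:
        if not 0 <= field < (1 << bits):
            raise ValueError(f"field value {field} does not fit in {bits} bits")
        value = (value << bits) | field
    return ":".join(f"{byte:02x}" for byte in value.to_bytes(6, "big"))


def decode_pmac(pmac):
    """Recover (pod, position, port, vmid) from a PMAC string."""
    value = int.from_bytes(bytes.fromhex(pmac.replace(":", "")), "big")
    vmid = value & 0xFFFF
    port = (value >> 16) & 0xFF
    position = (value >> 24) & 0xFF
    pod = value >> 32
    return pod, position, port, vmid


pmac = encode_pmac(pod=2, position=1, port=0, vmid=1)   # "00:02:01:00:00:01"
```

Because the location is embedded in the address, a switch can forward on the pod and position prefix alone, while the fabric manager keeps the actual-MAC-to-PMAC table.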
BCube
• Targets container-based data centers
• Uses a generalized hypercube topology
• Overlay routing at layer 2.5
• Efficient support for communication patterns such as one-to-one, one-to-many, and many-to-many using source routing
• BCube_0 = n servers + one mini-switch (n <= 8)
• BCube_k = n BCube_(k-1)s + n^k n-port switches
• Connection rule: connect the level-k port of the i-th server in the j-th BCube_(k-1) to the j-th port of the i-th level-k switch
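The recursive definition can be unrolled into a flat wiring list. The sketch below is one way to do it, assuming servers are numbered 0 to n^(k+1) - 1 so that the base-n digit at position `level` selects the switch port; the function name and tuple layout are hypothetical.

```python
def bcube_links(n, k):
    """List (server, level, switch, port) links for a BCube_k built from n-port switches."""
    links = []
    for server in range(n ** (k + 1)):
        for level in range(k + 1):
            # The base-n digit at this level picks the port on the switch.
            port = (server // n ** level) % n
            # Dropping that digit from the server id gives the switch index.
            high = server // n ** (level + 1)
            low = server % n ** level
            switch = high * n ** level + low
            links.append((server, level, switch, port))
    return links


links = bcube_links(n=4, k=1)   # 16 servers, 2 levels of 4 switches each
```

At the top level this reduces to the connection rule above: server number j * n^k + i (the i-th server of the j-th BCube_(k-1)) attaches to port j of level-k switch i.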
Research Challenges
• Understanding the trade-offs between different architectures
  • Switch-centric vs. server-centric
• Comparison criteria
  • Network capacity
  • Robustness
  • Capital and operational cost
• Managing and upgrading existing data center networks over time
Outline
• Data Center Networks
• Network Management
• Resource and Performance Management
• Energy Management
• Pricing and Economics
• Security Management
• Migrating Enterprise Applications to the Cloud
Network Management Issues
• Naming and addressing
  • Address configuration and management
• Flow control and management
  • Congestion control
  • Flow scheduling
Address Configuration
• ID/locator separation is a design principle of data center networks
  • E.g. PortLand maintains the mapping between physical MAC and hierarchical PMAC addresses
  • E.g. BCube assigns virtual addresses to individual hosts
• Automatic address reconfiguration is a requirement
  • Manual configuration is costly and error-prone
Congestion Control
• Data center traffic typically consists of
  • (>80%) low-latency short flows (i.e. user-facing requests)
  • (<20%) throughput-sensitive long flows (i.e. backend operations such as MapReduce)
• The current transport protocol (TCP) is not well suited to this traffic pattern
  • Network buffers in switches and servers are often overwhelmed by long flows, resulting in high latency for short flows
• Achieving fairness by dynamically adjusting the congestion window in servers
Flow Scheduling
• Given the path diversity provided by data center networks, route network flows to minimize congestion
• Current approaches
  • Equal-Cost Multipath (ECMP)
    • Determines the path using a hash function (called flow hashing)
  • Valiant Load Balancing (VLB)
    • Bounces packets off random intermediary nodes (switches or servers)
• Limitation
  • Inefficient for non-uniform traffic patterns
  • Two heavyweight flows may collide, resulting in congestion
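A minimal sketch of flow hashing follows, assuming the 5-tuple is hashed with CRC32 (real switches use vendor-specific hash functions). The flow tuples and demands are made up for illustration.

```python
import zlib
from collections import defaultdict


def ecmp_path(flow, num_paths):
    """Pick a path index from a flow's 5-tuple; the same flow always maps to the same path."""
    key = "|".join(str(field) for field in flow).encode()
    return zlib.crc32(key) % num_paths


def path_loads(flows, num_paths):
    """Sum the demand (Mbps) that hashing places on each path."""
    loads = defaultdict(int)
    for flow, mbps in flows.items():
        loads[ecmp_path(flow, num_paths)] += mbps
    return dict(loads)


# Hypothetical flows: (src IP, dst IP, src port, dst port, protocol) -> demand in Mbps.
flows = {
    ("10.0.1.2", "10.2.0.3", 40001, 80, "tcp"): 5,
    ("10.0.1.2", "10.3.1.2", 40002, 50010, "tcp"): 900,
    ("10.1.0.3", "10.3.1.2", 40003, 50010, "tcp"): 800,
}
loads = path_loads(flows, num_paths=4)
```

The hash ignores flow size, so nothing prevents the two large flows from landing on the same path while other paths sit idle, which is the collision problem noted above.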
Flow Scheduling (cont.)
• Flow scheduling
  • Separate flows into large and small flows
  • For small flows, use ECMP or VLB
  • For large flows, use centralized scheduling
    • A variant of the NP-hard multi-commodity flow problem
• Implementation
  • Monitor network flows
  • Dynamically insert forwarding entries for large flows
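Because the exact problem is NP-hard, a centralized scheduler would typically fall back on a heuristic. The sketch below uses a simple greedy rule (largest flow first, onto the least-loaded path); this rule, the size threshold, and the flow names are assumptions for illustration rather than the method of any particular system.

```python
def schedule_flows(flows, num_paths, large_threshold_mbps):
    """Place large flows greedily; leave small flows to ECMP or VLB.

    flows maps a flow name to its measured demand in Mbps.
    Returns (placement of large flows, names of small flows, load per path).
    """
    path_load = [0] * num_paths
    placement = {}
    small = []
    # Largest demand first, so big flows are spread before paths fill up.
    for name, demand in sorted(flows.items(), key=lambda kv: kv[1], reverse=True):
        if demand < large_threshold_mbps:
            small.append(name)
            continue
        best = min(range(num_paths), key=lambda p: path_load[p])
        placement[name] = best
        path_load[best] += demand
    return placement, small, path_load


placement, small, load = schedule_flows(
    {"a": 600, "b": 500, "c": 5, "d": 400}, num_paths=2, large_threshold_mbps=100
)
```

In a deployment, each entry in `placement` would become a forwarding entry installed for that flow, while the flows in `small` keep using hashed paths.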
Research Directions
• Configuration management
  • Reducing the complexity of management tasks such as address configuration
• Traffic management
  • Supporting the various usage patterns of cloud applications
• Leveraging new network management paradigms such as SDN
Resource and Performance Management
• A cloud computing environment hosts a myriad of applications with diverse performance objectives
• How can resources be allocated effectively so that applications meet their performance objectives?
• Sub-problems
  • Performance modeling and management for each individual application
  • Run-time resource management
Application Performance Management
• An application owner needs to understand the performance model of the application, and adjust resource requirements according to workload conditions
  • E.g. increase the number of web server replicas to mitigate a flash crowd
[Figure: block diagram with Demand Prediction, Controller, Application and Performance Model blocks, plus Input and Output]
Application Performance Management (cont.)
• Using probabilistic / statistical methods
  • Queuing models
  • Machine learning
• Proactive vs. reactive control
  • Proactive control uses predicted demand to allocate resources before they are needed
  • Reactive control responds to immediate demand fluctuations when prediction is not available
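As one example of a queuing-based performance model, the sketch below sizes a replica pool by treating each replica as an independent M/M/1 queue with evenly split load. That modeling choice, the function names, and the rates are assumptions for illustration.

```python
import math


def mm1_response_time(arrival_rate, service_rate):
    """Mean response time of an M/M/1 queue; infinite when the queue is unstable."""
    if arrival_rate >= service_rate:
        return math.inf
    return 1.0 / (service_rate - arrival_rate)


def replicas_needed(total_arrival_rate, service_rate, target_response_time):
    """Smallest replica count whose per-replica M/M/1 response time meets the target."""
    # From 1 / (mu - lambda / c) <= T it follows that c >= lambda / (mu - 1 / T).
    headroom = service_rate - 1.0 / target_response_time
    if headroom <= 0:
        raise ValueError("target is not reachable at this service rate")
    return max(1, math.ceil(total_arrival_rate / headroom))


# Hypothetical web tier: 20 req/s per replica, 250 ms mean response-time target.
normal = replicas_needed(100, 20, 0.25)        # 7 replicas
flash_crowd = replicas_needed(400, 20, 0.25)   # 25 replicas
```

A proactive controller would call `replicas_needed` with the predicted arrival rate ahead of time; a reactive one would call it with the rate it currently measures.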
Data Center Resource Management
• Objectives
  • Mitigating performance bottlenecks (i.e. hotspots)
  • Improving application schedulability
  • Improving server utilization
  • Improving resource sharing among applications
  • Reducing energy cost
• Current approach: using various virtualization techniques
  • Dynamically adjusting the resource allocation of applications
  • Virtual machine migration
Data Center Resource Management (cont.)
• The optimal placement problem is a generalization of the multi-dimensional bin packing problem
  • NP-hard to solve
• Additional factors
  • Job arrival process
  • Job duration
  • Reconfiguration procedure and cost
    • E.g. cost of migration
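Since exact placement is NP-hard, heuristics are the usual fallback. Below is a minimal first-fit-decreasing sketch for two resource dimensions (CPU cores and memory); the heuristic, the ordering key, and the VM sizes are assumptions for illustration, and it ignores arrivals, durations, and migration cost.

```python
def first_fit_decreasing(vms, capacity):
    """Pack VMs onto identical servers; returns one list of VM names per server.

    vms maps a VM name to its (cpu, mem) demand; capacity is the (cpu, mem) of a server.
    """
    def size(demand):
        # Rank a VM by its largest demand relative to server capacity.
        return max(d / c for d, c in zip(demand, capacity))

    servers = []   # each entry: [remaining capacity per dimension, VM names]
    for name, demand in sorted(vms.items(), key=lambda kv: size(kv[1]), reverse=True):
        if any(d > c for d, c in zip(demand, capacity)):
            raise ValueError(f"{name} does not fit on an empty server")
        for free, names in servers:
            if all(d <= f for d, f in zip(demand, free)):
                for dim, d in enumerate(demand):
                    free[dim] -= d
                names.append(name)
                break
        else:
            # No open server has room in every dimension, so open a new one.
            servers.append([[c - d for c, d in zip(capacity, demand)], [name]])
    return [names for _, names in servers]


vms = {"a": (4, 16), "b": (4, 8), "c": (2, 16), "d": (6, 8), "e": (2, 4)}
placement = first_fit_decreasing(vms, capacity=(8, 32))
```

A VM fits only if every dimension fits, which is what separates this from one-dimensional bin packing: a server can have spare memory yet be unusable because its CPU is exhausted.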
Research Directions
• Understanding application resource requirements
  • E.g. workload characterization, application performance analysis
• Resource management frameworks for data-center-wide workloads
• Multi-tenancy issues
  • Application owners and cloud owners may have conflicting objectives
Energy Management
• Reducing energy consumption is a critical objective of cloud computing
• Power and cooling costs constitute a large portion of data center expenditure
  • 25%-30% of total data center operational cost
• Government regulations call for environmentally friendly (i.e. green) data centers
Cost of Consumption
• Power and cooling cost millions of dollars monthly
[Figure: Estimated monthly operational expenditure of a 50k-machine data center. Source: http://perspectives.mvdirona.com/]
Reducing Energy Cost
• Server consolidation
  • Reducing the number of servers in use by turning off unused servers
• Energy-aware scheduling
  • Scheduling jobs to reduce power and cooling costs
• Energy-efficient networks
  • Dynamically adjusting active network elements to reduce power cost
Server Consolidation
• Consolidating application workloads onto a smaller number of servers to save server power cost
• However, consolidation increases resource contention among applications, which may hurt their performance
• Challenges
  • Understanding the energy and performance impact of consolidation
  • Devising effective policies that achieve a good trade-off between performance and power cost
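The energy side of the trade-off can be estimated with a linear server power model, in which an idle server still draws a fixed base power. The model and the 100 W idle / 200 W peak figures are assumptions for illustration, not measurements from the slides.

```python
def server_power(utilization, p_idle=100.0, p_peak=200.0):
    """Power draw in watts of one powered-on server, under a linear model."""
    return p_idle + (p_peak - p_idle) * utilization


def cluster_power(utilizations, p_idle=100.0, p_peak=200.0):
    """Total power of the powered-on servers; servers not listed are assumed off."""
    return sum(server_power(u, p_idle, p_peak) for u in utilizations)


spread = cluster_power([0.25, 0.25, 0.25, 0.25])   # 500.0 W across four servers
consolidated = cluster_power([1.0])                # 200.0 W on one server
```

The saving comes entirely from the idle power of the three servers that are switched off. The model says nothing about the contention that a fully utilized server imposes on its applications, which is the performance cost mentioned above.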
Energy-aware Workload Scheduling
• Power-aware scheduling
  • Schedule jobs to minimize server power consumption
  • E.g. leveraging Dynamic Voltage and Frequency Scaling (DVFS) to reduce server power consumption
• Thermal-aware scheduling
  • Schedule jobs to minimize overall data center temperature
  • E.g. scheduling jobs to reduce server temperature so as to reduce cooling cost
Energy Efficient Networks
• Objective: making data center networks energy-proportional
  • Make energy cost proportional to network utilization
• Approach: given the current network condition, dynamically adjust active network elements to reduce power cost
  • Powering down unneeded switches and links
  • Adjusting link rate
    • Many modern switch models (e.g. InfiniBand) support more than one operating range
Research Directions
• Effectively leveraging the latest hardware and software technologies to achieve large energy cost reductions
• Achieving a good trade-off between performance and energy cost
  • E.g. reducing the CPU clock rate using DVFS slows down job execution
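The DVFS trade-off can be illustrated with a textbook approximation: dynamic power grows with the cube of clock frequency (assuming voltage scales linearly with frequency), while runtime shrinks in proportion to it. The constant `k`, the cycle count, and the neglect of static power are all simplifying assumptions.

```python
def dvfs_runtime_and_energy(cycles, freq_ghz, k=1.0):
    """Return (runtime in seconds, dynamic energy in joules) for a CPU-bound job.

    k is a hypothetical chip constant in watts per GHz^3.
    """
    runtime_s = cycles / (freq_ghz * 1e9)
    power_w = k * freq_ghz ** 3
    return runtime_s, power_w * runtime_s


fast = dvfs_runtime_and_energy(cycles=2e9, freq_ghz=2.0)   # (1.0 s, 8.0 J)
slow = dvfs_runtime_and_energy(cycles=2e9, freq_ghz=1.0)   # (2.0 s, 2.0 J)
```

Under this model, halving the frequency quarters the dynamic energy but doubles the runtime. Once static power is included, the longer runtime erodes some of that saving, which is part of why the trade-off is hard to tune.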
Pricing and Economics
• Cloud computing is a realization of utility computing
  • Provides storage and computing resources under a usage-based pricing model
• Demand is highly volatile in cloud environments
  • Low resource demand causes low server utilization
  • High resource demand results in unsatisfied demand, which causes customer dissatisfaction
Pricing and Economics (cont.)
• Approach: using a market economy to shape demand
  • Dynamically adjust resource supply and price
• Increase the price when demand spikes
  • Ensures resources are allocated to the users who need them most
  • Provides an incentive for customers to reduce demand
• Reduce the price when demand is low
  • Incentivizes customers to increase demand
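One simple way to express this feedback is a proportional price update driven by excess demand. The update rule, the `sensitivity` parameter, and the price floor below are hypothetical choices for illustration, not a scheme from the slides.

```python
def adjust_price(price, demand, supply, sensitivity=0.5, floor=0.01):
    """Raise the price when demand exceeds supply and lower it otherwise."""
    excess = (demand - supply) / supply   # relative excess demand
    return max(floor, price * (1.0 + sensitivity * excess))


spike_price = adjust_price(0.10, demand=150, supply=100)   # about 0.125
slack_price = adjust_price(0.10, demand=50, supply=100)    # about 0.075
```

Applied once per pricing interval, the rule nudges the price toward the level at which demand matches supply. The floor keeps the price from collapsing to zero during long idle periods.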
Amazon EC2 Spot Market
• Amazon EC2 launched its spot instance service in Dec. 2009
• The price of resources fluctuates with supply and demand
• Customers specify their bids in their resource requests
• A market-based mechanism decides the final price and assigns resources to customers
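The slides do not say which mechanism Amazon uses, so the sketch below shows only one plausible market-based rule: a uniform-price auction in which the highest bidders win and all pay the highest losing bid (or a reserve price when supply is not exhausted). The function name, the one-instance-per-customer simplification, and the bid values are assumptions.

```python
def clear_spot_market(bids, supply, reserve_price):
    """Return (winning customers, uniform price) for one auction round.

    bids maps a customer to a per-instance bid; each customer requests one instance.
    """
    eligible = sorted(
        ((bid, customer) for customer, bid in bids.items() if bid >= reserve_price),
        key=lambda pair: pair[0],
        reverse=True,
    )
    winners = [customer for _, customer in eligible[:supply]]
    losers = eligible[supply:]
    # Everyone pays the highest losing bid, or the reserve if nobody lost.
    price = max(reserve_price, losers[0][0]) if losers else reserve_price
    return winners, price


bids = {"a": 0.10, "b": 0.07, "c": 0.05, "d": 0.03}
winners, price = clear_spot_market(bids, supply=2, reserve_price=0.02)
```

Here `a` and `b` win and both pay 0.05, the bid of the first excluded customer. Charging the highest losing bid rather than each winner's own bid is a common way to encourage truthful bidding.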
[Figure: Price of an m1.small Linux spot instance in US-West-1, Sept. 24 to Sept. 30, 2010. Source: www.cloudexchange.org]
Market-Oriented Resource Allocation
• Objectives
  • Truthful, fair and revenue-maximizing
• Additional considerations
  • Support price discovery
    • Providing historical prices
  • Easy to compute
    • Solving NP-hard problems in real time is not preferred
Research Directions
• Designing and analyzing pricing schemes for cloud computing
  • Satisfying all of the previous objectives is difficult
  • Most existing work uses auction mechanisms, but mostly focuses on single-round auctions
  • Need to understand the dynamics of multi-round repeated auctions
• More general pricing schemes
  • Packaging
  • Volume discounts
Security Management
• Cloud customers are concerned about the privacy and confidentiality of their data and applications in the cloud
• Security risks
  • Information leakage and theft by
    • Adversarial users in the cloud
    • Cloud providers
  • Attacks within data centers
    • Performance interference and disruption
    • Denial of Service (DoS) attacks
Security Management (cont.)
• Security in traditional environments
  • Application owners can modify the security settings of the underlying fabric
• Security in cloud computing environments
  • The underlying fabric is operated by the cloud infrastructure provider
  • Individual application owners cannot directly modify security settings
  • Different stakeholders may have conflicting interests
Security in the Cloud
• Establishing trust between cloud providers and customers
  • The cloud provider continuously monitors and audits customers' VMs
  • Customer privacy enforcement through attestation
    • Relying on a Trusted Platform Module (TPM)
    • Using non-forgeable hardware signatures to prove that no non-privileged memory access has occurred
• Auditability must be mutual between providers and customers
  • Since both sides can be malicious
Research Directions
• Supporting fine-grained security requirements
  • Different users will have different security needs
• Eliminating sources of information leakage
  • E.g. side channels through the memory cache
• Minimizing the impact of auditing on performance
Migration of Enterprise Applications
• Outsourcing (or partially outsourcing) enterprise infrastructure to the cloud is a growing trend in the industry
  • Reducing capital investment and maintenance cost
• Challenges
  • Finding a cost-effective strategy for outsourcing
  • Integration with existing business infrastructure
  • Security and privacy
Research Directions
• Selecting cloud services among multiple providers
  • Evaluating service offerings in terms of performance, reliability, security and cost
• Outsourcing strategies
  • Determining the components to be outsourced
  • Migration plan and policy configuration
Summary
• The advent of cloud computing brings not only significant benefits but also research challenges
• Cloud computing is an active research area in networks and distributed systems
  • Many key issues remain to be resolved
  • Many research opportunities remain to be discovered