15-319 / 15-619 Cloud Computing
Recitation 8
October 16, 2018
Overview
● Last week’s reflection
○ Project 3.2
○ OLI Unit 3 - Module 13
○ Quiz 6
● This week’s schedule
○ Project 3.3
○ OLI Unit 4 - Module 14 (Storage)
○ Quiz 7 - Thursday, Oct 18
○ Intro. to Scala Primer and Intro. to Apache Spark Primer
● Team Project, Phase 1
○ Checkpoint 1 report is due on Sunday!
○ Q1 early bird bonus is due on Sunday
Last Week
● OLI: Module 13 - Storage and network virtualization
○ Quiz 6
● Project 3.2
○ Social Networking Timeline with Heterogeneous Backends
■ MySQL
■ Neo4j
■ MongoDB
■ Choosing Databases
● Consistency Programming Exercise on Cloud9
This Week
● OLI: Module 14 - Cloud Storage
● Quiz 7 - Thursday, Oct 18 (Not Friday!)
● Project 3.3 - Sunday, October 21
○ Task 1: Implement a Strong Consistency Model for distributed data stores
○ Task 2: Implement a Strong Consistency Model for cross-region data stores
○ Bonus task: Implement an Eventual Consistency Model
● Primers released this week
○ Introduction to Scala Primer
○ Introduction to Apache Spark Primer
Conceptual Topics - OLI Content
● OLI Unit 4 - Module 14: Cloud Storage
○ File Systems and Databases
○ Scalability and Consistency
○ NoSQL, NewSQL and Object Storage
● Quiz 7
○ Due on Thursday, October 18
■ Remember to hit submit before the deadline!
Individual Projects
● Done
○ P3.1: Files vs. Databases - comparison and usage of flat files, MySQL, Redis, and HBase
○ NoSQL Primer, HBase Basics Primer
● Done
○ P3.2: Social networking with heterogeneous backends
○ MongoDB Primer
● Now
○ P3.3: Replication and Consistency models
○ Intro. to Java Multithreading Primer
○ Thread-safe Programming Primer
○ Intro. to Consistency Models Primer
Scale of Data is Growing
International Data Corporation's (IDC) Digital Universe Study predicts an increase in the amount of data created globally from 16 zettabytes in 2016 to 160 zettabytes in 2025.
Guo H. Big Earth data: A new frontier in Earth and information sciences[J]. Big Earth Data, 2017, 1(1-2): 4-20.
Users are Global
● Speed of light (≈3.00×10⁸ m/s)
● Inherent latencies
[Figure: map showing Pittsburgh, Moscow, and San Francisco, with latency labels of ~26ms and ~14ms]
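The speed of light puts a floor under network latency: a signal cannot cover a distance faster than distance divided by c. The sketch below works that floor out in Java. The class name, method name, and the 7,800 km distance are illustrative assumptions, not values from the slides. At that distance the bound comes to about 26 ms one way.

```java
/** Lower bound on one-way network latency imposed by the speed of light. */
final class PropagationDelay {
    /** Speed of light in metres per second, using the value quoted on the slide. */
    static final double SPEED_OF_LIGHT_M_PER_S = 3.00e8;

    /** Returns the minimum one-way delay in milliseconds for a distance in kilometres. */
    static double oneWayMillis(double distanceKm) {
        double seconds = (distanceKm * 1000.0) / SPEED_OF_LIGHT_M_PER_S;
        return seconds * 1000.0;
    }

    public static void main(String[] args) {
        // 7,800 km is an illustrative distance, not a figure taken from the slides.
        System.out.printf("%.1f ms%n", oneWayMillis(7800.0));
    }
}
```

Real links are slower than this bound because signals travel below c in fiber and take indirect routes through routers.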
Typical End-To-End Latency
● The client sends the request to the server
○ Network latency
● The backend processes the request and sends the response
○ Overhead of fetching and processing data from the backend
○ Network latency
● The client receives the response
Latency with a Single Backend
[Figure: Client 1 (San Francisco), Client 2 (Pittsburgh), and Client 3 (Moscow) all connect to a single Backend Storage, with latencies of ~20ms, ~40ms, and ~320ms]
Client Statistics:
● Min Latency: 20ms
● Max Latency: 320ms
● Average Latency: ≈126.7ms
Replicate the Data Globally
[Figure: Client 1 (San Francisco), Client 2 (Pittsburgh), and Client 3 (Moscow) connect to Backend Storage 1 (USA West) and Backend Storage 2 (Europe Central), with latencies of ~20ms, ~40ms, and ~20ms]
Client Statistics:
● Min Latency: 20ms
● Max Latency: 40ms
● Average Latency: ≈26.7ms
Replicate the Data Close to Users
[Figure: Client 1 (San Francisco), Client 2 (Pittsburgh), and Client 3 (Moscow) each connect to a nearby replica among Backend Storage 1 (USA West), Backend Storage 2 (Europe Central), and Backend Storage 3 (USA East), each with ~20ms latency]
Client Statistics:
● Min Latency: 20ms
● Max Latency: 20ms
● Average Latency: 20ms
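The client statistics on the three replication slides can be recomputed from the per-client latencies. The sketch below does that with the JDK's `IntSummaryStatistics`. The class name and scenario labels are made up for illustration.

```java
import java.util.Arrays;
import java.util.IntSummaryStatistics;

/** Recomputes min, max, and average client latency for each replication scenario. */
final class ClientLatencyStats {
    /** Summarizes a set of per-client latencies given in milliseconds. */
    static IntSummaryStatistics summarize(int... latenciesMs) {
        return Arrays.stream(latenciesMs).summaryStatistics();
    }

    public static void main(String[] args) {
        // Per-client latencies as shown on the three slides.
        String[] names = {"single backend", "two replicas", "three replicas"};
        int[][] scenarios = {{20, 40, 320}, {20, 40, 20}, {20, 20, 20}};
        for (int i = 0; i < scenarios.length; i++) {
            IntSummaryStatistics s = summarize(scenarios[i]);
            System.out.printf("%s: min=%dms max=%dms avg=%.1fms%n",
                    names[i], s.getMin(), s.getMax(), s.getAverage());
        }
    }
}
```

The averages are 380/3 ≈ 126.7 ms, 80/3 ≈ 26.7 ms, and 20 ms, so each added replica lowers the average latency seen by the clients.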
Demo
Run:
• ping www.cmu.edu
• ping www.google.com
• ping www.berkeley.edu
• ping www.nus.edu.sg
Compare the latencies of these global webpages!
Replication
● As you can see, by adding replicas at strategic locations around the world, we can significantly reduce the latency seen by our global clients
● Each added datacenter decreases the average latency
● But how about the cost?
What If We Continue to Replicate?
Client Statistics:
● Min Latency: ??
● Max Latency: ??
● Average Latency: ??
Cost: ?????
We have to consider cost as well as data consistency across replicas. Keeping the replicas consistent increases the latency of writes.
Replication READ
[Figure: Client 1 (San Francisco), Client 2 (Pittsburgh), and Client 3 (Moscow) each read from a nearby replica among Backend Storage 1 (USA West), Backend Storage 2 (USA East), and Backend Storage 3 (Europe Central), each with ~20ms latency]
Read Operation:
● Min Latency: 20ms
● Max Latency: 20ms
● Average Latency: 20ms
Replication WRITE
[Figure: Client 2 (Pittsburgh) writes to Backend Storage 2 (USA East) in ~20ms; the update then reaches Backend Storage 1 (USA West) and Backend Storage 3 (Europe Central), with latencies of ~40ms and ~240ms]
Write Operation:
Latency for Client 2 = 20ms + MAX(40ms, 240ms) = 260ms
All the clients suffer from long latency
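The write-latency formula on this slide generalizes to any number of replicas: the client's latency to its nearest replica, plus the slowest propagation to the remaining replicas when those updates are sent in parallel. A minimal sketch follows. The class and method names are made up for illustration.

```java
import java.util.Arrays;

/** Models the latency of a write that must reach every replica before it completes. */
final class ReplicatedWriteLatency {
    /**
     * Returns the client-to-nearest-replica latency plus the slowest of the
     * replica-to-replica propagation latencies, assuming they run in parallel.
     */
    static int writeLatencyMs(int clientToReplicaMs, int... replicaToReplicaMs) {
        int slowest = Arrays.stream(replicaToReplicaMs).max().orElse(0);
        return clientToReplicaMs + slowest;
    }

    public static void main(String[] args) {
        // Values from the slide: 20ms to the nearest replica, then 40ms and 240ms to the others.
        System.out.println(writeLatencyMs(20, 40, 240) + "ms"); // prints "260ms"
    }
}
```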
Replication Reads and Writes
● Read operations are very fast!
○ All clients have a replica close to them to access
● Write requests are quite slow
○ Write requests must update all the replicas
○ If there are multiple write requests for the same key, they may have to wait for each other to complete
Pros and Cons of Replication
● Duplicate the data across multiple instances
● Advantages
○ Low latency for reads
○ Reduces the workload of a single backend server (load balancing for hot keys)
○ Handles node failures (high availability)
● Disadvantages
○ Requires more storage capacity and cost
○ Updates are slower
○ Changes must be reflected on all datastores, either instantly or eventually (data consistency)
Data Consistency Becomes Necessary
● Data consistency across replicas is important
○ Five consistency levels: Strict, Strong (Linearizability), Sequential, Causal and Eventual Consistency
● This week’s task: Implement Strong Consistency
○ All datastores must return the same value for a key at all times
○ The order in which the values are updated must be preserved at all replicas
● Bonus: Implement Eventual Consistency
Choosing a Consistency Level: Bad Example
● Account xxxxx-4437 has a balance of $100
● Two requests to withdraw $100 are issued
● Both withdrawals pay out $100 and the balance becomes $0
● Bank lost $100
Choosing a Consistency Level: Good Example
● Account xxxxx-4437 has a balance of $100
● Two requests to withdraw $100 are issued
● One withdrawal pays out $100, the other pays out $0, and the balance becomes $0
P3.3: Consistency Models
Tradeoff: Consistency vs. Latency
● Strict
● Strong
● Sequential
● Causal
● Eventual
P3.3 Task 1: Strong Consistency
Coordinator:
● A request router that routes web requests from the clients to the datacenters
● Preserves the order of both READ and WRITE requests
Datastore:
● The actual backend storage that persists collections of data
P3.3 Task 1: Strong Consistency
Single PUT request for key ‘X’
● Block all GET requests for key ‘X’ until all datastores are updated
● GET requests for a different key ‘Y’ should not be blocked
Multiple PUT requests for ‘X’
● Resolved in order of their timestamps when received by the Coordinator
● Any GET request between two PUTs must return the value of the first PUT
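One way to picture the per-key blocking rule is a lock per key: a PUT holds the key's write lock until every replica is updated, while GETs share a read lock. The sketch below is an in-memory illustration only, not the project's solution. `PerKeyLockingStore` and its replica maps are hypothetical stand-ins for the real coordinator and datastores, and a fair lock only approximates arrival order rather than enforcing the timestamp ordering the task requires.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Illustrative per-key locking over three in-memory replicas (hypothetical stand-ins). */
final class PerKeyLockingStore {
    private final List<Map<String, String>> replicas = List.of(
            new ConcurrentHashMap<String, String>(),
            new ConcurrentHashMap<String, String>(),
            new ConcurrentHashMap<String, String>());

    // One fair lock per key, so operations on 'X' never block operations on 'Y'.
    private final Map<String, ReentrantReadWriteLock> locks = new ConcurrentHashMap<>();

    private ReentrantReadWriteLock lockFor(String key) {
        return locks.computeIfAbsent(key, k -> new ReentrantReadWriteLock(true));
    }

    /** Updates every replica while holding the key's write lock, blocking GETs for that key. */
    void put(String key, String value) {
        ReentrantReadWriteLock lock = lockFor(key);
        lock.writeLock().lock();
        try {
            for (Map<String, String> replica : replicas) {
                replica.put(key, value);
            }
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Reads from one replica; waits only if a PUT for the same key is in progress. */
    String get(String key, int replicaIndex) {
        ReentrantReadWriteLock lock = lockFor(key);
        lock.readLock().lock();
        try {
            return replicas.get(replicaIndex).get(key);
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

Because a GET cannot acquire the read lock while a PUT holds the write lock, no replica can serve a half-propagated value for that key.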
P3.3 Task 2: Architecture
Global Coordinators and Data Stores
[Figure: three regions (us-west, us-east, and Singapore), each showing a DCI with a coordinator and a datacenter]
P3.3 Tasks 1 & 2: Strong Consistency
● Note: Every request has a timestamp that defines a global order
○ In Task 1, the timestamp is issued by the coordinator
○ In Task 2, the timestamp is issued by the TrueTime Server
● Operations must be ordered by their timestamps
Requirement: At any given point in time, all clients should read the same data from any datacenter replica
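The reason for ordering by timestamp can be seen in a small sketch: if every replica applies the same PUTs in timestamp order, all replicas end in the same state regardless of the order in which the messages arrived. The `Put` type and class name below are hypothetical and are not part of the project code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Shows that applying PUTs in timestamp order makes replicas converge. */
final class TimestampOrdering {
    /** A PUT tagged with the timestamp it was issued (hypothetical type). */
    static final class Put {
        final long timestamp;
        final String key;
        final String value;

        Put(long timestamp, String key, String value) {
            this.timestamp = timestamp;
            this.key = key;
            this.value = value;
        }
    }

    /** Applies the PUTs to a fresh replica in timestamp order, ignoring arrival order. */
    static Map<String, String> applyInTimestampOrder(List<Put> arrived) {
        List<Put> ordered = new ArrayList<>(arrived);
        ordered.sort(Comparator.comparingLong((Put p) -> p.timestamp));
        Map<String, String> replica = new HashMap<>();
        for (Put p : ordered) {
            replica.put(p.key, p.value);
        }
        return replica;
    }

    public static void main(String[] args) {
        Put first = new Put(1, "X", "a");
        Put second = new Put(2, "X", "b");
        // Two replicas receive the same PUTs in different arrival orders.
        Map<String, String> replicaA = applyInTimestampOrder(Arrays.asList(first, second));
        Map<String, String> replicaB = applyInTimestampOrder(Arrays.asList(second, first));
        System.out.println(replicaA.equals(replicaB)); // prints "true"
    }
}
```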
P3.3 Task 2: Architecture
Task 2 Workflow and Example
• Launch a total of 8 machines (3 data centers, 3 coordinators, 1 TrueTime server and 1 client).
• All machines should be launched in the US East region. We will simulate global latencies for you.
• The “US East” here has nothing to do with the simulated location of datacenters and coordinators in the project.
• Implement the code for the Coordinators and Datastores
PRECOMMIT
● This API method contacts the datastores of a given region and notifies them that a PUT request is being serviced for the specified key, starting at the specified timestamp.
P3.3 Task 2: Complete KeyValueStore.java (in DCs) and Coordinator.java (in Coordinators)
[Figure: a Client, a TrueTime Server, three coordinators (US-EAST, US-WEST, SINGAPORE), and three datacenters (US-EAST DC, US-WEST DC, SINGAPORE DC). Successive slides annotate a single PUT request with the following labels, in this order:]
● put?key=X&value=1
● KeyValueLib.getTime()
● precommit?key=X&timestamp=1
● PUT(REGIONAL-DNS, "X", "1", 1, "strong")
● Response back
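One possible way for a datastore to use PRECOMMIT notifications is to remember, per key, which PUT timestamps have been announced but not yet applied, and to hold back any operation on that key with a later timestamp. The sketch below is a guess at such bookkeeping under those assumptions. `PendingPutTracker` and its methods are hypothetical, and it does not reproduce the project's `KeyValueLib` API or the real `KeyValueStore.java`.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListSet;

/** Hypothetical datastore-side bookkeeping for PRECOMMIT notifications. */
final class PendingPutTracker {
    // For each key, the timestamps of PUTs that were announced but not yet applied.
    private final Map<String, ConcurrentSkipListSet<Long>> pending = new ConcurrentHashMap<>();

    /** Records that a PUT for this key with this timestamp is on its way. */
    void precommit(String key, long timestamp) {
        pending.computeIfAbsent(key, k -> new ConcurrentSkipListSet<Long>()).add(timestamp);
    }

    /** Clears the record once the PUT with this timestamp has been applied locally. */
    void committed(String key, long timestamp) {
        ConcurrentSkipListSet<Long> timestamps = pending.get(key);
        if (timestamps != null) {
            timestamps.remove(timestamp);
        }
    }

    /** An operation may proceed only if no earlier PUT on the same key is still pending. */
    boolean mayProceed(String key, long timestamp) {
        ConcurrentSkipListSet<Long> timestamps = pending.get(key);
        // lower() returns the greatest pending timestamp strictly below the argument, or null.
        return timestamps == null || timestamps.lower(timestamp) == null;
    }
}
```

A caller would check `mayProceed` before serving a GET or applying a PUT and wait (for example on a condition variable) while it returns false. That waiting logic is left out of this sketch.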
P3.3: Eventual Consistency (Bonus)
● Write requests are performed in the order received by the local coordinator
○ Operations are not blocked waiting for replica consensus (no communication between servers across regions)
● Clients that request data may receive multiple versions of the data, or stale data
○ Problems left for the application owner to resolve
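The stale reads described above can be reproduced with a toy model: a write is applied to the local replica immediately and queued for the remote one, so a remote read before delivery returns the old value. This is a single-threaded sketch in which the class name and the explicit `replicate()` call are assumptions standing in for background replication. It is not the project's implementation.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

/** Toy model of eventual consistency between a local and a remote replica. */
final class EventualReplicaPair {
    private final Map<String, String> local = new HashMap<>();
    private final Map<String, String> remote = new HashMap<>();
    // Updates applied locally but not yet delivered to the remote replica.
    private final Queue<String[]> outbox = new ArrayDeque<>();

    /** Applies the write locally at once and queues it for later delivery. */
    void put(String key, String value) {
        local.put(key, value);
        outbox.add(new String[] {key, value});
    }

    String readLocal(String key) {
        return local.get(key);
    }

    /** May return a stale value (or null) until replicate() has run. */
    String readRemote(String key) {
        return remote.get(key);
    }

    /** Delivers queued updates in write order; stands in for background replication. */
    void replicate() {
        String[] update;
        while ((update = outbox.poll()) != null) {
            remote.put(update[0], update[1]);
        }
    }
}
```

For example, after `put("X", "1")`, `replicate()`, and `put("X", "2")`, a local read returns "2" while a remote read still returns "1" until `replicate()` runs again.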
More Hints
● In strong consistency, “PRECOMMIT” should be useful to help you lock requests, because PRECOMMIT messages are able to communicate with the datastores
● Don’t wait for PRECOMMIT messages that might be sent from other coordinators halfway, or you will not pass all the test cases
● Lock by key across all the datacenters in strong consistency
● Remember to update both KeyValueStore.java and Coordinator.java in Eventual Consistency
Suggestions
● Read all three primers (PLEASE!)
● Consider the differences between the 2 consistency models before writing code
● Think about possible race conditions
● Read the hints in the writeup carefully
● Don’t modify any class except Coordinator.java and KeyValueStore.java
How to Run Your Program
● Run “./copy_code_to_instances” on the client instance to copy your code to the servers on each of the data center and coordinator instances.
● Run “./start_servers” on the client instance to start the servers on each of the data center instances, coordinator instances and the TrueTime server instance.
● Use “./consistency_checker strong” or “./consistency_checker eventual” to test your implementation of each consistency model. (Our grader uses the same checker.)
● If you want to test one simple PUT/GET request, you can send the request directly to the datacenters or coordinators.
Start early!
Trickiest Individual Project!
Twitter Data Analytics: Team Project
Team Project
Twitter Analytics Web Service
• Given ~1TB of Twitter data
• Build a performant web service to analyze tweets
• Explore web frameworks
• Explore and optimize storage systems
Team Project
● Phase 1
○ Q1
○ Q2 (MySQL AND HBase)
● Phase 2
○ Q1
○ Q2 & Q3 (MySQL AND HBase)
● Phase 3
○ Q1
○ Q2 & Q3 (MySQL OR HBase OR ???)
Input your team account ID and GitHub username on TPZ
Team Project Deadlines
● Phase 1 milestones:
○ Checkpoint 1:
■ Report, due on Sunday, 10/14
○ Checkpoint 2:
■ Q1 on scoreboard, due on Sunday, 10/21
○ Phase 1 Deadline:
■ Q2 on scoreboard, due on Sunday, 10/28
○ Phase 1, code and report:
■ Due on Tuesday, 10/30
Web Frameworks
● Java: Vert.x, Undertow, Rapidoid, Spring Boot
● Python: Flask, Django, Tornado
● JavaScript: Node.js
● Ruby: Ruby on Rails
Choosing a Web Framework
● Web Framework
○ Which one should I choose?
■ Consider:
● Performance is the top priority
● Performance is not the only criterion for choosing the web framework
Q1 FAQ, QR Decoding
● QR Code Q & A
○ Why is the string in the decoding example different from the encoding example?
■ Read the write-up carefully. We are asking you not only to decode a simple QR Code, but also to identify the QR Code in a 32*32 matrix.
○ What is the order of decoding a QR Code?
■ Apply the logistic map to the whole matrix from left to right, top to bottom
■ Locate the QR Code and recognize the rotation degree
■ Extract the payload and translate it
Q1 FAQ, QR Encoding
● QR Code Q & A
○ In the left part of the QR Code, why do the top and bottom position detection patterns have 8*9 cells (Figure 7)?
■ We made the 9th column blank to simplify the format information. What you need to do is follow the zigzag as shown in Figure 7 in the writeup.
Q1 FAQ, Misc
● QR Code Q & A
○ Do we need to consider the case where the payload happens to form a position detection pattern at the bottom right?
■ This will not happen since every 8 bits are followed by 1 correction bit. If the QR Code is valid, it won’t form a position detection pattern at the bottom right.
○ What type of instance should we use for submission?
■ M family, but not larger than the large series (e.g., m4.large, m5.large)
Profiling
● Benchmarking and Logging Tools
○ CloudWatch
○ Stopwatch (Java) & Log
○ New Relic
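As a rough illustration of the stopwatch-and-log approach, the sketch below times a piece of work with the JDK's `System.nanoTime()` and prints the elapsed time. It uses only the standard library instead of any particular stopwatch class, and `RequestTimer` and `timed` are made-up names.

```java
import java.util.function.Supplier;

/** Minimal stopwatch-style helper that logs how long a piece of work took. */
final class RequestTimer {
    /** Runs the work, logs its elapsed time in microseconds, and returns its result. */
    static <T> T timed(String label, Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            long elapsedMicros = (System.nanoTime() - start) / 1_000;
            System.out.println(label + " took " + elapsedMicros + " us");
        }
    }

    public static void main(String[] args) {
        // The summation is placeholder work standing in for a request handler.
        long sum = timed("sum", () -> {
            long total = 0;
            for (int i = 0; i < 1_000_000; i++) {
                total += i;
            }
            return total;
        });
        System.out.println(sum);
    }
}
```

Wrapping each stage of a request (parsing, storage lookup, response formatting) in such a timer helps show where the time goes before tuning anything.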
Team Project, Q2
● Query 2 is coming next week
○ ETL is the most costly part. Please review your ETL code carefully before running it.
○ Think about the schema before running ETL. Otherwise, you might have to rerun your ETL job.
● Read this good question on Piazza: https://piazza.com/class/jkvtywetsu35vh?cid=1914
Team Project Time Table
● Phase 1 (Q1, Q2)
○ Start: Monday 10/08/2018 00:00:00 ET
○ Deadline: Q1: Sunday 10/21/2018 23:59:59 ET; Q2: Sunday 10/28/2018 23:59:59 ET
○ Code and Report Due: Tuesday 10/30/2018 23:59:59 ET
● Phase 2 (Q1, Q2, Q3)
○ Start: Monday 10/29/2018 00:00:00 ET
○ Deadline: Sunday 11/11/2018 15:59:59 ET
● Phase 2 Live Test (HBase AND MySQL) (Q1, Q2, Q3)
○ Start: Sunday 11/11/2018 18:00:00 ET
○ Deadline: Sunday 11/11/2018 23:59:59 ET
○ Code and Report Due: Tuesday 11/13/2018 23:59:59 ET
● Phase 3 (Q1, Q2, Q3)
○ Start: Monday 11/12/2018 00:00:00 ET
○ Deadline: Sunday 12/02/2018 15:59:59 ET
● Phase 3 Live Test (HBase OR MySQL) (Q1, Q2, Q3)
○ Start: Sunday 12/02/2018 18:00:00 ET
○ Deadline: Sunday 12/02/2018 23:59:59 ET
○ Code and Report Due: Tuesday 12/04/2018 23:59:59 ET
Upcoming Deadlines
● Conceptual Topics: OLI (Module 14)
○ Quiz 7 due: Thursday, 10/18/2018 11:59 PM Pittsburgh
● P3.3: Consistency Models
○ Due: Sunday, 10/21/2018 11:59 PM Pittsburgh
● Team Project: Phase 1
○ Query 1
■ Due: 10/21/2018 11:59 PM Pittsburgh
○ Query 2 (next Sunday, Oct 28)
■ Due: 10/28/2018 11:59 PM Pittsburgh