CSC 536 Lecture 8

CSC 536 Lecture 8. Outline Reactive Streams Streams Reactive streams Akka streams Case study Google infrastructure (part I)


Outline

Reactive Streams
  Streams
  Reactive streams
  Akka streams

Case study
  Google infrastructure (part I)


Reactive Streams


Streams

Stream
  Process involving data flow and transformation
  Data possibly of unbounded size
  Focus on describing the transformation

Examples
  bulk data transfer
  real-time data sources
  batch processing of large data sets
  monitoring and analytics


Needed: Asynchrony

For fault tolerance:
  Encapsulation
  Isolation

For scalability:
  Distribution across nodes
  Distribution across cores

Problem: Managing data flow across an async boundary


Types of Async Boundaries

between different applications

between network nodes

between CPUs

between threads

between actors


Possible solutions

Traditional way: synchronous/blocking (possibly remote) method calls
  Does not scale

Push way: asynchronous/non-blocking message passing
  Scales!
  Problem: a fast sender forces the receiver to buffer messages or drop them

Reactive way: non-blocking and non-dropping


Supply and Demand

data items flow downstream
demand flows upstream
data items flow only when there is demand

recipient is in control of the incoming data rate
data in flight is bounded by signaled demand
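The supply-and-demand rule can be illustrated with a small Python sketch (Python is used here only for illustration; the lecture's own code is Scala). The class `DemandLink` and its method names are hypothetical and not part of any library. It assumes a single publisher and a single subscriber and ignores asynchrony entirely; it only shows that items move when demand has been signaled and that data in flight never exceeds that demand.

```python
from collections import deque


class DemandLink:
    """Hypothetical single publisher -> subscriber link with demand-based flow control."""

    def __init__(self, source):
        self.source = iter(source)   # upstream data, possibly unbounded
        self.demand = 0              # demand signaled by the subscriber, not yet served
        self.in_flight = deque()     # items delivered but not yet processed

    def request(self, n):
        """Demand flows upstream: the subscriber asks for n more items."""
        self.demand += n

    def publish(self):
        """Data flows downstream, but only while there is outstanding demand."""
        sent = 0
        while self.demand > 0:
            try:
                item = next(self.source)
            except StopIteration:
                break
            self.in_flight.append(item)
            self.demand -= 1
            sent += 1
        return sent

    def consume(self):
        """The subscriber processes one delivered item."""
        return self.in_flight.popleft()


link = DemandLink(range(1000))
link.request(3)
sent_first = link.publish()   # only 3 items move although 1000 are available
sent_again = link.publish()   # no new demand, so nothing moves
first = link.consume()
link.request(2)
sent_later = link.publish()
```

The recipient sets the pace: the publisher never sends more than the subscriber has asked for, so the buffer on the receiving side stays bounded.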


Dynamic Push-Pull

“push” behavior when the consumer is faster
“pull” behavior when the producer is faster
switches automatically between the two
batching demand allows batching data


Tailored Flow Control

Splitting the data means merging the demand


Tailored Flow Control

Merging the data means splitting the demand
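The two flow-control rules can be made concrete with a short Python sketch. The function names and the specific policies are assumptions chosen for illustration: a broadcast-style split that forwards the minimum of its consumers' demands, and a merge that divides incoming demand evenly among its upstreams. The slides state only the principle, and real libraries offer other policies.

```python
def merged_demand(downstream_demands):
    """Splitting data (broadcast): every consumer receives every item, so the
    stage may request only as much as its slowest consumer can accept."""
    return min(downstream_demands)


def split_demand(total, n_upstreams):
    """Merging data (fan-in): the demand received from downstream is divided
    among the upstreams so that the shares add up to exactly that demand."""
    base, extra = divmod(total, n_upstreams)
    return [base + (1 if i < extra else 0) for i in range(n_upstreams)]


upstream_request = merged_demand([5, 2, 8])
shares = split_demand(10, 3)
```

In both cases the stage never asks upstream for more than its downstream side has agreed to receive, so back pressure is preserved across the junction.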


Reactive Streams

Back-pressured asynchronous stream processing
  asynchronous non-blocking data flow
  asynchronous non-blocking demand flow
  Goal: minimal coordination and contention

Message passing allows for distribution
  across applications
  across nodes
  across CPUs
  across threads
  across actors


Reactive Streams Projects

Standard implemented by many libraries

Engineers from
  Netflix
  Oracle
  Red Hat
  Twitter
  Typesafe
  …

See http://reactive-streams.org


Reactive Streams

All participants had the same basic problem

All are building tools for their community

A common solution benefits everybody

Interoperability to make best use of efforts
  minimal interfaces
  rigorous specification of semantics
  full TCK (technology compatibility kit) for verifying implementations
  complete freedom for many idiomatic APIs


The underlying (internal) API

trait Publisher[T] {
  def subscribe(sub: Subscriber[T]): Unit
}

trait Subscription {
  // Signals demand for n more elements.
  // Reactive Streams 1.0 names this method request and takes a Long.
  def requestMore(n: Int): Unit
  def cancel(): Unit
}

trait Subscriber[T] {
  def onSubscribe(s: Subscription): Unit
  def onNext(elem: T): Unit
  def onError(thr: Throwable): Unit
  def onComplete(): Unit
}
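To show how the three interfaces interact, here is a minimal Python sketch of the same protocol. It is not the Reactive Streams API and not Akka code: the class names are hypothetical, the method names are Python spellings of the ones above, and everything runs synchronously on one thread for readability, whereas a real implementation dispatches signals asynchronously.

```python
class ListSubscription:
    """Hypothetical subscription that serves a fixed list of items on demand."""

    def __init__(self, items, subscriber):
        self._items = iter(items)
        self._subscriber = subscriber
        self._demand = 0
        self._draining = False
        self._done = False

    def request(self, n):
        if n <= 0:
            self._done = True
            self._subscriber.on_error(ValueError("demand must be positive"))
            return
        self._demand += n
        self._drain()

    def cancel(self):
        self._done = True

    def _drain(self):
        if self._draining:        # re-entrant request() from on_next: demand is recorded only
            return
        self._draining = True
        try:
            while self._demand > 0 and not self._done:
                try:
                    item = next(self._items)
                except StopIteration:
                    self._done = True
                    self._subscriber.on_complete()
                    break
                self._demand -= 1
                self._subscriber.on_next(item)
        finally:
            self._draining = False


class ListPublisher:
    """The publisher only creates subscriptions."""

    def __init__(self, items):
        self._items = list(items)

    def subscribe(self, subscriber):
        subscriber.on_subscribe(ListSubscription(self._items, subscriber))


class BatchingSubscriber:
    """Requests elements in batches and records every signal it receives."""

    def __init__(self, batch):
        self.batch = batch
        self.received = []
        self.completed = False
        self.error = None
        self._pending = 0

    def on_subscribe(self, subscription):
        self._subscription = subscription
        self._pending = self.batch
        subscription.request(self.batch)

    def on_next(self, elem):
        self.received.append(elem)
        self._pending -= 1
        if self._pending == 0:            # batch consumed: signal more demand
            self._pending = self.batch
            self._subscription.request(self.batch)

    def on_error(self, thr):
        self.error = thr

    def on_complete(self):
        self.completed = True


subscriber = BatchingSubscriber(batch=2)
ListPublisher([1, 2, 3, 4, 5]).subscribe(subscriber)
```

Note that the subscriber never receives an element it did not ask for: each `on_next` consumes one unit of the demand that `request` signaled earlier.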


The Process


Reactive Streams

All calls on Subscriber must dispatch async

All calls on Subscription must not block

Publisher is just there to create Subscriptions


Akka Streams

Powered by Akka Actors

Type-safe streaming through Actors with bounded buffering

Akka Streams API is geared towards end-users

Akka Streams implementation uses the Reactive Streams interfaces (Publisher/Subscriber) internally to pass data between the different processing stages


Examples

View slides 62-80 of http://www.slideshare.net/ktoso/reactive-streams-akka-streams-geecon-prague-2014

basic.scala

TcpEcho.scala

WritePrimes.scala


Overview of Google’s distributed systems


Original Google search engine architecture


More than just a search engine


Organization of Google’s physical infrastructure

40-80 PCs per rack (terabytes of disk space each)

30+ racks per cluster

Hundreds of clusters spread across data centers worldwide


System architecture requirements

Scalability

Reliability

Performance

Openness (at the beginning, at least)


Overall Google systems architecture


Google infrastructure


Design philosophy

Simplicity
  Software should do one thing and do it well

Provable performance
  “every millisecond counts”
  Estimate performance costs (accessing memory and disk, sending a packet over the network, locking and unlocking a mutex, etc.)

Testing
  “if it ain’t broke, you’re not trying hard enough”
  Stringent testing


Data and coordination services

Google File System (GFS)
  Broadly similar to NFS and AFS
  Optimized for the types of files and data access used by Google

Bigtable
  A distributed database that stores (semi-)structured data
  Just enough organization and structure for the type of data Google uses

Chubby
  A locking service (and more) for GFS and Bigtable


GFS requirements

Must run reliably on the physical platform
  Must tolerate failures of individual components
  So application-level services can rely on the file system

Optimized for Google’s usage patterns
  Huge files (typically 100+ MB; multi-GB files are common)
  Relatively small number of files
  Accesses dominated by sequential reads and appends
  Appends done concurrently

Meets the requirements of the whole Google infrastructure
  scalable, reliable, high performance, open
  Important: throughput has higher priority than latency


GFS architecture

Files are stored in 64 MB chunks in a cluster with
  a master node (operations log replicated on remote machines)
  hundreds of chunk servers

Each chunk is replicated 3 times


Reading and writing

When the client wants to access a particular offset in a file
  The GFS client translates this to a (file name, chunk index) pair
  And then sends it to the master

When the master receives the (file name, chunk index) pair
  It replies with the chunk identifier and replica locations

The client then accesses the closest chunk replica directly

No client-side caching of file data
  Caching would not help with the type of (streaming) access GFS has
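The read path can be sketched in a few lines of Python. This is an illustration under assumptions, not GFS code: the `Master` class, the `locate` function, the file name, the chunk identifier, the server names, and the distance values are all hypothetical. Only the 64 MB chunk size and the order of the steps come from the slides.

```python
CHUNK_SIZE = 64 * 1024 * 1024   # 64 MB chunks


def chunk_index(offset):
    """Client side: translate a byte offset into a chunk index."""
    return offset // CHUNK_SIZE


class Master:
    """Hypothetical in-memory stand-in for the master's chunk table."""

    def __init__(self):
        # (file name, chunk index) -> (chunk identifier, replica locations)
        self.table = {}

    def lookup(self, file_name, index):
        return self.table[(file_name, index)]


def locate(master, file_name, offset, distance):
    """Return (chunk id, closest replica, offset inside the chunk).

    distance is a caller-supplied function giving the client's distance
    to a replica; how closeness is measured is left open here.
    """
    chunk_id, replicas = master.lookup(file_name, chunk_index(offset))
    closest = min(replicas, key=distance)
    return chunk_id, closest, offset % CHUNK_SIZE


master = Master()
master.table[("/logs/crawl", 2)] = ("chunk-17", ["rackA/s1", "rackB/s4", "rackC/s9"])
hops = {"rackA/s1": 3, "rackB/s4": 1, "rackC/s9": 5}

result = locate(master, "/logs/crawl", 150 * 1024 * 1024, hops.get)
```

The master is involved only in the lookup; the data itself would then be read from the chosen chunk server directly.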


Keeping chunk replicas consistent


When the master receives a mutation request from a client
  the master grants a chunk replica a lease (that replica becomes the primary)
  and returns the identity of the primary and the other replicas to the client

The client sends the mutation data directly to all the replicas
  Replicas cache the mutation and acknowledge receipt

The client sends a write request to the primary
  The primary orders mutations and updates its chunk accordingly
  The primary then requests that the other replicas apply the mutations in the same order
  When all the replicas have acknowledged success, the primary reports an ack to the client
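The role of the primary can be sketched as follows in Python. The classes `Replica` and `Primary` and the mutation identifiers are hypothetical, leases and failures are left out, and all calls are local. The sketch shows only the key idea: data may reach the replicas in any order, but the primary fixes a single order in which every replica applies it.

```python
class Replica:
    """Hypothetical chunk replica."""

    def __init__(self):
        self.cache = {}   # mutation id -> data pushed by a client, not yet applied
        self.chunk = []   # applied mutations, in application order

    def push(self, mutation_id, data):
        self.cache[mutation_id] = data
        return True       # acknowledge receipt

    def apply(self, mutation_id):
        self.chunk.append(self.cache.pop(mutation_id))
        return True


class Primary(Replica):
    """The replica holding the lease; it chooses the serial order."""

    def __init__(self, secondaries):
        super().__init__()
        self.secondaries = secondaries
        self.order = []

    def write(self, mutation_id):
        self.order.append(mutation_id)      # order is fixed by arrival at the primary
        self.apply(mutation_id)
        acks = [s.apply(mutation_id) for s in self.secondaries]
        return all(acks)                    # ack to the client only if all succeeded


s1, s2 = Replica(), Replica()
primary = Primary([s1, s2])

# Two clients push their data to the replicas in different orders.
for r in (s1, primary, s2):
    r.push("m1", "A")
for r in (s2, s1, primary):
    r.push("m2", "B")

# The write requests happen to reach the primary as m2, then m1.
ok_m2 = primary.write("m2")
ok_m1 = primary.write("m1")
```

Because every replica applies mutations in the primary's order, all three end up with the same chunk contents when no failure occurs.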

What consistency model does this seem to implement?


GFS (non-)guarantees

Writes (at a file offset) are not atomic
  Concurrent writes to the same location may corrupt replicated chunks
  If any replica is left inconsistent, the write fails (and is retried a few times)

Appends are executed atomically “at least once”
  Offset is chosen by the primary
  May end up with non-identical replicated chunks, with some having duplicate appends
  GFS does not guarantee that the replicas are identical
  It only guarantees that some file regions are consistent across replicas

When needed, GFS relies on an external locking service (Chubby)
  As well as a leader election service (also Chubby) to select the primary replica
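The “at least once” append semantics can be illustrated with a simplified Python simulation. The functions `record_append` and `read_unique`, the failure injection, and the use of record identifiers are assumptions made for this sketch; in particular, it appends records to plain lists and does not model offsets or padding. It shows how a retried append leaves a duplicate on the replica that succeeded the first time, and how a reader can filter duplicates if each record carries a unique identifier.

```python
def record_append(replicas, record, fail_once_on=None):
    """Append record to every replica, retrying until one attempt succeeds everywhere.

    fail_once_on is the index of a replica that fails on the first attempt
    (a simulated transient failure). Returns the number of attempts.
    """
    failing = {fail_once_on} if fail_once_on is not None else set()
    attempts = 0
    while True:
        attempts += 1
        all_ok = True
        for i, chunk in enumerate(replicas):
            if i in failing:
                failing.discard(i)   # fails now, works on the retry
                all_ok = False
            else:
                chunk.append(record)
        if all_ok:
            return attempts


def read_unique(chunk):
    """Reader side: drop duplicates using the record identifier."""
    seen, unique = set(), []
    for record_id, payload in chunk:
        if record_id not in seen:
            seen.add(record_id)
            unique.append((record_id, payload))
    return unique


a, b = [], []
record_append([a, b], (1, "x"))
attempts = record_append([a, b], (2, "y"), fail_once_on=1)
```

After the retry the two replicas are not identical, yet every record appears at least once on each of them, which is the guarantee the slide describes.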


Bigtable

GFS provides raw data storage

Also needed:
  Storage for structured data ...
  ... optimized to handle the needs of Google’s apps ...
  ... that is reliable, scalable, high-performance, open, etc.


Examples of structured data

URLs:
  Content, crawl metadata, links, anchors, PageRank, ...

Per-user data:
  User preference settings, recent queries/search results, …

Geographic locations:
  Physical entities (shops, restaurants, etc.), roads, satellite image data, user annotations, …


Commercial DB

Why not use a commercial database?
  Not scalable enough
  Too expensive
  Full-featured relational database not required
  Low-level optimizations may be needed


Bigtable table

Implementation: sparse, distributed, multi-dimensional map
  (row, column, timestamp) → cell contents
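The data model can be mimicked with a nested dictionary. The Python class below is a hypothetical, single-machine illustration of the map, not Bigtable's implementation or API: it stores only the cells that exist (hence “sparse”) and keeps several timestamped versions per cell.

```python
class SparseTable:
    """Hypothetical in-memory model of (row, column, timestamp) -> cell contents."""

    def __init__(self):
        # (row, column) -> {timestamp: value}; cells never written take no space
        self.cells = {}

    def put(self, row, column, timestamp, value):
        self.cells.setdefault((row, column), {})[timestamp] = value

    def get(self, row, column, timestamp=None):
        """Return the newest version, or the newest one not after timestamp."""
        versions = self.cells.get((row, column), {})
        candidates = [t for t in versions if timestamp is None or t <= timestamp]
        return versions[max(candidates)] if candidates else None


table = SparseTable()
table.put("com.cnn.www", "contents:", 3, "<html>v3")
table.put("com.cnn.www", "contents:", 5, "<html>v5")
table.put("com.cnn.www", "anchor:com.cnn.www/sport", 4, "CNN Sports")

latest = table.get("com.cnn.www", "contents:")
as_of_4 = table.get("com.cnn.www", "contents:", timestamp=4)
missing = table.get("com.cnn.www", "anchor:com.cnn.www/weather")
```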


Rows

Each row has a key
  A string up to 64 KB in size
  Access to data in a row is atomic

Rows ordered lexicographically
  Rows close together lexicographically reside on the same machine or on nearby machines (locality)
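Lexicographic ordering is why row keys such as “com.cnn.www” store the host name with its components reversed: pages from the same domain then sort next to each other. The Python sketch below illustrates this; the helper names and the host list are made up for the example.

```python
import bisect


def row_key(host):
    """Reverse the components of a host name: www.cnn.com -> com.cnn.www."""
    return ".".join(reversed(host.split(".")))


def scan_prefix(sorted_keys, prefix):
    """Return the contiguous run of keys that start with prefix."""
    lo = bisect.bisect_left(sorted_keys, prefix)
    hi = bisect.bisect_left(sorted_keys, prefix + "\uffff")
    return sorted_keys[lo:hi]


hosts = ["www.cnn.com", "maps.google.com", "money.cnn.com",
         "www.google.com", "www.example.edu"]
keys = sorted(row_key(h) for h in hosts)

cnn_rows = scan_prefix(keys, "com.cnn.")
```

All rows of one domain form a single contiguous range, so reading them touches one machine or a few neighboring ones.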


Columns

Example row “com.cnn.www”:
  ‘contents:.’                  “<html>…”
  ‘anchor:com.cnn.www/sport’    “CNN Sports”
  ‘anchor:com.cnn.www/world’    “CNN world”

Columns have a two-level name structure:
  family:qualifier

Column family
  logical grouping of data
  groups an unbounded number of columns (named with qualifiers)
  may have a single column with no qualifier


Timestamps

Used to store different versions of data in a cell
  default to current time
  can also be set explicitly by the client

Garbage collection
  Per-column-family GC settings
  “Only retain most recent K values in a cell”
  “Keep values until they are older than K seconds”
  ...
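The two garbage-collection settings quoted above can be expressed as small functions over the versions of one cell. This Python sketch is illustrative only; the function names are hypothetical, and timestamps are treated as plain numbers of seconds.

```python
def keep_last_k(versions, k):
    """'Only retain most recent K values in a cell'.

    versions maps timestamp -> value for a single cell.
    """
    newest = sorted(versions, reverse=True)[:k]
    return {t: versions[t] for t in newest}


def keep_newer_than(versions, now, max_age):
    """'Keep values until they are older than K seconds' (K = max_age)."""
    return {t: v for t, v in versions.items() if now - t <= max_age}


cell = {10: "a", 20: "b", 30: "c"}
by_count = keep_last_k(cell, 2)
by_age = keep_newer_than(cell, now=35, max_age=10)
```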


API

Create / delete tables and column families

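// Open the table
Table *T = OpenOrDie("/bigtable/web/webtable");

// Write a new anchor and delete an old anchor in the same row
RowMutation r1(T, "com.cnn.www");
r1.Set("anchor:com.cnn.www/sport", "CNN Sports");
r1.Delete("anchor:com.cnn.www/world");

// Apply the row mutation atomically
Operation op;
Apply(&op, &r1);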


Bigtable architecture

An instance of Bigtable is a cluster that stores tables
  library on the client side
  master server
  tablet servers

Each table is decomposed into tablets


Tablets

A table is decomposed into tablets
  A tablet holds a contiguous range of rows
  100 MB - 200 MB of data per tablet
  A tablet server is responsible for ~100 tablets

Each tablet is represented by
  A set of files stored in GFS
    The files use the SSTable format, a mapping of (string) keys to (string) values
  Log files
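Because each tablet holds a contiguous range of rows, finding the tablet for a row key is a search over sorted range boundaries. The Python sketch below assumes each tablet is identified by the last row key it holds; the class `TabletMap`, the boundary keys, and the server names are hypothetical and are not taken from the slides.

```python
import bisect


class TabletMap:
    """Hypothetical index from row-key ranges to tablet servers."""

    def __init__(self, end_keys, servers):
        # end_keys[i] is the last row key stored in tablet i (inclusive).
        # The list must be sorted, and its last entry must cover all keys.
        self.end_keys = end_keys
        self.servers = servers

    def locate(self, row_key):
        """Return (tablet number, tablet server) for row_key."""
        i = bisect.bisect_left(self.end_keys, row_key)
        return i, self.servers[i]


tablets = TabletMap(
    end_keys=["com.cnn.www", "com.google.www", "\uffff"],
    servers=["ts1", "ts2", "ts3"],
)

first = tablets.locate("com.cnn.money")
boundary = tablets.locate("com.cnn.www")
middle = tablets.locate("com.example.www")
last = tablets.locate("org.wikipedia.en")
```

Splitting a tablet that has grown too large would amount to inserting one more boundary key into the sorted list.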


Tablet Server

Master assigns tablets to tablet servers

Tablet server
  Handles read/write requests to tablets from clients

No data goes through the master
  The Bigtable client requires a naming/locator service (Chubby) to find the root tablet, which is part of the metadata table
  The metadata table contains metadata about the actual tablets,
    including location information of the associated SSTables and log files


Master

Upon startup, must grab the master lock to ensure it is the single master of a set of tablet servers
  the lock is provided by the locking service (Chubby)

Monitors tablet servers
  periodically scans the directory of tablet servers provided by the naming service (Chubby)
  keeps track of the tablets assigned to its tablet servers
  obtains a lock on the tablet server from the locking service (Chubby)
    the lock is the communication mechanism between master and tablet server

Assigns unassigned tablets in the cluster to the tablet servers it monitors
  and moves tablets around to achieve load balancing

Garbage-collects the underlying files stored in GFS


BigTable tablet architecture

Each SSTable is an ordered and immutable mapping of keys to values


Tablet Serving

Writes are committed to a log
  Memtable: ordered log of recent commits (in memory)
  SSTables really store a snapshot

When the Memtable gets too big
  Create a new empty Memtable
  Merge the old Memtable with the SSTables and write the result to GFS
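The write and read paths of a tablet can be sketched in Python as follows. The class `Tablet` and its size limit are hypothetical, a single in-memory tuple stands in for the SSTable files in GFS, and recovery from the log is not shown. The sketch follows the steps on the slide: log the write, update the memtable, and merge the memtable into a new immutable snapshot once it grows too big.

```python
import bisect


class Tablet:
    """Hypothetical single-tablet store: commit log + memtable + one SSTable."""

    def __init__(self, memtable_limit=2):
        self.log = []          # every write is appended here first
        self.memtable = {}     # recent writes, kept in memory
        self.sstable = ()      # immutable, sorted (key, value) snapshot
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.log.append((key, value))
        self.memtable[key] = value
        if len(self.memtable) > self.memtable_limit:
            self._compact()

    def _compact(self):
        # Merge the old memtable with the snapshot; newer values win.
        merged = dict(self.sstable)
        merged.update(self.memtable)
        self.sstable = tuple(sorted(merged.items()))
        self.memtable = {}     # start a new, empty memtable

    def read(self, key):
        if key in self.memtable:          # the newest data is in memory
            return self.memtable[key]
        i = bisect.bisect_left(self.sstable, (key,))
        if i < len(self.sstable) and self.sstable[i][0] == key:
            return self.sstable[i][1]
        return None


t = Tablet(memtable_limit=2)
t.write("a", "1")
t.write("b", "2")
t.write("c", "3")      # third entry exceeds the limit and triggers a merge
t.write("a", "9")      # newer value lives in the fresh memtable
```

Reads consult the memtable before the snapshot, so the latest write is visible even though the snapshot itself is never modified in place.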


SSTable

Operations
  Look up the value for a key
  Iterate over all key/value pairs in a specified range

Bigtable relies on the lock service (Chubby) to
  Ensure there is at most one active master
  Administer tablet server death
  Store column family information
  Store access control lists


Chubby

Chubby provides to the infrastructure
  a locking service
  a file system for reliable storage of small files
  a leader election service (e.g. to select a primary replica)
  a name service

Seemingly violates “simplicity” design philosophy but...

... Chubby really provides an asynchronous distributed agreement service
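Leader election on top of a lock service reduces to “whoever holds the lock is the leader”. The Python sketch below shows that idea with a hypothetical in-memory `LockService`; it is not the Chubby API, the lock path is invented, and sessions, lease timeouts, and replication are all left out.

```python
class LockService:
    """Hypothetical in-memory stand-in for a lock service."""

    def __init__(self):
        self.holders = {}   # lock name -> client currently holding it

    def try_acquire(self, name, client):
        """Non-blocking: succeeds only if the lock is free or already ours."""
        return self.holders.setdefault(name, client) == client

    def release(self, name, client):
        if self.holders.get(name) == client:
            del self.holders[name]


def elect(service, lock_name, candidates):
    """Every candidate tries to take the same lock; the holder is the leader."""
    for candidate in candidates:
        service.try_acquire(lock_name, candidate)
    return service.holders[lock_name]


service = LockService()
lock = "/ls/example-cell/master-lock"     # hypothetical lock name

leader = elect(service, lock, ["m1", "m2", "m3"])
m2_got_it = service.try_acquire(lock, "m2")

service.release(lock, "m1")               # the old leader steps down
new_leader = elect(service, lock, ["m2", "m3"])
```

The same pattern matches the requirement that a master grab a master lock at startup: at most one candidate can hold the lock at any time.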


Chubby API


Overall architecture of Chubby

Cell: single instance of the Chubby system
  5 replicas
  1 master replica

Each replica maintains a database of directories and files/locks

Consistency is achieved using Lamport’s Paxos consensus protocol, which uses an operation log
  Chubby internally supports snapshots to periodically garbage-collect the operation log


Paxos distributed consensus algorithm

A distributed consensus protocol for asynchronous systems

Used by servers managing replicas in order to reach agreement on an update when
  messages may be lost, re-ordered, or duplicated
  servers may operate at arbitrary speed and may fail
  servers have access to stable persistent storage

Fact: consensus is not always possible in asynchronous systems
  Paxos works by ensuring safety (correctness) but not liveness (termination)
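The safety property can be seen in a compact Python sketch of single-value Paxos. This follows the textbook two-phase protocol (prepare/promise, then accept) and is not Chubby's implementation; the class and function names are hypothetical, all calls are local and synchronous, and message loss, persistence, and retries are not modeled.

```python
class Acceptor:
    """Acceptor state for a single consensus instance."""

    def __init__(self):
        self.promised = 0       # highest proposal number promised so far
        self.accepted = None    # (number, value) of the last accepted proposal

    def prepare(self, n):
        """Phase 1: promise to ignore proposals numbered below n."""
        if n > self.promised:
            self.promised = n
            return True, self.accepted
        return False, None

    def accept(self, n, value):
        """Phase 2: accept unless a higher-numbered promise was made."""
        if n >= self.promised:
            self.promised = n
            self.accepted = (n, value)
            return True
        return False


def propose(acceptors, n, value):
    """Run both phases; return the chosen value, or None if the round fails."""
    majority = len(acceptors) // 2 + 1

    replies = [a.prepare(n) for a in acceptors]
    promises = [accepted for ok, accepted in replies if ok]
    if len(promises) < majority:
        return None

    # If any acceptor already accepted a value, the proposer must adopt the
    # one with the highest proposal number instead of its own value.
    prior = [p for p in promises if p is not None]
    if prior:
        value = max(prior, key=lambda p: p[0])[1]

    acks = sum(a.accept(n, value) for a in acceptors)
    return value if acks >= majority else None


acceptors = [Acceptor() for _ in range(5)]   # five replicas, as in a Chubby cell
first = propose(acceptors, 1, "a")
second = propose(acceptors, 2, "b")          # later proposer, different value
stale = propose(acceptors, 1, "c")           # outdated proposal number
```

Once a value has been chosen, later proposals can only re-propose it, which is the safety guarantee; a round with a stale number simply fails and must be retried with a higher number, which is why termination is not guaranteed.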


Paxos algorithm - step 1


Paxos algorithm - step 2


The Big Picture

Customized solutions for Google-type problems

GFS: stores data reliably
  Just raw files

Bigtable: provides a key/value map
  Database-like, but doesn’t provide everything we need

Chubby: locking mechanism
  Handles all synchronization problems


Common Principles

One master, multiple workers
  MapReduce: master coordinates work amongst map / reduce workers
  Chubby: master among five replicas
  Bigtable: master knows about the location of tablet servers
  GFS: master coordinates data across chunk servers