A Recovery-Friendly, Self-Managing Session State Store


Benjamin Ling, Emre Kiciman, Armando Fox

{bling,emrek,fox}@cs.stanford.edu

© 2004 Benjamin Ling

Outline

Motivation: What is Session State?
SSM:
  Architecture
  Algorithm
  Backpressure and Admission Control
SSM + Pinpoint
  Self-recovering, self-monitoring
Benchmarks
Next steps: Sun Reference AppServer integration
Conclusion

Proliferation of J2EE and Web Services

J2EE embraced as industry standard
Framework
  Simplifies development
  Allows for portability of services
  Standardized interfaces
However, difficulties remain…

The Pain – Administration and Maintenance

Administration is difficult and costly
  $$: Database admins cost ~$200K/yr a head
  Development efficiency negatively impacted
Failure/Recovery is costly
  Recovery slow, especially site outages
  Data loss on crashes
  Users adversely affected

Not All State is Created Equal

Various types of state in J2EE…
  User profile state
  Persistent shared state
  Transaction history state
But usually stored in the same place
  Stored in DB or FS
Focus on particular class
Exploit its properties
Simplify Administration and Maintenance

Example of Session State

[Figure: example of session state; image not preserved]

Properties of Session State

Subcategory of session state
  Single-user, serial access, semi-persistent data
  Examples: temporary application data, application workflow
Example of usage (e.g. J2EE):

[Figure: Browser and App Server, with interaction steps numbered 1 to 6]

Goal

Build a session state store that is:
  Failure-friendly
    Does not lose data on crash
    Degrades gracefully
  Recovery-friendly
    Recovers fast
  Self-Managing


Session State Manager (SSM)

[Figure: two AppServers, each with a STUB, connected to Bricks 1 to 5; brick resources labeled RAM, Network Interface]

Redundant, in-memory hash table distributed across nodes
Algorithm: Redundancy similar to quorums
  Write to many random nodes, wait for few (avoid performance coupling)
  Read one

Write example: “Write to Many, Wait for Few”

[Figure: Browser, AppServer with STUB, and Bricks 1 to 5, shown over five animation frames]

Try to write to W random bricks, W = 4
Must wait for WQ bricks to reply, WQ = 2
Figure annotations in the final frame: “Cookie holds metadata” (cookie labeled 1, 4); “Crashed? Slow?”
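The write path above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the SSM implementation: the `Brick` class, the `write_session` function, and the use of a thread pool to issue the writes in parallel are all hypothetical. Only the parameters W = 4 and WQ = 2 and the idea of returning brick identifiers for the cookie come from the slides.

```python
import random
from concurrent.futures import ThreadPoolExecutor, as_completed


class Brick:
    """Hypothetical in-memory brick: a dict plus an 'alive' flag."""

    def __init__(self, brick_id, alive=True):
        self.brick_id = brick_id
        self.alive = alive
        self.table = {}

    def write(self, key, value):
        if not self.alive:
            raise ConnectionError(f"brick {self.brick_id} is down")
        self.table[key] = value
        return self.brick_id


def write_session(bricks, key, value, w=4, wq=2, rng=random):
    """Send the write to w random bricks; return once wq have acknowledged.

    Returns the sorted ids of the acknowledging bricks, i.e. the metadata
    a stub could place in the cookie. Raises if fewer than wq bricks reply.
    """
    targets = rng.sample(bricks, w)
    acked = []
    pool = ThreadPoolExecutor(max_workers=w)
    try:
        futures = [pool.submit(b.write, key, value) for b in targets]
        for fut in as_completed(futures):
            try:
                acked.append(fut.result())
            except ConnectionError:
                continue  # crashed brick: ignore it and keep collecting
            if len(acked) >= wq:
                break  # quota met
    finally:
        pool.shutdown(wait=False)  # return without waiting for stragglers
    if len(acked) < wq:
        raise RuntimeError("write failed: fewer than WQ bricks replied")
    return sorted(acked)
```

Because the caller returns after WQ acknowledgements, one slow or crashed brick among the W targets does not delay the request, which is the "avoid performance coupling" point of the design.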

Read example:

[Figure: Browser, AppServer with STUB, and Bricks 1 to 5, shown over four animation frames; the cookie is labeled 1, 4]

Try to read from Bricks 1, 4
Brick 1 crashes
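A read only needs one of the bricks named in the cookie to answer. The sketch below assumes a hypothetical `Brick` class and `read_session` function, and it tries the bricks in cookie order; the slides do not say whether the stub reads sequentially or in parallel.

```python
class BrickDown(Exception):
    """Raised by a hypothetical brick that has crashed or timed out."""


class Brick:
    def __init__(self, brick_id):
        self.brick_id = brick_id
        self.alive = True
        self.table = {}

    def read(self, key):
        if not self.alive:
            raise BrickDown(self.brick_id)
        return self.table[key]


def read_session(bricks_by_id, cookie_ids, key):
    """Return the value from the first brick in the cookie that answers."""
    for brick_id in cookie_ids:
        brick = bricks_by_id[brick_id]
        try:
            return brick.read(key)
        except (BrickDown, KeyError):
            continue  # crashed, or restarted with an empty table
    raise LookupError("session state lost: no brick in the cookie answered")
```

With the cookie listing Bricks 1 and 4, a crash of Brick 1 is masked because the read falls through to Brick 4.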

SSM: Failure and Recovery

Failure of single node
  No data loss, WQ-1 remain
  State is available for R/W during failure
Recovery
  Restart – No recovery
  No special case recovery code
  State is available for R/W during brick restart
Session state is self-recovering
  User’s access pattern causes data to be rewritten

Backpressure and Admission Control

[Figure: two AppServers with STUBs sending requests to Bricks 1 to 5, shown over two frames]

Figure annotations, in order:
  Heavy flow to Brick 3
  Drop Requests
  Reduce Sending
  Reject requests
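The "Drop Requests" step can be pictured as a bounded inbox on each brick. The sketch below is an assumption made for illustration: the `BrickInbox` class, its capacity parameter, and its method names are hypothetical, although the `num_dropped` counter mirrors the NumDropped and InboxSize statistics that the monitoring slides list.

```python
from collections import deque


class BrickInbox:
    """Hypothetical bounded request queue for one brick."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = deque()
        self.num_dropped = 0

    def offer(self, request):
        """Enqueue the request, or drop it if the inbox is full."""
        if len(self.queue) >= self.capacity:
            self.num_dropped += 1
            return False
        self.queue.append(request)
        return True

    def take(self):
        """Remove and return the oldest queued request, or None if empty."""
        return self.queue.popleft() if self.queue else None
```

A dropped request looks like a timeout to the sender, which is the signal a stub can use to reduce its sending rate.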


Recovery Philosophy

[Figure: chart of RECOVERY COST (Cheap to Expensive) against DETECTION ACCURACY (Lax to Accurate); region labels: Downtime, Undetected Errors, Aggressive, Hard, Ideal]

Failure detection and Recovery

[Figure: timeline with labels Failure, Detection, Recovery, Recovered]

SSM: Failure masked
Instant recovery

False Positives

[Figure: timeline with labels “False positive triggered”, “Instant recovery”, “Normal Operation”]

Statistical Monitoring

[Figure: Bricks 1 to 5 report statistics to Pinpoint, shown over three frames; a REBOOT label appears in the second frame]

Statistics: NumElements, MemoryUsed, InboxSize, NumDropped, NumReads, NumWrites

SSM Monitoring

N replicated bricks handle read/write requests
  Cannot do structural anomaly detection!
  Alternative features (performance, mem usage, etc.)
Activity statistics: How often did a brick do something?
  Msgs received/sec, dropped/sec, etc.
  Same across all peers, assuming balanced workload
  Use anomalies as likely failures
State statistics: Current state of system
  Memory usage, queue length, etc.
  Similar pattern across peers, but may not be in phase
  Look for patterns in time-series; differences in patterns indicate failure at a node.
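For activity statistics, "same across all peers" suggests comparing each brick against the rest. The slides do not name the test that is used, so the sketch below swaps in a common choice as an assumption: deviation from the peer median, scaled by the median absolute deviation (MAD). The function name and the threshold of 3.0 are hypothetical.

```python
from statistics import median


def anomalous_bricks(stats, threshold=3.0):
    """Return ids of bricks whose statistic deviates from the peer median.

    stats maps brick id -> one activity statistic (e.g. messages dropped/sec).
    The score is |x - median| / MAD; the threshold is an arbitrary assumption.
    """
    values = list(stats.values())
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # Most peers are identical: flag any brick that differs at all.
        return sorted(b for b, v in stats.items() if v != med)
    return sorted(b for b, v in stats.items() if abs(v - med) / mad > threshold)
```

For example, with messages received per second of 100, 104, 98, 101, and 10 on Bricks 1 to 5, only Brick 5 is flagged. A threshold like this is exactly the kind of magic number that has to be tuned by hand.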

Surprising Patterns in Time-Series

1. Discretize time-series into string. [Keogh]
   [0.2, 0.3, 0.4, 0.6, 0.8, 0.2] -> “aaabba”
2. Calculate the frequencies of short substrings in the string.
   “aa” occurs twice; “ab”, “bb”, “ba” each occur once.
3. Compare frequencies to normal, look for substrings that occur much less or much more than normal.
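The three steps can be sketched as follows. The two-letter alphabet and the 0.5 cut-off are assumptions chosen only because they reproduce the slide's example; the add-one smoothing and the ratio test in `surprising` are likewise illustrative stand-ins, not the scoring used by the cited method.

```python
from collections import Counter


def discretize(series, threshold=0.5):
    """Map each sample to 'a' (below threshold) or 'b' (at or above it)."""
    return "".join("a" if x < threshold else "b" for x in series)


def substring_counts(text, n=2):
    """Count every overlapping substring of length n."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def surprising(observed, normal, ratio=2.0):
    """Return substrings seen much more or much less often than normal.

    Counts are smoothed by adding 1 so unseen substrings do not divide by zero.
    """
    flagged = []
    for s in sorted(set(observed) | set(normal)):
        r = (observed.get(s, 0) + 1) / (normal.get(s, 0) + 1)
        if r >= ratio or r <= 1 / ratio:
            flagged.append(s)
    return flagged


text = discretize([0.2, 0.3, 0.4, 0.6, 0.8, 0.2])
counts = substring_counts(text)
print(text, dict(counts))
```

Comparing a brick's substring counts against counts gathered during normal operation, or against its peers, then yields the substrings that are unusually frequent or unusually rare.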


Microbenchmarks

UC Berkeley Millennium Cluster
  Six bricks running
  Candidate Write Set = 3, Write quota = 2
  Candidate Read Set = 2
  State Size = 8K

Induced Fault

One brick killed
Brick restarted by Pinpoint (PP)
SSM unaffected

Memory fault

Memory fault detected in hash
PP restarts Brick
SSM unaffected

Network Fault – 70% packet loss

Network fault injected
Fault detected
Brick killed
PP restarts Brick

Performance Fault

Performance fault injected

Macrobenchmark

TellMe’s Email-By-Phone Application
Session state stored in memory
  Email header information
  Index information
Alter application to store session state using
  Disk
  SSM

Macrobenchmark

25% Throughput Degradation compared to in-memory

Throughput preserved compared to disk

Future Work

Integrate with Sun’s reference Application Server
  Enterprise benchmarks
Statistical Anomaly Detection
  Too many magic numbers
Integrated ROC-J2EE application server

Conclusion

SSM: A Recovery-Friendly, Self-Managing Session State Store

Benjamin Ling
bling@cs.stanford.edu
http://swig.stanford.edu/

Existing solutions:

File System and Databases
  Poor failure behavior
  Lose data (FS)
  Slow recovery (both)
  Difficult to administer (DB)
  Difficult to tune (both)
In-memory replication using primary/secondary:
  Performance coupling
  Poor failover (uneven load balancing)

Other implementation details

Garbage collection
Generational hash table
  Hash table of hash tables
  Each hash table has an associated time range
  When time has passed, GC that table
No reference counting, scanning, etc.
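A generational hash table along these lines might look like the sketch below. The class and method names are hypothetical, and grouping entries by the time window in which they expire is an assumption about how entries are assigned to a time range; the slides only state that each inner table covers a time range and is collected once that time has passed.

```python
import time


class GenerationalTable:
    """Hash table of hash tables, one inner table per expiry-time window."""

    def __init__(self, window_secs=60, clock=time.time):
        self.window_secs = window_secs
        self.clock = clock
        self.generations = {}  # window index -> {key: value}

    def put(self, key, value, ttl_secs):
        """Store value in the generation covering its expiry time."""
        expires_at = self.clock() + ttl_secs
        gen = int(expires_at // self.window_secs)
        self.generations.setdefault(gen, {})[key] = value

    def get(self, key):
        """Look in every generation not yet collected, newest first."""
        for gen in sorted(self.generations, reverse=True):
            if key in self.generations[gen]:
                return self.generations[gen][key]
        return None

    def collect(self):
        """Drop each generation whose whole time range has passed."""
        current = int(self.clock() // self.window_secs)
        expired = [g for g in self.generations if g < current]
        for g in expired:
            del self.generations[g]
        return len(expired)
```

Collection discards a whole inner table at once, so no per-entry reference counting or scanning is needed.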

SSM: Self-Managing

Adaptive:
  Stub maintains count of maximum allowable in-flight requests to each brick
    Additive increase on successful request
    Multiplicative decrease on timeout
  Stubs discover capacity of each brick
Self-Tuning
Admission control
  Stubs say “no” if insufficient bricks
  Propagate backpressure from bricks to clients
  Turn users away under overload
Self-Protecting
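The adaptive window and the admission check can be sketched together. Everything named here is hypothetical: the `BrickWindow` class, the `admit_write` function, and the constants (increase by 1, halve on timeout, floor of 1) are assumptions picked for illustration, since the slides give only the additive-increase, multiplicative-decrease rule.

```python
class BrickWindow:
    """Hypothetical per-brick limit on in-flight requests (AIMD)."""

    def __init__(self, initial=4.0, increase=1.0, decrease=0.5, floor=1.0):
        self.limit = initial
        self.increase = increase
        self.decrease = decrease
        self.floor = floor
        self.in_flight = 0

    def can_send(self):
        return self.in_flight < int(self.limit)

    def on_send(self):
        self.in_flight += 1

    def on_success(self):
        self.in_flight -= 1
        self.limit += self.increase  # additive increase

    def on_timeout(self):
        self.in_flight -= 1
        # multiplicative decrease, never below the floor
        self.limit = max(self.floor, self.limit * self.decrease)


def admit_write(windows, wq):
    """Admission control: accept only if at least wq bricks can take a request."""
    available = [bid for bid, w in windows.items() if w.can_send()]
    return len(available) >= wq
```

When `admit_write` returns False the stub would refuse the request, so overload at the bricks surfaces as rejected requests at the application server instead of growing queues.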
