A Recovery-Friendly, Self-Managing Session State Store
Benjamin Ling, Emre Kiciman, Armando Fox
{bling,emrek,fox}@cs.stanford.edu
© 2004 Benjamin Ling

Outline
  Motivation: What is Session State?
  SSM:
    Architecture
    Algorithm
    Backpressure and Admission Control
  SSM + Pinpoint
    Self-recovering, self-monitoring
  Benchmarks
  Next steps: Sun Reference AppServer integration
  Conclusion

Proliferation of J2EE and Web Services
  J2EE embraced as industry standard
  Framework
    Simplifies development
    Allows for portability of services
    Standardized interfaces
  However, difficulties remain…

The Pain – Administration and Maintenance
  Administration is difficult and costly
    $$ -- Database admins cost ~$200K/yr a head
    Development efficiency negatively impacted
  Failure/Recovery is costly
    Recovery slow, especially site outages
    Data loss on crashes
    Users adversely affected

Not All State is Created Equal
  Various types of state in J2EE…
    User profile state
    Persistent shared state
    Transaction history state
  But usually stored in the same place
    Stored in DB or FS
  Focus on particular class
    Exploit its properties
    Simplify Administration and Maintenance

Example of Session State
  [Figure: example of session state]

Properties of Session State
  Subcategory of session state
    Single-user, serial access, semi-persistent data
    Examples: temporary application data, application workflow
  Example of usage (e.g. J2EE):
    [Diagram: Browser and App Server, with numbered steps 1-6]

Goal
  Build a session state store that is:
    Failure-friendly
      Does not lose data on crash
      Degrades gracefully
    Recovery-friendly
      Recovers fast
    Self-Managing

Session State Manager (SSM)
  Redundant, in-memory hash table distributed across nodes
  Algorithm: Redundancy similar to quorums
    Write to many random nodes, wait for few (avoid performance coupling)
    Read one
  [Diagram: two AppServers, each with a STUB, and Bricks 1-5; brick label: RAM, Network Interface]

Write example: “Write to Many, Wait for Few”
  Try to write to W random bricks, W = 4
  Must wait for WQ bricks to reply, WQ = 2
  Bricks that have not replied: Crashed? Slow?
  Cookie holds metadata (Bricks 1 and 4 in the diagram)
  [Diagram, five animation frames: Browser, AppServer STUB, Bricks 1-5]

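The "write to many, wait for few" step can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the SSM implementation: the `Brick` class, the `write_many_wait_few` function, the thread pool, and the timeout value are all hypothetical, while W = 4 and WQ = 2 are taken from the slide.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from concurrent.futures import TimeoutError as FuturesTimeout


class Brick:
    """Hypothetical in-memory brick; `delay` simulates a slow node."""

    def __init__(self, brick_id, delay=0.0):
        self.brick_id = brick_id
        self.delay = delay
        self.table = {}

    def write(self, key, value):
        time.sleep(self.delay)
        self.table[key] = value
        return self.brick_id


def write_many_wait_few(bricks, key, value, w=4, wq=2, timeout=1.0, rng=random):
    """Send the write to w random bricks and return once wq have replied."""
    targets = rng.sample(bricks, w)
    pool = ThreadPoolExecutor(max_workers=w)
    futures = [pool.submit(brick.write, key, value) for brick in targets]
    acked = []
    try:
        for future in as_completed(futures, timeout=timeout):
            if future.exception() is None:
                acked.append(future.result())
            if len(acked) >= wq:
                break  # do not wait for the slower bricks
    except FuturesTimeout:
        pass  # fewer than wq replies arrived in time
    finally:
        pool.shutdown(wait=False)  # stragglers finish in the background
    if len(acked) < wq:
        raise RuntimeError("write failed: fewer than wq bricks replied")
    return acked  # brick ids the stub would record in the cookie
```

Because the caller returns after the first WQ replies, a single slow or crashed brick among the W targets does not slow the write down, which is the performance decoupling the slide refers to.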
Read example:
  Cookie names Bricks 1 and 4
  Try to read from Bricks 1, 4
  Brick 1 crashes
  [Diagram, four animation frames: Browser, AppServer STUB, Bricks 1-5; Brick 1 is absent from the last frame]

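The read path can be sketched the same way: the stub tries the bricks named in the cookie and needs only one of them to answer. The names below (`Brick`, `BrickDown`, `read_one`) are hypothetical and chosen for illustration; the slides do not show the actual stub interface.

```python
class BrickDown(Exception):
    """Raised by the hypothetical Brick when it has crashed."""


class Brick:
    def __init__(self, brick_id):
        self.brick_id = brick_id
        self.table = {}
        self.alive = True

    def read(self, key):
        if not self.alive:
            raise BrickDown(self.brick_id)
        return self.table[key]


def read_one(bricks_by_id, cookie_brick_ids, key):
    """Return the value from the first brick in the cookie that answers."""
    for brick_id in cookie_brick_ids:
        brick = bricks_by_id.get(brick_id)
        if brick is None:
            continue
        try:
            return brick.read(key)
        except (BrickDown, KeyError):
            continue  # crashed or restarted empty: try the next replica
    raise LookupError(f"no brick in {cookie_brick_ids} holds {key!r}")
```

With the cookie naming Bricks 1 and 4, a crash of Brick 1 leaves the read served by Brick 4, matching the sequence in the diagram.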
SSM: Failure and Recovery
  Failure of single node
    No data loss, WQ-1 remain
    State is available for R/W during failure
  Recovery
    Restart – No recovery
    No special case recovery code
    State is available for R/W during brick restart
  Session state is self-recovering
    User’s access pattern causes data to be rewritten

Backpressure and Admission Control
  Heavy flow to Brick 3
  Drop Requests
  Reduce Sending
  Reject requests
  [Diagram, two animation frames: two AppServers, each with a STUB, and Bricks 1-5]

Recovery Philosophy
  [Chart: axes RECOVERY COST (Cheap, Expensive) and DETECTION ACCURACY (Lax, Accurate, Aggressive); labels: Downtime, Undetected Errors, Hard, Ideal]

Failure detection and Recovery
  [Timeline: Failure, Detection, Recovery, Recovered]
  SSM: Failure masked
  Instant recovery

False Positives
  [Timeline: Normal Operation; False positive triggered; Instant recovery]

Statistical Monitoring
  Statistics: NumElements, MemoryUsed, InboxSize, NumDropped, NumReads, NumWrites
  [Diagram, three animation frames: Bricks 1-5 and Pinpoint, linked by Statistics arrows; the second frame adds a REBOOT label]

SSM Monitoring
  N replicated bricks handle read/write requests
    Cannot do structural anomaly detection!
    Alternative features (performance, mem usage, etc)
  Activity statistics: How often did a brick do something?
    Msgs received/sec, dropped/sec, etc.
    Same across all peers, assuming balanced workload
    Use anomalies as likely failures
  State statistics: Current state of system
    Memory usage, queue length, etc.
    Similar pattern across peers, but may not be in phase
    Look for patterns in time-series; differences in patterns indicate failure at a node.

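For activity statistics, the peer comparison might look like the sketch below. The slides do not say which test is used, so the median-and-spread rule here is a swapped-in, assumed technique, and `anomalous_bricks`, `threshold`, and `min_spread` are hypothetical names and values.

```python
from statistics import median


def anomalous_bricks(activity, threshold=3.0, min_spread=1.0):
    """Flag bricks whose activity metric is far from the peer median.

    activity maps brick id -> one activity statistic (e.g. msgs dropped/sec).
    threshold and min_spread are assumed tuning values, not from the slides.
    """
    values = list(activity.values())
    center = median(values)
    # Median absolute deviation: a spread estimate the outlier cannot inflate.
    spread = median([abs(v - center) for v in values])
    scale = max(spread, min_spread)
    return sorted(b for b, v in activity.items()
                  if abs(v - center) / scale > threshold)
```

Given messages received per second of 100, 104, 98, 101, and 12 on Bricks 1-5, only Brick 5 is flagged. The comparison relies on the balanced-workload assumption stated on the slide.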
Surprising Patterns in Time-Series
  1. Discretize time-series into string. [Keogh]
     [0.2, 0.3, 0.4, 0.6, 0.8, 0.2] -> “aaabba”
  2. Calculate the frequencies of short substrings in the string.
     “aa” occurs twice; “ab”, “bb”, “ba” occur once.
  3. Compare frequencies to normal, look for substrings that occur much less or much more than normal.

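The three steps can be followed in a short Python sketch. The two-letter alphabet with a single cut at 0.5 is an assumption that happens to reproduce the slide's example; the discretization cited as [Keogh] may use different breakpoints. The function names, the add-one smoothing, and the `ratio` cutoff are likewise illustrative.

```python
from collections import Counter


def discretize(series, threshold=0.5):
    """Step 1: map each sample to 'a' (below threshold) or 'b' (otherwise)."""
    return "".join("a" if x < threshold else "b" for x in series)


def substring_counts(text, n=2):
    """Step 2: count every substring of length n."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def surprising_substrings(observed, normal, ratio=3.0):
    """Step 3: substrings much more or much less frequent than normal."""
    obs_total = sum(observed.values()) or 1
    norm_total = sum(normal.values()) or 1
    flagged = []
    for sub in sorted(set(observed) | set(normal)):
        # Add-one smoothing keeps unseen substrings from dividing by zero.
        obs = (observed[sub] + 1) / (obs_total + 1)
        norm = (normal[sub] + 1) / (norm_total + 1)
        if obs / norm > ratio or norm / obs > ratio:
            flagged.append(sub)
    return flagged


text = discretize([0.2, 0.3, 0.4, 0.6, 0.8, 0.2])
counts = substring_counts(text)
```

Here `text` is "aaabba" and `counts` holds "aa" twice and "ab", "bb", "ba" once each, as on the slide.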
Microbenchmarks
  UC Berkeley Millennium Cluster
    Six bricks running
    Candidate Write Set = 3, Write quota = 2
    Candidate Read Set = 2
    State Size = 8K

Induced Fault
  [Graph annotations: One brick killed; Brick restarted by Pinpoint (PP); SSM unaffected]

Memory fault
  [Graph annotations: Memory fault detected in hash; PP restarts Brick; SSM unaffected]

Network Fault – 70% packet loss
  [Graph annotations: Network fault injected; Fault detected, Brick killed; PP restarts Brick]

Performance Fault
  [Graph annotation: Performance fault injected]

Macrobenchmark
  TellMe’s Email-By-Phone Application
    Session state stored in memory
      Email header information
      Index information
    Alter application to store session state using
      Disk
      SSM

Macrobenchmark
  [Graph annotations: 25% Throughput Degradation compared to in-memory; Throughput preserved compared to disk]

Future Work
  Integrate with Sun’s reference Application Server
    Enterprise benchmarks
  Statistical Anomaly Detection
    Too many magic numbers
  Integrated ROC-J2EE application server

Conclusion
  SSM
  A Recovery-Friendly, Self-Managing Session State Store
  Benjamin Ling
  bling@cs.stanford.edu
  http://swig.stanford.edu/

Existing solutions:
  File System and Databases
    Poor failure behavior
      Lose data (FS)
      Slow recovery (Both)
    Difficult to administer (DB)
    Difficult to tune (both)
  In-memory replication using primary/secondary:
    Performance coupling
    Poor failover (uneven load balancing)

Other implementation details
  Garbage collection
    Generational hash table
      Hash table of hash tables
      Each hash table has an associated time range
      When time has passed, GC that table
    No reference counting, scanning, etc.

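A generational hash table along these lines might look like the following sketch. It assumes that each entry is filed under the time range containing its expiry time and that the caller keeps the expiry (for example in the cookie metadata) to locate the entry later; `GenerationalTable`, its methods, and the 60-second range are hypothetical.

```python
import time


class GenerationalTable:
    """Hash table of hash tables, one inner table per time range."""

    def __init__(self, generation_secs=60.0, clock=time.monotonic):
        self.generation_secs = generation_secs
        self.clock = clock
        self.generations = {}  # generation index -> {key: value}

    def _generation(self, timestamp):
        return int(timestamp // self.generation_secs)

    def put(self, key, value, ttl):
        """Store value under the generation covering its expiry time."""
        expiry = self.clock() + ttl
        table = self.generations.setdefault(self._generation(expiry), {})
        table[key] = value
        return expiry  # the caller needs this to find the entry again

    def get(self, key, expiry):
        return self.generations.get(self._generation(expiry), {}).get(key)

    def collect(self):
        """Drop whole tables whose time range has passed; no per-entry scan."""
        current = self._generation(self.clock())
        for generation in [g for g in self.generations if g < current]:
            del self.generations[generation]
```

Every entry in a table expires within that table's time range, so once the range has passed the whole table can be discarded in one step, which is why no reference counting or scanning is needed.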
SSM: Self-Managing
  Adaptive:
    Stub maintains count of maximum allowable in-flight requests to each brick
      Additive increase on successful request
      Multiplicative decrease on timeout
    Stubs discover capacity of each brick
    Self-Tuning
  Admission control
    Stubs say “no” if insufficient bricks
      Propagate backpressure from bricks to clients
    Turn users away under overload
    Self-Protecting

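The per-brick window and the admission check can be sketched as below. The additive step of 1, the halving on timeout, the initial and maximum limits, and the names `BrickWindow` and `admit` are all assumptions for illustration; the slides give only the additive-increase, multiplicative-decrease rule.

```python
class BrickWindow:
    """Stub-side limit on in-flight requests to one brick (AIMD)."""

    def __init__(self, initial=4, minimum=1, maximum=64, decrease=0.5):
        self.limit = float(initial)
        self.minimum = minimum
        self.maximum = maximum
        self.decrease = decrease
        self.in_flight = 0

    def has_capacity(self):
        return self.in_flight < int(self.limit)

    def try_send(self):
        """Reserve a slot; False means this brick is at its limit."""
        if not self.has_capacity():
            return False
        self.in_flight += 1
        return True

    def on_success(self):
        self.in_flight -= 1
        self.limit = min(self.maximum, self.limit + 1)  # additive increase

    def on_timeout(self):
        self.in_flight -= 1
        # Multiplicative decrease, never below the minimum.
        self.limit = max(self.minimum, self.limit * self.decrease)


def admit(windows, wq=2):
    """Admission control: say "no" unless at least wq bricks have capacity."""
    return sum(1 for w in windows.values() if w.has_capacity()) >= wq
```

A stub holding one `BrickWindow` per brick would call `admit` before accepting a request and turn the user away when too few bricks have room, so overload at the bricks propagates back to clients instead of building up in queues.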