JAZOON'13 - Abdelmonaim Remani - The Economies of Scaling Software


The Economies of Scaling Software

Abdelmonaim Remani

@PolymathicCoder

Creative Commons Attribution Non-Commercial License 3.0 Unported

The graphics and logos in this presentation belong to their rightful owners

• Platform Architect at just.me Inc.

• JavaOne RockStar and frequent speaker at many developer events and conferences, including JavaOne, JAX, OSCON, OREDEV, 33rd Degree, etc.

• Open-source advocate and contributor

• Active community member
  • The NorCal Java User Group
  • The Silicon Valley Dart Meetup

Bio:

http://about.me/PolymathicCoder

Twitter: @PolymathicCoder

Email: abdelmonaim.remani@gmail.com

SlideShare:

http://www.slideshare.net/PolymathicCoder/

| @PolymathicCoder

About Me

Follow @PolymathicCoder

http://speakerscore.com/jazoon-scalability

• The Economies of Scale
  • “In microeconomics, economies of scale are the cost advantages that enterprises obtain due to size [...] often operational efficiency is [...] greater with increasing scale [...]” -Wikipedia


The Title of the Talk

Let’s Go!

• It used to be that only the enterprise worried about scalability

• The rise of social and the abundance of mobile

• An exponential growth of internet traffic

• The creation of a spoiled user base

• I want to see the closest Moroccan restaurants to my current location on a map, along with consumer ratings and whether any of my friends have checked in there in the last 30 days

• The lines between consumer applications and enterprise applications are blurred


Blurred Lines…

Scalability is everyone’s problem…


The Bar Is Higher!

What Is Scalability?

• The ability of an application to handle an increasing amount of work without performance degradation

• Not a good definition! It implies:
  • You’ll need to scale forever
    • Scalability is relative; it is bound by one’s specific needs
  • You’ll need to be fully scalable from day one
    • Scalability is evolutionary; it is a gradual process
  • There are no external constraints
    • Unrealistic


The Common Definition

• The ability of an application to gracefully evolve within the constraints of its ecosystem in order to handle the maximum potential amount of work without performance degradation

• Work?
  • Simultaneous requests

• Performance degradation?
  • Increased latency or decreased throughput


A Better Definition

• Don’t be surprised if
  • Your application supports one million users
  • You add one more feature
  • A 500,000-user load crashes your system or renders it unusable


A Black Art!

Latency Is Your Enemy

• To scale is to reduce latency
• To reduce latency is to address bottlenecks
• To scale is to address bottlenecks

• The usual suspects
  • The CPU
  • The storage I/O
  • The network I/O
  • All inter-related


Syllogismo

Overcoming the CPU Bottleneck

• Nothing affects the CPU more than the instructions it is summoned to execute

• This is about your application
  • How it is written (architecture, code base, etc.)
  • How it is deployed


Overcoming the CPU Bottleneck

A Scalable Architecture

• “Things that people perceive as hard to change” -Martin Fowler
  • http://martinfowler.com/ieeeSoftware/whoNeedsArchitect.pdf

• Decisions you commit to; the ones you will be stuck with forever


Architecture?

• Choose the right technologies
  • Platform
  • Languages
  • Frameworks
  • Libraries

• Make the right abstractions
  • Loosely coupled components
  • Functional abstractions
  • Technical abstractions
  • Make sure that the latter (technical) is subordinate to the former (functional) and not the other way around


Be Wise… Think Twice…

Write Good Code

• Think your algorithms through and mind their complexity (asymptotic complexity, cyclomatic complexity, etc.)

• SOLIDify your design
  • Single Responsibility, Open-Closed, Liskov Substitution, Interface Segregation, and Dependency Inversion

• Understand the limitations of your technology and leverage its strengths


Write Good Code

• Obsess over testing
  • TDD/BDD

• Tools
  • Static code analyzers (PMD, FindBugs, etc.)
  • Profilers (detect memory leaks, bottlenecks, etc.)
  • Etc.


Quality… Quality… Quality!

• Read
  • The classics (The Mythical Man-Month, etc.)
  • GoF’s “Design Patterns”
  • Eric Evans’ “Domain-Driven Design”
  • Every book by Martin Fowler
  • Uncle Bob’s “Clean Code”
  • Josh Bloch’s “Effective Java”
  • Brian Goetz’s “Java Concurrency in Practice”
  • Tech papers/blogs
  • Etc.


Know Thy S#!t

The Inevitable

You’ll end up with…

At best… The fading tradition of making cow dung piles

http://news.ukpha.org/2011/01/the-fading-tradition-of-making-cow-dung-piles/


You do all that…


Still better than…

• What is it?
  • The quick-and-dirty code you are not proud of
  • What you would have done differently had you had the time
  • It’s only a matter of time before it starts to smell really bad

• What to do?
  • The fact that you recognize it as debt is a good thing in itself
  • Keep tabs on it and refactor often
  • Cut the right corners
  • Don’t mortgage the architecture (don’t lock yourself out)

Technical Debt

Write Code That Scales Up

• Vertical Scaling (Scaling Up)
  • On a single-node system
  • Adding more computing resources to the node (getting a beefier machine)
  • Writing code to harness the full power of that one node


Vertical Scaling

• Writing concurrent code (code whose parts execute simultaneously)

• Simple business logic running within containers is already multi-threaded (the container serves concurrent requests on separate threads)

• Executing complex business logic within a reasonable time
  • Break it into smaller steps
  • Execute them in parallel
  • Aggregate the data back


Parallelism At The Node Level
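The break-it-up, run-in-parallel, aggregate-back flow on the slide above maps directly onto the Fork/Join framework that ships with the JDK (`java.util.concurrent`). The following is a minimal Java sketch, not the speaker’s code: it sums a large array by recursively splitting it, summing the halves in parallel, and adding the partial results. The class name `ParallelSum` and the `THRESHOLD` cut-off are illustrative assumptions; a real threshold should come from profiling.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Splits the array until chunks are small, sums chunks in parallel,
// then aggregates the partial sums on the way back up.
class ParallelSum extends RecursiveTask<Long> {
    // Hypothetical cut-off below which splitting costs more than it saves.
    private static final int THRESHOLD = 10_000;

    private final long[] data;
    private final int from; // inclusive
    private final int to;   // exclusive

    ParallelSum(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += data[i];
            }
            return sum;
        }
        int mid = (from + to) >>> 1;
        ParallelSum left = new ParallelSum(data, from, mid);
        ParallelSum right = new ParallelSum(data, mid, to);
        left.fork();                      // run the left half asynchronously
        long rightSum = right.compute();  // compute the right half in this thread
        return left.join() + rightSum;    // aggregate the partial results
    }

    static long sum(long[] data) {
        return ForkJoinPool.commonPool().invoke(new ParallelSum(data, 0, data.length));
    }

    public static void main(String[] args) {
        long[] numbers = new long[1_000_000];
        for (int i = 0; i < numbers.length; i++) {
            numbers[i] = i + 1;
        }
        System.out.println(sum(numbers)); // 500000500000
    }
}
```

Forking one half and computing the other in the current thread keeps the worker busy instead of leaving it idle while it waits on `join()`.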

• Moore’s Law
  • Performance gains were automatically realized by software (code is faster on faster hardware)
  • Nothing is forever…

• The era of the multi-core chip
  • We need to write code that takes advantage of all cores


Easier Said Than Done…

• Synchronizing state across threads running on multiple cores
  • Good luck!

• Rely on frameworks and libraries (Fork/Join, Akka, etc.)

• Go immutable
  • Not always straightforward or possible

• Go functional (Scala, Clojure, etc.)


Easier Said Than Done…
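To make “Go immutable” concrete, here is a minimal Java sketch; the class name, its fields, and the `"CHF"` currency are illustrative assumptions. The value object can be shared freely between threads because it never changes; the only mutable piece is a single `AtomicReference` that is swapped atomically, so no explicit locks are needed.

```java
import java.util.Objects;
import java.util.concurrent.atomic.AtomicReference;
import java.util.stream.IntStream;

// Immutable value: final fields, no setters, and "changes" return a new instance.
// Because an instance can never change, it can be shared across threads without locks.
final class Money {
    private final String currency;
    private final long cents;

    Money(String currency, long cents) {
        this.currency = Objects.requireNonNull(currency, "currency");
        this.cents = cents;
    }

    String currency() { return currency; }

    long cents() { return cents; }

    // Returns a new Money; the receiver is left untouched.
    Money plus(Money other) {
        if (!currency.equals(other.currency)) {
            throw new IllegalArgumentException("Currency mismatch");
        }
        return new Money(currency, cents + other.cents);
    }

    // Many threads add one cent each. The shared state is one reference to an
    // immutable value; updateAndGet retries on contention, which is safe
    // because plus() has no side effects.
    static Money addConcurrently(int times) {
        AtomicReference<Money> balance = new AtomicReference<>(new Money("CHF", 0));
        IntStream.range(0, times).parallel()
                .forEach(i -> balance.updateAndGet(m -> m.plus(new Money("CHF", 1))));
        return balance.get();
    }

    public static void main(String[] args) {
        System.out.println(addConcurrently(10_000).cents()); // 10000
    }
}
```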

• Amdahl’s Law
  • Throwing more cores at a problem does not necessarily result in a performance gain
  • Returns diminish at some point, no matter how many cores you throw in


It Gets More Interesting…
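Amdahl’s Law quantifies the diminishing return on the slide above: if a fraction p of the work can run in parallel, the best possible speedup on n cores is 1 / ((1 - p) + p / n), which can never exceed 1 / (1 - p) however large n gets. A small Java sketch of the formula (the class and method names are illustrative):

```java
// Amdahl's Law: best-case speedup on `cores` cores when `parallelFraction`
// (between 0 and 1) of the work can be parallelized.
class Amdahl {
    static double speedup(double parallelFraction, int cores) {
        if (parallelFraction < 0.0 || parallelFraction > 1.0 || cores < 1) {
            throw new IllegalArgumentException("Need 0 <= parallelFraction <= 1 and cores >= 1");
        }
        double serial = 1.0 - parallelFraction; // the part that never speeds up
        return 1.0 / (serial + parallelFraction / cores);
    }

    public static void main(String[] args) {
        for (int cores : new int[] {1, 2, 8, 64, 1024}) {
            System.out.printf("%5d cores -> %.2fx%n", cores, speedup(0.95, cores));
        }
    }
}
```

With p = 0.95, the speedup is roughly 1.9x on 2 cores and about 19.6x on 1,024 cores, approaching but never reaching the 20x ceiling set by the 5% of serial work.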

• Leverage probabilistic data structures and algorithms
  • Bloom filters, quotient filters, etc.

• Go Reactive
  • http://www.reactivemanifesto.org/
  • RxJava, Spring Reactor, etc.


Miscellaneous
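A Bloom filter, mentioned on the slide above, answers “have I possibly seen this key?” using a fixed-size bit array and k hash functions: it can return false positives but never false negatives, which makes it a cheap guard in front of an expensive lookup. A minimal Java sketch follows; deriving the k indexes by double hashing from `String.hashCode()` and a 32-bit FNV-1a hash is an illustrative choice, not a tuned implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

// Minimal Bloom filter: "no" answers are definite, "yes" answers are probabilistic.
class BloomFilter {
    private final BitSet bits;
    private final int size;      // number of bits (m)
    private final int hashCount; // number of hash functions (k)

    BloomFilter(int size, int hashCount) {
        if (size < 1 || hashCount < 1) {
            throw new IllegalArgumentException("size and hashCount must be positive");
        }
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    void add(String value) {
        for (int i = 0; i < hashCount; i++) {
            bits.set(index(value, i));
        }
    }

    // false: definitely never added; true: probably added (false positives possible).
    boolean mightContain(String value) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(index(value, i))) {
                return false;
            }
        }
        return true;
    }

    // Double hashing: the i-th index is (h1 + i * h2) mod size.
    private int index(String value, int i) {
        int h1 = value.hashCode();
        int h2 = fnv1a(value);
        return Math.floorMod(h1 + i * h2, size);
    }

    // 32-bit FNV-1a over the UTF-8 bytes, used as the second hash.
    private static int fnv1a(String value) {
        int hash = 0x811C9DC5;
        for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
            hash ^= (b & 0xFF);
            hash *= 0x01000193;
        }
        return hash;
    }

    public static void main(String[] args) {
        BloomFilter seen = new BloomFilter(1 << 16, 5);
        seen.add("user:42");
        System.out.println(seen.mightContain("user:42")); // true
        System.out.println(seen.mightContain("user:43"));
    }
}
```

The false-positive rate depends on the bit-array size, the number of hash functions, and how many keys are added, so both constructor arguments would need to be chosen for the expected load.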

Write Code That Scales Out

• Horizontal Scaling
  • On a distributed system (a cluster)
  • Adding more nodes
  • Writing code to harness the full power of the cluster


Horizontal Scaling

• A typical cluster consists of
  • A number of identical application server nodes behind a load balancer


Topology

• A typical cluster consists of
  • A number of identical application server nodes behind a load balancer

A number?

• It depends on how many you actually need and can afford

• Elastic Scaling / Auto-Scaling
  • The number of live nodes within the cluster shrinks and grows depending on the load
  • New nodes are provisioned or terminated as needed


Topology

• A typical cluster consists of
  • A number of identical application server nodes behind a load balancer

Identical?

• Application nodes are cloned from image files (e.g., AWS EC2 AMIs)

• Configuration management tools (Chef, Puppet, Salt, etc.)


Topology

• A typical cluster consists of
  • A number of identical application server nodes behind a load balancer

Load balancer?

• Load is evenly distributed across live nodes according to some algorithm (typically round-robin)


Topology
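Round-robin, the typical balancing algorithm named on the slide above, hands each new request to the next node in the list and wraps around. A minimal, thread-safe Java sketch (the class name and node names are hypothetical); real load balancers also track node health and skip dead nodes, which is left out here:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Round-robin selection: each call returns the next node, wrapping around.
// An AtomicInteger keeps the counter consistent across request threads without locks.
class RoundRobinBalancer<T> {
    private final List<T> nodes;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinBalancer(List<T> nodes) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("At least one node is required");
        }
        this.nodes = List.copyOf(nodes);
    }

    T pick() {
        // floorMod keeps the index non-negative after the counter overflows.
        int i = Math.floorMod(next.getAndIncrement(), nodes.size());
        return nodes.get(i);
    }

    public static void main(String[] args) {
        RoundRobinBalancer<String> lb =
                new RoundRobinBalancer<>(List.of("app-1", "app-2", "app-3"));
        for (int r = 0; r < 5; r++) {
            System.out.println(lb.pick()); // app-1, app-2, app-3, app-1, app-2
        }
    }
}
```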

• Session data
  • Session Replication
  • Session Affinity / Sticky Session
    • Requests from the same client are routed to the same node
    • When the node dies, the session data dies with it
  • Shared Session / Distributed Session
    • Session data is kept in a “centralized” location
  • Go Stateless
    • No session data (any node will do)


Managing State

• Leverage Map/Reduce
  • “A programming model for processing large data sets with a parallel, distributed algorithm on a cluster”
  • Apache Hadoop


Parallelism At The Cluster Level
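Word count is the customary “hello world” of Map/Reduce. The Java sketch below only simulates the model on a single JVM with parallel streams (it does not use Hadoop’s API): the map step turns each document into words, grouping plays the role of the shuffle, and counting is the reduce step. The tokenization rule (lower-case, split on non-word characters) is an assumption made for brevity.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Map/Reduce-style word count, simulated in-process with a parallel stream.
class WordCount {
    static Map<String, Long> count(List<String> documents) {
        return documents.parallelStream()
                // Map: each document becomes a stream of lower-cased words.
                .flatMap(doc -> Arrays.stream(doc.toLowerCase().split("\\W+")))
                .filter(word -> !word.isEmpty())
                // Shuffle + reduce: group identical words and count each group.
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> docs = List.of("to be or not to be", "To do is to be");
        System.out.println(count(docs));
    }
}
```

On a real cluster, the map and reduce functions would run on the nodes that hold the data, and the framework would handle the shuffle between them.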

• How to do HTTPS?
  • Terminate SSL at the load balancer
  • Use a wildcard SSL certificate

• Distributed Lock Manager (DLM)
  • Synchronizes access to shared resources (Google Chubby, Apache ZooKeeper, etc.)

• Distributed Transactions
  • X/Open XA


Miscellaneous

Deployment

• Multiple environments
  • Development, Test, Stage, and Production
  • Automatic configuration management

• Practice Continuous Delivery

• Leverage the Cloud
  • IaaS, PaaS, SaaS, and NaaS


Deployment

Overcoming the Storage I/O Bottleneck

• The storage I/O is usually the most significant bottleneck


The Storage I/O Bottleneck

The Persistent Datastore

• Relational, of course!
  • Normalized schema guaranteeing data integrity
  • ACID transactions
  • Not biased towards specific access patterns
  • Flexible query language

• As datasets grow
  • Scale up (buy beefier machines)
  • Database tuning / query optimization
  • Create materialized views
  • De-normalize
  • Etc.


What Datastore to Use?

• No other choice but scaling out the RDBMS
  • Master/Slave clusters
  • Sharding

• Failed big time!
  • RDBMSs are designed to run on one machine
  • Eric Brewer’s CAP Theorem of distributed systems
    • Pick 2 out of 3: Consistency, Availability, and Partition Tolerance
    • The relational model is designed to favor CA, hence it can never support P


Mucho Data!

• A wide range of specialized datastores with the goal of addressing the challenges of the relational model

• “The whole point of seeking alternatives is that you need to solve a problem that relational databases are a bad fit for” –Eric Evans

• A wide variety
  • Key-value datastores
  • Columnar datastores
  • Document datastores
  • Graph datastores


NoSQL

• Within the application
  • Data is complex and accessed in many different ways
  • Why should we fit it into one storage model?

• Polyglot Persistence is about
  • Leveraging multiple datastores based on the specific way the data is stored and accessed

• For more info:
  • Check out my talk on YouTube from JAX Conf 2012, “The Rise of NoSQL and Polyglot Persistence”
  • http://bit.ly/PCWtWi


Polyglot Persistence

Caching

• A cache is typically a simple key-value data structure
  • Instead of incurring the overhead of data retrieval or computation every time, you check the cache first
  • You can’t cache everything, so caches can be configured to use different eviction algorithms depending on the use case (LRU, LFU, Bélády’s Algorithm, etc.)

• Use aggressively!

• What to cache?
  • Frequently accessed data (session data, feeds, etc.)
  • Results of intensive computations


Caching
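LRU, the first eviction policy listed on the slide above, can be sketched in a few lines of Java on top of `LinkedHashMap`, whose access-order mode and `removeEldestEntry` hook exist for exactly this purpose. This is a local, single-threaded cache for illustration only; the products named later in the talk add concurrency, expiry, and distribution.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// LRU cache: in access-order mode, LinkedHashMap moves each entry to the end on
// get/put, so the eldest entry is always the least recently used. Not thread-safe.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCache(int capacity) {
        super(16, 0.75f, true); // accessOrder = true
        this.capacity = capacity;
    }

    // Called by put(); returning true evicts the least recently used entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;
    }

    public static void main(String[] args) {
        LruCache<String, Integer> cache = new LruCache<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");                     // "a" becomes most recently used
        cache.put("c", 3);                  // evicts "b"
        System.out.println(cache.keySet()); // [a, c]
    }
}
```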

• Where to cache?
  • On disk
    • File system: slow and sequential access
    • DB: a bit better (data is arranged in structures designed for efficient access, indexes, etc.)
    • Generally a terrible idea (SSDs make things a bit better)
  • In memory: fast and random access, but volatile
  • Something in between: persistent caches (Redis, etc.)

• What type of cache?
  • Local, replicated, distributed, and clustered

Caching

• How to cache?
  • Most caches implement a very simple interface
  • Always attempt to get from the cache first using a key
    • If it is a hit, you saved yourself the overhead
    • If it is a miss, compute or read from the datastore, then put the result in the cache for subsequent gets
  • When you update, you can evict stale data
  • You can set a TTL when you put
  • Many other common operations...


Caching
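The get-first, load-on-miss, put-back flow on the slide above is commonly called cache-aside. A minimal Java sketch, assuming a loader function stands in for the datastore or computation and that a clock is injected so expiry can be tested; the class and parameter names are hypothetical, and concurrent misses on the same key may each call the loader once.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Cache-aside with a time-to-live: get from the cache first; on a miss or an
// expired entry, load from the source, put the result back, and return it.
class CacheAside<K, V> {
    private static final class Entry<T> {
        final T value;
        final long expiresAtMillis;

        Entry(T value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<K, Entry<V>> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // stand-in for a datastore read or computation
    private final long ttlMillis;
    private final LongSupplier clock;    // injected so expiry can be tested

    CacheAside(Function<K, V> loader, long ttlMillis, LongSupplier clock) {
        this.loader = loader;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    V get(K key) {
        long now = clock.getAsLong();
        Entry<V> entry = cache.get(key);
        if (entry != null && now < entry.expiresAtMillis) {
            return entry.value;                              // hit: overhead saved
        }
        V value = loader.apply(key);                         // miss or expired: load
        cache.put(key, new Entry<>(value, now + ttlMillis)); // put for subsequent gets
        return value;
    }

    // Evict on update so readers do not keep seeing stale data.
    void evict(K key) {
        cache.remove(key);
    }

    public static void main(String[] args) {
        CacheAside<Integer, String> users =
                new CacheAside<>(id -> "user-" + id, 60_000, System::currentTimeMillis);
        System.out.println(users.get(7)); // user-7 (loaded)
        System.out.println(users.get(7)); // user-7 (served from the cache)
    }
}
```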

• Caching query results
  • Key: hash of the query itself
  • How about parameterized queries?
    • Key: hash of the query itself + hash of the parameter values

• Method/function memoization
  • Key: method name
  • How about methods with parameters?
    • Key: hash of the method name + hash of the parameter values

• Caching objects
  • Key: identity of the object

Caching Patterns
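The method-memoization pattern above can be sketched in Java as one shared cache whose key combines the method name with the argument values. One deliberate deviation from the slide: the key is a list of the actual values rather than only their hashes, because two different argument sets can share a hash code. Class and method names are illustrative, and the sketch assumes the wrapped function is not recursive, since a `ConcurrentHashMap` mapping function must not update the same map.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiFunction;

// One cache shared by several memoized functions; entries are keyed by
// (method name, argument 1, argument 2).
class MemoCache {
    private final Map<List<Object>, Object> cache = new ConcurrentHashMap<>();

    @SuppressWarnings("unchecked")
    <A, B, R> BiFunction<A, B, R> memoize(String methodName, BiFunction<A, B, R> fn) {
        return (a, b) -> (R) cache.computeIfAbsent(
                Arrays.asList(methodName, a, b), // value-based key, not just a hash
                key -> fn.apply(a, b));          // runs only on a miss
    }

    public static void main(String[] args) {
        MemoCache memo = new MemoCache();
        BiFunction<Integer, Integer, Integer> add = memo.memoize("add", Integer::sum);
        BiFunction<Integer, Integer, Integer> mul = memo.memoize("multiply", (x, y) -> x * y);
        System.out.println(add.apply(2, 3)); // 5 (computed)
        System.out.println(add.apply(2, 3)); // 5 (cached)
        System.out.println(mul.apply(2, 3)); // 6 (different method name, different key)
    }
}
```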

• Time-series datasets (e.g., a real-time feed)
  • Most of the time, pseudo/near real-time is enough
  • Use caching to throttle access to resources
    • Cache the query result with an expiry of t
    • Fresh data is only read once every t


Caching Patterns

• Profile your code to assess what to cache, and whether you need to cache to begin with

• Stale state might bite you hard
  • Incoherence: inconsistent copies of objects cached under multiple keys
  • Stale nested aggregates

• The network overhead of misses might outweigh the performance gain of hits

• Consider writing/updating the cache when writing/updating the persistence store


Caching Gotchas

• EhCache
• Memcached
• Oracle Coherence
• Redis
  • A persistent NoSQL datastore
  • Built-in data structures like sets and lists
  • Supports intelligent keys and namespaces


Featured Solutions

Overcoming the Network I/O Bottleneck

• The network I/O can bring you down just as much


The Network I/O Bottleneck

Asynchronous Processing

• Resource-intensive tasks cannot practically be handled within the lifetime of an HTTP request

• Synchronous processing is overused and not necessary most of the time


Asynchronous Processing

• Pseudo-asynchronous processing
  • Flow
    • Process data/operations in advance
    • User requests data or an operation
    • Respond synchronously with the pre-processed result
  • Sometimes not possible (dynamic content, etc.)


Asynchronous Processing Patterns

• True asynchronous processing
  • Flow
    • User requests data or an operation
    • Acknowledge
      • E.g., a REST endpoint that returns a “202 Accepted” HTTP status code
    • Do the processing at your own convenience
    • Allow the user to check progress
    • Optionally notify when processing is completed


Asynchronous Processing Patterns
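The true asynchronous flow on the slide above (acknowledge at once, process later, let the client poll) can be sketched in plain Java with an `ExecutorService`. The returned job id plays the role of what a REST API would hand back alongside “202 Accepted”; the HTTP layer itself, persistence of job state, and notifications are left out, and all names are hypothetical.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Accept now, process later, let the caller poll for progress.
class JobService {
    enum Status { ACCEPTED, RUNNING, DONE, FAILED }

    private final ExecutorService workers = Executors.newFixedThreadPool(4);
    private final Map<String, Status> statuses = new ConcurrentHashMap<>();

    // Returns immediately with a job id (what a "202 Accepted" response would carry).
    String submit(Runnable work) {
        String jobId = UUID.randomUUID().toString();
        statuses.put(jobId, Status.ACCEPTED);
        workers.submit(() -> {
            statuses.put(jobId, Status.RUNNING);
            try {
                work.run();
                statuses.put(jobId, Status.DONE);
            } catch (RuntimeException e) {
                statuses.put(jobId, Status.FAILED);
            }
        });
        return jobId;
    }

    // What a "check progress" endpoint would return; null for unknown ids.
    Status status(String jobId) {
        return statuses.get(jobId);
    }

    // Stops accepting work and waits briefly for running jobs to finish.
    void shutdown() {
        workers.shutdown();
        try {
            workers.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        JobService jobs = new JobService();
        String id = jobs.submit(() -> System.out.println("crunching..."));
        System.out.println("accepted " + id);
        jobs.shutdown();
        System.out.println(jobs.status(id)); // DONE
    }
}
```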

• Leverage job/work/task queues
  • JMS (Java Message Service), JSR 914
  • AMQP (Advanced Message Queuing Protocol): RabbitMQ, ActiveMQ, etc.
  • AWS SQS
  • Redis lists
  • Etc.

• Task scheduling
  • Jobs triggered periodically (Cron, Quartz, etc.)

• Batch processing


Techniques
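A message broker is out of scope for a snippet, but the shape of the queue technique on the slide above can be sketched in-process with a `BlockingQueue`: the producer enqueues work and moves on, while a worker drains the queue at its own pace. This Java sketch is only a stand-in for JMS, AMQP, or SQS (no durability, no delivery across machines), and the “poison pill” shutdown marker is a hypothetical convention chosen for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// In-process stand-in for a task queue: the producer enqueues and moves on,
// a single worker thread consumes tasks at its own pace.
class WorkQueueDemo {
    private static final String POISON_PILL = "__STOP__"; // hypothetical shutdown marker

    static List<String> process(List<String> tasks) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        List<String> done = new ArrayList<>(); // written only by the worker

        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    String task = queue.take();       // blocks until work arrives
                    if (POISON_PILL.equals(task)) {
                        break;                        // orderly shutdown
                    }
                    done.add("processed:" + task);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();

        try {
            for (String task : tasks) {
                queue.put(task);                      // unbounded queue: never blocks
            }
            queue.put(POISON_PILL);
            worker.join();                            // join() makes `done` visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while processing", e);
        }
        return done;
    }

    public static void main(String[] args) {
        System.out.println(process(List.of("resize-image", "send-email")));
        // [processed:resize-image, processed:send-email]
    }
}
```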

Content Delivery Network

• Static content
  • Binary (video, audio, etc.)
  • Web objects (HTML, JavaScript, CSS, etc.)

• Do NOT serve it through your application server

• Use a CDN
  • “A large distributed system of servers deployed in multiple data centers across the internet”
  • Akamai
  • AWS CloudFront


Content Delivery Network (CDN)

• Dirty caches
  • script.js is a script file deployed on a CDN
  • Multiple copies of script.js will be replicated across all edge nodes of the CDN
  • Clients/browsers will keep their own copies of script.js locally
  • We update script.js
  • Since the new and old versions have the same URI
    • New clients will be served the old version by the CDN
    • Old clients will continue to use the old version from their local cache


CDN Gotchas

• Dirty caches
  • What to do?
    • Simply append a version number to file names
      • script-v1.js, script-v2.js, etc.
    • Force invalidation of all copies on edge nodes
    • Set HTTP caching headers properly


CDN Gotchas
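The version-number fix on the slide above can be automated at build time. A minimal Java sketch with hypothetical helper names: `versioned` implements the slide’s `script-v2.js` scheme, and `fingerprinted` is a common variant (not from the slide) that derives the suffix from a SHA-256 hash of the file content, so any change produces a new URI and stale edge or browser copies are simply never requested again.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Produces cache-busting asset names so an updated file gets a new URI.
class AssetNames {
    // "script.js", 2 -> "script-v2.js"
    static String versioned(String fileName, int version) {
        return insertSuffix(fileName, "-v" + version);
    }

    // Variant: suffix derived from the content hash (first 4 bytes of SHA-256, as hex).
    static String fingerprinted(String fileName, byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(content);
            StringBuilder hex = new StringBuilder();
            for (int i = 0; i < 4; i++) {
                hex.append(String.format("%02x", digest[i]));
            }
            return insertSuffix(fileName, "-" + hex);
        } catch (NoSuchAlgorithmException e) {
            // Every Java SE platform is required to provide SHA-256.
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }

    // Inserts the suffix before the extension, or appends it if there is none.
    private static String insertSuffix(String fileName, String suffix) {
        int dot = fileName.lastIndexOf('.');
        return dot < 0
                ? fileName + suffix
                : fileName.substring(0, dot) + suffix + fileName.substring(dot);
    }

    public static void main(String[] args) {
        System.out.println(versioned("script.js", 2)); // script-v2.js
        System.out.println(fingerprinted("script.js",
                "console.log(1);".getBytes(StandardCharsets.UTF_8)));
    }
}
```

Either way, the HTML that references the asset must be updated to the new name; the renamed files can then safely be served with long-lived caching headers.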

Domain Name System

• Do NOT rely on your domain name registrar’s free DNS

• Use a scalable DNS solution
  • AWS Route 53
  • DynECT
  • UltraDNS
  • Etc.

• Domain sharding
  • Browsers limit the number of connections per host (usually a maximum of 6)
  • Creating multiple subdomains (CNAME entries) allows more resources to be downloaded in parallel
  • Watch out for: DNS lookup overhead, HTTPS cost, the browser’s Same-Origin Policy, etc.


Domain Name System (DNS)
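For the domain sharding described on the slide above, each asset should always map to the same subdomain; otherwise browser caches are defeated and extra DNS lookups are incurred. A minimal Java sketch, assuming hypothetical shard hostnames, that picks a host deterministically from the asset path (`String.hashCode()` is specified by the JDK, so the mapping is stable across JVMs):

```java
import java.util.List;

// Deterministically spreads asset URLs across a fixed set of shard hostnames.
class ShardedAssetUrls {
    private final List<String> hosts;

    ShardedAssetUrls(List<String> hosts) {
        if (hosts.isEmpty()) {
            throw new IllegalArgumentException("At least one host is required");
        }
        this.hosts = List.copyOf(hosts);
    }

    // Same path -> same host every time, so cached copies stay valid.
    String urlFor(String path) {
        int shard = Math.floorMod(path.hashCode(), hosts.size());
        return "https://" + hosts.get(shard) + path;
    }

    public static void main(String[] args) {
        ShardedAssetUrls urls = new ShardedAssetUrls(
                List.of("static1.example.com", "static2.example.com", "static3.example.com"));
        System.out.println(urls.urlFor("/img/logo.png"));
        System.out.println(urls.urlFor("/js/app.js"));
    }
}
```

Adding or removing a shard reshuffles most paths, so the host list is best kept fixed once assets are being cached.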

Remoting

• In a SOA (Service-Oriented Architecture)
  • RPC calls to multiple services
  • Data exchange (plain text vs. binary)
    • SOAP / REST with XML or JSON
    • Google Protocol Buffers, Apache Thrift, Apache Avro, etc.
  • Protocol
    • JMS
    • HTTP
    • SPDY


Remoting

Qualifying Scalability

• Instrumentation: bake it into the code early

• Monitoring
  • Health (application/infrastructure)
  • Key Performance Indicators (KPIs)
    • Number of requests handled, throughput, latency, Apdex index, etc.
  • Logs

• Testing
  • Load/stress testing


Qualifying Scalability
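Of the KPIs on the slide above, the Apdex index is the one with a precise formula: given a target response time T, samples at or under T count as satisfied, samples up to 4T as tolerating, and slower ones as frustrated; the score is (satisfied + tolerating / 2) / total, a value between 0 and 1. A minimal Java sketch (names are illustrative):

```java
// Apdex score for a set of response times against a target threshold T.
class Apdex {
    static double score(long[] responseTimesMillis, long targetMillis) {
        if (responseTimesMillis.length == 0) {
            throw new IllegalArgumentException("No samples");
        }
        int satisfied = 0;
        int tolerating = 0;
        for (long t : responseTimesMillis) {
            if (t <= targetMillis) {
                satisfied++;                 // at or under T
            } else if (t <= 4 * targetMillis) {
                tolerating++;                // over T, up to 4T
            }                                // slower than 4T: frustrated, adds 0
        }
        return (satisfied + tolerating / 2.0) / responseTimesMillis.length;
    }

    public static void main(String[] args) {
        long[] samples = {100, 400, 600, 1500, 3000};
        System.out.println(score(samples, 500)); // 0.6
    }
}
```

With T = 500 ms, the five samples above yield two satisfied, two tolerating, and one frustrated response, for a score of 0.6.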

Disaster Recovery

• Goal
  • A fault-tolerant system
  • Restore service and recover data ASAP in case of a disaster

• Be proactive
  • Develop a Disaster Recovery Plan (DRP)
  • Practice and test your DRP by doing failure drills


When Disaster Hits…

Scaling Teams

• Hiring
  • Always hire top talent
  • You are only as strong as your weakest link

• Develop a process to bring people in
  • Turnkey hardware/software setup (Vagrant, etc.)
  • Arrange for proper access/accounts
  • Develop a knowledge base (architecture documentation, FAQs, etc.)

• Development process
  • Be Agile
  • Refine in the spirit of Six Sigma


Scaling Teams

• Team structure
  • Small is good
  • Form ad-hoc teams from pools of Agile breeds
    • Product Owners
    • Team Members
      • Team Lead (Scrum Master)
      • Engineers
      • QAs
    • Architecture Owners
  • Give them ownership of their DevOps


Scaling Teams

The Take-home

• The early bird gets the worm
  • Design to scale from day one
  • Plan for capacity early

• Your needs determine how scalable “your scalable” needs to be
  • Do not over-engineer
  • Do not bite off more than you can chew

• Building a scalable system is a process
  • Commit to a road map around bottlenecks
  • Guided by planned business features

• Learn from others’ experiences (Twitter, Netflix, etc.)

The Take-home Message

Work smarter, not harder…


Take it slow… You’ll get there…

Questions?

Thanks for the attention!

Follow @PolymathicCoder

abdelmonaim.remani@gmail.com

http://blog.polymathiccoder.com

http://speakerscore.com/jazoon-scalability
