Optimizing Rails & PostgreSQL

Day 7 - Make it Fast



Who are you optimizing for?

Users

Response Time Limits

• 0.1 second
– Limit for having the user feel that the system is reacting instantaneously
– No special feedback is necessary except to display the result

• 1.0 second
– About the limit for the user's flow of thought to stay uninterrupted
– User will notice the delay
– No special feedback is necessary during delays > 0.1s but < 1.0s
– User does lose the feeling of operating directly on the data

• 10 seconds
– About the limit for keeping the user's attention focused on the dialogue
– Users will want to perform other tasks while waiting
– Should be given feedback indicating expected completion
– Feedback during the delay is more important for variable response times

Servers

• Planning for scaling

– Database: Hard

– Web Server: Easy

• App response time

• Servers cost money

• Predictable scaling
– Minimize traffic-based fluctuations

Anatomy of a Web Request

Reduce Lookup Time

Geographic DNS

Remove Unnecessary Requests

CDN

• Combine & minify JS and CSS files to limit requests

• Use public CDNs for common libraries to leverage the browser cache (e.g. jQuery via Google)
– Removes the library from your rolled JS, which shrinks the download on redeploy

• Use image sprites to minimize requests for multiple images

Remove Server Stress

Dynamic Server Capacity
Dynamic Load Balancing
Unlimited Storage
Redundant Storage
Scalable Disk I/O
Distributed Caching

The server request is the bottleneck preventing all other page loads from triggering

Throw Servers at it!

• Band-Aid solution
– Translation: throw servers at it while we spend time fixing / reworking the bottlenecks
– Good for startups working on a Minimum Desirable Product, rapid time to market, and a limited/temporary development budget
• "Start making money on it first and then we'll spend time on that part"

• Marketing mantra of "The Cloud"
– Responsible for lots of $$$

• Also supported by laziness

• The real question…
– What is the cost of not tuning this?
• Some optimizations are very complicated
• Some only yield noticeable benefits under high traffic
• Some are simple and will save a lot of "diagnosis" time and scaling time later if they are handled early
– Tradeoff: what is the cost of doing it later?
• In many cases, doing it later wins
• The trick is identifying the simple, expensive things and getting them out of the way early

Server Constraints

• Performance
– Processor (super fast)
– RAM (super fast)
– Disk I/O
• Standard hard disk (super slow)
• SSD (moderately slow, but no seek time)
– Bandwidth / Network (fast)

• Disk Space

• Disk Reliability

Better Living through Architecture

• Efficient Code and Queries
– Lower processor usage
– Reduce disk I/O
– Minimize RAM footprint

• Caching
– Reduce disk I/O
– Avoid reprocessing the same code/query
– Avoid calling and waiting for responses from the same external services
– Increase RAM usage

• Background Processes and Queues
– Reduce disk I/O
– "This is a low priority and can wait until other resources aren't occupied"
– Predictable processing (no such thing as an overloaded queue)

• Throughput Optimization
– Reduce disk I/O
– Reduce bandwidth usage
– Optimized images: use less disk space and bandwidth
– Query only what you need: minimize bandwidth between the database and application server

GOAL: Minimize Disk I/O

Architectural Solutions in Rails

• Multi-level Caching
– Rack
– View
– Controller
– Model

• Turbolinks
– Minimize requests

• Asset Pipeline
– Compress JS/CSS
– Push to CDN

• Queuing
– Do as much later as possible

• Query Optimization
– Make removing JOINs easier
– Automatic per-request query caching
– Lazy queries: don't execute until used

• Offload tasks that can be avoided
– Simple ETag integration to use the HTTP cache
– Offload file serving to the web server
– Don't process images in Ruby

Removed Caching Styles

Page Caching

• Full page cache

• No login / auth processing

• Great for public pages

• Available in ActionPack
– http://bbll.us/1Baa4rG

Action Caching

• Full page cache at the Controller#Action level

• Allows before_filters to run for authentication

• Available in ActionPack
– http://bbll.us/1oAQk85

Why were those removed?

• Remember me telling you how nothing really existed before?

• The method described in that post can be summarized as the following:

– “Use Memcached as it was designed”

• Rails did add things to make it easier though

Because DHH…that’s why

From the Rails docs…

“Action Caching has been removed from Rails 4. See the actionpack-action_caching gem. See DHH's key-based cache expiration overview for the newly-preferred method.”

Memcached Primer

• In RAM key-value cache

• Allows atomic updates (increment/decrement)

• Automatically expires old data based on “most recently read”

• Can be clustered across tons of machines

• dalli gem is currently the best Memcached client w/ Rails
– Can also be used for session management
• Risky practice…
– kgio gem
• Adds async functionality to dalli
• Slight speed boost

The “New” Way

http://bbll.us/1pmfmYF

Use an “updated_at” based cache key

Use ActiveRecord dependencies to ensure associated records are updated down the chain

This approach depends TOTALLY on Memcached to avoid having to clean up old data. You will probably have Memcached available when you need this, but it's something to be aware of…
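The key-based expiration scheme can be sketched in plain Ruby; here a Hash stands in for Memcached, and the `cache_key` helper mimics Rails' "model/id-updated_at" key format (the record literal and names are illustrative, not from the slides):

```ruby
# Plain-Ruby sketch of key-based cache expiration. A Hash stands in for
# Memcached; cache_key mirrors the "products/id-updated_at" scheme.
CACHE = {}

def cache_key(record)
  "products/#{record[:id]}-#{record[:updated_at]}"
end

def fetch_fragment(record)
  # Render only on a cache miss; a changed updated_at means a fresh key,
  # so stale entries are never read again (Memcached's LRU evicts them).
  CACHE[cache_key(record)] ||= "<li>#{record[:name]}</li>"
end

product = { id: 1, name: "Widget", updated_at: 100 }
fetch_fragment(product)    # miss: renders and stores under products/1-100
product[:updated_at] = 200 # "touching" the record changes its key
fetch_fragment(product)    # new key products/1-200; old entry is abandoned
CACHE.keys                 # => ["products/1-100", "products/1-200"]
```

This is why the approach needs Memcached: nothing ever deletes the abandoned `products/1-100` entry, it simply falls out of the LRU.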

Prior to that, Rails used Sweepers

• Sweepers simplify cache invalidation
• Have the callbacks of both Controllers AND Models
• A Sweeper knows what it is supposed to observe and injects itself into those callback paths
• Allowed centralizing cache management across multiple data code points
• Also useful for non-cache-related code such as tracking controller-only user data

class ListSweeper < ActionController::Caching::Sweeper
  observe List, Item

  def after_save(record)
    list = record.is_a?(List) ? record : record.list
    expire_page(:controller => "lists",
                :action => %w( show public feed ),
                :id => list.id)
    expire_action(:controller => "lists",
                  :action => "all")
    list.shares.each do |share|
      expire_page(:controller => "lists",
                  :action => "show",
                  :id => share.url_key)
    end
  end
end

Can still be found here: http://bbll.us/1lxaNzt

View Fragment Caching

<% cache do %>
  All available products:
  <% some_products.each do |t| %>
    <%# Display product HTML here %>
  <% end %>
<% end %>

NOTE: Even though `some_products` is set in the controller, the query doesn't run until the loop starts, thanks to lazy queries. This prevents having to manage queries separately from the view OR having to ensure the query origin is within the view cache.
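The lazy-query behavior that makes this work can be sketched in plain Ruby; here a lambda stands in for an ActiveRecord::Relation, which builds SQL when assigned but runs nothing until the results are enumerated:

```ruby
# Sketch of lazy query execution (plain Ruby; the lambda stands in for an
# ActiveRecord::Relation — building it costs nothing, enumerating it runs SQL).
executed = []

relation = lambda do
  executed << :query            # side effect marks when the "query" runs
  %w[widget gadget]
end

# Assigning the relation in the controller runs nothing...
building_cost = executed.length # still 0

# ...the query only fires when the view iterates the results
relation.call.each { |name| name }
executed                        # => [:query]
```

Because the query fires inside the `cache` block's loop, a fragment cache hit means the query never runs at all.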

# Or specify the name
cache('all_available_products')

# Or pieces of the name
cache(action: 'recent',
      action_suffix: 'all_products')

# Or conditionally
cache_if(condition,
         cache_key_for_products)

# Expire with
expire_fragment(controller: 'products',
                action: 'recent',
                action_suffix: 'all_products')

Model Caching / Tuning

• Per-request query cache
– Avoid running the same query multiple times

• counter_cache
– Update related count data to avoid COUNT(*) queries on sets later

class Order < ActiveRecord::Base
  belongs_to :customer, counter_cache: true # Updates customer.orders_count
end

– Counter cache on crack with counter_culture
• https://github.com/magnusvk/counter_culture

• .includes(:related_records, :other_related)
– Eager loading
– Fetch associated record types as a group
– Avoid the N+1 problem
– Use the bullet gem in development to quickly red-flag these avoidable problems
• https://github.com/flyerhzm/bullet
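The N+1 problem that `.includes` avoids can be shown in plain Ruby; arrays and a Hash stand in for tables, and `QUERIES` records every "SQL statement" issued (the table and column names are illustrative):

```ruby
# Plain-Ruby sketch of the N+1 problem and eager loading.
QUERIES = []
POSTS = [{ id: 1 }, { id: 2 }, { id: 3 }]
COMMENTS = { 1 => ["nice"], 2 => ["ok"], 3 => ["wow"] }

def comments_for(post_id)
  # Lazy loading: one query per parent row — the "+N" on top of the posts query
  QUERIES << "SELECT * FROM comments WHERE post_id = #{post_id}"
  COMMENTS[post_id]
end

def comments_for_all(ids)
  # Eager loading (what .includes does): one query for the whole set
  QUERIES << "SELECT * FROM comments WHERE post_id IN (#{ids.join(', ')})"
  COMMENTS.values_at(*ids)
end

POSTS.each { |p| comments_for(p[:id]) }      # issues 3 queries
comments_for_all(POSTS.map { |p| p[:id] })   # issues 1 query
QUERIES.length                               # => 4
```

With 3 posts the difference is trivial; with 3,000 it is the difference between 3,001 round trips and 2.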

Fetch

# Look for cache with key 'my_cache_key'
# If not found, run the code to create it
# and then return it
thing = Rails.cache.fetch('my_cache_key',
                          expires_in: 15.minutes) do
  # build and return my data structure
end

Application Level RAM Cache

• CAREFUL with this one… thread safety matters
• ONLY for use with data that is accessed on virtually every request and rarely changes
• Store in a variable that will persist across requests
• Use ||= to set only if not already set
• Data will last until process reboot

class MyThing < ActiveRecord::Base
  def self.cached_things
    @things ||= Rails.cache.fetch('my_cache_key',
                                  expires_in: 15.minutes) do
      # build and return my data structure
    end
  end
end
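The thread-safety caveat above is real: `||=` is not atomic, so under a threaded server two requests can both see nil and build the data twice. One hedge is a Mutex with double-checked locking, sketched here in plain Ruby (class and method names are illustrative, not from the slides):

```ruby
# Thread-safe variant of the ||= memoization pattern (sketch).
class ThingCache
  LOCK = Mutex.new

  def self.cached_things
    # Fast path reads without locking; the slow path builds exactly once
    @things || LOCK.synchronize { @things ||= expensive_build }
  end

  def self.expensive_build
    sleep 0.01 # simulate a slow query or cache fetch
    [1, 2, 3]
  end
end

# Four concurrent requests all get the same memoized object
results = 4.times.map { Thread.new { ThingCache.cached_things } }.map(&:value)
results.uniq # => [[1, 2, 3]]
```

For data that merely gets built twice on a cold start this may be overkill; it matters when the build is expensive or must be unique.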

Controller Caching

• Use etag / http level caching

class ProductsController < ApplicationController
  def show
    @product = Product.find(params[:id])

    # If the request is stale according to the given timestamp and etag value
    # (i.e. it needs to be processed again) then execute this block
    if stale?(last_modified: @product.updated_at.utc, etag: @product.cache_key)
      respond_to do |wants|
        # ... normal response processing
      end
    end
  end
end

# OR
stale?(@product)

# OR
fresh_when last_modified: @product.published_at.utc, etag: @product

Asset Pipeline

• Compress JS/CSS

# Single request for App CSS
//mydomain.com/assets/application-6d810…f396d4.css

# Single request for App JS
//mydomain.com/assets/application-6d810…f396d4.js

• Sync with CDN
– On deploy, push images, JS, and CSS to S3 / Cloud Files
– Automatically direct asset URLs to those paths
– Not necessary with a CDN that allows specifying an origin server
• Amazon CloudFront
• Fastly
– Gem: https://github.com/rumblelabs/asset_sync

• Shared CDN libraries
//ajax.googleapis.com/ajax/libs/jquery/1.9.1/jquery.min.js
– Don't wrap common libraries in app JS
– Don't make users re-download
– https://github.com/kenn/jquery-rails-cdn

Turbolinks

• Similar benefits to CDN

• Only request NEW things

– Don’t request CSS, JS, or Images again

– Inject HTML body into the page

– Skip even CHECKING for changes to those files

Offloading things

• Serve static files with Apache/nginx

config.action_dispatch.x_sendfile_header = "X-Sendfile" # for apache

config.action_dispatch.x_sendfile_header = 'X-Accel-Redirect' # for nginx

• Don't process images in Ruby
– RMagick / ImageScience = BAD
– MiniMagick = GOOD
• Command-line image processing with ImageMagick
• Runs OUTSIDE of your Rails processes

Rack Mini-Profiler

Time queries, response times, requests, AJAX requests, and redirects

https://github.com/MiniProfiler/rack-mini-profiler

Queuing…do it later

Rails < 4.2

• Pick a queue

• Generally same interfaces

• Switching is fairly easy for the most part

• Monkey Patches make “do this queue function with this queue system” simple in most cases

• Delay email with MailHopper

Rails >= 4.2

• ActiveJob
– Queuing standard
– Top-level delayed email
– http://bbll.us/W70uGc

• Supported Queues
– Backburner
– Delayed Job
– Que
– Queue Classic
– Resque
– Sidekiq
– Sneakers
– Sucker Punch
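What every backend on that list provides is the same basic shape: a queue the request thread pushes to and a worker that drains it later. A minimal plain-Ruby sketch of that pattern (real apps use ActiveJob in front of Sidekiq, Resque, etc., not hand-rolled threads):

```ruby
# Minimal sketch of the background-queue pattern using Ruby's core
# thread-safe Queue — NOT a substitute for a real queue backend.
jobs = Queue.new

worker = Thread.new do
  while (job = jobs.pop)         # blocks until work arrives
    break if job == :shutdown
    job.call                     # run the job outside the request cycle
  end
end

sent = []
jobs << -> { sent << "welcome email" } # enqueue; the "request" returns at once
jobs << :shutdown                      # illustrative shutdown sentinel
worker.join
sent                                   # => ["welcome email"]
```

The real backends add what this sketch lacks: persistence across restarts, retries, job locking, and monitoring.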

Overview of Queues

• Delayed Job
– Database backed (non-specific)

• Queue Classic
– PG backed
• Uses LISTEN/NOTIFY
• Enforces perfect job locking
– Forking processes
• Lower concurrency
• High stability
– Built and used by Heroku

• Que
– Newer. Keep an eye on it.
– PG backed
• Uses advisory locks
• Ultra-fast perfect job locking (10,000 vs 300 jobs/sec)
– Threaded
• High concurrency

• Qu
– Multi-backend (Redis / Mongo)
– Tries to overcome Resque/DJ issues

• Resque
– Redis backed (fast writes)
– Forking

• Sidekiq
– Current rock star
– Easily migrate from Resque
– Redis backed (fast writes)
– Threaded

• Sneakers
– RabbitMQ

• Sucker Punch
– Background threads in current process

• Beanstalker
– beanstalkd deep integration
– Not included in ActiveJob… yet

• Backburner
– beanstalkd simple integration
– Included in ActiveJob

What if your site is attacked?

• Undesired request traffic
– Site vulnerability scans
– Brute-force password attacks
– Denial of Service

• Identify and stop as early as possible

• 3rd-party services built for this too
– CloudFlare
– Incapsula

• Rack::Protection
– Built-in protection against common attack vectors

• Rack::Attack!!!
– Handle in the Rack layer
– Whitelist
• Manual
– Blacklist
• Manual
• Fail2Ban
• Allow2Ban
– Throttle
• Requests / time period
– Track
• Track certain types of requests
– Use a RAM-based cache
• No disk I/O
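A minimal Rack::Attack initializer might look like the following, a sketch based on the gem's documented throttle API; the paths and limits are arbitrary examples, not recommendations:

```ruby
# config/initializers/rack_attack.rb — illustrative sketch, tune for your app
class Rack::Attack
  # RAM-based store so throttle counters never touch disk
  Rack::Attack.cache.store = ActiveSupport::Cache::MemoryStore.new

  # Throttle: at most 300 requests per 5 minutes per IP
  throttle("req/ip", limit: 300, period: 300) do |req|
    req.ip
  end

  # Throttle login attempts harder, to slow brute-force password attacks
  # (assumes a hypothetical POST /login endpoint)
  throttle("logins/ip", limit: 5, period: 60) do |req|
    req.ip if req.path == "/login" && req.post?
  end
end
```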

What about PostgreSQL?

• Query Optimization
– Varies based on needs
– Multi-index queries (automatic)
– Simpler to define indexes

• Conditional (partial) indexes allow smaller indexes
– Don't index things we don't need

• Schemas to segment your data into smaller indexes

• Concurrent index creation

• Advisory locks

• Table partitioning

• LISTEN / NOTIFY
– Avoid polling, make use of TRIGGERs
– Built-in workers (experimental)

• Hook directly to Memcached
– http://bbll.us/1tXNJeM

• Write-Ahead Log
– Streaming backup
• https://github.com/wal-e/wal-e
– Streaming replication
– Extensible clustering
• Postgres-XC

• Materialized Views
– Cached tables
– Refresh with triggers

Extreme Tune Up…Whaaaaaa!

• pgtune

– Suggest configuration based on the machine specs

• pg_partman

– Manage partitions

• Many more…

• Biggish Data w/ Rails & PostgreSQL

– http://bit.ly/biggish-data

– http://bbll.us/1reQBms

Pagination done right

Overview

• Limit the WHERE scope
• Only sort returned results
• Smaller data set size
• Must know "previous" page info for the query
• Ideal for "infinite scroll"
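The idea above is keyset ("seek") pagination: instead of OFFSET, filter on the last-seen key. A plain-Ruby sketch, with an array standing in for an indexed table (the `id` column and page size are illustrative):

```ruby
# Keyset vs OFFSET pagination, sketched over an in-memory "table".
ROWS = (1..100).map { |i| { id: i } }

def offset_page(page, per)
  # OFFSET-style: the database still walks and discards all skipped rows
  ROWS.sort_by { |r| r[:id] }.drop((page - 1) * per).take(per)
end

def keyset_page(last_id, per)
  # Keyset: WHERE id > last_id ORDER BY id LIMIT per — an index seeks
  # straight to the boundary, touching only the rows it returns
  ROWS.select { |r| r[:id] > last_id }.sort_by { |r| r[:id] }.take(per)
end

offset_page(3, 10).first[:id]  # => 21
keyset_page(20, 10).first[:id] # => 21, same page without scanning rows 1..20
```

The caller must carry the last row's key forward between requests, which is exactly why it suits infinite scroll and not "jump to page 47".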

Great slides

http://bbll.us/1sVeY6Q

Order Query Gem

http://bbll.us/1B962QB