Who are you optimizing for?
Users
Response Time Limits
• 0.1 second
– Limit for having the user feel that the system is reacting instantaneously
– No special feedback is necessary except to display the result
• 1.0 second
– About the limit for the user's flow of thought to stay uninterrupted
– User will notice the delay
– No special feedback is necessary during delays > 0.1s but < 1.0s
– User does lose the feeling of operating directly on the data
• 10 seconds
– About the limit for keeping the user's attention focused on the dialogue
– Users will want to perform other tasks while waiting
– Should be given feedback indicating expected completion
– Feedback during the delay is more important for variable response times
Servers
• Planning for scaling
– Database: Hard
– Web Server: Easy
• App response time
• Servers cost money
• Predictable scaling
– Minimize traffic-based fluctuations
Remove Unnecessary Requests
CDN
• Combine & minify JS and CSS files to limit requests
• Use public CDNs for common libraries to leverage the browser cache (ex: jQuery via Google)
– Removes the library from your rolled JS, which shrinks the download on redeploy
• Use image sprites to minimize requests for multiple images
Remove Server Stress
Dynamic Server Capacity
Dynamic Load Balancing
Unlimited Storage
Redundant Storage
Scalable Disk I/O
Distributed Caching
The server request is the bottleneck preventing all other page loads from triggering
Throw Servers at it!
• Band-Aid solution
– Translation: Throw servers at it while we spend time fixing / reworking the bottlenecks
– Good for startups working on a Minimum Desirable Product with rapid time to market and a limited/temporary development budget
• Start making money on it first and then we'll spend time on that part
• Marketing mantra of "The Cloud"
– Responsible for lots of $$$
• Also supported by laziness
• The real question…
– What is the cost of not tuning this?
• Some optimizations are very complicated
• Some only yield noticeable benefits under high traffic
• Some are simple and will save a lot of "diagnosis" time and scaling time later if they are handled early
• Tradeoff: What is the cost of doing it later?
– In many cases, doing it later wins
– The trick is identifying the simple, expensive things and getting them out of the way early
Server Constraints
• Performance
– Processor (super fast)
– RAM (super fast)
– Disk I/O
• Standard hard disk (super slow)
• SSD (moderately slow, but no seek time)
– Bandwidth / Network (fast)
• Disk Space
• Disk Reliability
Better Living through Architecture
• Efficient Code and Queries
– Lower processor usage
– Reduce disk I/O
– Minimize RAM footprint
• Caching
– Reduce disk I/O
– Avoid reprocessing the same code/query
– Avoid calling and waiting for responses from the same external services
– Increase RAM usage
• Background Processes and Queues
– Reduce disk I/O
– Work is low priority and can wait until other resources aren't occupied
– Predictable processing (no such thing as an overloaded queue)
• Throughput Optimization
– Reduce disk I/O
– Reduce bandwidth usage
– Optimized images use less disk space and bandwidth
– Query only what you need: minimize bandwidth between database and application server
GOAL: Minimize Disk I/O
Architectural Solutions in Rails
• Multi-level Caching
– Rack
– View
– Controller
– Model
• Turbolinks
– Minimize requests
• Asset Pipeline
– Compress JS/CSS
– Push to CDN
• Queuing
– Do as much later as possible
• Query Optimization
– Make removing JOINs easier
– Automatic per-request query caching
– Lazy queries
• Don't execute until used
• Offload tasks that can be avoided
– Simple ETag integration to use the HTTP cache
– Offload file serving to the web server
– Don't process images in Ruby
Removed Caching Styles
Page Caching
• Full page cache
• No login / auth processing
• Great for public pages
• Available in ActionPack– http://bbll.us/1Baa4rG
Action Caching
• Full page cache at the Controller#Action level
• Allows before_filters to run for authentication
• Available in ActionPack– http://bbll.us/1oAQk85
Why were those removed?
• Remember me telling you how nothing really existed before?
• The method described in that post can be summarized as the following:
– “Use Memcached as it was designed”
• Rails did add things to make it easier though
Because DHH…that’s why
From the Rails docs…
“Action Caching has been removed from Rails 4. See the actionpack-action_caching gem. See DHH's key-based cache expiration overview for the newly-preferred method.”
Memcached Primer
• In-RAM key-value cache
• Allows atomic updates (increment/decrement)
• When full, automatically evicts the least recently read data
• Can be clustered across tons of machines
• The dalli gem is currently the best Memcached client for Rails
– Can also be used for session management
• Risky practice…
– kgio gem
• Adds async functionality to dalli
• Slight speed boost
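The eviction behavior above can be sketched in plain Ruby (the `TinyLRU` class is illustrative, not part of any gem): a bounded key-value store where every read refreshes an entry's recency and the least recently used entry is evicted once the store is full, plus a Memcached-style atomic-looking increment.

```ruby
# Minimal sketch of an LRU key-value cache, illustrating how Memcached
# evicts the least recently used entry once its memory fills up.
class TinyLRU
  def initialize(max_entries)
    @max = max_entries
    @store = {}            # Ruby hashes preserve insertion order
  end

  def set(key, value)
    @store.delete(key)     # re-inserting moves the key to the "newest" end
    @store[key] = value
    @store.delete(@store.first[0]) if @store.size > @max  # evict oldest
    value
  end

  def get(key)
    return nil unless @store.key?(key)
    set(key, @store.delete(key))  # a read refreshes the entry's recency
  end

  # Increment, in the spirit of Memcached's incr command
  def incr(key, by = 1)
    set(key, (get(key) || 0) + by)
  end
end
```

The real server does this per slab with genuinely atomic operations across many client processes; the sketch only shows the bookkeeping.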
The “New” Way
http://bbll.us/1pmfmYF
Use an “updated_at” based cache key
Use ActiveRecord dependencies to ensure associated records are updated down the chain
This approach depends TOTALLY on Memcached's automatic eviction to avoid having to clean up old data. You will probably have Memcached available when you need this, but it's something to be aware of…
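The key-based scheme can be sketched in plain Ruby (the helper name and timestamp format here are illustrative, in the spirit of Rails' cache_key): embed the record's updated_at in the cache key, so updating the record yields a new key and the stale entry is simply never read again.

```ruby
require 'time'

# Sketch of an updated_at-based cache key: "<model>/<id>-<timestamp>".
# Touching the record changes the key, so the old cache entry is
# abandoned and Memcached's eviction eventually reclaims it --
# no explicit invalidation code needed.
def cache_key_for(model_name, id, updated_at)
  "#{model_name}/#{id}-#{updated_at.utc.strftime('%Y%m%d%H%M%S')}"
end
```

Associated caches stay correct because `touch: true` on the association bumps the parent's updated_at, producing a fresh key up the chain.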
Prior to that, Rails used Sweepers
• Sweepers simplify cache invalidation
• They have the callbacks of both Controllers AND Models
• A Sweeper knows what it is supposed to observe and injects itself into those callback paths
• Allowed centralizing cache management across multiple data code points
• Also useful for non-cache-related code such as tracking controller-only user data
class ListSweeper < ActionController::Caching::Sweeper
  observe List, Item

  def after_save(record)
    list = record.is_a?(List) ? record : record.list
    expire_page(:controller => "lists",
                :action => %w( show public feed ),
                :id => list.id)
    expire_action(:controller => "lists",
                  :action => "all")
    list.shares.each do |share|
      expire_page(:controller => "lists",
                  :action => "show",
                  :id => share.url_key)
    end
  end
end
Can still be found here: http://bbll.us/1lxaNzt
View Fragment Caching
<% cache do %>
  All available products:
  <% some_products.each do |product| %>
    <%# Display product HTML here %>
  <% end %>
<% end %>
NOTE: Even though `some_products` is set in the controller, the query doesn't run until the loop starts, thanks to lazy queries. This prevents having to manage queries separately from the view OR having to ensure the query origin is within the view cache.
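The deferred execution described above can be imitated in plain Ruby (this `LazyRelation` class is a sketch of the idea, not ActiveRecord's implementation): building the object records how to fetch the rows, but nothing runs until the first enumeration.

```ruby
# Sketch of a lazy "relation": construction stores the query block but
# does not run it; the block only fires on first enumeration -- the same
# reason a Rails relation assigned in a controller doesn't hit the
# database until the view iterates it (so it lands inside the cache block).
class LazyRelation
  include Enumerable
  attr_reader :executed

  def initialize(&query)
    @query = query       # how to fetch the rows, deferred
    @executed = false
  end

  def each(&block)
    @rows ||= begin
      @executed = true
      @query.call        # runs only once, on first enumeration
    end
    @rows.each(&block)
  end
end
```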
# Or specify the name
cache('all_available_products')

# Or pieces of the name
cache(action: 'recent',
      action_suffix: 'all_products')

# Or conditionally
cache_if(condition,
         cache_key_for_products)

# Expire with
expire_fragment(controller: 'products',
                action: 'recent',
                action_suffix: 'all_products')
Model Caching / Tuning
• Per-request query cache
– Avoid running the same query multiple times
• counter_cache
– Update related count data to avoid COUNT(*) queries on sets later

class Order < ActiveRecord::Base
  belongs_to :customer, counter_cache: true # updates customers.orders_count
end

– Counter cache on crack with counter_culture
• https://github.com/magnusvk/counter_culture
• .includes(:related_records, :other_related)
– Eager loading
– Fetch associated record types as a group
– Avoid the N+1 problem
– Use the bullet gem in development to quickly red-flag these avoidable problems
• https://github.com/flyerhzm/bullet
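The N+1 fix that .includes performs can be sketched in plain Ruby over arrays of hashes (function and key names here are illustrative): one batched lookup for the whole page of parents instead of one lookup per parent.

```ruby
# Sketch of eager loading: instead of one lookup per post (the N+1
# pattern), fetch all comments for the page of posts in a single pass
# and group them by post_id -- the strategy .includes uses with one
# extra query (WHERE post_id IN (...)).
def eager_load_comments(posts, all_comments)
  ids = posts.map { |p| p[:id] }
  by_post = all_comments
    .select { |c| ids.include?(c[:post_id]) }  # one batched "query"
    .group_by { |c| c[:post_id] }
  posts.map { |p| p.merge(comments: by_post.fetch(p[:id], [])) }
end
```

Two passes over the data regardless of how many posts are on the page, which is exactly the win over N+1 round trips.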
Fetch
# Look for a cache entry with key 'my_cache_key'.
# If not found, run the block to create it
# and then return it.
thing = Rails.cache.fetch('my_cache_key',
                          expires_in: 15.minutes) do
  # build and return my data structure
end
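The read-through pattern behind Rails.cache.fetch can be sketched in plain Ruby (the `TinyCache` class is illustrative): return the hit if the key exists, otherwise run the block exactly once, store its result, and return it.

```ruby
# Sketch of the fetch pattern: a cache miss runs the block, stores the
# result under the key, and returns it; subsequent calls with the same
# key return the cached value without running the block again.
class TinyCache
  def initialize
    @store = {}
  end

  def fetch(key)
    return @store[key] if @store.key?(key)  # cache hit
    @store[key] = yield                      # miss: build, store, return
  end
end
```

The real Rails.cache.fetch adds expiry (expires_in) and pluggable backends (Memcached via dalli, etc.) on top of this shape.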
Application Level RAM Cache
• CAREFUL with this one… thread safety matters
• ONLY for use with data that is accessed on virtually every request and rarely changes
• Store in a variable that will persist across requests
• Use ||= to set only if not already set
• Data will last until process reboot
class MyThing < ActiveRecord::Base
  def self.cached_things
    @things ||= Rails.cache.fetch('my_cache_key',
                                  expires_in: 15.minutes) do
      # build and return my data structure
    end
  end
end
Controller Caching
• Use ETag / HTTP-level caching
class ProductsController < ApplicationController
  def show
    @product = Product.find(params[:id])

    # If the request is stale according to the given timestamp and ETag value
    # (i.e. it needs to be processed again) then execute this block
    if stale?(last_modified: @product.updated_at.utc, etag: @product.cache_key)
      respond_to do |wants|
        # ... normal response processing
      end
    end
  end
end
# OR
stale?(@product)
# OR
fresh_when last_modified: @product.published_at.utc, etag: @product
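The HTTP exchange that stale?/fresh_when drive can be sketched in plain Ruby (both method names and the hash-based "response" are illustrative): derive the ETag from the record's cache key, compare it to the request's If-None-Match header, and short-circuit with 304 on a match.

```ruby
require 'digest'

# Sketch of ETag revalidation. The ETag is a digest of the record's
# cache key, so it changes whenever updated_at changes. When the
# client's If-None-Match matches the current ETag, its copy is still
# fresh: respond 304 with no body and skip rendering entirely.
def etag_for(cache_key)
  %("#{Digest::MD5.hexdigest(cache_key)}")
end

def respond(if_none_match, cache_key)
  etag = etag_for(cache_key)
  if if_none_match == etag
    { status: 304, body: nil }                      # not modified
  else
    { status: 200, body: "rendered page", etag: etag }
  end
end
```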
Asset Pipeline
• Compress JS/CSS
# Single request for App CSS
//mydomain.com/assets/application-6d810…f396d4.css
# Single request for App JS
//mydomain.com/assets/application-6d810…f396d4.js
• Sync with CDN
– On deploy, push images, JS, and CSS to S3 / Cloud Files
– Automatically direct asset URLs to those paths
– Not necessary with a CDN that allows specifying an origin server
• Amazon CloudFront
• Fastly
– Gem: https://github.com/rumblelabs/asset_sync
• Shared CDN libraries
//ajax.googleapis.com/ajax/libs/jquery/1.9.1/jquery.min.js
– Don't wrap common libraries in your app JS
– Don't make users re-download them
– https://github.com/kenn/jquery-rails-cdn
Turbolinks
• Similar benefits to a CDN
• Only request NEW things
– Don't request CSS, JS, or images again
– Inject the HTML body into the page
– Skip even CHECKING for changes to those files
Offloading things
• Serve static files with Apache/nginx
config.action_dispatch.x_sendfile_header = "X-Sendfile" # for apache
config.action_dispatch.x_sendfile_header = 'X-Accel-Redirect' # for nginx
• Don't process images in Ruby
– RMagick / ImageScience = BAD
– MiniMagick = GOOD
• Command-line image processing with ImageMagick
• Runs OUTSIDE of your Rails processes
Rack Mini-Profiler
Times queries, response times, requests, AJAX requests, and redirects
https://github.com/MiniProfiler/rack-mini-profiler
Queuing…do it later
Rails < 4.2
• Pick a queue
• Generally same interfaces
• Switching is fairly easy for the most part
• Monkey Patches make “do this queue function with this queue system” simple in most cases
• Delay email with MailHopper
Rails >= 4.2
• ActiveJob
– Queuing standard
– Top-level delayed email
– http://bbll.us/W70uGc
• Supported Queues
– Backburner
– Delayed Job
– Que
– Queue Classic
– Resque
– Sidekiq
– Sneakers
– Sucker Punch
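Whichever backend you pick, the "do it later" contract is the same. It can be sketched in plain Ruby with a worker thread draining a thread-safe Queue (the `TinyWorker` class is illustrative, not any gem's API): enqueueing returns immediately, and the work happens later on another thread.

```ruby
require 'thread'

# Sketch of a background job queue: perform_later returns right away
# (the request thread never waits), while a worker thread pops jobs
# off a thread-safe Queue and runs them in order.
class TinyWorker
  def initialize
    @jobs = Queue.new
    @thread = Thread.new do
      while (job = @jobs.pop)   # blocks until a job (or the sentinel) arrives
        job.call
      end
    end
  end

  def perform_later(&job)
    @jobs << job                # fast: just an enqueue
  end

  def shutdown
    @jobs << nil                # sentinel: stop after draining the queue
    @thread.join
  end
end
```

Real backends differ mainly in where the queue lives (database, Redis, RabbitMQ, beanstalkd) and whether workers fork or thread, which is what the overview on the next slide compares.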
Overview of Queues
• Delayed Job
– Database backed (non-specific)
• Queue Classic
– PG backed
• Uses LISTEN/NOTIFY
• Enforces perfect job locking
– Forking processes
• Lower concurrency
• High stability
– Built and used by Heroku
• Que
– Newer. Keep an eye on it.
– PG backed
• Uses advisory locks
• Ultra-fast perfect job locking
– 10,000 vs 300 jobs / sec
– Threaded
• High concurrency
• Qu
– Multi-backend (Redis / Mongo)
– Tries to overcome Resque/DJ issues
• Resque
– Redis backed (fast writes)
– Forking
• Sidekiq
– Current rock star
– Easily migrate from Resque
– Redis backed (fast writes)
– Threaded
• Sneakers
– RabbitMQ
• Sucker Punch
– Background threads in the current process
• Beanstalker
– beanstalkd deep integration
– Not included in ActiveJob…yet
• Backburner
– beanstalkd simple integration
– Included in ActiveJob
What if your site is attacked?
• Undesired request traffic
– Site vulnerability scans
– Brute-force password attacks
– Denial of service
• Identify and stop it as early as possible
• 3rd-party services built for this too
– CloudFlare
– Incapsula
• Rack::Protection
– Built-in protection against common attack vectors
• Rack::Attack!!!
– Handle in the Rack layer
– Whitelist
• Manual
– Blacklist
• Manual
• Fail2Ban
• Allow2Ban
– Throttle
• Requests / time period
– Track
• Track certain types of requests
– Use a RAM-based cache
• No disk I/O
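The throttle idea above can be sketched in plain Ruby (the `TinyThrottle` class is illustrative, not Rack::Attack's API): count requests per client in fixed time windows and reject once the count passes the limit.

```ruby
# Sketch of request throttling: allow at most `limit` requests per
# client per `period` seconds, using a fixed-window counter keyed by
# client id + window number. Rack::Attack keeps equivalent counters
# in a RAM-based cache store to avoid disk I/O under attack traffic.
class TinyThrottle
  def initialize(limit:, period:)
    @limit = limit
    @period = period
    @counters = Hash.new(0)
  end

  def allowed?(client_id, now = Time.now.to_i)
    window = now / @period               # which fixed window we're in
    key = "#{client_id}:#{window}"
    @counters[key] += 1
    @counters[key] <= @limit
  end
end
```

Keying by client keeps one abusive IP from affecting anyone else, and the window number means counters reset automatically without cleanup work.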
What about PostgreSQL?
• Query Optimization
– Varies based on needs
– Multi-index queries (automatic)
– Simpler to define indexes
• Conditional (partial) indexes allow smaller indexes
– Don't index things we don't need
• Schemas to segment your data into smaller indexes
• Concurrent index creation
• Advisory locks
• Table partitioning
• LISTEN / NOTIFY
– Avoid polling; make use of TRIGGERS
– Built-in workers (experimental)
• Hook directly to Memcached
– http://bbll.us/1tXNJeM
• Write-Ahead Log
– Streaming backup
• https://github.com/wal-e/wal-e
– Streaming replication
– Extensible clustering
• Postgres-XC
• Materialized Views
– Cached tables
– Refresh with triggers
Extreme Tune Up…Whaaaaaa!
• pgtune
– Suggest configuration based on the machine specs
• pg_partman
– Manage partitions
• Many more…
• Biggish Data w/ Rails & PostgreSQL
– http://bit.ly/biggish-data
– http://bbll.us/1reQBms
Pagination done right
Overview
• Limit the WHERE scope
• Only sort returned results
• Smaller data set size
• Must know "previous" page info for the query
• Ideal for "infinite scroll"
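The bullets above describe keyset ("seek") pagination. A plain-Ruby sketch over an in-memory set (standing in for an indexed ORDER BY column; the function name is illustrative) shows the shape of the query: WHERE id > last_seen ORDER BY id LIMIT n.

```ruby
# Sketch of keyset pagination: instead of OFFSET, which scans and
# discards every earlier row, filter on the last id already seen.
# The WHERE clause limits the scope up front, so each page costs
# roughly the same no matter how deep the user scrolls.
def next_page(rows, after_id:, limit:)
  rows.sort_by { |r| r[:id] }              # ORDER BY id
      .select { |r| r[:id] > after_id }    # WHERE id > :after_id
      .first(limit)                        # LIMIT :limit
end
```

The caller threads the last id of each page into the next request, which is the "must know previous page info" requirement and why this fits infinite scroll so well.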
Great slides
http://bbll.us/1sVeY6Q
Order Query Gem
http://bbll.us/1B962QB