rowan-hick
Presentation given to Toronto Rails Project Night, performance tips for ActiveRecord usage
How to avoid hanging yourself with Rails
Using ActiveRecord right the first time
work.rowanhick.com
1
Discussion tonight
• Intended for new Rails Developers
• People that think Rails is slow
• Focus on simple steps to improve common :has_many performance problems
• Short - 15mins
• All links/references up on http://work.rowanhick.com tomorrow
2
About me
• New Zealander (not Australian)
• Product Development Mgr for a startup in Toronto
• Full-time with Rails for 2 years
• Previously PHP/MySQL for 4 years
• Before that, 6 years as QA/BA/PM at an enterprise CAD/CAM software development company
3
Disclaimer
• For brevity and clarity, the SQL shown here is cut down to “pseudo-SQL”
• This is not an exhaustive, in-depth analysis, just a heads-up
• Timings were taken with ApacheBench against Mongrel in production mode:
    ab -n 1000 http://127.0.0.1/orders/test_xxxx
4
ActiveRecord lets you get into trouble far too quickly
• Super-easy syntax comes at a cost:
    @orders = Order.find(:all)
    @orders.each do |order|
      puts order.customer.name
      puts order.customer.country.name
    end
✴Congratulations, you just overloaded your DB with (total number of orders x 2) unnecessary SQL calls
5
What happened there?
• One query to get the orders:
    @orders = Order.find(:all)
    “SELECT * FROM orders”
• Then, for every item in the orders collection:
    customer.name: “SELECT * FROM customers WHERE id = x”
    customer.country.name: “SELECT * FROM countries WHERE id = y”
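To make the arithmetic concrete, here is a minimal plain-Ruby sketch (no ActiveRecord involved) that models the loop above against a hypothetical in-memory “database” that counts every query it answers. The table data and the `FakeDB` class are made-up assumptions; the point is only that lazy per-row lookups cost 1 + 2N queries.

```ruby
# Hypothetical in-memory "database" that counts every query it answers.
class FakeDB
  attr_reader :query_count

  def initialize(tables)
    @tables = tables
    @query_count = 0
  end

  # Rough equivalent of "SELECT * FROM table"
  def all(table)
    @query_count += 1
    @tables.fetch(table)
  end

  # Rough equivalent of "SELECT * FROM table WHERE id = x"
  def find(table, id)
    @query_count += 1
    @tables.fetch(table).find { |row| row[:id] == id }
  end
end

# Mirrors the slide's loop: one lookup per customer and per country, per order.
def n_plus_one_report(db)
  db.all(:orders).map do |order|
    customer = db.find(:customers, order[:customer_id])   # one query per order
    country  = db.find(:countries, customer[:country_id]) # and another one
    "#{customer[:name]} (#{country[:name]})"
  end
end

countries = [{ id: 1, name: "New Zealand" }, { id: 2, name: "Canada" }]
customers = (1..50).map { |i| { id: i, name: "Customer #{i}", country_id: (i % 2) + 1 } }
orders    = (1..100).map { |i| { id: i, customer_id: (i % 50) + 1 } }

db = FakeDB.new(orders: orders, customers: customers, countries: countries)
lines = n_plus_one_report(db)
puts "#{lines.size} orders rendered with #{db.query_count} queries"
```

With 100 orders this prints `100 orders rendered with 201 queries`: one query for the orders plus two per order.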
6
Systemic Problem in Web development
I’ve seen:
- 15 Second page reloads
- 10000 queries per page
“<insert name here> language performs really poorly, we’re going to get it redeveloped in <insert new language here>”
7
A typical root cause
• Failure to build the application with *real* data
• i.e. “It worked fine on my machine”, but the developer never loaded 100,000 records to see what would happen
• Using Rake tasks to build realistic data sets
• Test, test, test
• tail -f log/development.log
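While tailing the log, one quick way to spot the N+1 pattern is to tally the per-model “Load” lines for a single request. The following is a minimal sketch, assuming the Rails 2-era log format shown on the r8672 slide (`Order Load (0.000837) SELECT ...`). The helper name `summarize_queries` and the sample log text are hypothetical. It strips ANSI colour codes first because colourised logging can wrap the model name in escape sequences.

```ruby
# Hypothetical helper: tally ActiveRecord "Load" lines in a chunk of log output.
# Assumes the Rails 2-era format, e.g.
#   Order Load (0.000837) SELECT * FROM `orders` ...
ANSI_COLOR = /\e\[[\d;]*m/                  # colourised-logging escape codes
LOAD_LINE  = /(\w+) Load \(([\d.]+)\)/      # captures model name and seconds

def summarize_queries(log_text)
  counts = Hash.new(0)
  total_seconds = 0.0
  log_text.each_line do |line|
    match = LOAD_LINE.match(line.gsub(ANSI_COLOR, ""))
    next unless match
    counts[match[1]] += 1
    total_seconds += match[2].to_f
  end
  [counts, total_seconds]
end

# Made-up sample showing the per-row lookups of an N+1 loop.
sample = <<~LOG
  Order Load (0.000837) SELECT * FROM `orders` LIMIT 2
  Customer Load (0.000400) SELECT * FROM `customers` WHERE (`customers`.`id` = 1)
  Country Load (0.000300) SELECT * FROM `countries` WHERE (`countries`.`id` = 5)
  Customer Load (0.000400) SELECT * FROM `customers` WHERE (`customers`.`id` = 2)
  Country Load (0.000300) SELECT * FROM `countries` WHERE (`countries`.`id` = 7)
LOG

counts, seconds = summarize_queries(sample)
counts.each { |model, n| puts "#{model}: #{n} queries" }
printf("total: %.6f s\n", seconds)
```

A model whose count grows with the number of rows on the page (here `Customer` and `Country`) is the usual sign that an `:include` is missing.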
8
Faker to the rescue
• In lib/tasks/xchain.rake (Rails loads .rake files from lib/tasks automatically):
    namespace :xchain do
      desc "Load fake customers"
      task :load_customers => :environment do
        require 'faker'
        # Remove customers created by a previous run (tagged with an XCHAIN_ email prefix)
        Customer.find(:all, :conditions => "email LIKE('%XCHAIN_%')").each { |c| c.destroy }
        300.times do
          c = Customer.new
          c.status_id = rand(3) + 1
          c.country_id = rand(243) + 1
          c.name = Faker::Company.name
          c.alternate_name = Faker::Company.name
          c.phone = Faker::PhoneNumber.phone_number
          c.email = "XCHAIN_" + Faker::Internet.email
          c.save
        end
      end
    end
$ rake xchain:load_customers
9
Eager loading
• By using :include in .finds you create SQL joins
• Pull all required records in one query:
    find(:all, :include => [ :customer, :order_lines ])
    ✓ order.customer, order.order_lines
    find(:all, :include => [ { :customer => :country }, :order_lines ])
    ✓ order.customer, order.customer.country, order.order_lines
10
Improvement
• Let’s start optimising ...
    @orders = Order.find(:all, :include => { :customer => :country })
• Resulting SQL ...
    “SELECT orders.*, customers.*, countries.* FROM orders LEFT JOIN customers ON (customers.id = orders.customer_id) LEFT JOIN countries ON (countries.id = customers.country_id)”
✓ 7.70 req/s 1.4x faster
11
Select only what you need
• Using the :select parameter in the find options, you can limit the columns you request from the database
• There’s no point grabbing all columns if you only want id and name:
    Order.find(:all, :select => 'orders.id, orders.name')
12
The last slide was very important
• Not using :select is *okay* provided your columns are small and never hold binary or large text data
• Otherwise you can suddenly saturate your DB connection
• Imagine our orders table had an invoice column storing a PDF of the invoice...
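A back-of-the-envelope sketch of that invoice scenario, in plain Ruby. It counts the bytes a `SELECT *` would ship compared with a narrow `:select` when each row carries a large blob. The `Row` struct, the 1,000 rows and the 50 KB “PDF” per row are made-up assumptions for illustration, not measurements.

```ruby
# Illustrative only: column sizes are assumptions, not measurements.
Row = Struct.new(:id, :name, :created_at, :invoice_pdf)

# Sum the byte size of the requested columns across all rows.
def payload_bytes(rows, columns)
  rows.sum do |row|
    columns.sum { |col| row[col].to_s.bytesize }
  end
end

rows = (1..1_000).map do |i|
  Row.new(i, "Order #{i}", "2008-01-30 12:00:00", "x" * 50_000) # ~50 KB fake PDF per row
end

all_columns = Row.members   # roughly what SELECT * returns
narrow      = [:id, :name]  # roughly what :select => 'orders.id, orders.name' returns

full  = payload_bytes(rows, all_columns)
small = payload_bytes(rows, narrow)
puts "SELECT *: #{full} bytes, narrow select: #{small} bytes"
```

Under these assumptions the narrow select moves about 12 KB instead of about 50 MB. Almost all of the difference comes from the single blob column the page never displays.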
13
Oops
• Can’t show a benchmark
• :select and :include don’t work together! Eager loading reverts to selecting all columns
• For a long time the core team has not accepted patches to make it work
• Just one little sentence in the ActiveRecord rdoc: “Because eager loading generates the SELECT statement too, the :select option is ignored.”
14
‘mrj’ to the rescue
• http://dev.rubyonrails.org/attachment/ticket/7147/init.5.rb
• Monkey patch to fix select/include problem
• Produces much more efficient SQL
15
Updated finder
• Now :select and :include play nice:
    @orders = Order.find(:all,
      :select => 'orders.id, orders.created_at, customers.name, countries.name, order_statuses.name',
      :include => [{:customer[:name] => :country[:name]}, :order_status[:name]],
      :conditions => conditions,
      :order => 'order_statuses.sort_order ASC, order_statuses.id ASC, orders.id DESC')
✓15.15 req/s 2.88x faster
16
r8672 change
• http://blog.codefront.net/2008/01/30/living-on-the-edge-of-rails-5-better-eager-loading-and-more/
• The following uses the new, improved association loading (12 req/s):
    @orders = Order.find(:all, :include => [{:customer => :country}, :order_status])
• The following does not, because its :order references an included table, so ActiveRecord falls back to the single JOIN query:
    @orders = Order.find(:all, :include => [{:customer => :country}, :order_status], :order => 'order_statuses.sort_order')
17
r8672 output...
• Here’s the SQL
Order Load (0.000837) SELECT * FROM `orders` WHERE (order_status_id < 100) LIMIT 10
Customer Load (0.000439) SELECT * FROM `customers` WHERE (customers.id IN (2106,2018,1920,2025,2394,2075,2334,2159,1983,2017))
Country Load (0.000324) SELECT * FROM `countries` WHERE (countries.id IN (33,17,56,150,194,90,91,113,80,54))
OrderStatus Load (0.000291) SELECT * FROM `order_statuses` WHERE (order_statuses.id IN (10))
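The log above shows the preloading strategy: one query per table, each using `WHERE id IN (...)`, instead of one query per row. Below is a minimal plain-Ruby sketch of that idea. The `CountingTable` class, the `index_by_id` helper and the data are hypothetical stand-ins, not ActiveRecord’s implementation. It reuses the shape of the N+1 example: 100 orders, 50 customers, 2 countries.

```ruby
# Hypothetical stand-in "table" that counts every query it answers.
class CountingTable
  attr_reader :queries

  def initialize(rows)
    @rows = rows
    @queries = 0
  end

  # Rough equivalent of "SELECT * FROM table"
  def all
    @queries += 1
    @rows
  end

  # Rough equivalent of "SELECT * FROM table WHERE id IN (...)"
  def where_id_in(ids)
    @queries += 1
    wanted = ids.uniq
    @rows.select { |row| wanted.include?(row[:id]) }
  end
end

# Index rows by id so per-order lookups need no further queries.
def index_by_id(rows)
  rows.map { |row| [row[:id], row] }.to_h
end

def preloaded_report(orders_table, customers_table, countries_table)
  orders    = orders_table.all
  customers = index_by_id(customers_table.where_id_in(orders.map { |o| o[:customer_id] }))
  countries = index_by_id(countries_table.where_id_in(customers.values.map { |c| c[:country_id] }))

  orders.map do |order|
    customer = customers.fetch(order[:customer_id])
    country  = countries.fetch(customer[:country_id])
    "#{customer[:name]} (#{country[:name]})"
  end
end

orders_table    = CountingTable.new((1..100).map { |i| { id: i, customer_id: (i % 50) + 1 } })
customers_table = CountingTable.new((1..50).map { |i| { id: i, name: "Customer #{i}", country_id: (i % 2) + 1 } })
countries_table = CountingTable.new([{ id: 1, name: "New Zealand" }, { id: 2, name: "Canada" }])

lines = preloaded_report(orders_table, customers_table, countries_table)
total = [orders_table, customers_table, countries_table].sum(&:queries)
puts "#{lines.size} orders rendered with #{total} queries"
```

The query count stays at three however many orders there are. That is why this form scales better than both the N+1 loop and a wide JOIN that repeats customer and country columns on every row.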
18
But I want more
• Okay, this still isn’t blazing fast, and I’m building the next killr Web 2.0 app
• Forget about associations and just load the data via SQL; depending on the application, this makes a huge difference
• Concentrate on commonly used pages
19
Catch-22
• Hard-coding SQL is the fastest solution
• No construction of SQL, no generation of ActiveRecord associated classes
• If your DB changes, you have to update your SQL
‣ Keep SQL with models where possible
20
It ain’t pretty.. but it’s fast
• Find by SQL:
    class Order < ActiveRecord::Base
      def self.find_current_orders
        find_by_sql("SELECT orders.id, orders.created_at, customers.name AS customer_name,
            countries.name AS country_name, order_statuses.name AS status_name
          FROM orders
          LEFT OUTER JOIN `customers` ON `customers`.id = `orders`.customer_id
          LEFT OUTER JOIN `countries` ON `countries`.id = `customers`.country_id
          LEFT OUTER JOIN `order_statuses` ON `order_statuses`.id = `orders`.order_status_id
          WHERE order_status_id < 100
          ORDER BY order_statuses.sort_order ASC, order_statuses.id ASC, orders.id DESC")
      end
    end
• 28.90 req/s (5.49x faster)
21
And the results
Finder                           Req/s   Speedup
find(:all)                        5.26   baseline
find(:all, :include)              7.70   1.4x
find(:all, :select, :include)    15.15   2.88x
find_by_sql()                    28.90   5.49x
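The speedup column is each finder’s requests per second divided by the plain `find(:all)` baseline. A small Ruby sketch that recomputes it from the req/s figures on this slide:

```ruby
# Requests per second from the results slide; find(:all) is the baseline.
results = {
  "find(:all)"                    => 5.26,
  "find(:all, :include)"          => 7.70,
  "find(:all, :select, :include)" => 15.15,
  "find_by_sql()"                 => 28.90
}

baseline = results.fetch("find(:all)")
speedups = results.transform_values { |rps| (rps / baseline).round(2) }
speedups.each { |finder, x| puts format("%-30s %5.2fx", finder, x) }
```

Rounded to two decimals, the `:include` row comes out at about 1.46x, which the table shows as 1.4x. The other two rows match the table.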
22
Don’t forget indexes
• 64,000 orders:
    OrderStatus.find(:all).each { |os| puts os.orders.count }
• Avg 0.61 req/s with no indexes
• EXPLAIN your SQL, then add the missing index:
    ALTER TABLE `xchain_test`.`orders` ADD INDEX order_status_idx(`order_status_id`);
• Avg 23 req/s after adding the index (37x improvement)
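To see why one index makes such a difference, here is a toy plain-Ruby model. Without an index, every `os.orders.count` has to scan all 64,000 orders. With an index on `order_status_id`, the matching rows can be found directly. The Hash below is only a stand-in for a real database index, which is typically a B-tree. The 10 statuses and their even distribution across orders are made-up assumptions.

```ruby
# 64,000 hypothetical orders spread evenly over 10 order statuses.
orders   = (1..64_000).map { |i| { id: i, order_status_id: (i % 10) + 1 } }
statuses = (1..10).to_a

# No index: every status triggers a full scan of the orders "table".
rows_examined = 0
scan_counts = statuses.map do |status_id|
  rows_examined += orders.size
  orders.count { |o| o[:order_status_id] == status_id }
end

# With an "index": build it once, then each count is a single lookup.
index = Hash.new { |h, k| h[k] = [] }
orders.each { |o| index[o[:order_status_id]] << o[:id] }
indexed_counts = statuses.map { |status_id| index[status_id].size }

puts "full scans examined #{rows_examined} rows"
puts "counts match: #{scan_counts == indexed_counts}"
```

Both approaches return the same counts. The unindexed version examines 640,000 rows to answer ten questions, and that cost grows with every new order.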
23
Avoid .count
• It’s damned slow:
    OrderStatus.find(:all).each { |os| puts os.orders.count }
• Add an orders_count column and update the code:
    OrderStatus.find(:all).each { |os| puts os.orders_count }
✓ 108 req/s with orders_count vs 34 req/s with .count (3x faster)
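In Rails, the usual way to keep such a column current is a counter cache. You add an integer `orders_count` column (default 0) to `order_statuses`, then declare `belongs_to :order_status, :counter_cache => true` on `Order`, and ActiveRecord adjusts the column when orders are created or destroyed. Existing rows need a one-off backfill. The plain-Ruby sketch below models only the mechanism. The classes are hypothetical stand-ins, not ActiveRecord models.

```ruby
# Plain-Ruby model of a counter cache: the parent keeps orders_count and each
# child adjusts it on create/destroy, so reading the count needs no COUNT(*).
class OrderStatus
  attr_reader :name, :orders_count

  def initialize(name)
    @name = name
    @orders_count = 0
  end

  def increment_orders_count!
    @orders_count += 1
  end

  def decrement_orders_count!
    @orders_count -= 1
  end
end

class Order
  attr_reader :order_status

  # Stand-in for a create callback: bump the parent's counter.
  def self.create(order_status)
    order = new(order_status)
    order_status.increment_orders_count!
    order
  end

  def initialize(order_status)
    @order_status = order_status
  end

  # Stand-in for a destroy callback: decrement the parent's counter.
  def destroy
    order_status.decrement_orders_count!
  end
end

pending = OrderStatus.new("Pending")
shipped = OrderStatus.new("Shipped")
orders  = Array.new(5) { Order.create(pending) } + Array.new(3) { Order.create(shipped) }
orders.first.destroy

[pending, shipped].each { |os| puts "#{os.name}: #{os.orders_count}" }
```

The trade-off is a little extra work on every write in exchange for reads that become a plain column lookup, which suits pages like this one that read the counts far more often than orders change.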
24
For the speed freaks
• Merb - http://merbivore.com
• 38.56 req/s - 7x performance improvement
• Nearly identical code
• Blazingly fast
25
The End
work.rowanhick.com
26