45
Common Sense Performance Indicators Nick Gerner June 24, 2010

Common Sense Performance Indicators in the Cloud

Embed Size (px)

DESCRIPTION

Nick Gerner speaks about performance indicators and measurement tools for Velocity 2010

Citation preview

Page 1: Common Sense Performance Indicators in the Cloud

Common Sense Performance Indicators

Nick GernerJune 24, 2010

Page 2: Common Sense Performance Indicators in the Cloud

Goals

Common Sense in the Cloudsame as outside the cloud

1. Tune performance2. Investigate issues3. Visualize architecture

Page 3: Common Sense Performance Indicators in the Cloud

Nick Gerner

www.nickgerner.com@gerner

• Formerly senior engineer at SEOmoz• Linkscape: index of the web for SEO• Lead data services• Developer• Back-end ops guy

Page 4: Common Sense Performance Indicators in the Cloud

SEOmoz

• Seattle-based Startup (~7 engineers)• SEO Blog and Community• Toolset and Platform

OpenSiteExplorer.org

• 300TB/month processing pipeline• 5 mil req/day API hits

Page 5: Common Sense Performance Indicators in the Cloud

SEOmoz Engineering

• 50 < nodes < 500• AWS based since 2008

– EC2 – linux root access to bare VM– S3 – networked disk– EBS – local disk I/O– ELB – load balancing as a service

Page 6: Common Sense Performance Indicators in the Cloud

SEOmoz ArchitectureProcessing

TheWeb Crawlers

RawStorage

Crawlers Process Prepare

Data Pipeline

Page 7: Common Sense Performance Indicators in the Cloud

SEOmoz ArchitectureAPI

Partners

SEOmozApps

ELB

Lighttpd

Lighttpd

Lighttpd

App

App

App

Memcache

Memcache

Memcache

Memcache

Memcache

Memcache

S3

Page 8: Common Sense Performance Indicators in the Cloud

End-to-EndPerformance Indicators

Latency

Time toOn-load

Conversion Rate

DNS

Web ObjectCount

Page 9: Common Sense Performance Indicators in the Cloud

Great...but not the focus of this talk

Latency

Time toOn-load

Conversion Rate

DNS

Web ObjectCount

Page 10: Common Sense Performance Indicators in the Cloud

Performance Indicators

SystemCharacteristics

AppStack

Database WS-API

Back-end

Caching

Middleware

Front-End

Drives

CompetesFor

http://www.flickr.com/photos/dnisbet/3118888630/

CPU

DiskNet

Mem

Page 11: Common Sense Performance Indicators in the Cloud

Performance Indicators

SystemCharacteristics App

Stack

Database WS-API

Back-end

Caching

Middleware

Front-End

Drives

CompetesFor

http://www.flickr.com/photos/dnisbet/3118888630/

CPU

DiskNet

Mem

Page 12: Common Sense Performance Indicators in the Cloud

/proc

• System stats• Per-process stats• It all comes from here

...but use tools to see it

Page 13: Common Sense Performance Indicators in the Cloud

System Characteristics

Load AverageCPU

MemoryDisk

Network

Page 14: Common Sense Performance Indicators in the Cloud

Load Average

• Combines a few things• Good place to start• Explains nothing

http://www.flickr.com/photos/maple03/4176389418/

Page 15: Common Sense Performance Indicators in the Cloud

CPU

• Break out by process• Break out user vs system• User, System, I/O wait, Idle

http://www.flickr.com/photos/pacdog/213442876/

Page 16: Common Sense Performance Indicators in the Cloud

Why watch it?

• Who's doing work• Is CPU maxed?• Blocked on I/O?• Compare to Load Average

http://www.flickr.com/photos/pacdog/213442876/

Page 17: Common Sense Performance Indicators in the Cloud

Memory

• Break out by Process• Free, cached, used

http://www.flickr.com/photos/williamhook/3118248600/

Page 18: Common Sense Performance Indicators in the Cloud

Why watch it?• Cached + Free = Available• Do you have spare memory?

– App uses– Memcache– DB cache

http://www.flickr.com/photos/williamhook/3118248600/

Page 19: Common Sense Performance Indicators in the Cloud

Disk

• Read bytes/sec• Write bytes/sec• Disk utilization

http://www.flickr.com/photos/robfon/2174992215/

Page 20: Common Sense Performance Indicators in the Cloud

Why watch it?

• Is disk busy?• When?• Who's using it?

http://www.flickr.com/photos/robfon/2174992215/

Page 21: Common Sense Performance Indicators in the Cloud

Network

• Read bytes/sec• Write bytes/sec• Established connections

http://www.flickr.com/photos/ahkitj/20853609/

Page 22: Common Sense Performance Indicators in the Cloud

Why watch it?

• Max connections(~1024 is magic)

• Bandwidth is $$$• When are you busy?• SOA considerations

http://www.flickr.com/photos/ahkitj/20853609/

Page 23: Common Sense Performance Indicators in the Cloud

Perf Monitoring Solution

1. data collection (collectd)2. data storage (rrdtool)3. dashboard management (drraw)

FREE, in Aptv

Page 24: Common Sense Performance Indicators in the Cloud

Perf Monitoring Architecture

Cluster

Cluster

Multiple Clusters

Multiple Applications

Nodes come upand go down

Page 25: Common Sense Performance Indicators in the Cloud

Perf Monitoring Architecture

Cluster

Cluster

collectd agents

new nodes getgeneric config

node namesfollow conventionaccording to role

Page 26: Common Sense Performance Indicators in the Cloud

Perf Monitoring Architecture

Perf MonitoringServer

Cluster

Cluster

On its own server:collectd server

Web serverdrraw.cgi

allows connections from new nodes

perf data backed up daily

Page 27: Common Sense Performance Indicators in the Cloud

Perf Monitoring Architecture

Perf MonitoringServer

Cluster

Cluster

Happy Sysadmin

Visibility into systemhistory of performance

Page 28: Common Sense Performance Indicators in the Cloud

Perf Dashboard Featurs

1. Summarize nodes/systems2. Visualize data over time3. Stack measurements– Per-process– Per-node

4. Handle new nodes–

Page 29: Common Sense Performance Indicators in the Cloud

Batch Mode Dashboard

Page 30: Common Sense Performance Indicators in the Cloud

CPU

Page 31: Common Sense Performance Indicators in the Cloud

Memory

Page 32: Common Sense Performance Indicators in the Cloud

Disk

Page 33: Common Sense Performance Indicators in the Cloud

Network

Page 34: Common Sense Performance Indicators in the Cloud

Web Server Dashboard

Page 35: Common Sense Performance Indicators in the Cloud

Web Requests

Page 36: Common Sense Performance Indicators in the Cloud

mod_status

Page 37: Common Sense Performance Indicators in the Cloud

System-Wide Dashboard

Page 38: Common Sense Performance Indicators in the Cloud

Per-request

Page 39: Common Sense Performance Indicators in the Cloud

Graph Summary

• cpu, mem, disk, net• over time• per node• per process• Through in relevant app measures

e.g. per request stats:• req/sec• median latency/req

Page 40: Common Sense Performance Indicators in the Cloud

Ad-hoc Tools

• $ dstat -cdnmlsystem characteristics

• $ iotopper-process disk I/O

• $ iostat -x 3detailed disk stats

• $ netstat -tnpfast, per-process TCP connection stats

Page 41: Common Sense Performance Indicators in the Cloud

Resources

• Perf Testing: What, How, Whyhttp://www.nickgerner.com/2010/02/performance-testing-what-andhow-why/

• Perf Testing Case Study: OSEhttp://www.nickgerner.com/2010/01/performance-testing-case-study-ose/

• S3 Benchmarkshttp://twopieceset.blogspot.com/2009/06/s3-performance-benchmarks.html

• Perf Measurement– http://twopieceset.blogspot.com/2009/03/performance-

measurement-for-small-and.html

Page 42: Common Sense Performance Indicators in the Cloud

More Resources

• http://www.collectd.org• http://oss.oetiker.ch/rrdtool/• http://web.taranis.org/drraw/• http://dag.wieers.com/home-made/dstat/

• $ man proc–

Page 43: Common Sense Performance Indicators in the Cloud

Q: Why? A: Perf Tuning

Test

InterpretImprove

Validate Measure

Page 44: Common Sense Performance Indicators in the Cloud

Q: Why? A: System Arch

• Better Devs/Ops• Identify Bottlenecks• Scaling

Considerations

Page 45: Common Sense Performance Indicators in the Cloud

Q: Why? A: Issue Investigation

• Machine Specific?• System Wide?• Which Component?• Timeline?• Cascading Failures?