(WEB301) Operational Web Log Analysis | AWS re:Invent 2014

DESCRIPTION

Log data contains some of the most valuable raw information you can gather and analyze about your infrastructure and applications. Amid the mess of confusing lines of seemingly random text can be hints about performance, security, flaws in code, user access patterns, and other operational data. Without the proper tools, finding insights in these logs can be like searching for a hay-colored needle in a haystack. In this session you learn what practices and patterns you can easily implement that can help you better understand your log files. You see how you can customize web logs to add more information to them, how to digest logs from around your infrastructure, and how to analyze your log files in near real time.

Page 1: (WEB301) Operational Web Log Analysis | AWS re:Invent 2014
Page 2: (WEB301) Operational Web Log Analysis | AWS re:Invent 2014

Chris Munns - @chrismunns

Page 3: (WEB301) Operational Web Log Analysis | AWS re:Invent 2014

https://secure.flickr.com/photos/psd/4389135567/

Page 4: (WEB301) Operational Web Log Analysis | AWS re:Invent 2014
Page 5: (WEB301) Operational Web Log Analysis | AWS re:Invent 2014
Page 6: (WEB301) Operational Web Log Analysis | AWS re:Invent 2014
Page 7: (WEB301) Operational Web Log Analysis | AWS re:Invent 2014

https://secure.flickr.com/photos/iangbl/338035861

Page 8: (WEB301) Operational Web Log Analysis | AWS re:Invent 2014
Page 9: (WEB301) Operational Web Log Analysis | AWS re:Invent 2014

[Architecture diagram: client, mobile client, CloudFront, Amazon S3, and, inside a region and VPC, Elastic Load Balancing, Web instances, Elastic Load Balancing, App instances, MySQL DB instance]

Page 11: (WEB301) Operational Web Log Analysis | AWS re:Invent 2014

https://secure.flickr.com/photos/hk_brian/5753530941

Page 12: (WEB301) Operational Web Log Analysis | AWS re:Invent 2014

Is this one important?

https://secure.flickr.com/photos/hk_brian/5753530941

Page 13: (WEB301) Operational Web Log Analysis | AWS re:Invent 2014

Is this one important?

What about this one?

https://secure.flickr.com/photos/hk_brian/5753530941

Page 14: (WEB301) Operational Web Log Analysis | AWS re:Invent 2014
Page 15: (WEB301) Operational Web Log Analysis | AWS re:Invent 2014

Is this one important?

Page 16: (WEB301) Operational Web Log Analysis | AWS re:Invent 2014

Is this one important?

What about this one?

Page 17: (WEB301) Operational Web Log Analysis | AWS re:Invent 2014

Let’s go back to the beginning

https://secure.flickr.com/photos/paukrus/9826882836

Page 18: (WEB301) Operational Web Log Analysis | AWS re:Invent 2014
Page 19: (WEB301) Operational Web Log Analysis | AWS re:Invent 2014

Numerical code   Facility
0                kernel messages
1                user-level messages
2                mail system
3                system daemons
4                security/authorization messages
23               local use 7 (local7)

Numerical code   Severity
0                Emergency
1                Alert
2                Critical
3                Error
4                Warning
7                Debug

Page 20: (WEB301) Operational Web Log Analysis | AWS re:Invent 2014

<34>1 2003-10-11T22:14:15.003Z mymachine.example.com su - ID47 - BOM'su root' failed for lonvick on /dev/pts/8
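The number in angle brackets at the start of the message is the syslog priority (PRI), which packs the facility and severity codes from the tables above into one value: facility * 8 + severity. A minimal Python sketch of decoding it follows; the function name `decode_pri` and the lookup dictionaries are illustrative, and the dictionaries only cover the codes listed on the slide.

```python
import re

# Only the codes shown on the slide; other codes fall back to "unknown".
FACILITIES = {
    0: "kernel messages",
    1: "user-level messages",
    2: "mail system",
    3: "system daemons",
    4: "security/authorization messages",
    23: "local use 7 (local7)",
}
SEVERITIES = {
    0: "Emergency",
    1: "Alert",
    2: "Critical",
    3: "Error",
    4: "Warning",
    7: "Debug",
}


def decode_pri(message):
    """Return (facility, severity) codes from the leading <PRI> field."""
    match = re.match(r"<(\d{1,3})>", message)
    if match is None:
        raise ValueError("message does not start with a <PRI> field")
    # PRI = facility * 8 + severity, so divmod recovers both parts.
    return divmod(int(match.group(1)), 8)


line = ("<34>1 2003-10-11T22:14:15.003Z mymachine.example.com su - ID47 - "
        "BOM'su root' failed for lonvick on /dev/pts/8")
facility, severity = decode_pri(line)
print(facility, FACILITIES.get(facility, "unknown"))
print(severity, SEVERITIES.get(severity, "unknown"))
```

For the sample message, 34 splits into facility 4 (security/authorization messages) and severity 2 (Critical).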

Page 21: (WEB301) Operational Web Log Analysis | AWS re:Invent 2014
Page 22: (WEB301) Operational Web Log Analysis | AWS re:Invent 2014

Easy, right?

https://secure.flickr.com/photos/21734563@N04/2225069096

Page 23: (WEB301) Operational Web Log Analysis | AWS re:Invent 2014
Page 24: (WEB301) Operational Web Log Analysis | AWS re:Invent 2014

66.249.64.XXX - - [07/Sep/2014:08:33:43 +0000] "GET / HTTP/1.1" 200 819 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
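A line like this one can be split into named fields with a single regular expression. The sketch below is one possible way to do it in Python, not a parser the session itself presents; the pattern name `COMBINED`, the function `parse_combined`, and the field names are illustrative choices.

```python
import re

# One named group per field of the access log line shown above.
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_combined(line):
    """Return a dict of fields, or None if the line does not match."""
    match = COMBINED.match(line)
    return match.groupdict() if match else None


line = ('66.249.64.XXX - - [07/Sep/2014:08:33:43 +0000] "GET / HTTP/1.1" '
        '200 819 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; '
        '+http://www.google.com/bot.html)"')
fields = parse_combined(line)
print(fields["host"], fields["status"], fields["request"])
```

Returning None for non-matching lines lets a caller count and inspect malformed entries instead of failing on the first one.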

Page 25: (WEB301) Operational Web Log Analysis | AWS re:Invent 2014
Page 26: (WEB301) Operational Web Log Analysis | AWS re:Invent 2014

Thanks for the history lesson, Chris!

Now what do I do?

https://secure.flickr.com/photos/decade_null/142235888

Page 27: (WEB301) Operational Web Log Analysis | AWS re:Invent 2014

* Each step has several moving pieces

https://secure.flickr.com/photos/james_wheeler/9619984584

Page 28: (WEB301) Operational Web Log Analysis | AWS re:Invent 2014
Page 29: (WEB301) Operational Web Log Analysis | AWS re:Invent 2014

Apache LogFormat:

"%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\""

173.248.147.XXX - - [16/Sep/2014:15:36:31 +0000] "GET / HTTP/1.1" 200 819 "-" "Pingdom.com_bot_version_1.4_(http://www.pingdom.com/)"

Page 30: (WEB301) Operational Web Log Analysis | AWS re:Invent 2014

Customize log data

Apache LogFormat:

%D = The time taken to serve the request, in microseconds
%T = The time taken to serve the request, in seconds
%v = The canonical ServerName of the server serving the request
%{Foobar}C = The contents of cookie Foobar in the request sent to the server
%{Foobar}n = The contents of note Foobar from another module

Source: https://httpd.apache.org/docs/2.2/mod/mod_log_config.html
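Adding %v and %D to the format puts the server name and the request time in microseconds on every line, which makes latency analysis possible straight from the access log. The Python sketch below assumes the field order `%h %v %t "%r" %>s %b %D "referer" "user-agent"` and uses the deck's php-app1 sample line; the names `CUSTOM` and `parse_custom` are illustrative.

```python
import re
from datetime import datetime

# Assumed field order: %h %v %t "%r" %>s %b %D "referer" "user-agent"
CUSTOM = re.compile(
    r'(?P<host>\S+) (?P<vhost>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\S+) (?P<usec>\d+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_custom(line):
    """Parse one line and convert the %D value to milliseconds."""
    match = CUSTOM.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    # %D is logged in microseconds; milliseconds are easier to read.
    record["millis"] = int(record.pop("usec")) / 1000.0
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    return record


line = ('64.237.55.3 php-app1 [16/Sep/2014:16:21:31 +0000] "GET / HTTP/1.1" '
        '200 819 23765 "-" '
        '"Pingdom.com_bot_version_1.4_(http://www.pingdom.com/)"')
record = parse_custom(line)
print(record["vhost"], record["status"], record["millis"])
```

Here the %D value of 23765 microseconds becomes roughly 23.8 ms for a request served by php-app1.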

Page 31: (WEB301) Operational Web Log Analysis | AWS re:Invent 2014

Apache LogFormat:

"%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\""

173.248.147.XXX - - [16/Sep/2014:15:36:31 +0000] "GET / HTTP/1.1" 200 819 "-" "Pingdom.com_bot_version_1.4_(http://www.pingdom.com/)"

"%h %v %t \"%r\" %>s %b %D \"%{Referer}i\" \"%{User-agent}i\""

64.237.55.3 php-app1 [16/Sep/2014:16:21:31 +0000] "GET / HTTP/1.1" 200 819 23765 "-" "Pingdom.com_bot_version_1.4_(http://www.pingdom.com/)"

Page 32: (WEB301) Operational Web Log Analysis | AWS re:Invent 2014

Apache vs. Nginx CLF patterns:

Nginx:

'$remote_addr - $remote_user [$time_local] ' '"$request" $status $body_bytes_sent ' '"$http_referer" "$http_user_agent"'

Apache:

"%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\""

Page 33: (WEB301) Operational Web Log Analysis | AWS re:Invent 2014
Page 34: (WEB301) Operational Web Log Analysis | AWS re:Invent 2014
Page 35: (WEB301) Operational Web Log Analysis | AWS re:Invent 2014

Either way!!!

Get that data off your host ASAP!!

https://secure.flickr.com/photos/foresthistory/3663382060

Page 36: (WEB301) Operational Web Log Analysis | AWS re:Invent 2014

Why?

Page 37: (WEB301) Operational Web Log Analysis | AWS re:Invent 2014

Instance failure.

Filled disks.

Auto Scaling actions.

https://secure.flickr.com/photos/eurleif/186807023

Page 38: (WEB301) Operational Web Log Analysis | AWS re:Invent 2014
Page 39: (WEB301) Operational Web Log Analysis | AWS re:Invent 2014

syslog-ng, rsyslog, nxlog

Pros:

• Open source
  – Linux, Windows, and almost everything else!
• Both variants of syslogd
  – Add filtering, flexible configuration, TCP as a transport
• Runs as an OS process
• Typically take the centralized data and feed into another analytics tool
• Can often accept logs from third-party sources like network devices

[Diagram: virtual private cloud containing Web instances, App instances, Etc. instances, and a central logging instance]

Page 40: (WEB301) Operational Web Log Analysis | AWS re:Invent 2014

syslog-ng, rsyslog, nxlog

Cons:

• No built-in analytics/dashboard abilities
• Typically centralized host can become a single point of failure
• Potentially more difficult to scale
  – Federate logs to different centralized hosts?

[Diagram: virtual private cloud containing Web instances, App instances, Etc. instances, and a central logging instance]

Page 41: (WEB301) Operational Web Log Analysis | AWS re:Invent 2014

Splunk

Pros:

• Enterprise grade
• Extremely scalable
• Fault tolerance and load balancing built in
• Security of data built in
• Can technically accept data from other third-party sources as well
• Full log forwarding, analyzing, dashboarding stack + third-party apps

[Diagram: virtual private cloud containing Web instances, App instances, Etc. instances, and two Splunk indexers]

Page 42: (WEB301) Operational Web Log Analysis | AWS re:Invent 2014

Splunk

Cons:

• Enterprise-grade pricing
• Enterprise-grade licensing
• Indexer resources become an important part of capacity planning

A great option for Enterprises and large shops!

[Diagram: virtual private cloud containing Web instances, App instances, Etc. instances, and two Splunk indexers]

Page 43: (WEB301) Operational Web Log Analysis | AWS re:Invent 2014

Logstash

Pros:

• Open source
• Extremely scalable
• Fault tolerance built in
• Support offerings from Elasticsearch!
• Active code base and ecosystem
• Pluggable
• Ties in with other tools for dashboarding/analytics

[Diagram: virtual private cloud containing Web instances, App instances, Etc. instances, Redis, two Logstash indexers, and three Elasticsearch nodes]

Page 44: (WEB301) Operational Web Log Analysis | AWS re:Invent 2014

Logstash

Cons:

• “ELK Stack” has many moving pieces
• A lot of DIY to get it set up
• Very quickly changing/improving technology stack

Most popular open source option today!

[Diagram: virtual private cloud containing Web instances, App instances, Etc. instances, Redis, two Logstash indexers, and three Elasticsearch nodes]

Page 45: (WEB301) Operational Web Log Analysis | AWS re:Invent 2014

SaaS options

Pros:

• Hosted
• Very easy to get started with
• No concerns about scaling yourself
• Flexible pricing methods
• Support
• Either their agents or syslog to them
• Built-in dashboards/analytics tools
• Constantly adding features/capabilities

[Diagram: virtual private cloud containing Web instances, App instances, Etc. instances, and a NAT instance]

Page 46: (WEB301) Operational Web Log Analysis | AWS re:Invent 2014

SaaS options

Cons:

• Data leaving your control/infrastructure
• Some restrictiveness in flexibility of the dashboards, collection agents, archive limits

SaaS makes a lot of sense if you are small and trying to move fast and should be focusing on product first!

[Diagram: virtual private cloud containing Web instances, App instances, Etc. instances, and a NAT instance]

Page 47: (WEB301) Operational Web Log Analysis | AWS re:Invent 2014
Page 48: (WEB301) Operational Web Log Analysis | AWS re:Invent 2014
Page 49: (WEB301) Operational Web Log Analysis | AWS re:Invent 2014

CloudWatch

Part of Amazon CloudWatch service

Page 50: (WEB301) Operational Web Log Analysis | AWS re:Invent 2014

CloudWatch

# yum install awslogs

# tail -n 7 /etc/awslogs/awslogs.conf

# aws logs put-metric-filter
    --log-group-name
    --filter-name
    --filter-pattern
    --metric-transformations
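The four options of put-metric-filter map onto one request structure: a log group, a filter name, a pattern, and a list of metric transformations. The Python sketch below only builds that structure so it can be inspected without AWS access. Every value in it is a hypothetical placeholder (the slide does not show the real ones), and the filter pattern assumes a space-delimited access log with nine fields.

```python
def metric_filter_request(log_group, filter_name, pattern, metric_name, namespace):
    """Build keyword arguments shaped like a PutMetricFilter request."""
    return {
        "logGroupName": log_group,
        "filterName": filter_name,
        "filterPattern": pattern,
        "metricTransformations": [
            {
                "metricName": metric_name,
                "metricNamespace": namespace,
                # Each matching log event adds 1 to the metric.
                "metricValue": "1",
            }
        ],
    }


# All names below are hypothetical placeholders, not values from the session.
request = metric_filter_request(
    log_group="apache-access",
    filter_name="http-5xx",
    pattern="[host, ident, user, timestamp, request, status=5*, bytes, referer, agent]",
    metric_name="Http5xxCount",
    namespace="WebLogs",
)
print(request["filterPattern"])

# With boto3 installed and credentials configured, the request could be sent as:
#   import boto3
#   boto3.client("logs").put_metric_filter(**request)
```

Keeping the request as plain data makes it easy to review or store in version control before anything is created in the account.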

Page 51: (WEB301) Operational Web Log Analysis | AWS re:Invent 2014
Page 52: (WEB301) Operational Web Log Analysis | AWS re:Invent 2014
Page 53: (WEB301) Operational Web Log Analysis | AWS re:Invent 2014
Page 54: (WEB301) Operational Web Log Analysis | AWS re:Invent 2014

https://github.com/etsy/logster

[root@php-app1 logster-master]# /usr/bin/logster --dry-run --output=ganglia SampleLogster /var/log/httpd/access_log
...
/usr/bin/gmetric -d 180 -c /etc/ganglia/gmond.conf --name http_2xx --value 0.533333333333 --type float --units "Responses per sec"
...
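The http_2xx value in this output is a rate: the number of 2xx responses seen in a window divided by the window length in seconds. The Python sketch below illustrates that calculation on access log lines. It is not Logster's own code; `responses_per_second` and the sample lines are made up for the example.

```python
import re
from collections import Counter

# The status code follows the closing quote of the request field.
STATUS = re.compile(r'" (\d)\d{2} ')


def responses_per_second(lines, window_seconds):
    """Return a rate per status class (http_2xx, http_4xx, ...)."""
    counts = Counter()
    for line in lines:
        match = STATUS.search(line)
        if match:
            counts["http_%sxx" % match.group(1)] += 1
    return {name: count / window_seconds for name, count in counts.items()}


# Made-up sample lines in the combined log format.
sample = [
    '10.0.0.1 - - [16/Sep/2014:15:36:31 +0000] "GET / HTTP/1.1" 200 819 "-" "curl"',
    '10.0.0.2 - - [16/Sep/2014:15:36:40 +0000] "GET /a HTTP/1.1" 200 120 "-" "curl"',
    '10.0.0.3 - - [16/Sep/2014:15:36:55 +0000] "GET /b HTTP/1.1" 404 0 "-" "curl"',
]
rates = responses_per_second(sample, window_seconds=60)
print(rates)
```

With a 180-second window, 96 responses with a 2xx status would give the 0.533 responses per second shown in the gmetric call.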

Page 55: (WEB301) Operational Web Log Analysis | AWS re:Invent 2014

Logstash

• Can process log files on the fly outputting metric data to numerous services:
  – CloudWatch
  – Ganglia
  – Graphite via statsd
  – Boundary
  – DataDog
  – many others!
• Runs as a constantly running daemon
• Little bit easier than Logster
• Can do metric output and full log centralization at the same time!

input {
  file {
    path => "/var/log/apache/access.log"
    type => "apache-access"
  }
}

filter {
  grok {
    type => "apache-access"
    pattern => "%{COMBINEDAPACHELOG}"
  }
}

output {
  statsd {
    # Count one hit every event by response
    increment => "apache.response.%{response}"
  }
}

(from: http://logstash.net/docs/1.4.2/tutorials/metrics-from-logs)
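The statsd output in this configuration increments one counter per response code. On the wire, a statsd counter increment is a short text datagram of the form bucket:value|c, normally sent over UDP. The Python sketch below shows that format; the function names, host, and port are assumptions for illustration, and the exact bucket name Logstash emits may carry extra prefixes depending on its settings.

```python
import socket


def statsd_counter(bucket, value=1):
    """Format a statsd counter increment as bytes, e.g. b'name:1|c'."""
    return ("%s:%d|c" % (bucket, value)).encode("ascii")


def send_counter(bucket, host="127.0.0.1", port=8125):
    """Send one increment over UDP (host and port are assumed defaults)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(statsd_counter(bucket), (host, port))
    finally:
        sock.close()


# Mirror the increment => "apache.response.%{response}" setting for a 200.
payload = statsd_counter("apache.response.%s" % "200")
print(payload)

# To actually emit the metric to a local statsd daemon:
#   send_counter("apache.response.200")
```

Because the transport is fire-and-forget UDP, emitting a metric per log event adds very little overhead to the log pipeline.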

Page 56: (WEB301) Operational Web Log Analysis | AWS re:Invent 2014
Page 57: (WEB301) Operational Web Log Analysis | AWS re:Invent 2014

Dashboards

https://secure.flickr.com/photos/joeross/6544781203

Page 58: (WEB301) Operational Web Log Analysis | AWS re:Invent 2014
Page 59: (WEB301) Operational Web Log Analysis | AWS re:Invent 2014
Page 60: (WEB301) Operational Web Log Analysis | AWS re:Invent 2014
Page 61: (WEB301) Operational Web Log Analysis | AWS re:Invent 2014
Page 62: (WEB301) Operational Web Log Analysis | AWS re:Invent 2014
Page 63: (WEB301) Operational Web Log Analysis | AWS re:Invent 2014

Dashboard for Logstash

Page 64: (WEB301) Operational Web Log Analysis | AWS re:Invent 2014
Page 65: (WEB301) Operational Web Log Analysis | AWS re:Invent 2014

Each of these examples took less than an hour to set up!

Page 66: (WEB301) Operational Web Log Analysis | AWS re:Invent 2014
Page 67: (WEB301) Operational Web Log Analysis | AWS re:Invent 2014
Page 68: (WEB301) Operational Web Log Analysis | AWS re:Invent 2014
Page 69: (WEB301) Operational Web Log Analysis | AWS re:Invent 2014

Focus first on what affects your customers:

Then on important technical issues:

Page 70: (WEB301) Operational Web Log Analysis | AWS re:Invent 2014

Alarming:

Send alarms with:

Page 71: (WEB301) Operational Web Log Analysis | AWS re:Invent 2014
Page 72: (WEB301) Operational Web Log Analysis | AWS re:Invent 2014

Log backup & archiving

https://secure.flickr.com/photos/ant-ti/6016877003

Page 73: (WEB301) Operational Web Log Analysis | AWS re:Invent 2014
Page 74: (WEB301) Operational Web Log Analysis | AWS re:Invent 2014
Page 75: (WEB301) Operational Web Log Analysis | AWS re:Invent 2014

How to do it right:

1. Get data into Amazon S3

2. Get data into Amazon Glacier

[Diagram: Amazon S3 and Amazon Glacier]

Page 76: (WEB301) Operational Web Log Analysis | AWS re:Invent 2014

Page 77: (WEB301) Operational Web Log Analysis | AWS re:Invent 2014

Sounds easy!

[Diagram: Amazon S3 and Amazon Glacier]

"MyLoggingBucket": {
  "Type": "AWS::S3::Bucket",
  "Properties": {
    "BucketName": "MyLoggingBucket",
    "LifecycleConfiguration": {
      "Rules": [
        {
          "Id": "GlacierRule",
          "Prefix": "logs",
          "Status": "Enabled",
          "ExpirationInDays": "365",
          "Transition": {
            "TransitionInDays": "30",
            "StorageClass": "Glacier"
          }
        }
      ]
    }
  }
}
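Read as a timeline, the rule keeps objects under the logs prefix in Amazon S3 for 30 days, moves them to Amazon Glacier, and deletes them after 365 days. The Python sketch below approximates that timeline for a given object age. The function name and the "STANDARD" and "expired" labels are illustrative, and the sketch ignores how S3 schedules the actual transitions.

```python
def lifecycle_state(age_days, transition_days=30, expiration_days=365):
    """Approximate where an object sits under the rule at a given age."""
    if age_days >= expiration_days:
        return "expired"   # removed by ExpirationInDays
    if age_days >= transition_days:
        return "GLACIER"   # moved by the Transition block
    return "STANDARD"      # still in regular S3 storage


for age in (1, 30, 200, 365):
    print(age, lifecycle_state(age))
```

A check like this can help when deciding how long recent logs must stay directly queryable before they move to cheaper archival storage.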

Page 78: (WEB301) Operational Web Log Analysis | AWS re:Invent 2014

Given the importance of log data, securing it properly is also important:

IAM

Page 79: (WEB301) Operational Web Log Analysis | AWS re:Invent 2014

Don’t do this by hand! Make use of tools:

Build basic log centralization into every AMI!

directory "/opt/aws/cloudwatch" do
  recursive true
end

remote_file "/opt/aws/cloudwatch/awslogs-agent-setup.py" do
  source "https://s3.amazonaws.com/aws-cloudwatch/downloads/latest/awslogs-agent-setup.py"
  mode "0755"
end

execute "Install CloudWatch Logs agent" do
  command "/opt/aws/cloudwatch/awslogs-agent-setup.py -n -r us-west-2 -c /etc/cwlogs.cfg"
  not_if { system "pgrep -f aws-logs-agent-setup" }
end

Page 80: (WEB301) Operational Web Log Analysis | AWS re:Invent 2014
Page 81: (WEB301) Operational Web Log Analysis | AWS re:Invent 2014

https://secure.flickr.com/photos/pfly/1537122018

In Closing

Page 82: (WEB301) Operational Web Log Analysis | AWS re:Invent 2014

Logs ARE important!

Page 83: (WEB301) Operational Web Log Analysis | AWS re:Invent 2014

https://secure.flickr.com/photos/ocarchives/5333790414

Logs are fun!

Page 84: (WEB301) Operational Web Log Analysis | AWS re:Invent 2014

Spend the time to do log analysis right!

Page 85: (WEB301) Operational Web Log Analysis | AWS re:Invent 2014
Page 86: (WEB301) Operational Web Log Analysis | AWS re:Invent 2014

https://secure.flickr.com/photos/dullhunk/202872717/

Page 87: (WEB301) Operational Web Log Analysis | AWS re:Invent 2014

Please give us your feedback on this session.

Complete session evaluations and earn re:Invent swag.

http://bit.ly/awsevals