Logging for Production Systems in The Container Era Sadayuki Furuhashi Founder & Software Architect DOCKER MOUNTAIN VIEW

Logging for Production Systems in The Container Era


Page 1: Logging for Production Systems in The Container Era

Logging for Production Systems in The Container Era

Sadayuki Furuhashi Founder & Software Architect

DOCKER MOUNTAIN VIEW

Page 2: Logging for Production Systems in The Container Era

A little about me…

Sadayuki Furuhashi

github: @frsyuki

A founder of Treasure Data, Inc. located in Silicon Valley.

An open-source hacker.

OSS projects I founded:

• Fluentd - Unified log collection infrastructure
• Embulk - Plugin-based ETL tool

Page 3: Logging for Production Systems in The Container Era

A little about me…

MessagePack: "It's like JSON. but fast and small."

Page 4: Logging for Production Systems in The Container Era

The Container Era

                      Server Era         Container Era
Service Architecture  Monolithic         Microservices
System Image          Mutable            Immutable
Managed By            Ops Team           DevOps Team
Local Data            Persistent         Ephemeral
Log Collection        syslogd / rsync    ?
Metrics Collection    Nagios / Zabbix    ?

Page 5: Logging for Production Systems in The Container Era

The Container Era

How should log & metrics collection be done in The Container Era?

Page 6: Logging for Production Systems in The Container Era

Problems

Page 7: Logging for Production Systems in The Container Era

The traditional logrotate + rsync on containers

[Diagram: the applications in Containers A, B, and C each write log files, which are collected on a Log Server.]

• Hard to analyze!! Complex text parsers
• High latency!! Must wait for a day
• Ephemeral!! Could be lost at any time

Page 8: Logging for Production Systems in The Container Era

[Diagram: Containers A and B on Server 1 and Containers C and D on Server 2 each run an Application and connect directly to Kafka, elasticsearch, and HDFS.]

Small & many containers make storages overloaded

Too many connections from micro containers!

Page 9: Logging for Production Systems in The Container Era

[Diagram: Containers A and B on Server 1 and Containers C and D on Server 2 each run an Application and connect directly to Kafka, elasticsearch, and HDFS.]

System images are immutable

Too many connections from micro containers!

Embedding destination IPs in ALL Docker images makes management hard

Page 10: Logging for Production Systems in The Container Era

Combination explosion with microservices requires too many scripts for data integration

[Diagram labels around LOG: script to parse data (twice), cron job for loading, filtering script, syslog script, Tweet-fetching script, aggregation script (twice), rsync server]

Page 11: Logging for Production Systems in The Container Era

A solution: centralized log collection service

LOG

Log Service

Page 12: Logging for Production Systems in The Container Era

The centralized log collection service

LOG

Page 13: Logging for Production Systems in The Container Era

The centralized log collection service

LOG

We Released! (Apache License)

Page 14: Logging for Production Systems in The Container Era

What’s Fluentd?

Simple core + Variety of plugins

Buffering, HA (failover), Secondary output, etc.

Like syslogd

AN EXTENSIBLE & RELIABLE DATA COLLECTION TOOL

Page 15: Logging for Production Systems in The Container Era

How to collect logs from Docker containers

Page 16: Logging for Production Systems in The Container Era

Text logging with --log-driver=fluentd

[Diagram: on one Server, the App in a Container writes to STDOUT / STDERR, which is sent to Fluentd.]

docker run \
    --log-driver=fluentd \
    --log-opt fluentd-address=localhost:24224

{
  "container_id": "ad6d5d32576a",
  "container_name": "myapp",
  "source": "stdout"
}

Page 17: Logging for Production Systems in The Container Era

Metrics collection with fluent-logger

[Diagram: on one Server, the App in a Container sends events to Fluentd through the fluent-logger library.]

from fluent import sender
from fluent import event

sender.setup('app.events', host='localhost')
event.Event('purchase', {
    'user_id': 21,
    'item_id': 321,
    'value': 1
})

tag = app.events.purchase
{
  "user_id": 21,
  "item_id": 321,
  "value": 1
}

Page 18: Logging for Production Systems in The Container Era

Logging methods for each purpose

• Collecting log messages
  > --log-driver=fluentd

• Application metrics
  > fluent-logger

• Access logs, logs from middleware
  > Shared data volume

• System metrics (CPU usage, Disk capacity, etc.)
  > Fluentd’s input plugins (Fluentd pulls this data periodically)

Page 19: Logging for Production Systems in The Container Era

Deployment Patterns

Page 20: Logging for Production Systems in The Container Era

Primitive deployment…

[Diagram: Containers A and B on Server 1 and Containers C and D on Server 2 each run an Application and connect directly to Kafka, elasticsearch, and HDFS.]

Too many connections from many containers!

Embedding destination IPs in ALL Docker images makes management hard

Page 21: Logging for Production Systems in The Container Era

Source aggregation decouples config from apps

[Diagram: Server 1 (Containers A and B) and Server 2 (Containers C and D) each run one Fluentd; the Applications send to their local Fluentd, which forwards to Kafka, elasticsearch, and HDFS.]

destination is always localhost from app’s point of view

Page 22: Logging for Production Systems in The Container Era

Destination aggregation makes storages scalable for high traffic

[Diagram: the Fluentd on Server 1 (Containers A and B) and the Fluentd on Server 2 (Containers C and D) forward to Aggregation server(s).]

Aggregation server(s): active / standby / load balancing

Page 23: Logging for Production Systems in The Container Era

Aggregation servers

• Logging directly from microservices makes log storages overloaded.
  > Too many RX connections
  > Too frequent import API calls

• Aggregation servers make the logging infrastructure more reliable and scalable.
  > Connection aggregation
  > Buffering for less frequent import API calls
  > Data persistence during downtime
  > Automatic retry at recovery from downtime
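The buffering point can be illustrated with a minimal Python sketch. `BatchingForwarder` and its `send_batch` callback are hypothetical names for illustration, not part of Fluentd; the sketch only shows how collecting records before sending turns many small writes into a few import calls. Persistence and retry are left out.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]


class BatchingForwarder:
    """Hypothetical aggregator: collects records and flushes them in batches."""

    def __init__(self, send_batch: Callable[[List[Record]], None],
                 batch_size: int = 100) -> None:
        self._send_batch = send_batch      # stands in for a storage import API call
        self._batch_size = batch_size
        self._pending: List[Record] = []

    def emit(self, record: Record) -> None:
        """Accept one record; flush once a full batch has accumulated."""
        self._pending.append(record)
        if len(self._pending) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        """Send all pending records as a single import call."""
        if not self._pending:
            return
        batch, self._pending = self._pending, []
        self._send_batch(batch)


# 250 records from many small senders become 3 import calls instead of 250.
calls: List[int] = []
forwarder = BatchingForwarder(lambda batch: calls.append(len(batch)), batch_size=100)
for i in range(250):
    forwarder.emit({"seq": i})
forwarder.flush()
print(calls)
```

The storage sees one connection and three calls here, regardless of how many containers feed the forwarder.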

Page 24: Logging for Production Systems in The Container Era

Fluentd Internal Architecture

Page 25: Logging for Production Systems in The Container Era

Internal Architecture (simplified)

Input → Filter → Buffer → Output (each stage is a Plugin)

Time: 2012-02-04 01:33:51
Tag: myapp.buylog
Record: { "user": "me", "path": "/buyItem", "price": 150, "referer": "/landing" }
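The event on this slide has three parts: a time, a tag, and a record. A minimal Python sketch of that shape follows. The `Event` type and `to_json_line` helper are hypothetical (they are not Fluentd's internal classes), and treating the timestamp as UTC is an assumption.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict, NamedTuple


class Event(NamedTuple):
    """Hypothetical stand-in for the (time, tag, record) triple on the slide."""
    time: datetime
    tag: str
    record: Dict[str, Any]


event = Event(
    time=datetime(2012, 2, 4, 1, 33, 51, tzinfo=timezone.utc),  # UTC is assumed
    tag="myapp.buylog",
    record={"user": "me", "path": "/buyItem", "price": 150, "referer": "/landing"},
)


def to_json_line(ev: Event) -> str:
    """Serialize an event to one JSON line, e.g. for a file output."""
    return json.dumps(
        {"time": ev.time.isoformat(), "tag": ev.tag, "record": ev.record}
    )


print(to_json_line(event))
```

The tag is a dot-separated string, which is what makes tag-based routing possible later in the pipeline.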

Page 26: Logging for Production Systems in The Container Era

Architecture: Input Plugins

• Receive logs
• Or pull logs from data sources
• In non-blocking manner

Examples: HTTP+JSON (in_http), File tail (in_tail), Syslog (in_syslog), …

Page 27: Logging for Production Systems in The Container Era

Architecture: Filter Plugins

• Transform logs
• Filter out unnecessary logs
• Enrich logs

Examples: Encrypt personal data, Convert IP to countries, Parse User-Agent, …
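The three filter roles (transform, drop, enrich) can be sketched as a chain of small functions. This is an illustration in Python, not Fluentd's Ruby plugin API: the filter names and the record fields `level`, `email`, and `service` are hypothetical, and SHA-256 hashing is used only as a simple stand-in for protecting personal data.

```python
import hashlib
from typing import Callable, Dict, Iterable, List, Optional

Record = Dict[str, object]
Filter = Callable[[Record], Optional[Record]]  # returning None drops the record


def drop_debug(record: Record) -> Optional[Record]:
    """Filter out unnecessary logs (here: debug-level records)."""
    return None if record.get("level") == "debug" else record


def mask_email(record: Record) -> Optional[Record]:
    """Transform logs: replace a personal field with its SHA-256 digest."""
    if "email" in record:
        digest = hashlib.sha256(str(record["email"]).encode("utf-8")).hexdigest()
        return {**record, "email": digest}
    return record


def add_service(record: Record) -> Optional[Record]:
    """Enrich logs with an extra field (the value is a placeholder)."""
    return {**record, "service": "myapp"}


def apply_filters(records: Iterable[Record], filters: List[Filter]) -> List[Record]:
    """Run every record through the filters in order, dropping on None."""
    kept: List[Record] = []
    for record in records:
        current: Optional[Record] = record
        for f in filters:
            current = f(current)
            if current is None:
                break
        if current is not None:
            kept.append(current)
    return kept


records = [
    {"level": "debug", "message": "cache miss"},
    {"level": "info", "message": "signup", "email": "me@example.com"},
]
result = apply_filters(records, [drop_debug, mask_email, add_service])
print(result)
```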

Page 28: Logging for Production Systems in The Container Era

Architecture: Buffer Plugins

• Improve performance
• Provide reliability
• Provide thread-safety

Examples: Memory (buf_memory), File (buf_file)

Page 29: Logging for Production Systems in The Container Era

Architecture: Output Plugins

• Write or send event logs

Examples: File (out_file), Amazon S3 (out_s3), MongoDB (out_mongo), …

Page 30: Logging for Production Systems in The Container Era

Architecture: Buffer Plugins

[Diagram: Input → Chunk, Chunk, Chunk → Output]

• Improve performance
• Provide reliability
• Provide thread-safety

Page 31: Logging for Production Systems in The Container Era

Divide & Conquer for retry

[Diagram labels: Stream, Batch, Error, Retry]

Page 32: Logging for Production Systems in The Container Era

Divide & Conquer for recovery

Buffer (on-disk or in-memory)

[Diagram labels: Error, Overloaded!!, recovery, recovery + flow control, queued chunks]

Page 33: Logging for Production Systems in The Container Era

Example Use Cases

Page 34: Logging for Production Systems in The Container Era

Streaming from Apache/Nginx to Elasticsearch

in_tail: /var/log/access.log

buf_file: /var/log/fluentd/buffer

Page 35: Logging for Production Systems in The Container Era

Error Handling and Recovery

in_tail: /var/log/access.log

buf_file: /var/log/fluentd/buffer

• Buffering for any outputs
• Retrying automatically
• With exponential wait and persistence on a disk
• and secondary output
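Retrying with exponential wait and falling back to a secondary output can be sketched in Python as below. `flush_with_retry` and its parameters (`max_retries`, `base_wait`, `sleep`) are hypothetical names chosen for this illustration; they are not Fluentd configuration options, and on-disk persistence of the chunk is omitted.

```python
import time
from typing import Callable, Dict, List

Chunk = List[Dict[str, object]]


def flush_with_retry(
    chunk: Chunk,
    primary: Callable[[Chunk], None],
    secondary: Callable[[Chunk], None],
    max_retries: int = 4,
    base_wait: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try the primary output, doubling the wait after each failure.

    After max_retries failed retries, hand the chunk to the secondary output.
    Returns which output finally accepted the chunk.
    """
    wait = base_wait
    for attempt in range(max_retries + 1):
        try:
            primary(chunk)
            return "primary"
        except Exception:
            if attempt == max_retries:
                break              # retries exhausted
            sleep(wait)            # exponential wait: 1s, 2s, 4s, ...
            wait *= 2
    secondary(chunk)
    return "secondary"


# Demo with a primary output that is always down; waits are recorded, not slept.
def always_down(chunk: Chunk) -> None:
    raise IOError("destination unavailable")


waits: List[float] = []
saved: List[Chunk] = []
outcome = flush_with_retry(
    [{"path": "/buyItem"}], always_down, saved.append,
    max_retries=3, base_wait=1.0, sleep=waits.append,
)
print(outcome, waits)
```

Passing `sleep` as a parameter keeps the sketch testable without real delays.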

Page 36: Logging for Production Systems in The Container Era

Tailing & parsing files

[Diagram: your app writes events.log, which is read together with a pos file.]

• Read a log file
• Custom regexp
• Custom parser in Ruby

Supported built-in formats:

• apache • apache_error • apache2 • nginx
• json • csv • tsv • ltsv
• syslog • multiline • none
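Two ideas on this slide, the pos file and the custom regexp, can be sketched in Python. This is an illustration rather than how in_tail is implemented: `read_new_lines`, `parse_line`, and the regexp are hypothetical and simplified, the regexp is not the one behind Fluentd's built-in formats, and log rotation is not handled.

```python
import os
import re
import tempfile
from typing import Dict, List, Optional

# Simplified pattern for an Apache-style access log line (illustrative only).
ACCESS_RE = re.compile(
    r'^(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<code>\d{3}) (?P<size>\d+|-)$'
)


def parse_line(line: str) -> Optional[Dict[str, str]]:
    """Return the named fields of a matching line, or None if it does not match."""
    m = ACCESS_RE.match(line)
    return m.groupdict() if m else None


def read_new_lines(log_path: str, pos_path: str) -> List[str]:
    """Read complete lines appended since the offset stored in the pos file."""
    try:
        with open(pos_path) as f:
            offset = int(f.read().strip() or 0)
    except FileNotFoundError:
        offset = 0                      # first run: start from the beginning
    with open(log_path, "rb") as f:
        f.seek(offset)
        data = f.read()
    end = data.rfind(b"\n") + 1         # leave a trailing partial line for later
    with open(pos_path, "w") as f:
        f.write(str(offset + end))      # remember how far we have read
    return data[:end].decode("utf-8").splitlines()


workdir = tempfile.mkdtemp()
log_path = os.path.join(workdir, "events.log")
pos_path = os.path.join(workdir, "events.log.pos")

with open(log_path, "w") as f:
    f.write('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
            '"GET /index.html HTTP/1.0" 200 2326\n')
first = read_new_lines(log_path, pos_path)

with open(log_path, "a") as f:
    f.write("not an access log line\npartial")
second = read_new_lines(log_path, pos_path)

records = [parse_line(line) for line in first + second]
print(records)
```

Because the offset is stored outside the process, a restarted reader resumes where it left off instead of re-reading the whole file.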

Page 37: Logging for Production Systems in The Container Era

Out to Multiple Locations

[Diagram: access.log → in_tail → buffer → multiple outputs]

• Routing based on tags
• Copy to multiple storages
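Tag-based routing with copies to several storages can be sketched in Python as follows. The matcher is a simplification that assumes `*` matches exactly one tag part and `**` matches zero or more parts; other pattern forms are not handled. The `routes` table and output names are hypothetical.

```python
from typing import List, Sequence, Tuple


def _match(pattern: Sequence[str], parts: Sequence[str]) -> bool:
    """Match tag parts against pattern parts ('*' = one part, '**' = any number)."""
    if not pattern:
        return not parts
    head, rest = pattern[0], pattern[1:]
    if head == "**":
        # '**' may absorb zero or more leading tag parts.
        return any(_match(rest, parts[i:]) for i in range(len(parts) + 1))
    if not parts:
        return False
    if head == "*" or head == parts[0]:
        return _match(rest, parts[1:])
    return False


def tag_matches(pattern: str, tag: str) -> bool:
    return _match(pattern.split("."), tag.split("."))


def route(tag: str, routes: List[Tuple[str, List[str]]]) -> List[str]:
    """Return the outputs of the first route whose pattern matches the tag."""
    for pattern, outputs in routes:
        if tag_matches(pattern, tag):
            return outputs
    return []


# Hypothetical routing table: access logs are copied to two storages,
# everything else goes to a local file.
routes = [
    ("access.**", ["elasticsearch", "hdfs"]),
    ("**", ["file"]),
]
print(route("access.nginx", routes))
print(route("app.events.purchase", routes))
```

Listing several outputs for one pattern is the "copy" case; ordering the routes from specific to general is the "routing" case.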

Page 38: Logging for Production Systems in The Container Era

Example configuration for real time batch combo

Page 39: Logging for Production Systems in The Container Era

Data partitioning by time on HDFS / S3

[Diagram: access.log → in_tail → buffer → HDFS / S3]

• Custom file formatter
• Slice files based on time

2016-01-01/01/access.log.gz
2016-01-01/02/access.log.gz
2016-01-01/03/access.log.gz
…
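Slicing by time amounts to deriving the destination path from each event's timestamp. A minimal Python sketch follows; `partition_path` and `group_by_partition` are hypothetical helpers, and using UTC for the hour boundary is an assumption.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Dict, Iterable, List, Tuple


def partition_path(unix_time: float, filename: str = "access.log.gz") -> str:
    """Map a timestamp to an hourly path such as 2016-01-01/01/access.log.gz."""
    t = datetime.fromtimestamp(unix_time, tz=timezone.utc)  # UTC is assumed
    return t.strftime("%Y-%m-%d/%H/") + filename


def group_by_partition(events: Iterable[Tuple[float, str]]) -> Dict[str, List[str]]:
    """Group (unix_time, line) events by the hourly file they belong to."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for unix_time, line in events:
        groups[partition_path(unix_time)].append(line)
    return dict(groups)


base = datetime(2016, 1, 1, 1, 0, tzinfo=timezone.utc).timestamp()
events = [(base + 10, "a"), (base + 3599, "b"), (base + 3600, "c")]
groups = group_by_partition(events)
print(groups)
```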

Page 40: Logging for Production Systems in The Container Era

3rd party input plugins

dstat, df, AMQP, munin, jvmwatcher, SQL

Page 41: Logging for Production Systems in The Container Era

3rd party output plugins

AMQP, Graphite

Page 42: Logging for Production Systems in The Container Era

Real World Use Cases

Page 43: Logging for Production Systems in The Container Era

Microsoft

Operations Management Suite uses Fluentd: "The core of the agent uses an existing open source data aggregator called Fluentd. Fluentd has hundreds of existing plugins, which will make it really easy for you to add new data sources."

[Diagram labels: Linux Computer (Syslog, Operating System, Apache, MySQL, Containers), Providers, OMI Server (CIM Server), omsconfig (DSC), PS DSC, omsagent, Firewall / proxy, OMS Service, Upload Data (HTTPS), Pull configuration (HTTPS)]

Page 44: Logging for Production Systems in The Container Era

Atlassian

"At Atlassian, we've been impressed by Fluentd and have chosen to use it in Atlassian Cloud's logging and analytics pipeline."

[Diagram labels: Kinesis, Elasticsearch cluster, Ingestion service]

Page 45: Logging for Production Systems in The Container Era

Amazon Web Services

The architecture of Fluentd (Sponsored by Treasure Data) is very similar to Apache Flume or Facebook’s Scribe. Fluentd is easier to install and maintain and has better documentation and support than Flume and Scribe.

Types of Data (Collect → Store)

• Transactional: Database reads & writes (OLTP), Cache
• Search: Logs, Streams
• File: Log files (/var/log), Log collectors & frameworks
• Stream: Log records, Sensors & IoT data

[Diagram labels: Applications (Web Apps, Mobile Apps), Logging, IoT; Database, Search, File Storage, Stream Storage]

Page 46: Logging for Production Systems in The Container Era

Thank you!