Kiyoto Tamura
Nov 17, 2014
RubyConf 2014
Fluentd: Unified Logging Layer
whoami
Kiyoto Tamura
GitHub/Twitter: kiyoto/kiyototamura
Treasure Data, Inc.
Director of Developer Relations
Fluentd maintainer
a ruby n00b
Fluentd n00b too
why me?
Busy writing code! Just gave a talk!
I’m giving a talk! Busy writing code!
Busy as CTO! San Diego’s nice!
What’s Fluentd?
An extensible & reliable data collection tool
simple core + plugins
buffering, HA (failover), load balance, etc.
like syslogd
data collection tool
Blueflood
MongoDB
Hadoop
Metrics
Amazon S3
Analysis
Archiving
MySQL
Apache
Frontend
Access logs
syslogd
App logs
System logs
Backend
Your system
bash scripts ruby scripts
rsync
log file
bash
python scripts
custom logger
✓ duplicated code for error handling...
✓ messy code for retry mechanism...
cron
other custom scripts...
(this is painful!!!)
Blueflood
MongoDB
Hadoop
Metrics
Amazon S3
Analysis
Archiving
MySQL
Apache
Frontend
Access logs
syslogd
App logs
System logs
Backend
Your system
filter / buffer / route
extensible
Core (common concerns)
• Divide & Conquer
• Buffering & Retries
• Error Handling
• Message Routing
• Parallelism
Plugins (use-case specific)
• Read Data
• Parse Data
• Buffer Data
• Write Data
• Format Data
reliable
reliable data transfer
Divide & Conquer & Retry
error retry
error retry retry
retry
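A minimal Ruby sketch of the "error, then retry" idea on this slide. The method name `send_with_retry` and its parameters (`max_retries`, `base_wait`, `sleeper`) are hypothetical, and exponential backoff is an assumed retry policy for illustration, not a description of Fluentd's internals.

```ruby
# Hypothetical sketch: deliver one chunk, retrying with exponential backoff.
# None of these names come from Fluentd's API.
def send_with_retry(chunk, max_retries: 5, base_wait: 0.01, sleeper: ->(s) { sleep(s) })
  attempts = 0
  begin
    attempts += 1
    yield chunk                      # try to write the chunk to the destination
    attempts                         # report how many attempts it took
  rescue StandardError
    raise if attempts > max_retries  # give up once the retry limit is exceeded
    sleeper.call(base_wait * (2**(attempts - 1)))  # wait 1x, 2x, 4x, ...
    retry
  end
end

# Usage: a destination that fails twice, then accepts the chunk.
failures = 2
delivered = []
attempts = send_with_retry(["log line 1", "log line 2"], sleeper: ->(_s) {}) do |chunk|
  if failures > 0
    failures -= 1
    raise IOError, "destination unavailable"
  end
  delivered << chunk
end
puts "delivered after #{attempts} attempts"
```

Because the chunk is kept until the write succeeds, a temporary outage at the destination delays delivery instead of losing data.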
reliable process
This?
Or this?
M x N → M + N
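To make the arithmetic concrete, here is a small Ruby sketch. The two lists are illustrative counts only (5 sources, 4 destinations), not an inventory from the talk.

```ruby
# Illustrative numbers only: which systems count as sources or destinations
# is an assumption made for this example.
sources = %w[apache frontend backend syslogd mysql]
destinations = %w[nagios mongodb hadoop s3]

point_to_point = sources.size * destinations.size  # one custom script per pair
via_hub        = sources.size + destinations.size  # one plugin per system

puts "point-to-point: #{point_to_point}, via a hub: #{via_hub}"
```

With 5 sources and 4 destinations, that is 20 pairwise integrations versus 9 plugins.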
Nagios
MongoDB
Hadoop
Alerting
Amazon S3
Analysis
Archiving
MySQL
Apache
Frontend
Access logs
syslogd
App logs
System logs
Backend
Databases
buffer/filter/route
use cases
Simple Forwarding
# logs from a file
<source>
  type tail
  path /var/log/httpd.log
  format apache2
  tag backend.apache
</source>
# logs from client libraries
<source>
  type forward
  port 24224
</source>
# store logs to MongoDB
<match backend.*>
  type mongo
  database fluent
  collection test
</match>
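The `<match backend.*>` block routes events by tag. A simplified Ruby sketch of that matching follows; it handles only the `*` wildcard, treated as exactly one dot-separated tag part. `tag_matches?` is a hypothetical helper written for this example, not Fluentd's own matcher.

```ruby
# Simplified sketch of tag matching: only the single-part `*` wildcard is
# handled. `tag_matches?` is a made-up helper, not part of Fluentd.
def tag_matches?(pattern, tag)
  pattern_parts = pattern.split('.')
  tag_parts = tag.split('.')
  return false unless pattern_parts.size == tag_parts.size

  pattern_parts.zip(tag_parts).all? do |expected, actual|
    expected == '*' || expected == actual
  end
end

puts tag_matches?('backend.*', 'backend.apache')
puts tag_matches?('backend.*', 'web.access')
```

The first call is a match and the second is not, so only events tagged under `backend.` would reach the MongoDB output.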
Less Simple Forwarding
Lambda Architecture
# logs from a file
<source>
  type tail
  path /var/log/httpd.log
  format apache2
  tag backend.apache
</source>
# logs from client libraries
<source>
  type forward
  port 24224
</source>
# store logs to ES and HDFS
<match backend.*>
  type copy
  <store>
    type elasticsearch
    logstash_format true
  </store>
  <store>
    type webhdfs
    host namenode
    port 50070
    path /path/on/hdfs/
  </store>
</match>
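Conceptually, `type copy` hands every matched event to each `<store>`. A rough Ruby sketch of that fan-out, using hypothetical stand-in classes (`RecordingStore`, `CopySketch`) rather than real output plugins:

```ruby
# Hypothetical stand-in for an output plugin: it only records what it receives.
class RecordingStore
  attr_reader :name, :events

  def initialize(name)
    @name = name
    @events = []
  end

  def emit(tag, time, record)
    @events << [tag, time, record]
  end
end

# Sketch of a copy-style output: forward every event to every store.
class CopySketch
  def initialize(stores)
    @stores = stores
  end

  def emit(tag, time, record)
    # dup the record so one store cannot mutate what another store sees
    @stores.each { |store| store.emit(tag, time, record.dup) }
  end
end

es   = RecordingStore.new('elasticsearch')
hdfs = RecordingStore.new('webhdfs')
copy = CopySketch.new([es, hdfs])
copy.emit('backend.apache', 1416182400, { 'path' => '/index.html', 'code' => 200 })
puts "#{es.name}: #{es.events.size}, #{hdfs.name}: #{hdfs.events.size}"
```

Each store ends up with its own copy of the event, which is what lets one stream feed both a real-time store and a batch store.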
CEP for Stream Processing
Container Logging
Fluentd on Kubernetes
architecture
Internal Architecture
Input Parser Buffer Output Formatter
“input-ish” “output-ish”
Input plugins
HTTP+JSON (in_http) File tail (in_tail) Syslog (in_syslog) ...
✓ Receive logs
✓ Or pull logs from data sources
✓ non-blocking
Input
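As an illustration of the HTTP+JSON input, here is a Ruby sketch that builds a request for `in_http`. It assumes an `in_http` source listening on localhost port 9880 and uses a made-up tag (`app.login`) and record; the request is only constructed, since sending it needs a running Fluentd.

```ruby
require 'net/http'
require 'json'
require 'uri'

# Assumptions: in_http on localhost:9880, tag "app.login", and a made-up record.
uri = URI('http://localhost:9880/app.login')  # the URL path is used as the tag
request = Net::HTTP::Post.new(uri)
request.set_form_data('json' => JSON.generate('action' => 'login', 'user' => 2))

puts "POST #{uri.path}"

# Sending requires a running Fluentd, so it is left commented out:
# Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
```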
Input plugins
module Fluent
  class NewTailInput < Input
    Plugin.register_input('tail', self)

    def initialize
      super
      @paths = []
      @tails = {}
    end
    # Little more code
  end
end
Input plugins
module Fluent
  class NewTailInput < Input
    Plugin.register_input('tail', self)

    def initialize
      super
      @paths = []
      @tails = {}
    end

    config_param :path, :string
    config_param :tag, :string
    config_param :rotate_wait, :time, :default => 5
    config_param :pos_file, :string, :default => nil
    config_param :read_from_head, :bool, :default => false
    config_param :refresh_interval, :time, :default => 60

    attr_reader :paths

    def configure(conf)
      super

      @paths = @path.split(',').map {|path| path.strip }
      if @paths.empty?
        raise ConfigError, "tail: 'path' parameter is required on tail input"
      end

      unless @pos_file
        $log.warn "'pos_file PATH' parameter is not set to a 'tail' source."
        $log.warn "this parameter is highly recommended to save the position to resume tailing."
      end

      configure_parser(conf)
      configure_tag

      @multiline_mode = conf['format'] == 'multiline'
      @receive_handler = if @multiline_mode
                           method(:parse_multilines)
                         else
                           method(:parse_singleline)
                         end
    end

    def configure_parser(conf)
      @parser = TextParser.new
      @parser.configure(conf)
    end

    def configure_tag
      if @tag.index('*')
        @tag_prefix, @tag_suffix = @tag.split('*')
        @tag_suffix ||= ''
      else
        @tag_prefix = nil
        @tag_suffix = nil
      end
    end

    def start
      if @pos_file
        @pf_file = File.open(@pos_file, File::RDWR|File::CREAT, DEFAULT_FILE_PERMISSION)
        @pf_file.sync = true
        @pf = PositionFile.parse(@pf_file)
      end

      @loop = Coolio::Loop.new
      refresh_watchers

      @refresh_trigger = TailWatcher::TimerWatcher.new(@refresh_interval, true, log, &method(:refresh_watchers))
      @refresh_trigger.attach(@loop)
      @thread = Thread.new(&method(:run))
    end

    def shutdown
      @refresh_trigger.detach if @refresh_trigger && @refresh_trigger.attached?

      stop_watchers(@tails.keys, true)
      @loop.stop rescue nil # when all watchers are detached, `stop` raises RuntimeError. We can ignore this exception.
      @thread.join
      @pf_file.close if @pf_file
    end

    def expand_paths
      date = Time.now
      paths = []
      @paths.each { |path|
        path = date.strftime(path)
        if path.include?('*')
          paths += Dir.glob(path)
        else
          # When file is not created yet, Dir.glob returns an empty array. So just add when path is static.
          paths << path
        end
      }
      paths
    end

    # in_tail with '*' path doesn't check rotation file equality at refresh phase.
    # So you should not use '*' path when your logs will be rotated by another tool.
    # It will cause log duplication after updated watch files.
    # In such case, you should separate log directory and specify two paths in path parameter.
    # e.g. path /path/to/dir/*,/path/to/rotated_logs/target_file
    def refresh_watchers
      target_paths = expand_paths
      existence_paths = @tails.keys

      unwatched = existence_paths - target_paths
      added = target_paths - existence_paths
700 lines!
Input plugins
module Fluent
  class TcpInput < SocketUtil::BaseInput
    Plugin.register_input('tcp', self)

    config_set_default :port, 5170
    config_param :delimiter, :string, :default => "\n" # syslog family add "\n" to each message and this seems only way to split messages in tcp stream

    def listen(callback)
      log.debug "listening tcp socket on #{@bind}:#{@port}"
      Coolio::TCPServer.new(@bind, @port, SocketUtil::TcpHandler, log, @delimiter, callback)
    end
  end
end
Input plugins
class BaseInput < Fluent::Input
  # some code
  def on_message(msg, addr)
    @parser.parse(msg) { |time, record|
      unless time && record
        log.warn "pattern not match: #{msg.inspect}"
        return
      end

      record[@source_host_key] = addr[3] if @source_host_key
      Engine.emit(@tag, time, record)
    }
  end
  # some code
end
Parser plugins
JSON Regexp Apache/Nginx/Syslog CSV/TSV, etc.
✓ Parse into JSON
✓ Common formats out of the box
✓ v0.10.46 and above
Parser
Parser plugins
<source>
  type tcp
  tag tcp.data
  format /^(?<field_1>\d+) (?<field_2>\w+)/
</source>
Parser plugins
def call(text)
  m = @regexp.match(text)
  # some code
  time = nil
  record = {}

  m.names.each {|name|
    if value = m[name]
      case name
      when "time"
        time = @mutex.synchronize { @time_parser.parse(value) }
      else
        record[name] = if @type_converters.nil?
                         value
                       else
                         convert_type(name, value)
                       end
      end
    end
  }
  # some code
end
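To see what the regexp format does on its own, here is a plain-Ruby sketch with no Fluentd loaded. The pattern is the one from the `format` line in the `<source>` block; the sample input lines are made up.

```ruby
# The pattern comes from the slide's `format` line; the input lines are made up.
pattern = /^(?<field_1>\d+) (?<field_2>\w+)/

m = pattern.match("123 hello")
# Named captures become the keys of the record.
record = m.names.zip(m.captures).to_h
p record

# A line that does not fit the pattern yields nil, which is the
# "pattern not match" case an input plugin has to handle.
p pattern.match("no digits here")
```

Captured values are strings; turning `field_1` into an integer would be a separate type-conversion step, as the `convert_type` call in the parser code suggests.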
Buffer plugins
✓ Improve performance
✓ Provide reliability
✓ Provide thread-safety
Memory (buf_memory) File (buf_file)
Buffer
Buffer plugins
✓ Chunk = adjustable unit of data
✓ Buffer = Queue of chunks
chunk
chunk
chunk output
Input
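A toy Ruby sketch of "chunk = adjustable unit of data, buffer = queue of chunks". The class `ChunkBufferSketch` and its methods are hypothetical and count the chunk limit in records for simplicity; this is not Fluentd's buffer API.

```ruby
# Hypothetical sketch, not Fluentd's buffer API: a chunk is a size-limited
# array of records, and the buffer is a FIFO queue of completed chunks.
class ChunkBufferSketch
  attr_reader :queue

  def initialize(chunk_limit)
    @chunk_limit = chunk_limit  # the adjustable unit, counted in records here
    @current = []
    @queue = []
  end

  # Input side: append a record, enqueueing the chunk once it is full.
  def append(record)
    @current << record
    flush_current if @current.size >= @chunk_limit
  end

  # Move the partially filled chunk onto the queue (e.g. on a timer).
  def flush_current
    return if @current.empty?
    @queue << @current
    @current = []
  end

  # Output side: take the oldest chunk off the queue.
  def pop_chunk
    @queue.shift
  end
end

buffer = ChunkBufferSketch.new(2)
%w[a b c d e].each { |r| buffer.append(r) }
buffer.flush_current
puts "queued chunks: #{buffer.queue.size}"
```

The output works one chunk at a time, so a failed write can be retried for that chunk without blocking the input side.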
Output plugins
✓ Write to external systems
✓ Buffered & Non-buffered
✓ 200+ plugins
Output
File (out_file) Amazon S3 (out_s3) MongoDB (out_mongo) ...
Output plugins
class FileOutput < TimeSlicedOutput
  Plugin.register_output('file', self)
  # some code
  def write(chunk)
    path = generate_path(chunk)
    FileUtils.mkdir_p File.dirname(path)

    case @compress
    when nil
      File.open(path, "a", DEFAULT_FILE_PERMISSION) {|f|
        chunk.write_to(f)
      }
    when :gz
      File.open(path, "a", DEFAULT_FILE_PERMISSION) {|f|
        gz = Zlib::GzipWriter.new(f)
        chunk.write_to(gz)
        gz.close
      }
    end

    return path # for test
  end
  # more code
end
Formatter plugins
✓ Format output
✓ Only partially supported for now
✓ v0.10.49 and above
JSON CSV/TSV “single value”
Formatter
Formatter plugins
class SingleValueFormatter
  include Configurable

  config_param :message_key, :string, :default => 'message'
  config_param :add_newline, :bool, :default => true

  def format(tag, time, record)
    text = record[@message_key].to_s
    text << "\n" if @add_newline
    text
  end
end
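A standalone Ruby sketch of the same idea, runnable without Fluentd. `SingleValueFormatterSketch` is a hypothetical class: plain keyword arguments stand in for `Configurable` and `config_param`, and it copies the string before appending so the record is not modified.

```ruby
# Standalone sketch: keyword arguments replace Fluentd's Configurable
# machinery, which is not loaded here.
class SingleValueFormatterSketch
  def initialize(message_key: 'message', add_newline: true)
    @message_key = message_key
    @add_newline = add_newline
  end

  def format(tag, time, record)
    text = record[@message_key].to_s.dup  # dup so the record's string is untouched
    text << "\n" if @add_newline
    text
  end
end

formatter = SingleValueFormatterSketch.new
record = { 'message' => 'GET /index.html 200', 'host' => 'web01' }
line = formatter.format('backend.apache', 1416182400, record)
print line
```

Only the value under `message_key` is written out; the tag, time, and other fields are dropped, which is why this formatter is called "single value".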
Internal Architecture
Input Parser Buffer Output Formatter
Adding Filter in v0.12!
Input Parser Buffer Output Formatter
Filter
Roadmap
Nov Dec Jan Feb Mar Apr May
2014 2015
v0.12 • filter • label
v0.14 • plugin API • ServerEngine
v1.0!? • we can use help!
goodies
fluentd-ui
Treasure Agent
• Treasure Data distribution of Fluentd
• including Ruby, core libraries and QA’ed 3rd party plugins
• rpm/deb/dmg
• 2.1.2 is released TODAY with fluentd-ui
fluentd-forwarder
• Forwarding agent written in Go
• mainly for Windows support
• less mature than Fluentd
• Bundle TCP input/output and TD output
• No plugin mechanism