Transcript
Page 1: Big Data Everywhere Chicago: 1.5 Million Log Lines Per Second: Building and Maintaining Flume Flows at Coversant (Conversant)

1.5 Million Log Lines per Second

Big Data Everywhere Chicago 2014

Mike Keane [email protected]

Building and maintaining Flume flows at Conversant

Page 2:

SLA for Event Driven Logging with Flume

• Quicker insight into production data

• Reduce complexity of administering/managing new servers, data centers, etc.

• Scalable

• No data loss or duplication

• Replace TSV files with Avro objects

• Able to be monitored by Network Operations Center (NOC)

• Able to recover from downtime quickly

Page 3:

Simplistic Flume Overview

• A Flume Flow is a series of Flume agents that data follows from origination to final destination

• Data on a Flume Flow is packaged in FlumeEvent Avro objects

• A FlumeEvent is composed of

  • Headers – A map of string key/value pairs

  • Body – A byte array

• A FlumeEvent is an atomic unit of data

• FlumeEvents are sent in batches

• When a batch of FlumeEvents only partially makes it to the next Flume agent in the flow, the entire batch is resent, resulting in duplicates
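The event structure and the batch-resend behavior can be illustrated with a small model. This is a simplified sketch, not Flume's actual transport: the `FlumeEvent` dataclass, the `NextAgent` class, `send_with_retry`, and the `seq` header are all hypothetical names used only to show how resending a whole batch after a partial delivery produces duplicates.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FlumeEvent:
    """Illustrative stand-in for a Flume event: string headers plus a byte body."""
    headers: Dict[str, str] = field(default_factory=dict)
    body: bytes = b""


class NextAgent:
    """Hypothetical downstream agent that can fail partway through a batch."""

    def __init__(self, fail_after=None):
        self.received: List[FlumeEvent] = []
        self.fail_after = fail_after  # events accepted before one simulated failure

    def accept(self, batch):
        for index, event in enumerate(batch):
            if self.fail_after is not None and index == self.fail_after:
                self.fail_after = None  # fail once, then behave normally
                raise ConnectionError("batch only partially delivered")
            self.received.append(event)


def send_with_retry(agent, batch, max_attempts=3):
    """Resend the entire batch on failure; the sender cannot tell which events landed."""
    for _ in range(max_attempts):
        try:
            agent.accept(batch)
            return
        except ConnectionError:
            continue
    raise RuntimeError("batch could not be delivered")


batch = [FlumeEvent({"seq": str(n)}, f"line {n}".encode()) for n in range(4)]
agent = NextAgent(fail_after=2)
send_with_retry(agent, batch)
received_seqs = [event.headers["seq"] for event in agent.received]
```

In this model the first two events arrive twice, once from the failed attempt and once from the resend, which is the duplication the last bullet describes.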

Page 4:

Simplistic Flume Overview

[Diagram: Flume Agent]

Page 5:

Simplistic Flume Overview

[Diagram: EmbeddedAgent, Compressor Agent, Landing Agent]

Page 6:

Overview of existing network topology

• 3 data centers divided into 12 lanes participating in the OpenRTB market

  • 6 lanes in the east coast data center

  • 4 lanes in the west coast data center

  • 2 lanes in the European data center

• Each lane has approximately 75 servers handling OpenRTB operations.

• 30 different logs

• Over 60,000,000,000 log lines per day
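As a quick check on these figures, the arithmetic below derives the lane count, the approximate server count, and the average per-second rate implied by the daily volume. The variable names are illustrative, and comparing the average against the 1.5 million lines per second in the talk title is only a rough reading, since the slides do not say how that figure was measured.

```python
lanes_per_data_center = {"east": 6, "west": 4, "europe": 2}
servers_per_lane = 75              # approximate, per the slide
log_lines_per_day = 60_000_000_000
seconds_per_day = 86_400

total_lanes = sum(lanes_per_data_center.values())
total_servers = total_lanes * servers_per_lane

# Average rate implied by the daily volume.
avg_lines_per_second = log_lines_per_day / seconds_per_day

# Ratio of the title's 1.5 million lines per second to that average.
title_rate_over_average = 1_500_000 / avg_lines_per_second
```

That gives 12 lanes, about 900 servers, and an average of roughly 694,000 log lines per second across the day.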

Page 7:

Overview of existing network topology.

Page 8:

P.O.C. Can Flume handle our log volume reliably?

• 2 Server Flume Flow from East Coast (IAD) to Chicago (ORD) with over 250K TSV lines per second

• No Data Loss

• Failover

• Compression performance

Page 9:

P.O.C. Overview

Page 10:

P.O.C. passes

• Larger batch sizes helped, but could not reach 250K per second

• Multiple TSV lines per FlumeEvent reached over 360K per second

• Failover passed, with duplicates

• Compression passed but needed to parallelize 7X sinks
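Packing several TSV lines into one event body, the change that got the P.O.C. past its target, might look like the sketch below. The newline-joined encoding and the function names `pack_lines` and `unpack_body` are assumptions for illustration; the slides do not describe the actual wire format.

```python
def pack_lines(tsv_lines, lines_per_event):
    """Group TSV lines into newline-joined byte bodies of at most lines_per_event lines."""
    if lines_per_event < 1:
        raise ValueError("lines_per_event must be at least 1")
    bodies = []
    for start in range(0, len(tsv_lines), lines_per_event):
        chunk = tsv_lines[start:start + lines_per_event]
        bodies.append("\n".join(chunk).encode("utf-8"))
    return bodies


def unpack_body(body):
    """Recover the individual TSV lines from one packed event body."""
    return body.decode("utf-8").split("\n")


lines = [f"{n}\tGET\t/ad/{n}" for n in range(10)]
bodies = pack_lines(lines, lines_per_event=4)
restored = [line for body in bodies for line in unpack_body(body)]
```

Ten lines at four lines per event become three event bodies, so the per-event overhead is paid three times rather than ten.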

Page 11:

Taking Flume to Production

• Embedding the EmbeddedAgent in existing servers

• Modify EmbeddedAgent

• Properties from existing infrastructure

• Implement Monitoring

• Create “Flume” implementation of proprietary logging interface

• Replace POJO to TSV with Avro to AvroDataFile

• Preventing duplicates, not removing

• Add LogType header

Page 12:

Taking Flume to Production

• Custom Sink for AvroDataFile body (based on HDFSEventSink)

• Check if UUID header is in HBase

  • Yes – increment duplicate count metric

  • No

    • Write AvroDataFile body to HDFS using Custom Writer

    • Put UUID to HBase
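The duplicate check in the custom sink can be sketched as follows. This is a minimal model only: a Python set stands in for the HBase table, a plain callable stands in for the custom HDFS writer, and the class name `DedupingSink` and the header key `"UUID"` are assumptions rather than the names used at Conversant.

```python
class DedupingSink:
    """Writes an event body only if its UUID header has not been seen before."""

    def __init__(self, seen_uuids, writer):
        self.seen_uuids = seen_uuids   # stand-in for the HBase UUID table
        self.writer = writer           # stand-in for the custom HDFS writer
        self.duplicate_count = 0       # stand-in for the duplicate count metric

    def process(self, headers, body):
        """Return True if the body was written, False if it was a duplicate."""
        uuid = headers["UUID"]
        if uuid in self.seen_uuids:
            self.duplicate_count += 1
            return False
        # Same order as the slide: write the body first, then record the UUID.
        self.writer(body)
        self.seen_uuids.add(uuid)
        return True


written = []
sink = DedupingSink(seen_uuids=set(), writer=written.append)
first = sink.process({"UUID": "a-1"}, b"avro-bytes-1")
resent = sink.process({"UUID": "a-1"}, b"avro-bytes-1")
second = sink.process({"UUID": "b-2"}, b"avro-bytes-2")
```

Writing before recording the UUID means a crash between the two steps could still let one duplicate through on a resend, whereas the reverse order would risk dropping data; the slides list the steps in the write-first order.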

Page 13:

Taking Flume to Production

• Custom Selector based on MultiplexingChannelSelector

• Route FlumeEvents to channels by log type or groups of log types

• Bifurcate each log to multiple locations, with each log and each location having its own percentage of data to bifurcate
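A routing rule of this shape could be modeled as below. The sketch assumes each event carries an id that can be hashed into a stable bucket from 0 to 99; the hashing approach, the class name `BifurcatingSelector`, and the channel and log type names are illustrative choices, not details from the talk.

```python
import hashlib


def bucket(event_id):
    """Map an event id to a stable bucket in the range 0 to 99."""
    digest = hashlib.sha256(event_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


class BifurcatingSelector:
    """Routes by log type, then copies a percentage of events to extra channels."""

    def __init__(self, primary, bifurcations, default_channel="default"):
        self.primary = primary              # log type -> primary channel name
        self.bifurcations = bifurcations    # log type -> [(channel name, percent)]
        self.default_channel = default_channel

    def channels_for(self, log_type, event_id):
        channels = [self.primary.get(log_type, self.default_channel)]
        slot = bucket(event_id)
        # One bucket per event, so a 10% sample is a subset of a 50% sample.
        for channel, percent in self.bifurcations.get(log_type, []):
            if slot < percent:
                channels.append(channel)
        return channels


selector = BifurcatingSelector(
    primary={"bid": "critical", "debug": "bulk"},
    bifurcations={"bid": [("regression", 10), ("analytics_dev", 100)]},
)
routes = [selector.channels_for("bid", f"event-{n}") for n in range(10000)]
regression_share = sum("regression" in route for route in routes) / len(routes)
```

Hashing rather than drawing a random number keeps the decision repeatable for a given event, so a resent event is bifurcated the same way each time.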

Page 14:

Configuring Flume Flows

• Configuring Flume can be tedious; use a templating engine

• In Q2 2014 Conversant expanded from 7 lanes in 2 data centers to 12 lanes in 3 data centers (~400 more servers to configure).

• Static headers useful for tracking flows

• 15 minutes to configure all Q2 expansion

CompressorLane('iad6', [CompressorAgent("dtiad06flm01p"),
                        CompressorAgent("dtiad06flm02p"),
                        CompressorAgent("dtiad06flm03p")])

compressor.list = dtiad06flm01p, dtiad06flm02p, dtiad06flm03p
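A rough idea of how a lane definition might expand into properties is sketched below. The slides do not name the templating engine, so Python's `string.Template` is used as a stand-in. The `render_lane` function, the agent name `a1`, the sink naming scheme, and port 4545 are hypothetical. The `type`, `hostname`, and `port` keys follow the standard Flume properties layout for an Avro sink.

```python
from string import Template

# One Avro sink per compressor host, in standard Flume properties layout.
SINK_TEMPLATE = Template(
    "${agent}.sinks.${sink}.type = avro\n"
    "${agent}.sinks.${sink}.hostname = ${host}\n"
    "${agent}.sinks.${sink}.port = ${port}"
)


def render_lane(lane, hosts, agent="a1", port=4545):
    """Render the properties for one lane from its list of compressor hosts."""
    sinks = [f"{lane}_k{n}" for n in range(1, len(hosts) + 1)]
    lines = [
        f"# lane {lane}",
        "compressor.list = " + ", ".join(hosts),
        f"{agent}.sinks = " + " ".join(sinks),
    ]
    for sink, host in zip(sinks, hosts):
        lines.append(
            SINK_TEMPLATE.substitute(agent=agent, sink=sink, host=host, port=port)
        )
    return "\n".join(lines) + "\n"


config = render_lane("iad6", ["dtiad06flm01p", "dtiad06flm02p", "dtiad06flm03p"])
```

Adding a lane then means adding one entry to the lane list and re-rendering, rather than hand-editing hundreds of property lines.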

Page 15:

Monitoring the Flume Flows

• Flume metrics are available by JMX or JSON over HTTP

• Metrics to monitor

• ChannelFillPercentage

• Rate of change on EventDrainSuccessCount on failover sinks

• FLUME-2307 – File channel deletes fail after timeout (fixed in 1.5)

• Publishing metrics to TSDB provides great visual insight
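Both metrics named above can be derived from two snapshots of the JSON that Flume's HTTP monitoring returns, where each component appears under a key such as `CHANNEL.<name>` or `SINK.<name>` and counter values arrive as strings. The sketch below works on already-fetched payloads; the component names, sample values, and function names are made up for illustration.

```python
import json


def channels_over(metrics, threshold_percent):
    """Return the channels whose ChannelFillPercentage exceeds the threshold."""
    over = {}
    for name, values in metrics.items():
        if name.startswith("CHANNEL.") and "ChannelFillPercentage" in values:
            fill = float(values["ChannelFillPercentage"])
            if fill > threshold_percent:
                over[name] = fill
    return over


def drain_rate(previous, current, sink, seconds):
    """Events per second drained by a sink between two metric snapshots."""
    before = int(previous[sink]["EventDrainSuccessCount"])
    after = int(current[sink]["EventDrainSuccessCount"])
    return (after - before) / seconds


# Illustrative snapshots taken 60 seconds apart.
snapshot_1 = json.loads(
    '{"CHANNEL.critical": {"ChannelFillPercentage": "82.5"},'
    ' "CHANNEL.bulk": {"ChannelFillPercentage": "3.0"},'
    ' "SINK.backup": {"EventDrainSuccessCount": "1000"}}'
)
snapshot_2 = json.loads(
    '{"CHANNEL.critical": {"ChannelFillPercentage": "85.0"},'
    ' "CHANNEL.bulk": {"ChannelFillPercentage": "2.0"},'
    ' "SINK.backup": {"EventDrainSuccessCount": "4000"}}'
)

full_channels = channels_over(snapshot_1, threshold_percent=75.0)
backup_rate = drain_rate(snapshot_1, snapshot_2, "SINK.backup", seconds=60)
```

A nonzero drain rate on a backup sink presumably signals that traffic has failed over from the primary, which would explain why the slide singles out the rate of change rather than the raw count.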

Page 16:

Monitoring the Flume Flows

ChannelFillPercentage

Page 17:

Monitoring the Flume Flows

Rate of taking events off “Critical Logs” file channel

Page 18:

Monitoring the Flume Flows

Rate of Flume Events by data center East Coast, West Coast, Europe

Page 19:

Monitoring the Flume Flows

Monitoring by Groups

Page 20:

Benefits of migrating to Flume

• Business has insight into data in under 10 minutes

• Configuring expansion is trivial

• Failover enables automatic recovery from downtime

• Bifurcation

  • Enables scaled constant regression lane(s)

  • Subset of data to analytics development cluster

Page 21:

Benefits of migrating to Flume

5-minute aggregations delivered to the business within 10 minutes

Page 22:

Gotchas…

• Scaling for Compression

• Auto reloading of properties inconsistent

• “It is recommended (though not required) to use a separate disk for the File Channel checkpoint.”

  RAID-6 array, Force Write Back

• Bad configurations not easy to see, not always clear in log file.

• NetcatSource – Not too useful beyond trivial usage

Page 23:

Gotchas…

• POM file edits

• JUnits are not deterministic

• Hadoop jars added to classpath by startup script – IDE

• Avoiding cost of Avro schema evolution

Page 24:

What is next

• Upgrade to Flume 1.5

• Bifurcate to micro batch (Storm? Spark?)

• Disable sink switch

