Agenda
Meetup Intro
Tech Overview: Kafka and Binary Logs (binlogs)
Change Data Capture Overview
Demo: binlogs -> maxwell -> kafka -> HDFS/Spark/Zeppelin + Elastic
About Me
• Data Scientist who leans Computer Scientist
• Lead Data Scientist, Stackspace.io and b23.io
• PMC Member & Committer, Apache Metron (incubating)
• Contributed to Apache Spark, MLlib
• @_mbittmann_
Apache Kafka is publish-subscribe messaging rethought as a distributed commit log.
• Fast
• Scalable
• Durable
http://kafka.apache.org/documentation.html
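The "distributed commit log" idea can be sketched in a few lines. This is a toy, single-process illustration and not Kafka's implementation; the names `CommitLog` and `Consumer` are hypothetical. It shows the two properties that matter here: records are only ever appended, and each consumer keeps its own offset.

```python
class CommitLog:
    """Append-only log; each record gets the next integer offset."""

    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1  # offset of the record just written

    def read(self, offset, max_records=10):
        # Reading never removes anything, so any consumer can re-read.
        return self._records[offset:offset + max_records]


class Consumer:
    """Tracks its own read position, independent of other consumers."""

    def __init__(self, log):
        self._log = log
        self.offset = 0

    def poll(self):
        batch = self._log.read(self.offset)
        self.offset += len(batch)
        return batch


log = CommitLog()
for event in ("a", "b", "c"):
    log.append(event)

indexer = Consumer(log)
archiver = Consumer(log)

first = indexer.poll()    # everything written so far
log.append("d")
second = indexer.poll()   # only the record added since the last poll
replay = archiver.poll()  # a late consumer still starts at offset 0
```

Because reads do not consume records, adding a new downstream application is just adding another offset.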
Design Features
• Distributed => cluster-centric design offers strong durability and fault-tolerance guarantees
• Partitioned => messages spread over a cluster of machines for streams that might exceed capacity of a single machine
• Replicated => messages persisted on disk and replicated within the cluster to prevent data loss
http://kafka.apache.org/documentation.html
https://martin.kleppmann.com/2015/05/27/logs-for-data-infrastructure.html
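A rough sketch of how "partitioned" and "replicated" fit together. Both functions are hypothetical simplifications: `zlib.crc32` stands in for the client's real hash function, and the round-robin replica placement is an assumption for illustration, not Kafka's actual assignment logic.

```python
import zlib


def partition_for(key: bytes, num_partitions: int) -> int:
    # Same key always maps to the same partition, which preserves
    # per-key ordering. crc32 is a stand-in hash (assumption).
    return zlib.crc32(key) % num_partitions


def replicas_for(partition: int, brokers: list, replication_factor: int) -> list:
    # Place copies on consecutive brokers so that no broker holds
    # two copies of one partition (simplified placement, assumption).
    n = len(brokers)
    return [brokers[(partition + i) % n] for i in range(replication_factor)]


brokers = ["broker-0", "broker-1", "broker-2"]  # hypothetical cluster
p = partition_for(b"bintest.mytable.98", 6)
assignment = replicas_for(p, brokers, 2)
```

With a replication factor of 2, losing any single broker in this sketch still leaves one copy of every partition.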
The power of Kafka lies within what you build around it.
The binary log contains a record of all changes to the databases, both
data and structure.
https://mariadb.com/kb/en/mariadb/binary-log/
ROW-based binlog
{"database":"bintest","table":"mytable","type":"delete","ts":1459958130,"xid":14261,"commit":true,"data":{"some_blob":"AMgyGQr/","some_text":"text object","id":98,"some_bool":0,"uuid":"fcb3a514-fc0f-11e5-841c-60f81dc2691c","some_value":0,"ts":"2016-04-06"}}
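The event above is plain JSON, so a consumer needs nothing beyond the standard library to read it. A minimal sketch follows; treating the top-level `ts` as epoch seconds is an interpretation of the sample, not something the slide states.

```python
import json
from datetime import datetime, timezone

# The sample event from the slide, split across lines for readability.
raw = (
    '{"database":"bintest","table":"mytable","type":"delete",'
    '"ts":1459958130,"xid":14261,"commit":true,'
    '"data":{"some_blob":"AMgyGQr/","some_text":"text object","id":98,'
    '"some_bool":0,"uuid":"fcb3a514-fc0f-11e5-841c-60f81dc2691c",'
    '"some_value":0,"ts":"2016-04-06"}}'
)

event = json.loads(raw)

table = f'{event["database"]}.{event["table"]}'
# Assumption: the top-level "ts" is seconds since the Unix epoch.
when = datetime.fromtimestamp(event["ts"], tz=timezone.utc)
row = event["data"]  # column values of the affected row

summary = f'{event["type"]} on {table}, row id={row["id"]}'
print(summary, "at", when.isoformat())
```

Note that a delete event still carries the full row in `data`, so downstream systems know exactly which record went away.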
Implementations
• MySQL/MariaDB/Aurora/Percona: binlog
• Oracle: GoldenGate
• PostgreSQL: logical decoding
• MongoDB: oplog
• CouchDB: changes feed
A change in data means something happened
and when something happens many applications
might want to know about it.
https://martin.kleppmann.com/2015/05/27/logs-for-data-infrastructure.html
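One way to picture "many applications might want to know": every change event is handed to several independent subscribers. The sketch below is illustrative only. The events are invented samples in the same shape as the delete event shown earlier, `apply_change` is a hypothetical helper, and using `id` as the primary key is an assumption.

```python
from collections import Counter


def apply_change(replica: dict, event: dict) -> None:
    """Keep an in-memory copy of the table in step with the source."""
    # Assumption: "id" is the table's primary key.
    key = (event["database"], event["table"], event["data"]["id"])
    if event["type"] == "delete":
        replica.pop(key, None)
    elif event["type"] in ("insert", "update"):
        replica[key] = event["data"]


def make_event(kind: str, row: dict) -> dict:
    # Invented sample events, not captured from a real binlog.
    return {"database": "bintest", "table": "mytable", "type": kind, "data": row}


events = [
    make_event("insert", {"id": 98, "some_text": "text object"}),
    make_event("insert", {"id": 99, "some_text": "draft"}),
    make_event("update", {"id": 99, "some_text": "edited"}),
    make_event("delete", {"id": 98, "some_text": "text object"}),
]

replica = {}        # subscriber 1: a downstream copy of the table
counts = Counter()  # subscriber 2: simple change metrics

subscribers = [
    lambda e: apply_change(replica, e),
    lambda e: counts.update([e["type"]]),
]

for e in events:
    for notify in subscribers:
        notify(e)
```

Neither subscriber knows about the other, and the source database knows about neither; that decoupling is the point of capturing changes from the log.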