From Zero to Hero with Kafka Connect
@rmoff
a.k.a. A practical guide to becoming l33t with Kafka Connect
#KafkaMeetup
What is Kafka Connect?
Streaming Integration with Kafka Connect
[Diagram: sources (e.g. syslog) → Kafka Connect → Kafka Brokers]
Streaming Integration with Kafka Connect
[Diagram: Kafka Brokers → Kafka Connect → sinks (e.g. Amazon S3, Google BigQuery)]
Streaming Integration with Kafka Connect
[Diagram: sources (e.g. syslog) → Kafka Connect → Kafka Brokers → Kafka Connect → sinks (e.g. Amazon S3, Google BigQuery)]
Look Ma, No Code!

{
  "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
  "connection.url": "jdbc:mysql://asgard:3306/demo",
  "table.whitelist": "sales,orders,customers"
}

https://docs.confluent.io/current/connect/
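As a sketch of how a config like this gets deployed (the connector name and the wrapping payload shape match the REST call shown later in the deck; the name itself is an assumption), the JSON is wrapped in a name/config payload and POSTed to the Connect REST API on port 8083:

```json
{
  "name": "jdbc_source_mysql_demo",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "jdbc:mysql://asgard:3306/demo",
    "table.whitelist": "sales,orders,customers"
  }
}
```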
Streaming Pipelines
[Diagram: RDBMS → Kafka Connect → Kafka → Kafka Connect → Amazon S3 / HDFS]
Writing to data stores from Kafka
[Diagram: App → Kafka → Kafka Connect → Data Store]
Evolve processing from old systems to new
[Diagram: Existing App writes to RDBMS; Kafka Connect streams the RDBMS into Kafka for the New App, replacing the direct dependency on the old system]
Demo
http://rmoff.dev/kafka-connect-code
Configuring Kafka Connect
Inside the API - connectors, transforms, converters
Kafka Connect basics
[Diagram: Source → Kafka Connect → Kafka]
Connectors
[Diagram: Source → Kafka Connect (Connector) → Kafka]
Connectors
"config": {
  [...]
  "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
  "connection.url": "jdbc:postgresql://postgres:5432/",
  "topics": "asgard.demo.orders"
}
Connectors
[Diagram: Source → native data → Connector → Connect Record → Kafka]
Converters
[Diagram: Source → native data → Connector → Connect Record → Converter → bytes[] → Kafka]
Serialisation & Schemas
-> Confluent Schema Registry
Avro Protobuf JSON CSV
https://qconnewyork.com/system/files/presentation-slides/qcon_17_-_schemas_and_apis.pdf
The Confluent Schema Registry
[Diagram: Source → Kafka Connect → Avro message → Kafka Connect → Target; the Avro schema is stored in and retrieved from the Schema Registry]
Converters
key.converter=io.confluent.connect.avro.AvroConverter
key.converter.schema.registry.url=http://localhost:8081
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://localhost:8081
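A sketch of such a per-connector override, assuming a hypothetical sink that wants JSON instead of the worker's Avro default; the converter keys simply move into the connector's own config:

```json
"config": {
  "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
  "topics": "asgard.demo.orders",
  "value.converter": "org.apache.kafka.connect.json.JsonConverter",
  "value.converter.schemas.enable": "true"
}
```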
Set as a global default per-worker; can optionally be overridden per-connector.
What about internal converters?
value.converter=org.apache.kafka.connect.json.JsonConverter
internal.value.converter=org.apache.kafka.connect.json.JsonConverter
key.internal.value.converter=org.apache.kafka.connect.json.JsonConverter
value.internal.value.converter=org.apache.kafka.connect.json.JsonConverter
key.internal.value.converter.bork.bork.bork=org.apache.kafka.connect.json.JsonConverter
key.internal.value.please.just.work.converter=org.apache.kafka.connect.json.JsonConverter
(Just ignore them: the internal.* converter settings are deprecated as of Apache Kafka 2.0, and Connect uses JSON for its internal topics regardless.)
Single Message Transforms
[Diagram: Source → Connector → Transform(s) → Converter → Kafka]
Single Message Transforms

"config": {
  [...]
  "transforms": "addDateToTopic,labelFooBar",
  "transforms.addDateToTopic.type": "org.apache.kafka.connect.transforms.TimestampRouter",
  "transforms.addDateToTopic.topic.format": "${topic}-${timestamp}",
  "transforms.addDateToTopic.timestamp.format": "YYYYMM",
  "transforms.labelFooBar.type": "org.apache.kafka.connect.transforms.ReplaceField$Value",
  "transforms.labelFooBar.renames": "delivery_address:shipping_address"
}

"transforms" lists the transforms to apply; each one is then configured through its own "transforms.<name>.*" properties.
Extensible
[Diagram: Connector → Transform(s) → Converter]
Confluent Hub
hub.confluent.io
Deploying Kafka Connect
Connectors, Tasks, and Workers
Connectors and Tasks
[Diagram: JDBC Source connector spawning JDBC Task #1 and JDBC Task #2; S3 Sink connector spawning S3 Task #1]
Tasks and Workers
[Diagram: JDBC Task #1, JDBC Task #2, and S3 Task #1 all running in a single Worker]
Kafka Connect Standalone Worker
[Diagram: JDBC Task #1, JDBC Task #2, and S3 Task #1 in a single Worker, with offsets stored locally]
"Scaling" the Standalone Worker
[Diagram: tasks split across two separate standalone Workers, each with its own local offsets]
Fault-tolerant? Nope.
Kafka Connect Distributed Worker
[Diagram: a single Worker forms a Kafka Connect cluster running JDBC Task #1, JDBC Task #2, and S3 Task #1; offsets, config, and status are stored in Kafka topics]
Fault-tolerant? Yeah!
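The Offsets/Config/Status topics come from the worker's distributed-mode properties. A minimal sketch (the broker address, group id, and topic names here are assumptions, not from the slides):

```properties
bootstrap.servers=kafka:9092
# Workers sharing the same group.id form one Kafka Connect cluster
group.id=connect-cluster
# State lives in Kafka topics, which is what makes the cluster fault-tolerant
offset.storage.topic=connect-offsets
config.storage.topic=connect-configs
status.storage.topic=connect-status
key.converter=io.confluent.connect.avro.AvroConverter
value.converter=io.confluent.connect.avro.AvroConverter
```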
Scaling the Distributed Worker
[Diagram: a second Worker joins the same Kafka Connect cluster and the tasks are spread across both; offsets, config, and status are stored in Kafka topics]
Fault-tolerant? Yeah!
Distributed Worker - fault tolerance
[Diagram: one Worker in the cluster fails; JDBC Task #1 and S3 Task #1 are still running on the surviving Worker]
Distributed Worker - fault tolerance
[Diagram: JDBC Task #2 is restarted on the surviving Worker, recovering its state from the offsets, config, and status topics]
Multiple Distributed Clusters
[Diagram: Kafka Connect cluster #1 runs JDBC Task #1 and S3 Task #1; Kafka Connect cluster #2 runs JDBC Task #2; each cluster has its own offsets, config, and status topics]
Containers
Kafka Connect images on Docker Hub
confluentinc/cp-kafka-connect-base (the Connect runtime only)
confluentinc/cp-kafka-connect (bundles connectors such as kafka-connect-elasticsearch, kafka-connect-jdbc, kafka-connect-hdfs, […])
Adding connectors to a container
[Diagram: connector JARs from Confluent Hub are added to the confluentinc/cp-kafka-connect-base image]
At runtime
kafka-connect:
  image: confluentinc/cp-kafka-connect:5.2.1
  environment:
    CONNECT_PLUGIN_PATH: '/usr/share/java,/usr/share/confluent-hub-components'
  command:
    - bash
    - -c
    - |
      confluent-hub install --no-prompt neo4j/kafka-connect-neo4j:1.0.0
      /etc/confluent/docker/run
http://rmoff.dev/ksln19-connect-docker
Build a new image
FROM confluentinc/cp-kafka-connect:5.2.1
ENV CONNECT_PLUGIN_PATH="/usr/share/java,/usr/share/confluent-hub-components"
RUN confluent-hub install --no-prompt neo4j/kafka-connect-neo4j:1.0.0
Automating connector creation

# Download JDBC drivers
cd /usr/share/java/kafka-connect-jdbc/
curl https://cdn.mysql.com/Downloads/Connector-J/mysql-connector-java-8.0.13.tar.gz | tar xz

# Now launch Kafka Connect
/etc/confluent/docker/run &

# Wait for Kafka Connect listener
while [ $$(curl -s -o /dev/null -w %{http_code} http://$$CONNECT_REST_ADVERTISED_HOST_NAME:$…
  echo -e $$(date) " Kafka Connect listener HTTP state: " $$(curl -s -o /dev/null -w %{http_…
  sleep 5
done

# Create JDBC Source connector
curl -X POST http://localhost:8083/connectors -H "Content-Type: application/json" -d '{
  "name": "jdbc_source_mysql_00",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "jdbc:mysql://mysql:3306/demo",
    "connection.user": "connect_user",
    "connection.password": "asgard",
    "topic.prefix": "mysql-00-",
    "table.whitelist": "demo.customers"
  }
}'

# Don't let the container die
sleep infinity

http://rmoff.dev/ksln19-connect-docker
Troubleshooting Kafka Connect
$ curl -s "http://localhost:8083/connectors/source-debezium-orders/status" | \
    jq '.connector.state'
"RUNNING"

$ curl -s "http://localhost:8083/connectors/source-debezium-orders/status" | \
    jq '.tasks[0].state'
"FAILED"
http://go.rmoff.net/connector-status
(The connector can show RUNNING while one of its tasks has FAILED: check both.)
curl -s "http://localhost:8083/connectors/source-debezium-orders-00/status" | jq '.tasks[0].trace'
"org.apache.kafka.connect.errors.ConnectException\n\tat io.debezium.connector.mysql.AbstractReader.wrap(AbstractReader.java:230)\n\tat io.debezium.connector.mysql.AbstractReader.failed(AbstractReader.java:197)\n\tat io.debezium.connector.mysql.BinlogReader$ReaderThreadLifecycleListener.onCommunicationFailure(BinlogReader.java:1018)\n\tat com.github.shyiko.mysql.binlog.BinaryLogClient.listenForEventPackets(BinaryLogClient.java:950)\n\tat com.github.shyiko.mysql.binlog.BinaryLogClient.connect(BinaryLogClient.java:580)\n\tat com.github.shyiko.mysql.binlog.BinaryLogClient$7.run(BinaryLogClient.java:825)\n\tat java.lang.Thread.run(Thread.java:748)\nCaused by: java.io.EOFException\n\tat com.github.shyiko.mysql.binlog.io.ByteArrayInputStream.read(ByteArrayInputStream.java:190)\n\tat com.github.shyiko.mysql.binlog.io.ByteArrayInputStream.readInteger(ByteArrayInputStream.java:46)\n\tat com.github.shyiko.mysql.binlog.event.deserialization.EventHeaderV4Deserializer.deserialize(EventHeaderV4Deserializer.java:35)\n\tat com.github.shyiko.mysql.binlog.event.deserialization.EventHeaderV4Deserializer.deserialize(EventHeaderV4Deserializer.java:27)\n\tat com.github.shyiko.mysql.binlog.event.deserialization.EventDeserializer.nextEvent(EventDeserializer.java:212)\n\tat io.debezium.connector.mysql.BinlogReader$1.nextEvent(BinlogReader.java:224)\n\tat com.github.shyiko.mysql.binlog.BinaryLogClient.listenForEventPackets(BinaryLogClient.java:922)\n\t!!... 3 more\n"
The log is the source of truth
$ confluent log connect
$ docker-compose logs kafka-connect
$ cat /var/log/kafka/connect.log
Kafka Connect reports the symptom, not the cause:
"Task is being killed and will not recover until manually restarted"
Error Handling and Dead Letter Queues
org.apache.kafka.common.errors.SerializationException: Unknown magic byte!
Mismatched converters
"value.converter": "AvroConverter"
Messages are not Avro
org.apache.kafka.common.errors.SerializationException: Unknown magic byte!
ⓘ Use the correct Converter for the source data
Mixed serialisation methods
"value.converter": "AvroConverter"
Some messages are not Avro
org.apache.kafka.common.errors.SerializationException: Unknown magic byte!
ⓘ Use error handling to deal with bad messages
Error Handling and DLQ
Handled: Convert (read/write from Kafka, [de]-serialisation), Transform
Not handled: Start (connections to a data store), Poll/Put (read/write from/to the data store*)
* can be retried by Connect
https://cnfl.io/connect-dlq
Fail Fast
[Diagram: Kafka Connect stops the task as soon as a bad message is hit; nothing more is written to the sink]
https://cnfl.io/connect-dlq
YOLO ¯\_(ツ)_/¯
[Diagram: bad messages are silently dropped; all other source topic messages are written to the sink]
errors.tolerance=all
https://cnfl.io/connect-dlq
Dead Letter Queue
[Diagram: bad messages are routed to a dead letter queue topic; all other source topic messages are written to the sink]
errors.tolerance=all
errors.deadletterqueue.topic.name=my_dlq
https://cnfl.io/connect-dlq
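A fuller sketch of the DLQ settings for a sink connector. The two extra keys are documented Connect settings; the replication factor of 1 is an assumption for a single-broker dev cluster:

```properties
errors.tolerance=all
errors.deadletterqueue.topic.name=my_dlq
# Assumes a single-broker dev cluster; the DLQ topic is created with this replication factor
errors.deadletterqueue.topic.replication.factor=1
# Record the failure reason (exception, stage, original topic/partition/offset) in message headers
errors.deadletterqueue.context.headers.enable=true
```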
Re-processing the Dead Letter Queue
[Diagram: a Kafka Connect Avro sink consumes the source topic; a second Kafka Connect JSON sink re-processes the dead letter queue]
https://cnfl.io/connect-dlq
Metrics and Monitoring
REST API
http://go.rmoff.net/connector-status
JMX
CONFLUENT COMMUNITY DISCOUNT CODE: KS19Meetup.
25% OFF* (*Standard Priced Conference pass)