Big Data Ingestion with Kafka
Chinmay [email protected]
Agenda
● Data Ingestion
● Use case: Kafka => HDFS
● Brief about Kafka
● Steps for development
● Let’s code!!!
Data Ingestion
● Reading data in
● Storing in accessible location
● Marks the beginning of the data pipeline (the write path)
● From here, data is processed further (the read path)
Use case: Examples
● Log aggregation
  ○ Collect logs from various sources
  ○ Stream them as a single topic
  ○ Put all the logs in a centralized place, i.e. HDFS
● Real-time sensor data processing
  ○ Read sensor data from various sources
  ○ Process the stream
  ○ Dump results to HDFS
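The log aggregation flow above can be sketched in plain Java. This is only an illustration of the idea, not Kafka or HDFS code: the class name `LogAggregationSketch`, the tab-separated line format, and the use of a local file as a stand-in for HDFS are all assumptions made for this example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of log aggregation; no Kafka or HDFS involved. */
public class LogAggregationSketch {

    /**
     * Merges the log lines of every source into one stream (the "single topic"),
     * prefixing each line with the name of the source it came from.
     */
    public static List<String> toSingleTopic(Map<String, List<String>> logsBySource) {
        List<String> topic = new ArrayList<>();
        for (Map.Entry<String, List<String>> source : logsBySource.entrySet()) {
            for (String line : source.getValue()) {
                topic.add(source.getKey() + "\t" + line);
            }
        }
        return topic;
    }

    /**
     * Writes the merged stream to one central file.
     * Assumption: a local file stands in for the HDFS destination.
     */
    public static Path store(List<String> topic, Path target) throws IOException {
        return Files.write(target, topic, StandardCharsets.UTF_8);
    }
}
```

In a real deployment the merge step would be a Kafka topic fed by many producers, and the store step would be a pipeline writing to HDFS; the sketch only shows the shape of the data flow.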
Brief about Kafka
● Distributed Messaging System
● Fast Reads and Writes
● Can handle a large number of clients
● Scalable, fault-tolerant, partitionable
● Persistent messages
Steps for developing an application
1. Create a Maven project using the Apex Maven archetype
2. Add required Maven dependencies
3. Add operators to DAG
4. Add stream(s) to DAG
5. Set properties in properties.xml
6. Compile and run
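Steps 3 and 4 can be pictured with a toy, in-memory model. The sketch below is not the Apex API: `MiniDag`, its `addOperator`, `addStream`, and `run` methods, and the operator names `kafkaInput` and `hdfsOutput` are hypothetical stand-ins that only mirror the idea of registering named operators and then wiring them together with streams. It assumes a linear chain of operators with no cycles.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Toy in-memory stand-in for a streaming DAG; NOT the Apex API. */
public class MiniDag {
    // Operators keyed by name; each one is modelled as a per-tuple function.
    private final Map<String, Function<String, String>> operators = new LinkedHashMap<>();
    // Streams: upstream operator name -> downstream operator name.
    private final Map<String, String> streams = new LinkedHashMap<>();

    /** Step 3: register a named operator. */
    public MiniDag addOperator(String name, Function<String, String> logic) {
        operators.put(name, logic);
        return this;
    }

    /** Step 4: connect the output of one operator to the input of another. */
    public MiniDag addStream(String from, String to) {
        if (!operators.containsKey(from) || !operators.containsKey(to)) {
            throw new IllegalArgumentException("Unknown operator in stream " + from + " -> " + to);
        }
        streams.put(from, to);
        return this;
    }

    /** Pushes each tuple through the chain that begins at the given operator. */
    public List<String> run(String start, List<String> tuples) {
        if (!operators.containsKey(start)) {
            throw new IllegalArgumentException("Unknown start operator " + start);
        }
        List<String> results = new ArrayList<>();
        for (String tuple : tuples) {
            String current = start;
            String value = tuple;
            // Follow the streams until an operator has no downstream (assumes no cycles).
            while (current != null) {
                value = operators.get(current).apply(value);
                current = streams.get(current);
            }
            results.add(value);
        }
        return results;
    }

    public static void main(String[] args) {
        MiniDag dag = new MiniDag()
            .addOperator("kafkaInput", String::trim)        // stand-in for reading a message
            .addOperator("hdfsOutput", line -> line + "\n") // stand-in for formatting a file line
            .addStream("kafkaInput", "hdfsOutput");
        System.out.print(String.join("", dag.run("kafkaInput", Arrays.asList("  a ", "b"))));
    }
}
```

In an actual Apex application the operators would come from the Malhar library and their settings would be supplied through properties.xml (step 5) rather than hard-coded as lambdas.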
Summary
● Ease of development using Apex
● Reusable Malhar components
● Fault-tolerant, Scalable
● Reduced Time to Production
Resources
Apache Apex Meetup
• Apache Apex website - http://apex.incubator.apache.org/
• Subscribe - http://apex.incubator.apache.org/community.html
• Download - http://apex.incubator.apache.org/downloads.html
• Twitter - @ApacheApex; Follow - https://twitter.com/apacheapex
• Facebook - https://www.facebook.com/ApacheApex/
• Meetup - http://www.meetup.com/topics/apache-apex
• Startup Program – Free Enterprise License for startups, Universities, Non-Profits