Building a Real-Time Data Pipeline with Spark, Kafka, and Python


Douglas Butler, Product Manager

massively parallel, lock-free, FAST distributed SQL database

in-memory and on-disk, ACID

JSON and geospatial, transactions and analytics

2 Minute Install

A Simple Pipeline

from pystreamliner.api import Extractor

class CustomExtractor(Extractor):
    def initialize(self, streaming_context, sql_context, config, interval, logger):
        logger.info("Initialized Extractor")

    def next(self, streaming_context, time, sql_context, config, interval, logger):
        # Emit ten single-column rows each batch interval
        rdd = streaming_context._sc.parallelize([[x] for x in range(10)])
        return sql_context.createDataFrame(rdd, ["number"])
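As a rough illustration of what the extractor above produces, here is a pure-Python sketch (no Spark or Streamliner cluster required) of the rows emitted per batch interval. The helper name `extract_batch` is hypothetical; only the row structure mirrors the `[[x] for x in range(10)]` and the `["number"]` column label in the real code.

    # Illustration only: mimics the per-batch output of CustomExtractor
    # without a running Spark cluster. extract_batch is a made-up helper.
    def extract_batch():
        # Same row shape the extractor parallelizes: one value per row.
        rows = [[x] for x in range(10)]
        # createDataFrame(rdd, ["number"]) would label the column "number".
        return [dict(zip(["number"], row)) for row in rows]

    batch = extract_batch()
    print(batch[0])    # {'number': 0}
    print(len(batch))  # 10

Each call stands in for one invocation of `next()`: Streamliner calls it once per batch interval and loads the returned DataFrame downstream.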

> memsql-ops pip install [package]

distributed cluster-wide

any Python package

bring your own

Real-time pipeline

Q & A time
