React Fast by Processing Streaming Data
Kobi Biton, Solutions Architect, AWS
Ran Tessler, Mgr. Solutions Architecture, AWS
SpoTaxi
Mobile Apps Web Clickstream Application Logs
Metering Records IoT Sensors Smart Buildings
[Wed Oct 11 14:32:52 2000] [error] [client 127.0.0.1] client denied by server configuration: /export/home/live/ap/htdocs/test
Most data is produced continuously
Recent data is highly valuable
• If you act on it in time
• Perishable Insights (M. Gualtieri, Forrester)
Old + Recent data is more valuable
• If you have the means to combine them
The diminishing value of data
What are the key requirements?
• Durable
• Continuous
• Fast
• Correct
• Reactive
• Reliable
Ingest Transform Analyze React Persist
Processing real-time, streaming data
Amazon Kinesis Streams
Easy administration: Create a stream, set capacity level with shards. Scale to match your data throughput rate & volume.
Build real-time applications: Process streaming data w/ Kinesis Client Library (KCL), Apache Spark/Storm, AWS Lambda,...
Low cost: Cost-efficient for workloads of any scale.
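Setting the capacity level with shards comes down to simple arithmetic. The sketch below is a rough sizing helper, not an official calculator: it uses the published per-shard limits (1 MB/s or 1,000 records/s for writes, 2 MB/s for reads) and treats 1 MB as 10^6 bytes, which is an assumption. The function name and the `consumers` parameter are hypothetical.

```python
import math

# Per-shard limits for Kinesis Streams (1 MB taken as 10**6 bytes: an assumption).
SHARD_WRITE_BYTES_PER_SEC = 1_000_000   # 1 MB/s ingest per shard
SHARD_WRITE_RECORDS_PER_SEC = 1_000     # 1,000 records/s ingest per shard
SHARD_READ_BYTES_PER_SEC = 2_000_000    # 2 MB/s egress per shard


def estimate_shards(records_per_sec, avg_record_bytes, consumers=1):
    """Return the smallest shard count that covers write and read throughput."""
    write_bytes = records_per_sec * avg_record_bytes
    by_write_bytes = math.ceil(write_bytes / SHARD_WRITE_BYTES_PER_SEC)
    by_write_records = math.ceil(records_per_sec / SHARD_WRITE_RECORDS_PER_SEC)
    # Each consumer application reads the whole stream and shares the read limit.
    by_read_bytes = math.ceil(write_bytes * consumers / SHARD_READ_BYTES_PER_SEC)
    # A stream always has at least one shard.
    return max(1, by_write_bytes, by_write_records, by_read_bytes)
```

For example, 5,000 records/s of 500 bytes each with two consumers is bound by the record rate rather than by bytes, so the helper returns 5 shards.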
Amazon Kinesis Firehose
Zero administration: Capture and deliver streaming data to Amazon S3, Redshift, Elasticsearch w/o writing an app or managing infrastructure.
Direct-to-data store integration: Batch, compress, and encrypt streaming data for delivery in as little as 60 seconds
Seamless elasticity: Seamlessly scales to match data throughput w/o intervention
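To illustrate what batching before delivery means, here is a minimal sketch of a buffer that flushes when either a size threshold or an age threshold is reached, whichever comes first. It is an illustration of the idea only, not Firehose's implementation: the `BatchBuffer` class is hypothetical, and the age check runs only when a record arrives (a real service would also flush on a timer).

```python
class BatchBuffer:
    """Collects records and flushes when a size or age threshold is reached."""

    def __init__(self, max_bytes, max_age_seconds):
        self.max_bytes = max_bytes
        self.max_age_seconds = max_age_seconds
        self._records = []
        self._size = 0
        self._opened_at = None  # arrival time of the first record in the batch

    def add(self, record, now):
        """Buffer one record (bytes); return the flushed batch or None."""
        if not self._records:
            self._opened_at = now
        self._records.append(record)
        self._size += len(record)
        too_big = self._size >= self.max_bytes
        too_old = now - self._opened_at >= self.max_age_seconds
        if too_big or too_old:
            return self.flush()
        return None

    def flush(self):
        """Return the current batch and start a new, empty one."""
        batch, self._records, self._size = self._records, [], 0
        return batch
```

With `max_age_seconds=60`, a slow trickle of records is still delivered about once a minute, which matches the "as little as 60 seconds" figure on the slide.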
Capture and submit streaming data to Firehose
Analyze streaming data using your favorite BI tools
Firehose loads streaming data continuously into S3, Amazon Redshift and Amazon Elasticsearch
Amazon Kinesis Analytics
Apply SQL on streams: Easily connect to a Kinesis Stream or Firehose Delivery Stream and apply SQL skills.
Build real-time applications: Perform continual processing on streaming big data with sub-second processing latencies.
Easy scalability: Elastically scales to match data throughput.
Connect to Kinesis streams, Firehose delivery streams
Run standard SQL queries against data streams
Kinesis Analytics can send processed data to analytics tools so you can create alerts and respond in real-time
Use SQL to build real-time applications
Easily write SQL code to process streaming data
Connect to streaming source
Continuously deliver SQL results
A streaming table is a STREAM
• In relational databases, you work with SQL tables
• With Kinesis Analytics, you work with STREAMs
• SELECT, INSERT, and CREATE can be used with STREAMs
CREATE STREAM Tweets(author VARCHAR(20), text VARCHAR(140));
INSERT INTO Tweets SELECT …
A simple streaming query
• Tweets about the DLD Festival Summit
• Selecting from a STREAM of tweets, an in-application stream
• Each row has a corresponding ROWTIME
SELECT STREAM ROWTIME, author, text
FROM Tweets
WHERE text LIKE '%#DLDTelAviv%';
Writing queries on unbounded datasets
• Streams are unbounded data sets
• Need continuous queries, row by row or across rows
• WINDOWS define a start and end to the query
SELECT STREAM author, count(author) OVER ONE_MINUTE
FROM Tweets WINDOW ONE_MINUTE AS (PARTITION BY author RANGE INTERVAL '1' MINUTE PRECEDING);
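The windowed query above can be emulated in plain Python to see what it emits. This is a sketch of the semantics only: it assumes rows arrive ordered by rowtime, represents each row as a `(rowtime_seconds, author)` tuple, and treats the one-minute boundary as inclusive. The function name is hypothetical.

```python
from collections import defaultdict, deque


def sliding_counts(rows, window_seconds=60):
    """Yield (rowtime, author, count) for every input row.

    count is the number of rows by the same author within the preceding
    window, current row included (PARTITION BY author, RANGE ... PRECEDING).
    """
    recent = defaultdict(deque)  # author -> rowtimes still inside the window
    for rowtime, author in rows:
        times = recent[author]
        times.append(rowtime)
        # Evict rows that fell out of the window (boundary kept inclusive).
        while times[0] < rowtime - window_seconds:
            times.popleft()
        yield rowtime, author, len(times)
```

Note that one output row is produced per input row, which is what distinguishes this sliding form from an aggregate over fixed intervals.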
Different types of Windows
Tumbling
Sliding
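A tumbling window, in contrast to a sliding one, cuts the stream into fixed, non-overlapping intervals and emits one result per interval. The sketch below is a plain-Python illustration under assumptions: rows are `(rowtime_seconds, author)` tuples, windows are aligned to multiples of the window length, and the function name is hypothetical.

```python
from collections import Counter


def tumbling_counts(rows, window_seconds=60):
    """Return {(window_start, author): count} over fixed, non-overlapping windows."""
    counts = Counter()
    for rowtime, author in rows:
        # Each row belongs to exactly one window, identified by its start time.
        window_start = (rowtime // window_seconds) * window_seconds
        counts[(window_start, author)] += 1
    return dict(counts)
```

Because every row falls into exactly one window, the counts across windows always sum to the number of input rows, which is not true of a sliding window.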
Amazon Kinesis: Streaming Data Made Easy
Services make it easy to capture, deliver and process streams on AWS
Amazon Kinesis Firehose
For all developers, data scientists
Easily load massive volumes of streaming data into Amazon S3, Redshift, and Elasticsearch
Amazon Kinesis Streams
For Technical Developers
Collect and stream data for ordered, replayable, real-time processing
Amazon Kinesis Analytics
For all developers, data scientists
Easily analyze data streams using standard SQL queries
Demo – detailed architecture
taxi-telemetry (Kinesis Stream)
taxi-stats (Kinesis Stream)
CalculateStats (Kinesis Analytics)
PipeStatsToDDB (AWS Lambda)
stats (Amazon DynamoDB)
taxi-telemetry-to-s3 (Kinesis Firehose)
spottaxi-data (Amazon S3)
PipeTelemetryToFirehose (AWS Lambda)
spottaxi (Amazon Elasticsearch Service)
PipeTelemetryToES (AWS Lambda)
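The Lambda functions in this architecture all start the same way: they receive a batch of Kinesis records whose data is base64-encoded and must be decoded before being forwarded. The sketch below shows that first step for a function like PipeStatsToDDB. It is not the demo's code: it assumes the payloads are JSON, and the helper name is hypothetical. The write to the stats table is left out.

```python
import base64
import json


def decode_kinesis_records(event):
    """Return the JSON payloads carried by a Lambda Kinesis event."""
    payloads = []
    for record in event.get("Records", []):
        # Lambda delivers each Kinesis record's data as a base64 string.
        raw = base64.b64decode(record["kinesis"]["data"])
        payloads.append(json.loads(raw))  # assumes JSON payloads
    return payloads


def handler(event, context):
    """Lambda entry point: decode the batch and report how many items it held."""
    stats = decode_kinesis_records(event)
    # A real function would write each item to the DynamoDB table here.
    return {"processed": len(stats)}
```

Keeping the decoding in its own function makes it possible to exercise the logic locally with a hand-built event, without any AWS resources.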