16
AWS Cloud Kata for Start-Ups and Developers Hong Kong Real-time Data Processing Using AWS Lambda KJ Wu Solutions Architect

Real-time Data Processing Using AWS Lambda

Embed Size (px)

Citation preview

Page 1: Real-time Data Processing Using AWS Lambda

AWS Cloud Kata for Start-Ups and Developers

Hong

Kong

Real-time Data Processing

Using AWS Lambda

KJ Wu

Solutions Architect

Page 2: Real-time Data Processing Using AWS Lambda

AWS Cloud Kata for Start-Ups and Developers

AWS Services for Data ProcessingAWS Lambda

Amazon Kinesis

Architecture & Workflow for Streaming Data Processing

Demo

Best Practices in Building Data Processing Solutions

Agenda

Page 3: Real-time Data Processing Using AWS Lambda

AWS Cloud Kata for Start-Ups and Developers

AWS Services for Data

Processing

Amazon

Kinesis

AWS

Lambda

Page 4: Real-time Data Processing Using AWS Lambda

AWS Cloud Kata for Start-Ups and Developers

Amazon Lambda: OverviewServerless compute service that runs code in response to events without need to manage servers

No Servers to Manage

• Automatically runs code without Provisioning or Managing servers.

• Just write the code and upload it to Lambda.

Continuous Scaling • Auto scales Application

precisely with the size of the workload.

• Code runs in Parallel & Processes each trigger individually

Subsecond Metering

• Starts Code within

milliseconds of an Event

Executes only when event

is triggered.

Page 5: Real-time Data Processing Using AWS Lambda

AWS Cloud Kata for Start-Ups and Developers

AWS Lambda: Serverless Compute in the Cloud

Easy to author, deploy, maintain,

secure and manage

Allows for focus on business logic,

not infrastructure

Stateless, event-driven code with native support for

Node.js, Java, and Python languages

Compute & Code without managing infrastructure like

EC2 instances and auto scaling groups

Makes it easy to Build back-end

services that perform at scale

Page 6: Real-time Data Processing Using AWS Lambda

AWS Cloud Kata for Start-Ups and Developers

Amazon Kinesis Streams

• Build your own custom

applications that process or

analyze streaming data

Amazon Kinesis Firehose

• Easily load massive volumes of

streaming data into Amazon S3,

ElasticSearch and Redshift

Amazon Kinesis: OverviewA managed service for streaming data ingestion and processing

Buffer size/interval

Data compression

Data Encryption with

KMS

Page 7: Real-time Data Processing Using AWS Lambda

AWS Cloud Kata for Start-Ups and Developers

Data Processing/Streaming

Architecture & Workflow

Smart

Devices

Click

Stream

Log

Data

Page 8: Real-time Data Processing Using AWS Lambda

AWS Cloud Kata for Start-Ups and Developers

AWS Lambda and Amazon Kinesis integration

Stream-based model: ▪ Lambda polls the stream; When new records detected Lambda

function invoked.

▪ New records are passed by Kinesis as parameter.

▪ Kinesis mapped as Event source in Lambda

Synchronous invocation: ▪ Lambda invoked by RequestResponse invocation

▪ Lambda function is executed once.

Event structure:▪ Event received by Lambda function is a collection of records from

Kinesis stream.

▪ Lambda Kinesis Event source: Batch size/Max records configured to be received per invocation.

Page 9: Real-time Data Processing Using AWS Lambda

AWS Cloud Kata for Start-Ups and Developers

Streaming Architecture Workflow: Lambda+Kinesis

Data Input Kinesis Action Lambda Data Output

IT application activity

Capture the

stream

Audit

Process the

stream

SNS

Metering records Condense Redshift

Change logs Backup S3

Financial data** Store RDS

Transaction orders** Process SQS

Server health metrics Monitor EC2

User clickstream Analyze EMR

IoT device data Respond Backend endpoint

Custom data Custom action Custom application

Page 10: Real-time Data Processing Using AWS Lambda

AWS Cloud Kata for Start-Ups and Developers

Common Architecture: Lambda + KinesisReal Time Data Processing

Amazon

Kinesis

AWS

Lambda 1

Amazon

CloudWatch

Amazon

DynamoDB

AWS

Lambda 2 Amazon

S3

1. Real-time event data sent to Amazon

Kinesis, allows multiple AWS Lambda

functions to process the same events.

2. In AWS Lambda, Function 1 processes the

incoming events and stores event data in

Amazon DynamoDB

3. Lambda Function 1 also sends values to

Amazon CloudWatch for simple monitoring

of metrics.

4. In AWS Lambda function, Function 2 stores

incoming data events in Amazon S3

Page 11: Real-time Data Processing Using AWS Lambda

AWS Cloud Kata for Start-Ups and Developers

Demo: Real time processing of

Amazon Kinesis data streams with

AWS Lambda

Page 12: Real-time Data Processing Using AWS Lambda

AWS Cloud Kata for Start-Ups and Developers

Data Processing:

Best Practices & Tips

Page 13: Real-time Data Processing Using AWS Lambda

AWS Cloud Kata for Start-Ups and Developers

Best Practices: Kinesis

Batch size:

▪ Number of records that AWS Lambda will retrieve from Kinesis at the time of invoking your function

▪ Increasing batch size will cause fewer Lambda function invocations with more data processed per function

Starting Position:

▪ The position in the stream where Lambda

starts reading

▪ Set to “Trim Horizon” for reading from

start of stream (all data)

▪ Set to “Latest” for reading most recent

data (LIFO) (latest data)

Performance tuning Kinesis as an

event source

Page 14: Real-time Data Processing Using AWS Lambda

AWS Cloud Kata for Start-Ups and Developers

Best Practices: Lambda

• Write your Lambda function code in a stateless style

• Be aware of Lambda retries on different error and retry scenario

• Synchronous invocation – The invoking application receives

a 429 error, and is responsible for retries.

• Stream-based event sources - Lambda attempts to process

the erring batch of records until the time the data expires

• Minimizing the use of startup code not directly related to

processing the current event, ex. the third party lib Cost &

Performance

Page 15: Real-time Data Processing Using AWS Lambda

AWS Cloud Kata for Start-Ups and Developers

Get Started: Data Processing with AWS

1. Create your first Kinesis stream. Configure hundreds of thousands of data

producers to put data into an Amazon Kinesis stream. Ex. data from Social

media feeds.

2. Create and test your first Lambda function. Use any third party library, even

native ones. First 1M requests each month are on us! (free-tier)

3. Read the Developer Guide, AWS Lambda and Kinesis Tutorial, and resources

on GitHub at AWS Labs

• http://docs.aws.amazon.com/lambda/latest/dg/with-kinesis.html

• https://github.com/awslabs/lambda-streams-to-firehose lambda-streams-to-firehose

Next Steps

Page 16: Real-time Data Processing Using AWS Lambda

AWS Cloud Kata for Start-Ups and Developers

THANK YOU