Machine Learning & Data Lake for IoT scenarios on AWS

Preview:

Citation preview

Machine Learning & Data Lake for IoT scenarios on AWS

JohnChang

TechnologyEvangelistOctober2016

Three types of data-driven development

Retrospec)veanalysisandrepor<ng

AmazonRedshiA,AmazonRDSAmazonS3AmazonEMR

Three types of data-driven development

Retrospec)veanalysisandrepor<ng

Here-and-nowreal-<meprocessinganddashboards

AmazonKinesisAmazonEC2AWSLambda

AmazonRedshiA,AmazonRDSAmazonS3AmazonEMR

Three types of data-driven development

Retrospec)veanalysisandrepor<ng

Here-and-nowreal-<meprocessinganddashboards

Predic)onstoenablesmartapplica<ons

AmazonKinesisAmazonEC2AWSLambda

AmazonRedshiA,AmazonRDSAmazonS3AmazonEMR

Machine learning and smart applica@ons Machinelearningisthetechnologythatautoma<callyfindspaMernsinyourdataandusesthemtomakepredic<onsfornewdatapointsastheybecomeavailable.

Machine learning and smart applica@ons Machinelearningisthetechnologythatautoma<callyfindspaMernsinyourdataandusesthemtomakepredic<onsfornewdatapointsastheybecomeavailable.

Yourdata+machinelearning=smartapplica<ons

Smart applica@ons by example

Basedonwhatyouknowabouttheuser:Willtheyuseyourproduct?

Smart applica@ons by example

Basedonwhatyouknowabouttheuser:Willtheyuseyourproduct?

Basedonwhatyouknowaboutanorder:Isthisorderfraudulent?

Smart applica@ons by example

Basedonwhatyouknowabouttheuser:Willtheyuseyourproduct?

Basedonwhatyouknowaboutanorder:Isthisorderfraudulent?

Basedonwhatyouknowaboutanewsar)cle:Whatotherar)clesareinteres)ng?

And a few more examples… Frauddetec)on Detec<ngfraudulenttransac<ons,filteringspamemails,

flaggingsuspiciousreviews,…

Personaliza)on Recommendingcontent,predic<vecontentloading,improvinguserexperience,…

Targetedmarke)ng Matchingcustomersandoffers,choosingmarke<ngcampaigns,cross-sellingandup-selling,…

Contentclassifica)on Categorizingdocuments,matchinghiringmanagersandresumes,…

Churnpredic)on Findingcustomerswhoarelikelytostopusingtheservice,upgradetarge<ng,…

Customersupport Predic<verou<ngofcustomeremails,socialmedialistening,…

Smart applica@ons by counterexample

Dear Alex,

This awesome quadcopter is on sale for just $49.99!

Smart applica@ons by counterexample SELECT c.ID

FROM customers c

LEFT JOIN orders o

ON c.ID = o.customer

GROUP BY c.ID

HAVING o.date > GETDATE() – 30

Wecanstartbysendingtheoffertoallcustomerswhoplacedanorderinthelast30days

Smart applica@ons by counterexample SELECT c.ID

FROM customers c

LEFT JOIN orders o

ON c.ID = o.customer

GROUP BY c.ID

HAVING O.CATEGORY = ‘TOYS’

AND o.date > GETDATE() – 30

…let’snarrowitdowntojustcustomerswhoboughttoys

Smart applica@ons by counterexample SELECT c.IDFROM customers c LEFT JOIN orders o ON c.ID = o.customer

LEFT JOIN PRODUCTS P ON P.ID = O.PRODUCTGROUP BY c.IDHAVING o.category = ‘toys’ AND ((P.DESCRIPTION LIKE ‘%HELICOPTER%’ AND O.DATE > GETDATE() - 60) OR (COUNT(*) > 2 AND SUM(o.price) > 200 AND o.date > GETDATE() – 30) )

…andexpandthequerytocustomerswhopurchasedothertoyhelicoptersrecently,ormadeseveralexpensivetoypurchases

Smart applica@ons by counterexample SELECT c.ID

FROM customers c

LEFT JOIN orders o

ON c.ID = o.customer

LEFT JOIN products p

ON p.ID = o.product

GROUP BY c.ID

HAVING o.category = ‘toys’

AND ((p.description LIKE ‘%COPTER%’

AND o.date > GETDATE() - 60)

OR (COUNT(*) > 2

AND SUM(o.price) > 200

AND o.date > GETDATE() – 30)

)

…butwhataboutquadcopters?

Smart applica@ons by counterexample SELECT c.IDFROM customers c LEFT JOIN orders o ON c.ID = o.customer LEFT JOIN products p ON p.ID = o.productGROUP BY c.IDHAVING o.category = ‘toys’ AND ((p.description LIKE ‘%copter%’ AND o.date > GETDATE() - 120) OR (COUNT(*) > 2 AND SUM(o.price) > 200 AND o.date > GETDATE() – 30) )

…maybeweshouldgobackfurtherin<me

Smart applica@ons by counterexample SELECT c.ID

FROM customers c

LEFT JOIN orders o

ON c.ID = o.customer

LEFT JOIN products p

ON p.ID = o.product

GROUP BY c.ID

HAVING o.category = ‘toys’

AND ((p.description LIKE ‘%copter%’

AND o.date > GETDATE() - 120)

OR (COUNT(*) > 2

AND SUM(o.price) > 200

AND o.date > GETDATE() – 40)

)

…tweakthequerymore

Smart applica@ons by counterexample SELECT c.IDFROM customers c LEFT JOIN orders o ON c.ID = o.customer LEFT JOIN products p ON p.ID = o.productGROUP BY c.IDHAVING o.category = ‘toys’ AND ((p.description LIKE ‘%copter%’ AND o.date > GETDATE() - 120) OR (COUNT(*) > 2 AND SUM(o.price) > 150 AND o.date > GETDATE() – 40) )

…again

Smart applica@ons by counterexample SELECT c.ID

FROM customers c

LEFT JOIN orders o

ON c.ID = o.customer

LEFT JOIN products p

ON p.ID = o.product

GROUP BY c.ID

HAVING o.category = ‘toys’

AND ((p.description LIKE ‘%copter%’

AND o.date > GETDATE() - 90)

OR (COUNT(*) > 2

AND SUM(o.price) > 150

AND o.date > GETDATE() – 40)

)

…andagain

Smart applica@ons by counterexample SELECT c.ID

FROM customers c

LEFT JOIN orders o

ON c.ID = o.customer

LEFT JOIN products p

ON p.ID = o.product

GROUP BY c.ID

HAVING o.category = ‘toys’

AND ((p.description LIKE ‘%copter%’

AND o.date > GETDATE() - 90)

OR (COUNT(*) > 2

AND SUM(o.price) > 150

AND o.date > GETDATE() – 40)

)

Usemachinelearningtechnologytolearnyourbusinessrulesfromdata!

Why aren’t there more smart applica@ons? 1.  Machinelearningexper<seisrare.

2.  Buildingandscalingmachinelearningtechnologyishard.

3.  Closingthegapbetweenmodelsandapplica<onsis)me-consumingandexpensive.

Building smart applica@ons today Exper)se Technology Opera)onaliza)on

Limitedsupplyofdatascien<sts

Manychoices,fewmainstays Complexanderror-pronedataworkflows

Expensivetohireoroutsource

Difficulttouseandscale

CustompladormsandAPIs

Manymovingpiecesleadtocustomsolu<onsevery&me

Reinven<ngthemodellifecyclemanagementwheel

What if there were a beIer way?

Introducing Amazon Machine Learning Easy-to-use,managedmachinelearningservicebuiltfordevelopers

Robust,powerfulmachinelearningtechnologybasedonAmazon’sinternalsystems

CreatemodelsusingyourdataalreadystoredintheAWScloud

Deploymodelstoproduc<oninseconds

Easy-to-use and developer-friendly Usetheintui<ve,powerfulserviceconsoletobuildandexploreyourini<almodels

•  Dataretrieval•  Modeltraining,qualityevalua<on,fine-tuning•  Deploymentandmanagement

AutomatemodellifecyclewithfullyfeaturedAPIsandSDKs

•  Java,Python,.NET,JavaScript,Ruby,PHPEasilycreatesmartiOSandAndroidapplica<onswithAWSMobileSDK

Powerful machine learning technology BasedonAmazon’sbaMle-hardenedinternalsystems

Notjustthealgorithms:•  Smartdatatransforma<ons•  Inputdataandmodelqualityalerts•  Built-inindustrybestprac<ces

Growswithyourneeds

•  Trainonupto100GBofdata•  Generatebillionsofpredic<ons•  Obtainpredic<onsinbatchesorreal-<me

Integrated with the AWS data ecosystem AccessdatathatisstoredinAmazonS3,AmazonRedshiA,orMySQLdatabasesinAmazonRDSOutputpredic<onstoAmazonS3foreasyintegra<onwithyourdataflowsUseAWSIden<tyandAccessManagement(IAM)forfine-graineddataaccesspermissionpolicies

Fully-managed model and predic@on services End-to-endservice,withnoserverstoprovisionandmanage

One-clickproduc<onmodeldeployment

Programma<callyquerymodelmetadatatoenableautoma<cretrainingworkflows

Monitorpredic<onusagepaMernswithAmazonCloudWatchmetrics

Pay-as-you-go and inexpensive Dataanalysis,modeltraining,andevalua<on:$0.42/instancehour

Batchpredic<ons:$0.10/1000Real-<mepredic<ons:$0.10/1000+hourlycapacityreserva<oncharge

Three supported types of predic@ons Binaryclassifica<on

PredicttheanswertoaYes/Noques<on

Mul<classclassifica<onPredictthecorrectcategoryfromalist

Regression

Predictthevalueofanumericvariable

Train model

Evaluate and op@mize

Retrieve predic@ons

1 2 3

Building smart applica@ons with Amazon ML

Trainmodel

Evaluateandop<mize

Retrievepredic<ons

1 2 3

Building smart applica@ons with Amazon ML

-  Createadatasourceobjectpoin<ngtoyourdata-  Exploreandunderstandyourdata-  Transformdataandtrainyourmodel

Create a datasource object

>>> import boto

>>> ml = boto.connect_machinelearning()

>>> ds = ml.create_data_source_from_s3(

data_source_id = ’my_datasource',

data_spec = {

'DataLocationS3': 's3://bucket/input/data.csv',

'DataSchemaLocationS3': 's3://bucket/input/data.schema',

’compute_statistics’: True } )

Explore and understand your data

Train your model

>>> import boto

>>> ml = boto.connect_machinelearning()

>>> model = ml.create_ml_model(

ml_model_id = ’my_model',

ml_model_type = 'REGRESSION',

training_data_source_id = 'my_datasource')

Train model

Evaluate and optimize

Retrieve predictions

1 2 3

Building smart applica@ons with Amazon ML

-  Measureandunderstandmodelquality-  Adjustmodelinterpreta<on

Explore model quality

Fine-tune model interpreta@on

Fine-tune model interpreta@on

Train model

Evaluate and optimize

Retrieve predictions

1 2 3

Building smart applica@ons with Amazon ML

-  Batchpredic<ons-  Real-<mepredic<ons

Batch predic@ons Asynchronous,large-volumepredic<ongenera<onRequestthroughserviceconsoleorAPIBestforapplica<onsthatdealwithbatchesofdatarecords

>>> import boto

>>> ml = boto.connect_machinelearning()

>>> model = ml.create_batch_prediction(

batch_prediction_id = 'my_batch_prediction’,

batch_prediction_data_source_id = ’my_datasource’,

ml_model_id = ’my_model',

output_uri = 's3://examplebucket/output/’)

Real-@me predic@ons Synchronous,low-latency,high-throughputpredic<ongenera<onRequestthroughserviceAPI,server,ormobileSDKsBestforinterac<onapplica<onsthatdealwithindividualdatarecords

>>> import boto

>>> ml = boto.connect_machinelearning()

>>> ml.predict(

ml_model_id = ’my_model',

predict_endpoint = ’example_endpoint’,

record = {’key1':’value1’, ’key2':’value2’})

{

'Prediction': {

'predictedValue': 13.284348,

'details': {

'Algorithm': 'SGD',

'PredictiveModelType': 'REGRESSION’

}

}

}

Data Lake

Retailers need to deliver con@nuous differen@a@on

Personalization Merchandising Real-time engagement

Personalization Merchandising Real-time engagement

Retailers need to deliver con@nuous differen@a@on

Afull-serviceresiden<alrealestatebrokerage

Redfin manages data on hundreds of millions

of properties and millions of customers

The Hot Homes algorithm automatically calculates

the likelihood by analyzing more than 500 attributes of

each home

Was fully AWS-native since day one

https://aws.amazon.com/solutions/case-studies/redfin/

Hot Homes

There's an 80% chance this home will sell in the next 11 days – go tour it soon.

Ingest/Collect

Consume/visualizeStore Process/

analyze

Data1 40 9

5

AmazonS3Datalake AmazonEMR

AmazonKinesis

AmazonRedShiV

Answers&Insights

HotHomesUsers

Proper)es

Agents

User Profile Recommendation Hot Homes Similar Homes Agent Follow-up Agent Scorecard Marketing A/B Testing Real Time Data …

AmazonDynamoDB

BI/Repor)ng

Redfin Manages Data on Hundreds of Millions of Properties Using AWS

.

Once we solved the infrastructure problem, we could

dream a little bigger. Now we can deliver results without

worrying about how to scale.

Yong Huang, Director, Big Data and Analytics

“ •  Zero on-premises infrastructure

•  Using spot pricing for EC2, Redfin saved 90% compared to running on-demand

•  Using AWS, Redfin maintains a small technical team, allowing much simplified server management and allowing the transition to DevOps

•  Redfin is able to launch products like Hot Homes to greatly increase the buyer experience, by leveraging the agility and scale of AWS

Personalization Merchandising Real-time engagement

Retailers need to deliver con@nuous differen@a@on

American upscale fashion retailer

Nordstrom has 323 stores operating in 38 of the United States and also in Canada; the largest in number of

stores and geographic footprint

of its retail competitors

Fashion retailer that sells clothing, shoes, cosmetics,

and accessories

Nordstrom is going all in on AWS

https://aws.amazon.com/solutions/case-studies/nordstrom/

NORDSTROM

Ingest/ Collect

Consume/ visualize Store Process/

analyze

Data 1 4

0 9 5 Outcomes

& Insights

Personalized recommendations within seconds (from 15-20 min) Scale the expertise of stylists to all shoppers Reduce costs by 2X order of magnitude …

Mobile Users

Desktop Users

Analytics Tools

Online Stylist

Amazon RedShift

Amazon Kinesis

AWS Lambda

Amazon DynamoDB

AWS Lambda

AmazonS3DataStorage

NORDSTROM

Nordstrom gives personalized style recommendations in seconds

.

Alert me when the internet is down ...

Keith Homewood Cloud Product Owner, Nordstrom

“ •  Nordstrom Recommendation is the online version of a stylist. It can analyze and deliver personalized recommendations in seconds

•  Going All-In on AWS has resulted in reducing costs by 2X

•  Continuous delivery allows Nordstrom to deliver multiple production launches a day in a single application

•  Can now create a personalized recommendation in seconds, in what used to take 15-20 minutes of processing

•  Nordstrom Cloud Product Owner finds the reliability and availability of AWS so suitable that as long as the internet is working, Nordstrom Recommendation is working

Nordstrom

Personalization Merchandising Real-time engagement

Retailers need to deliver con@nuous differen@a@on

Technology that helps brick-and-mortar retailers optimize performance

Trusted by over 500 global brands in

45 countries worldwide and counting

Euclid analyzes customer movement data to

correlate traffic with marketing campaigns and to help retailers optimize

hours for peak traffic

Was fully AWS-native since day one

https://aws.amazon.com/solutions/case-studies/euclid/

Ingest/Collect

Consume/visualizeStore Process/

analyze

Data1 40 9

5

Answers&Insights

EuclidAnaly)cs

Campaigns

WiFi-Foottraffic

Transac)ons

Walk-Bys New & Return Visitors Visit Duration Engagement Rate Bounce Rate Storefront Potential & Conversion

Customer segmentation and loyalty assessment Regional and categorical roll-up reporting Zoning for large-format locations

EuclidEventIQAmazonS3Datalake

AmazonRDSforMySQL

AmazonEMR

AmazonRedShiV

AmazonEC2

AmazonElas)cBeanstalk

Elas)cLoadBalancing

AmazonRedshiA AmazonElas<cMapReduce

DataWarehouse Semi-structured

Amazon Glacier

Use an op@mal combina@on of highly interoperable services

AmazonSimpleStorageService

DataStorage Archive

AmazonDynamoDB

AmazonMachineLearning

AmazonKinesis

NoSQL Predic)veModels OtherAppsStreaming

AWS IoT

“SecurelyconnectoneoronebilliondevicestoAWS,sotheycaninteractwithapplica<onsandotherdevices”

AWS IoT

DEVICESDKSetofclientlibrariestoconnect,

authen<cateandexchangemessages

DEVICEGATEWAYCommunicatewithdevicesvia

MQTTandHTTP

AUTHENTICATIONAUTHORIZATIONSecurewithmutual

authen<ca<onandencryp<on

RULESENGINETransformmessages

basedonrulesandroutetoAWSServices

AWSServices-----

3PServices

DEVICESHADOWPersistentthingstateduringintermiMentconnec<ons

APPLICATIONS

AWSIoTAPI

DEVICEREGISTRYIden<tyandManagementof

yourthings

AWS IoT Rules Engine Ac@ons

RULESENGINETransformmessages

basedonrulesandroutetoAWSServices

AWSServices-----

3PServices

AWSServices-----

3PServices

1.AWSServices(DirectIntegra&on)

RulesEngine

Ac<ons

AWS IoT Rules Engine

AWSLambda

AmazonSNS

AmazonSQS

AmazonS3

AmazonKinesis

AmazonDynamoDB Amazon RDS

Amazon Redshift

Amazon Glacier

Amazon EC2

3.ExternalEndpoints(viaLambdaandSNS)

RulesEngineconnectsAWSIoTtoExternalEndpointsandAWSServices.

2.RestofAWS(viaAmazonKinesis,AWSLambda,AmazonS3,andmore)

AWS IoT Rules Engine Ac@ons

Rules Engine evaluates inbound messages published into AWS IoT, transforms and delivers to the appropriate endpoint based on business rules. External endpoints can be reached via Lambda and Simple Notification Service (SNS).

InvokeaLambdafunc<on

PutobjectinanS3bucket

Insert,Update,ReadfromaDynamoDBtablePublishtoanSNSTopicorEndpoint

PublishtoanAmazonKinesisstream

Ac<ons

AmazonKinesisFirehose

RepublishtoAWSIoT

AmazonMachineLearning

AmazonElas<csearch

AWS IoT Rules Engine & Amazon SNS

Push Notifications Apple APNS Endpoint, Google GCM Endpoint, Amazon ADM Endpoint, Windows WNS Amazon SNS -> HTTP Endpoint (Or SMS or Email) Call HTTP based 3rd party endpoints through SNS with subscription and retry support

SNS

2

AWS IoT Rules Engine for Machine Learning

Anomaly Detection Amazon Machine Learning can feed predictive evaluation criteria to the Rules Engine Continuous Improvement Around Predication Continuously look for outliers and re-calibrate the Amazon Machine Learning models

SendtoS3

AmazonMachineLearning

Re-Train

S3

The image cannot be displayed. Your computer may not have enough memory to open the image, or the image may have been corrupted. Restart your computer, and then open the file again. If the red x still appears, you may have to delete the image and then insert it again.

AWS IoT Rules Engine & Stream Data

N:1 Inbound Streams of Sensor Data (Signal to Noise Reduction) Rules Engine filters, transforms sensor data then sends aggregate to Amazon Kinesis Amazon Kinesis Streams to Enterprise Applications Simultaneously stream processed data to databases, applications, other AWS Services

OrderedStream

AmazonKinesis

Thankyou!

Getpredic<onswithAmazonMLbatchAPI

ProcessdatawithEMR

RawdatainS3Aggregateddata

inS3Predic)ons

inS3 Yourapplica)on

Batch predic@ons with EMR

Structured data In Amazon RedshiT

Load predic@ons into Amazon RedshiT

- or - Read predic@on results directly

from S3

Predic@ons in S3

Get predic@ons with Amazon ML batch API

Your applica@on

Batch predic@ons with Amazon RedshiT

Your applica@on

Get predic@ons with Amazon ML real-@me API

Amazon ML service

Real-@me predic@ons for interac@ve applica@ons

Your applica@on Amazon DynamoDB

+

Trigger events with Lambda +

Get predic@ons with Amazon ML real-@me API

Adding predic@ons to an exis@ng data flow

Recommenda@on engine

AmazonS3 AmazonRedshiA

AmazonML

DataCleansing

RawData

Trainmodel

BuildModels

S3Sta<cWebsite

Predic<ons

Recommended