Big data in the cloud - Shekhar Vemuri

Page 1: Big data in the cloud - Shekhar Vemuri

Big Data and the Cloud

Shekhar Vemuri

#phxdataconference

ABOUT

• PRINCIPAL at CLAIRVOYANT

• PRODUCT, DATA, ANALYTICS and CLOUD

• large scale web and data systems

• simple, lightweight solutions

QUICK POLL

• HADOOP, HIVE, PIG

• PUBLIC CLOUD, IaaS, SaaS

• AMAZON AWS, EC2

• ELASTICITY

• S3, EMR, KINESIS

• IoT

WHAT WILL WE TALK ABOUT

BIG DATA

USE CASES

RISK MODELING

PERSONALIZED MEDICINE

AD TARGETING

INTERNET OF THINGS

THREAT ANALYSIS

RECOMMENDATIONS

SURVEILLANCE

RETENTION

360 CUSTOMER VIEW

DRIVING FACTORS

• variety in data

• not just transactional data

• potential for tremendous insight - when combining transactional data with additional data sources

• LinkedIn, Twitter, Facebook, Pinterest, Open Data

• Internet of Things

the CLOUD

• IaaS, SaaS

• on-demand subscription

• subscription vs owning

• tradeoff

• ease of adoption

• powering next-gen entrepreneurship

LANDSCAPE

DATA VALUE CHAIN

GENERATE > (ingest) > STORE > (transform) > ANALYZE > (transform) > INSIGHTS
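
The four stages of the data value chain (generate, store, analyze, insights) can be sketched as a chain of small functions. This is only an illustration: the sensor readings, the record layout, and every function name are assumptions made for the example, and a plain dict stands in for a real data store. The two "transform" steps are folded into `analyze` and `insights`.

```python
from statistics import mean


def generate():
    """GENERATE: raw events as they might leave a device (made-up readings)."""
    return ["sensor-a,21.5", "sensor-b,19.0", "sensor-a,22.5", "bad-record"]


def ingest(raw_events):
    """ingest: parse raw text into records, dropping lines that do not parse."""
    records = []
    for line in raw_events:
        name, sep, value = line.partition(",")
        if not sep:
            continue  # no comma found, so the line is not a valid reading
        records.append({"sensor": name, "value": float(value)})
    return records


def store(records):
    """STORE: group readings by sensor; a dict stands in for a real store."""
    stored = {}
    for record in records:
        stored.setdefault(record["sensor"], []).append(record["value"])
    return stored


def analyze(stored):
    """ANALYZE: transform stored readings into a mean value per sensor."""
    return {sensor: mean(values) for sensor, values in stored.items()}


def insights(averages):
    """INSIGHTS: transform the numbers into a statement someone can act on."""
    warmest = max(averages, key=averages.get)
    return f"{warmest} is the warmest sensor at {averages[warmest]:.1f}"


result = insights(analyze(store(ingest(generate()))))
print(result)
```

Each stage only depends on the output of the one before it, which is what lets the individual stages be swapped for managed services later.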

BIG DATA + the CLOUD

LOG ANALYSIS

[Diagram: ReST CLIENTS; WEB APP, REST APIs; AMAZON EC2; LOG FILES; AMAZON S3]

[Diagram: ReST CLIENTS; WEB APP, REST APIs; AMAZON EC2; LOG FILES; AMAZON S3; AMAZON EMR; AMAZON REDSHIFT]

• LOG FILES - STORED in S3

• MAP-REDUCE, HIVE, PIG, CASCADING jobs

• STORE summarized data
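
The log-analysis flow described here (log files as input, a map-reduce style job, summarized data as output) can be illustrated in plain Python. This is a minimal local sketch, not an EMR job: the log format, the sample lines, and the function names are all assumptions, since the slides do not specify any of them.

```python
from collections import Counter
from itertools import chain

# Hypothetical access-log lines: timestamp, method, path, status.
LOG_LINES = [
    "2024-01-01T10:00:01 GET /api/users 200",
    "2024-01-01T10:00:02 GET /api/orders 500",
    "2024-01-01T10:00:03 POST /api/users 201",
    "2024-01-01T10:00:04 GET /api/users 200",
    "malformed line",
]


def map_line(line):
    """Map step: emit ((path, status), 1) pairs, skipping malformed lines."""
    parts = line.split()
    if len(parts) != 4:
        return []
    _timestamp, _method, path, status = parts
    return [((path, status), 1)]


def reduce_counts(pairs):
    """Reduce step: sum the counts emitted for each (path, status) key."""
    totals = Counter()
    for key, count in pairs:
        totals[key] += count
    return dict(totals)


def summarize(lines):
    """Run the map step over every line, then reduce the emitted pairs."""
    return reduce_counts(chain.from_iterable(map_line(line) for line in lines))


summary = summarize(LOG_LINES)
for (path, status), count in sorted(summary.items()):
    print(path, status, count)
```

The resulting summary is far smaller than the raw logs, which is the kind of output the "STORE summarized data" step refers to.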

[Diagram: ReST CLIENTS; WEB APP, REST APIs; AMAZON EC2; LOG FILES; AMAZON S3; AMAZON EMR; CLOUDERA IMPALA]

• LOG FILES - STORED in S3

• MAP-REDUCE, HIVE, PIG, CASCADING jobs

AMAZON S3

AMAZON KINESIS

AMAZON REDSHIFT

AMAZON DYNAMODB

AMAZON RDS

AMAZON EMR

[Diagram, built up over four slides: AMAZON S3 holding DATA; INPUT and OUTPUT between AMAZON S3 and AMAZON EMR; AMAZON EMR WITH SPOT instances]

BUILDING BLOCKS

• amazon AWS

• amazon EMR

• amazon S3

• kinesis

• redshift

• spot instances

PROS

• like other cloud solutions, reduces the barrier to adoption

• especially if you are already in the cloud

• makes it possible to implement quick POCs

CONS

• depending on your current infrastructure, you may end up continually replicating data

• data security, privacy

LEARNINGS

• Build platforms once the need is strongly felt

• Prepare to fail fast, a couple of times, before the final version

• what you think will happen will not

• COSTS can spiral out of control

• Leverage spot instances to reduce costs, especially for bursty workloads

• S3 can be very slow when initializing and running large workloads

• especially in recovery scenarios

• but data resiliency is not an issue
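
The point about spot instances and bursty workloads can be made concrete with some back-of-the-envelope arithmetic. Every number below is hypothetical (prices, node count, run length) and is chosen only to keep the sums easy; real prices vary by instance type, region, and time. The sketch also ignores the cost of re-running work when spot capacity is reclaimed.

```python
def cluster_cost_cents(nodes, hours, cents_per_node_hour):
    """Cost of running `nodes` machines for `hours` at a flat hourly price."""
    return nodes * hours * cents_per_node_hour


# Hypothetical figures, in US cents per node-hour.
ON_DEMAND_CENTS = 40
SPOT_CENTS = 10

NODES = 20
HOURS_PER_RUN = 3      # a bursty job that only needs the cluster briefly
RUNS_PER_MONTH = 30
HOURS_IN_MONTH = 24 * 30

burst_hours = HOURS_PER_RUN * RUNS_PER_MONTH

# Cluster left running all month at the on-demand price.
always_on = cluster_cost_cents(NODES, HOURS_IN_MONTH, ON_DEMAND_CENTS)
# Cluster started only for each run, still at the on-demand price.
transient_on_demand = cluster_cost_cents(NODES, burst_hours, ON_DEMAND_CENTS)
# Cluster started only for each run, at the assumed spot price.
transient_spot = cluster_cost_cents(NODES, burst_hours, SPOT_CENTS)

for label, cents in [
    ("always-on, on-demand", always_on),
    ("per-run, on-demand", transient_on_demand),
    ("per-run, spot", transient_spot),
]:
    print(f"{label}: ${cents / 100:,.2f} per month")
```

Under these assumptions most of the saving comes from running the cluster only when needed, and the spot price reduces the remainder further.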

www.clairvoyantsoft.com

@shekharv

linkedin.com/in/shekharvemuri