INSTRUMENTING YOUR INSTRUMENTS Premal Shah Co-Founder @ 6sense Hadoop Summit 2016


Page 1: Instrumenting your Instruments

INSTRUMENTING YOUR INSTRUMENTS

Premal Shah
Co-Founder @ 6sense
Hadoop Summit 2016

Page 2: Instrumenting your Instruments

AGENDA

• What does 6sense do?
• How do we do it?
• What does the pipeline look like?
• Where do we do it?
• What are the challenges?
• How are we planning to solve them?

Page 3: Instrumenting your Instruments

WHAT DOES 6SENSE DO?

• We find prospects that are in market to buy
• We empower marketing and sales teams

Page 4: Instrumenting your Instruments

SAMPLE OUTPUT

Account Name       Buying Stage     Profile Fit
ACME Corporation   Purchase         Strong
ABC Corp           Decision         Strong
XYZ Systems        Consideration    Medium
Doe Inc            Awareness        Strong

Buying stages (figure): Purchase, Decision, Consideration, Awareness

Page 5: Instrumenting your Instruments

HOW DO WE DO IT?

• 1st Party
─ Web
─ CRM
─ Marketing Automation

• 3rd Party
─ Web
─ Search
─ Ad Impressions

• Modelling & Scoring

• Actionable Data for the Customer

Page 6: Instrumenting your Instruments

WHAT DOES THE PIPELINE LOOK LIKE?

Customer Systems → Ingest → Process → Export → Customer Systems

Page 7: Instrumenting your Instruments

THE DAILY PROCESS GRAPH (DAG)
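
The Ingest, Process and Export stages from the pipeline slide form the simplest possible daily graph. A minimal sketch of how a run order might be derived from such a graph, assuming each stage depends only on the one before it; the `DAILY_DAG` mapping and `run_order` helper are hypothetical illustrations, not the graph shown in the talk.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical daily graph: stage -> set of stages it depends on.
DAILY_DAG = {
    "ingest": set(),
    "process": {"ingest"},
    "export": {"process"},
}


def run_order(dag):
    """Return the stages in an order that respects every dependency."""
    return list(TopologicalSorter(dag).static_order())


if __name__ == "__main__":
    print(run_order(DAILY_DAG))
```

A real daily graph has many more nodes and branches, in which case several valid orders exist and independent stages can run in parallel.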

Page 8: Instrumenting your Instruments

THE REAL WORLD

Page 9: Instrumenting your Instruments

THE REAL WORLD * N

Page 10: Instrumenting your Instruments

PIPELINE COMPONENTS

• Hadoop Ecosystem
─ YARN
─ Hive
─ Presto

• Mesos World
─ Mesos
─ Chronos
─ Marathon

Page 11: Instrumenting your Instruments

WORKFLOW

Chronos → Queue → Marathon → Jobs

Job types: Hadoop, Hive, Presto, Python

Page 12: Instrumenting your Instruments

WHERE DO WE DO IT?

• AWS
─ Elastic
─ Easy to experiment
─ No CAPEX

• Hadoop
─ Data Nodes are run separately from Node Managers
─ Most of the data sits in S3

Page 13: Instrumenting your Instruments

PROJECT RAVEN

Page 14: Instrumenting your Instruments

WHAT AFFECTS PERFORMANCE

• Hive
─ Joins
─ Non-partitioned tables
─ Filters
─ Bucketing

• Hadoop
─ File format
─ Compression
─ Data locality

Page 15: Instrumenting your Instruments

METRICS THAT MATTER

• # of Mappers

• # of Input Files

• # of Input Records

• # of Records passed on to the next stage

• Time taken in
─ Mappers
─ Copy
─ Shuffle
─ Reducers

• # of Reducers

• # of compressed vs uncompressed files

• File formats

• Etc.
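
Several of the metrics above are exposed as MapReduce job counters. A minimal sketch of reducing a counters document to those metrics, assuming the counters have already been fetched as JSON shaped like a MapReduce History Server counters response; the `SAMPLE` payload, its values, and the `summarize_counters` helper are hypothetical and not taken from the talk.

```python
# Hypothetical sample shaped like a MapReduce job counters document.
SAMPLE = {
    "jobCounters": {
        "id": "job_0000000000000_0001",
        "counterGroup": [
            {
                "counterGroupName": "org.apache.hadoop.mapreduce.JobCounter",
                "counter": [
                    {"name": "TOTAL_LAUNCHED_MAPS", "totalCounterValue": 40},
                    {"name": "TOTAL_LAUNCHED_REDUCES", "totalCounterValue": 4},
                ],
            },
            {
                "counterGroupName": "org.apache.hadoop.mapreduce.TaskCounter",
                "counter": [
                    {"name": "MAP_INPUT_RECORDS", "totalCounterValue": 1000},
                    {"name": "MAP_OUTPUT_RECORDS", "totalCounterValue": 250},
                ],
            },
        ],
    }
}


def flatten_counters(doc):
    """Collapse all counter groups into one {counter_name: total} dict."""
    flat = {}
    for group in doc["jobCounters"]["counterGroup"]:
        for counter in group["counter"]:
            flat[counter["name"]] = counter["totalCounterValue"]
    return flat


def summarize_counters(doc):
    """Pick out the metrics named on the slide; missing counters become 0."""
    flat = flatten_counters(doc)
    records_in = flat.get("MAP_INPUT_RECORDS", 0)
    records_out = flat.get("MAP_OUTPUT_RECORDS", 0)
    return {
        "mappers": flat.get("TOTAL_LAUNCHED_MAPS", 0),
        "reducers": flat.get("TOTAL_LAUNCHED_REDUCES", 0),
        "input_records": records_in,
        "records_passed_on": records_out,
        # Fraction of input records that survive to the next stage.
        "pass_through_ratio": records_out / records_in if records_in else 0.0,
    }


if __name__ == "__main__":
    print(summarize_counters(SAMPLE))
```

A low pass-through ratio points at a job that reads far more data than it keeps, which is where partitioning or earlier filtering tends to pay off.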

Page 16: Instrumenting your Instruments

WHAT DO WE STORE?

• Job Name 1
─ Date 1
o Yarn Job # 1 Metrics
o Yarn Job # 2 Metrics
─ Date 2
o Repeat as above

• Job Name 2
─ Repeat as above
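
The hierarchy above (job name, then date, then one metrics record per YARN job) maps naturally onto nested dictionaries. A minimal in-memory sketch, assuming string keys and numeric metric values; the function names, the example identifiers, and the choice of a plain dict rather than a database are all hypothetical, since the talk does not say where the metrics are persisted.

```python
from collections import defaultdict


def new_store():
    """job name -> date -> YARN job id -> {metric name: value}."""
    return defaultdict(lambda: defaultdict(dict))


def record(store, job_name, date, yarn_job_id, metrics):
    """Save one YARN job's metrics under its pipeline job and run date."""
    store[job_name][date][yarn_job_id] = dict(metrics)


def daily_total(store, job_name, date, metric):
    """Sum one metric across every YARN job a pipeline job ran that day."""
    yarn_jobs = store.get(job_name, {}).get(date, {})
    return sum(m.get(metric, 0) for m in yarn_jobs.values())


if __name__ == "__main__":
    store = new_store()
    # Hypothetical identifiers and values, for illustration only.
    record(store, "job_name_1", "2016-06-28", "yarn_job_1", {"mappers": 40})
    record(store, "job_name_1", "2016-06-28", "yarn_job_2", {"mappers": 10})
    print(daily_total(store, "job_name_1", "2016-06-28", "mappers"))
```

Keying by date directly under the job name keeps day-over-day comparison of the same job a single lookup.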

Page 17: Instrumenting your Instruments

WHAT DO WE USE THEM FOR?

• Finding the job that
─ Is the slowest
─ Processes the most files
─ Filters out the most data
─ Uses the most memory

• Observe trends over time in the above metrics

• Get alerted on changes in the trends, both up and down
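
The uses above can be sketched as two small helpers: one that ranks jobs by a metric, and one that flags a value that has drifted from its recent baseline in either direction. This is an illustrative sketch, not the alerting logic from the talk; the 25% tolerance, the mean-based baseline, and both function names are assumptions.

```python
from statistics import mean


def slowest_job(runtimes):
    """Return the job name with the largest runtime ({job_name: seconds})."""
    return max(runtimes, key=runtimes.get)


def trend_alert(history, latest, tolerance=0.25):
    """Compare the latest value with the mean of earlier values.

    Returns "up" or "down" when the relative change exceeds the
    tolerance, otherwise None. The tolerance is a hypothetical default.
    """
    baseline = mean(history)
    if baseline == 0:
        return None
    change = (latest - baseline) / baseline
    if change > tolerance:
        return "up"
    if change < -tolerance:
        return "down"
    return None


if __name__ == "__main__":
    print(slowest_job({"ingest": 120, "process": 900, "export": 60}))
    print(trend_alert([100, 110, 90], 200))
```

Alerting on drops as well as rises matters because a job that suddenly gets much faster or reads far fewer records is often silently missing input.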

Page 18: Instrumenting your Instruments

RECOMMENDATIONS

• Storage Format

• Compression Type

• Partition Columns

• Bucketing

• Etc.

Page 19: Instrumenting your Instruments

OPTIMIZATIONS

• Which job is causing the bottleneck?

• How many errors can we tolerate?

• Which job is the biggest offender?

• Which job fails the most?

• What did the latest release do?

Page 20: Instrumenting your Instruments

SCALING

• Can we scale the number of customers?

• What does it cost to add a customer?

• What does it cost to add a job to each customer’s pipeline?

Page 21: Instrumenting your Instruments

VENDOR SHOUT OUT

• ClusterK (now AWS Spot Fleet)
─ Allows us to use different instance types to load balance and reduce costs

• Sumo Logic
─ Detects variances in behavior over a custom time period

• OpsClarity
─ Collects, monitors and alerts on the following metrics
o AWS CloudWatch metrics (queue length, S3 bucket size, etc.)
o Host metrics (CPU, memory, disk space, etc.)
o Service metrics (YARN, HBase, Mesos, etc.)
o Container metrics (Docker)
o Custom metrics (anything else you want to send)

Page 22: Instrumenting your Instruments

THANK YOU

• premal at 6sense.com

• https://www.linkedin.com/in/premaljshah