Operational Data Applications on Hadoop http://cask.co

Source: files.meetup.com/17533002/CASK-Nitin-Motgi.pdf

Operational Data Applications on Hadoop

http://cask.co

PROPRIETARY & CONFIDENTIAL

“If big data analytics can’t capture the enthusiasm of developers, its effectiveness will be hindered. Developers are the key to carrying functionality over the last mile to real-life end-users.”

Andrew Brust, Gigaom Research, Outlook: Big Data and analytics in 2015

Big Data Evolution

EDW Offload, Science and Viz, Data Apps

Spring XD

Web Scale

Technology Explosion

2006: Core Hadoop (HDFS, MR)
2008: HBase, ZooKeeper, Core Hadoop
2009: Hive, Pig, Mahout, HBase, ZooKeeper, Core Hadoop
2010: Sqoop, Whirr, Avro, Hive, Pig, Mahout, HBase, ZooKeeper, Core Hadoop
2011: Flume, Bigtop, Oozie, MRUnit, HCatalog, Sqoop, Whirr, Avro, Hive, Pig, Mahout, HBase, ZooKeeper, Core Hadoop
2012: Spark, Impala, Solr, Kafka, Flume, Bigtop, Oozie, MRUnit, HCatalog, Sqoop, Whirr, Avro, Hive, Pig, Mahout, HBase, ZooKeeper, Core Hadoop
Present: Parquet, Sentry, Spark, Impala, Solr, Kafka, Flume, Bigtop, Oozie, MRUnit, HCatalog, Sqoop, Whirr, Avro, Hive, Pig, Mahout, HBase, ZooKeeper, Core Hadoop

Challenges

CASK DATA APP PLATFORM

CDAP is an integrated, distributed, and extensible platform for building and managing data applications and data on Hadoop.

Simplify

What is Hadoop

Hadoop Challenges

Innovate

Accelerate

A complex distributed system with low-level APIs

Not just a batch bit bucket! It's more!

Different processing paradigms require your data to be organized in a certain way. Supporting multiple ways of processing your data forces deep architectural choices upfront and leaves no room for future-proofing.

All data within CDAP is always available for realtime, batch, and ad hoc processing without reorganizing your data.

It's difficult to build complex data patterns that would maintain consistency and guarantee correctness of your data at all times.

A collection of multiple open source projects

Developers must understand and hand-code the integration between multiple technologies, which is cumbersome and error-prone

An integrated platform that provides conceptual integrity and removes the need for boilerplate code

Built-in capabilities for data ingestion and processing to get going in minutes (to remove the data integration bottleneck)

Provides an enterprise-ready production runtime environment for developed applications to move into production in weeks

Simple tasks like data ingestion and ETL are overly complicated and time consuming

Moving from proof-of-concept to production is difficult and can take months or quarters

Specialized skills are required for using Hadoop, which prevents most developers from effectively building solutions.

No clear separation between business logic and infrastructure APIs increases application complexity and total cost of ownership

The inability to perform automated testing of end-to-end solutions leads to manual processes and unpredictable delivery times

Provides framework level correctness across all processing paradigms.

Supports scale-out ingestion and processing in realtime and batch with massive throughput.

Supports high-level concepts and abstractions familiar to developers that enable them to use their existing skills to build new solutions on Hadoop

Abstractions hide infrastructure complexity and enable reusability leading to a substantial reduction in application code and improved maintainability

Testing frameworks and developer/devops tools provide greater reliability and predictability for your solution, from development to QA to production.

Data ingestion in realtime and batch currently means piecing different technologies together so that they work well in tandem, which can be quite a feat.

Integrated Platform for Hadoop Solutions

• Datasets: Libraries to build reusable data access patterns spanning multiple storage technologies

• Programs: Standardized containers providing consistency for diverse processing paradigms

• Runtime Services: Services for developers to enable richer apps with less hassle, and for production to enable application and data management

+ Ingestion, Egress, Tools & User Experience

[Diagram: CASK DATA APPLICATION PLATFORM (CDAP). Labels: Programs (Batch Programs, Realtime Programs), Datasets, Runtime Services, Event / Data Ingestion, Egress, Adapters, Tools and User Experience. Data Application Examples: Anomaly Detection, 360° Consumer Profile, Network Analytics, Multi-log Correlation Analytics]

CDAP Functional Architecture

[Architecture diagram. Recoverable labels, grouped by component:]

• Event / Data Ingestion: Push, Pull, Pipes, DB Sync, Dropzone, Edge Aggregation / Transformation

• Adapters: Data transformation, Data encryption, Data harmonize, User Defined Adapter

• Datasets: Timeseries Dataset, OLAP cube Dataset, Threshold Dataset, ObjectStore Dataset, KeyValue Dataset, User Defined Dataset

• Batch Apps and Realtime Apps: Anomaly detection, DDoS attack detection, Cohort Analysis, Correlation Modeling, User Defined App

• Operational Analytics Application

• System Services: Notification Service, Metrics & Logging Service, Metadata Management Service, Transaction Service, Security Service, Data Discovery and Management Service, App Deployment & Management Service, Config / Preference Management Service

• Egress: Realtime, Batch, HTTP, JDBC, SQL, TCP*

• Tools: Console, REST API, CLI, Testing Framework, Performance* Framework

Data App Use-cases

Telco & Media, Financial Services, SaaS

Web Log Analytics

Ad Targeting

Event Monitoring

IoT / Mobile Apps

Market Data Services

Network Security

Fraud Detection

Consumer Targeting

Customer Call Centers

Social Media Monitoring

Network Optimization

Location Analytics

Data Apps turn insights into action.

Data Applications combine real-time events with historical data to deliver actionable, operational intelligence.
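
As a rough illustration of that combination, the sketch below scores incoming events against a baseline computed from historical records. It is plain Python rather than CDAP code; the function names, the (key, value) event shape, and the threshold of 3 standard deviations are all assumptions made for this example and do not come from the platform.

```python
from collections import defaultdict
from statistics import mean, pstdev


def build_baseline(history):
    """Summarize historical (key, value) records as {key: (mean, std)}.

    Stands in for a batch job over stored data (hypothetical helper).
    """
    grouped = defaultdict(list)
    for key, value in history:
        grouped[key].append(value)
    return {key: (mean(vals), pstdev(vals)) for key, vals in grouped.items()}


def is_anomalous(event, baseline, threshold=3.0):
    """Flag a realtime (key, value) event that deviates from its baseline.

    The threshold is an illustrative assumption, not a recommended setting.
    """
    key, value = event
    if key not in baseline:
        return False  # no history for this key, so nothing to compare against
    avg, std = baseline[key]
    if std == 0:
        return value != avg
    return abs(value - avg) / std > threshold


# Historical data (batch side) and two incoming events (realtime side).
history = [("host-a", 10), ("host-a", 12), ("host-a", 11), ("host-a", 9)]
baseline = build_baseline(history)
spike = is_anomalous(("host-a", 50), baseline)
normal = is_anomalous(("host-a", 11), baseline)
```

Here the action is simply a boolean flag; in a real data application it might trigger an alert or feed a downstream service.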

CDAP on Hadoop compared to Hadoop alone

Lines of code: 82% reduction

Development time: 86% reduction

Other advantages

• Cyclomatic complexity
• Testability
• Code readability and maintenance
• Application deployment and maintenance
• Egress support for application data
• Knowledge transfer

Actual Developer’s Experience

Top 5 SaaS Company

Cask OSS Technologies

• CASK DATA APP PLATFORM: Data application platform for Hadoop (cdap.io or cask.co/product)

• Real-time streaming for the real world w/ AT&T Labs (tigon.io)

• Clusters with a click (coopr.io)

• Thread Abstraction on YARN

• Transaction for Apache HBase (tephra.io)

Thank You

@nmotgi

Build Analytics App Today

Download: http://cask.co/downloads

[email protected]