PROPRIETARY & CONFIDENTIAL
“If big data analytics can’t capture the enthusiasm of developers, its effectiveness will be hindered. Developers are the key to carrying functionality over the last mile to real-life end-users.”
Andrew Brust, Gigaom Research, Outlook: Big Data and analytics in 2015
Big Data Evolution
EDW Offload
Science and Viz
Data Apps
Spring XD
Web Scale
Technology Explosion
Components added to the Hadoop stack by year (each year's stack includes everything from earlier years):
• 2006: Core Hadoop (HDFS, MR)
• 2008: HBase, ZooKeeper
• 2009: Hive, Pig, Mahout
• 2010: Sqoop, Whirr, Avro
• 2011: Flume, Bigtop, Oozie, MRUnit, HCatalog
• 2012: Spark, Impala, Solr, Kafka
• Present: Parquet, Sentry
Challenges
CASK DATA APP PLATFORM
CDAP is an integrated, distributed, and extensible platform for building and managing data applications and data on Hadoop.
Simplify
What is Hadoop: A complex distributed system with low-level APIs
• Challenge: Specialized skills are required for using Hadoop, preventing most developers from effectively building solutions.
  CDAP: Supports high-level concepts and abstractions familiar to developers, enabling them to use their existing skills to build new solutions on Hadoop.
• Challenge: No clear separation between business logic and infrastructure APIs increases application complexity and total cost of ownership.
  CDAP: Abstractions hide infrastructure complexity and enable reusability, leading to a substantial reduction in application code and improved maintainability.
• Challenge: The inability to perform automated testing of end-to-end solutions leads to manual processes and unpredictable delivery times.
  CDAP: Testing frameworks and developer/devops tools provide greater reliability and predictability for your solution, from development to QA to production.

Innovate
What is Hadoop: Not just a batch bit bucket! It's more!
• Challenge: Different processing paradigms require your data to be organized in a certain way. Supporting multiple ways of processing your data forces deep architectural choices upfront, with no future proofing.
  CDAP: All data within CDAP is always available for realtime, batch, and ad hoc processing without reorganizing your data.
• Challenge: It's difficult to build complex data patterns that maintain consistency and guarantee correctness of your data at all times.
  CDAP: Provides framework-level correctness across all processing paradigms.
• Challenge: Data ingestion in realtime and batch currently means piecing different technologies together so that they work well in tandem, which can be a feat.
  CDAP: Supports scale-out ingestion and processing in realtime and batch with massive throughput.

Accelerate
What is Hadoop: A collection of multiple open source projects
• Challenge: You must understand and hand-code the integration between multiple technologies, which is cumbersome and error prone.
  CDAP: An integrated platform that provides conceptual integrity and removes the need for boilerplate code.
• Challenge: Simple tasks like data ingestion and ETL are overly complicated and time consuming.
  CDAP: Built-in capabilities for data ingestion and processing to get going in minutes (to remove the data integration bottleneck).
• Challenge: Moving from proof-of-concept to production is difficult and can take months or quarters.
  CDAP: Provides an enterprise-ready production runtime environment for developed applications to move into production in weeks.
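To make the "realtime, batch, and ad hoc processing without reorganizing your data" point concrete, here is a minimal sketch in plain Python. It is not CDAP code and does not use any CDAP API: `EventStore` and its methods are hypothetical names, and an in-memory list stands in for a shared dataset. It only illustrates three access styles reading one copy of the data.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class EventStore:
    """Hypothetical in-memory store standing in for a shared dataset."""
    events: list = field(default_factory=list)
    counts: dict = field(default_factory=lambda: defaultdict(int))

    def ingest(self, user: str, action: str) -> None:
        # Realtime path: update a running counter as each event arrives.
        self.events.append((user, action))
        self.counts[action] += 1

    def batch_top_users(self, n: int) -> list:
        # Batch path: full scan over the same stored events.
        per_user = defaultdict(int)
        for user, _ in self.events:
            per_user[user] += 1
        ranked = sorted(per_user.items(), key=lambda kv: (-kv[1], kv[0]))
        return ranked[:n]

    def query(self, predicate) -> list:
        # Ad hoc path: arbitrary filter over the same events, no reshaping.
        return [e for e in self.events if predicate(e)]


store = EventStore()
for user, action in [("ann", "click"), ("bob", "view"), ("ann", "view")]:
    store.ingest(user, action)
```

Here `store.counts` serves the realtime view, `store.batch_top_users(1)` the batch view, and `store.query(lambda e: e[0] == "bob")` an ad hoc question, all against the same stored events.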
• Datasets: Libraries to build reusable data access patterns spanning multiple storage technologies
• Programs: Standardized containers providing consistency for diverse processing paradigms
• Runtime Services: Services for developers to enable richer apps with less hassle, and for production to enable application and data management
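The idea of a dataset as a reusable access pattern over interchangeable storage can be sketched as follows. This is an illustration under stated assumptions, not the CDAP Dataset API: `KeyValueBackend`, `InMemoryBackend`, and this `TimeseriesDataset` class are hypothetical, and the key layout (metric name plus zero-padded timestamp) is just one common choice.

```python
from typing import Protocol


class KeyValueBackend(Protocol):
    """Hypothetical storage interface; any ordered key-value store could fit."""

    def put(self, key: str, value: float) -> None: ...

    def scan(self, prefix: str) -> list: ...


class InMemoryBackend:
    """Toy backend used here in place of a real storage technology."""

    def __init__(self) -> None:
        self._rows = {}

    def put(self, key: str, value: float) -> None:
        self._rows[key] = value

    def scan(self, prefix: str) -> list:
        # Return matching rows in key order, as an ordered store would.
        return [(k, v) for k, v in sorted(self._rows.items())
                if k.startswith(prefix)]


class TimeseriesDataset:
    """Access pattern: time-ordered reads per metric, independent of backend."""

    def __init__(self, backend: KeyValueBackend) -> None:
        self._backend = backend

    def write(self, metric: str, timestamp: int, value: float) -> None:
        # Zero-pad the timestamp so lexicographic key order equals time order.
        self._backend.put(f"{metric}:{timestamp:012d}", value)

    def read(self, metric: str) -> list:
        rows = self._backend.scan(metric + ":")
        return [(int(k.rsplit(":", 1)[1]), v) for k, v in rows]


ts = TimeseriesDataset(InMemoryBackend())
ts.write("cpu", 20, 0.7)
ts.write("cpu", 10, 0.5)
ts.write("mem", 5, 0.1)
```

Application code calls only `write` and `read`; swapping `InMemoryBackend` for another implementation of the same two methods would not change it.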
Integrated Platform for Hadoop Solutions
CASK DATA APPLICATION PLATFORM (CDAP)
• Programs: Batch Programs, Realtime Programs
• Datasets
• Runtime Services
• + Ingestion, Egress, Tools & User Experience: Event / Data Ingestion, Egress, Adapters, Tools and User Experience
• Data Application Examples: Anomaly Detection, 360° Consumer Profile, Network Analytics, Multi-log Correlation Analytics
CDAP Functional Architecture
• Event / Data Ingestion and Egress (labels as shown on the diagram): Push, Pull, Realtime, Batch, Pipes, DB Sync, Dropzone, Edge Aggregation / Transformation, HTTP, JDBC, SQL, TCP*
• Apps: Batch Apps, Realtime Apps, User Defined App
• App examples: Anomaly detection, DDoS attack detection, Cohort Analysis, Correlation Modeling, Operational Analytics Application
• Adapters: Data transformation, Data encryption, Data harmonize, User Defined Adapter
• Datasets: Timeseries Dataset, OLAP cube Dataset, Threshold Dataset, ObjectStore Dataset, KeyValue Dataset, User Defined Dataset
• System Services: Notification Service, Metrics & Logging Service, Metadata Management Service, Transaction Service, Security Service, Data Discovery and Management Service, App Deployment & Management Service, Config / Preference Management Service
• Tools: Console, REST API, CLI, Testing Framework, Performance* Framework
Data App Use-cases
Industries: Telco & Media, Financial Services, SaaS
Web Log Analytics
Ad Targeting
Event Monitoring
IoT / Mobile Apps
Market Data Services
Network Security
Fraud Detection
Consumer Targeting
Customer Call Centers
Social Media Monitoring
Network Optimization
Location Analytics
Data Apps turn insights into action.
Data Applications combine real-time events with historical data to deliver actionable, operational intelligence.
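As a rough illustration of combining real-time events with historical data, the sketch below flags incoming values that deviate strongly from a historical baseline, in the spirit of the anomaly detection use case. It is a generic z-score check, not a method described in these slides; the function name, the threshold of 3.0, and the sample numbers are all assumptions made for the example.

```python
from statistics import mean, stdev


def is_anomalous(history: list, value: float, threshold: float = 3.0) -> bool:
    """Return True if value lies more than `threshold` sample standard
    deviations away from the mean of the historical values."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # Flat history: any change at all counts as a deviation.
        return value != mu
    return abs(value - mu) / sigma > threshold


# Historical data (e.g. produced by a batch job); values are made up.
historical = [100, 102, 98, 101, 99]

# Real-time events arriving one at a time; values are made up.
incoming = [100, 180, 97]

alerts = [v for v in incoming if is_anomalous(historical, v)]
```

The "action" part of a data application would hang off `alerts`, for example by sending a notification for each flagged value.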
CDAP on Hadoop compared to Hadoop alone
• Lines of code: 82% reduction
• Development time: 86% reduction
Other advantages
• Cyclomatic complexity
• Testability
• Code readability and maintenance
• Application deployment and maintenance
• Egress support for application data
• Knowledge transfer
Actual Developer’s Experience: Top 5 SaaS Company
Cask OSS Technologies
• CASK DATA APP PLATFORM (cdap.io or cask.co/product): Data application platform for Hadoop
• tigon.io: Real-time streaming for the real world w/ AT&T Labs
• coopr.io: Clusters with a click
• tephra.io: Transaction for Apache HBase
• Thread Abstraction on YARN
Thank You
@nmotgi
Build Analytics App Today
Download : http://cask.co/downloads