dataeaze-systems
Different analytics use cases expect different sets of features from a data platform.
Components of the big data ecosystem
are made to serve the features those analytics use cases need.
dataeaze
Why?
So, to understand data platform components,
it is necessary to know their purpose: the needs of the analytics use cases
that the data platform serves.
Why?
We analyse each use case in two parts:
Nature of the data processing needed to serve the use case
Expectations from the data platform to enable that data processing
What?
Static Reports
Static reports are summary reports prepared to give decision makers a status update.
Example: An end-of-day report for top management covering
daily sales, transactions, revenue, and total traffic.
Nature of data processing
Static reports are:
Scheduled to execute at a fixed time interval
Generated for a given time period
Able to run on raw data directly or on an intermediate store
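The points above can be illustrated with a minimal sketch. It assumes raw data is a list of transaction records with hypothetical `day` and `amount` fields; in practice a scheduler would call the function at a fixed interval rather than the script calling it once.

```python
from collections import defaultdict
from datetime import date

# Hypothetical raw records; the field names are assumptions for illustration.
RAW_TRANSACTIONS = [
    {"day": date(2024, 1, 1), "amount": 120.0},
    {"day": date(2024, 1, 1), "amount": 80.0},
    {"day": date(2024, 1, 2), "amount": 50.0},
]

def daily_sales_report(records, start, end):
    """Summarise transactions per day for the period [start, end]."""
    report = defaultdict(lambda: {"transactions": 0, "revenue": 0.0})
    for rec in records:
        if start <= rec["day"] <= end:
            row = report[rec["day"]]
            row["transactions"] += 1
            row["revenue"] += rec["amount"]
    return dict(report)

# One run of the report for a single day, computed directly from raw data.
report = daily_sales_report(RAW_TRANSACTIONS, date(2024, 1, 1), date(2024, 1, 1))
print(report)
```

The report period is a parameter, so the same job can be re-run for any day or range without changing the code.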
Static Reports
Expectations from data platform
Scheduled data processing: Static reports are executed repeatedly on a predefined schedule.
Timely arrival of data: Generated reports should represent the complete picture of the given timeframe and should be generated before the deadline.
Process raw data to get result: Capability to generate a report from raw data if it cannot be extracted from an intermediate data form.
Static Reports
Dashboard Reports
A dashboard is a reporting user interface where users can interactively choose their own view of the data with a limited set of filters.
Example: An e-commerce company provides a dashboard for sellers, where sellers can see how much inventory was sold across demographics, product categories, and time ranges.
Nature of data processing
Periodically process raw data to bring it into the form required by dashboards
Populate the transformed data into the interactive store that backs the dashboards
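As a rough sketch of these two steps, the example below pre-aggregates hypothetical raw sales events and then answers dashboard filters from the aggregated store only. The field names and the plain dictionary standing in for the interactive store are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical raw sales events.
RAW_SALES = [
    {"seller": "s1", "category": "books", "region": "north", "units": 3},
    {"seller": "s1", "category": "books", "region": "south", "units": 2},
    {"seller": "s1", "category": "toys", "region": "north", "units": 5},
    {"seller": "s2", "category": "books", "region": "north", "units": 1},
]

def run_etl(raw_events):
    """Pre-aggregate units sold per (seller, category, region)."""
    store = defaultdict(int)
    for e in raw_events:
        store[(e["seller"], e["category"], e["region"])] += e["units"]
    return dict(store)

def query(store, seller, category=None, region=None):
    """Answer a dashboard filter from the pre-aggregated store only."""
    return sum(
        units
        for (s, c, r), units in store.items()
        if s == seller
        and (category is None or c == category)
        and (region is None or r == region)
    )

# The ETL job would be re-run periodically to refresh the store.
dashboard_store = run_etl(RAW_SALES)
print(query(dashboard_store, "s1", category="books"))
```

Because the dashboard offers only a limited set of filters, the store can be keyed on exactly those dimensions, which keeps each lookup cheap.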
Dashboard
Expectations from data platform
ETL: To convert raw data into the format required by the dashboard.
Scheduled data processing: Timely, repeated executions of ETL jobs to populate dashboards with the latest updates.
Interactive data store: Dashboard reports are interactive in nature, so the backend store is expected to return results in near real time.
Dashboard
Ad Hoc data analysis
This covers business queries that are raised as the need arises. Such a query is not scheduled and is executed once, whenever necessary.
Example: A product manager wants a detailed analysis of customer behavior on a navigation panel, in order to define optimised ad placements.
Nature of data processing
Steps to serve an ad hoc report:
Identify the data sources that will satisfy the given request
Execute data processing (preferably a SQL-like query) on the identified source
Load the results into a data representation tool
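The three steps can be sketched with Python's built-in `sqlite3` module. The `clicks` table, its columns, and the sample rows are hypothetical stand-ins for a real data source, and printing stands in for a data representation tool.

```python
import sqlite3

# Step 1: identify the source. An in-memory table stands in for it here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clicks (user_id TEXT, panel_item TEXT)")
conn.executemany(
    "INSERT INTO clicks VALUES (?, ?)",
    [("u1", "home"), ("u2", "home"), ("u1", "deals"), ("u3", "home")],
)

# Step 2: run a one-off SQL query on the identified source.
rows = conn.execute(
    "SELECT panel_item, COUNT(*) AS n_clicks "
    "FROM clicks GROUP BY panel_item "
    "ORDER BY n_clicks DESC, panel_item"
).fetchall()

# Step 3: hand the result to a presentation layer (printing, in this sketch).
for item, n_clicks in rows:
    print(f"{item}: {n_clicks}")

conn.close()
```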
Ad Hoc
Expectations from data platform
Data processing SQL engine: A SQL query engine makes it easy to express the required analysis as a SQL query, which saves the analyst’s time.
Complex data processing: A platform that supports writing custom, complex data analysis that is not possible through SQL.
Ad Hoc
BI Reporting
Business Intelligence tools provide advanced general-purpose dashboards that host a wide array of dimensions in the backend data store. Users can define and save transformations and analysis queries through the BI tool and get back reports in tabular or graphical form.
Example: A BI report showing weekly sales stats across multiple regions for the previous 6 months. The report is created and saved once; users execute the saved report whenever they want.
Nature of data processing
Scheduled ETL jobs convert raw data to the required intermediate data form
Data is loaded into interactive SQL data stores
BI tools connect to the SQL data store as their backend
BI Reporting
Expectations from data platform
ETL: Raw data should be transformed to the required format and loaded into a SQL data warehouse.
Scheduling of ETL: Defined ETL jobs should be scheduled to execute at a fixed time interval.
Data processing SQL engine: A SQL query engine makes it easy to extract data and saves time. BI tools can connect to this SQL data store.
BI Reporting
Data Processing for Applications
This is data processing done to provide feedback input to business applications. Business applications take better decisions based on the latest data feedback.
Example: Ad servers are periodically updated with the latest minimum eCPM to expect for an ad placement that is filled dynamically.
Nature of data processing
Complex data processing (machine learning) on raw data
Scheduled data processing
Update results into an interactive key-value store, from which applications fetch them directly
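A minimal sketch of this flow follows. A simple per-placement mean stands in for real machine learning, a plain dictionary stands in for the key-value store, and the raw log format is hypothetical; all three are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw log: (ad placement, eCPM observed for a filled impression).
RAW_FILLS = [
    ("banner_top", 2.0), ("banner_top", 4.0),
    ("sidebar", 1.0), ("sidebar", 1.5), ("sidebar", 2.0),
]

def compute_ecpm_floors(fills):
    """Stand-in for a real model: use the mean observed eCPM per placement."""
    by_placement = defaultdict(list)
    for placement, ecpm in fills:
        by_placement[placement].append(ecpm)
    return {p: mean(values) for p, values in by_placement.items()}

# A dict stands in for the interactive key-value store.
kv_store = {}

def scheduled_job():
    """One scheduled run: process raw data, then publish the results."""
    kv_store.update(compute_ecpm_floors(RAW_FILLS))

scheduled_job()

# The application (an ad server in the example) reads by key.
floor = kv_store["banner_top"]
print(floor)
```

The application never touches raw data; it only performs key lookups against results that the scheduled job last published.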
App data processing
Expectations from data platform
Capability to implement custom complex data processing: Users should be able to easily define custom, complex data processing algorithms (like machine learning).
Scheduled data processing: Required for periodic execution of data processing jobs.
App data processing
Real time stream data processing
This means analysing an event as soon as it happens. The sooner the analysis, the greater the value obtained from it.
Example: A stock ticker displayed on Yahoo Finance.
Nature of data processing
As soon as an event happens, its log entry is collected.
All log entries are buffered and made available to the processing layer.
The processing layer pulls records from the message buffer and processes them.
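The collect, buffer, and pull steps can be sketched in a single process with Python's standard `queue` and `threading` modules. The in-memory queue is only a stand-in for a scalable message buffer, and the event fields are hypothetical.

```python
import queue
import threading

buffer = queue.Queue()  # stands in for a message buffer
latest = {}             # latest price seen per symbol
STOP = object()         # sentinel that tells the consumer to finish

def collect(events):
    """Collector: push each log entry to the buffer as soon as it occurs."""
    for event in events:
        buffer.put(event)
    buffer.put(STOP)

def process():
    """Processing layer: pull records from the buffer and handle each one."""
    while True:
        event = buffer.get()
        if event is STOP:
            break
        latest[event["symbol"]] = event["price"]

# Hypothetical ticker events.
events = [
    {"symbol": "AAA", "price": 10.0},
    {"symbol": "BBB", "price": 5.0},
    {"symbol": "AAA", "price": 10.5},
]

consumer = threading.Thread(target=process)
consumer.start()
collect(events)
consumer.join()
print(latest)
```

The buffer decouples the two sides: the collector never waits for processing to finish, and the processing layer pulls records at its own pace.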
Real time stream
Expectations from data platform
Scalable message buffer: A message buffer that holds received messages until they are pulled for processing.
Real time stream processing engine: To pull and process records in real time, and to give users the ability to define custom data processing.
Real time stream
Super set of expectations
Expectation / Capability
Complex data analysis using query language
Scheduled ETL data processing
Data store for interactive data analysis
Data ingestion with timely arrival of data
Scalable message buffer to be consumed by stream data processing
Streaming data processing platform
Use case (needed by)
Static reports
Ad hoc data analysis
BI reporting
Dashboard reports
App specific data processing
Real time stream data processing
Summarise all
We have identified a common set of features that most analytics use cases
expect from a data platform.
Let us map these to data platform components.
Conclude
Capabilities provided by data platform components
Expectation / Capability
Complex data analysis using query language
Scheduled ETL data processing
Data store for interactive data analysis
Data ingestion with timely arrival of data
Scalable message buffer to be consumed by stream data processing
Streaming data processing platform
Data platform component (supported by): Data platform tools
Data Ingestion: Flume, Kafka, Scribe
Batch data processing: Hive, MapReduce
Workflow scheduler: Oozie
Interactive data stores: HBase, Spark, ..
Message buffers: Kafka
Real time stream engine: Storm, Spark
Conclude
Going backwards
Now you know about:
Data platform components
The capabilities those components support
The features of analytics use cases that these capabilities satisfy
Conclude