Hadoop in Business Intelligence - Ganesh Hegde
What is BI?
Business intelligence (BI) is the set of techniques and tools for the transformation of raw data into meaningful and useful information for business analysis purposes.
The goal of BI is to allow for the easy interpretation of large volumes of data. Identifying new opportunities and implementing an effective strategy based on insights can provide businesses with a competitive market advantage and long-term stability.
-Wikipedia
Some of the BI activities are:
- ETL process
- Data warehousing
- Dimensional modeling
- Online Analytical Processing (OLAP)
- Constructing multidimensional cubes
- Performance tuning
- Reporting
- Analytics
- Data mining
- Predictive analytics and prescriptive analytics
Facts in BI
For at least a couple of decades, a number of different data structures and technologies have been introduced to increase performance or enable a BI capability. Many of these are self-service oriented, and they deliver different levels of capability depending on the problem they are intended to solve.
For example, operational data is often moved and transformed into an operational data store (ODS), then into an enterprise data warehouse (EDW) or an OLAP system, to improve performance for business users, particularly for interactive analysis. Business rules are needed to interpret the data and to enable BI capabilities such as drill up/drill down. The more business rules are built into the data stores, the less modeling effort is needed between the curated data and the BI deliverable.
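The drill up/drill down capability mentioned above can be illustrated with a minimal Python sketch. It assumes a time hierarchy of year, quarter, and month as the business rule; the `SALES` rows, the `HIERARCHY` tuple, and the `roll_up` helper are all hypothetical names invented for this illustration, not part of any BI product.

```python
from collections import defaultdict

# Hypothetical curated sales rows: (year, quarter, month, amount).
SALES = [
    (2014, "Q1", "Jan", 100),
    (2014, "Q1", "Feb", 150),
    (2014, "Q2", "Apr", 200),
    (2015, "Q1", "Jan", 120),
]

# The assumed business rule: the order of levels in the time hierarchy.
HIERARCHY = ("year", "quarter", "month")


def roll_up(rows, level):
    """Sum amounts at the given hierarchy level ('year', 'quarter' or 'month')."""
    depth = HIERARCHY.index(level) + 1
    totals = defaultdict(int)
    for row in rows:
        key = row[:depth]       # keep only the levels down to `level`
        totals[key] += row[-1]  # the last field is the amount
    return dict(totals)


by_year = roll_up(SALES, "year")        # drill up to the coarsest level
by_quarter = roll_up(SALES, "quarter")  # drill down one level
```

Because the hierarchy is stored with the data, moving between levels is only a change of the `level` argument. This is the sense in which rules built into the data store reduce later modeling effort.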
What is Hadoop?
Apache Hadoop is an open-source software framework for storage and large-scale processing of data sets on clusters of commodity hardware.
-Wikipedia
Hadoop can handle Big Data (data of high volume and velocity that is unstructured or semi-structured).
Hadoop Ecosystem
There are many projects around Hadoop that extend and enhance its functionality. Some of them are:
- Hive: an abstraction layer that lets users query the Hadoop cluster with an SQL-like language.
- HBase: a column-oriented NoSQL database that can store massive amounts of data.
- Sqoop: provides methods to import and export data between an RDBMS and the Hadoop Distributed File System (HDFS).
Analytics on Hadoop can be done with the following software:
- R
- SAS
- Matlab
However, doing analytics this way can feel tedious and time-consuming. Business intelligence tools (BI tools) can address this problem.
BI Tools Support Hadoop
Columns: BI Tools | Connect with (Hadoop, HBase, Cassandra, MongoDB) | Manage, import/export data into/out of Hadoop | Pre-built analytics | Predictive analytics | Time series | Interactive dashboards
Datameer Y Y Y Y Y Y
Tableau Y Y
Pentaho Y Y Y Y Y Y Y Y
Because Hadoop is gaining popularity for handling Big Data, many BI tools have added support for it. The table above lists some of them.
In this traditional system, if any of the operational systems produces Big Data (high volume, high variety, unstructured), the subsequent jobs fail to process it.
In this scenario, different kinds of data from various sources are handled by Hadoop, and the data is finally loaded into the target systems.
Data from the OLAP system and the Hadoop cluster is virtualized. Analysts use and modify the virtualized data in a dashboard.
Historical data in the DW can be archived to the Hadoop cluster, from which it can be fetched with SQL (by means of Hive).
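The archiving idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `WAREHOUSE` rows, the cutoff date, the `sales_archive` table name, and both helper functions are hypothetical, and the code builds the query text without connecting to a Hive server.

```python
from datetime import date

# Hypothetical warehouse fact rows: (order_date, amount).
WAREHOUSE = [
    (date(2010, 3, 1), 50),
    (date(2012, 7, 15), 75),
    (date(2014, 1, 10), 90),
]


def split_for_archive(rows, cutoff):
    """Return (current, archived): rows older than `cutoff` go to the archive."""
    current = [r for r in rows if r[0] >= cutoff]
    archived = [r for r in rows if r[0] < cutoff]
    return current, archived


def archive_query(table, cutoff):
    """Build the SQL text an analyst might send to Hive to fetch archived rows."""
    return (
        f"SELECT order_date, amount FROM {table} "
        f"WHERE order_date < '{cutoff.isoformat()}'"
    )


cutoff = date(2013, 1, 1)  # assumed archiving boundary
current, archived = split_for_archive(WAREHOUSE, cutoff)
query = archive_query("sales_archive", cutoff)
```

In a real deployment the archived rows would be written to the Hadoop cluster and the query submitted through a Hive client. Both steps are omitted here because they depend on the installation.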
Data from the OLAP system and the NoSQL database is virtualized. Analysts use and modify the virtualized data in a dashboard.