8/11/2019 OpenSource DWBI_A Primer
1/50
OPEN SOURCE DATA WAREHOUSE/BI - A PRIMER
Webinar session for TechGig.com
Presenter: Parthasarathi Doraisamy
Enterprise BIDI Solutions
CLOUD --WHAT DOES THIS MEAN?
UC Berkeley RAD Lab definition:
1. The illusion of infinite computing resources available on demand, thereby eliminating the need for cloud computing users to plan far ahead for provisioning;
2. The elimination of an up-front commitment by cloud users, thereby allowing companies to start small and increase hardware resources only when there is an increase in their needs; and
3. The ability to pay for use of computing resources on a short-term basis as needed (e.g., processors by the hour and storage by the day) and release them as needed, thereby rewarding conservation by letting machines and storage go when they are no longer useful.
REFERENCES/ACKNOWLEDGEMENT
Talend, Pentaho, BIRT (Eclipse)
Birst, Jaspersoft, Greenplum, ASA, ODW model, Gartner research analysis, TDWI
WHAT IS OPEN DW/BI?
Beware: "open" does not mean the product(s) are free!
Open DW consists of a pre-designed, pre-built data warehouse architecture which comes free
It thereby reduces overall cost and risk by reducing design, development and implementation time
-> Reduces the consumer's initial development cost (DQ, ETL, BI & analytics, etc.)
But the vendors charge for related services: maintaining the DW solution, customizing it further to the exact business need, and support & maintenance of the system
Mitigates risk through rapid development
There are technical, social, and economic reasons that will move data warehousing, and perhaps all data models, toward open solutions
NEED FOR OPEN DW/BI
Open data warehouse and BI development has progressed rapidly over the past few years, driven by the economic downturn
Dynamic business changes demand faster deployment of the proposed solution
Nowadays an open source product is available for almost every aspect of the BI/data warehouse stack, including architectures, which are picking up pace (a few noticeable players: Talend, Pentaho, Jaspersoft, Birst, QlikView, etc.)
INDUSTRY STATS ON TRADITIONAL DWBI
The average cost of these projects was $2.2 million ($3.1 million today, adjusted for inflation).
The average payback period was 2.3 years, with over 30% experiencing a 5+ year payback period.
The majority of respondents reported that their data warehouses consumed enormous resources and remained works in progress for extended periods of time.
NEED FOR OPEN DW/BI..
Popular open source databases that help in building these open data warehouses are MySQL (and its ecosystem of add-ons), Ingres, and EnterpriseDB.
Hardware and software costs are further reduced by extending the open solution into a hosted SaaS environment.
ODW MODEL - A FRAMEWORK
The Open Data Warehouse Model (ODWM) provides a generic framework for delivering an open data warehouse
This generic data warehouse model can be further fine-tuned to a specific industry
Domain experts work on these industry-specific solutions just as in typical proprietary DW/BI solutions, but the approach differs in certain critical aspects, such as the pre-design of the open DW/BI architecture: data model, ETL design, and BI design for the industry domains concerned
ODW MODEL PRINCIPLE
The open data model consists of hundreds of potential dimension tables with thousands of fields, which form the foundation
These open data warehouses are carefully designed to ensure the stability of the DW system and to facilitate the use of commercial ETL bridges/connectors (yet allow for interpretation through aggregation and by other means)
OLAP cubes and data marts can be constructed from the foundation as required by the business through similar bridges/connectors
This is an opportunity for developers in their respective technology areas (ETL, BI & analytics) to come up with appropriate bridge solutions that seamlessly develop the entire ODW & BI model into a functional data mart or enterprise data warehouse
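To make the idea of building a data mart from the foundation through aggregation concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names (dim_product, fact_sales, mart_sales_by_category) and the sample rows are hypothetical and are not part of the ODW model itself; a real foundation would have far more dimensions and would be loaded through ETL connectors.

```python
import sqlite3


def build_sales_mart(conn):
    """Aggregate a hypothetical foundation fact table into a small data mart."""
    cur = conn.cursor()
    # Foundation layer (assumed for illustration): one dimension, one atomic fact table.
    cur.execute("CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT)")
    cur.execute("CREATE TABLE fact_sales (product_id INTEGER, sale_date TEXT, amount REAL)")
    cur.executemany(
        "INSERT INTO dim_product VALUES (?, ?)",
        [(1, "Books"), (2, "Books"), (3, "Music")],
    )
    cur.executemany(
        "INSERT INTO fact_sales VALUES (?, ?, ?)",
        [
            (1, "2019-01-05", 10.0),
            (2, "2019-01-06", 15.0),
            (3, "2019-01-06", 7.5),
            (1, "2019-02-01", 20.0),
        ],
    )
    # Data mart: the same facts, pre-aggregated by category and month.
    cur.execute(
        """
        CREATE TABLE mart_sales_by_category AS
        SELECT p.category AS category,
               substr(f.sale_date, 1, 7) AS month,
               SUM(f.amount) AS total_amount
        FROM fact_sales f
        JOIN dim_product p ON p.product_id = f.product_id
        GROUP BY p.category, substr(f.sale_date, 1, 7)
        """
    )
    conn.commit()
    return cur.execute(
        "SELECT category, month, total_amount "
        "FROM mart_sales_by_category ORDER BY category, month"
    ).fetchall()


if __name__ == "__main__":
    for row in build_sales_mart(sqlite3.connect(":memory:")):
        print(row)
```

The mart is just a derived table, so it can be dropped and rebuilt from the foundation whenever the business asks for a different grain.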
ODW MODEL & ITS EXTENSIONS..
They must allow for integration of multiple data sources of different granularity, and should, in some manner, accommodate slowly changing dimensions
Each baseline ODW database instance model can further yield a range of domain-specific packaged solutions (we can call each an "industry slice"). These packages may comprise DQ, ETL, and BI solutions as outlined earlier.
These packaged solutions include hosting the domain-specific ODW solution(s) in the cloud.
These hosted open DW/BI solutions lead us to packaged data warehouse/BI appliances
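One common way to accommodate slowly changing dimensions is the "Type 2" approach, where a change closes the current row and appends a new version. The sketch below is an illustration under assumptions: the CustomerRow fields and the apply_scd2 helper are hypothetical names, and the slides do not say which SCD type an ODW model uses.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class CustomerRow:
    """One version of a customer in a hypothetical dimension table."""
    customer_id: int
    city: str
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current version


def apply_scd2(history: List[CustomerRow], customer_id: int,
               new_city: str, change_date: date) -> List[CustomerRow]:
    """Type 2 change: close the current row and append a new version."""
    current = next(
        (r for r in history if r.customer_id == customer_id and r.valid_to is None),
        None,
    )
    if current is not None:
        if current.city == new_city:
            return history  # attribute unchanged, keep the current version
        current.valid_to = change_date
    history.append(CustomerRow(customer_id, new_city, change_date))
    return history


if __name__ == "__main__":
    rows = [CustomerRow(1, "Chennai", date(2018, 1, 1))]
    apply_scd2(rows, 1, "Bangalore", date(2019, 8, 11))
    for r in rows:
        print(r)
```

Keeping both versions lets reports join facts to the attribute values that were valid when each fact occurred.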
OPEN DATAWAREHOUSE/BI APPLIANCE
OPEN DWBI APPLIANCES
The open DW/BI appliance combines and supports thousands of data warehouses, many of those with hundreds of millions of records, in a scalable multi-tenant environment.
These appliances have the capability to generate complex data models and have complex algorithms built into their query engine
These appliance vendors tie up with hardware suppliers to construct the appliance so that it performs at maximum efficiency
OPEN DWBI APPLIANCES
These appliances are designed to power an on-demand software solution that needs to support a large number of users simultaneously and has the ability to quickly increase capacity
Built on a shared-nothing architecture: no data is shared across nodes (servers).
Popular appliances are Netezza, Greenplum, etc.
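A shared-nothing design can be sketched in a few lines: each row is owned by exactly one node, chosen by hashing a distribution key, and a query is answered by combining per-node partial results. This is a toy illustration under assumptions, not how any particular appliance is implemented; the function names and the choice of CRC32 as the hash are hypothetical.

```python
import zlib
from collections import defaultdict


def node_for(key: str, node_count: int) -> int:
    """Pick the owning node from a stable hash of the distribution key."""
    return zlib.crc32(key.encode("utf-8")) % node_count


def distribute(rows, key_field, node_count):
    """Assign every row to exactly one node; no row is stored twice."""
    nodes = defaultdict(list)
    for row in rows:
        nodes[node_for(str(row[key_field]), node_count)].append(row)
    return nodes


def parallel_sum(nodes, value_field):
    """Each node sums only its own rows; a coordinator adds the partial sums."""
    partials = [sum(r[value_field] for r in rows) for rows in nodes.values()]
    return sum(partials)


if __name__ == "__main__":
    data = [{"customer": f"c{i}", "amount": i} for i in range(100)]
    cluster = distribute(data, "customer", node_count=4)
    print({node: len(rows) for node, rows in cluster.items()})
    print(parallel_sum(cluster, "amount"))
```

Because nodes never read each other's rows, capacity can be increased by adding nodes and redistributing the keys.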
MULTIPLE APPLIANCES FOR ENTERPRISE NEED
DWBI APPLIANCES SALIENT FEATURES
High Availability and Failover Support
Designed for operation in a high-availability clustered open DW/BI environment
Global Cache
Provides superior query performance via its massive-scale caching capabilities
Simplified Software Deployment and Upgrades in Place
Dramatically simplifies deployment by freeing IT from having to worry about resolving potentially complex OS compatibility issues, library dependencies or undesirable interactions with other applications.
DWBI APPLIANCES SALIENT FEATURES.
Advanced ETL services and a complete analytical data warehouse with automated warehouse generation
Cloud connectors for connecting to operational cloud applications, e.g. Salesforce.com, Google Analytics
These connectors allow for automatic uploading of data into the appliance from various sources
Live Access, which allows you to analyze data from on-premise data warehouses without uploading
SAAS BASED OPEN BI SOLUTION
SAAS OPEN BI SOLUTION..
Low-cost, open source solution.
End-to-end, integrated BI and ETL capabilities.
Full enterprise-level support.
Flexibility of on-demand and on-premise deployment.
Support for mobile devices as a BI platform.
Support for iterative IT and business-user report generation process.
CLOUD --WHAT DOES THIS MEAN?
Depends upon how you slice it vertically
IaaS - AWS, GoGrid, Mosso
PaaS - Google App Engine, Microsoft Azure
SaaS (BaaS) - Salesforce, Talend, Jaspersoft, Pentaho, BIRT, etc.
AGILE BI - FASTER, CHEAPER, BETTER
CLOUD --WHAT DOES THIS MEAN?
ODW -WHEN TO USE THE CLOUD?
Transient application lifespan or use
Quick start required
Budget pressure
Variable use/scale of application unknown
IT unavailable/unresponsive
SAAS OPEN DWBI
KEY FINDINGS FOR BUSINESS TRANSITION TO CLOUD TECHNOLOGY (2009)
By 2012, at least 50% of direct commercial revenue attributed to open-source products or services will come from projects under a single vendor's patronage.
Through 2011, less than 50% of Global 2000 IT organizations will have implemented a formal open-source adoption and management policy as part of an enterprise software asset management strategy.
Through 2013, 50% of mainstream IT projects using open-source software (OSS) will not achieve cost savings over closed-source alternatives.
Through 2013, 90% of market-leading, cloud-computing providers will depend on OSS to deliver products and services.
MOVING TO CLOUD-RECOMMENDATIONS
Expect vendors to play an increasing role in the governance of many market-leading, open-source solutions during the next several years.
Move aggressively to establish an effective enterprise adoption policy, and bring OSS and hardware under asset management controls.
Do not expect to automatically save money with OSS or any technology without effective financial management. Do expect to carefully manage open-source solutions in the appropriate scenarios to realize total cost of ownership (TCO) advantages.
Manage cloud-based software strategies and open-source strategies together for maximum effect. Look for synergies between both, and the ability of OSS to move your workloads to the cloud.
STRATEGIC PLANNING ASSUMPTION(S)
By 2012, at least 50% of direct commercial revenue attributed to open-source products or services will come from projects under a single vendor's patronage.
Through 2011, less than 35% of Global 2000 IT organizations will have implemented a formal open-source adoption and management policy.
Through 2013, 50% of mainstream IT projects using OSS will not achieve cost savings over closed-source alternatives.
Through 2013, 90% of market-leading, cloud-computing providers will depend on OSS to deliver products and services.
CLOUD USAGE BY VARIOUS ORGANIZATIONS..
OPENSOURCE BI TOOLS
TDWI RESEARCH STUDY
SAAS BI PROCESS FLOW
HARDWARE ACCESS IN CLOUD OPEN DW BI
Secure access via web, RDC, VPN or a combination
Customized server (choose your own CPU, RAM, disk space)
Scale up your capacity anytime
Level 2 and 3 server support, including 24x7 monitoring service
Application support on demand
Integrate with your local & global IT groups
SECURITY ASPECTS IN CLOUD OPEN DW BI
Web, RDC, VPN or a combination
Firewalls
Certified data center (SAS 70 Type II)
NDA
Virus protection
MDM
MDM success for enterprise open source DW/BI implementation
High-quality master data is extremely valuable to enterprise business processes and analytics
MDM-KEY CONSIDERATIONS
Some key considerations for creating a master reference data source are outlined below:
Central master reference data model
Mapping
Populating the master
Publish data
Access and provisioning
Ownership and process
MDM CHECKLIST
MDM gives the enterprise a single version of the truth across its various applications (despite the disparity of source systems)
The following checklist provides functional requirements for implementing and deploying MDM in an enterprise environment:
MDM CHECKLIST FUNCTIONALITY COVERED
Profiling
Modeling
Data quality
Data Stewardship & Governance - hierarchy management & security
Workflow administration
MDM-ACTIVE DATA MODEL
Multi-Domain capability
Object-Oriented Data Modeling
Domain Templates
Basic Data Validations and Business Rules
Graphical Modeling Tool
Multiple Language Support
MDM-DOMAIN INTEGRATION
Complete Data Integration Functionality
Automated Services-Based Integration
Real-Time and Batch Integration
SOA Manager/Console
MDM-DQ INTEGRATION WITH ETL,BI
Data Profiling
Accurate Data Match and Merge
Data Bucketing and Blocking
Data Augmentation
Advanced Data Validations and Business Rules
Data Standardization
Data Cleansing
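Several of these data quality steps fit together in a small pipeline: standardize values, group records into blocks so that only likely duplicates are compared, then score candidate pairs. The sketch below is a rough illustration under assumptions: the record fields (id, name, postcode), the blocking key, and the 0.85 threshold are hypothetical, and difflib's SequenceMatcher from the Python standard library stands in for whatever matching algorithm a real MDM tool uses.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def standardize(name: str) -> str:
    """Data standardization: lower-case, trim, and collapse whitespace."""
    return " ".join(name.lower().split())


def blocking_key(record: dict) -> str:
    """Cheap key so that only records in the same block are compared."""
    return standardize(record["name"])[:1] + "|" + record["postcode"].strip()


def find_matches(records, threshold=0.85):
    """Return id pairs whose standardized names are similar within a block."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[blocking_key(rec)].append(rec)
    matches = []
    for block in blocks.values():
        for i in range(len(block)):
            for j in range(i + 1, len(block)):
                a, b = block[i], block[j]
                score = SequenceMatcher(
                    None, standardize(a["name"]), standardize(b["name"])
                ).ratio()
                if score >= threshold:
                    matches.append((a["id"], b["id"]))
    return matches


if __name__ == "__main__":
    sample = [
        {"id": 1, "name": "Acme Corp ", "postcode": "600001"},
        {"id": 2, "name": "ACME  Corp", "postcode": "600001"},
        {"id": 3, "name": "Apex Ltd", "postcode": "600001"},
        {"id": 4, "name": "Acme Corp", "postcode": "560001"},
    ]
    print(find_matches(sample))
```

Blocking trades recall for speed: record 4 is never compared with records 1 and 2 because its postcode puts it in a different block, which is why the choice of blocking key matters.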
MDM-DATA STEWARDSHIP & GOVERNANCE
Hierarchy Management
Multiple and Recursive Hierarchies
Hierarchy Import and Overlays
Business Process Management (BPM) and Workflow
Automated Data Survivorship
Manual Resolution through intuitive GUI interface
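Automated data survivorship means deciding, field by field, which source value survives into the master ("golden") record once duplicates have been matched. A minimal sketch, assuming a "most recently updated non-empty value wins" rule; the survive function, the updated field, and the rule itself are hypothetical, and real MDM hubs usually let stewards configure rules per attribute.

```python
from datetime import date


def survive(records, fields):
    """Build a golden record: per field, take the newest non-empty value."""
    golden = {}
    newest_first = sorted(records, key=lambda r: r["updated"], reverse=True)
    for field in fields:
        # Fall back to None when no source record has a value for the field.
        golden[field] = next((r[field] for r in newest_first if r.get(field)), None)
    return golden


if __name__ == "__main__":
    duplicates = [
        {"name": "Acme Corp", "phone": "", "updated": date(2019, 8, 1)},
        {"name": "Acme Corporation", "phone": "044-1234", "updated": date(2019, 1, 1)},
    ]
    print(survive(duplicates, ["name", "phone", "email"]))
```

Fields that end up as None are natural candidates for the manual resolution step listed above.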
MDM-ADMINISTRATION
Historical Views of Hub Data
Hub Versioning
Master Data Audit Trail Information
Roles-Based Security and Active Directory Integration
Versioning
TALEND MDM SOLUTION OS PRODUCTS
IBM Eclipse; JBoss Application Server and Portal; eXist open database;
XSD / XML Schema for the XML data models;
XSLT for data transformation;
Object programming following the EJB 2.1 standards ("Enterprise JavaBeans") on JBoss server
XQuery for queries on the XML database;
Document/literal WS-I norm ("Web Services Interoperability") for web services
Bonita for business process management.
COST COMPARISON
E.g.: total cost for a small project, comparing three approaches to data integration: open source, proprietary and manual coding
SUMMARISED COST-SMALL ETL PROJECT
SUMMARY COST FOR MEDIUM ETL PROJECT
ODW /BI --WHY IT WILL SUCCEED IN MARKET
ODW/BI has a lot of (financial) winner groups:
Owners get low-cost, rapid entry into data warehouses they can extend.
Developers get to create/sell new ETL/BI products in a new market (tool providers).
Source vendors can solve reporting problems and advance new ways to compete (source providers).
Consultants get a bigger market for their services (service providers).
Domain experts can participate by creating new open data warehouses using their deep industry knowledge (service providers).
ODW /BI --WHY IT WILL SUCCEED IN MARKET
Development licenses
Training curve
Development time
Run-time licenses
Deployment of hardware and operating system licenses
IT operations
ODW /BI --WHY IT WILL SUCCEED IN MARKET
Maintenance/subscription
Maintenance time
Reliability and predictability of the data integration processes
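The cost factors listed on these slides can be combined into a simple total-cost-of-ownership calculation. The sketch below is only an illustration: which factors count as one-off versus recurring is an assumption, the dictionary keys are hypothetical names, and the figures in the usage example are placeholders rather than data from the cost comparison slides.

```python
# Assumed split of the listed cost factors (not stated in the slides).
ONE_OFF = ("development_licenses", "training", "development_time", "deployment")
RECURRING = ("runtime_licenses", "it_operations",
             "maintenance_subscription", "maintenance_time")


def total_cost(costs: dict, years: int) -> float:
    """One-off costs are paid once; recurring costs are paid every year."""
    one_off = sum(costs.get(k, 0.0) for k in ONE_OFF)
    recurring = sum(costs.get(k, 0.0) for k in RECURRING)
    return one_off + recurring * years


if __name__ == "__main__":
    # Placeholder figures for illustration only.
    example = {"development_time": 100.0, "maintenance_subscription": 10.0}
    print(total_cost(example, years=3))
```

Comparing approaches then amounts to calling total_cost once per approach with its own cost dictionary over the same number of years.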
QUESTIONS?
Any questions? Please get in touch with me at:
Skype - ebidisolutions
Thank You!