OPEN SOURCE DATA WAREHOUSE/BI: A PRIMER
Webinar session for TechGig.com
Presenter: Parthasarathi Doraisamy
Enterprise BIDI Solutions
1
CLOUD --WHAT DOES THIS MEAN?
UC Berkeley RAD Lab definition:
1. The illusion of infinite computing resources available on
demand, thereby eliminating the need for Cloud Computing
users to plan far ahead for provisioning
2. The elimination of an up-front commitment by Cloud users,
thereby allowing companies to start small and increase hardware
resources only when there is an increase in their needs; and
3. The ability to pay for use of computing resources on a short term
basis as needed (e.g., processors by the hour and storage
by the day) and release them as needed, thereby rewarding
conservation by letting machines and storage go when they are
no longer useful.
2
REFERENCES/ACKNOWLEDGEMENT
Talend
Pentaho
BIRT (Eclipse)
Birst
Jaspersoft
Greenplum
ASA –ODW model
Gartner research analysis
TDWI
3
WHAT IS OPEN DW/BI?
Beware: "open" doesn't mean the product(s) are free!
Open DW consists of a pre-designed, prebuilt data warehouse architecture that comes free
It thereby reduces overall cost and risk by cutting design, development and implementation time
-> Reduces the consumer's initial development cost (DQ, ETL, BI & analytics, etc.)
But vendors charge for the related services: maintaining the DW solution, customizing it further to the exact business need, and support and maintenance of the system.
Mitigates risk through rapid development
There are technical, social, and economic reasons that will move data warehousing, and perhaps all data models, toward 'open' solutions
4
NEED FOR OPEN DW/BI
Open data warehouse and BI development has progressed rapidly over the past few years, driven by the economic downturn
Dynamic business changes demand faster deployment of the proposed solution
Nowadays an 'open source' product is available for almost every layer of the BI/data warehouse stack, including architectures, and these are picking up pace. (A few noticeable players: Talend, Pentaho, Jaspersoft, Birst, QlikView, etc.)
5
INDUSTRY STATS ON TRADITIONAL DWBI
The average cost of these projects was $2.2
million ($3.1 million today, adjusted for inflation).
The average payback period was 2.3
years, with over 30% experiencing a 5+ year
payback period.
The majority of respondents reported that their
data warehouses consumed enormous
resources and remained "works in progress" for
extended periods of time.
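The figures above can be related with simple arithmetic. The sketch below assumes an undiscounted payback model (cost divided by a constant annual net benefit); the survey does not state how payback was calculated, and a ratio of two averages is only a rough indicator, so the derived values are illustrative.

```python
def payback_years(upfront_cost, annual_net_benefit):
    """Simple (undiscounted) payback period in years."""
    if annual_net_benefit <= 0:
        raise ValueError("annual_net_benefit must be positive")
    return upfront_cost / annual_net_benefit


# Figures quoted on the slide (millions of USD, years).
AVG_COST_MUSD = 2.2
AVG_COST_TODAY_MUSD = 3.1
AVG_PAYBACK_YEARS = 2.3

# Cumulative inflation factor implied by the two cost figures.
inflation_factor = AVG_COST_TODAY_MUSD / AVG_COST_MUSD

# Annual net benefit an "average" project would need in order to pay back
# in the average period (assumption: constant benefit per year).
implied_annual_benefit = AVG_COST_MUSD / AVG_PAYBACK_YEARS

print(f"implied inflation factor: {inflation_factor:.2f}")
print(f"implied annual net benefit: ${implied_annual_benefit:.2f}M")
```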
6
NEED FOR OPEN DW/BI ….
Popular open source databases that support
these open data warehouses are MySQL
(and its ecosystem of add-ons),
Ingres and EnterpriseDB.
Hardware and software costs are
further reduced by extending the open
solution into a hosted SaaS environment.
7
ODW MODEL –A FRAMEWORK
The Open Data Warehouse Model (ODWM) provides a generic framework for delivering an open data warehouse
This generic data warehouse model can be fine-tuned further for a specific industry
Domain experts work on these industry-specific solutions much as they did on proprietary DW/BI solutions, but with one critical difference: the open DWBI architecture (data model, ETL design, BI design) is pre-designed for the industry domain concerned
8
ODW MODEL PRINCIPLE
The open data model consists of hundreds of potential dimension tables with thousands of fields, which together form the "Foundation"
These open data warehouses are carefully designed to ensure the stability of the DW system and to facilitate the use of commercial ETL bridges/connectors
(yet allow for interpretation through aggregation and by other means)
OLAP cubes and data marts can be built from the foundation as the business requires, through similar bridges/connectors
This is an opportunity for developers in their respective technology areas (ETL, BI and analytics) to come up with bridge solutions that turn the ODW and BI model into a functional data mart or enterprise data warehouse
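As a rough illustration of building a data mart from the foundation through aggregation, the sketch below uses Python's built-in sqlite3 module. The table names, columns and sample rows are hypothetical and are not part of the ODW model; a real foundation would hold far more dimensions and fields.

```python
import sqlite3

# In-memory database standing in for the warehouse "foundation".
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE fact_sales (product_id INTEGER, sale_date TEXT, amount REAL);

INSERT INTO dim_product VALUES (1, 'Hardware'), (2, 'Software');
INSERT INTO fact_sales VALUES
    (1, '2010-01-05', 100.0),
    (1, '2010-01-20', 50.0),
    (2, '2010-01-11', 200.0),
    (2, '2010-02-02', 25.0);

-- Data mart: detail rows aggregated to category and month.
CREATE TABLE mart_sales_by_category_month AS
SELECT p.category                 AS category,
       substr(s.sale_date, 1, 7)  AS month,
       SUM(s.amount)              AS total_amount
FROM fact_sales s
JOIN dim_product p ON p.product_id = s.product_id
GROUP BY p.category, substr(s.sale_date, 1, 7);
""")

mart_rows = conn.execute(
    "SELECT category, month, total_amount "
    "FROM mart_sales_by_category_month ORDER BY category, month"
).fetchall()

for row in mart_rows:
    print(row)
```

The mart is derived entirely from foundation tables, so it can be dropped and rebuilt whenever the business asks for a different grain.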
9
ODW MODEL & ITS EXTENSIONS…..
They must allow for integration of multiple data sources of different granularity, and should, in some manner, accommodate slowly changing dimensions
Each baseline ODW DB instance model can in turn produce a range of domain-specific packaged solutions (we can call each an industry 'slice'). These packages may comprise DQ, ETL and BI solutions, as outlined earlier.
These packaged solutions comprise
Host the domain-specific ODW solution(s) in the cloud.
These hosted open DWBI solutions lead us to packaged data warehouse/BI appliances
10
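One common way to accommodate slowly changing dimensions is the "Type 2" approach, which keeps every historical version of a dimension row with validity dates. The slides do not say which technique the ODW model uses, so the following is only a minimal sketch with hypothetical names (CustomerVersion, apply_scd2) and sample data.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class CustomerVersion:
    customer_id: int
    city: str
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current version


def apply_scd2(history: List[CustomerVersion], customer_id: int,
               new_city: str, change_date: date) -> None:
    """Close the current version and append a new one if the city changed."""
    current = next(
        (v for v in history
         if v.customer_id == customer_id and v.valid_to is None),
        None,
    )
    if current is not None:
        if current.city == new_city:
            return  # attribute unchanged: nothing to record
        current.valid_to = change_date
    history.append(CustomerVersion(customer_id, new_city, change_date))


history: List[CustomerVersion] = []
apply_scd2(history, 1, "Chennai", date(2009, 1, 1))
apply_scd2(history, 1, "Chennai", date(2009, 6, 1))    # no change, ignored
apply_scd2(history, 1, "Bangalore", date(2010, 3, 1))  # new version

for version in history:
    print(version)
```

Facts loaded before the change date keep pointing at the old version, so historical reports stay reproducible.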
OPEN DATAWAREHOUSE/BI APPLIANCE
11
OPEN DWBI APPLIANCES ……
The open DWBI appliance hosts and supports thousands of data warehouses, many with hundreds of millions of records, in a scalable multi-tenant environment.
These appliances can generate complex data models and have complex algorithms built into their query engines
Appliance vendors tie up with hardware suppliers so that the appliance is built to run at maximum efficiency
12
OPEN DWBI APPLIANCES ……
These appliances are designed to power an
on-demand software solution that must
support a large number of simultaneous users
and be able to increase capacity quickly
Built on a shared-nothing architecture: no
data is shared across nodes (servers).
Popular appliances include
Netezza and Greenplum.
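The shared-nothing idea can be sketched in a few lines: each row is assigned to exactly one node by hashing a distribution key, each node aggregates only its own slice, and a coordinator merges the partial results. This is a toy model with hypothetical keys and a hypothetical node count, not how any particular appliance is implemented.

```python
import zlib
from collections import defaultdict

NUM_NODES = 3  # hypothetical cluster size


def node_for(key: str, num_nodes: int = NUM_NODES) -> int:
    """Deterministically map a distribution key to one node."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes


rows = [
    ("cust-001", 120.0),
    ("cust-002", 80.0),
    ("cust-003", 50.5),
    ("cust-004", 49.5),
]

# Each node stores only its own slice of the data.
nodes = defaultdict(list)
for key, amount in rows:
    nodes[node_for(key)].append((key, amount))

# Each node aggregates locally; the coordinator only merges partial sums.
partials = {n: sum(amount for _, amount in local) for n, local in nodes.items()}
total = sum(partials.values())

print(partials)
print(total)
```

Because no node needs another node's rows to compute its partial result, capacity can be added by adding nodes and redistributing keys.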
13
MULTIPLE APPLIANCES FOR ENTERPRISE NEED
14
DWBI APPLIANCES –SALIENT FEATURES
High Availability and Failover Support
Designed for operation in a high-availability clustered Open DWBI environment
Global Cache
Provides superior query performance via its massive-scale caching capabilities
Simplified software Deployment and Upgrades in Place
Dramatically simplifies deployment by freeing IT from resolving potentially complex OS compatibility issues, library dependencies or undesirable interactions with other applications.
15
DWBI APPLIANCES –SALIENT FEATURES….
Advanced ETL Services and a complete analytical data warehouse with automated warehouse generation
Cloud Connectors, for connecting to operational cloud applications, e.g. Salesforce.com, Google Analytics
These connectors allow automatic uploading of data into the appliance from various sources
Live Access, which lets you analyze data from on-premise data warehouses without uploading it
16
SAAS BASED OPEN BI SOLUTION
17
SAAS –OPEN BI SOLUTION…..
Low-cost, open source solution.
End-to-end, integrated BI and ETL
capabilities.
Full enterprise-level support.
Flexibility of on-demand and on-premise
deployment.
Support for mobile devices as a BI platform.
Support for iterative IT and business-user
report generation process.
18
CLOUD --WHAT DOES THIS MEAN?
Depends upon how you slice it vertically
• IaaS: AWS, GoGrid, Mosso
• PaaS: Google App Engine, Microsoft Azure
• SaaS (BaaS): Salesforce, Talend, Jaspersoft,
Pentaho, BIRT, etc.
19
AGILE BI: FASTER, CHEAPER, BETTER….
20
CLOUD --WHAT DOES THIS MEAN?
21
ODW -WHEN TO USE THE CLOUD?
Transient application lifespan or use
Quick start required
Budget pressure
Variable use/scale of application unknown
IT unavailable/unresponsive
22
SAAS –OPEN DWBI
23
KEY FINDINGS FOR BUSINESS TRANSITION TO
CLOUD TECHNOLOGY (IN 2009)
By 2012, at least 50% of direct commercial revenue attributed to open-source products or services will come from projects under a single vendor's patronage.
Through 2011, less than 50% of Global 2000 IT organizations will have implemented a formal open-source adoption and management policy as part of an enterprise software asset management strategy.
Through 2013, 50% of mainstream IT projects using open-source software (OSS) will not achieve cost savings over closed-source alternatives.
Through 2013, 90% of market-leading, cloud-computing providers will depend on OSS to deliver products and services.
24
MOVING TO CLOUD-RECOMMENDATIONS
Expect vendors to play an increasing role in the governance of many market-leading, open-source solutions during the next several years.
Move aggressively to establish an effective enterprise adoption policy, and bring OSS and hardware under asset management controls.
Do not expect to automatically save money with OSS or any technology without effective financial management. Do expect to carefully manage open-source solutions in the appropriate scenarios to realize total cost of ownership (TCO) advantages.
Manage cloud-based software strategies and open-source strategies together for maximum effect. Look for synergies between both, and the ability of OSS to move your workloads to the cloud.
25
STRATEGIC PLANNING ASSUMPTION(S)
By 2012, at least 50% of direct commercial revenue attributed to open-source products or services will come from projects under a single vendor's patronage.
Through 2011, less than 35% of Global 2000 IT organizations will have implemented a formal open-source adoption and management policy.
Through 2013, 50% of mainstream IT projects using OSS will not achieve cost savings over closed-source alternatives.
Through 2013, 90% of market-leading, cloud-computing providers will depend on OSS to deliver products and services.
26
CLOUD USAGE BY VARIOUS ORGANIZATIONS..
27
OPENSOURCE BI TOOLS
28
TDWI RESEARCH STUDY…
29
SAAS BI PROCESS FLOW
30
HARDWARE ACCESS IN CLOUD OPEN DW/BI…
Secure access via web, RDC, VPN or a combination
Customized server (choose your own
CPU, RAM, disk space)
Scale up your capacity anytime
Level 2 and 3 server support, including 24×7
monitoring service
Application support on demand
Integrate with your local and global IT groups
31
SECURITY ASPECTS IN CLOUD OPEN DW/BI…
Web, RDC, VPN or a combination
Firewalls
Certified data center: SAS 70 Type II
NDA
Virus protection
32
MDM
MDM success for enterprise open source
DWBI implementation:
high-quality master data is extremely
valuable to enterprise business
processes and analytics
33
MDM-KEY CONSIDERATIONS
Some key considerations for creating a master reference data source are outlined below:
Central master reference data model
Mapping
Populating the master
Publish data
Access and provisioning
Ownership and process
34
MDM CHECKLIST
MDM helps the enterprise obtain a
"single version of truth" across its various
applications (despite the
disparity of source systems)
The following checklist provides functional
requirements for implementing and deploying
MDM in an enterprise environment:
35
MDM CHECKLIST –FUNCTIONALITY COVERED
Profiling
Modeling
Data quality
Data stewardship & governance: hierarchy
management & security
Workflow administration
36
MDM-ACTIVE DATA MODEL ….
Multi-Domain capability
Object-Oriented Data Modeling
Domain Templates
Basic Data Validations and Business Rules
Graphical Modeling Tool
Multiple Language Support
37
MDM-DOMAIN INTEGRATION
Complete Data Integration Functionality
Automated Services-Based Integration
Real-Time and Batch Integration
SOA Manager/Console
38
MDM-DQ INTEGRATION WITH ETL, BI
Data Profiling
Accurate Data Match and Merge
Data Bucketing and Blocking
Data Augmentation
Advanced Data Validations and Business Rules
Data Standardization
Data Cleansing
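Several items in this list (standardization, blocking, matching) fit together as one pipeline. The sketch below shows one possible arrangement using only the Python standard library; the records, the abbreviation rule, the blocking key and the 0.85 threshold are all hypothetical choices for illustration, not features of any specific MDM product.

```python
from collections import defaultdict
from difflib import SequenceMatcher

# Hypothetical master-data candidates from different source systems.
records = [
    {"id": 1, "name": "Acme Corporation", "city": "Chennai"},
    {"id": 2, "name": "ACME Corp.", "city": "Chennai"},
    {"id": 3, "name": "Apex Traders", "city": "Chennai"},
    {"id": 4, "name": "Acme Corporation", "city": "Mumbai"},
]


def standardize(name: str) -> str:
    """Standardization: lower-case, strip punctuation, expand 'corp'."""
    cleaned = name.lower().replace(".", "").replace(",", "")
    words = ["corporation" if w == "corp" else w for w in cleaned.split()]
    return " ".join(words)


def block_key(rec: dict) -> tuple:
    """Blocking: only records sharing city and first letter are compared."""
    return (rec["city"].lower(), standardize(rec["name"])[0])


def similar(a: str, b: str, threshold: float = 0.85) -> bool:
    """Matching: fuzzy comparison of standardized names."""
    ratio = SequenceMatcher(None, standardize(a), standardize(b)).ratio()
    return ratio >= threshold


blocks = defaultdict(list)
for rec in records:
    blocks[block_key(rec)].append(rec)

matches = []
for block in blocks.values():
    for i, left in enumerate(block):
        for right in block[i + 1:]:
            if similar(left["name"], right["name"]):
                matches.append((left["id"], right["id"]))

print(matches)
```

Blocking keeps the number of pairwise comparisons manageable: record 4 is never compared with record 1 because it sits in a different block.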
39
MDM-DATA STEWARDSHIP & GOVERNANCE
Hierarchy Management – Multiple and Recursive Hierarchies
Hierarchy Import and Overlays
Business Process Management (BPM) and Workflow
Automated Data Survivorship
Manual resolution through an intuitive GUI
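Automated data survivorship decides which value of each attribute makes it into the merged "golden" record. A minimal sketch, assuming a "most recent non-empty value wins" rule (one of several possible rules) and hypothetical source records:

```python
from datetime import date

# Two hypothetical duplicate records already matched to the same entity.
duplicates = [
    {"source": "CRM", "updated": date(2010, 1, 10),
     "name": "Acme Corporation", "phone": "", "city": "Chennai"},
    {"source": "ERP", "updated": date(2010, 3, 5),
     "name": "ACME Corp.", "phone": "044-5550100", "city": ""},
]


def survive(records, fields):
    """For each field, keep the newest non-empty value across records."""
    newest_first = sorted(records, key=lambda r: r["updated"], reverse=True)
    golden = {}
    for field in fields:
        golden[field] = next(
            (r[field] for r in newest_first if r[field]), None
        )
    return golden


golden = survive(duplicates, ["name", "phone", "city"])
print(golden)
```

Cases the rule cannot settle (for example, two conflicting values with the same timestamp) are the ones that would go to a data steward for manual resolution.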
40
MDM-ADMINISTRATION
Historical Views of Hub Data
Hub Versioning
Master Data Audit Trail Information
Roles-Based Security and Active Directory Integration
Versioning
41
TALEND MDM SOLUTION –OS PRODUCTS
IBM Eclipse; JBoss Application Server and Portal; eXist Open database;
XSD / XML Schema for the XML data models;
XSLT for data transformation;
Object programming following the EJB 2.1 ("Enterprise JavaBeans") standard on the JBoss server
XQuery for queries on the XML database; document/literal WS-I ("Web Services Interoperability") standard for web services
Bonita for business process management.
42
COST COMPARISON
43
E.g. total cost for a small project, comparing three approaches to
data integration: open source, proprietary and manual coding
SUMMARISED COST-SMALL ETL PROJECT
44
SUMMARY COST FOR MEDIUM ETL PROJECT
45
ODW /BI --WHY IT WILL SUCCEED IN MARKET
ODW/BI has many (financial) winners:
Owners get low-cost, rapid entry into a data warehouse they can extend.
Developers get to create and sell new ETL/BI products in a new market (tool providers).
'Source' vendors can solve reporting problems and advance new ways to compete (source providers).
Consultants get a bigger market for their services (service providers).
Domain experts can participate by creating new open data warehouses using their deep industry knowledge (service providers).
46
ODW /BI --WHY IT WILL SUCCEED IN MARKET
Development licenses
Training curve
Development time
Run-time licenses
Deployment of hardware and operating
system licenses
IT operations
47
ODW /BI --WHY IT WILL SUCCEED IN MARKET
Maintenance/subscription
Maintenance time
Reliability and predictability of the data
integration processes
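The cost factors listed on the last two slides can be combined into a simple total-cost comparison. The sketch below only shows the bookkeeping: the factor names are taken from the slides, the helper total_cost is hypothetical, and the unit values are placeholders rather than real project figures. Reliability and predictability are qualitative and are left out of the sum.

```python
# Quantifiable cost factors named on the slides.
COST_FACTORS = [
    "development_licenses",
    "training_curve",
    "development_time",
    "run_time_licenses",
    "hardware_and_os_licenses",
    "it_operations",
    "maintenance_subscription",
    "maintenance_time",
]


def total_cost(costs: dict) -> float:
    """Sum all cost factors; refuse to compare incomplete estimates."""
    missing = [f for f in COST_FACTORS if f not in costs]
    if missing:
        raise ValueError(f"missing cost factors: {missing}")
    return sum(costs[f] for f in COST_FACTORS)


# Placeholder estimate: one arbitrary unit per factor, not real data.
placeholder_estimate = {factor: 1 for factor in COST_FACTORS}
print(total_cost(placeholder_estimate))
```

Filling in one such estimate per approach (open source, proprietary, manual coding) gives a like-for-like comparison across the same set of factors.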
48
QUESTIONS?
For any questions, please get in touch with me at
Skype: ebidisolutions
49
Thank You!
50