BUSINESS INTELLIGENCE: NEW TECHNOLOGIES, METHODOLOGY IMPLICATIONS, ENTERPRISE ARCHITECTURE AND CONTROL
NEW TECHNOLOGIES – “BIG DATA”
What is it?
New buzzword everyone wants to talk about
What does it mean? Simply, data sets too large to be easily managed and analyzed using standard relational or OLAP toolsets
Where does it apply? Born out of web traffic analysis, advertising targeting and product suggestion
Scientific Applications
Language Analysis
History & Technologies
Historical BI/DW practices driven by four variables: Disk, Memory, Processor Power and Licensing
Publication of a paper by Google on a process called Map Reduce
Parallel processing in a highly distributed environment
Many relatively simple machines running Map Reduce processes
HADOOP was born as the Apache implementation of Map Reduce
Challenges: Not ACID and no SQL language support
Legacy reporting tools do not understand these sources
NoSQL: Key-value pairs
Document model
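The Map Reduce process mentioned above can be sketched on a single machine. This is only an illustration of the map, shuffle and reduce steps, not Google's or HADOOP's implementation; the function names and the sample log lines are assumptions made up for this example. In a real cluster the map and reduce calls would run in parallel on many simple machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group every emitted value under its key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    # Each line could be mapped on a different machine; here we just loop.
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical input, standing in for web traffic logs.
logs = ["big data big hype", "data beats hype"]
counts = word_count(logs)
print(counts)
```

Because each map call sees only one line and each reduce call sees only one key, the work splits cleanly across machines, which is the property the slide refers to.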
NEW TECHNOLOGIES – “BIG DATA”
Vendors
Legacy versus New Challengers (Commercial/Open Source)
Legacy Data Warehouse Vendors: Oracle, IBM, Microsoft, Teradata, Netezza
Many New Entrants
Cloud Based
Amazon Elastic Map Reduce (EMR)
HADOOP Meets SQL
NuoDB, Cloudera, Cassandra, Accumulo, MS PolyBase
NoSQL
http://nosql-database.org/
MongoDB, CouchDB, RavenDB
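The difference between the key-value and document models named earlier can be sketched with plain Python structures. This is a toy illustration, not the API of MongoDB, CouchDB, RavenDB or any other product; the `kv_get` and `find` helpers and the sample records are hypothetical.

```python
import json

# Key-value model: the store sees only an opaque value behind each key.
# Any structure inside the value is the application's problem.
kv_store = {}
kv_store["customer:1001"] = json.dumps({"name": "Acme", "city": "Austin"})


def kv_get(key):
    """Look up by key, then decode the value in application code."""
    return json.loads(kv_store[key])


# Document model: the store understands the fields inside each record,
# so it can filter on them without a fixed relational schema.
documents = [
    {"_id": 1001, "name": "Acme", "city": "Austin",
     "orders": [{"sku": "A1", "qty": 3}]},
    {"_id": 1002, "name": "Globex", "city": "Boston", "orders": []},
]


def find(collection, **criteria):
    """Return the documents whose fields match every given criterion."""
    return [doc for doc in collection
            if all(doc.get(field) == value
                   for field, value in criteria.items())]


austin_customers = find(documents, city="Austin")
print(kv_get("customer:1001")["name"], [d["_id"] for d in austin_customers])
```

Neither structure is a table with a SQL interface, which is why legacy reporting tools struggle to read these sources directly.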
Basic Question: Do you run an infrastructure with multiple BI platforms (Relational/OLAP and HADOOP), or wait for one of the legacy vendors to supply enough HADOOP functionality in its core offering to suffice?
NEW TECHNOLOGIES – IN MEMORY ANALYTICS
What is it?
Full (or targeted bits of) data set in system memory
Moving from Appliance based to Cloud based
Initially analytics focused
Self Service Analytics using disparate data sources
No ETL
No central data architecture control
Intended to be high performance
Beginning to spread into the Transactional/Relational space
Becoming mainstream technology
History & Technologies
Enterprise Deployed – Small & Mid Size Enterprise
QlikView was a pioneer in this space
Tableau
Cloud Based: SAP HANA
From SAP or Amazon
Priced in dollars per hour of use
Oracle TimesTen
Microsoft SQL Server 2014
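The "data set in system memory, no ETL" idea can be sketched with Python's built-in `sqlite3` module and an in-memory database. This is a stand-in, not how QlikView, Tableau, SAP HANA, TimesTen or SQL Server work internally; the table names and rows are invented for illustration. Two disparate sources are loaded as-is and joined on the fly.

```python
import sqlite3

# ":memory:" keeps the whole database in RAM; nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region_code TEXT, amount REAL)")
conn.execute("CREATE TABLE regions (region_code TEXT, region_name TEXT)")

# Hypothetical source 1: raw sales rows, loaded with no cleansing step.
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("N", 100.0), ("N", 50.0), ("S", 75.0), ("X", 20.0)],
)
# Hypothetical source 2: a region lookup maintained by another team.
conn.executemany(
    "INSERT INTO regions VALUES (?, ?)",
    [("N", "North"), ("S", "South")],
)

# Ad hoc join; codes missing from the lookup surface as UNMAPPED.
totals = conn.execute(
    """
    SELECT COALESCE(r.region_name, 'UNMAPPED') AS region,
           SUM(s.amount) AS total
    FROM sales AS s
    LEFT JOIN regions AS r ON r.region_code = s.region_code
    GROUP BY region
    ORDER BY region
    """
).fetchall()
print(totals)
conn.close()
```

The query is fast and needs no ETL, but the unmapped code `X` shows the trade-off: without central control of the data architecture, reference data problems land directly in the user's report.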
NEW TECHNOLOGIES – BI IN THE CLOUD
Traditional Vendors
The “Cloud” has many definitions
Virtual Machines versus “Cloud” processes
Major Players: Microsoft
Azure
SQL Server progressively moving to Azure
Slowly adding options for higher performance
Reporting Services / HDInsight
Oracle in Azure… Wait… What?
Amazon Web Services
(Almost) Everyone is welcome
Microsoft, Oracle, SAP HANA, NoSQL, HADOOP
Largely Virtual Machine based
SAP
HANA
Cloud Based BI – New Entrants
Cloud Only Deployment
SaaS pricing
Typically full life cycle solutions: ETL, Reporting
Vendors: Birst
DOMO
GoodData
Indicee
Jaspersoft
NEW TECHNOLOGIES – DATA SERVICES IN THE CLOUD
New Uses
Disaster Recovery
Tight integration of local SQL Engines and Cloud based failovers
Backup and Restore using the Cloud
Complex Event Processing
Microsoft StreamInsight
NEW TECHNOLOGIES – METHODOLOGY IMPACTS – BIG DATA
“We need BIG DATA!”
Sometimes more is not better
John Snow's Cholera Map
The cause of a particular cholera epidemic, and with it the general concept of infectious disease, was discovered by analyzing 620 data points
Locations of infections on a map of London were limited to a particular area. Initial analysis pointed towards water pumps in the vicinity. The confirming data was that the monks drank only beer.
A “Big Data” project might have involved the compilation and analysis of all infection locations worldwide integrated with all activities performed by those individuals. The analysis would likely have been swamped by noise.
Avoid the temptation to push for bigger and bigger data sets without a clear objective in mind and some scientific reasoning as to why more will be better.
Make sure a limited scope data set is also an option for analysis when looking for specific causation.
Consider the role of Data Scientist within the organization
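The kind of limited-scope analysis described above can be sketched in a few lines: assign each case to its nearest water pump and see which pump dominates. The coordinates below are invented for illustration and are not John Snow's actual data; the pump names and the `nearest_pump` helper are likewise hypothetical.

```python
from collections import Counter
from math import hypot

# Made-up map coordinates for three pumps (NOT historical data).
pumps = {
    "Pump A": (0.0, 0.0),
    "Pump B": (5.0, 1.0),
    "Pump C": (2.0, 6.0),
}

# Made-up case locations, a small and targeted sample.
cases = [
    (0.5, 0.2), (0.1, -0.4), (1.0, 0.8),
    (-0.6, 0.3), (4.6, 1.2), (0.9, -0.2),
]


def nearest_pump(case):
    """Return the name of the pump closest to one case location."""
    x, y = case
    return min(pumps, key=lambda name: hypot(x - pumps[name][0],
                                             y - pumps[name][1]))


cases_per_pump = Counter(nearest_pump(case) for case in cases)
suspect, count = cases_per_pump.most_common(1)[0]
print(suspect, count, "of", len(cases), "cases")
```

With a focused question and a local data set, the signal is obvious from a handful of points. Adding unrelated locations and activities would add rows without adding evidence about this particular pump.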
NEW TECHNOLOGIES – METHODOLOGY IMPACTS – IN MEMORY
“Who needs a data warehouse anymore?”
I blame QlikView for the above statement.
Cloud Based BI tools are heading down the same road. I’m looking at you, SAP.
Vendors perceived a market opportunity to gain customers by claiming In Memory technology allowed for the elimination of costs related to data architecture and ETL development
Statements you may hear:
“I don’t need good data architecture because the speed will make up for inefficiencies in joins or storage of the data”
“The users want the flexibility to join to any data source at any time. ETL just slows us down.”
The above runs contrary to another concept that is increasingly gaining traction (finally): Master Data Management.
NEW TECHNOLOGIES – METHODOLOGY IMPACTS – IN MEMORY
“Who needs a data warehouse anymore?”
Observations: Results are very mixed
Very hard to maintain a proper Data Governance/MDM process
The best results I’ve seen have involved the use of In Memory tools on top of quality data mart/warehouse environments
There is no free lunch; buying more memory or more virtual servers will only take you so far
BUT, there is some merit here
Pure speed does give you options
We see utility in prototyping new additions to the formal Data Warehouse structure or for giving users some room to roam from the base
Data Governance needs to maintain control
NEW TECHNOLOGIES – ARCHITECTURE & CONTROL
“Beware the Zombie Clouds!”
Clouds are the new Flash drives with regard to data control and security
There are a million new low cost Software as a Service options on the market
No or low up front adoption costs
Can be initiated by the user/business side of the enterprise as well as IT personnel outside the data governance process
Many are designed to quickly accept your data and make it easily accessible to an audience (which you don’t control or might not even know about)
Some offer Single Sign-On, but it is not required
Some might be quickly abandoned and data is left in a zombie state in perpetuity
Data timeliness and provenance become very suspect
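One way a data governance group might keep an eye on zombie clouds is a simple inventory of where enterprise data has been copied, checked for stale or ownerless entries. This is a minimal sketch under assumptions: the `CloudCopy` fields, the 90-day threshold, the service names and the dates are all hypothetical, and a real inventory would have to be fed by whatever discovery process the enterprise actually has.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class CloudCopy:
    service: str          # hypothetical SaaS tool holding a copy of our data
    owner: str            # accountable team; empty string if nobody claims it
    last_refreshed: date  # when the copy was last updated from the source
    sso_enforced: bool    # whether access goes through Single Sign-On


def find_zombies(copies, today, max_age_days=90):
    """Flag copies that are stale or have no accountable owner."""
    cutoff = today - timedelta(days=max_age_days)
    return [c for c in copies if c.last_refreshed < cutoff or not c.owner]


# Invented inventory entries for illustration only.
inventory = [
    CloudCopy("dashboards-app", "finance", date(2014, 5, 20), True),
    CloudCopy("file-share", "", date(2014, 5, 1), False),
    CloudCopy("survey-tool", "marketing", date(2013, 11, 1), False),
]

zombies = find_zombies(inventory, today=date(2014, 6, 1))
for copy in zombies:
    print(copy.service, copy.last_refreshed, "SSO" if copy.sso_enforced else "no SSO")
```

The check itself is trivial; the hard part the slide points to is knowing the copies exist at all when the business side can start a service without involving data governance.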
NEW TECHNOLOGIES – CONTACT INFO
Paul Dausman
pdausman@valordevelopment.com
twitter: @pdausman
www.valordevelopment.com
www.valianthealth.com
www.techweuse.com