The Briefing Room with Barry Devlin and WhereScape. Live Webcast on June 10, 2014. Watch the archive: https://bloorgroup.webex.com/bloorgroup/lsr.php?RCID=5230c31ab287778c73b56002bc2c51a The data warehouse is intended to support analysis by making the right data available to the right people in a timely fashion. But conditions change all the time, and when data doesn’t keep up with the business, analysts quickly turn to workarounds. This leads to ungoverned and largely unmanaged side projects, which trade short-term wins for long-term trouble. One way to keep everyone happy is to create an integrated environment that pulls data from all sources and can automate both model development and the delivery of analyst-ready data. Register for this episode of The Briefing Room to hear data warehousing pioneer and analyst Barry Devlin explain the critical components of a successful data warehouse environment, and how traditional approaches must be augmented to keep up with the times. He’ll be briefed by WhereScape CEO Michael Whitehead, who will showcase his company’s data warehousing automation solutions and discuss how a fast, well-managed and automated infrastructure is the key to empowering faster, smarter, repeatable decision making. Visit InsideAnalysis.com for more information.
Grab some coffee and enjoy the pre-show banter before the top of the hour!
The Briefing Room
Smarter Analytics: Supporting the Enterprise with Automation
Twitter Tag: #briefr
Mission
• Reveal the essential characteristics of enterprise software, good and bad
• Provide a forum for detailed analysis of today’s innovative technologies
• Give vendors a chance to explain their product to savvy analysts
• Allow audience members to pose serious questions... and get answers!
Topics
This Month: ANALYTICS & MACHINE LEARNING
July: INNOVATIVE TECHNOLOGY
August: BIG DATA ECOSYSTEM
2014 Editorial Calendar at www.insideanalysis.com/webcasts/the-briefing-room
Analyst: Barry Devlin
Dr. Barry Devlin is among the foremost authorities on business insight and one of the founders of data warehousing, having published the first architectural paper on the topic in 1988. With over 30 years of IT experience, he is a widely respected analyst, consultant, lecturer and author. His 2013 book, “Business unIntelligence—Insight and Innovation beyond Analytics and Big Data,” is available as hardcopy and e-book. Barry is founder and principal of 9sight Consulting. He specializes in the human, organizational and IT implications of deep business insight solutions that combine operational, informational and collaborative environments. A regular contributor to BeyeNETWORK and TDWI, Barry is based in Cape Town, South Africa and operates worldwide.
WhereScape
• WhereScape is a data warehousing software company
• It offers WhereScape 3D, software for planning and reality-testing data warehousing and business intelligence projects; and WhereScape RED, an integrated development environment used for building, deploying and managing data warehouses and data marts.
• WhereScape RED allows developers to automate the data warehousing life cycle
Guest: Michael Whitehead
A data warehousing industry veteran, Michael Whitehead has spent more than a decade designing and building commercial data warehouses for customers in a wide variety of industries. Prior to founding WhereScape, Michael had Asia Pacific responsibilities for data warehousing for Sequent Computer Systems, Inc.
Michael Whitehead June 2014
Smarter Analytics
Why were sales down this week versus last year?
Grocery Store with Class, Walter Watzpatzkowski, 15/1/09
We promoted ice cream but the weather was unreasonably cold
Our competitor ran a better promotion
1990s - Decision support system
(For the time) large amounts of data, stored in various inscrutable file formats and database management systems. Want actionable information? Write a program. One program per analytical problem…. Reporting bureaus
This model’s dysfunctions created the need for data warehousing…
2000s - Enterprise data warehousing
Separate the refinement of raw data – regardless of the source – from the delivery of subsets of that data, to various decision-making constituencies. Build a solid, scalable information delivery infrastructure for the corporation. Support variability, and change, at both ends. Apply appropriate governance, risk management, compliance mechanisms. [And stabilize the supply side of the market, in the process…]
A design pattern for stable, operationalized information refining and delivery
The economic conditions led to a change in demographics of the people walking past my store
2014 - big data technologies
Large amounts of data, stored in various inscrutable file formats and database management systems. Want actionable information? Write a program. One program per analytical problem…. Oh, and batch-oriented. And integrate-it-yourself.
Instead of JCL, Pig. Instead of CICS and Comshare, Cloudera. In what way is this model a leap forward?
HOW DID WE GET HERE?
People built data warehouses that don’t support analytics
2014 – “self service” technologies
Large amounts of data, stored in various inscrutable file formats AND data warehouses. Want actionable information? Create a dataset. One dataset per analytical problem….
The newer tech is great. Is the way it is used a leap forward?
Automation is key for better support of analytics
Smith Cannery: Extension and Experiment Station Communications Photograph Collection (P120)
STEPS
1. Identify attributes
2. Identify business key
3. Index business key and add a unique constraint
4. Create surrogate key with auto sequence generation
5. Index surrogate key
6. Insert zero surrogate key row with values set for each attribute
7. Add a modified timestamp column
8. Write the SQL code to Insert new business keys or Update existing business key rows. Maintain the modified timestamp
9. Create any other indexes required for querying
10. Decide best practice for index maintenance during load. Keep in situ or drop and recreate after load.
11. Document procedure
Etc Etc
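To make the amount of hand work concrete, the steps above can be sketched for a single dimension table. This is a minimal illustration and not anything shown in the webcast: it uses Python's built-in `sqlite3` module as a stand-in for the warehouse database, and the table `dim_customer`, its columns, and the helper names `build_dimension` and `upsert_customer` are all hypothetical.

```python
import sqlite3

# Hypothetical customer dimension; every name here is illustrative only.
DDL = """
CREATE TABLE dim_customer (
    customer_key  INTEGER PRIMARY KEY AUTOINCREMENT, -- steps 4-5: surrogate key
    customer_code TEXT NOT NULL,                     -- step 2: business key
    customer_name TEXT,                              -- step 1: attributes
    region        TEXT,
    modified_at   TEXT NOT NULL                      -- step 7: modified timestamp
);
-- Step 3: index the business key and enforce uniqueness.
CREATE UNIQUE INDEX ux_dim_customer_code ON dim_customer (customer_code);
"""


def build_dimension(conn):
    """Create the table, its indexes, and the zero surrogate key row."""
    conn.executescript(DDL)
    # Step 6: zero surrogate key row with a value set for each attribute.
    conn.execute(
        "INSERT INTO dim_customer "
        "(customer_key, customer_code, customer_name, region, modified_at) "
        "VALUES (0, 'UNKNOWN', 'Unknown', 'Unknown', CURRENT_TIMESTAMP)"
    )
    # Step 9: any other index needed for querying (assumed: lookups by region).
    conn.execute("CREATE INDEX ix_dim_customer_region ON dim_customer (region)")


def upsert_customer(conn, code, name, region):
    """Step 8: update an existing business key row, otherwise insert a new one."""
    cur = conn.execute(
        "UPDATE dim_customer "
        "SET customer_name = ?, region = ?, modified_at = CURRENT_TIMESTAMP "
        "WHERE customer_code = ?",
        (name, region, code),
    )
    if cur.rowcount == 0:
        conn.execute(
            "INSERT INTO dim_customer "
            "(customer_code, customer_name, region, modified_at) "
            "VALUES (?, ?, ?, CURRENT_TIMESTAMP)",
            (code, name, region),
        )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_dimension(conn)
    upsert_customer(conn, "C001", "Acme", "North")
    upsert_customer(conn, "C001", "Acme Ltd", "South")  # same key: updates in place
    for row in conn.execute("SELECT * FROM dim_customer ORDER BY customer_key"):
        print(row)
```

Steps 10 and 11 (index maintenance during load, and documentation) are not covered here, and all of this is still only one table; a warehouse repeats the pattern for every dimension.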
Really?
What can be automated?
• Profiling
• Model conversion
• Object creation
• Code generation
• Indexing
• Impact analysis
• Documentation
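As a rough illustration of the object creation, code generation and documentation items in this list, the sketch below derives DDL and a one-line description from a small metadata record instead of writing them by hand. It is an assumption-laden toy, not a description of how WhereScape RED or any other product works: the metadata layout, the `_key` naming convention, and the functions `generate_ddl` and `generate_doc` are all hypothetical, and SQLite is used only so the generated statements can be executed.

```python
import sqlite3

# Hypothetical metadata for one dimension. In practice this would come from a
# repository or a profiling step rather than a hand-written dict.
CUSTOMER_DIM = {
    "table": "dim_customer",
    "business_key": "customer_code",
    "attributes": {"customer_name": "TEXT", "region": "TEXT"},
}


def generate_ddl(meta):
    """Return CREATE TABLE and CREATE INDEX statements for one dimension."""
    table = meta["table"]
    key = meta["business_key"]
    columns = [
        f"{table}_key INTEGER PRIMARY KEY AUTOINCREMENT",  # surrogate key
        f"{key} TEXT NOT NULL",                            # business key
    ]
    columns += [f"{name} {sql_type}" for name, sql_type in meta["attributes"].items()]
    columns.append("modified_at TEXT NOT NULL")            # modified timestamp
    create_table = f"CREATE TABLE {table} (\n    " + ",\n    ".join(columns) + "\n)"
    create_index = f"CREATE UNIQUE INDEX ux_{table}_{key} ON {table} ({key})"
    return [create_table, create_index]


def generate_doc(meta):
    """Return a one-line description of the dimension for generated documentation."""
    attrs = ", ".join(meta["attributes"])
    return f"{meta['table']}: business key {meta['business_key']}; attributes {attrs}"


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    for statement in generate_ddl(CUSTOMER_DIM):
        print(statement)
        conn.execute(statement)  # the generated DDL is executable as-is
    print(generate_doc(CUSTOMER_DIM))
```

The design point is that the metadata is the only part written per table; adding an attribute means editing the dict, and the DDL and documentation follow from it.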
What will it look like? The new data warehouse
The new data warehouse: Five Key Changes
• Pooling – new types of data, staged differently than we’ve staged pampered data in the past.
• A multi-engine “logical” data warehouse: NoSQL → Not Only SQL
• Support for discovery, prototyping and evaluation of analytics
• Support for continuing data integration, through to the “end use” tier
• Automation of the data warehousing platform’s core functionality
Back to best-of-breed, customer-specific integration models
Conclusion
Let’s not stuff it up (again)
• Data people – challenge ourselves to do more, faster
• Analysts – don’t give up on the data people
Perceptions & Questions
Analyst: Barry Devlin
Copyright © 2014 9sight Consulting, All Rights Reserved
Dr Barry Devlin Founder & Principal
9sight Consulting
Business unIntelligence: Smarter Analytics: Supporting the Enterprise with Automation
Bloor Briefing Room 10 June 2014
Analytics (and big data) emerged for business with social media and web logs
§ Understanding and tracking sentiment
– What do you think? How do you react?
– Basic analytics and BI activity on a new data source
§ Real-time insight into and influence on website activities
– Why did you abandon your cart?
– What would you most likely buy on getting a cross-sell?
– Deep, real-time analytics and BI with operational integration
The Internet of Things adds urgency to a new automation of analytics and BI
§ Extends existing processes
– Micro-management of supply chains and extension all the way to the consumer
– Sourcing and delivery
§ Creates completely new business models
– Often depending on analytics
– Motor insurance → encouragement & prevention
– Hospital care → health monitoring
The biz-tech ecosystem reflects the complexity of today’s business.
[Figure: the biz-tech ecosystem, linking Business and Information Technology. Labels: Information abundance and variety; Customer interaction and technical savvy; Speed of decision and appropriate action; Market flexibility and uncertainty; Competition; Mobile devices; Externally-sourced information]
The architecture for the biz-tech ecosystem consists of information pillars.
§ Single architecture for all types of data/information
– Mix/match technology as needed
– Relational, NoSQL, Hadoop, etc.
§ Integration of sources and stores
– Instantiation gathers measures, events, messages and transactions
– Assimilation integrates stored info.
– Reification virtualizes access
§ Data flows as fast as needed and reconciled when necessary
– No unnecessary storage or transformations
– (Contrast layered data architecture)
[Figure: information pillars. Pillars: Human-sourced (information), Machine-generated (data), Process-mediated (data). Other labels: Context-setting (information); Transactional (data); Instantiation of Events, Measures, Messages and Transactions; Assimilation; Reification]
Information pillars can be mapped to today’s BI and analytics tools and environments.
§ Process-mediated data
– Traditional computing
– Via data entry, cleansing processes
– Relational databases
§ Machine-generated data
– Output of machines and sensors
– The Internet of Things
– NoSQL, Streaming, (RDBMS)
§ Human-sourced information
– Subjectively interpreted record of personal experiences
– From Tweets to Videos
– Hadoop, Enterprise Content Management
[Figure: the information pillars diagram repeated, with today’s environments mapped onto it. Added labels: BI, EDW, OLTP, Oper. Analytics, Pred. Analytics]
From BI to Business unIntelligence
§ Information, knowledge and meaning
– Understanding real world context
§ Process, predefined and emergent
– Automating the creation and use of information
§ Beyond bounded rationality
– How decisions are really made
§ http://bit.ly/BunI-Technics : 25% discount with code “BIInsights25”
Thank you!
Additional resources
§ All articles and white papers available at: http://bit.ly/9sight_papers
§ Blogs at: http://bit.ly/BD_Blog
§ Follow me on Twitter: @BarryDevlin
Questions (1)
1. The Enterprise Data Warehousing architecture of the 2000s (I would say 1990s) was driven by the business need for consistency / reconciliation of data from many sources. It’s perhaps suboptimal for timeliness (real-time data) and maintenance (multiple layers of ETL function). How can the sort of automation you’re proposing help in these two areas?
2. You compare 1980s and 2014 approaches asking how this model is a “leap forward.” One difference is users’ (data scientists) skills with technology. Wouldn’t automation disempower such users?
3. What would a warehouse that “supports Analytics” look like?
4. You say “Automation is the key for better support of analytics,” but how does automation support the agility and flexibility needed for analytics?
5. A big idea in analytics is “model on read.” Automation typically requires/provides “model on write.” How do you address these very opposite needs?
Questions (2)
6. Your pooling tier reminds me of the “Data Lake” – of which I’m not a big fan! Why would I want to bring “pampered data” (I assume traditional data) through this pool? It seems like an additional / unnecessary step.
7. What engines (other than SQL) do you envisage? Which do / will you support?
8. Can you describe what the linkage between the different engines means? If it is integration, how is it done?
9. What data integration support do you envisage in the “end use” tier?
10. Overall, how do you see your existing products evolving to implement the various aspects of this architecture? Does the relational database remain the core component, or do you envisage a more central role for Hadoop, as in Cloudera’s Enterprise Data Hub?
Upcoming Topics
www.insideanalysis.com
2014 Editorial Calendar at www.insideanalysis.com/webcasts/the-briefing-room
This Month: ANALYTICS & MACHINE LEARNING
July: INNOVATIVE TECHNOLOGY
August: BIG DATA ECOSYSTEM
THANK YOU for your ATTENTION!