Big Data. Deep Impact: Revolutionary Warehousing Approach for Insight
John Gillespie, VP, Information Management Software, IBM Asia Pacific
Why Smarter Computing?
Three years ago we started describing the Smarter Planet we saw emerging, fueling innovation across industries.
Manufacturing
Resource Management
Telecom
Neonatal Care
Trading
Traffic Control
Fraud Prevention
Law Enforcement
On a Smarter Planet, successful enterprises are taking a new approach to designing their IT infrastructure to create new opportunities.
Create new markets in a fraction of the time
Università di Bari: Reduced time to market for fishermen and farmers with a cloud-based solution for real-time trading.
Identify new trends before the competition
Acxiom: Improved capacity five-fold with no new floor space, using a cloud-based model that improves customer retention and captures new business.
Deliver new services more quickly
Citigroup: Reduced provisioning times from 45 days to 20 minutes, improving its ability to deploy new banking services to clients.
Utilize IT resources more efficiently
City of Norfolk: Improved storage performance by 40% and cut power consumption in half, enabling it to deploy automated parking systems and police in-car video surveillance.
These enterprises are addressing the challenges that emerged during the last era of computing…
1.2 zettabytes (1.2 trillion gigabytes) exist in the “digital universe”
• 50% YTY growth
• 25% of data is unique; 75% is a copy
32.6 million servers worldwide
• 85% idle computer capacity
• 15% of servers run 24/7 without being actively used on a daily basis
Data centers have doubled their energy use in the past five years
• 18% increase in data center energy costs projected
Internet-connected devices growing 42% per year
Since 2000, security vulnerabilities grew eightfold
…while IT budgets are growing less than 1% per year.
Between 2000 and 2010
• servers grew 6x (‘00-’10)
• storage grew 69x (‘00-’10)
• virtual machines grew 51% CAGR (‘04-’10)
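The growth figures above mix total multiples (6x, 69x) with a compound annual growth rate (51% CAGR), which makes them hard to compare directly. The sketch below converts between the two using the standard compound-growth relation; the helper names `implied_cagr` and `growth_multiple` are illustrative and not from the presentation, and the period lengths (10 years for ‘00-’10, 6 years for ‘04-’10) are read off the slide.

```python
def implied_cagr(multiple: float, years: int) -> float:
    """Annual growth rate that compounds to `multiple` over `years` years."""
    return multiple ** (1.0 / years) - 1.0


def growth_multiple(cagr: float, years: int) -> float:
    """Total growth multiple produced by compounding `cagr` for `years` years."""
    return (1.0 + cagr) ** years


# Figures taken from the slide; period lengths are read off the date ranges.
servers_cagr = implied_cagr(6, 10)       # servers grew 6x over 2000-2010
storage_cagr = implied_cagr(69, 10)      # storage grew 69x over 2000-2010
vm_multiple = growth_multiple(0.51, 6)   # VMs grew at 51% CAGR over 2004-2010

print(f"servers: {servers_cagr:.1%} per year")
print(f"storage: {storage_cagr:.1%} per year")
print(f"virtual machines: {vm_multiple:.1f}x in total")
```

On this basis, servers grew at roughly 20% a year and storage at roughly 53% a year, both far ahead of IT budgets growing at less than 1% a year.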
In doing so, they’ve addressed the IT conundrum—meeting exploding demand for service on a flat budget.
IT Conundrum
Incomplete, Untrusted Data: Always Guessing
Decisions are made on incomplete data, big ideas are seen as risky, and small decisions aren’t optimized.
Sprawling IT: More Cost
Every IT investment leads to more sprawl, which drives up infrastructure and management costs.
Inflexible IT: Reactive
Inflexibility of infrastructure limits integration across silos and responsiveness to customer demands.
Any enterprise can reverse the IT conundrum by designing, tuning and managing their IT infrastructure in the new era of IT we call Smarter Computing.
What is Smarter Computing?
Designed for data: Big Data
Remove barriers to harnessing all available information and unlock insights to make informed choices.
Tuned to the task: Optimized Systems
Remove financial barriers by driving greater performance and efficiency for each workload.
Managed in the Cloud: Cloud
Remove barriers to rapid delivery of new services and reinvent business processes to drive innovation.
Smarter Computing
Smarter Computing is an IT infrastructure that is designed for data, tuned to the task and managed in the cloud.
Designed for Data
Big Data for better decision making
Imagine the possibilities when all available information is harnessed to unlock insights.
Information from Everywhere
Extreme Scalability
Radical Flexibility
Integrating Big Data will unlock new insights.
• Streams and filters incoming data
• Reuses warehouse analytic models
Non-traditional / internet data
Traditional data
Persistent Data
In-Motion Data
IBM offers the complete set of capabilities required to integrate Big Data into an enterprise’s information supply chain.
External Information Sources
Transactional & Collaborative Applications
Business Analytics Applications
IBM can provide the full set of capabilities to build any organization’s information supply chain.
Manage: Cut database licensing and maintenance costs by 25%
Integrate: Slash cost & time to publish product data sheets by up to 95%
Govern: Pass SOX audit while reducing costs by up to 76%
Analyze: Reduce time to process valuations by up to 66%
DB2, Informix, FileNet, solidDB
InfoSphere: Information Server, Warehouse, Master Data Management
InfoSphere: Information Server, Optim, Guardium
InfoSphere: BigInsights, Warehouse, Streams
Storage Efficiency and Best Practices
Stop storing so much
• Data Compression
• Data Deduplication
Move data to the right place
• Automated Tiering
• Automated Data Migration
• Policy-based management
Store more with what’s on the floor
• Storage Virtualization
• Thin Provisioning
• Consolidated Storage Mgmt.
IBM offers the widest and deepest portfolio of data warehouse solutions
Flexibility / Simplicity: The right mix of simplicity and flexibility
IBM Warehouse Software
IBM Smart Analytics System
IBM Netezza
Simplicity, Flexibility, Choice: IBM Data Warehouse & Analytics Solutions
Information Management Portfolio (Information Server, MDM, Streams, etc.)
Warehouse Accelerators
Custom Solutions
Different operating systems
Different hardware platforms
Real time, streaming analytics
Plug and play applications
Robust data warehouse software
Modular scalability
Clients tell us that they want choice
There are times where flexibility is required
All with an accelerated approach to deployment
And for times when ultimate flexibility is required:
IBM offers data warehousing and analytics software individually for ‘build your own’ systems
“About 2,500 users and 200,000 reports per month: We would not have been able to achieve our ambitious goals in business intelligence without InfoSphere Warehouse.”
- Ralf Bruhnke, Controlling and Project Manager for Karstadt
Powerful, versatile, real-time analytics
“the IBM Smart Analytics System is, in our opinion, superior to Oracle Exadata 2-2: it is easier to manage and tune, easier to install, more flexible and costs (at least notionally) less money.”
- Philip Howard, Bloor Research
IBM Smart Analytics System
Integrated Cognos Business Intelligence
Integrated InfoSphere Warehouse
In-database cubing and mining
Choice of platform and OS
Smart Analytics System
The modular system for business analytics
Scale ‘On Demand’
Modular application interfaces
Built for complex and mixed workloads
Autonomic tuning
A Revolutionary Approach To Deep Analytics
Transactional workloads vs. Analytic workloads
Two VERY different requirements for storing and processing data
Analytic workload: Business Analyst, Data Warehouse
Complex Query: Sales & Profit for Shoes & Belts, Year >= 2005
[Chart: SALES by year, 2005 to 2010]
BI Reports & Dashboards
Transactional workload: Customer, Transactional Database
Business Transaction: Item: ‘Shoes’, Cost: ‘$34’, Cust: ‘James’
Simple Query result (2011 Sales):
Item    Cost    Cust
Shoes   $34     James
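To make the contrast concrete, here is a minimal sketch of the two query shapes using Python's built-in `sqlite3` module. The `sales` table, its columns, and every row other than the slide's Shoes / $34 / James example are made-up illustrations, not data from the presentation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (year INTEGER, item TEXT, cust TEXT, revenue REAL, cost REAL)"
)
# Illustrative rows only; the first mirrors the slide's example transaction.
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?, ?)",
    [
        (2005, "Shoes", "James", 50.0, 34.0),
        (2006, "Belts", "Maria", 20.0, 12.0),
        (2006, "Shoes", "Chen", 55.0, 34.0),
        (2004, "Shoes", "Olga", 45.0, 30.0),
        (2007, "Hats", "James", 15.0, 9.0),
    ],
)

# Transactional shape: fetch a handful of rows identified by key values.
simple_result = conn.execute(
    "SELECT item, cost, cust FROM sales WHERE cust = ? AND item = ?",
    ("James", "Shoes"),
).fetchall()

# Analytic shape: scan many rows, filter, then aggregate per year.
complex_result = conn.execute(
    """
    SELECT year, SUM(revenue) AS sales, SUM(revenue - cost) AS profit
    FROM sales
    WHERE item IN ('Shoes', 'Belts') AND year >= 2005
    GROUP BY year
    ORDER BY year
    """
).fetchall()

print(simple_result)
print(complex_result)
```

The simple query touches one row and benefits from an index on the lookup columns; the complex query has to read every qualifying row, which is why the two workloads place such different demands on storage and processing.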
Why traditional database systems are not enough: Endless tuning
Business person: “Query performance is slow.”
Technical person: “I’ll add an index.”
Business person: “Load performance is slow. When can I access my data?”
Technical person: “I’ll investigate and get back to you …”
Why traditional database systems are not enough: Slow data loads
Indices increase time needed to load data – Retail example
Data load job   Oracle
1               5+ hours
2               1 hour 12 mins 7 secs
3               1 hour 25 mins 56 secs
4               1 hour 30 mins 00 secs
“Technical team consistently missing service level agreed with business for data availability.”
“The warehouse was frequently unavailable until 11.00am, sometimes merchandisers could not access their data until after lunch.”
Why traditional database systems are not enough: Wasted effort
Process categories in the original table: Transform, Inspect, Non-value Process, Value-adding Process
Task Description             Value
move data from sources       120
reconcile data               20
sort and prep                30
drop indices                 5
drop constraints             1
drop aggregates              2
drop materialized views      2
load data                    30
create constraints           180
create indices               90
create materialized views    60
create aggregates            120
gather statistics            300
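A quick tally of the task list above shows how small the actual load step is relative to the whole job. The sketch assumes all values share one time unit (the slide does not state it), and grouping the drop, create, and gather-statistics steps as "tuning overhead" is an interpretation made here, not a classification given on the slide.

```python
# Values copied from the task list; the time unit is not stated in the source.
tasks = {
    "move data from sources": 120,
    "reconcile data": 20,
    "sort and prep": 30,
    "drop indices": 5,
    "drop constraints": 1,
    "drop aggregates": 2,
    "drop materialized views": 2,
    "load data": 30,
    "create constraints": 180,
    "create indices": 90,
    "create materialized views": 60,
    "create aggregates": 120,
    "gather statistics": 300,
}

total = sum(tasks.values())

# Assumption: steps that exist only to maintain indices, constraints,
# aggregates, materialized views and statistics count as tuning overhead.
overhead = sum(
    value
    for name, value in tasks.items()
    if name.startswith(("drop ", "create ")) or name == "gather statistics"
)

load_share = tasks["load data"] / total
overhead_share = overhead / total

print(f"total: {total}")
print(f"load data share: {load_share:.1%}")
print(f"tuning overhead share: {overhead_share:.1%}")
```

Under these assumptions the load itself is about 3% of the total, while maintaining tuning structures accounts for roughly four fifths of it.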
Data Warehouse or Data Holding Pen?
“Our existing solution was not keeping up with our growing business demands, nor was it putting us in a position to accommodate new business. We needed to break the cycle of more data, more requirements, more money.”
-- Emory Heisler, VP Global IT Services, Wolters Kluwer Health
“Many of these ‘large’ Oracle data warehouses are simply holding pens.”
-- Neil Raden, “Overlooking problems with Oracle’s Exadata”, The Intelligent Enterprise blog
http://intelligent-enterprise.informationweek.com/blog/archives/2009/12/overlooking_pro.html;jsessionid=KBTNTOW15M54VQE1GHRSKHWATMY32JVN
Traditional approaches are broken: Warehouse as a Data Holding Pen
[Diagram labels: Traditional Warehouse; ETL; Analytics Server; SAS; SPSS; Analytic Applications; Modeling Applications; Fraud Detection; Demand Forecasting]
Traditional approaches are broken: Big data overwhelms traditional solutions
Let’s simplify this mess …
… and bring analytics into the warehouse
What is an Appliance?
• Dedicated device
• Optimized for purpose
• Complete solution
• Standard interfaces
• Easy installation
• Easy operation
• Easy management
• Easy support
• Low cost
What is a Data warehouse and Analytic appliance?
• Dedicated device
• Optimized for purpose
• Complete solution
• Standard interfaces
• Easy installation
• Easy operation
• Easy management
• Easy support
• Low cost
Inside IBM Netezza TwinFin
Snippet Blades™ (S-Blades™): Processor & streaming DB logic. High-performance database engine streaming joins, aggregations, sorts, etc.
SMP Hosts: SQL Compiler, Query Plan, Optimizer & Admin
Disk Enclosures: Slice of User Data; Swap and Mirror partitions; High speed data streaming
The IBM Netezza AMPP™ architecture
Applications: Advanced Analytics; Loader; Extract, Transform, Load; Reports & Dashboards
Netezza Appliance: Hosts; Network Fabric; S-Blades™ (each with FPGA, Memory, CPU); Disk Enclosures
Netezza’s data stream processing
FPGA Core: Compress, Project, Restrict, Visibility
CPU Core: Complex Math; Joins, Aggregations, etc.
S-Blade Table Cache
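The division of labor named on this slide can be imitated in software as a two-stage streaming pipeline. The sketch below is only an analogy in plain Python, not Netezza code: the function names, the JSON-plus-zlib block format, and the `deleted` flag used to model the visibility check are all assumptions made for illustration.

```python
import json
import zlib
from collections import defaultdict


def stored_blocks(rows, block_size=2):
    """Simulate compressed blocks as they might sit on disk (assumed format)."""
    for start in range(0, len(rows), block_size):
        yield zlib.compress(json.dumps(rows[start:start + block_size]).encode())


def fpga_stage(blocks, columns, predicate):
    """Early filtering: decompress, then drop rows and columns before the CPU sees them."""
    for block in blocks:
        for row in json.loads(zlib.decompress(block)):
            if row["deleted"]:          # visibility: skip rows that should not be seen
                continue
            if not predicate(row):      # restrict: apply the WHERE-style filter
                continue
            yield {name: row[name] for name in columns}  # project: keep needed columns


def cpu_stage(rows, key, value):
    """Heavier work on the reduced stream: here, a grouped sum."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row[value]
    return dict(totals)


# Made-up sample rows for the sketch.
sample = [
    {"item": "Shoes", "year": 2004, "sales": 10.0, "deleted": False},
    {"item": "Shoes", "year": 2006, "sales": 20.0, "deleted": False},
    {"item": "Belts", "year": 2007, "sales": 5.0, "deleted": True},
    {"item": "Belts", "year": 2008, "sales": 7.0, "deleted": False},
    {"item": "Shoes", "year": 2009, "sales": 30.0, "deleted": False},
]

filtered = fpga_stage(stored_blocks(sample), ["item", "sales"], lambda r: r["year"] >= 2005)
result = cpu_stage(filtered, "item", "sales")
print(result)
```

Because both stages are generators, rows flow through one at a time and the aggregation step only ever sees data that survived the filters, which is the point of pushing restriction and projection as close to the disk as possible.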
Netezza Delivers Speed
• 15,000 users
• Running 800,000+ queries per day
“…when something took 24 hours I could only do so much with it, but when something takes 10 seconds, I may be able to completely rethink the business process …”
-- SVP Application Development, Nielsen
http://www.youtube.com/watch?v=yOwnX14nLrE&feature=player_embedded
Solving the data load and query performance problem
“We act out the market every day to capitalize on opportunities. Complex merchandize reports that had taken days to process on the old platform now take five minutes on the new one. Simpler queries are even faster.”
-- Chief Information Officer at a large US retailer
Data load job   Oracle                    Netezza
1               5+ hours                  2 mins 53 secs
2               1 hour 12 mins 7 secs     3 mins 29 secs
3               1 hour 25 mins 56 secs    4 mins 20 secs
4               1 hour 30 mins 00 secs    5 mins 42 secs
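The load times above translate into speedup factors as sketched below. The helper `to_seconds` is illustrative, and reading job 1's Oracle figure as "at least 5 hours" is an assumption, so its ratio is a lower bound rather than an exact value.

```python
def to_seconds(hours: int = 0, mins: int = 0, secs: int = 0) -> int:
    """Convert an hours/minutes/seconds duration to seconds."""
    return hours * 3600 + mins * 60 + secs


# (Oracle seconds, Netezza seconds) per load job, copied from the table.
# Job 1's Oracle time is only known as a minimum (assumed to mean >= 5 hours).
loads = {
    1: (to_seconds(hours=5), to_seconds(mins=2, secs=53)),
    2: (to_seconds(hours=1, mins=12, secs=7), to_seconds(mins=3, secs=29)),
    3: (to_seconds(hours=1, mins=25, secs=56), to_seconds(mins=4, secs=20)),
    4: (to_seconds(hours=1, mins=30), to_seconds(mins=5, secs=42)),
}

speedups = {job: oracle / netezza for job, (oracle, netezza) in loads.items()}

for job, factor in speedups.items():
    prefix = "at least " if job == 1 else ""
    print(f"job {job}: {prefix}{factor:.1f}x faster")
```

Jobs 2 to 4 come out between roughly 16x and 21x faster, and job 1 at more than 100x.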
Netezza Delivers Scalability
• 1 PB on IBM Netezza
• 7 years of historical data
• 100-200% annual data growth
“NYSE … has replaced an Oracle 10 relational database with a data warehousing appliance from Netezza, allowing it to conduct rapid searches of 650 terabytes of data.”
-- ComputerWeekly.com
Netezza Delivers Simplicity
• Up and running 6 months before being trained
• 200X faster than Oracle system
• ROI in less than 3 months
“Allowing the business users access to the Netezza box was what sold it.”
-- Steve Taff, Executive Dir. of IT Services
Netezza Excels At Smarts
• Identifies items that shoppers are likely to buy in future visits
• 25% increase in coupon redemption rates
“Catalina is ahead of the curve from a technology standpoint because of Netezza and the advancements in their technology in both hardware and software.”
-- Kelly Carigan, VP Business Intelligence
Catalina Marketing: Building loyalty one customer at a time
Coupon redemption rate
No targeting: 1%
Basic targeting (e.g., offer dog food coupon to customer buying dog food): 6-10%
Using predictive models to find latent correlations: 25%
Marketing to a segment of one – 195 million US loyalty program members
– Every coupon printed is unique to the individual customer
– Customized based on three years' worth of purchase history
Increased staff productivity – from 50 to 600 new models per year
Increased efficiency – from 4 hours to score a model to 60 seconds
Thank You! http://www.netezza.com/
Simply put, IBM is making systems smarter.