DESCRIPTION
David Douglas, CrinLogic
The Big Data headlines are unrelenting, with each passing day seemingly bringing new discoveries, products, partnerships, venture funds, you name it, into the mix. If anything, it is all a bit confusing. Listening to all this, you might conclude that Big Data will solve most of your problems, place your company miles ahead of your competition, drive your Net Promoter Scores through the roof, and fall just short of solving world hunger (ok…maybe not that far). And one can’t blame you if you think all one needs to do is install the Hadoop ecosystem of projects, conjure up some possible business use cases, throw some commodity hardware into the mix, attend some training, purchase some Big Data analytics software, and VOILA, you have arrived and can enjoy the fruits of your Big Data efforts. Tongue-in-cheek aside, the reality is vastly different.
This talk is partially a reality check on Big Data implementation strategies (starting with Big Data is easy, becoming proficient is hard, fully integrating it into a broader enterprise data strategy is very hard) and partially an information-sharing session on what we’re learning as we engage with customers in various industries on Big Data. Among other things we will explore: building the business case; software and hardware requirements analysis; selection process and implementation approaches; what tends to work well, not so well, and what to avoid; and how Big Data is likely to affect enterprise data architecture.
David Douglas is a member of the Hadoop-DC User Group and a co-founder of CrinLogic, a Big Data consultancy based in the greater DC area. He has devoted his 17 years of professional experience to helping clients maximize the value of their strategic IT initiatives. Prior to co-founding CrinLogic, David started two other companies. The first was an angel-backed Sales Force Automation software company he sold in 2002, and the second is a consulting services company that focuses on Agile and Lean software adoption and large-scale program implementation services. He helped start the Data Warehousing practice at American Management Systems and was one of the first consultants to join IBM’s Business Intelligence practice.
CrinLogic
Our Big Data Journey
A Consultant’s Perspective
April 26, 2012
5/1/2012 1
A little Big Data story to start us out ;)
About me
David Douglas is a member of Hadoop-DC and a co-founder of CrinLogic. He has over 17 years of IT consulting experience with a concentration in Business Intelligence, Agile and Lean software development, and large program implementations. He is a passionate believer in Big Data and the enormous possibilities it offers.
CrinLogic is a Big Data consulting firm. Our passion for Big Data is surpassed only by our curiosity and love of learning. We offer full service Big Data consulting services and training. Visit us at www.CrinLogic.com. We are based in DC, Chicago, Austin, and Sarajevo.
443.413.4038
This talk is about…
Some things I’d like you to walk away with
1. A big picture perspective on this market
2. What customers are saying
3. Thoughts on developing a business case
4. Learnings
What this talk is not about
1. A technical discussion of Big Data
Data for this talk came from
1. Talking with 30 plus companies from all walks of life and all stages of maturity
2. Talking with colleagues in the Big Data space (hardware and software vendors)
3. Current customer engagements
4. Research
My biggest surprise is my awareness of how little I know. No one really understands how to build a Big Data solution. We are all learning as we go.
My Perspective on Big Data
Enterprise Data Architecture
Post adopter syndrome
Business problem focused
Rising tide
Systems Thinking
Iterative
Failures
Yes it is big and growing!
Except when compared to Bieber
And me of course ;)
Though I’ve been trending down…
Need to expand my Network on LinkedIn
So what do the customers really think?
They are confused
and rightfully so!
No generally accepted definition for “Big Data”
“We don’t generate enough data for that”
“Don’t you need at least 100TBs?”
Or they simply think they are already using Big Data
And lest we forget the 3Vs…
Volume, Variety, Velocity (just a couple pointers on these)
So many products and choices promising so much
Where Database
Hardware
Open Source
Analytical Tools
Network
So many software offerings
No generally accepted definition for “Data Scientist”
Are they a critical success factor for Big Data Solutions? [True or False]
Were they a critical success factor to Business Intelligence solutions?
OSEMN – Obtain, Scrub, Explore, Model, iNterpret
www.dataists.com, Hilary Mason & Chris Wiggins
So tell me the why please?
McKinsey’s 5 Value Propositions
1. Make information transparent and usable more readily
2. Expose variability and enable performance improvement
3. Better customer segmentations
4. Advanced analytics for better decision making
5. New products
Not seeing the Big Analytics Piece
In fact, many of the companies employing Big Data that we’ve talked to or are working with are not doing Big Data analytics
Tactical versus Strategic
Tactical solves an immediate pain point
•Batch jobs taking too long
•Reaching limit of scalability on current infrastructure
•Budget was reduced recently but still have to deliver
•New project ‘just so happens’ to need this newer technology
Strategic implies
•Seeking competitive differentiation
•Creating actual solutions with value
Big Data as strategic direction is much harder
Big Data Strategic Business Case Approach
Figure out the Business Case
Congratulations! The CEO of a large Financial Services firm has asked you and your team to map out the company’s Big Data Strategy so he can present to the board. He is known for being thorough. Now get to work!!
Of the below choices, which is the best first step?
a. Scour the Internet for Big Data use case success stories for Financial Services and then go talk to VPs in that area
b. Build a virtual cluster on your machine, open a direct link to the Twitter firehose, and show the CEO what the community is saying about him in real time
c. Phone a friend (or CrinLogic)
d. Build relationships, interview all areas of the company, research the market, and consolidate the results [but time-box it to a couple of weeks]
Goal is to identify the most appropriate areas to start…high reward…high visibility
This can be helpful…
[Figure: business process map, “Sample Large Financial Institution”. Process boxes shown:]
Establish Strategic Imperatives; Develop Business Strategy; Develop Marketing Strategy; Develop Market Innovations; Acquire Customers; Develop Card Acquisition Offers; Identify Prospects; Develop Acquisition Campaigns; Solicit Prospects & Promote Offers; Decision Applications & Book Accounts; Fulfill on Decisions; Manage Customer Relationship; Develop Account Management Offers & Policies; Design Account Management Campaigns; Identify Customers/Targets; Communicate Offers/Changes; Decision Response/Request; Fulfill on Offers/Changes; Service Customers; Define Customer Experience; Develop Servicing Strategies; Maintain Accounts; Process Credit Card Transactions; Provide Customer Service; Manage Delinquencies, Recoveries & Fraud; Develop Collections Strategies; Collect on Delinquent Accounts; Develop Recoveries Strategies; Recover Charged Off Accounts; Develop Fraud Strategies; Detect and Recover Fraud; Manage Information Technology; Manage IT Operations; Manage Regulatory Affairs & Compliance; Manage External Compliance; Manage Finance & Accounting; Manage Accounting and Reporting; Manage Planning & Analysis; Manage Treasury; Manage Credit Risk; Manage Line of Business; Manage Rewards; Manage Correspondence; Manage Funds Disbursements; Manage Human Resources; Manage HR
Identify Big Data Impact Areas
[Figure: the business process map of the sample large financial institution from the previous slide, with each process box marked by its expected Big Data impact. Legend: No Impact, Low Impact, Moderate Impact, High Impact.]
Naturally! Fraud & Recoveries
Manage Delinquencies, Recoveries & Fraud
Legend: No Impact, Low Impact, Moderate Impact, High Impact
Recover Charged Off Accounts
Collect on Delinquent Accounts
Develop Collections Strategies
Develop Recoveries Strategies
Detect and Recover Fraud
Develop Fraud Strategies
• Determine Collections Strategy
• Enter Collections
• Exit Collections Strategy
• Fulfill Collections Strategy
• Monitor Commitments
• Service Collections Account
• Charge Off Bad Debt
• Process Bankruptcies
• Process Estates
• Process Recoveries Payments
• Analyze Collections Strategies
• Maintain Collections Systems
• Detect Fraud
• Decision Identity Fraud
• Decision Transaction Fraud
• Recover Fraud
• Research Fraud Strategies
• Design/Test Fraud Strategies
• Implement Fraud Strategies
Be ready to answer these questions
1. Do they currently have an analytics group?
2. Do they make decisions based on data?
3. Do they have data center management skills?
4. Do they have stringent regulatory requirements?
5. What are the current sources of data?
6. What other sources of data are of interest?
7. What are their KPIs?
8. What is the maturity of their enterprise data architecture?
9. What is the maturity of their business intelligence initiative(s)?
10. Others?
Predictive Analytics Maturity Model
[Figure, from a slide headed “MAP 4.0 Product Features”: maturity levels 1 to 5 set against two columns, Data/Software and Decision Sciences.]
Stage labels: Analytics Laggard; Analytics Amateur (Some BI); Localized Analytics (Some Sales Drilldown); Analytical Practitioner (In-house Insights team); Analytical Master (Institutionalized Analytics)
Other labels in the figure: Really basic Analytics; MS Office Tools; BI Reporting Tools; Internal attempts; BI Tools/Reporting Engine; Full BI Suites With Some Data Mining/Analytics (Mostly Built-In): Oracle Suite, Microsoft Suite, SAS, IBM, Spotfire, […]; Specialized/Targeted Analytics Products: Pricing and Promotions, Marketing Mix, Segmentation, Consumerization, Assortment, Churn/Attrition, Supply Chain, […]; Supply Chain Simulator, PnP Simulator, Mix Simulator, Media Buy, […]; Global Suite (Supply Chain/PnP/Mix/Media …) with integrated workflow (ERP …); Threshold-based Insights; Automated Decks; Preemptive Suggestions; Insights Optimization; Automated Insights; Forecasting/Full Simulation; Forward Looking DSS; Actionable Implementable Insights; Full picture/context optimization; Analytics Holy Grail
Note in the figure: Current market is fragmented and overlapping. Highly specialized. Many players often produced Excel-based tools.
Model courtesy of Dhiraj Rajaram, CEO Mu Sigma, and Joseph de Castelnau, SVP Engineering Nielsen
Opportunity Areas
[Figure: bubble chart of opportunity areas. Axes: Business Strategic Value (Low to High) and IT Strategic Value (Low to High). Size of bubble = Est. Effort.]
Highest benefits are most likely realized when building these products or features
Sources: “Measuring the Business Value of Information Technology”, Intel Press
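One way to make the bubble chart concrete is to score each candidate on the two axes and sort. The sketch below is illustrative only: the 1 to 10 scale, the midpoint threshold, the ranking rule, and every opportunity name and number are assumptions, not data from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Opportunity:
    name: str
    business_value: int   # assumed 1-10 score for Business Strategic Value
    it_value: int         # assumed 1-10 score for IT Strategic Value
    est_effort: int       # bubble size, e.g. person-weeks (hypothetical unit)


def quadrant(opp: Opportunity, threshold: int = 5) -> str:
    """Place an opportunity in one of the four High/Low quadrants."""
    business = "High" if opp.business_value > threshold else "Low"
    it = "High" if opp.it_value > threshold else "Low"
    return f"{business} business / {it} IT"


def rank(opps: list[Opportunity]) -> list[Opportunity]:
    """Highest combined value first; lower effort breaks ties."""
    return sorted(opps, key=lambda o: (-(o.business_value + o.it_value), o.est_effort))


# Hypothetical candidates, for illustration only.
CANDIDATES = [
    Opportunity("Marketing dashboard refresh", business_value=4, it_value=2, est_effort=6),
    Opportunity("Fraud detection", business_value=9, it_value=8, est_effort=12),
    Opportunity("Log archive offload", business_value=3, it_value=7, est_effort=4),
]

for opp in rank(CANDIDATES):
    print(f"{opp.name}: {quadrant(opp)}, effort {opp.est_effort}")
```

Candidates that land in "High business / High IT" correspond to the bubbles the slide calls out as most likely to deliver the highest benefits.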
Opportunity Areas
[Figure: bubble chart of opportunity areas. Axes: Business Strategic Alignment (Low to High) and IT Strategic Alignment (Low to High). Size of bubble = Est. Effort.]
So why do these get built?
Sources: “Measuring the Business Value of Information Technology”, Intel Press
The Things We Learned
Implementation Approach
• Big Data does not lend itself to a Big Bang approach (actually does anything really?)
• Proofs of concept make perfect sense to gain traction (top-down push is preferable to federated)
• As with any effort with such potential, appropriate oversight by a combination of IT and business executives is needed
Other considerations
• Form a central team with key skills in building Big Data solutions. This consulting team should help train, mentor, and provide consulting expertise to new initiatives…helps ensure consistency in approach.
• There is a price of entry…each new participating area should bring resources to the table
• Encourage building a community of analytic junkies and support them…community building … goal is information sharing…build a Big Data culture
• Preference for consolidation
Components of Commodity Hardware
# sockets, # cores, core memory, processor speed
SATA, SSD, # Disks
2009/2010 H/W Recommendations
• 4 x 1TB hard drives
• 2 x Quad-core CPUs, each 2.0-2.5GHz
• 16GB RAM
• Gigabit Ethernet
Commodity Hardware Today
1U: 4-6 x 2TB SATA drives; 1 or 2 sockets; 1 x 6 or 2 x 6 cores; 4GB core memory; 24+ GB RAM; approx. $4K
2U: 12 x 2TB SATA drives; 1 or 2 sockets; 1 x 6 or 2 x 6 cores; 4-8GB core memory; 24+ GB RAM; approx. $6K
4U: 20 or 36 x 2TB SATA drives, or 80 x 3TB (???); 1 or 2 sockets; 1 x 6 or 2 x 6 cores; 4-8GB core memory; approx. $12K
10Gb Ethernet? ($7K)
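To compare these options, it can help to turn the specs into raw capacity, usable capacity, and a rough dollars-per-TB figure. The sketch below is a back-of-the-envelope calculation, not a sizing method from the talk. It assumes the three approximate prices map to the 1U, 2U, and 4U boxes in slide order, takes the upper disk count where a range is given, and assumes HDFS-style 3x replication (the HDFS default) with no space set aside for the OS or intermediate data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeConfig:
    """One commodity server option; all figures are rough."""
    form_factor: str
    disks: int              # SATA drive count (upper figure where the slide gives a range)
    tb_per_disk: float
    approx_price_usd: int   # assumed pairing of price to form factor

    def raw_tb(self) -> float:
        return self.disks * self.tb_per_disk

    def usable_tb(self, replication: int = 3) -> float:
        # Assumption: every block is stored `replication` times.
        return self.raw_tb() / replication

    def usd_per_raw_tb(self) -> float:
        return self.approx_price_usd / self.raw_tb()


NODES = [
    NodeConfig("1U", disks=6, tb_per_disk=2.0, approx_price_usd=4_000),
    NodeConfig("2U", disks=12, tb_per_disk=2.0, approx_price_usd=6_000),
    NodeConfig("4U", disks=36, tb_per_disk=2.0, approx_price_usd=12_000),
]

for node in NODES:
    print(
        f"{node.form_factor}: {node.raw_tb():.0f} TB raw, "
        f"{node.usable_tb():.1f} TB usable at 3x replication, "
        f"${node.usd_per_raw_tb():.0f} per raw TB"
    )
```

Under these assumptions the denser boxes come out cheaper per raw TB, but the figure ignores networking, power, rack space, and operations, so it is a starting point for discussion rather than a cost model.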
Thoughts on Storage TCO
• *Price != Cost and TCA is < 20% of TCO
• $ per TB not an exact science
*David Merrill, Hitachi Data Systems Chief Economist, “Storage Economics: Four Principles for Reducing Total Cost of Ownership” July, 2011
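The "TCA is < 20% of TCO" point implies a simple lower bound: if acquisition cost (reading TCA as total cost of acquisition) is under 20% of total cost of ownership, then TCO is more than five times the purchase price. The sketch below just does that arithmetic; the function name and the $6,000 purchase price are hypothetical, and the 20% figure is the rule of thumb quoted on the slide, not a measured value.

```python
def min_tco_usd(tca_usd: float, max_tca_share_pct: float = 20.0) -> float:
    """Lower bound on TCO implied by 'TCA is less than max_tca_share_pct % of TCO'."""
    if not 0 < max_tca_share_pct <= 100:
        raise ValueError("max_tca_share_pct must be in (0, 100]")
    # TCA < share * TCO  =>  TCO > TCA / share
    return tca_usd * 100 / max_tca_share_pct


purchase_price_usd = 6_000  # hypothetical acquisition cost for one storage-heavy node
print(f"TCO is more than ${min_tco_usd(purchase_price_usd):,.0f} "
      f"for a ${purchase_price_usd:,} purchase")
```

This is why a low sticker price on commodity hardware says little about what the storage will cost to run over its lifetime.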
‘New to Big Data’ learnings
Iterative process…if you go in claiming you ‘know’ the use case you want to solve, you are in for a surprise
A lot of this is not intuitive…e.g. MapReduce and columnar-based DBs, and we live in an RDBMS world
It takes a wealth of skills not resident in a single person…understanding of the batch MapReduce framework, networks, grid computing, analytics, subject matter expertise, and more
Open Source or ‘Free’ is not a big selling point for the larger companies
Some key learnings from early adopters
Don’t forget operations…in 2010 Facebook had between 400 and 500 operations professionals…on par with the entire engineering organization
Be ready to embrace emergent solutions and emergent architecture…and an emergent support base within the company
Vendor support still needs to catch up…not the level of support companies are used to from established technology vendors
Source: http://framethink.wordpress.com/2011/01/17/how-facebook-ships-code/
Thinking about workloads…latency
Low Latency (real-time)
High Latency (1 hour plus)
Start here!
Solutions are generally a mix of different paradigms
Other Random Learnings
• For many companies, there will likely be a cultural change required to become good in Big Data analytics
• Customers have major concerns about security and cloud
• Don’t tell a risk officer that Hadoop’s replication framework mitigates need for disaster recovery
• How about you all…any Random Learnings you want to share?
Big Data Analytics Learnings
“Analytics is the act of taking Big Data streams and human-sizing them for our small data brains.”
Source: http://www.dataspora.com/2010/05/new-tools-for-big-data/#more-182
There are no turnkey solutions in the analytics space (efforts are underway to make Big Data analytics accessible to the non-Data Scientist)
Final Thoughts
Musings
• Chief Data Officer || Chief Data Scientist
• Just because you can retain all this data, does it mean you should?
• Big Data and virtualization
Thinking about Starting a Big Data solution
1. Big Data strategy assessment?
2. Go small (success breeds success)
3. Let RT and near RT come to you…don’t start there
4. Ensure you have the right skills (or bring them in)
5. If only R&D focus then upside may be limited…business needs to have a seat at the table
Consider hiring a professional Big Data consulting firm to help in the transition!
Some good resources for you
Blogs
Databases and Data Infrastructure
http://www.dbms2.com
http://dbmsmusings.blogspot.com
http://databeta.wordpress.com
http://blogs.gartner.com/donald-feinberg
http://itmarketstrategy.com/
Big Data Analytics
http://hunch.net/
http://ml.typepad.com/
www.dataists.com
http://www.dataspora.com/blog/
http://blog.data-miners.com/
http://www.visualcomplexity.com/vc/blog/
Videos
http://www.youtube.com/watch?v=SS27F-hYWfU&feature=relmfu
http://www.youtube.com/watch?v=2FpO7w6X41I
http://www.youtube.com/watch?v=OmlX3IHb0JE
http://www.youtube.com/watch?src_vid=UaGINWPK068&annotation_id=annotation_65559&v=XAuwAHWpzPc&feature=iv
http://www.youtube.com/watch?v=eUcej07dGu4
http://www.youtube.com/watch?v=viPRny0nq3o
Q&A