Berkeley DS Webinar
June 1, 2016
COMPANY CONFIDENTIAL
How the business gets involved in the modeling process (and the challenges involved)
• CPG (consumer packaged goods)
• One of the first things I learned in the DS business: the business problem is not far from the DS work, and the business wants to be involved at all stages
  – They want to pose the problem
  – Give perspective on solutions
  – Review what DS is finding
  – Refine the process and make suggestions
  – Understand and critique the results
  – A porous layer between the business and DS teams
• Can be very positive: ideas on what should be included, validation of whether the results are meaningful, and the business context needed to build good models
• Downside: the business will often lead you down paths that are neither productive nor defensible, plus anecdotes!
• Having the business involved forces you to build models that are explanatory, not just predictive, and that is what makes them meaningful
• If you focus only on prediction, this will lead to overfitting
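To make the overfitting point concrete, here is a minimal Python sketch on made-up data (the linear relationship, noise level, sample sizes, and seed are assumptions for illustration, not anything from the talk). A 1-nearest-neighbour "memorizer" reproduces every training point exactly, so its training error is zero, yet on fresh data it does worse than a plain straight-line fit that a business audience could also read and question.

```python
import random

# Hypothetical synthetic data: y = 2x + Gaussian noise (all settings assumed).
rng = random.Random(0)

def make_data(n):
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2 * x + rng.gauss(0, 1) for x in xs]
    return xs, ys

train_x, train_y = make_data(50)
test_x, test_y = make_data(200)

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (a simple, explainable model)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return lambda x: a + b * x

def fit_memorizer(xs, ys):
    """1-nearest-neighbour lookup: returns the y of the closest training x,
    so it reproduces every training point exactly."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def mse(model, xs, ys):
    """Mean squared error of a model on the given points."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

line = fit_line(train_x, train_y)
memo = fit_memorizer(train_x, train_y)

for name, model in [("simple line", line), ("memorizer", memo)]:
    print(f"{name:12s} train MSE={mse(model, train_x, train_y):.3f} "
          f"test MSE={mse(model, test_x, test_y):.3f}")
```

The exact numbers depend on the seed; the pattern to look for is a perfect training score for the memorizer alongside a higher test error than the simple line.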
It’s all about the data!
• Morgan Stanley: we sell AA, but many people do basic things with data
• This means you don’t spend much time on algorithms; the work is mostly feature generation and data prep
• In SV (Silicon Valley), with internet companies, data science means throwing all the data at an algorithm
• If you can be more intelligent about feature generation, you will get better performance
• Nevertheless, the more data you can get, the better
• So data acquisition is very important and part of the process (and often overlooked)
• Traditional world: deciding which data to use and which transforms to apply, VERSUS throwing data into an algorithm and hoping for the best
  – This is overlooked
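A toy Python illustration of the feature-generation point, using hypothetical data: the label marks events between 22:00 and 06:00, a window that wraps around midnight. No single threshold on the raw clock hour can separate it, but re-expressing the hour as "hours since 22:00" (an engineered feature chosen with domain knowledge, assumed here purely for illustration) lets a one-threshold rule separate the classes perfectly.

```python
# Hypothetical labels: an event counts as "night" if it happens before 06:00
# or from 22:00 onward, so the positive class wraps around midnight.
HOURS = list(range(24))
LABELS = [1 if h < 6 or h >= 22 else 0 for h in HOURS]

def best_split_accuracy(values, labels):
    """Best accuracy of a one-threshold rule that predicts 1 on either
    side of some cut point t (a very simple 'model')."""
    best = 0.0
    for t in sorted(set(values)) + [max(values) + 1]:
        at_or_above = [1 if v >= t else 0 for v in values]
        below = [1 - p for p in at_or_above]
        for preds in (at_or_above, below):
            acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
            best = max(best, acc)
    return best

# Raw feature: the clock hour, 0-23.
raw_acc = best_split_accuracy(HOURS, LABELS)

# Engineered feature (hypothetical): hours elapsed since 22:00, which moves
# the whole night window to the low end of the scale.
hours_since_22 = [(h - 22) % 24 for h in HOURS]
engineered_acc = best_split_accuracy(hours_since_22, LABELS)

print(f"raw hour: {raw_acc:.3f}, hours since 22:00: {engineered_acc:.3f}")
```

The model is identical in both cases; only the feature changed, which is the sense in which smarter feature generation buys performance without a fancier algorithm.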
It’s not about the algorithm!
• Evicore example
• In a very short time, using just the straightforward approach, we found a way to save tens of millions of dollars
• By contrast, companies like VMware are obsessed with applying advanced algorithms to small amounts of data (not rich data) and are not making an impact on the business
• More important than the algorithm is finding an important business problem and getting to a solution in a meaningful time frame
• Also more important: operationalizing analytics results
• You can have a perfect model, but if it is not in production it is just an insight, and it can die on the vine
• A simple model can give you immediate lift in customer acquisition and impact on fraud
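"Lift" is commonly defined as the response rate in the top-scored slice of a population divided by the overall response rate. The Python sketch below computes it on ten made-up prospects; the function name `lift_at`, the scores, and the outcomes are hypothetical, and the talk does not say which lift definition or cutoff it used.

```python
def lift_at(scores, outcomes, fraction):
    """Response rate among the top-scored `fraction` of records divided by
    the overall response rate (1.0 means no better than random targeting).
    Assumes at least one positive outcome, since the base rate is the divisor."""
    ranked = sorted(zip(scores, outcomes), key=lambda p: p[0], reverse=True)
    k = max(1, round(len(ranked) * fraction))
    top_rate = sum(o for _, o in ranked[:k]) / k
    base_rate = sum(outcomes) / len(outcomes)
    return top_rate / base_rate

# Made-up example: 10 prospects scored by some model, 1 = became a customer.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
outcomes = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

print(f"lift in top 20%: {lift_at(scores, outcomes, 0.2):.2f}")
print(f"lift in top 50%: {lift_at(scores, outcomes, 0.5):.2f}")
```

Here the top 20% of prospects convert at about 3.3 times the base rate of 30%; targeting everyone (fraction 1.0) always gives a lift of exactly 1.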
How to become a data scientist!
• Personal experience and what you see during hiring
• Recruiting
• Plug for Alpine!
• Internships are the most important! More so than courses and the like
• It’s all about connections
• Meetups