Slides: conference.scipy.org/scipy2011/slides/hemen_enterprise_challenges…


Presenter
Presentation Notes
Note: I made a very conversational slide deck, which I think worked well for the talk, but the slides do not stand alone very well. I have tried to add comments to certain slides, but the talk itself should be viewed too.

Me: Group Manager – Enterprise Advanced Analytics

Presenter
Presentation Notes
Sports Authority is one of the largest full-line sporting goods retailers in the country, with ~500 stores across most of the U.S., millions of customers in its database, and billions in sales.

• When I started 9 months ago, there was no Python in use anywhere

• Now, Python is part of our stack moving forward as we build out our IT and analytics capabilities

• I have played an important role in defining this stack, especially with respect to Python

• I am lucky to work with smart people who are open to new tools and processes

• This talk discusses what I have done over recent months to move from then to now…

Presenter
Presentation Notes
R is on the list because of the hype, whereas the former two are established veterans in the analytical software space.

• Lack of commercial support
  • Difficult for IT to deal with standard installation
  • No single throat to choke
  • Analytics: no longer trust issues with open source, but still a perception that coverage is weaker (e.g. compared to SAS or R)

• Big data
  • Python only works with data that is in-memory

• Integration with Enterprise systems
  • How do we know we’ll be able to connect Python to our database systems (which include Teradata, Oracle, SQL Server, IBM iSeries…)?

Presenter
Presentation Notes
Commercial support: it exists!

Presenter
Presentation Notes
Ignoring map-reduce and other strategies that require significant changes in how you write applications…

Screenshot accessed 7/13/2011
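The "in-memory only" worry is often overstated: a plain streaming pass over a file keeps only running totals in memory, however large the file is. This is a minimal stdlib sketch of that idea, not the tool from the screenshot; the column names `store_id` and `sales` are hypothetical.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable


def total_sales_by_store(lines: Iterable[str]) -> Dict[str, float]:
    """Sum sales per store, reading one CSV row at a time.

    Only the running totals live in memory, so `lines` can be an open file
    handle over an extract much larger than RAM. The column names
    'store_id' and 'sales' are hypothetical.
    """
    totals: Dict[str, float] = defaultdict(float)
    for row in csv.DictReader(lines):
        totals[row["store_id"]] += float(row["sales"])
    return dict(totals)


if __name__ == "__main__":
    sample = io.StringIO("store_id,sales\n101,25.00\n102,10.50\n101,4.50\n")
    print(total_sales_by_store(sample))  # {'101': 29.5, '102': 10.5}
```

With a real extract, pass a file opened with `open(path, newline="")` instead of the `StringIO`; memory use stays proportional to the number of stores, not the number of rows.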

Integration with Enterprise Systems

I currently have no answer to this, at least not one that I am willing to stake my career on (despite SAS’s cost, no one will ever be fired for choosing that technology stack – it works)…

pyodbc and SQLAlchemy? I think so, but I have not done much testing yet, and it’s not like people are blogging about connecting Python to Teradata…

A different tack is needed; more on this in a minute…

Presenter
Presentation Notes
No problem hooking up to SQL Server 2005 and 2008; we do it every day. I am less sure about Teradata from Python on Windows, but colleagues have had success.
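Because pyodbc implements the Python DB-API 2.0 with the `qmark` (`?`) parameter style, query code can be written once against that interface. Here is a minimal sketch; the table `store_sales` and its columns are hypothetical, and an in-memory SQLite database (which uses the same paramstyle) stands in for SQL Server or Teradata so the example runs anywhere.

```python
import sqlite3
from typing import Any, List, Tuple

# Hypothetical table and columns. The "?" placeholders are the qmark
# paramstyle used by both pyodbc and sqlite3.
QUERY = """
    SELECT store_id, SUM(sales) AS total_sales
    FROM store_sales
    WHERE sales_date >= ?
    GROUP BY store_id
    ORDER BY total_sales DESC
"""


def top_stores(conn: Any, since: str) -> List[Tuple[Any, ...]]:
    """Run a parameterized query through any DB-API 2.0 connection."""
    cursor = conn.cursor()
    try:
        cursor.execute(QUERY, (since,))
        # tuple() normalizes driver-specific row objects (e.g. pyodbc.Row).
        return [tuple(row) for row in cursor.fetchall()]
    finally:
        cursor.close()


if __name__ == "__main__":
    # In-memory SQLite stands in for the enterprise database here; with
    # pyodbc the connection would come from pyodbc.connect("DSN=...")
    # using a site-specific DSN.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE store_sales (store_id INTEGER, sales_date TEXT, sales REAL)")
    conn.executemany(
        "INSERT INTO store_sales VALUES (?, ?, ?)",
        [(101, "2011-06-01", 250.0), (102, "2011-06-02", 400.0), (101, "2011-07-01", 300.0)],
    )
    print(top_stores(conn, "2011-06-01"))  # [(101, 550.0), (102, 400.0)]
    conn.close()
```

Keeping the SQL behind the DB-API boundary means swapping drivers is a one-line change to how the connection is created.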

• Trends in Analytics
  • Businesses have an increasing need for data science, not just statistics…
  • … but there is a dearth of people with these skills, and to the extent they exist, Python dominates as their lingua franca…
  • … ergo, if we want to grow internal analytical competency, we need to draw from the right pools of talent

• Flexibility and speed of development
  • Rapid analytical prototyping lets us better assess the cost and the value of turning data into information
  • Python is also well-suited for high-performance, production-level work, and we don’t need multiple licenses for every new functional area

http://browsertoolkit.com/fault-tolerance.png

This approach is tempting, but…

Presenter
Presentation Notes
No one likes pompous evangelists…

Talk!
• Find other Pythonistas, hopefully in other parts of the organization
• This was critical! I found friends in high places, like our IT/IS groups
• Python is becoming not only application glue but also social glue: statisticians can learn from IT about website development and how to interact directly with our midframe data systems, and people like me can help them with analytical needs (e.g. analyzing and visualizing server loads)

Examples:

Presenter
Presentation Notes
Lunch & Learn sessions help get people excited. At the end of the day, most technical problems are easy; people problems are more challenging.

And actual programming skills, please

Talk some more!
• Evangelize the growing need for data science (and Python follows pretty easily)

http://www.drewconway.com/zia/?p=2378

Presenter
Presentation Notes
Programming skills: version control, unit testing, performance tuning, development in a team, Agile, TDD. These are backgrounds I seldom find among the R and SAS programmers I know or interview.

Let’s end this talk with three recent examples of my team’s analytical work that used Python…

• Text analysis of customer survey data
• Real Estate team visualizing store performance
• Geospatial analysis of customer value and attrition

(Note that we use Python for true statistical modeling too, but logistic regression is less exciting to show than the examples above, which do not involve modeling per se.)

Write code!
• Use Python to quickly get people the things they need

Go after the domain of problems that are not solved by COTS software or big consulting projects (which enterprises love)

Presenter
Presentation Notes
We have a great web-based reporting tool to slice and dice customer survey data. But there is a weakness in the comment reporting: it offers only very basic text searching. This makes it hard to use the voice of the customer as true instrumentation for the business.

Presenter
Presentation Notes
SAS Text Miner costs many, many thousands of dollars. This solution has no capital costs, just development costs (which SAS would incur too). By contrast, our stack of NLTK, gensim, and MongoDB is super flexible and high-performance, with powerful natural language processing algorithms.
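To make the contrast with basic text search concrete, here is a minimal stdlib-only sketch that surfaces recurring two-word phrases across survey comments. It is a stand-in, not the NLTK/gensim pipeline described: the stopword list and sample comments are made up, and real tokenization and topic modeling would come from those libraries.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# A tiny, hypothetical stopword list; NLTK ships much fuller ones.
STOPWORDS = frozenset(
    "a an and are at be but for i in is it my of on the this to was were with".split()
)


def tokenize(text: str) -> List[str]:
    """Lowercase, keep runs of letters/apostrophes, and drop stopwords."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def top_phrases(comments: Iterable[str], n: int = 3) -> List[Tuple[str, int]]:
    """Count two-word phrases (bigrams) across all comments.

    Recurring phrases surface themes that a keyword search would only find
    if you already knew which words to search for.
    """
    counts = Counter()
    for comment in comments:
        words = tokenize(comment)
        counts.update(" ".join(pair) for pair in zip(words, words[1:]))
    return counts.most_common(n)


if __name__ == "__main__":
    comments = [
        "The checkout line was too long",
        "Long checkout line, friendly staff",
        "Friendly staff but the checkout line is slow",
    ]
    print(top_phrases(comments, n=2))  # [('checkout line', 3), ('friendly staff', 2)]
```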

• Text analysis of customer survey

• Real Estate team visualizing store performance

• Geospatial analysis of customer attrition

http://www.jjguy.com/heatmap/

Presenter
Presentation Notes
We have developed a much more useful tool than anything I have seen in a few of the commercial software packages we license for this type of work (they do other things very well, of course). This tool is now a pretty sweet mashup of Python and Google Earth, which serves up many dimensions of store data to visualize interactively. In this screenshot, thin white lines demarcate zip code boundaries. The edges point to the centroids of the zip codes that account for most of this store’s sales (colored by importance), and the heatmap shows the density of high-value customers in various neighborhoods.
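The edges from a store to the zip codes that account for most of its sales can be emitted as KML, which Google Earth opens directly. In this sketch, "most" is assumed to mean the smallest set of top-selling zip codes covering 80% of sales; the function names, zip codes, and coordinates are hypothetical, and the importance coloring and heatmap are omitted.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

LonLat = Tuple[float, float]  # KML orders coordinates as longitude, latitude


def top_zips(sales_by_zip: Dict[str, float], share: float = 0.8) -> List[str]:
    """Take zip codes from largest to smallest sales until `share` of the total is covered."""
    total = sum(sales_by_zip.values())
    chosen: List[str] = []
    running = 0.0
    for zip_code, sales in sorted(sales_by_zip.items(), key=lambda kv: kv[1], reverse=True):
        if running >= share * total:
            break
        chosen.append(zip_code)
        running += sales
    return chosen


def store_edges_kml(
    store: LonLat,
    sales_by_zip: Dict[str, float],
    centroids: Dict[str, LonLat],
    share: float = 0.8,
) -> str:
    """Build a KML document with one line from the store to each top zip centroid."""
    kml = ET.Element("kml", {"xmlns": "http://www.opengis.net/kml/2.2"})
    document = ET.SubElement(kml, "Document")
    for zip_code in top_zips(sales_by_zip, share):
        lon, lat = centroids[zip_code]
        placemark = ET.SubElement(document, "Placemark")
        ET.SubElement(placemark, "name").text = zip_code
        line = ET.SubElement(placemark, "LineString")
        # Each coordinate tuple is "lon,lat,altitude"; tuples are space-separated.
        ET.SubElement(line, "coordinates").text = f"{store[0]},{store[1]},0 {lon},{lat},0"
    return ET.tostring(kml, encoding="unicode")


if __name__ == "__main__":
    sales = {"80202": 500.0, "80203": 300.0, "80204": 150.0, "80205": 50.0}
    print(top_zips(sales))  # ['80202', '80203']
```

Writing the returned string to a `.kml` file and opening it in Google Earth draws the edges; styling by importance would add `Style` elements per placemark.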

Static Excel graphics – an easy target for improvement

Presenter
Presentation Notes
Excel scatter plot of store data: what can you do with this? Not much. So, convert it to a highly interactive matplotlib graphic that displays more dimensions and supports tooltips and region-of-interest selection.
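A sketch of that kind of conversion using matplotlib's event handling: hovering over a point shows its label as a tooltip, and dragging a rectangle with `matplotlib.widgets.RectangleSelector` reports the stores inside it. The helper names and sample data are hypothetical, and marker size is just one assumed way to add a dimension.

```python
from typing import List, Sequence

import matplotlib.pyplot as plt
from matplotlib.widgets import RectangleSelector


def points_in_box(xs: Sequence[float], ys: Sequence[float],
                  x0: float, x1: float, y0: float, y1: float) -> List[int]:
    """Indices of points inside the rectangle spanned by two opposite corners."""
    xlo, xhi = sorted((x0, x1))
    ylo, yhi = sorted((y0, y1))
    return [i for i, (x, y) in enumerate(zip(xs, ys))
            if xlo <= x <= xhi and ylo <= y <= yhi]


def interactive_scatter(xs, ys, labels, sizes=None):
    """Scatter plot with hover tooltips and rectangle (region-of-interest) selection."""
    fig, ax = plt.subplots()
    # Marker size is one hypothetical way to encode an extra dimension (e.g. sales).
    points = ax.scatter(xs, ys, s=sizes)
    tooltip = ax.annotate("", xy=(0, 0), xytext=(10, 10), textcoords="offset points",
                          bbox=dict(boxstyle="round", fc="w"))
    tooltip.set_visible(False)

    def on_move(event):
        if event.inaxes is not ax:
            return
        hit, info = points.contains(event)
        if hit:
            i = info["ind"][0]
            tooltip.xy = (xs[i], ys[i])
            tooltip.set_text(labels[i])
        tooltip.set_visible(hit)
        fig.canvas.draw_idle()

    def on_select(press, release):
        chosen = points_in_box(xs, ys, press.xdata, release.xdata,
                               press.ydata, release.ydata)
        print("Selected:", [labels[i] for i in chosen])

    fig.canvas.mpl_connect("motion_notify_event", on_move)
    # Return the selector so the caller keeps a reference; otherwise it can be
    # garbage-collected and stop responding.
    selector = RectangleSelector(ax, on_select, useblit=True)
    return fig, selector
```

Usage: `fig, selector = interactive_scatter(xs, ys, store_names, sizes=sales)` followed by `plt.show()` in an interactive backend.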

Questions?

P.S. Feel free to contact me if you want moral support in your quest to have Python take over the world.

P.P.S. We are hiring! (Denver, CO)

Josh Hemann
jhemann @ sportsauthority . com