lviv-it-arena
Microsoft Azure Machine Learning
Sergiy Poplavskiy
DX, Microsoft
“I need our systems to think. I need them to learn and I need them to present issues and problems and anomalies to the employees, to the managers.”
Adam Coffey, President and CEO, WASH Laundry Systems
What is Machine Learning?
Computing systems that become smarter with experience
“Experience” = past data + human input
Microsoft & Machine Learning
Answering questions with experience
• 1991: Microsoft Research formed
• 1997: Hotmail launches. Which email is junk?
• 2008: Bing Maps launches. What’s the best way home?
• 2009: Bing search launches. Which searches are most relevant?
• 2010: Kinect launches. What does that motion “mean”?
• 2014: Skype Translator launches. What is that person saying?
• 2015: Azure Machine Learning GA. What will happen next?
Machine learning is pervasive throughout Microsoft products.
How Are We Different?
Enable custom predictive analytics solutions at the speed of the market
“The main benefit we have experienced is that everything is in one place. Data is stored in the same place that hosts computations on the data.”
Corey Coscioni, West Monroe
The old Machine Learning landscape
No improvement in generations
• Expensive: huge set-up costs of tools, expertise, and compute/storage capacity
• Siloed data: siloed and cumbersome data management restricts access to data
• Disconnected tools: complex and fragmented tools limit participation in exploring data and building models
• Deployment complexity: many models never achieve business value due to difficulties with deploying to production
Differentiation
Model Your Way [Data Scientist]
• All skill levels
• Business-tested algorithms
• R & Python
Deploy in Minutes [Data Scientist, IT & Developers]
• One-click deployment
• Manage via cloud portal
• Model accessed as a web service
Expand your Reach [Ecosystem & Developers]
• Product Gallery
• Azure Marketplace
The Data Science “Inside”
Accessibility
Azure Machine Learning Service
Data -> Predictive model -> Operational web API in minutes
Data: Blobs and Tables; Hadoop (HDInsight); Relational DB (Azure SQL DB)
ML Studio: integrated development environment for Machine Learning
API: model is now a web service that is callable
Clients
Monetize the API through our marketplace
Machine Learning in action
Competitive differentiation through accessibility
“We wanted to go beyond the industry standard of preventative maintenance, to offer predictive and even preemptive maintenance.”
Andreas Schierenbeck, CEO, ThyssenKrupp Elevator
Personalized offers
Retailer Pier 1 Imports wanted to offer a connected, personal experience both online and in store
“At Pier 1 Imports, we’ve embraced the cloud. It helps us operationalize technology quickly and react to our ever-evolving business needs.”
Andrew Laudato, Pier 1 Imports
Smart buildings
CMU wanted to use sensor data for more than reactive repair and diagnosis
“We see Azure ushering in an era of self-service predictive analytics for the masses. We can only imagine the possibilities.”
Bertrand Lasternas, Carnegie Mellon
What can Azure ML do for you…?
Social network analysis
Weather forecasting
Healthcare outcomes
Predictive maintenance
Targeted advertising
Natural resource exploration
Fraud detection
Telemetry data analysis
Buyer propensity models
Churn analysis
Life sciences research
Web app optimization
Network intrusion detection
Smart meter monitoring
What makes for a good ML problem?
Labeled* examples
• Customer churn: records of current customers (loyal) + former customers who left (churn)
• Fraud detection: examples of fraud and not fraud
Relevant features
• Customer information: age, sex, zip code, historic spending patterns, etc.
• Transaction information: amount, previous transactions, customer information, etc.
Uncertainty/error tolerance
• Identifying some customers who are leaving is better than the current system
• Automated fraud identification is still verified by a human user
Why are labels important?
Supervised learning examples
• This customer will like House of Cards
• This network traffic indicates a denial of service attack
Unsupervised learning examples
• These customers are similar
• This network traffic is unusual
Most commonly used machine learning algorithms are supervised (they require labels)
Regression versus Classification
Regression problems
• Estimate household power consumption
• Estimate customer’s income
Classification problems
• Power station will/will not meet demand
• Customer will respond to advertising
Does your customer want to predict/estimate a number (regression) or apply a label/categorize (classification)?
Binary versus Multiclass Classification
Binary examples
• click prediction
• yes/no
• over/under
• win/loss
Multiclass examples
• kind of tree
• kind of network attack
• type of heart disease
Does your customer want a yes/no answer?
Review
Is your customer’s problem suitable for machine learning?
• Lots of data, lots of features, uncertainty is ok
Does your customer want to predict something?
• Labels are essential
Does your customer want a number or a category?
Machine Learning Algorithms
Bottom Line: Most algorithms can be applied to a variety of problems
Algorithm | Binary Classification in Azure ML | Multiclass Classification in Azure ML | Regression in Azure ML
Logistic Regression | Two-class logistic regression | Multiclass logistic regression | -
Linear Regression | - | - | Linear regression
Support Vector Machine | Two-class support vector machine | One-vs-all + support vector machine | -
Decision Tree | Two-class boosted decision tree | One-vs-all + boosted decision tree | Boosted decision tree regression
Neural Network | Two-class neural network | Multiclass neural network | Neural network regression
Random Forest | Two-class decision forest | Multiclass decision forest | Decision forest regression
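The algorithm table above can also be read as a lookup from problem type to module. The sketch below simply restates that table as a Python dictionary; the names `AZURE_ML_MODULES` and `modules_for` are made up for illustration, and `None` marks a cell the table leaves empty.

```python
# Module names copied from the slide's table; None means the table lists no module.
AZURE_ML_MODULES = {
    "Logistic Regression": {
        "binary": "Two-class logistic regression",
        "multiclass": "Multiclass logistic regression",
        "regression": None,
    },
    "Linear Regression": {
        "binary": None,
        "multiclass": None,
        "regression": "Linear regression",
    },
    "Support Vector Machine": {
        "binary": "Two-class support vector machine",
        "multiclass": "One-vs-all + support vector machine",
        "regression": None,
    },
    "Decision Tree": {
        "binary": "Two-class boosted decision tree",
        "multiclass": "One-vs-all + boosted decision tree",
        "regression": "Boosted decision tree regression",
    },
    "Neural Network": {
        "binary": "Two-class neural network",
        "multiclass": "Multiclass neural network",
        "regression": "Neural network regression",
    },
    "Random Forest": {
        "binary": "Two-class decision forest",
        "multiclass": "Multiclass decision forest",
        "regression": "Decision forest regression",
    },
}


def modules_for(task):
    """Return {algorithm family: module name} for families that cover the task."""
    return {
        family: modules[task]
        for family, modules in AZURE_ML_MODULES.items()
        if modules[task] is not None
    }


regression_options = modules_for("regression")
binary_options = modules_for("binary")
```

Asking for `"regression"` returns four families and asking for `"binary"` returns five, which is the slide's bottom line: most algorithm families cover more than one kind of problem.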
Linear Regression
What it does
Predicts a numeric value, y, based on your input variables
How it does it
Fits a linear function to your data to minimize the error across all training data
Bike share demand prediction
How many bikes will be used in a particular hour?
Data:
• 13 features including weather, day of the week, and temperature
• 17,000+ examples
• Most data is numeric, e.g., temperature
Source: UCI repository and Capital Bikeshare
Model in Azure ML
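To make "fits a linear function to minimize the error" concrete, here is a minimal sketch of ordinary least squares with a single input variable, in plain Python. The temperatures and rental counts are invented for illustration and are not taken from the bike share dataset; the real model uses 13 features, not one.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one input variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form solution that minimizes the sum of squared errors.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Predict a numeric value for a new input."""
    return slope * x + intercept


# Made-up example: hourly rentals against temperature in degrees Celsius.
temperatures = [10.0, 15.0, 20.0, 25.0, 30.0]
rentals = [120.0, 170.0, 220.0, 270.0, 320.0]

slope, intercept = fit_line(temperatures, rentals)
estimate = predict(slope, intercept, 22.0)
```

With several input variables the same idea applies, but the fit is a plane or hyperplane and is normally computed with a linear algebra library.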
Support Vector Machine (Classification)
What it does
Predicts binary class label Y based on input variables X
How it does it
Finds the line/hyperplane that optimally separates the classes to minimize classification error. The training points closest to that boundary are the support vectors
Note
Often it is not possible to find a separating line without classification errors
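As a rough illustration of finding a separating line, the sketch below trains a linear SVM by sub-gradient descent on the hinge loss with a small L2 penalty. This is one simple way to fit such a model, not necessarily what the Azure ML module does internally; the data points and the hyperparameters (`lam`, `lr`, `epochs`) are assumptions chosen for the example.

```python
def train_linear_svm(points, labels, lam=0.01, lr=0.1, epochs=100):
    """Fit weights w and bias b so that sign(w.x + b) matches labels (+1 / -1)."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Point is misclassified or inside the margin: hinge loss is active.
                w = [wi - lr * (lam * wi - y * xi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Point is safely classified: only the L2 penalty shrinks w.
                w = [wi - lr * lam * wi for wi in w]
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on which side of the hyperplane x falls."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Made-up, linearly separable 2-D data.
points = [[-2, -1], [-1, -2], [-2, -2], [2, 1], [1, 2], [2, 2]]
labels = [-1, -1, -1, 1, 1, 1]

w, b = train_linear_svm(points, labels)
training_predictions = [predict(w, b, x) for x in points]
```

On data that cannot be separated cleanly, the same loss still applies: points on the wrong side of the margin keep pulling the boundary, and the penalty weight controls how much error is tolerated.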
Hospital readmission
Which patients treated for diabetes-related illness will be readmitted within 30 days?
Data:
• 35 features including age, gender, race, test results, treatment, etc.
• 100,000+ samples
• Contains several categorical features
Source: UCI repository and Virginia Commonwealth University
Model in Azure ML
Decision Trees - Classification
What it does
Predicts class label Y based on input variables X
How it does it
Uses a series of decision rules in a binary tree where each leaf is a class label
Rules are determined by optimizing to reduce error
Decision Trees - Regression
What it does
Predicts scalar y based on input variables X
How it does it
Uses a series of decision rules in a binary tree where each leaf is a number
Rules are determined by optimizing to reduce error
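The sketch below shows one simple way rules can be "determined by optimizing to reduce error": at each node it greedily picks the feature and threshold that leave the fewest misclassified training rows. Production implementations typically use other split criteria (for example Gini impurity or entropy); the churn data here is invented for illustration.

```python
from collections import Counter


def majority(labels):
    """Most common label in a list."""
    return Counter(labels).most_common(1)[0][0]


def errors(labels):
    """Number of labels that differ from the majority label."""
    return len(labels) - Counter(labels).most_common(1)[0][1]


def build_tree(rows, labels, depth=0, max_depth=3):
    """Greedily grow a binary classification tree; leaves are class labels."""
    if depth == max_depth or errors(labels) == 0:
        return majority(labels)
    best = None
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            left = [i for i, row in enumerate(rows) if row[feature] <= threshold]
            right = [i for i, row in enumerate(rows) if row[feature] > threshold]
            if not left or not right:
                continue
            err = errors([labels[i] for i in left]) + errors([labels[i] for i in right])
            if best is None or err < best[0]:
                best = (err, feature, threshold, left, right)
    if best is None:
        return majority(labels)
    _, feature, threshold, left, right = best
    return {
        "feature": feature,
        "threshold": threshold,
        "left": build_tree([rows[i] for i in left], [labels[i] for i in left],
                           depth + 1, max_depth),
        "right": build_tree([rows[i] for i in right], [labels[i] for i in right],
                            depth + 1, max_depth),
    }


def predict(tree, row):
    """Follow the decision rules from the root down to a leaf label."""
    while isinstance(tree, dict):
        side = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[side]
    return tree


# Made-up customers: [age, monthly spend]; label is "churn" or "loyal".
rows = [[25, 20], [35, 25], [30, 30], [45, 15],
        [55, 35], [30, 80], [50, 90], [28, 70]]
labels = ["churn", "churn", "churn", "loyal",
          "loyal", "loyal", "loyal", "loyal"]

tree = build_tree(rows, labels)
```

On this data the tree first splits on monthly spend and then on age, so the learned rules can be read directly: low spend and younger age leads to "churn". A regression tree follows the same pattern with numeric leaves (for example the mean of the training targets in each leaf).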
Which algorithm should you pick?
Depends on your problem: what does your data look like?
• Logistic regression and support vector machines are good for linear problems
• Decision trees are intuitive and easy to interpret
• Decision trees naturally handle categorical data; data scaling is not required
• Support vector machines are effective for high-dimensional data
• Basic decision trees are not as accurate…
Advanced decision trees: Boosted Trees
What it does
Predicts class label (classification) or value (regression) Y based on variables X
How it does it
Uses a series of many trees to fit the data. Each successive tree is built to reduce the estimated error of the trees before it, resulting in increasingly accurate models
Caution
If the number of trees is too large, the model can fit noise instead of the underlying pattern, known as overfitting
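A minimal sketch of the boosting idea for regression, assuming squared error and one-split trees (stumps): each new stump is fitted to the residuals left by the trees before it. The data and the learning rate are made up for the example, and this is not a description of the Azure ML module's internals.

```python
def fit_stump(xs, residuals):
    """Best single split on x for predicting residuals (least squared error)."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def boost(xs, ys, n_trees, learning_rate=0.5):
    """Build n_trees stumps, each correcting the errors of the ones before it."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(xs)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_mean, right_mean = fit_stump(xs, residuals)
        stumps.append((threshold, left_mean, right_mean))
        predictions = [
            p + learning_rate * (left_mean if x <= threshold else right_mean)
            for x, p in zip(xs, predictions)
        ]
    return base, learning_rate, stumps


def predict(model, x):
    base, learning_rate, stumps = model
    return base + sum(learning_rate * (lm if x <= t else rm) for t, lm, rm in stumps)


def training_mse(model, xs, ys):
    return sum((predict(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Made-up 1-D data with a roughly increasing trend plus some wobble.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 3.1, 2.9, 5.2, 4.8, 7.1, 6.9]

errors_by_size = [training_mse(boost(xs, ys, n), xs, ys) for n in (0, 1, 5, 20)]
```

Training error keeps falling as trees are added, which is exactly why the caution matters: at some point the extra trees are fitting the wobble rather than the trend, so the number of trees should be chosen by checking error on held-out data, not training data.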
Advanced decision trees: Random Forest
What it does
Predicts class label (classification) or value (regression) Y based on variables X
How it does it
Builds many different trees using different random inputs (training data and features), then aggregates those trees to determine the final prediction
If trees are independent, having more trees does not result in overfitting
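The following sketch illustrates the two sources of randomness named above, using one-split trees for brevity: each tree sees a bootstrap sample of the rows and a single randomly chosen feature, and the forest predicts by majority vote. Real random forests grow deeper trees and sample features at every split; the traffic data, tree count, and seed are assumptions for the example.

```python
import random
from collections import Counter


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows, labels, feature):
    """One-split tree on a single feature, chosen to minimize training errors."""
    best = None
    for threshold in sorted({row[feature] for row in rows})[:-1]:
        left = [l for row, l in zip(rows, labels) if row[feature] <= threshold]
        right = [l for row, l in zip(rows, labels) if row[feature] > threshold]
        left_label, right_label = majority(left), majority(right)
        err = sum(l != left_label for l in left) + sum(l != right_label for l in right)
        if best is None or err < best[0]:
            best = (err, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return feature, threshold, left_label, right_label


def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and a random feature."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    while len(forest) < n_trees:
        sample = [rng.randrange(n) for _ in range(n)]  # rows drawn with replacement
        sample_labels = [labels[i] for i in sample]
        if len(set(sample_labels)) < 2:
            continue  # sample contains one class only; draw again
        feature = rng.randrange(len(rows[0]))  # random feature for this tree
        forest.append(fit_stump([rows[i] for i in sample], sample_labels, feature))
    return forest


def predict(forest, row):
    """Aggregate the trees by majority vote."""
    votes = [left if row[feature] <= threshold else right
             for feature, threshold, left, right in forest]
    return majority(votes)


# Made-up traffic records: [requests per second, distinct ports contacted].
rows = [[10, 2], [12, 3], [15, 1], [11, 2], [14, 3],
        [90, 40], [95, 45], [85, 50], [99, 42], [88, 47]]
labels = ["normal"] * 5 + ["attack"] * 5

forest = fit_forest(rows, labels)
```

Because every tree is trained on a different sample, individual trees place their thresholds differently; the vote averages those differences out, which is why adding more trees tends to stabilize the prediction rather than overfit.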
Advantages to each set of algorithms
Linear algorithms (SVM, logistic regression)
• Easy to understand/interpret results
• Computationally efficient to train and score, even for large datasets
• Difficult to overfit data
Nonlinear algorithms (boosted decision tree, random forest)
• Can find irregular boundaries
• Natively handle categorical features
• Can achieve superior accuracy for many problems
Neural Networks
What it does
Predicts class label (classification) or value (regression) Y based on variables X
How it does it
Is inspired by biological systems, i.e., neurons in the brain
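To show what a network of artificial "neurons" looks like in code, here is a tiny hand-wired example: three neurons, each a weighted sum followed by a threshold, that together compute XOR (a function no single linear boundary can represent). The weights are set by hand purely for illustration; in practice they are learned from labeled data, and smooth activation functions are used instead of a hard step.

```python
def step(z):
    """Fire (1) if the weighted input is positive, otherwise stay silent (0)."""
    return 1 if z > 0 else 0


def neuron(inputs, weights, bias):
    """A single artificial neuron: weighted sum of inputs plus bias, then step."""
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)


def xor_network(a, b):
    """Two hidden neurons feeding one output neuron, with hand-picked weights."""
    hidden_or = neuron([a, b], [1, 1], -0.5)    # fires if a or b is 1
    hidden_and = neuron([a, b], [1, 1], -1.5)   # fires only if both are 1
    # Output fires when "or" is on and "and" is off, i.e. exactly one input is 1.
    return neuron([hidden_or, hidden_and], [1, -1], -0.5)


truth_table = {(a, b): xor_network(a, b) for a in (0, 1) for b in (0, 1)}
```

The hidden layer is what lets the network draw a nonlinear boundary: each hidden neuron is a simple linear rule, and the output neuron combines them.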
Final thoughts on algorithm selection
Do what data scientists do: try multiple algorithms and select the best model for your data
Model Your Way: Open source/our source
Script with R, SQLite or Python
• CPython 2.7 support from inside AML Studio
• numpy/scipy/pandas/scikit-learn/etc.
• Anaconda distro pre-installed
Python client library
• Analyze data using Python and its libraries
• Use IPython, PTVS, Eclipse to edit/debug
Big learning with counts
• TB scale datasets
• Modular: tune/monitor/replace in isolation
• Monitorable and debuggable
Deploy in Minutes
One click to production
• Publish as a Web Service or to Gallery
• Continuous updates to streamline process
• Stay tuned to our blog for more
New in-product Gallery
• Discover what others have built
• Learn by dropping these into your workspace
• Share your work with others
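Once a model is published as a web service, a client calls it over HTTP. The sketch below only builds such a request with the Python standard library and does not send it. The URL, the API key, the input name `input1`, and the JSON layout are all placeholders and assumptions for illustration; the actual endpoint, key, and request schema should be taken from the published service's own API help page.

```python
import json
import urllib.request

# Hypothetical placeholders: a real endpoint URL and key come from the
# published service's page in the portal.
SCORING_URL = "https://example.invalid/score"
API_KEY = "replace-with-your-key"


def build_scoring_request(column_names, rows, url=SCORING_URL, api_key=API_KEY):
    """Build (but do not send) a POST request carrying rows to be scored."""
    # Assumed payload layout; check the service's API help page for the real one.
    payload = {
        "Inputs": {"input1": {"ColumnNames": column_names, "Values": rows}},
        "GlobalParameters": {},
    }
    body = json.dumps(payload).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer " + api_key,
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


req = build_scoring_request(["temperature", "hour"], [["21.5", "8"]])

# To call a live service, send the request and decode the JSON reply:
# with urllib.request.urlopen(req) as response:
#     result = json.loads(response.read())
```

Keeping request construction separate from sending makes the client easy to test without network access and keeps the key out of the calling code.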
Expand your Reach
Get started for free with only a Microsoft Account ID
azure.com/ml
Check out finished APIs and solutions, or put your own on the Machine Learning marketplace, at datamarket.azure.com
Find videos and tutorials under Documentation at azure.com/ml
Stay tuned to our blog: https://aka.ms/mlblog
© 2015 Microsoft Corporation. All rights reserved. Microsoft, Windows, and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.