8/20/2019 Machine Learning Top Trumps with David Tarrant
1/22
Machine learning and classification
8/20/2019 Machine Learning Top Trumps with David Tarrant
2/22
Explain the core aspects of Machine LearningApply classification to a set of data to create a decision
Write a machine learning algorithm
O
8/20/2019 Machine Learning Top Trumps with David Tarrant
3/22
Machine Learning
“construction and study of systems that can learn from data”
Can be seen as building blocks to make computers learn to
behave more intelligently
There are various techniques with various implementations.
Slide ideas
8/20/2019 Machine Learning Top Trumps with David Tarrant
4/22
Question?
What instances of machine learninghave you come across?
What types are they?
8/20/2019 Machine Learning Top Trumps with David Tarrant
5/22
Use-Cases
•
Spam Email Detection•Machine Translation (Language Translation)
•Image Search (Similarity)
•Clustering (KMeans) : Amazon Recommendatio
•Classification : Google News
8/20/2019 Machine Learning Top Trumps with David Tarrant
6/22
Use-Cases (contd.)
•Text Summarization - Google News
•Rating a Review/Comment: Yelp
• Fraud detection : Credit card Providers
•Decision Making : e.g. Bank/Insurance sector
• Sentiment Analysis
• Speech Understanding – iPhone with Siri
• Face Detection – Facebook’s Photo tagging
8/20/2019 Machine Learning Top Trumps with David Tarrant
7/22
Not a Spam
Not a Spam
Not Spam
8/20/2019 Machine Learning Top Trumps with David Tarrant
8/22
NER (Named Entity Recognition)
http://nlp.stanford.edu:8080/ner/process
http://nlp.stanford.edu:8080/ner/process
8/20/2019 Machine Learning Top Trumps with David Tarrant
9/22
Similar/Duplicate ImagesRemember
Features ? (Feature Extraction)
Can be :
• Width
• Height
• Contrast
• Brightness
• Position
• Hue
• Colors
Credit: https://www.google.co.in/
Check this :
LIRE (Lucene Image REtrieval) library -
https://code.google.com/p/lire/
https://www.google.co.in/https://code.google.com/p/lire/https://www.google.co.in/
8/20/2019 Machine Learning Top Trumps with David Tarrant
10/22
Recommendations
http://www.webdesignerdepot.com/2009/10/an-analysis-of-the-amazon-shopping-experience/
http://www.webdesignerdepot.com/2009/10/an-analysis-of-the-amazon-shopping-experience/http://www.webdesignerdepot.com/2009/10/an-analysis-of-the-amazon-shopping-experience/
8/20/2019 Machine Learning Top Trumps with David Tarrant
11/22
TerminologyFeatures
The number of features or distinct traits that can be used to describe eachitem in a quantitative manner.
SamplesA sample is an item to process (e.g. classify). It can be a document, a picture, asound, a video, a row in database or CSV file, or whatever you can describe witha fixed set of quantitative traits.
Feature vectoris an n-dimensional vector of numerical features that represent some object.
Feature extractionPreparation of feature vector
Transforms the data in the high-dimensional space to a space offewer dimensions.
Training/Evolution setSet of data to discover potentially predictive relationships.
Slide ideas
8/20/2019 Machine Learning Top Trumps with David Tarrant
12/22
Learning (Training)
Features:
1. Color:
Radish/Red
2. Type :
Features:
1. Sky
Blue
2. Logo
Features:
1. Yellow
2. Fruit
3. ShapeSlide ideas
8/20/2019 Machine Learning Top Trumps with David Tarrant
13/22
Workflow
Slide ideas
8/20/2019 Machine Learning Top Trumps with David Tarrant
14/22
Property locations?
Each table has a set of “Top Trump”training-set cards.
Build a decision tree to sort them into “New
York” and “San Francisco”.
You cannot use the target to sort them.
8/20/2019 Machine Learning Top Trumps with David Tarrant
15/22
Categories
Supervised Learning
Unsupervised Learning
Semi-Supervised Learning
Reinforcement Learning
Slide ideas
8/20/2019 Machine Learning Top Trumps with David Tarrant
16/22
Supervised Learning
The correct labels of the training data are known
Credit: http://us.hudson.com/legal/blog/postid/513/predictive-analytics-artificial-intelligence-science-fiction-e-discovery-truth
Slide ideas
http://us.hudson.com/legal/blog/postid/513/predictive-analytics-artificial-intelligence-science-fiction-e-discovery-truthhttp://us.hudson.com/legal/blog/postid/513/predictive-analytics-artificial-intelligence-science-fiction-e-discovery-truth
8/20/2019 Machine Learning Top Trumps with David Tarrant
17/22
Unsupervised Learning
The correct labels of the training data are not knowndiscovery and pattern generation.
Credit: http://us.hudson.com/legal/blog/postid/513/predictive-analytics-artificial-intelligence-science-fiction-e-discovery-truth
Slide ideas
http://us.hudson.com/legal/blog/postid/513/predictive-analytics-artificial-intelligence-science-fiction-e-discovery-truthhttp://us.hudson.com/legal/blog/postid/513/predictive-analytics-artificial-intelligence-science-fiction-e-discovery-truth
8/20/2019 Machine Learning Top Trumps with David Tarrant
18/22
Semi-Supervised LearningA Mix of Supervised and Unsupervised learning.
Some labels are known, but majority are not.
Credit: http://us.hudson.com/legal/blog/postid/513/predictive-analytics-artificial-intelligence-science-fiction-e-discovery-truth
http://us.hudson.com/legal/blog/postid/513/predictive-analytics-artificial-intelligence-science-fiction-e-discovery-truth
8/20/2019 Machine Learning Top Trumps with David Tarrant
19/22
Reinforcement LearningSoftware agent learns based on feedback and/or reward.
Credit: http://us.hudson.com/legal/blog/postid/513/predictive-analytics-artificial-intelligence-science-fiction-e-discovery-truth
http://us.hudson.com/legal/blog/postid/513/predictive-analytics-artificial-intelligence-science-fiction-e-discovery-truth
8/20/2019 Machine Learning Top Trumps with David Tarrant
20/22
Techniques
classification
predict class from observations
clustering
group observations into “meaningful” groups
regression (prediction):
predict value from observations
8/20/2019 Machine Learning Top Trumps with David Tarrant
21/22
r2d3.us
8/20/2019 Machine Learning Top Trumps with David Tarrant
22/22
Frameworks/Tools
R
Weka
Carrot2
Gate
OpenNLP
LingPipe
Stanford NLP
Mallet – Topic Modelling
Gensim – Topic Modelling (Python) Apache Mahout
MLib – Apache Spark
scikit-learn - Python
LIBSVM : Support Vector Machines
and many more…