Machine Learning Top Trumps with David Tarrant

Embed Size (px)

Citation preview

  • 8/20/2019 Machine Learning Top Trumps with David Tarrant

    1/22

    Machine learning and classification

  • 8/20/2019 Machine Learning Top Trumps with David Tarrant

    2/22

    Explain the core aspects of Machine LearningApply classification to a set of data to create a decision

    Write a machine learning algorithm

    O

  • 8/20/2019 Machine Learning Top Trumps with David Tarrant

    3/22

    Machine Learning

    “construction and study of systems that can learn from data”

    Can be seen as building blocks to make computers learn to

    behave more intelligently

    There are various techniques with various  implementations.

    Slide ideas

  • 8/20/2019 Machine Learning Top Trumps with David Tarrant

    4/22

    Question?

    What instances of machine learninghave you come across?

    What types are they?

  • 8/20/2019 Machine Learning Top Trumps with David Tarrant

    5/22

    Use-Cases

    Spam Email Detection•Machine Translation (Language Translation)

    •Image Search (Similarity)

    •Clustering (KMeans) : Amazon Recommendatio

    •Classification : Google News

  • 8/20/2019 Machine Learning Top Trumps with David Tarrant

    6/22

    Use-Cases (contd.)

    •Text Summarization - Google News

    •Rating a Review/Comment: Yelp

    • Fraud detection : Credit card Providers

    •Decision Making : e.g. Bank/Insurance sector

    • Sentiment Analysis

    • Speech Understanding – iPhone with Siri

    • Face Detection – Facebook’s Photo tagging

  • 8/20/2019 Machine Learning Top Trumps with David Tarrant

    7/22

    Not a Spam

    Not a Spam

    Not Spam

  • 8/20/2019 Machine Learning Top Trumps with David Tarrant

    8/22

    NER (Named Entity Recognition)

    http://nlp.stanford.edu:8080/ner/process

    http://nlp.stanford.edu:8080/ner/process

  • 8/20/2019 Machine Learning Top Trumps with David Tarrant

    9/22

    Similar/Duplicate ImagesRemember

    Features  ?  (Feature Extraction)

      Can be :

    • Width

    • Height

    • Contrast

    • Brightness

    • Position

    • Hue

    • Colors

    Credit: https://www.google.co.in/ 

    Check this :

    LIRE (Lucene Image REtrieval) library  -

    https://code.google.com/p/lire/ 

    https://www.google.co.in/https://code.google.com/p/lire/https://www.google.co.in/

  • 8/20/2019 Machine Learning Top Trumps with David Tarrant

    10/22

    Recommendations

    http://www.webdesignerdepot.com/2009/10/an-analysis-of-the-amazon-shopping-experience/ 

    http://www.webdesignerdepot.com/2009/10/an-analysis-of-the-amazon-shopping-experience/http://www.webdesignerdepot.com/2009/10/an-analysis-of-the-amazon-shopping-experience/

  • 8/20/2019 Machine Learning Top Trumps with David Tarrant

    11/22

    TerminologyFeatures

    The number of features or distinct traits that can be used to describe eachitem in a quantitative manner.

    SamplesA sample is an item to process (e.g. classify). It can be a document, a picture, asound, a video, a row in database or CSV file, or whatever you can describe witha fixed set of quantitative traits.

    Feature vectoris an n-dimensional vector of numerical features that represent some object.

    Feature extractionPreparation of feature vector

    Transforms the data in the high-dimensional space to a space offewer dimensions.

    Training/Evolution setSet of data to discover potentially predictive relationships.

    Slide ideas

  • 8/20/2019 Machine Learning Top Trumps with David Tarrant

    12/22

    Learning (Training)

    Features:

    1. Color:

    Radish/Red

    2. Type :

    Features:

    1. Sky

    Blue

    2. Logo

    Features:

    1. Yellow

    2. Fruit

    3. ShapeSlide ideas

  • 8/20/2019 Machine Learning Top Trumps with David Tarrant

    13/22

    Workflow

    Slide ideas

  • 8/20/2019 Machine Learning Top Trumps with David Tarrant

    14/22

    Property locations?

    Each table has a set of “Top Trump”training-set cards.

    Build a decision tree to sort them into “New

    York” and “San Francisco”.

    You cannot use the target to sort them.

  • 8/20/2019 Machine Learning Top Trumps with David Tarrant

    15/22

    Categories

    Supervised Learning

    Unsupervised Learning

    Semi-Supervised Learning

    Reinforcement Learning

    Slide ideas

  • 8/20/2019 Machine Learning Top Trumps with David Tarrant

    16/22

    Supervised Learning

    The correct labels of the training data are known

    Credit: http://us.hudson.com/legal/blog/postid/513/predictive-analytics-artificial-intelligence-science-fiction-e-discovery-truth

    Slide ideas

    http://us.hudson.com/legal/blog/postid/513/predictive-analytics-artificial-intelligence-science-fiction-e-discovery-truthhttp://us.hudson.com/legal/blog/postid/513/predictive-analytics-artificial-intelligence-science-fiction-e-discovery-truth

  • 8/20/2019 Machine Learning Top Trumps with David Tarrant

    17/22

    Unsupervised Learning

    The correct labels of the training data are not knowndiscovery and pattern generation.

    Credit: http://us.hudson.com/legal/blog/postid/513/predictive-analytics-artificial-intelligence-science-fiction-e-discovery-truth 

    Slide ideas

    http://us.hudson.com/legal/blog/postid/513/predictive-analytics-artificial-intelligence-science-fiction-e-discovery-truthhttp://us.hudson.com/legal/blog/postid/513/predictive-analytics-artificial-intelligence-science-fiction-e-discovery-truth

  • 8/20/2019 Machine Learning Top Trumps with David Tarrant

    18/22

    Semi-Supervised LearningA Mix of Supervised and Unsupervised learning.

    Some labels are known, but majority are not.

    Credit: http://us.hudson.com/legal/blog/postid/513/predictive-analytics-artificial-intelligence-science-fiction-e-discovery-truth 

    http://us.hudson.com/legal/blog/postid/513/predictive-analytics-artificial-intelligence-science-fiction-e-discovery-truth

  • 8/20/2019 Machine Learning Top Trumps with David Tarrant

    19/22

    Reinforcement LearningSoftware agent learns based on feedback and/or reward.

    Credit: http://us.hudson.com/legal/blog/postid/513/predictive-analytics-artificial-intelligence-science-fiction-e-discovery-truth 

    http://us.hudson.com/legal/blog/postid/513/predictive-analytics-artificial-intelligence-science-fiction-e-discovery-truth

  • 8/20/2019 Machine Learning Top Trumps with David Tarrant

    20/22

    Techniques

    classification 

    predict class from observations

    clustering

    group observations into “meaningful” groups

    regression (prediction):

    predict value from observations

  • 8/20/2019 Machine Learning Top Trumps with David Tarrant

    21/22

    r2d3.us

  • 8/20/2019 Machine Learning Top Trumps with David Tarrant

    22/22

    Frameworks/Tools

    R

    Weka

    Carrot2

    Gate

    OpenNLP

    LingPipe

    Stanford NLP

    Mallet – Topic Modelling

    Gensim – Topic Modelling (Python) Apache Mahout

    MLib – Apache Spark

    scikit-learn - Python

    LIBSVM : Support Vector Machines

    and many more…