AlexTelfarProposal t SNE

  • 8/18/2019 AlexTelfarProposal t SNE

    1/8

    Dimensionality Reduction

    t-SNE

    Stochastic neighbor embedding (SNE)

    “t-SNE is a tool to visualize high-dimensional data. It converts similarities between data points to joint probabilities and tries to minimize the Kullback-Leibler divergence between the joint probabilities of the low-dimensional embedding and the high-dimensional data.”
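
As a rough illustration of the quoted description, the sketch below turns pairwise distances into joint probabilities and compares a high-dimensional point set with two candidate 2D layouts via the Kullback-Leibler divergence. It is a simplified sketch, not the full algorithm: the single shared Gaussian bandwidth `sigma`, the toy point sets, and all function names are assumptions made for this example (t-SNE proper tunes a bandwidth per point from a perplexity target and minimises the divergence by gradient descent).

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def joint_probs(points, kernel):
    """Normalise pairwise kernel weights (i != j) so that they sum to 1."""
    n = len(points)
    w = [[0.0 if i == j else kernel(sq_dist(points[i], points[j]))
          for j in range(n)] for i in range(n)]
    total = sum(sum(row) for row in w)
    return [[v / total for v in row] for row in w]

def gaussian(d2, sigma=1.0):
    # Assumption: one shared bandwidth instead of a per-point, perplexity-tuned one.
    return math.exp(-d2 / (2.0 * sigma ** 2))

def student_t(d2):
    # Heavy-tailed kernel with one degree of freedom, used for the low-dimensional map.
    return 1.0 / (1.0 + d2)

def kl_divergence(p, q, eps=1e-12):
    """KL(P || Q) summed over all off-diagonal pairs; eps guards against log(0)."""
    n = len(p)
    return sum(p[i][j] * math.log((p[i][j] + eps) / (q[i][j] + eps))
               for i in range(n) for j in range(n) if i != j)

# Toy data: two tight clusters in 3D.
high = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (5.0, 5.0, 5.0), (5.1, 5.0, 5.0)]
# A 2D layout that keeps the clusters together, and one that splits them.
good = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
bad = [(0.0, 0.0), (5.0, 5.0), (0.1, 0.0), (5.1, 5.0)]

P = joint_probs(high, gaussian)
kl_good = kl_divergence(P, joint_probs(good, student_t))
kl_bad = kl_divergence(P, joint_probs(bad, student_t))
print(kl_good < kl_bad)  # the cluster-preserving layout has the lower divergence
```

The layout that keeps neighbours together scores a lower divergence, which is the quantity the embedding is optimised to reduce.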

    Problems with dimensionality reduction

    ● The curse of dimensionality

    ○ An exponential amount of information is being crushed into an approximately

    ○ Distance functions are far less meaningful in high dimensional space.

    Problems with t-SNE

    ● Optimising a non-convex loss function.

    ● Large data sets are computationally expensive. Calculating pairwise relationships

    operations. Thus it is just not possible to do t-SNE embeddings for large datasets l

    which has 14 million images (n) and 196608 dimensions (k).
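
To make the scale concrete, the arithmetic below counts the pairwise relationships for the n and k quoted above. It is a back-of-the-envelope sketch: the 4-byte float and 1-byte-per-value storage sizes are assumptions chosen for illustration, and the helper names are not from the proposal. As a side observation, 196608 equals 256 × 256 × 3, which would be consistent with 256 × 256 RGB images, though the slide does not say so.

```python
def pair_count(n):
    """Number of unordered pairs among n points."""
    return n * (n - 1) // 2

def dense_matrix_bytes(n, bytes_per_value=4):
    """Memory for a dense n x n matrix (assumption: 4-byte floats)."""
    return n * n * bytes_per_value

n = 14_000_000   # images, as stated on the slide
k = 196_608      # dimensions per image, as stated on the slide

pairs = pair_count(n)
matrix_tb = dense_matrix_bytes(n) / 1e12
raw_data_tb = n * k / 1e12  # assumption: 1 byte per dimension

print(f"unordered pairs: {pairs:.3e}")
print(f"dense pairwise matrix: {matrix_tb:.0f} TB")
print(f"raw data: {raw_data_tb:.2f} TB")
```

Under these assumptions the pairwise matrix alone is hundreds of times larger than the raw data, which is why a naive all-pairs computation does not scale to datasets of this size.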

    The Solution:

    Soft dimensionality reduction

    We plan to test the theory that:

    Continuous or incremental (soft) dimensionality reduction will help preserve more structure when embedding data from a higher dimension.

    Potential realisations of soft dimensionality reduction

    ● Punishing data points that lie in the higher dimensions (3D and above) through the loss function that we are optimising

    ● Continuously transforming the data (through folding and dekinking) into a 2D plane

    ● Iteratively using t-SNE to project downward only a small number of fixed dimensions at a time, until reaching 2D
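
Two of the realisations listed above can be sketched in a few lines. This is only one possible reading of the proposal: `higher_dim_penalty`, `dimension_schedule`, and the `keep`, `weight`, and `step` parameters are hypothetical names introduced here, and the slides do not specify the exact penalty or schedule.

```python
def higher_dim_penalty(embedding, keep=2, weight=1.0):
    """Penalise coordinates beyond the first `keep` axes.

    Hypothetical penalty: weight times the sum of squared coordinates in
    dimension 3 and above. Added to the loss, it pushes points toward the
    2D plane as `weight` grows.
    """
    return weight * sum(c * c for point in embedding for c in point[keep:])

def dimension_schedule(start, target=2, step=1):
    """Hypothetical schedule: drop `step` dimensions per stage until `target`."""
    dims = []
    d = start
    while d > target:
        d = max(target, d - step)
        dims.append(d)
    return dims

# Points that already lie in the plane incur no penalty.
flat = [(1.0, 2.0, 0.0), (3.0, 4.0, 0.0)]
lifted = [(1.0, 2.0, 3.0), (0.0, 0.0, 0.5)]
print(higher_dim_penalty(flat), higher_dim_penalty(lifted))

# Intermediate target dimensions when stepping down from 6D two at a time.
print(dimension_schedule(6, step=2))
```

In a full experiment each stage of the schedule would be one embedding run whose output feeds the next stage; how many dimensions to drop per stage is exactly the kind of choice the proposal sets out to test.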

    Evaluation (actually it’s another problem)

    ● Benchmark datasets

    ○ Standard datasets in machine learning: MNIST, CIFAR-10/100, COIL-20/100, …

    ● Computational complexity 

    ○ It is straightforward to compare the speed of two algorithms. Thus any improvement can easily be evaluated.

    ● Visualisations are subjective

    ○ Changes to the actual visualisation are inherently subjective. This makes evaluation difficult.

    ● Autoencoders

    ○ Will also be used to help validate and generalise the soft dimensionality reduction technique, to check that it is not just a property of t-SNE.
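
For the computational-complexity item above, a speed comparison could be as simple as the timing harness sketched below. The helper `best_time` and the stand-in workload `pairwise_sq_dists` are assumptions for illustration; in practice the two embedding implementations under comparison would be passed in instead.

```python
import time

def best_time(fn, *args, repeats=3):
    """Return the fastest wall-clock time in seconds over `repeats` calls of fn(*args)."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        timings.append(time.perf_counter() - start)
    return min(timings)

def pairwise_sq_dists(points):
    """Naive O(n^2) pairwise squared distances, used here only as a stand-in workload."""
    return [[sum((a - b) ** 2 for a, b in zip(p, q)) for q in points] for p in points]

points = [(float(i), float(i % 7)) for i in range(200)]
elapsed = best_time(pairwise_sq_dists, points)
print(f"best of 3 runs: {elapsed:.4f} s")
```

Taking the minimum over a few repeats reduces noise from other processes, so two algorithms can be compared on the same data with the same harness.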

    Details

    Process

    ● Iterate

    ○ Literature review and ideas

    ○ Model design/creation

    ○ Evaluation

    ● Write report(s) throughout the year 

    Timeline

    ● Important dates

    ○ Prepare preliminary report 6/6/16. Finish 13/6/16.

    ○ Prepare final report 19/9/16. Finish 26/9/16.