48
Simple Classification using Binary Data Deanna Needell Mathematics UCLA Thanks: NSF CAREER DMS#1348721 NSF BIGDATA DMS#1740325 MSRI NSF DMS#1440140 Alfred P. Sloan Fdn

Simple Classification using Binary Datadeanna/simpleclass.pdf · 2017-10-09 · Option 1 : Build bigger computing systems vWe need the resources vFundamental limitations vWasteful

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Simple Classification using Binary Datadeanna/simpleclass.pdf · 2017-10-09 · Option 1 : Build bigger computing systems vWe need the resources vFundamental limitations vWasteful

Simple Classification using Binary Data

Deanna NeedellMathematicsUCLA

Thanks:• NSF CAREER DMS#1348721• NSF BIGDATA DMS#1740325• MSRI NSF DMS#1440140• Alfred P. Sloan Fdn

Page 2: Simple Classification using Binary Datadeanna/simpleclass.pdf · 2017-10-09 · Option 1 : Build bigger computing systems vWe need the resources vFundamental limitations vWasteful

Joint work with

Rayan Saab (UCSD) Tina Woolf (CGU)

Page 3: Simple Classification using Binary Datadeanna/simpleclass.pdf · 2017-10-09 · Option 1 : Build bigger computing systems vWe need the resources vFundamental limitations vWasteful

So much data…

Page 4: Simple Classification using Binary Datadeanna/simpleclass.pdf · 2017-10-09 · Option 1 : Build bigger computing systems vWe need the resources vFundamental limitations vWasteful

So much data…

Page 5: Simple Classification using Binary Datadeanna/simpleclass.pdf · 2017-10-09 · Option 1 : Build bigger computing systems vWe need the resources vFundamental limitations vWasteful

So much data…

Page 6: Simple Classification using Binary Datadeanna/simpleclass.pdf · 2017-10-09 · Option 1 : Build bigger computing systems vWe need the resources vFundamental limitations vWasteful

So much data…

Page 7: Simple Classification using Binary Datadeanna/simpleclass.pdf · 2017-10-09 · Option 1 : Build bigger computing systems vWe need the resources vFundamental limitations vWasteful

How can we handle all this data?

Option 1 : Build bigger computing systemsv We need the resourcesv Fundamental limitationsv Wasteful (resources, energy, cost, …)

Page 8: Simple Classification using Binary Datadeanna/simpleclass.pdf · 2017-10-09 · Option 1 : Build bigger computing systems vWe need the resources vFundamental limitations vWasteful

How can we handle all this data?

3 MB of internet data transfer = boiling one cup of water(https://www.katescomment.com/energy-of-downloads/)

Page 9: Simple Classification using Binary Datadeanna/simpleclass.pdf · 2017-10-09 · Option 1 : Build bigger computing systems vWe need the resources vFundamental limitations vWasteful

How can we handle all this data?

Option 2 : Design more efficient data analysis methods

Page 10: Simple Classification using Binary Datadeanna/simpleclass.pdf · 2017-10-09 · Option 1 : Build bigger computing systems vWe need the resources vFundamental limitations vWasteful

Compressed sensing

v Need to solve highly underdetermined linear system

vA has a null-space!

Page 11: Simple Classification using Binary Datadeanna/simpleclass.pdf · 2017-10-09 · Option 1 : Build bigger computing systems vWe need the resources vFundamental limitations vWasteful

1-bit compressed sensingv Store only the first bit – the SIGN of each measurement

Page 12: Simple Classification using Binary Datadeanna/simpleclass.pdf · 2017-10-09 · Option 1 : Build bigger computing systems vWe need the resources vFundamental limitations vWasteful

1-bit compressed sensingv Store only the first bit – the SIGN of each measurement

Page 13: Simple Classification using Binary Datadeanna/simpleclass.pdf · 2017-10-09 · Option 1 : Build bigger computing systems vWe need the resources vFundamental limitations vWasteful

1-bit compressed sensingv Store only the first bit – the SIGN of each measurement

v Remedy: Use “dithers” to estimate the norm of x

Page 14: Simple Classification using Binary Datadeanna/simpleclass.pdf · 2017-10-09 · Option 1 : Build bigger computing systems vWe need the resources vFundamental limitations vWasteful

Moral:

vOne-bit (binary) data is as efficient as it gets

vIt may still contain enough “information” about the signal to perform inference tasks

Page 15: Simple Classification using Binary Datadeanna/simpleclass.pdf · 2017-10-09 · Option 1 : Build bigger computing systems vWe need the resources vFundamental limitations vWasteful

Problem: classification

Page 16: Simple Classification using Binary Datadeanna/simpleclass.pdf · 2017-10-09 · Option 1 : Build bigger computing systems vWe need the resources vFundamental limitations vWasteful

Problem: reality

(Tang etal. 2016)

Page 17: Simple Classification using Binary Datadeanna/simpleclass.pdf · 2017-10-09 · Option 1 : Build bigger computing systems vWe need the resources vFundamental limitations vWasteful

Background: SVM

Page 18: Simple Classification using Binary Datadeanna/simpleclass.pdf · 2017-10-09 · Option 1 : Build bigger computing systems vWe need the resources vFundamental limitations vWasteful

Background: SVM

Page 19: Simple Classification using Binary Datadeanna/simpleclass.pdf · 2017-10-09 · Option 1 : Build bigger computing systems vWe need the resources vFundamental limitations vWasteful

Background: Deep Learning

Page 20: Simple Classification using Binary Datadeanna/simpleclass.pdf · 2017-10-09 · Option 1 : Build bigger computing systems vWe need the resources vFundamental limitations vWasteful

Our goals

Design classification scheme that:

v Uses binary data

v Is simple and efficient

v Uses layers in an interpretable way

Page 21: Simple Classification using Binary Datadeanna/simpleclass.pdf · 2017-10-09 · Option 1 : Build bigger computing systems vWe need the resources vFundamental limitations vWasteful

Notation

v X : nxp training data matrix (data in columns)

v A : mxn (random) matrix

v Q = sign(AX) : mxp binary training data

v G : # of classes

v b : p training labels (1-G)

v L : # of layers in our design

Page 22: Simple Classification using Binary Datadeanna/simpleclass.pdf · 2017-10-09 · Option 1 : Build bigger computing systems vWe need the resources vFundamental limitations vWasteful

Main idea

v Each row of A corresponds to a hyperplane

v Each binary measurement Qij indicates on which side of the ith hyperplane data point Xj lies

v If we gather enough of this info for all the training data, we can use it to predict the class label for a new test point x

Page 23: Simple Classification using Binary Datadeanna/simpleclass.pdf · 2017-10-09 · Option 1 : Build bigger computing systems vWe need the resources vFundamental limitations vWasteful

Single layer

v All the hyperplane information in Q is enough

For a new point x, simply compare its sign pattern with those of the training points and choose the label it matches the most often

x

Page 24: Simple Classification using Binary Datadeanna/simpleclass.pdf · 2017-10-09 · Option 1 : Build bigger computing systems vWe need the resources vFundamental limitations vWasteful

Multiple layers

v What about?

For ANY hyperplane, there are both red and blue on at least one side of it

x

Page 25: Simple Classification using Binary Datadeanna/simpleclass.pdf · 2017-10-09 · Option 1 : Build bigger computing systems vWe need the resources vFundamental limitations vWasteful

Multiple layers

v Now PAIRS of hyperplanes do the trick

x

For a new point x, simply compare its sign pattern for hyperplane PAIRS with those of the training points and choose the label it matches the most often

Page 26: Simple Classification using Binary Datadeanna/simpleclass.pdf · 2017-10-09 · Option 1 : Build bigger computing systems vWe need the resources vFundamental limitations vWasteful

Multiple layers

What about higher dimensions?

v We continue this strategy to build layers, the lthlayer corresponding to l–tuples of hyperplanes

v For simplicity (and computation), we consider m l–tuples at each layer, selected randomly from all possible

v For a new test point x, we use the sign patterns across all layers for classification

Page 27: Simple Classification using Binary Datadeanna/simpleclass.pdf · 2017-10-09 · Option 1 : Build bigger computing systems vWe need the resources vFundamental limitations vWasteful

Using all layersHow to integrate the info from all layers?

vl : layerv i : index from 1 to m v : the set of l indices indicating which

hyperplanes are selected in the ith l-tuplev t : a possible sign pattern of l +/- 1sv g : a class index (from 1 to G)v Pg|t : the # of training points from the gth class

having sign pattern t from the hyperplanes in

Page 28: Simple Classification using Binary Datadeanna/simpleclass.pdf · 2017-10-09 · Option 1 : Build bigger computing systems vWe need the resources vFundamental limitations vWasteful

Using all layersv Pg|t : the # of training points from the gth class

having sign pattern t from the hyperplanes in

Page 29: Simple Classification using Binary Datadeanna/simpleclass.pdf · 2017-10-09 · Option 1 : Build bigger computing systems vWe need the resources vFundamental limitations vWasteful

Using all layersv Pg|t : the # of training points from the gth class

having sign pattern t from the hyperplanes in

fraction of training points in classg out of all points with pattern t

Page 30: Simple Classification using Binary Datadeanna/simpleclass.pdf · 2017-10-09 · Option 1 : Build bigger computing systems vWe need the resources vFundamental limitations vWasteful

Using all layersv Pg|t : the # of training points from the gth class

having sign pattern t from the hyperplanes in

fraction of training points in classg out of all points with pattern t

gives more weight to group g when its size is much different than others with same sign pattern

Page 31: Simple Classification using Binary Datadeanna/simpleclass.pdf · 2017-10-09 · Option 1 : Build bigger computing systems vWe need the resources vFundamental limitations vWasteful

Our method: training

v “For each layer, pick the l–tuples and then compute all values of r(l,i,t,g)”

Page 32: Simple Classification using Binary Datadeanna/simpleclass.pdf · 2017-10-09 · Option 1 : Build bigger computing systems vWe need the resources vFundamental limitations vWasteful

Our method: testing

v “For a new point x with sign pattern t*, compute the sum of all r(l,i,t*,g) for each class g and then assign the label g which has the largest sum.”

Page 33: Simple Classification using Binary Datadeanna/simpleclass.pdf · 2017-10-09 · Option 1 : Build bigger computing systems vWe need the resources vFundamental limitations vWasteful

Results (L=1)

Page 34: Simple Classification using Binary Datadeanna/simpleclass.pdf · 2017-10-09 · Option 1 : Build bigger computing systems vWe need the resources vFundamental limitations vWasteful

Results (L=4)

Page 35: Simple Classification using Binary Datadeanna/simpleclass.pdf · 2017-10-09 · Option 1 : Build bigger computing systems vWe need the resources vFundamental limitations vWasteful

Results (L=5)

Page 36: Simple Classification using Binary Datadeanna/simpleclass.pdf · 2017-10-09 · Option 1 : Build bigger computing systems vWe need the resources vFundamental limitations vWasteful

Results

Page 37: Simple Classification using Binary Datadeanna/simpleclass.pdf · 2017-10-09 · Option 1 : Build bigger computing systems vWe need the resources vFundamental limitations vWasteful

Results

Page 38: Simple Classification using Binary Datadeanna/simpleclass.pdf · 2017-10-09 · Option 1 : Build bigger computing systems vWe need the resources vFundamental limitations vWasteful

Results (L=1)

Page 39: Simple Classification using Binary Datadeanna/simpleclass.pdf · 2017-10-09 · Option 1 : Build bigger computing systems vWe need the resources vFundamental limitations vWasteful

Results (L=4)

Page 40: Simple Classification using Binary Datadeanna/simpleclass.pdf · 2017-10-09 · Option 1 : Build bigger computing systems vWe need the resources vFundamental limitations vWasteful

Results (L=5)

Page 41: Simple Classification using Binary Datadeanna/simpleclass.pdf · 2017-10-09 · Option 1 : Build bigger computing systems vWe need the resources vFundamental limitations vWasteful

Results (L=5)

Page 42: Simple Classification using Binary Datadeanna/simpleclass.pdf · 2017-10-09 · Option 1 : Build bigger computing systems vWe need the resources vFundamental limitations vWasteful

Theory

Page 43: Simple Classification using Binary Datadeanna/simpleclass.pdf · 2017-10-09 · Option 1 : Build bigger computing systems vWe need the resources vFundamental limitations vWasteful

Theory

Ag|t : the angle of class g with sign pattern t for the ith l –tuple in layer l

Page 44: Simple Classification using Binary Datadeanna/simpleclass.pdf · 2017-10-09 · Option 1 : Build bigger computing systems vWe need the resources vFundamental limitations vWasteful

Theory

Page 45: Simple Classification using Binary Datadeanna/simpleclass.pdf · 2017-10-09 · Option 1 : Build bigger computing systems vWe need the resources vFundamental limitations vWasteful

Theory

vAs m à ∞, P(correct label) à 1

Page 46: Simple Classification using Binary Datadeanna/simpleclass.pdf · 2017-10-09 · Option 1 : Build bigger computing systems vWe need the resources vFundamental limitations vWasteful

Theory

Page 47: Simple Classification using Binary Datadeanna/simpleclass.pdf · 2017-10-09 · Option 1 : Build bigger computing systems vWe need the resources vFundamental limitations vWasteful

Take-awayv Simple classification from binary data

v Efficient storage of the datav Efficient and simple algorithmv Theoretical analysis possiblev Already competes with state of the art

v Future workv Dithers to allow for more complicated

geometries?v Theoretical analysis of the discrete case?

Page 48: Simple Classification using Binary Datadeanna/simpleclass.pdf · 2017-10-09 · Option 1 : Build bigger computing systems vWe need the resources vFundamental limitations vWasteful

[email protected]

� math.ucla.edu/~deanna

� “Simple Classification using Binary Data” – Needell, Saab, Woolf