Scalable image recognition model with deep embedding

Chieh-En [email protected]

Motivation

Motivation: the rise of DNNs

• Deep neural networks (DNNs) achieve the best performance on a variety of visual tasks.

Motivation: mobile devices everywhere

• Devices such as smartphones, in-car cameras, GoPros, and IoT devices are everywhere.

A huge amount of valuable images is stored not on servers, but on mobile and IoT devices.

Motivation: exploit DNN

• High performance brought by DNNs

• Valuable data brought by mobile and IoT devices

How can we get the best of both worlds?

Solution: client-server system

[Figure: example query image recognized as "La Tour Eiffel"]

Averaging 7-12 sec: can't do real-time applications

Or, another way

Solution: pure mobile system

[Diagram: on-device pipeline. Feature extraction, then either classification on the device (DatasetLib, Linear) or further processing: send the low-dimensional feature to the server for more complicated jobs.]

Problem: limited storage and computing power

• A DNN model has too many parameters to fit on a storage- and compute-limited system such as a mobile or IoT device.

• How can we perform image classification on mobile and IoT devices?

Krizhevsky et al. model size (AlexNet)

A. Krizhevsky, I. Sutskever, and G. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks," NIPS, 2012.

Layer   Size formula                      Model size (MB)
Conv1   float*(48+48)*(3*11^2)            0.1
Conv2   float*(128+128)*(48*5^2)          1.2
Conv3   float*(192+192)*(256*3^2)         3.4
Conv4   float*(192+192)*(192*3^2)         2.5
Conv5   float*(128+128)*(192*3^2)         1.7
FC6     float*((128+128)*6^2)*4096        144 (66%)
FC7     float*4096*4096                   64 (29%)

Total = 217 MB
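The per-layer sizes above can be reproduced with a few lines of arithmetic. The sketch below assumes a 4-byte float and 1 MB = 1024^2 bytes, and, like the slide, counts only the weights of Conv1 to FC7 (no biases, no final output layer). The variable names are illustrative.

```python
# Reproduce the per-layer weight sizes listed on the slide.
# Assumptions: 4-byte floats, 1 MB = 1024**2 bytes, weights only.
BYTES_PER_FLOAT = 4
MB = 1024 ** 2

# Number of weights per layer, copied from the slide's formulas.
weights = {
    "Conv1": (48 + 48) * (3 * 11 ** 2),
    "Conv2": (128 + 128) * (48 * 5 ** 2),
    "Conv3": (192 + 192) * (256 * 3 ** 2),
    "Conv4": (192 + 192) * (192 * 3 ** 2),
    "Conv5": (128 + 128) * (192 * 3 ** 2),
    "FC6": ((128 + 128) * 6 ** 2) * 4096,
    "FC7": 4096 * 4096,
}

sizes_mb = {name: n * BYTES_PER_FLOAT / MB for name, n in weights.items()}
total_mb = sum(sizes_mb.values())
fc_share = (sizes_mb["FC6"] + sizes_mb["FC7"]) / total_mb

for name, size in sizes_mb.items():
    print(f"{name}: {size:6.1f} MB ({size / total_mb:.0%})")
print(f"Total: {total_mb:.0f} MB, fully connected share: {fc_share:.1%}")
```

The two fully connected layers dominate the total, which is what motivates dropping them.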

Solution: Semantic-Rich Low-Dim. Feature

• In recent years, the activations of the fully connected layers of the AlexNet model have come to be viewed as a general, highly semantic feature.

• 95% of the model parameters belong to the fully connected layers.

Solution: Semantic-Rich Low-Dim. Feature

Drop the fully connected layers from the final model while still encoding their information!

How?

Kernel Preserving Projection (KPP)

• Find a linear transformation that projects features into a lower-dimensional space that "preserves the relevance distance in kernel space".

Y.-C. Su et al., "Scalable Mobile Visual Classification by Kernel Preserving Projection over High Dimensional Features," IEEE, 2014.
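To make the idea of a kernel-preserving linear projection concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the formulation from the cited paper: it builds a low-dimensional target embedding from the top eigenpairs of a kernel matrix, then fits a linear map to that embedding by ridge regression. The function names, the `ridge` parameter, and the random data are all hypothetical.

```python
import numpy as np

def rbf_kernel(X, gamma):
    """RBF kernel matrix for the rows of X (n samples, D dims)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))

def fit_kernel_preserving_projection(X, K, d, ridge=1e-3):
    """Return P (D, d) such that (X P)(X P)^T roughly approximates K.

    Illustrative two-step recipe (an assumption, not the paper's solver):
    1. take the best rank-d embedding Y of K from its top eigenpairs;
    2. fit a linear map X P ~ Y with ridge regression.
    """
    w, V = np.linalg.eigh(K)                 # eigenvalues in ascending order
    top = np.argsort(w)[::-1][:d]            # indices of the d largest
    Y = V[:, top] * np.sqrt(np.maximum(w[top], 0.0))
    D = X.shape[1]
    return np.linalg.solve(X.T @ X + ridge * np.eye(D), X.T @ Y)

# Hypothetical data: 200 samples of a 32-dim feature, projected to 8 dims.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 32))
K = rbf_kernel(X, gamma=1.0 / 32)
P = fit_kernel_preserving_projection(X, K, d=8)
Z = X @ P                                    # low-dimensional features
rel_err = np.linalg.norm(Z @ Z.T - K) / np.linalg.norm(K)
print(f"projection shape: {P.shape}, relative kernel error: {rel_err:.3f}")
```

Once `P` is learned, a new image needs only one matrix-vector product to get its low-dimensional feature, which is what makes an explicit linear projection attractive on a device.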

Kernel Preserving Projection (KPP)

• Find an explicit transform such that:

• In matrix representation, we want to find a matrix

Kernel Preserving Projection(KPP)

• MVProjection:

• L1MVProjection:

Deep Embedding

• Experimental results show that on hand-crafted features, the RBF kernel performs best.

• Though infinite-dimensional, the RBF space itself is semantically meaningless!

Deep Embedding

• For RBF kernel,

• For Deep Embedding,

Deep Embedding

Not only is the model reduced, but also the classifier.

Result

In the experiment, we use LIBLINEAR as our classifier and perform 10-fold cross-validation on the scene15 benchmark dataset. We first compare KPP (RBF) with other methods on a state-of-the-art hand-crafted feature (VLAD) to show how KPP outperforms them.

Result

Result-Deep Embed

- The accuracy boost from 75.6% (hand-crafted) to 89.5% (AlexNet) shows the power of DNNs.

- Deep embedding outperforms the other methods by a large margin on DNN features.

The final model:

- Requires only 14% of the parameters, saving 86% of the space (217 MB -> 30 MB).

- Loses only 1.12% accuracy (89.5% -> 88.38%).

- Is suitable for computing on mobile and IoT devices!
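The compression and accuracy figures above follow directly from the reported numbers. A short check, using only the values stated on the slide (the variable names are illustrative):

```python
# Values as reported on the slide.
full_model_mb = 217.0      # full AlexNet-based model
reduced_model_mb = 30.0    # final model with deep embedding
acc_full = 89.5            # accuracy (%) with AlexNet features
acc_reduced = 88.38        # accuracy (%) of the final model

kept = reduced_model_mb / full_model_mb   # fraction of the size that remains
saved = 1.0 - kept                        # fraction of the size removed
acc_drop = acc_full - acc_reduced         # accuracy loss in percentage points

print(f"kept: {kept:.0%}, saved: {saved:.0%}, accuracy drop: {acc_drop:.2f} points")
```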

Thank you !
