Mobile vision Android

MACHINE LEARNING - MOBILE VISION

-Manikanta Garikipati

WHAT COMES IN FUTURE

➤ Deeper personalisation - Personalised recommendations and less annoying ads

➤ Neural networks running on mobile devices - Mobile may start running machine learning tasks locally Ex: face detection, speech and other innovations.

➤ Mobile experience automation - Automate work of different connected apps ex: IFTT

➤ NLP will begin to make sense - In a state that it can barely beat rule based engines. That’s an exaggeration it’s still in its infancy.

➤ Real time speech translation - If this technology continues to develop, it could significantly improve the quality of international communication or even eradicate language barriers.

➤ Face Detection

➤ Natural Language Processing

➤ Barcode Detection

➤ Text Detection (OCR)

What is machine learning?Machine learning is the idea that there are generic algorithms that can tell you something interesting about a set of data without you having to write any custom code specific to the problem. Instead of writing code, you feed data to the generic algorithm and it builds its own logic based on the data.

For example:

➤ Supervised and unsupervised learning

➤ Neural Networks

➤ Stateless vs State-full model

➤ Deeplearning

➤ Convolution(small neural networks, maxpooling)

➤ Precision(true positives vs all positive guess) and recall(true positives vs dataset)

ML Basics

FACE DETECTION

➤ Process of locating human faces in visual media(digital images or video).

➤ Once a face is detected it can be searched for landmarks such as eyes, nose etc..

➤ A landmark is a point of interest within a face. The left eye, right eye, nose base are examples of landmarks…

➤ We can see its common usage in cameras(they recognise faces to make sure all of them are in focus it takes the picture)

➤ Social networking sites like Facebook uses it!!

HOW TO DETECT FACES?

Face Recognition is a series of several related problems:

➤ First, look at a frame and find all faces in it.

➤ Second, focus on each face and able to understand that even if a face is turned in a weird direction or in bad lighting, it is still the same person.

As a human, your brain is wired to do all of this automatically and instantly. In fact, humans are too good at recognising faces and end up seeing faces in everyday objects:

STEP 1: FINDING ALL THE FACES

➤ The first step in the pipeline is going to be finding faces in a photograph before we try to tell them apart.

➤ More reliable solutions exist these days to recognise a face in a given frame. We are going to look at HOG(Histogram of Oriented Gradients) solution

Mobile cameras using face detection

➤ We’ll look at every single pixel in our image one at a time. For every single pixel, we want to look at the pixels that directly surrounding it:

➤ Our goal is to figure out how dark the current pixel is compared to the pixels directly surrounding it. Then we want to draw an arrow showing in which direction the image is getting darker:

➤ If you repeat that process for every single pixel in the image, you end up with every pixel being replaced by an arrow. These arrows are called gradients and they show the flow from light to dark across the entire image:

➤ We’ll break up the image into small squares of 16x16 pixels each. In each square, we’ll count up how many gradients point in each major direction (how many point up, point up-right, point right, etc…). Then we’ll replace that square in the image with the arrow directions that were the strongest.

➤ The end result is we turn the original image into a very simple representation that captures the basic structure of a face in a simple way:

➤ To find faces in this HOG image, all we have to do is find the part of our image that looks the most similar to a known HOG pattern that was extracted from a bunch of other training faces:

STEP 2: POSING AND PROJECTING FACES

➤ To account for this we will warp picture so that eyes and lips are always in sample place in image. This makes face comparison a bit easier.

➤ To do this, we are going to use an algorithm called face landmark estimation.

➤ There are lots of ways to do this, but we are going to use the approach invented in 2014 by Vahid Kazemi and Josephine Sullivan

➤ Now we have to deal with the problem that faces turned different directions look totally different to a computer

➤ The basic idea is we will come up with 68 specific points (called landmarks) that exist on every face — the top of the chin, the outside edge of each eye, the inner edge of each eyebrow, etc. Then we will train a machine learning algorithm to be able to find these 68 specific points on any face

The 68 landmarks we will locate on every face locating the 68 face landmarks on our test image

➤ You can also use this same technique to implement your own version of Snapchat’s real-time 3d face filters!

➤ Now that we know were the eyes and mouth are, we’ll simply rotate, scale and shear the image so that the eyes and mouth are centered as best as possible. We won’t do any fancy 3d warps because that would introduce distortions into the image. We are only going to use basic image transformations like rotation and scale that preserve parallel lines (called affine transformations):

➤ Code to find landmarks: https://gist.github.com/ageitgey/ae340db3e493530d5e1f9c15292e5c74

➤ Now no matter how the face is turned, we are able to center the eyes and mouth are in roughly the same position in the image.

https://gist.github.com/ageitgey/ae340db3e493530d5e1f9c15292e5c74

Thank You! Questions?

Further Reading

➤ https://github.com/cmusatyalab/openface

➤ https://developers.google.com/vision/face-detection-concepts

References

➤ https://cmusatyalab.github.io/openface/

➤ https://medium.com/@ageitgey/machine-learning-is-fun-part-4-modern-face-recognition-with-deep-learning-c3cffc121d78#.red05avch

https://github.com/cmusatyalab/openface

https://developers.google.com/vision/face-detection-concepts

https://cmusatyalab.github.io/openface/

https://medium.com/@ageitgey/machine-learning-is-fun-part-4-modern-face-recognition-with-deep-learning-c3cffc121d78#.red05avch

https://medium.com/@ageitgey/machine-learning-is-fun-part-4-modern-face-recognition-with-deep-learning-c3cffc121d78#.red05avch

Technology

Mobile vision Android